Deimos
Senior Site Reliability Engineer
Nairobi • Kenya
Closed for applications

Deimos
Computers + 1 more
Description
Requirements
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- 5+ years of experience in Software Engineering, SRE, DevOps, or Platform Engineering, with demonstrable ownership of reliability standards at a team or company level.
- Strong coding fluency: Proficiency in Python (or similar) with the ability to read, understand, reason about, and write production-grade automation code.
- Cloud & IaC: Hands-on experience with AWS, and a solid understanding of Infrastructure as Code (Terraform or CloudFormation).
- Deep Observability Knowledge: Demonstrable experience with monitoring tools (Datadog, Prometheus, ELK stack). Strong understanding of SRE concepts including Golden Signals, high-cardinality data handling, and error budget mathematics.
- Systems Thinking: Strong grasp of designing for scale and resilience, including graceful failure, circuit breaking, connection pooling, and multi-AZ deployments.
- Proven ability to define and drive reliability standards across multiple teams and to foster a blameless post-mortem culture.
Responsibilities
- Enablement & RelOps Culture
- Implement the Observability Ladder: Guide teams from basic monitoring to high-signal metric tracking. Work with product teams to define SLAs, SLIs, and SLOs, and build dashboards that track specific error budgets.
- Empower Product Teams: Build frameworks and deployment tooling (e.g., CI/CD, internal tooling integrations) that allow teams to make data-driven decisions on deployment safety and automate rollbacks when error budgets are depleted.
- Champion Reliability: Drive a blameless post-mortem culture focused on actionable takeaways, system improvements, and measurable metrics (MTBF, MTTR).
- Standardised Alerting & On-Call: Continuously improve company-wide alerting and on-call frameworks to reduce alert fatigue, ensuring alerts are highly actionable and symptom-based.
- Disaster Recovery: Drive evolution of DR strategies from manual processes into fully automated runbooks-as-code, allowing teams to prove and improve service recoverability through autonomous, evidence-based testing.
- Eliminate Toil: Use Python (or similar) to develop systems, automations, and tooling for pre- and post-deployment verification, so that our hands-off reliability vision becomes a production reality.
- Reliability-as-Code: Lead the drive to manage our entire reliability suite through IaC. Use Terraform to architect, deploy, and configure our observability stack including ELK, Grafana, Loki, Prometheus, and Tracing.