© Fuzu Ltd

Equity Bank Kenya

Banking + 2 more

SRE Engineer

Qualifications

KEY TECHNICAL SKILLS & COMPETENCIES

  • Elasticsearch, Logstash, Kibana (ELK Stack)
  • Microsoft Azure
  • Unix / Linux and Shell Scripting
  • SQL and database concepts
  • Monitoring and observability tools
  • Strong analytical, problem‑solving, and documentation skills

EXPERIENCE REQUIREMENTS

  • Minimum 2 years’ experience in a Site Reliability Engineering, DevOps, or Production Support role
  • Mandatory hands‑on experience with ELK Stack
  • Experience supporting banking or enterprise‑scale applications

ACADEMIC QUALIFICATIONS & CERTIFICATIONS

  • Bachelor’s degree in Science, Engineering, Information Technology, or a related field
  • Nice to have: ELK, Azure, or other relevant cloud/observability certifications


Responsibilities


1. ELK Engineering and Log Analytics

  • Install, configure, and maintain ELK stack components (Elasticsearch, Logstash, Kibana, Beats) across environments.
  • Design efficient dashboards, graphs, and visualizations that translate application logs into business‑readable insights.
  • Analyze application logs to identify trends, risks, and incidents affecting system performance and availability.
  • Develop customized reports, bar charts, and pie charts to support operational and business decision‑making.
  • Implement ELK‑triggered auto‑healing and remediation scripts to detect and resolve incidents proactively.

2. Toil Reduction and Automation

  • Identify repetitive, manual, and reactive operational tasks and eliminate them through automation.
  • Develop scripts and tools using languages such as Python, Bash, or Go to automate system maintenance and operational workflows.
  • Implement Infrastructure as Code (IaC) using tools such as Terraform or Ansible to ensure consistent, repeatable infrastructure provisioning.
  • Design and implement self‑healing systems capable of automatic recovery from common failures without human intervention.

3. Monitoring, Alerting, and Observability

  • Define and implement Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) in collaboration with business and development teams.
  • Build and maintain robust monitoring, logging, and observability solutions using tools such as ELK, Prometheus, Grafana, or equivalent platforms.
  • Configure intelligent, actionable alerts that minimize noise and false positives while ensuring rapid incident detection.
  • Continuously improve monitoring coverage and system visibility to support proactive operations.

4. Incident Response and Management

  • Participate in on‑call rotations to respond to critical system alerts and production incidents.
  • Diagnose, mitigate, and resolve incidents to restore services within agreed SLAs.
  • Conduct blameless post‑incident reviews to identify root causes and define preventative actions.
  • Develop and maintain runbooks and playbooks for common incident scenarios to improve response time and consistency.

5. Capacity Planning and Performance Optimization

  • Analyze historical system usage and trends to forecast future capacity requirements.
  • Perform system and database performance tuning in collaboration with development teams.
  • Conduct load and stress testing to identify bottlenecks before they impact production systems.
  • Ensure systems are cost‑efficient, scalable, and capable of supporting business growth.

6. Cross‑Functional Collaboration

  • Work closely with software development teams during solution design to ensure reliability, scalability, and operational readiness.
  • Promote a DevOps and SRE culture through shared ownership of system reliability (“You Build It, You Run It”).
  • Share knowledge, best practices, and documentation to uplift operational maturity across teams.


