Equity Bank Kenya

SRE Engineer

Qualifications

KEY TECHNICAL SKILLS & COMPETENCIES

  • Elasticsearch, Logstash, Kibana (ELK Stack)
  • Microsoft Azure
  • Unix / Linux and Shell Scripting
  • SQL and database concepts
  • Monitoring and observability tools
  • Strong analytical, problem‑solving, and documentation skills

EXPERIENCE REQUIREMENTS

  • Minimum 2 years’ experience in a Site Reliability Engineering, DevOps, or Production Support role
  • Mandatory hands‑on experience with ELK Stack
  • Experience supporting banking or enterprise‑scale applications

ACADEMIC QUALIFICATIONS & CERTIFICATIONS

  • Bachelor’s degree in Science, Engineering, Information Technology, or a related field
  • Nice to have: ELK, Azure, or other relevant cloud/observability certifications


Responsibilities


1. ELK Engineering and Log Analytics

  • Install, configure, and maintain ELK stack components (Elasticsearch, Logstash, Kibana, Beats) across environments.
  • Design efficient dashboards, graphs, and visualizations that translate application logs into business‑readable insights.
  • Analyze application logs to identify trends, risks, and incidents affecting system performance and availability.
  • Develop customized reports, bar charts, and pie charts to support operational and business decision‑making.
  • Implement ELK‑triggered auto‑healing and remediation scripts to detect and resolve incidents proactively.

2. Toil Reduction and Automation

  • Identify repetitive, manual, and reactive operational tasks and eliminate them through automation.
  • Develop scripts and tools using languages such as Python, Bash, or Go to automate system maintenance and operational workflows.
  • Implement Infrastructure as Code (IaC) using tools such as Terraform or Ansible to ensure consistent, repeatable infrastructure provisioning.
  • Design and implement self‑healing systems capable of automatic recovery from common failures without human intervention.

3. Monitoring, Alerting, and Observability

  • Define and implement Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) in collaboration with business and development teams.
  • Build and maintain robust monitoring, logging, and observability solutions using tools such as ELK, Prometheus, Grafana, or equivalent platforms.
  • Configure intelligent, actionable alerts that minimize noise and false positives while ensuring rapid incident detection.
  • Continuously improve monitoring coverage and system visibility to support proactive operations.

4. Incident Response and Management

  • Participate in on‑call rotations to respond to critical system alerts and production incidents.
  • Diagnose, mitigate, and resolve incidents to restore services within agreed SLAs.
  • Conduct blameless post‑incident reviews to identify root causes and define preventative actions.
  • Develop and maintain runbooks and playbooks for common incident scenarios to improve response time and consistency.

5. Capacity Planning and Performance Optimization

  • Analyze historical system usage and trends to forecast future capacity requirements.
  • Perform system and database performance tuning in collaboration with development teams.
  • Conduct load and stress testing to identify bottlenecks before they impact production systems.
  • Ensure systems are cost‑efficient, scalable, and capable of supporting business growth.

6. Cross‑Functional Collaboration

  • Work closely with software development teams during solution design to ensure reliability, scalability, and operational readiness.
  • Promote a DevOps and SRE culture through shared ownership of system reliability (“You Build It, You Run It”).
  • Share knowledge, best practices, and documentation to uplift operational maturity across teams.


