Moniepoint Incorporated

Finance & FinTech

Site Reliability Engineer

Job details

Description
  • Minimum of 3 years of experience supporting enterprise applications as an SRE or in a similar role, with proficiency in writing code in Java, Go, or Python.

  • Good understanding of distributed systems concepts, microservices architecture and software design patterns.

  • Hands-on experience with Kubernetes. You have managed applications on a major cloud provider (GCP, AWS, or Azure) and can troubleshoot common container issues.

  • Experience setting up dashboards in Grafana and using APM tools like Datadog, New Relic, or SigNoz. You have a solid understanding of metrics, logs, and traces.


Responsibilities
  • Participate in on-call rotations to detect and triage service and reliability issues across all environments. Act as the Incident Commander during major incidents: initiating war room or bridge calls, coordinating cross-functional teams, and providing timely and clear status updates to all stakeholders.

  • Create and maintain meaningful dashboards and alerts. Work with development teams to instrument their code to ensure visibility.

  • Develop automation to eliminate manual and repetitive operational tasks (toil) related to reliability across both applications and infrastructure.

  • Implement and track Service Level Indicators (SLIs) and Service Level Objectives (SLOs) defined by the engineering leadership.

  • Investigate and resolve customer complaints escalated beyond L1 and L2 support, especially those involving performance, reliability, or complex system behavior.

