Financial Services

Senior Site Reliability Engineer

Minimum of 4 years of experience in SRE or Backend Engineering with a strong ability to write clean, performant, and tested code in Java, Go, Rust, or Python.
Deep understanding of distributed systems architecture and design patterns. You possess a strong command of microservices fundamentals, event-driven architectures, and the underlying principles required to build systems that scale.
Extensive experience with Google Cloud Platform (GCP) or similar cloud providers (AWS/Azure). You are proficient in running production workloads on Kubernetes (GKE/EKS) and troubleshooting cluster/infrastructure issues.

Participate in on-call rotations as the primary technical lead. Act as Incident Commander during high-severity incidents: opening war rooms, coordinating cross-functional teams, and providing clear status updates.
Instrument code to expose high-cardinality metrics and distributed traces. Collaboratively define, measure, and defend Service Level Objectives (SLOs) and Error Budgets with product owners.
Write high-quality, production-ready code (in Java, Go, or Python) to build internal tooling, automation platforms, and self-healing mechanisms that eliminate manual operator intervention.
Partner with Product Engineering teams during the design phase to ensure new services are built with reliability, scalability, and observability patterns (circuit breakers, rate limiting, backpressure, fallback strategies) from day one.
Analyze system performance and traffic patterns to model future capacity needs. Conduct load testing and chaos engineering experiments to verify system resilience under failure conditions.