CLOSED FOR APPLICATIONS

Site Reliability Engineer

Closing: Apr 25, 2024

Published: Mar 27, 2024

The Site Reliability Engineer is a Product Team member in the Infrastructure team. You will engineer, manage, and maintain our hosting platform and infrastructure so that hosting is secure and scalable. You’ll also be central to the future development of our infrastructure services and will support both internal teams and external partners. You will work directly with system administrators, DevOps engineers, and users of the Community Health Toolkit (CHT). This includes building new features, providing support, fixing bugs, testing applications, and making sure we’re working on the most impactful things. You will work with a distributed team based around the world, and you will report to an Engineering Manager.


Requirements

  • Good understanding of DevOps concepts and best practices
  • Three years of experience with Kubernetes, with concrete results
  • Experience in one or more programming languages, preferably JavaScript
  • Fluency in English and experience using it in a remote work environment, e.g., over video and text chats
  • Ability to work in a remote and culturally diverse team
  • Detective Skills: Terrific at troubleshooting and debugging.
  • Problem-solving skills
  • Linux system administration, monitoring, security best practices, networking, and logging.
  • You must have valid authorization to work in the country where you are based, without requiring sponsorship.
  • Travel Requirement: Candidates should be aware that this role may entail up to 25% travel, including both domestic and international travel to various locations. Most of these locations are in East Africa, West Africa, or Nepal.
  • We will be reviewing applications continuously and encourage interested candidates to apply as early as possible.


Responsibilities


  • Proactive Monitoring and Team Support
  • Proactively monitor performance and reliability of production Medic systems
  • Produce status pages consumable by non-technical users
  • Consult on technical needs for larger-scale deployments, including local hosting and scalability
  • Provide remote troubleshooting support to active deployments as needed
  • Prioritize urgent troubleshooting problems in live instances
  • Identify possible production problems by checking through or reviewing the issues that have been reported
  • Follow up and investigate questions asked on Slack channels and the CHT forum
  • Keep in contact with the core developers and QA teams
  • Provide technical information, explain processes, clarify interactions when requested and ensure proper documentation.
  • Manage upgrades and upgrade processes on production instances.
  • Automate deployments to increase testability and reliability.
  • Automate deployment monitoring and alerting
  • Support scaling – Proactively seek new technologies or implementations that solve current problems better or more efficiently
  • Troubleshooting – Prioritize and provide remote troubleshooting support to active deployments as needed.
  • Documentation – Write technical information, explain processes, clarify interactions when requested, and ensure proper documentation.
  • Support shifts – Work dedicated support tasks (not on-call) once every three weeks, primarily assisting other internal teams or external partners.


Employer: Medic Mobile