Site Reliability Engineer, Observability Platform

Closing: May 31, 2023

Published: May 24, 2023

Job Summary

You may be a fit for this role if you have some of these inclinations:

  • Experience with Kubernetes deployment and management
  • Experience with Elastic Cloud, Loki, Fluentd, Promtail, or other logging tools
  • Experience with Prometheus and/or Thanos deployment and management
  • Think about systems: edge cases, failure modes, behaviors, specific implementations.
  • Experience with or exposure to BigQuery and general-purpose cloud providers such as GCP and AWS
  • Know your way around Linux and the Unix shell.
  • Understand the purpose of configuration management and provisioning tools such as Terraform, Chef, and/or Ansible.
  • Have an urge to collaborate and communicate asynchronously.
  • Have an urge to document all the things so you don’t need to learn the same thing twice.
  • Have an enthusiastic, go-for-it attitude. When you see something broken, you can’t help but fix it.
  • Have an urge to deliver quickly and effectively, and to iterate fast.
  • Share our values, and work in accordance with those values.
  • Are comfortable using GitLab

Projects you could work on:

  • Coding infrastructure automation with Chef, Ansible, Terraform, and GitLab CI/CD
  • Improving our Prometheus monitoring or building new metrics
  • Helping release managers deploy and fix new versions of GitLab-EE.
  • Planning, preparing for, and executing the migration of GitLab.com from virtual machines running on Google Cloud to cloud-native, container-based deployments on Kubernetes using Google Kubernetes Engine.
  • Developing a relationship with a product group, defining their SLAs, sharing GitLab.com data on those SLAs, and improving their reliability


Responsibilities

  • Be part of an on-call (PagerDuty) rotation to respond to incidents that impact GitLab.com availability, and support service engineers with customer incidents.
  • Use your on-call shift to prevent incidents from ever happening.
  • Run our infrastructure with Chef, Ansible, Terraform, GitLab CI/CD, and Kubernetes.
  • Build monitoring that alerts on symptoms rather than on outages.
  • Document every action so your findings turn into repeatable actions and then into automation.
  • Use the GitLab product as a first resort to run GitLab.com, and improve the product as much as possible.
  • Improve operational processes (such as deployments and upgrades) to make them as boring as possible.
  • Design, build, and maintain core infrastructure that enables GitLab to scale to hundreds of thousands of concurrent users.
  • Debug production issues across services and levels of the stack.
  • Plan the growth of GitLab’s infrastructure.