InterIntel Technologies

Computers + 1 more

Site Reliability Engineer Intern

Closed for applications
Job details

Contract Type

Description

Required Knowledge, Qualification and Experience

  • Bachelor's Degree in Computer Science, Information Technology, or a related field.
  • Some exposure to Kubernetes and cloud networking.
  • Some experience with monitoring and observability tools.
  • Good exposure to managing production systems in cloud environments.
  • Some exposure to implementing and managing CI/CD pipelines using tools like Jenkins, GitLab CI/CD, or equivalent.
  • Some exposure to cloud platforms (AWS, Azure, Google Cloud) and containerization tools like Docker and Kubernetes.
  • Basic hands-on exposure to monitoring and metrics systems such as Prometheus.
  • Basic familiarity with dashboarding and visualization tools such as Grafana. Foundational understanding of log aggregation systems such as Loki.
  • Familiarity with Linux environments and basic system commands. Exposure to scripting concepts using Python, Bash, or similar languages.
  • Foundational knowledge of Artificial Intelligence (AI) and good exposure to AI agents; relevant certifications in AI or related disciplines will be an added advantage.



Send your resume and portfolio with the subject SITE RELIABILITY ENGINEER INTERN to the email provided.


Responsibilities
  • Assist in designing, implementing, and continuously improving system reliability, availability, and performance by helping to define and monitor SLIs, SLOs, and error budgets across all assigned platforms.
  • Support building and managing a robust monitoring and observability framework using Prometheus, Grafana, and Loki to track latency, traffic, errors, system health, and user impact.
  • Assist in automating infrastructure provisioning, scaling, and configuration management using Infrastructure as Code principles with Terraform and Kubernetes to ensure consistency, scalability, and disaster recovery readiness.
  • Participate in incident response processes, including detection, escalation, resolution, communication, and conducting blameless postmortems to prevent recurrence.
  • Assist in reducing manual operational workload through automation, scripting, and process optimization to improve efficiency and release velocity.
  • Help ensure high availability and performance of business-critical systems.
  • Collaborate with Engineering, Product, and DevOps teams to assist in improving deployment safety, capacity planning, cost optimization, and system scalability.
  • Assist in establishing alerting strategies and reliability standards that minimize alert fatigue while ensuring rapid detection and resolution of production issues.

