IT Event Management Engineer
Description
The IT Event Management Engineer is responsible for designing, implementing, and continuously improving global event detection and alerting systems that ensure proactive identification and resolution of IT issues before they impact users or services. This position operates at the intersection of engineering, automation, and service management, driving reliability, consistency, and operational excellence across hybrid cloud and on-prem environments. The role is foundational to achieving a proactive, data-driven IT operations model, helping transition the organization from reactive firefighting to predictive service assurance.
Education & Certifications
- Bachelor’s degree in Information Technology, Computer Science, Engineering, or related field
- ITIL Foundation (minimum); ITIL Intermediate or ITIL 4 Managing Professional is an advantage
- Relevant certifications in cloud platforms (Azure, AWS) or monitoring tools are desirable
Experience
- 5+ years’ experience in IT Operations, Monitoring, or Event Management roles
- Proven experience working with enterprise monitoring and event management tools (e.g., OpsBridge, Azure Monitor, AWS CloudWatch, Site24x7)
- Experience integrating monitoring tools with ITSM platforms for automated incident management
- Hands-on experience in automation and scripting (e.g., PowerShell, Python, Ansible)
- Exposure to hybrid environments (cloud + on-prem infrastructure)
Responsibilities
1. Event Detection, Ingestion & Correlation
• Design and maintain event ingestion pipelines from multiple monitoring sources (e.g., Azure Monitor, AWS CloudWatch, network devices, applications, SaaS systems).
• Develop correlation logic and rules to identify related alerts and minimize redundant or noisy notifications.
• Maintain event taxonomies and classification standards to ensure consistent event tagging, severity, and categorization across systems.
2. Automation, Orchestration & Remediation
• Build and maintain automation scripts and workflows to automatically detect and remediate known issues (e.g., restarting services, clearing caches, resizing disks).
• Integrate event management systems with ITSM platforms (e.g., ServiceNow, SMAX) to auto-create and route incidents with contextual data.
• Participate in AIOps initiatives—leveraging predictive analytics and machine learning models to forecast incidents and anomalies.
3. Standardization, Templates & Documentation
• Develop standard operating procedures (SOPs), runbooks, and knowledge articles for consistent event triage and escalation processes.
• Create event configuration templates (e.g., for threshold settings, escalation rules, integration blueprints) to ensure monitoring practices are repeatable and scalable.
• Maintain a Monitoring and Event Management Playbook outlining governance, workflows, and automation frameworks.
• Document integration patterns, naming conventions, and API schema mappings to enable faster onboarding of new systems.
• Ensure all documentation is version-controlled, accessible via Confluence or SharePoint, and updated as systems evolve.
4. Operational Effectiveness & Continuous Improvement
• Conduct routine health checks on event management systems to ensure optimal performance, data accuracy, and integration stability.
• Analyze event and incident data trends to identify gaps, redundancies, or opportunities for improvement.
• Partner with Service Desk, Cloud, and Network teams to optimize event thresholds, escalation rules, and notification logic.
• Drive continuous improvement initiatives—reducing false positives and increasing actionable alerts through tuning and refinement.
5. Governance & Quality Assurance
• Enforce standardized event handling procedures across global and regional IT teams.
• Support governance reviews and audits to demonstrate compliance with ITIL Event Management and ITOM standards.
• Ensure event management aligns with organizational KPIs such as system uptime, MTTD, MTTR, and service reliability targets.
• Support the development of a Monitoring Maturity Framework to assess and elevate event management capabilities globally.
6. Collaboration & Knowledge Transfer
• Partner with technical teams to integrate new services and platforms into the event management ecosystem.
• Conduct knowledge transfer sessions, workshops, and training for regional IT and operations staff to embed best practices.
• Participate in Agile ceremonies and cross-functional collaboration to align event management efforts with larger transformation initiatives.
• Act as a subject matter expert (SME) for monitoring and observability in project design and operational readiness reviews.
7. Tool Evaluation & Innovation
• Evaluate emerging monitoring and observability tools to identify opportunities for consolidation or modernization.
• Support proof-of-concept activities, benchmark tool performance, and provide recommendations for technology roadmaps.
• Contribute to architectural discussions regarding hybrid monitoring strategies, especially in multi-cloud contexts.