Incident manager

London
Airtel Africa
Posted: 2 October
Offer description

Responsibilities

* Serve as the primary technical point of contact during critical incidents, ensuring rapid resolution and minimal business impact.
* Lead and coordinate cross-functional teams (engineering, support, operations) during incident response, including root cause analysis, mitigation strategies, and post-mortem reviews.
* Monitor service health using tools such as CloudWatch, OpenSearch, Kibana, and Grafana, and proactively identify potential issues before they impact customers.
* Troubleshoot and debug production issues in web architecture, microservices, and cloud environments.
* Manage and maintain system reliability by implementing best practices in observability, monitoring, and alerting.
* Collaborate closely with Software Development, Infrastructure, and Operations teams to improve incident response processes and system resilience.
* Manage incidents related to AWS services such as EC2, S3, RDS, DynamoDB, Aurora, Redis, Memcache, Kafka, SNS, SQS, OpenSearch, and Elasticsearch.
* Use Agile tools (Jira, Confluence) to track incident tickets, document resolutions, and maintain a clear audit trail.
* Oversee system and application deployments, supporting automation pipelines (Jenkins, Git).
* Perform Linux/Unix administration tasks as needed during incident investigation and resolution.
* Continuously update and refine incident response playbooks, runbooks, and SOPs.
* Provide regular incident reports to leadership, including root cause analysis and long-term corrective actions.


Requirements

* Proven experience as an Incident Manager, Site Reliability Engineer (SRE), or Technical Operations Lead in cloud-native and microservices-based environments.
* Strong understanding of web architecture and microservices development principles.
* Deep hands-on experience with AWS Cloud Services: Compute (EC2, Lambda), Storage (S3), Databases (DynamoDB, RDS, Aurora), Messaging (Kafka, SNS, SQS), Caching (Redis, Memcache), Search (OpenSearch, Elasticsearch).
* Expertise in Agile tools: Jira, Confluence, Git, Jenkins.
* Strong Linux / Unix system administration skills, including troubleshooting and performance tuning.
* Strong analytical skills with expertise in debugging complex distributed system issues.
* Experience with monitoring and observability tools like CloudWatch, Grafana, Nagios, and Kibana.
* Excellent communication and leadership skills to manage cross-functional incident response teams.
* Experience in writing detailed post-incident reports and driving continuous improvement.
* Strong scripting skills (Python, Bash, or similar) to automate diagnostic or remediation tasks.

© 2025 Jobijoba - All Rights Reserved
