Job Description

Key Responsibilities

Infrastructure Management: Design, build, and maintain scalable cloud infrastructure to support production, staging, and development environments.

Infrastructure as Code (IaC): Implement and manage infrastructure using IaC tools.

Automation & CI/CD: Develop and maintain CI/CD pipelines to automate application build, testing, and deployment processes.

Monitoring & Observability: Implement and maintain monitoring, logging, and alerting systems (Prometheus, Grafana, ELK, Datadog, etc.) to ensure high availability and performance.

Security & Compliance: Manage cloud security and ensure compliance with company and industry standards.

Collaboration: Work with developers to create infrastructure solutions that accelerate software delivery and improve reliability.

On-Call Responsibility: Serve as the primary on-call engineer for P1 (critical production) issues, responding to high-priority incidents that affect system availability or business-critical services. On-call duties are limited to critical issues only and are supported by monitoring, alerting, and documented escalation procedures. Appropriate support and compensation will be provided.

Cost Optimisation & Reliability: Monitor and optimise cloud resource usage to maintain cost efficiency and system stability.