Cloud Infrastructure Engineer
Role Profile
The Cloud Infrastructure Engineer plays a critical role in designing, operating, automating and continuously improving XCM's cloud platform. Working across Microsoft Azure, Kubernetes, networking, security and platform operations, you will ensure that XCM's technology platforms remain secure, resilient, scalable and cost-effective.
This is a hands-on engineering role focused on cloud-native infrastructure, automation, platform reliability and operational excellence. You will be responsible for the infrastructure that underpins XCM's products and services, helping to deliver highly available platforms capable of supporting continued business growth.
Overview
XCM is on a cloud-first journey, transitioning from legacy infrastructure towards a modern, highly automated cloud platform. The Cloud Infrastructure Engineer will play a key role in delivering this strategy.
The role is responsible for the design, implementation and operation of XCM's cloud infrastructure, Kubernetes platforms, shared services and supporting operational processes. You will work closely with development, data, security and product teams to ensure our platforms remain secure, resilient, scalable and efficient.
A strong focus on Infrastructure as Code, automation, observability, reliability engineering and continuous improvement is essential. Success in this role requires both strategic thinking and a willingness to be hands-on when solving problems and supporting colleagues.
What You'll Be Responsible For
Cloud Platform Engineering
- Lead the administration, optimisation and governance of XCM's Microsoft Azure environments.
- Design and implement secure, scalable and resilient cloud infrastructure solutions.
- Manage Azure networking, compute, storage, identity and platform services.
- Support multi-region cloud deployments and disaster recovery strategies.
- Drive platform standardisation and cloud architecture best practices.
- Manage cloud capacity planning, performance optimisation and cost management.
- Lead initiatives that improve platform scalability, resilience and operational maturity.
- Support the ongoing migration and retirement of legacy infrastructure and services.
Kubernetes & Platform Operations
- Operate and evolve Kubernetes platforms supporting XCM products and services.
- Ensure Kubernetes environments remain secure, highly available and operationally robust.
- Manage shared platform services deployed across environments.
- Support platform scaling, resource management and performance optimisation.
- Develop and maintain recovery, failover and business continuity processes.
- Work closely with engineering teams to ensure platform capabilities align with product requirements.
Infrastructure as Code & Automation
- Own and evolve Infrastructure as Code practices across the organisation.
- Develop and maintain Terraform, Helm and related automation frameworks.
- Ensure environments can be consistently provisioned, scaled and recovered through code.
- Automate infrastructure deployment, patching, upgrades and operational processes wherever practical.
- Reduce operational overhead through tooling, scripting and self-service capabilities.
- Champion automation-first approaches across infrastructure operations.
Reliability Engineering & Operations
- Ensure the availability, performance and reliability of critical business systems and services.
- Own monitoring, alerting, logging and observability platforms for the cloud infrastructure and networks.
- Define and monitor operational health metrics and service objectives.
- Lead incident response and resolution activities for infrastructure and platform services.
- Conduct root cause analysis and implement preventative improvements following incidents.
- Coordinate maintenance activities, upgrades and infrastructure changes.
- Maintain operational procedures, standards and technical documentation.
Security & Governance
- Work closely with security teams to maintain secure infrastructure standards.
- Ensure cloud and infrastructure platforms comply with organisational security policies and industry best practices.
- Support ISO 27001, Cyber Essentials Plus, GDPR and audit-related activities.
- Coordinate patching, vulnerability management and remediation activities.
- Continuously improve cloud security posture through proactive monitoring and governance.
Leadership & Project Delivery
- Lead cloud and infrastructure projects from planning through implementation.
- Provide technical leadership and subject matter expertise across cloud and platform technologies.
- Contribute to infrastructure strategy, roadmaps and technology planning.
- Collaborate closely with engineering, data, security and business teams.
- Support colleagues across the organisation and contribute wherever required to achieve team objectives.
What We're Looking For
- Strong experience in cloud infrastructure, platform engineering or infrastructure leadership roles.
- Strong hands-on Microsoft Azure administration and architecture experience.
- Strong experience operating production Kubernetes environments.
- Experience designing and managing Infrastructure as Code solutions using Terraform or similar technologies.
- Strong understanding of cloud networking, security and identity management.
- Experience with monitoring, observability and operational tooling.
- Experience delivering cloud transformation and infrastructure modernisation programmes.
- Strong troubleshooting, analytical and problem-solving skills.
- Experience supporting highly available, business-critical production systems.
- Excellent communication and stakeholder management skills.
Technology Exposure Should Include
- Microsoft Azure
- Azure Networking
- Azure Virtual Machines
- Microsoft Entra ID
- Azure Security & Governance
- Azure Backup & Disaster Recovery
- Azure Monitor & Log Analytics
- Kubernetes
- Helm
- Terraform
- PowerShell and Python
- GitOps and CI/CD Tooling
- VMware (desirable)
- Windows Server
- Linux Administration
- SQL Administration
- PostgreSQL
- ClickHouse
- Kafka
- Networking, Firewalls, VPNs, DNS and Certificates
Ideal Characteristics
- Strong ownership mentality and accountability for outcomes.
- Highly proactive, with a focus on identifying issues before they become incidents.
- Demonstrates a continuous improvement mindset.
- Comfortable balancing strategic planning with hands-on technical delivery.
- Able to work independently while remaining highly collaborative.
- Strong problem-solving skills and persistence when dealing with complex technical challenges.
- Dependable, adaptable and willing to contribute wherever needed in a fast-paced environment.
- Strong attention to detail and commitment to operational excellence.
- Passionate about automation, efficiency and reducing manual effort.
- Committed to continuous learning and professional development.
What Success Looks Like
- Secure, resilient and highly available cloud platforms supporting XCM's products and services.
- Successful retirement of legacy infrastructure and continued cloud transformation.
- Improved levels of infrastructure automation and operational efficiency.
- Reliable, repeatable environment provisioning through Infrastructure as Code.
- Reduced operational overhead through automation and process improvement.
- Strong platform observability, monitoring and incident management practices.
- Proactive identification of performance, reliability, security and cost issues, with improvements delivered before they affect services.
- Effective support for business growth through scalable and dependable technology platforms.
Why This Role Is Exciting
- Opportunity to shape and influence XCM's cloud-first technology strategy.
- Operate and evolve modern cloud-native platforms running at scale.
- Work extensively with Azure, Kubernetes, Infrastructure as Code and automation technologies.
- Lead infrastructure transformation initiatives with direct business impact.
- Collaborate with highly skilled teams across engineering, data, security and operations.
- Play a key role in building the next generation of XCM's technology platform.