What you’ll be doing
1. Implement and optimise CI/CD pipelines, automation frameworks, and infrastructure-as-code solutions using AWS, GitOps, and container technologies.
2. Design, develop, and troubleshoot large-scale distributed systems across on-prem and cloud environments, ensuring reliability and scalability.
3. Lead performance and scale testing, monitoring, and analysis to improve system stability, security, and efficiency.
4. Drive automation initiatives to eliminate manual toil, reduce detection and resolution times, and enhance operational resilience.
5. Proactively identify and mitigate risks, perform root cause analysis, and implement preventive measures following incidents.
6. Champion best practices in Site Reliability Engineering, mentor team members, and share knowledge on emerging trends and technologies.
7. Collaborate across organisational boundaries to deliver improvements aligned with broader SRE initiatives.
Experience you'll have
Mandatory:
8. A deep understanding of full-stack monitoring solutions, such as Dynatrace, to track the end-to-end performance and trends of owned CDO applications.
9. Strong proficiency in one or more programming languages (e.g. Java, Python).
10. Experience with cloud platforms (AWS, Azure, or GCP).
11. Solid understanding of software architecture, design patterns, and microservices.
12. Familiarity with CI/CD tools and DevOps practices.
Desirable:
13. AIOps fundamentals (cross-domain telemetry ingestion, event correlation, topology/context building, and remediation augmentation).
14. Agentic/autonomous observability skills (using intelligent agents to detect anomalies, correlate signals, and trigger guarded remediations to cut MTTR).
15. AI-assisted alerting & noise reduction (designing contextual, business-impact-aware alerts; prioritisation via ML).
Skills you'll need
16. Troubleshooting
17. Infrastructure Configuration
18. Service Assurance
19. Application Performance Monitoring & Alerting
20. Computer Networking
21. System Administration
22. Programming/Scripting
23. Artificial Intelligence Operations (AIOps)
24. Server Architecture
25. Cloud Computing
26. Continuous Integration/Continuous Deployment Automation & Orchestration
27. Systems Integration
28. Project/Programme Management
29. Incident Management
30. Decision Making
31. Growth Mindset
32. Inclusive Leadership
Our leadership standards
Looking in:
Leading inclusively and safely
I inspire and build trust through self-awareness, honesty and integrity.
Owning outcomes
I take the right decisions that benefit the broader organisation.
Looking out:
Delivering for the customer
I execute brilliantly on clear priorities that add value to our customers and the wider business.
Commercially savvy
I demonstrate strong commercial focus, bringing an external perspective to decision-making.
Looking to the future:
Growth mindset
I experiment and identify opportunities for growth for both myself and the organisation.
Building for the future
I build diverse future-ready teams where all individuals can be at their best.
Benefits
33. An annual on-target bonus of 10% (personal and company multipliers).
34. BT Pension scheme: minimum 5% employee contribution, BT contribution 10%.
35. Exclusive colleague discounts on our latest and greatest BT broadband packages.
36. 50% off EE mobile pay-monthly or SIM-only plans, and a 50% discount for friends and family on EE SIM-only plans.
37. Discounted EE TV, including TNT Sports and the NOW Entertainment membership.
38. Great support for working parents, including pay whilst on maternity, adoptive, and paternity leave.
39. 25 days' annual leave (not including bank holidays), increasing with service.
40. Volunteering days, so you can give back to your local community.
41. Brand new electric vehicle salary sacrifice arrangement, known as ‘My EV’.