What you’ll be doing
1. Own the Private Cloud “EC.3” Capacity Management Platform – act as the single accountable owner for capacity planning, forecasting, modelling, and optimisation across the VMware-based Enterprise Cloud v3 environment.
2. Define and Deliver the Capacity Roadmap – translate business demand and programme milestones into a prioritised backlog of features and automation, using Agile delivery practices.
3. Implement SRE Guardrails – establish SLIs, SLOs, and error budgets for infrastructure-related reliability; ensure proactive risk management.
4. Develop Forecasting Models – build accurate short-, medium-, and long-term capacity forecasts using telemetry and scenario analysis to prevent saturation and ensure headroom.
5. Automate Capacity Workflows – reduce manual toil by creating scripts, policies, and integrations for rightsizing, placement, and quota enforcement using PowerCLI, APIs, and IaC.
6. Maintain Real-Time Telemetry & Dashboards – provide a single source of truth for utilisation, trends, and optimisation opportunities through VMware Aria Operations (vROps) and reporting tools.
7. Optimise Cost and Efficiency – align with FinOps principles to deliver showback/chargeback reporting, identify waste, and implement cost-saving measures without compromising reliability.
8. Integrate with ITSM & Governance – ensure ServiceNow CMDB accuracy, automate request fulfilment, and maintain compliance with capacity policies and audit requirements.
9. Collaborate Across Teams – work closely with Architecture, Programme Delivery, Finance, and Operations to align capacity decisions with strategic objectives and risk appetite.
10. Continuously Improve – evolve the capacity management capability through iterative enhancements, stakeholder feedback, and adoption of emerging best practices.
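To illustrate the kind of forecasting described in item 4, here is a minimal sketch that fits a straight-line trend to daily utilisation samples and estimates how many days remain before a headroom threshold is crossed. The function name, the sample figures, and the 80% threshold are hypothetical and chosen only for illustration; forecasts in practice would draw on Aria Operations telemetry and would likely need richer models (seasonality, step changes from programme milestones).

```python
from typing import Optional, Sequence


def days_until_threshold(daily_util_pct: Sequence[float],
                         threshold_pct: float = 80.0) -> Optional[float]:
    """Project when a linear utilisation trend crosses threshold_pct.

    Returns the number of days after the last sample, 0.0 if the fitted
    trend is already at or above the threshold, or None if the trend is
    flat or falling (no crossing is projected).
    """
    n = len(daily_util_pct)
    if n < 2:
        raise ValueError("need at least two daily samples")

    # Ordinary least-squares fit of utilisation against day index 0..n-1.
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_util_pct) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(xs, daily_util_pct))
    slope = sxy / sxx                      # percentage points per day
    intercept = mean_y - slope * mean_x

    fitted_now = intercept + slope * (n - 1)
    if fitted_now >= threshold_pct:
        return 0.0
    if slope <= 0:
        return None
    return (threshold_pct - fitted_now) / slope


if __name__ == "__main__":
    # Hypothetical cluster memory utilisation, one sample per day.
    samples = [60.0, 61.0, 62.0, 63.0, 64.0]
    print(days_until_threshold(samples, threshold_pct=80.0))
```

A linear fit is the simplest possible baseline; it is useful mainly as a sanity check against whatever model the platform's own analytics produce.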
Leadership Accountabilities
11. Vision & Strategy – Define and communicate the long-term vision for capacity management on EC.3, ensuring alignment with business objectives and technology strategy.
12. Ownership & Accountability – Act as the single point of accountability for capacity planning, forecasting, and optimisation across the VMware platform.
13. Influence & Stakeholder Engagement – Build strong relationships with senior stakeholders, programme leads, and cross-functional teams to drive decisions and secure buy-in.
14. Agile Leadership – Champion Agile ways of working, ensuring backlog prioritisation, iterative delivery, and continuous improvement of the capacity capability.
15. Reliability Governance – Embed SRE principles into leadership decisions, balancing innovation with risk management through SLIs, SLOs, and error budgets.
16. Financial Stewardship – Lead cost optimisation initiatives aligned with FinOps principles, ensuring efficient use of resources and transparent reporting.
17. Team Enablement – Mentor and guide engineers and analysts, fostering a culture of automation, data-driven decision-making, and operational excellence.
18. Change Leadership – Drive adoption of new processes, tools, and automation across teams, ensuring smooth transitions and minimal disruption.
19. Executive Communication – Provide clear, concise updates on capacity health, risks, and roadmap progress to senior leadership and governance boards.
20. Continuous Improvement – Lead retrospectives and postmortems to identify systemic improvements and embed lessons learned into future planning.
Key Decisions
21. Capacity Headroom Policy – Define minimum thresholds for CPU, memory, and storage across clusters to ensure reliability and performance.
22. Forecasting Approach – Select and implement the models and tools used for short-, medium-, and long-term capacity planning.
23. Automation Priorities – Decide which manual processes to automate first (e.g., rightsizing, placement, quota enforcement) to reduce toil and improve efficiency.
24. SLO & Error Budget Targets – Set reliability objectives for capacity-related metrics and determine acceptable risk levels for change management.
25. Optimisation Strategy – Choose cost-saving measures (e.g., rightsizing, decommissioning, reserved capacity) while balancing performance and resilience.
26. Tooling & Integration Choices – Determine which platforms (e.g., VMware Aria Operations, ServiceNow, Power BI) and scripts will form the core of the capacity management capability.
27. Governance & Compliance Controls – Establish policies for capacity requests, approvals, and audit readiness.
28. Reporting & Communication Cadence – Decide how often and in what format capacity health, risks, and forecasts are shared with stakeholders.
29. Change Freeze & Risk Mitigation – Make calls on when to pause non-essential changes based on capacity risk or error budget breaches.
30. Continuous Improvement Roadmap – Prioritise enhancements to forecasting accuracy, automation coverage, and stakeholder experience.
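Several of the decisions above (items 21, 24, and 29) reduce to simple arithmetic once targets are agreed. The sketch below shows one way a headroom check and an error-budget calculation could feed a change-freeze decision. All names, thresholds, and sample figures are hypothetical and for illustration only; the actual policy values are exactly what this role would be expected to define.

```python
def error_budget_remaining(slo_target: float,
                           total_events: int,
                           bad_events: int) -> float:
    """Fraction of the error budget still unspent.

    1.0 means untouched; 0.0 or below means the budget is exhausted.
    slo_target is a fraction, e.g. 0.99 for a 99% objective.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    allowed_bad = (1.0 - slo_target) * total_events
    return 1.0 - bad_events / allowed_bad


def breaches_headroom(used: float, capacity: float,
                      min_headroom_pct: float) -> bool:
    """True if free capacity has dropped below the minimum headroom."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    free_pct = 100.0 * (capacity - used) / capacity
    return free_pct < min_headroom_pct


def should_freeze_changes(budget_remaining: float,
                          headroom_breached: bool) -> bool:
    """Assumed rule: pause non-essential change if either guardrail trips."""
    return budget_remaining <= 0.0 or headroom_breached


if __name__ == "__main__":
    # Hypothetical figures: 99% SLO, 10,000 provisioning requests, 25 failed.
    budget = error_budget_remaining(0.99, 10_000, 25)
    # Hypothetical cluster: 850 GB of 1,000 GB memory used, 20% minimum headroom.
    breached = breaches_headroom(used=850, capacity=1000, min_headroom_pct=20)
    print(round(budget, 2), breached, should_freeze_changes(budget, breached))
```

Treating either guardrail as sufficient to trigger a freeze is a deliberately conservative assumption; a real policy might weight them differently or add exemptions for urgent changes.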
Skills & Experience Required for the Role
Essential:
31. Deep VMware Expertise – hands-on experience with vSphere, vCenter, vSAN, NSX-T, and VMware Aria Operations (vROps) for capacity analytics and optimisation.
32. Capacity Planning & Forecasting – ability to model demand, headroom, and growth scenarios using telemetry and data-driven methods.
33. Automation & Scripting – proficiency in PowerCLI, Python, and API integrations to automate rightsizing, placement, and quota enforcement.
34. Agile Delivery Skills – experience managing backlogs, writing user stories, and delivering incremental improvements through sprints and ceremonies.
35. SRE Practices – strong understanding of SLIs, SLOs, error budgets, and reliability engineering principles applied to infrastructure capacity.
36. Observability & Analytics – ability to design dashboards and alerts for utilisation, saturation, and optimisation opportunities.
37. FinOps Awareness – knowledge of cost optimisation, showback/chargeback models, and unit economics for infrastructure services.
38. Governance & Compliance – familiarity with ITSM tools (e.g., ServiceNow), CMDB data integrity, and audit-ready processes.
39. Stakeholder Engagement – excellent communication and influencing skills to align capacity decisions with business priorities.
40. Continuous Improvement Mindset – proactive approach to evolving processes, reducing toil, and adopting emerging best practices.
Experience you’d be expected to have
41. Proven track record in capacity management for large-scale VMware environments (vSphere, vCenter, vSAN, NSX-T).
42. Hands-on experience with VMware Aria Operations (vROps) or similar tools for capacity analytics, forecasting, and optimisation.
43. Automation and scripting expertise using PowerCLI, Python, and API integrations to reduce manual toil and enforce policies.
44. Agile delivery experience, including backlog management, sprint planning, and stakeholder engagement for platform capabilities.
45. Site Reliability Engineering (SRE) practices applied to infrastructure: defining SLIs/SLOs, managing error budgets, and improving reliability.
46. Performance engineering knowledge, including CPU/memory/storage utilisation, contention analysis, and headroom policies.
47. Cost optimisation and FinOps alignment, with experience in showback/chargeback models and unit economics for infrastructure services.
48. ITSM and governance experience, particularly ServiceNow CMDB integration and compliance with audit requirements.
49. Cross-functional collaboration, working with architecture, programme delivery, finance, and operations teams to align capacity decisions with strategic objectives.
50. Continuous improvement mindset, with a history of evolving processes, implementing automation, and driving operational excellence.
Benefits
51. 10% on-target bonus
52. BT pension scheme: minimum 5% employee contribution, 10% BT contribution
53. From January 2025, equal family leave: receive 18 weeks at full pay, 8 weeks at half pay and 26 weeks at the statutory rate. It’s for all parents, no matter how your family is made up.
54. Enhanced women’s health support: including help with menopause symptoms, cancer screenings, period care and more.
55. 25 days annual leave (not including bank holidays), increasing with service
56. 24/7 private virtual GP appointments for UK colleagues
57. 2 weeks' carer’s leave
58. World-class training and development opportunities
59. Option to join BT Shares Saving schemes.