This role is only available in our Manchester location.
The Government Digital Service (GDS) is the digital centre of government. We are responsible for setting, leading and delivering the vision for a modern digital government.
Our priorities are to drive a modern digital government, by:
joining up public sector services
harnessing the power of AI for the public good
strengthening and extending our digital and data public infrastructure
elevating leadership and investing in talent
funding for outcomes and procuring for growth and innovation
committing to transparency and driving accountability
We are home to the Incubator for Artificial Intelligence (I.AI) and the world-leading GOV.UK, and we are at the forefront of coordinating the UK's geospatial strategy and activity. We lead the Government Digital and Data function and champion the work of digital teams across government.
We're part of the Department for Science, Innovation and Technology (DSIT) and employ more than 1,000 people all over the UK, with hubs in Manchester, London and Bristol.
The Government Digital Service is where talent translates into impact. From your first day, you'll be working with some of the world's most highly-skilled digital professionals, all contributing their knowledge to make change on a national scale.
Join us for rewarding work that makes a difference across the UK. You'll solve some of the nation's highest-priority digital challenges, helping millions of people access the services they need.
Job description
This is an exciting opportunity to be a part of the Technical Service Desk team for the One Login programme. Reporting to the Lead Service Operations Manager, the Major Incident Manager will play a critical role in ensuring that the One Login service is operating as intended. The role is responsible for keeping Reliant Parties and internal stakeholders informed of events, actions and opportunities that are likely to impact their day-to-day activities, and provides an essential interface with IT operational staff, Service Continuity and other supporting referral groups. The Major Incident Manager leads the response to high-impact incidents, ensuring rapid restoration of services and minimising business disruption, and is accountable for the maintenance of service resilience policy, guidance and co-ordination.
As a Major Incident Manager you'll:
take ownership of major incidents from detection through resolution, assess incident severity, determine business impact, and initiate the major incident process
lead technical bridges, ensuring efficient collaboration and clear direction, and co-ordinate cross-functional teams across the One Login programme
ensure timely escalation to senior leadership when required, and act as the primary point of contact during major incidents
provide timely updates to customers (internal and external), vendors and senior staff
maintain accurate incident logs, timelines and communication records, and document root causes, impacts and recovery steps
produce Post Incident Reports (PIRs) with actionable recommendations
analyse data and graphs to identify service anomalies
review and improve incident management processes, workflows and SLAs, and develop and maintain major incident procedures and runbooks
draft, review and maintain a service resilience guidance document to ensure a consistent approach across the programme
maintain a current view of service resilience risks and raise them for SCS review and approval every six months
coordinate testing and exercising of service resilience plans across the live service
participate in an on-call rotation to provide after-hours support as needed
Person specification
We are interested in people who have:
a proven track record of working in a Critical National Infrastructure live service environment (or one of comparable scale, profile, risk and complexity) and of running bridge calls or war rooms during outages
experience of reviewing, optimising and taking forward process improvements on a service that is comparable to One Login (for example in scale, profile or risk)
a demonstrable track record of leading resilience teams and co-ordinating the response to major incidents, ensuring relevant prioritisation, a focus on restoring the service and effective stakeholder engagement
facilitated resilience workshops for technical and non-technical teams, boosting organisation-wide readiness, worked collaboratively in a group while actively networking with others, and adapted feedback to ensure it is effective and lasting
familiarity with Root Cause Analysis (RCA) and the ability to document incidents, conduct Post Incident Reviews (PIRs), analyse and assess the impact of change, document change requests and action the changes arising from them
the ability to remain calm under pressure, with excellent leadership and decision-making skills, and to manage service components so that they meet business needs and key performance indicators (KPIs)
an understanding of the core technical concepts related to the role, and an awareness of cloud computing and the key components on which we build modern digital services
strong analytical and problem-solving abilities, taking accountability for issues that occur and proactively searching for potential problems, consulting specialists where required
Please note that this role requires SC clearance, which would normally need 5 years' UK residency in the past 5 years. This is not an absolute requirement, but supplementary checks may be needed where individuals have not lived in the UK for that period. This may mean your security clearance (and therefore your appointment) will take longer or, in some cases, not be possible.
DSIT cannot offer Visa sponsorship to candidates through this campaign. DSIT holds a Visa sponsorship licence but this can only be used for certain roles and this campaign does not qualify.