Salary: £40,000 - £70,000 per year
Requirements:
* Proven experience delivering BCDR programmes within cloud-native or hybrid cloud environments (Azure/AWS/GCP)
* Strong understanding of high availability architectures
* Knowledge of data backup/restore strategies
* Familiarity with distributed systems resilience
* Experience with AI/ML operationalisation considerations
* Proficiency in orchestration platforms (Kubernetes/Microservices/Serverless)
* Experience with resilience features of major cloud providers (regions, zones, failover models, managed services)
* Demonstrable experience building, testing, and assuring BC/DR frameworks
* Knowledge of industry standards such as ISO 22301, ISO 27031, and NIST SP 800-34, and of UK regulatory expectations for operational resilience
* Track record of delivering resilience artefacts for audit and risk review
Responsibilities:
* Assess the platform's ability to meet defined availability, resilience, and service continuity requirements
* Validate clear and defensible RTO (Recovery Time Objective) and RPO (Recovery Point Objective) definitions aligned to business criticality
* Review architectural decisions and operational controls related to high availability, failover, redundancy, and disaster recovery
* Identify single points of failure across the end-to-end solution, covering AI agents and inference services; orchestration and workflow engines; data pipelines, storage, and backup strategies; and cloud services, APIs, and third-party dependencies
* Conduct structured failure analysis and propose mitigation strategies
* Evaluate and enhance BC/DR plans, procedures, and supporting documentation
* Ensure runbooks are complete, actionable, and test-ready, including steps covering AI-specific failure modes and model or agent recovery
* Produce updated BC/DR playbooks suitable for both technical and operational audiences
* Design and oversee BC/DR testing, including failover, restore, and resilience simulations
* Validate backup and restore integrity, especially for ML/AI model artefacts and metadata
* Ensure testing meets organisational policy, industry standards, and regulatory expectations
* Produce credible resilience evidence to satisfy governance, audit, risk management, and compliance functions
* Contribute to submission materials for internal approvals prior to production deployment
* Map controls to applicable frameworks (e.g., ISO 22301, FCA/PRA operational resilience guidelines, NIST, cloud provider best practices)
* Deliver resilience assurance reports identifying compliance against RTO/RPO and availability requirements
* Maintain a Single Points of Failure Register with a remediation plan
* Update BC/DR Plans and Technical Runbooks (AI-inclusive)
* Create a Test Plan & Test Evidence Pack covering DR scenarios and outcomes
* Prepare a Production Readiness Resilience Pack for governance/audit sign-off
* Provide recommendations for architectural or operational improvements
Technologies:
* AI
* AWS
* Azure
* Cloud
* GCP
* Support
* Kubernetes
* Serverless
* Microservices
More:
We are seeking a Business Continuity and Disaster Recovery (BCDR) Specialist to provide expert guidance and hands-on support in ensuring the resilience and operational continuity of our complex solution platform. This hybrid role requires you to work in our Reading office two days a week. At Investigo, we pride ourselves on simplifying recruitment processes and are committed to your success, as it is a cornerstone of our business. As part of The IN Group, we have been connecting talent since 2003, offering a collaborative environment and a chance to be part of an award-winning team.
Last updated: week 4 of 2026