Description
BT is industrialising network operations through data‑driven AIOps and a cloud‑based Common Data Fabric (CDF). The CDF ingests vendor performance data, governs schemas and KPIs, and powers analytics; AIOps adds multivariate anomaly detection (AD) and explainable, topology‑aware root cause analysis (RCA) so Ops teams detect earlier, diagnose faster, and understand service impact with confidence. The target stack is AWS‑native and includes streaming (MSK/Flink), data processing (EMR/Spark), governed storage and metadata (S3, DynamoDB, Aurora/Redshift/Iceberg), and ML/MLOps with SageMaker; RCA complements this with graph reasoning in Neptune and agentic explainability via Bedrock.
What You’ll Be Doing
* Use Terraform/CloudFormation to deploy reusable modules for all AWS components (MSK, Flink, EMR, S3, DynamoDB, Aurora, EC2, Neptune, SageMaker, networking, IAM).
* Implement automated CI/CD pipelines using GitLab/Jenkins for IaC, services, data pipelines, ML models, and inference deployments.
* Enforce deployment safety through approvals, unit/integration tests, automated rollbacks and compliance checks.
* Implement multi‑AZ VPC architectures with private subnets, NAT, routing, endpoint policies, and secure on‑prem connectivity (Direct Connect/VPN).
* Engineer governed multi‑layer data storage using S3 (raw/curated), DynamoDB (metadata/state), Aurora/RDS (relational stores) and Redshift/Iceberg (analytical stores).
* Apply advanced AWS security practices, including IAM least‑privilege, KMS encryption, secrets management, CloudTrail governance, AWS Config rules and GuardDuty/Security Hub integration.
* Build end‑to‑end observability across pipelines using CloudWatch metrics/logs/alarms and OpenTelemetry for distributed tracing (where applicable), and define data quality and pipeline SLIs/SLOs.
* Develop runbooks, playbooks, synthetic tests, chaos testing, failover simulations, and incident triage workflows.
* Operate Amazon Neptune graph clusters for network topology representation.
* Promote software releases to pre‑production and production environments and ensure a smooth transition to ASG.
* Define standards for documentation, change management and quality gates that reduce MTTR and improve platform reliability.
Essential Skills / Experience
* Full understanding of AWS resilience patterns and AWS Backup, and of how to manage incident response using break‑glass approaches where needed.
* Detailed knowledge of AWS Lambda, including runtimes, packaging, storage, concurrency, retries and dead‑letter queues.
* Extensive hands‑on experience with the AWS CLI and a working knowledge of Python.
* Strong AWS experience, particularly with compute, database and streaming services such as MSK, Flink, S3, DynamoDB, Aurora/RDS, Redshift and Lambda.
* Sound understanding of VPCs, route tables, internet gateways (IGW), NAT, network ACLs, VPC endpoints, Transit Gateway and DNS.
* Experience designing observability for serverless systems (logs/metrics/traces) and implementing distributed tracing and dashboards using open standards and AWS tooling such as CloudWatch.
* Understanding of how IAM roles and policies work.
* Ability to perform deep dives using CloudWatch and X‑Ray for troubleshooting, covering logs, metrics, alerts, traces, filters and agents.
Desirable Skills / Experience
* Familiarity with event‑driven architectures (Step Functions, EventBridge, queues), and “shift‑left” quality practices (automated testing, policy‑as‑code, guardrails).
* Comfortable leading incident deep‑dives and contributing to permanent fixes.
* Relevant AWS certifications.
* Pragmatic engineer who automates wherever possible to reduce toil.
* Strong analytical thinking and structured problem‑solving skills.
Our Package
* 10% on‑target annual bonus
* Access to an online private GP 24/7 for you and your immediate family
* Market‑leading paid carers’ leave, with up to 2 weeks off
* Equalised maternity, paternity, and adoption leave – 18 weeks’ full pay and 8 weeks’ half pay
* Discounted EE and BT products, including mobile and broadband
* Market‑leading pension scheme – 5% from you and 10% from us
* Holiday purchase scheme