What you’ll be doing
1. Design and implement end-to-end MLOps workflows using Amazon SageMaker, including:
• SageMaker Pipelines for training and orchestration
• SageMaker Feature Store for feature management
• SageMaker Model Registry for model versioning and approvals
• SageMaker Experiments for lineage and metadata tracking
2. Enable consistent promotion of models across environments (dev / test / pre-prod / prod)
3. Implement automated retraining strategies triggered by data or performance changes
4. Implement and mature an MLOps framework covering code/data/model versioning, automated testing, release governance, rollback strategies and environment promotion controls.
5. Apply security by design across SageMaker workloads by adopting least-privilege IAM roles for training, pipelines and endpoints, and by ensuring network isolation using VPC-attached SageMaker resources.
6. Implement model monitoring (e.g. data quality, model quality, bias drift, feature attribution drift) and alerting that drives automated responses such as retraining triggers and controlled redeployments.
7. Put in place drift detection, evaluation routines, and model performance reporting; partner with data science to define thresholds, baselines and acceptance criteria.
8. Define standards for documentation, change management and quality gates that reduce MTTR and improve platform reliability.
9. Partner with data scientists to productionise notebooks and experiments into managed pipelines.
10. Build scalable inference solutions using SageMaker real-time and serverless endpoints.
Experience Required
1. Degree in Computer Science/Engineering (or equivalent practical experience leading production cloud/ML platforms).
2. AWS certifications strongly preferred (at least one of the following):
• DevOps Engineer Professional
• Machine Learning Engineer – Associate
• AI Practitioner for GenAI fundamentals
3. Knowledge of data governance, lineage, and model explainability practices.
4. Strong hands-on experience with MLOps practices: CI/CD, versioning (code/data/model), release governance, and production monitoring.
5. Strong AWS experience, particularly with Amazon SageMaker for ML deployment and monitoring, including drift/quality monitoring approaches.
6. Experience designing observability for serverless systems (logs/metrics/traces) and implementing distributed tracing and dashboards using open standards and AWS tooling.
The skills you’ll need
Data Modelling, DevOps, APIs/Web Service Integration, Programming/Scripting, Data Privacy, Big Data Processing, Cloud Computing, Artificial Intelligence/Machine Learning, Performance Monitoring, Data Analysis, Data Management, Artificial Intelligence (AI) Ethics, Algorithm Design, Data Model Management, Machine Learning Operations (MLOps), Decision Making, Growth Mindset, Inclusive Leadership, DataOps
Our leadership standards
Looking in:
Leading inclusively and safely
I inspire and build trust through self-awareness, honesty and integrity.
Owning outcomes
I take the right decisions that benefit the broader organisation.
Looking out:
Delivering for the customer
I execute brilliantly on clear priorities that add value to our customers and the wider business.
Commercially savvy
I demonstrate strong commercial focus, bringing an external perspective to decision-making.
Looking to the future:
Growth mindset
I experiment and identify opportunities for growth for both myself and the organisation.
Building for the future
I build diverse future-ready teams where all individuals can be at their best.