Description and Requirements
This role is open for the Edinburgh, Scotland location only. Candidates must be based there, as the position requires working from the office at least three days per week (3:2 hybrid policy).
The Lenovo AI Technology Center (LATC)—Lenovo’s global AI Center of Excellence—is driving our transformation into an AI-first organization. We are assembling a world-class team of researchers, engineers, and innovators to position Lenovo and its customers at the forefront of the generational shift toward AI. Lenovo is one of the world’s leading computing companies, delivering products across the entire technology spectrum, spanning wearables, smartphones (Motorola), laptops (ThinkPad, Yoga), PCs, workstations, servers, and services/solutions. This unmatched breadth gives us a unique canvas for AI innovation, including the ability to rapidly deploy cutting-edge foundation models and to enable flexible, hybrid-cloud, and agentic computing across our full product portfolio. To this end, we are building the next wave of AI core technologies and platforms that leverage and evolve with the fast-moving AI ecosystem, including novel model and agentic orchestration & collaboration across mobile, edge, and cloud resources. This space is evolving fast and so are we. If you’re ready to shape AI at a truly global scale, with products that touch every corner of life and work, there’s no better time to join us.
Summary
Lenovo is seeking a highly skilled AI Infrastructure Engineer / AI Operations Engineer to join our growing team. This critical role will focus on designing, building, and maintaining the infrastructure and tools necessary for efficient AI model development, deployment, and operation. Your expertise will enable our data scientists and engineers to focus on high-priority tasks while ensuring seamless operation of AI models in production. If you are passionate about making Smarter Technology For All, come help us realize our Hybrid AI vision!
Responsibilities:
AI Platform Engineering & Operations
1. Design, deploy, and maintain scalable Kubernetes/OpenShift-based AI and ML platforms, supporting diverse AI/ML and cloud-native workloads.
2. Implement and manage GitOps-driven platform configuration using ArgoCD and Helm.
3. Administer Linux systems, including package management, user/group management, file system navigation, shell scripting (Bash), and system configuration (systemd, networking).
MLOps & Model Lifecycle Management
4. Build and automate ML pipelines using Kubeflow Pipelines, Tekton, and Python SDKs.
5. Support deployment and serving of AI/ML models using KServe, Knative, and NVIDIA Triton (where applicable).
6. Integrate model registry, workflow automation, and end-to-end ML lifecycle tooling.
Automation, Observability & Reliability
7. Develop automation using Python, Ansible, Terraform, and CI/CD pipelines.
8. Implement monitoring and alerting with Prometheus, Grafana, and AlertManager for AI workloads and platform health.
9. Optimise the AI platform for performance, reliability, and scalability.
Cloud & Infrastructure Integration
10. Deploy and operate hybrid/multi-cloud Kubernetes environments across AWS, GCP, and on-prem infrastructure.
11. Implement identity, RBAC, and enterprise security integrations (Azure AD, LDAP).
Collaboration & Customer Success
12. Work across AI engineering, DevOps, data science, and platform teams to ensure smooth operation and feature delivery.
13. Provide technical guidance to stakeholders and support customer deployments in production environments.
Required Qualifications:
14. Bachelor’s degree in Computer Science, Engineering, or related field.
15. 8+ years of DevOps / Cloud Native engineering experience, with major focus on Kubernetes and containerised workloads.
16. Deep expertise with Kubernetes / OpenShift administration, including cluster configuration, operators, networking, and security.
17. Strong experience with GitOps, including ArgoCD and Helm.
18. Hands-on experience with MLOps tooling, for example KServe, Kubeflow, Tekton, Knative, and ML pipeline automation.
19. Proficiency in Python, Bash scripting, and automation frameworks (Ansible, Terraform).
20. Experience with cloud platforms including AWS/GCP/Azure.
21. Strong observability experience with Prometheus & Grafana or similar.
22. Excellent problem-solving, communication, and stakeholder engagement skills.
Bonus Points
23. Experience with Red Hat OpenShift AI ecosystem.
24. Knowledge of model serving patterns (Triton ensemble models, OCI artifact-based LLM serving).
25. Certifications such as CKA, CKS, GCP ACE, AWS SAA, or Red Hat OpenShift specialisations.
26. Experience with data engineering or AI/ML workflow orchestration.
27. Deployment and management of scaled CI/CD monorepo patterns.
28. Deployment and management of click-to-deploy Internal Developer Portals such as Backstage.
What we offer:
29. Opportunities for career advancement and personal development
30. Access to a diverse range of training programs
31. Performance-based rewards that celebrate your achievements
32. Flexibility with a hybrid work model (3:2) that blends home and office life
33. Electric car salary sacrifice scheme
34. Life insurance
#LATC