Machine Learning Operations Engineer - AI
We are seeking a practical and automation-driven engineer to build and maintain the infrastructure powering our advanced AI systems. In this role, you will act as the bridge between data science and engineering teams to ensure machine learning models are deployed, monitored, and scaled efficiently.
The Role
* Infrastructure & CI/CD: Design and maintain MLOps infrastructure, establishing best practices for CI/CD, model testing, and versioning.
* Pipeline Development: Manage automated pipelines for training and deploying LLM-based systems, including RAG and Graph RAG approaches.
* Cloud Operations: Optimize cloud infrastructure for ML workloads, with a focus on platforms such as Amazon Bedrock.
* Automation: Use Infrastructure as Code (IaC) principles to automate the provisioning of ML environments.
* Monitoring: Implement logging and monitoring to track model performance, drift, and data quality in production.
Requirements
* Experience: 3+ years in MLOps, DevOps, or Software Engineering with a focus on ML infrastructure.
* Technical Skills: Proficiency in Python and experience with containerization (Docker) and orchestration (Kubernetes).
* Cloud Expertise: Deep experience with a major cloud provider (AWS, GCP, or Azure) and associated ML services like SageMaker or Vertex AI.
* AI Knowledge: Ability to build infrastructure for complex systems that rely on vector stores and graph databases.
* Tooling: Familiarity with IaC tools such as Terraform or CloudFormation.