You will be part of a team designing and building a GenAI virtual agent to support customers and employees across multiple channels. You will build and run LLM-powered agentic experiences, owning their design, orchestration, MLOps, and continuous improvement.
Responsibilities:
* Design & build client-specific GenAI/LLM virtual agents
* Enable the orchestration, management, and execution of AI-powered interactions through purpose-built AI agents
* Design, build, and maintain robust LLM-powered processing workflows
* Develop testing suites that measure bespoke LLM performance metrics
* Implement CI/CD pipelines for ML/LLM: automated build, train, validate, and deploy stages for chatbots and agent services
* Use Infrastructure as Code (Terraform/CloudFormation) to provision scalable cloud environments for training and real-time inference
* Monitor model and service health through observability practices: drift detection, hallucination checks, SLOs, and alerting
* Serve models at scale using containerization and auto-scaling (e.g., Kubernetes) to ensure low-latency inference
* Maintain data & model versioning with a central model registry, ensuring lineage and rollback capabilities
* Deliver a live performance dashboard tracking intent accuracy, latency, and error rates, and develop retraining strategies
* Collaborate with product, engineering, and client stakeholders to foster innovation around frameworks/models
Qualifications / Experience:
* Relevant primary degree; MSc or PhD preferred
* Expertise in mathematics and classical ML algorithms, plus deep knowledge of LLMs (prompting, fine-tuning, RAG, evaluation)
* Hands-on experience with AWS and Azure ML services (e.g., Bedrock, SageMaker, Azure OpenAI, Azure ML)
* Strong engineering skills: Python, APIs, containers, Git, CI/CD (GitHub Actions, Azure DevOps), IaC (Terraform, CloudFormation)
* Experience with scalable serving infrastructure: containerized, auto-scaling environments (e.g., Kubernetes)
* Workflow automation across the ML lifecycle, from data ingestion to model deployment
* Development of live performance dashboards and documented retraining strategies
* Experience with Kubernetes, inference optimization, caching, vector stores, and model registries
* Excellent communication and stakeholder management skills, and the ability to write clear technical documentation and runbooks
Personal Attributes:
* Integrity, plus strengths in stakeholder management, project management, automation, and data visualization and analysis, along with familiarity with Agile methodologies