AI Platform Engineer at Bright
We are building an ambitious internal AI Platform to power Bright's next generation of AI‑driven products and services. This Kubernetes‑hosted platform provides teams across the organisation with the tools to build, deploy, and observe AI‑powered applications without managing complex infrastructure themselves.
Location: Warwick
Employment Type: Full Time
Department: AI
Key Responsibilities
* Core Platform Services:
o Observability & Experimentation: Enhancing Langfuse to support LLM tracing, evaluation, and experimentation.
o Developer Self‑Service: Building and improving Backstage as an internal developer portal for platform discoverability.
o LLM Operations: Deploying and maintaining LiteLLM proxy, Langflow runtime, and other core LLM services.
o Monitoring & Logging: Implementing platform‑wide monitoring (Prometheus/Grafana) and logging infrastructure (Loki).
* Security & Compliance:
o LLM Ops Security: Implementing guardrails (LlamaGuard, Azure Guardrails) and security controls.
o GDPR & PII Management: Building automated PII detection, minimisation strategies, and compliance tooling.
o Incident Response: Establishing security incident response procedures for LLM operations.
* Infrastructure & Reliability:
o Kubernetes Operations: Managing AKS clusters and implementing reliable deployment tooling via ArgoCD.
o Infrastructure as Code: Productionising infrastructure with Terraform, eliminating manual configuration.
o Autoscaling & Performance: Implementing workload management and autoscaling for AI services.
o Storage Solutions: Migrating from self‑hosted MinIO to managed Azure Blob Storage.
* Application Support:
o RAG (Retrieval‑Augmented Generation) applications – Ask IPASS and Ask UK Pay Centre.
o Document processing applications (BrightCapture).
o Employee onboarding automation (Oscar).
o Internal AI assistant (Bright GPT).
Essential Skills & Experience
* 2‑4 years of experience with cloud infrastructure, preferably Azure.
* Hands‑on experience with Kubernetes (AKS experience is a plus).
* Infrastructure as Code: Terraform or similar IaC tools.
* CI/CD: Experience with GitOps workflows and tools such as ArgoCD and GitHub Actions.
* System programming: Proficiency in Python or Go; shell scripting essential.
* Linux & Containers: Solid understanding of Docker and container orchestration.
Desirable Experience
* Exposure to LLM technologies or AI/ML infrastructure.
* Experience with observability tools (Prometheus, Grafana, Loki).
* Knowledge of Helm, Helmfile, and Kustomize for Kubernetes deployments.
* Understanding of security best practices and compliance requirements (GDPR).
* Developer portal platforms such as Backstage.
* Backend‑as‑a‑Service platforms (Supabase or similar).
* Application programming experience with .NET or TypeScript.
Team Structure & Reporting
* Reports to Head of AI.
* Works closely with two senior/principal platform engineers.
* Collaborates with application development teams, product managers, and security/compliance stakeholders.
* Team size: Small, full‑stack AI team covering development, DevOps, operations, and support.
What Success Looks Like
In your first 3 months:
* Contribute to multiple platform epics from the roadmap.
* Understand the architecture of the AI platform and navigate the codebase.
* Successfully deploy services to Kubernetes clusters.
* Participate in the on‑call rotation and troubleshoot platform issues.
In your first 6 months:
* Independently own epics and drive them to completion.
* Contribute to architectural decisions and technical direction.
* Improve platform reliability, observability, or developer experience.
* Mentor junior engineers or onboard new team members.
Technical Stack
* Infrastructure: Azure (AKS, Blob Storage, Cognitive Services), Kubernetes, Terraform.
* Platform Services: LiteLLM, Langflow, Langfuse, Supabase, Open WebUI, Backstage.
* Observability: Prometheus, Grafana, Loki, Langfuse tracing.
* CI/CD: ArgoCD, GitHub Actions, Helmfile.
* Languages: Python, Go, Shell scripting.
* Security: Azure Guardrails, LlamaGuard, PII detection tooling.
Benefits
* Competitive salary.
* Performance‑based bonus.
* 25 days annual leave.
* Health insurance.
* Company pension.
* Company events.
* Free food onsite.
* On‑site parking.
* Referral programme.
* Sick pay.
* Wellness programmes.