Overview
Role: Staff AI Backend Engineer (AWS / Node.js / TypeScript)
Company: Serve First CX
Location: Hybrid, based in Milton Keynes (minimum 3 days/month in office)
Team: Engineers across UK, US, India, and Philippines
Reports to: CTO (US)
Works closely with: Head of Engineering (India)
Why Serve First
We’re a scrappy, well-funded (£4.5 million seed closed) AI startup turning raw customer feedback into real-time insight for businesses that care about CX. Our 2025 roadmap is ambitious: break apart our Node.js monolith into microservices, double our AI-driven workflows, and harden infrastructure for 100× traffic. Everyone ships. Everyone is on-call. Bureaucracy is nil. Velocity is high.
What You'll Do
* Break up the monolith: Define service boundaries and lead the transition to a microservices architecture. Implement REST + SQS communication between services, containerized via ECS Fargate. You'll design services that scale, not snowball.
* Own AI integration: Build features using OpenAI APIs today, and pave the path for tomorrow: private model deployment, vector DBs, prompt orchestration frameworks, and usage monitoring. Lead the evolution toward multi-model support, caching layers, and Bedrock/RAG-native infrastructure.
* Build and scale AI infra: Design training/inference workflows. Spin up model-serving infra on AWS (Bedrock, SageMaker, or container-based). Help make our AI systems observable, secure, and cost-efficient. You’ll apply DevOps instincts to support LLM-powered production systems at scale.
* Architect the future of our AI platform: Build composable infrastructure for experimentation, scale, and optional self-hosting. Define boundaries between orchestration and inference, expose tracing and prompt history, and make systems that the team can iterate on without chaos.
* Champion testing and correctness: Define and enforce robust testing strategies: unit, integration, and load. Design systems that are testable by default, with clear mocks, interface contracts, and fast CI.
* Estimate and scope work: Own delivery for complex features, break them down into milestones, identify hidden risks, and clearly communicate tradeoffs. We don't over-spec; we trust senior engineers to lead the build and help shape the spec.
* Make it observable: Design systems with telemetry: structured logs, metrics, traces, and alerting. Help us make LLM behavior debuggable and traceable, from token usage to prompt mutation.
* Think like a secure infra engineer: We handle sensitive customer data. Make security a first-class concern in system design, including PII handling, IAM design, secrets management, rate limiting, and GDPR-readiness.
* Ship production-ready backend code: You'll work primarily in Node.js/TypeScript, using MongoDB (Mongoose), Redis, and job schedulers (cron/EventBridge).
* Design cloud-flexible infra: Keep our infrastructure cloud-agnostic. We’re AWS-first today, but use modular Terraform so we can pivot to GCP for customer workloads if needed.
* Mentor, review, and raise the bar: Lead code reviews, pair with engineers, and mentor the team. Help reinforce best practices and know when to lean on AI tools (and when not to).
Must-Haves
* 8+ years of backend engineering, with deep experience building distributed systems using Node.js/TypeScript on AWS.
* System design fluency, especially in event-driven, autoscaling architectures.
* Production LLM experience: Shipped features using OpenAI, Claude, or similar APIs. You understand token limits, prompt shaping, cost tradeoffs, and context handling.
* Infra-aware AI developer: You've helped stand up inference infra via Bedrock, SageMaker, or containerized flows and you care about performance, cost, and traceability.
* Testing mindset: You design with testability in mind, and are confident in setting up or extending CI pipelines for coverage across microservices.
* MongoDB & Redis: You can tune indexes, optimize queries, and debug performance issues at the DB level.
* Terraform & Docker: You can bootstrap infra from scratch and debug cloud deployment issues without waiting on a DevOps team.
* Clear communicator: You write well, document clearly, and thrive in async-first workflows.
* Security + compliance basics: Comfortable designing within GDPR/SOC2 requirements, with a good grasp of secure architecture patterns.
* Estimation and delivery: You've scoped, built, and delivered complex backend features with minimal PM handholding.
Nice-to-Haves
* Background in CX, survey, or analytics SaaS.
* Bedrock, LangChain, or RAG experience.
* Experience with LLMOps: prompt versioning, feedback loops, prompt/response telemetry.
* GCP infra exposure or portability experience.
* React familiarity or empathy with frontend engineers.
* Incident response and blameless postmortem experience.
What We Offer
* Competitive salary (band shared at offer stage)
* Standard UK pension
* 20 days holiday + public holidays
* Generous hardware/kit budget
* High autonomy, massive scope
* Personal and professional development budget
* Additional perks
Seniority level
* Mid-Senior level
Employment type
* Full-time
Job function
* Engineering and Information Technology
Industries
* Research Services