Platform site reliability engineer - member of technical staff

London
Callosum
Site reliability engineer
€70,000 a year
Posted: 12 June
Offer description

About Us

The last era of AI scaled on a single bet: bigger models, more identical chips, more data. As problems grow more complex and the requirements of intelligence more diverse, that bet is breaking down. Real-world problems are heterogeneous: no single model or chip can solve them alone. The next era of AI requires heterogeneity at the infrastructure level - diverse models on diverse chips, each with distinct strengths, co-evolving into systems of capability that move the Pareto frontier of what is possible. That's what we are building.

Callosum is the Intelligent Systems Company. We started by questioning what actually creates intelligence. We believe there is no single answer, but rather a system-level solution. We co-evolve models, workflows, and silicon together to show that intelligence does not come from a single component but emerges from a diversity of co-optimised mechanisms that work together and are aware of each other. Heterogeneity will define the next era of compute, and it is a principle that holds in biological, neuronal, and economic systems alike.

In early 2026 we launched with results showing orders of magnitude improvements in performance, and this is only the beginning. Agentic AI is the future of how intelligence is deployed: multi-step, long-horizon, and operating in changing environments. These systems are inherently heterogeneous, and can only be as powerful as the infrastructure that runs them.

We are engineers and scientists based in London, working together across the full depth of the stack. We are curious, intellectually honest, and building what doesn't exist yet. If you thrive on uncharted territory and are energised by the scale of the challenge, we'd love to hear from you.


About the Role

Our platform is the production system our customers route real traffic through. When it degrades, their product degrades. You will own its operational health end-to-end: SLOs, observability, incident response, deployment discipline, and capacity planning across heterogeneous compute backends. As the platform scales to millions of concurrent requests across heterogeneous compute, the work shifts from building the operational foundation to defending it under conditions most teams never encounter.

You’ll define what "production-grade" means for a platform at the centre of a fast-growing company, and own it end to end. The reliability practice is yours to build. You will work closely with the platform team, setting the technical direction, and with the hardware and orchestration teams to expose heterogeneous backends reliably through the platform.


What You'll Build

* Service-level objectives, monitoring, alerting, and observability.
* On-call and incident response: runbooks, escalation, blameless postmortems, and follow-through.
* Capacity planning and the operational side of running across heterogeneous compute backends.


What You Bring

* Strong SRE or production-engineering background running customer-facing systems at scale.
* Fluency with modern operational tooling: observability stacks, container orchestration, infrastructure-as-code, CI/CD.
* Experience owning incident response and driving reliability improvements.
* Compliance execution experience.


What Sets You Apart

* Open-source contributions to relevant infrastructure, or production systems whose scale and complexity you can speak to in detail.
* Early-stage and AI-native company experience.


What We Offer

* Competitive Salary, determined by skills and experience
* Equity & Ownership
* Private healthcare
* We offer Visa sponsorship and relocation benefits to hire the best in the world
* We work in person at our London office. You'll have the tools, space and setup to do your best work, and if you have specific needs, just tell us

We're committed to building an inclusive workplace where everyone feels welcome, and believe in equal opportunities for all.

