Senior Software Engineer, ML Platform (Stability & Infrastructure)

London
Isomorphic Labs
Software engineer
€85,000 a year
Posted: 23h ago
Offer description

Senior or Principal Software Engineer, ML Platform (Stability & Infrastructure)


Your Impact

We are building the largest foundation models in biotech and applying them immediately to cure disease. You will play a pivotal role in ensuring the reliability and scalability of the foundations that make this possible.

As a Principal Engineer, you will lead the effort to harden our systems, ensuring our groundbreaking AI is built on an unshakeable base. You will work closely with the research and Applied ML teams to keep the infrastructure stable and reliable, and able to handle more data and larger models as we grow.


What You Will Do

* You will own the end-to-end strategy for platform reliability, with a specific focus on our accelerator (GPU/TPU) infrastructure and workload orchestration. You will move between high-level architectural design and hands‑on systems engineering to eliminate friction in the researcher experience.
* Lead the reliability work for our global job scheduler. You will design and implement a robust "test harness" to safely validate infrastructure upgrades without impacting live research.
* Architect and optimize our next‑generation inference services. You will solve core scaling limits, ensuring high‑throughput performance and feature parity across our model serving stack.
* Overhaul our logging and monitoring systems to provide radical visibility. You will build proactive alerting and telemetry that identifies systemic failures before they impact research workflows.
* Improve our internal CI/CD stability, targeting a significant reduction in failure rates and much faster feedback loops for the engineering organization.
* Contribute to core technical decisions on tooling and architectural design while partnering with science, product, and operations teams to align infrastructure with biotech R&D cycles.


Skills And Qualifications


Essential

* Proven experience in architecting and managing large‑scale AI/ML workloads in a production environment.
* Expertise in cloud compute design, specifically within Google Cloud Platform (GCP).
* Significant experience deploying and managing complex workloads within Kubernetes (GKE).
* Professional familiarity with NVIDIA GPU generations and the intricacies of high‑performance compute.
* Strong programming skills and a "reliability‑first" approach to software development.


Nice to Have

* A career history that spans both ML Software Engineering and Infrastructure SRE roles.
* Experience leading multi‑disciplinary projects and navigating complex stakeholder requirements in a fast‑paced environment.
* Familiarity with workload scheduling, ML efficiency research, and hardware benchmarking.
* Experience with Google TPU generations and specialized ML‑driven R&D cycles.


Hybrid Working

It’s hugely important for us to share knowledge and build strong relationships with each other, and we find it easier to do this if we spend time together in person. This is why we follow a hybrid model, and we would require you to be able to come into the office 3 days a week (currently Tuesday, Wednesday, and one other day depending on which team you’re in). If you have additional needs that would prevent you from following this hybrid approach, we’d be happy to talk through these if you’re selected for an initial screening call.


Equal Employment Opportunity

We are committed to equal employment opportunities regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, pregnancy or related condition (including breastfeeding) or any other basis protected by applicable law. If you have a disability or additional need that requires accommodation, please let us know.
