Site Reliability Engineer | Cambridge (Hybrid) | Competitive Salary + Benefits
We’re hiring for a well-established, fast-moving tech company based in Cambridge that’s developing complex, high-availability software used globally. They’re scaling up and looking to bring in a Site Reliability Engineer to help ensure their platform stays reliable, secure, and fast—even as it grows.
You’ll join a small, close-knit team working across infrastructure, tooling, and DevOps practices—solving interesting challenges and improving how their systems run at scale.
🔧 What you’ll be doing:
Investigating and resolving tricky issues across distributed systems
Performing root cause analysis and implementing long-term fixes
Building tools and automation to boost reliability, performance, and scalability
Supporting production environments and helping respond to incidents
Working closely with developers to improve how apps are deployed and monitored
Tech stack includes Python, Linux, AWS, Ansible, Prometheus, PostgreSQL, Node.js, Elasticsearch
✅ What they’re looking for:
Strong problem-solving mindset with attention to detail
Experience working with cloud-based infrastructure and modern DevOps tools
Good understanding of how web applications are built and run
Clear communication and ability to work well in fast-moving teams
A technical degree (2:1 or above), from a top university
💬 Bonus if you have:
Hands-on experience with any of the following: AWS, Node.js, Prometheus, PostgreSQL, Python, Ansible
A passion for digging into complex systems and making them better