For over 25 years, Abcam has been providing tools that enable faster breakthroughs in critical areas such as cancer, neurological disorders, infectious diseases, and metabolic disorders. We believe that to continue making progress, we need to work together and bring our unique perspectives to make an impact on the world. This community needs people like you—dedicated, agile and audacious—to truly drive science forward.
Role Summary
We are seeking a highly motivated Reliability Engineer to join our team. As a Reliability Engineer, you will play a crucial role in ensuring the stability, performance, and reliability of our production systems. Your responsibilities will include proactively identifying and resolving technical issues, leading major incident responses, and implementing best practices for system reliability. You will work closely with cross‑functional teams to develop and maintain robust monitoring and automation solutions. This position reports directly to the Global Reliability Manager.
In This Role, You Will Have The Opportunity To
* Shape system reliability at scale by monitoring performance, spotting trends, and preventing issues before they impact users.
* Take charge during critical moments, leading major incident responses and driving rapid service restoration.
* Solve complex problems for the long term, collaborating across teams to implement robust, sustainable solutions.
* Automate and innovate, building tools and processes that streamline operations and reduce manual work.
* Drive continuous improvement, using data insights and post‑incident learnings to make systems more resilient every day.
The Essential Requirements Of The Job Include
* Automation & Scripting: Ability to code repeatable tasks using PowerShell, Bash, or Python, and familiarity with infrastructure‑as‑code tools such as Terraform and configuration management tools such as Puppet.
* Cloud & Infrastructure: Strong knowledge of AWS Cloud services, networking, security, and storage solutions, both on‑premises and in the cloud.
* Reliability & Scalability: High‑level understanding of High Availability, Disaster Recovery, scalability solutions, and web infrastructure troubleshooting using logs.
* Monitoring & Incident Management: Proficiency with monitoring dashboards (Grafana, Humio, CloudWatch) and incident management tools like ServiceNow and PagerDuty.
* Database & Pipelines: Good understanding of SQL Server, Oracle, and PostgreSQL (including DML), and familiarity with CI/CD pipelines such as GitLab CI.
It would be a plus if you also possess previous experience in
* EKS troubleshooting
* Application support
* Linux OS troubleshooting
* Oracle Cloud Infrastructure
This role also includes participating in an on‑call rotation to provide 24/7 support for critical systems and to respond to incidents as needed.
Join our winning team today. Together, we’ll accelerate the real‑life impact of tomorrow’s science and technology. We partner with customers across the globe to help them solve their most complex challenges, architecting solutions that bring the power of science to life.
For more information, visit www.danaher.com.
Seniority level: Not Applicable
Employment type: Full‑time
Job function: Engineering and Information Technology