The Senior SRE/DevOps Engineer role sits within Arm's Global Storage team, supporting large-scale storage platforms used by engineering and HPC workloads across on-premises and cloud environments. This role focuses on making storage platforms reliable, observable, and easier to operate. It includes reducing manual work through automation, building practical tooling, and helping teams use storage services effectively at scale. Working with colleagues across multiple regions, the role contributes to resolving issues, addressing root causes, and maintaining stable, well-performing systems that support Arm's technology development.
Responsibilities
* Maintain the reliability, availability, and performance of storage platforms used by engineering teams.
* Contribute to incident response, investigation, and problem resolution.
* Apply service reliability measures such as SLOs and SLIs where appropriate.
* Build and maintain infrastructure using Terraform and Ansible.
* Develop automation and Python-based tools to support operations and system insight.
* Use AI-based tooling to assist with monitoring, anomaly detection, and analysis.
* Develop simple agent-based workflows to support operational decision-making.
* Enhance monitoring and alerting to provide clear visibility of system behaviour.
* Work with engineering and security teams to maintain secure and well-managed systems.
* Maintain accurate documentation and share knowledge across the team.
Qualifications
* Experience working with production systems using DevOps or similar engineering practices.
* Experience with Infrastructure as Code tools such as Terraform or configuration management tools such as Ansible.
* Ability to develop automation or tooling using a programming language such as Python.
* Experience supporting reliable and scalable systems in an operational environment.
Nice-to-Have Skills and Experience
* Experience with large-scale storage platforms (file or object) or HPC environments.
* Familiarity with AWS, GCP, or Azure.
* Exposure to CI/CD or Git-based workflows.
* Experience using or integrating AI/ML or agent-based tooling in operations.
* Understanding of identity, access control, and security practices.
* Experience with platforms such as LakeFS.
* Awareness of service management approaches (e.g. ITIL).
Equal Opportunity
Arm is an equal opportunity employer, committed to providing an environment of mutual respect where equal opportunities are available to all applicants and colleagues. We are a diverse organization of dedicated and innovative individuals, and don't discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.
Benefits
Salary Range: £73,500 - £99,500 per year