Site Reliability Engineer (DevOps)
*UK Enhanced DV clearance essential*
Start: ASAP
Duration: initial 12-month contract
Pay: inside IR35, negotiable
Location: full time on site in central London (5 days in office)
Role Description:
In this role you’ll be at the forefront of delivering enhanced reliability, performance, and quality to a key national security customer. Joining a growing team, you’ll help create a culture of continuous improvement and play a pivotal role in revolutionising how systems are developed and supported. This role combines operational support with software engineering, allowing you to design tools and applications that monitor and improve system health. As part of a wider programme, you’ll be integral to supporting the customer’s critical mission.
Key Responsibilities:
* Support and maintain critical services, enhancing the availability, performance, and stability of core mission applications.
* Participate in the 24/7 on-call rota (one week in five), supporting production systems outside business hours; on-call allowances and overtime are paid in addition (overtime rate TBC).
* Automate manual operations work (e.g. incident tickets, on-call tasks) to improve efficiency.
* Collaborate with development teams, advising on best practices for system design and implementation.
* Design and deploy monitoring tools to provide intelligent insights into system health, customising tools where necessary.
* Understand the relationship between software and infrastructure, ensuring systems are scalable and resilient to failure.
* Participate in the wider DevOps/SRE community, sharing knowledge and best practices across the organisation.
Key Skills & Experience:
* Experience of, or enthusiasm for, software development in web technologies and object-oriented programming.
* Familiarity with database technologies such as Oracle SQL, MongoDB, or Postgres.
* Proficiency with Linux and Windows command lines (e.g. Bash, PowerShell).
* Experience with monitoring large systems using tools like Grafana, Prometheus, ELK, and Splunk.
* Knowledge of Agile methodologies and tools such as the Atlassian suite.
* Strong troubleshooting skills across various levels of the application stack.
* Familiarity with ITIL processes.
* Experience with microservices architectures and container platforms like Docker, Kubernetes, and OpenShift.
* A passion for learning new technologies and solving complex problems.
* Awareness of emerging tech trends and tools in the SRE space.
Interested in this role? Please apply directly to this advert with an updated CV to be considered.