Senior Site Reliability Engineer (Observability)
Location: London/UK (Remote)
Contract: 12 Months Initial
Rate: £55 - £62 per hour (Inside IR35)
Job Overview
We are looking for a Senior Site Reliability Engineer with strong experience in observability, monitoring and distributed systems to support large-scale cloud infrastructure serving millions of devices globally. The role focuses on building and scaling monitoring, logging and alerting platforms to ensure the high availability and performance of cloud services.
Responsibilities
* Design, deploy and scale observability platforms
* Manage and scale Prometheus monitoring systems
* Deploy and maintain large Elasticsearch clusters
* Build and maintain data pipelines using Kafka
* Develop alerting and monitoring frameworks
* Automate infrastructure using Terraform and Ansible
* Develop tools and scripts using Python, Go, Ruby or Bash
* Work with Linux systems (Debian/Ubuntu)
* Participate in an on-call rotation
* Improve system reliability, performance and scalability
Required Skills
* 5+ years' experience in Site Reliability Engineering / DevOps
* Strong Linux systems experience
* Experience with observability and monitoring tools
* Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana)
* Kafka
* Terraform / Infrastructure as Code
* Ansible / Configuration Management
* Programming experience (Python, Go, Ruby or Bash)
* Distributed systems and cloud infrastructure experience
This is an urgent vacancy and the hiring manager is shortlisting for interviews immediately. Please apply with a copy of your CV, or send it to khushboo. Co. uk
Randstad Technologies is acting as an Employment Business in relation to this vacancy.