Site Reliability Engineer
Company: TwinStream – City of Bristol, England, United Kingdom
Security Clearance: Eligible for SC/DV Clearance
Base pay range: Not disclosed. Please direct message job poster.
About the role
Our cross-domain services are used by high-profile government organisations, and demand for them continues to grow in both scope and scale. We are seeking an experienced Site Reliability Engineer to help meet that demand. As an SRE, you will be responsible for ensuring the availability, performance and cost-effectiveness of these services. You will work with multiple feature development teams and the BAU/Support team to define and evolve our cloud and on-prem infrastructure and delivery pipelines, improve system observability, demonstrate performance and capacity improvements, and proactively identify and mitigate reliability risks.
Key Responsibilities
* Collaborate with Software Engineers to improve reliability and performance in their subsystems
* Partner with System Administrators in automating toil and eliminating alerts
* Evolve observability and monitoring capabilities to identify and solve problems before they impact the business
* Support development environments to help us achieve our delivery and quality goals
* Research and evaluate technologies, tools and services to influence buy‑vs‑build decisions
* Develop expertise in diverse technical and business domains
* Expand your knowledge of the technical stacks used
Skills & Experience Required
* Experience using modern configuration management tools such as Ansible, Chef or similar
* Experience working with Terraform
* Experience working with Docker containers and container orchestration tools such as Kubernetes, OpenShift or Docker Swarm
* Experience both using and maintaining CI/CD tools such as Jenkins or similar
* Experience with monitoring tools such as InfluxDB, Prometheus or Grafana
* Experience in event‑driven integration with MQ messaging (RabbitMQ or similar AMQP solution)
* Good understanding of relational databases and SQL
* Linux command line, administration and shell scripting
* Working knowledge of network security protocols
* Experience using, developing with and maintaining cloud hosting services (ideally AWS EC2, RDS, S3, Lambda)
* Industry experience writing well‑tested code in one of our platform languages (Java, Go, Python or similar)
* Knowledge of cross-domain principles and technologies
* Experience of working in a service management environment
* Practical experience applying observability patterns in previous systems
* Experience creating and monitoring system availability metrics and using them to drive work that reduces downtime
Seniority level
* Mid‑Senior level
Employment type
* Full‑time
Job function
* Information Technology
* IT Services and IT Consulting
* IT System Operations and Maintenance