Site Reliability Engineer (SRE) / Platform Engineer
Role Purpose
As part of a cross-functional team, you will design, implement, and maintain secure, scalable, and highly available infrastructure and applications. This role focuses on diagnosing complex system issues, refining monitoring and reporting, and driving best practices across infrastructure and reliability engineering.
Key Responsibilities
1. Collaborate with cross-functional teams to implement internal applications effectively.
2. Diagnose and resolve complex system performance and reliability issues.
3. Enhance system monitoring and reporting capabilities.
4. Identify and implement architectural and operational best practices in partnership with other SREs and engineering teams.
5. Apply Infrastructure-as-a-Service (IaaS) principles and implementation methods.
6. Use diagnostic tools (dumps, traces, logs) to troubleshoot complex problems.
7. Foster innovation by encouraging and managing new ideas within the team.
Preferred Skills & Experience
13. Bachelor’s degree in Computer Science or Mathematics, or equivalent experience.
9. Strong technical knowledge of Linux systems.
10. Proficiency in at least one high-level programming language (e.g., Python).
11. Experience with relational databases (e.g., Postgres, SQL Server).
12. Proven ability to design, build, and maintain high-performance, highly available environments, preferably on AWS (VPCs, security groups, RDS, S3, EC2, ECS, EKS).
13. Understanding of security engineering and best practices.
14. Scripting skills in bash or similar.
15. Advanced knowledge of configuration management tools (e.g., Puppet, Chef, Ansible).
16. Expertise in CI/CD best practices and tooling.
17. Proficiency with Git.
18. Containerisation experience (Docker on Linux).
19. Advanced knowledge of Infrastructure as Code tools (Terraform, CloudFormation, Ansible).
20. Monitoring and observability experience with tools like Grafana, Elastic, StatusCake, PagerDuty.
21. Strong ability to align technical work with business objectives.
Desirable Skills
22. Familiarity with Agile delivery methodologies.
23. Deep understanding of major system components (Linux, Networking, Storage, Databases).
24. Knowledge of security tooling and practices.
25. Ability to communicate technical concepts clearly to non-technical audiences.
26. Experience administering scalable, cloud-native applications.
Location
Sutton, UK
Trading as TEKsystems. Allegis Group Limited, Maxis 2, Western Road, Bracknell, RG12 1RT, United Kingdom. No. 2876353. Allegis Group Limited operates as an Employment Business and Employment Agency as set out in the Conduct of Employment Agencies and Employment Businesses Regulations 2003. TEKsystems is a company within the Allegis Group network of companies (collectively referred to as "Allegis Group"). Aerotek, Aston Carter, EASi, Talentis Solutions, TEKsystems, Stamford Consultants and The Stamford Group are Allegis Group brands.