Senior Site Reliability Engineer
Role Description: As a Senior Site Reliability Engineer you will play a pivotal role in raising awareness and driving adoption of SRE methodologies. This is a hands‑on engineering role where you will design, build, and optimise automation frameworks, observability tools, and incident response mechanisms. Engaging with storage, data, and other product teams, you will act as a trusted advisor, providing strategic guidance and consultative support to help teams improve reliability, scalability, and efficiency.
This team will establish a Centre of Excellence to enhance and promote SRE best practices.
* Proficiency in Programming and Scripting – Expertise in Python, PowerShell, or Go for automating routine tasks and system deployments.
* Incident Management and Troubleshooting – Manage incidents effectively, troubleshoot issues swiftly, and perform root cause analysis to prevent future incidents.
* Systems Engineering and Automation – Deep understanding of systems engineering, operating systems, networking, and cloud infrastructure. Proficiency in automation tools is crucial for maintaining system reliability at scale.
* Influential Communication Skills – Communicate effectively with team members and stakeholders, ensuring alignment, inspiring and motivating them to embrace new mindsets, cultures, and SRE working practices. This skill is crucial for driving meaningful change and fostering a collaborative environment where innovative ideas can thrive.
* Knowledge of Cloud Computing – Familiarity with cloud platforms and services, which is increasingly important as more infrastructure moves to the cloud.
* Strong Problem‑Solving Abilities – Approach problems methodically and find effective solutions, which is vital for maintaining system reliability.
Purpose of the role: To apply software engineering techniques, automation, and best practices in incident response to ensure the reliability, availability, and scalability of systems, platforms, and technology.
* Availability, performance, and scalability of systems and services through proactive monitoring, maintenance, and capacity planning.
* Resolution and analysis of, and response to, system outages and disruptions, and implementation of measures to prevent similar incidents from recurring.
* Development of tools and scripts to automate operational processes, reducing manual workload, increasing efficiency, and improving system resilience.
* Monitoring and optimisation of system performance and resource usage, identifying and addressing bottlenecks, and implementing best practices for performance tuning.
* Collaboration with development teams to integrate best practices for reliability, scalability, and performance into the software development lifecycle, and close work with other teams to ensure smooth and efficient operations.
* Stay informed of industry technology trends and innovations, and actively contribute to the organization's technology communities to foster a culture of technical excellence and growth.
* Contribute or set strategy, drive requirements and make recommendations for change.
* Plan resources, budgets, and policies; manage and maintain policies and processes; deliver continuous improvements and escalate breaches of policies and procedures.
Seniority level
* Associate
Employment type
* Contract
Job function
* Information Technology