Site Reliability Engineer, Preston, Lancashire
Client: Ranger Technical Resources
Location: Preston, Lancashire, United Kingdom
Job Category: Other
EU work permit required: Yes
Posted: 31.05.2025
Expiry Date: 15.07.2025
Job Description:
Site Reliability Engineer #2494
Position Summary:
Our partner, an innovative PaaS company specializing in remote monitoring and network management solutions, is looking for a Site Reliability Engineer to help ensure the reliability, scalability, and performance of critical infrastructure and applications. In this role, you’ll build and maintain highly available systems, support and optimize CI/CD pipelines, and determine optimal solutions for the company’s products. You’ll collaborate with development, DevOps, and other teams to maintain high uptime, security, and user experience standards for millions of endpoints.
Experience and Education:
* Bachelor's or higher degree in Computer Science, Information Systems, Information Technology, or a related technical field, or equivalent experience.
* 7+ years of experience in Site Reliability Engineering, DevOps, Infrastructure, or related roles.
* Deep understanding of AWS and its modules and services.
* Strong background in Linux administration and troubleshooting.
* Experience in implementing and managing CI/CD pipelines and Infrastructure as Code (IaC) solutions.
* Experience with monitoring and observability tools to proactively manage system health.
Skills and Strengths:
* AWS (Amazon Web Services)
* Auto Scaling
* Fargate
* Route53
* Observability tools (New Relic, Datadog, Splunk)
* Scripting (Ansible, Bash, Python, Go)
* CI/CD
Primary Job Responsibilities:
* Design and support EC2/ECS/EKS/Fargate environments for high availability and fault tolerance.
* Implement advanced AWS features (Route53, ALB/NLB, multi-region setups) to ensure global reliability.
* Maintain and optimize existing CI/CD pipelines and deployment processes to streamline software delivery and reduce risks.
* Collaborate with Development, QA, and DevOps teams to integrate best practices into build and release processes.
* Implement, manage, and improve monitoring tools to proactively detect and resolve system issues.
* Administer and optimize Linux-based servers and applications for stability, performance, and security.
* Implement and manage containerization solutions to enhance scalability and efficiency.
* Apply security best practices across AWS environments, ensuring compliance and safeguarding infrastructure.
* Develop automated incident response mechanisms and self-healing solutions to minimize downtime.
* Diagnose and resolve infrastructure, networking, and application performance issues.
* Design and maintain backup, failover, and disaster recovery strategies to ensure business continuity.
* Create real-time monitoring dashboards and alerting systems for system health and performance tracking.
* Work with development teams to optimize infrastructure for cost efficiency and performance.