# Site Reliability Engineer (SRE) at N Consulting Ltd

* **Role:** Site Reliability Engineer (SRE)
* **Location:** Hove, East Sussex, England, United Kingdom
* **Work Mode:** Hybrid
* **Salary:** £80,000 - £85,000 per year
* **Job Type:** Contract
* **Date Posted:** November 20th, 2025

The SRE will play a pivotal role in modernizing IT operations by implementing observability practices and automating toil. The position requires a deep understanding of Site Reliability Engineering (SRE) principles, modern observability tools, and automation techniques to ensure scalable, reliable, and efficient IT systems. It suits a strategic thinker with hands-on expertise who can lead modernization efforts while fostering a culture of reliability and innovation.

**Primary Responsibilities:**

* Work closely with the Product Engineering team to implement strategies for modernizing IT operations, enhancing observability, and reducing toil.
* Architect and deploy observability platforms to monitor system health, performance, and reliability effectively.
* Propose and drive strategies for AI-driven alerting and proactive anomaly detection to reduce MTTD and MTTR.
* Develop and enforce SRE best practices, including Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets.
* Create an AIOps roadmap for improving operational efficiency.
* Lead efforts to automate repetitive tasks (toil) using scripting, orchestration tools, and AI/ML-based solutions.
* Drive toil automation initiatives, including automated incident response and self-healing automation, to achieve autonomous operations.
* Collaborate with cross-functional teams to ensure systems are scalable, resilient, and maintainable.
* Drive incident management and root cause analysis processes through automation, ensuring continuous improvement to enable autonomous operations.
* Partner with engineering, architecture, and product teams to enable shift-left engineering practices that ensure reliability.
* Mentor and guide teams on adopting SRE principles and tools.
* Advocate for a culture of reliability, automation, and continuous improvement across the organization.

**Key Skills:**

* Strong expertise in implementing Site Reliability Engineering (SRE) principles.
* Advanced knowledge of establishing observability using **Dynatrace** and **Datadog** **(primary skills)**.
* Proficiency in automation and scripting using **Python** and **Ansible** **(primary skills)**.
* Strong experience with cloud platforms: **AWS** and **Azure** **(primary skills)**.
* Solid understanding of containerization and orchestration tools such as **Docker** and **Kubernetes**.
* Proficiency in cloud-native distributed systems and microservices architecture.
* Exposure to AI/ML techniques for predictive analytics and automated problem resolution.
* Familiarity with CI/CD pipelines and with enabling automated release and deployment engineering solutions.
* Nice to have: experience with chaos engineering tools such as **Gremlin** or **Chaos Monkey**, and with implementing automation frameworks for resilience tracking.
* Ability to manage and prioritize multiple projects in a fast-paced environment.
* Strong interpersonal and communication skills to work effectively across teams.
* Excellent problem solving, analytical thinking, and adaptability.
* Strategic mindset that balances engineering excellence with business priorities.

**Preferred Qualifications:**

* 12+ years of experience in IT operations, SRE, or DevOps roles.
* Proven track record of implementing observability and automation solutions in large-scale environments.
* Certifications in cloud platforms, observability tools, and other SRE-related areas.