Site Reliability Engineer role at N Consulting Global, based in Hove, UK, with a hybrid work mode. The SRE will drive the modernization of IT operations through observability practices and automation to ensure scalability, reliability, and efficiency.
Primary Responsibilities
* Work closely with the Product Engineering team to modernize IT operations and enhance observability.
* Architect and deploy observability platforms to monitor system health, performance, and reliability.
* Lead AI‑driven alerting and proactive anomaly detection initiatives to reduce MTTD and MTTR.
* Develop and enforce SRE best practices, including SLOs, SLIs, and error budgets.
* Establish an AIOps roadmap to improve operational efficiency.
* Automate repetitive tasks using scripting, orchestration tools, and AI/ML solutions.
* Drive incident management and root‑cause analysis processes through automation for continuous improvement.
* Collaborate cross‑functionally to ensure systems are scalable, resilient, and maintainable.
* Mentor teams on adopting SRE principles and tools.
* Advocate for a culture of reliability, automation, and continuous improvement.
Key Skills
* Site Reliability Engineering principles.
* Observability with Dynatrace and Datadog.
* Automation and scripting with Python and Ansible.
* Cloud platforms: AWS and Azure.
* Containerization and orchestration: Docker and Kubernetes.
* Cloud‑native distributed systems and microservices architecture.
* AI/ML techniques for predictive analytics and automated problem resolution.
* CI/CD pipelines and automated release and deployment engineering solutions.
* Chaos engineering tools such as Gremlin or Chaos Monkey.
* Prioritization and communication skills.
Preferred Qualifications
* 12+ years of experience in IT operations, SRE, or DevOps roles.
* Proven track record of implementing observability and automation solutions in large‑scale environments.
* Certifications in cloud platforms, observability tools, or other SRE‑related areas.