Position
Site Reliability Engineer (SRE) | Sunderland (Hybrid) | Full-time
Responsibilities
* System Reliability & Availability Hero: Guardian of uptime, ensuring critical systems are always available and meet SLAs. Lead incident management and investigate root causes.
* Monitoring & Alerting Maestro: Set up and maintain monitoring systems (Dynatrace). Create alerting that preemptively detects problems and define key metrics for system health.
* Incident Response Ace: Resolve incidents quickly to minimize downtime. Conduct root cause analysis to prevent recurrence.
* Automation Whizz: Automate repetitive tasks using Terraform, Git, TeamCity; build efficient CI/CD pipelines.
* Capacity Planning Pro: Scale systems to meet demand, optimize resource usage, forecast future needs.
* Performance Optimiser: Tune databases, improve response times, run load and stress tests to handle peak periods.
* Infrastructure Guru: Manage AWS cloud resources; ensure scalability, cost‑effectiveness, and resilience; develop disaster recovery plans.
* Collaboration King/Queen: Work closely with development teams, embed reliability into new features, champion service ownership, provide feedback for operational improvement.
* Security & Compliance Captain: Integrate security best practices, ensure adherence to regulations, protect production environments.
* Documentation Dynamo: Produce clear, concise documentation for infrastructure, procedures, runbooks.
* Continuous Improvement Enthusiast: Seek new technologies and improved practices to enhance reliability, performance, and efficiency.
If you are an experienced SRE who thrives on building reliable, scalable, and efficient systems and enjoys a collaborative environment, we would like to hear from you.