Operations Site Reliability Engineer, Bristol
Client: Recognition One
Location: Bristol, United Kingdom
Job Category: Other
EU work permit required: Yes
Posted: 08.05.2025
Expiry Date: 22.06.2025
Job Description:
Face a variety of demanding technical challenges across diverse disciplines, working directly with one of our largest and most influential clients to make a significant impact. This unique opportunity opens up new possibilities in a rapidly evolving field. Are you an expert troubleshooter with a passion for innovation? This could be your chance.
The position is the last line of infrastructure support and goes far beyond technical customer support. It's about solving the trickiest problems in the business, problems that directly affect thousands of users at some of the world's largest companies. You'll often coordinate with product engineering and external partners such as Google Cloud, and you'll write automation and documentation that enable others to fix recurring problems.
Primary Responsibilities:
* Be part of a critical operations team responsible for monitoring, availability, and performance of production services.
* Respond to stakeholder requests within agreed timescales or SLOs.
* Drive automation to reduce failures and manual tasks and to improve overall application performance and availability.
* Perform systems administration activities to ensure smooth operation of applications across multiple platforms.
* Coordinate and communicate with impacted stakeholders in line with the incident management process.
* Demonstrate ownership of events and incidents until resolution.
* Conduct daily shift handovers to peers and management across multiple geographies.
* Support maintenance activities impacting production applications.
* Support critical systems handling sensitive and proprietary data.
* Create, maintain, and update troubleshooting and support documentation.
* Contribute to planning of application/infrastructure releases and configuration changes.
* Administer and maintain all production environments.
* Patch and upgrade existing applications.
* Provide feedback and coaching to upstream teams (internal and vendors) to reduce escalations and improve customer experience.
Professional Experience Required:
* A degree in Systems Engineering, Computer Science, or a related field, combined with relevant experience, is preferred.
* 5+ years of experience administering Linux systems.
* Hands-on experience with various Linux distributions.
* 2+ years operational experience with AWS or Google Cloud Platform.
* Experience using automation platforms to automate repetitive tasks.
* Familiarity with deployment tools such as Ansible Tower and Jenkins.
* Experience deploying to large, global infrastructure.
* Proficiency with orchestration/configuration tools like Ansible and Terraform.
* Strong knowledge of networking, packet tracing, latency, and throughput issues.
* Thorough understanding of HTTP(S), SMTP, TLS/SSL, DNS, LDAP, Kubernetes, and Docker.
* Experience in system/application administration in high-availability, large-scale environments.
* Proficiency in at least one scripting language such as Perl, shell, Ruby, or Python.
* Experience tuning and optimizing monitoring systems.
Personal Skills:
* A strong team player, quick to learn new technologies, adaptable, with a focus on delivery.
* Excellent troubleshooting and problem-solving skills.
* Ability to work calmly under pressure.
* Interest in security.
* Effective communicator at all organizational levels.
This role includes participation in weekend and holiday on-call support as required.