Who we are looking for
A Junior Site Reliability Engineer who will improve system reliability, observability and performance through strong engineering, and assist with incident resolution and operational excellence.
Supported by our site reliability engineering team, you will work to integrate reliability and observability practices into the Software Development Life Cycle (SDLC). With support from central teams, you will help foster a culture where these principles are integral to development. Your contributions will ensure our systems meet user demands and enhance overall performance.
You will ensure the health, performance and availability of critical systems, directly impacting operational efficiency. Using your engineering expertise, you will implement solutions that enhance reliability, including instrumentation with tools such as OpenTelemetry, improving logging practices, and developing features for maintainability. You will also assist in creating tools and automation for effective service management.
This role is eligible for inclusion in the Company’s hybrid working from home policy.
Preferred Skills, Qualifications and Experience
* Passion for contemporary software development practices.
* Production experience with, or demonstrable knowledge of, Python or Golang.
* Keen interest in industry trends, particularly platform engineering.
* Interest in automation and orchestration platforms such as Ansible and Jenkins.
* Strong problem-solving ability, with excellent verbal and written communication skills.
* Strong team player.
Main Responsibilities
* Developing bespoke in-house tooling, using a range of technologies, to help IT Operations colleagues complete their duties.
* Working with automation and orchestration platforms to automate manual activity and reduce toil.
* Building sophisticated dashboards using a range of telemetry data and dashboarding technologies such as Grafana, Splunk and New Relic.
* Maintaining and administering existing monitoring and analytic toolsets.
* Working with IT Operations to provide and support critical tooling that delivers increasing value to the Business.
* Driving initiatives to enhance system reliability and observability, both within the team and across the department, fostering a culture of continuous improvement.
“By applying to us, you are agreeing to share your Personal Data in accordance with our Recruitment Privacy Policy - http://www.bet365careers.com/privacypolicy.pdf”