Senior Site Reliability Engineer - Linux / HPC

London
Microsoft
Site reliability engineer
Posted: 4 October
Offer description

Microsoft is on a mission to empower every person and every organization on the planet to achieve more, and the Azure cloud is at the forefront of this mission.

Are you interested in working on one of the most exciting products in Microsoft Azure, and passionate about exceeding customer expectations and advancing Microsoft's cloud-first strategy? The Azure Customer Experience (CXP) team is searching for a customer-obsessed Customer Experience Engineer to work on an HPC environment, someone who can drive reliability engineering excellence and embody our culture of inclusiveness, growth mindset, and unwavering dedication to diversity.

We are a fast-paced agile team with a start-up-like culture in which you are empowered to help shape the future. Our “no dead-ends”, “whatever it takes”, “biased for action”, “make it better than ever” philosophy ensures that every customer can realize their full potential through the Microsoft Cloud.

Responsibilities

* Collaborate closely with the existing Engineering teams to build and enhance tooling and automation that resolve issues impacting SLOs faster and, where possible, avert incidents altogether.
* Collaborate with customers to understand their pain points around Supportability and SLO attainment, and formulate strategies for addressing recurring issues in a sustainable way.
* Communicate on a deeply technical level and act as the single point of contact for a large enterprise customer, handling service escalations and driving issues to resolution.
* Design and implement changes to service telemetry for the automation to consume where that telemetry is not already available.
* Enhance the customer-facing experience through proactive alerting based on utilisation, trends, resource health, etc.
* Analyse data and provide operational insights into customer experience to the Design and Product teams, so that features can be designed with Supportability in mind.

Qualifications

* In-depth technical experience in software engineering, network engineering, or systems administration
* Operational experience in improving Service Reliability, Availability and Performance
* Ability to deal with the ambiguity associated with working in a fast-paced environment
* Systematic problem-solving approach, coupled with effective communication skills and a sense of curiosity
* Expertise in analysing, troubleshooting, and automating root cause analysis and mitigation of incidents impacting large-scale distributed systems
* Ability to travel regularly to the customer site in the South West of the UK

Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances.
