Site reliability engineer

Stanford Rivers
Ziprecruiter
Posted: 18h ago
Job Description

Job Title: Site Reliability Engineer (SRE)

Location: New Jersey / New York

Tenure: 3 Years

Position Overview:

We are seeking an experienced Site Reliability Engineer (SRE) to join our team supporting Goldman Sachs. This role will focus on developing automation tools, improving operational efficiency, and ensuring infrastructure reliability. Key areas of responsibility include capacity management, SDLC support, observability, and incident management.

Key Responsibilities:

Infrastructure Capacity Management

1. Forecast demand and conduct capacity planning across application infrastructure.
2. Continuously optimize resource utilization.
3. Maintain production environments and manage Business Continuity Planning (BCP).
4. Define acceptable downtime or failure thresholds to ensure high availability and resiliency.
5. Build and maintain SRE infrastructure, including tools, scripts, and integration with core engineering platforms (e.g., Prometheus, Grafana).

Automation and Process Improvement

1. Develop and maintain automation tools to streamline infrastructure management.
2. Reduce manual intervention through process automation.

Observability and Monitoring

1. Define metrics and thresholds; build frameworks to capture metrics and trends and to generate alerts.
2. Monitor alerts generated by the observability framework and coordinate remediation based on pre-agreed schedules.

Incident Management

1. Serve as a bridge between support and engineering teams to improve incident response.
2. Manage incident follow-ups, post-mortems, and implement corrective actions.

Performance and Reliability

1. Work with engineering teams to optimize system performance based on continuous feedback.
2. Ensure adherence to Service Level Objectives (SLOs), as measured by Service Level Indicators (SLIs).

Qualifications:

1. 5+ years of SRE experience for the senior role; 3+ years of SRE experience is acceptable for supporting roles.
2. Strong expertise in infrastructure management, observability tools (Prometheus, Grafana), and automation scripting.
3. Hands-on experience in capacity planning, performance optimization, and incident response.
4. Proven track record in building and maintaining high-availability systems.
5. Strong collaboration skills with cross-functional teams.