Top Skills
* Cloud Infrastructure & Automation
o Design and manage scalable systems on GCP.
o Use Infrastructure as Code (IaC) tools such as Terraform.
* Performance & Reliability Engineering
o Experience in capacity planning, performance tuning, and predictive analytics.
o Knowledge of distributed systems and high-availability architectures.
* Monitoring & Observability
o Proficiency with APM tools like Dynatrace, New Relic, or AppDynamics.
o Proactive incident detection.
* Programming & Scripting
o Strong coding skills in Python, Go, or Java for automation and reliability improvements.
Experience Required
Minimum of 4 years of experience in the specific skill set (SRE), with 6–8+ years of overall IT experience.
Job Description
As we expand our customer deployments to build software that improves our customers’ experience, we are seeking an experienced SRE to bring fresh ideas and an informed viewpoint to our business. The ideal candidate enjoys collaborating with a cross‑functional team to develop real‑world solutions and positive user experiences at every interaction.
As an SRE, you will work with leading‑edge technologies both on‑premises and in the cloud. Automation, superior software quality and performance, and resiliency will be your mindset. You will be an expert resource in high‑performance software and operational design patterns, and you will support development, architecture and operational teams from start to finish to create scalable and resilient solutions.
Responsibilities
* Support development, architecture and operational teams on performance‑ and capacity‑related issues in complex multi‑tier distributed platforms, both during the SDLC and post‑production.
* Support/coordinate new Build/Run initiatives prior to production and ensure product readiness, including infrastructure recommendations, software/script development, load/chaos testing, optimization, SLO definition, capacity planning, and observability/alerting.
* Review services and applications to identify bottlenecks and opportunities to improve performance and scale.
* Perform new POCs for newer technologies and architectural patterns to help teams make informed decisions.
* Define new SLOs for services and applications to meet non‑functional SLA requirements defined by the business.
* Work to reduce/minimize ongoing runtime costs through efficient throttling/queuing/pooling/autoscaling across application and infrastructure tiers.
* Proactively identify anomalies and opportunities in production platforms to achieve greater performance/scale, and recommend improvements to impacted teams for future planning.
* Define performance quality gates and support teams with performance‑focused canary deployment scenarios in CI/CD.
Required Skills and Qualifications
* Experience supporting/troubleshooting large‑scale multi‑tier distributed on‑premises and cloud applications
* Experience architecting, developing and setting up new infrastructure solutions for GCP (using Terraform) and for on‑premises applications
* Experience in Capacity Planning or Performance Engineering and leveraging predictive analytics to determine needed scaling patterns for platforms
* Experience programming in languages such as Java, Node.js, Go, Python and JavaScript
* Experience in Web Development and/or Web Service creation
* Demonstrable cross‑functional knowledge with systems, storage, networking, security, and databases.
* Experience using APM tools such as Dynatrace, New Relic or AppDynamics.
Preferred Qualifications
* Experienced Architect in GCP, Kubernetes, and serverless
* Experience collaborating with development teams to define infrastructure requirements and implement scalable and resilient cloud architecture using Terraform.
* Experience in migrating legacy applications to cloud‑native architecture
* Strong understanding of Spring Framework
* Experience in performance tracing/profiling using Google Chrome Developer Tools
* Experience with SQL and database scaling/replication schemes
* Familiarity with tools used for front‑end analysis such as Lighthouse, Page Speed Metrics, WebPageTest, GTmetrix and browser developer tools.
* Experience using MongoDB/Atlas, Oracle OCI, Postgres, GCP Cloud SQL
* Experience with AngularJS, React and Vue
* Experience tuning/optimizing runtime environments for Java (JVMs), Node.js and Python for the best performance
* Experience with DevOps/Quality gating concepts, Canary deployments and automation associated with CI/CD deployments.
* Experience in enterprise architecture integration patterns and domain‑model‑driven design, with proper separation of concerns across applications, microservices and core web services.
* Experience using observability tools like Dynatrace or any other APM tool is a must.
* Experience using cloud profiling tools and JVM tools like JProfiler/Java Flight Recorder.
* Experience in testing methodologies and metrics using tools like JMeter, NeoLoad, LoadRunner or similar.
* Systematic problem‑solving approach, coupled with strong communication skills and a sense of ownership and self‑drive
* Experience with CI/CD methodologies and an Agile/DevOps mindset
* A passion for automation with a desire to eliminate toil whenever possible
* A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
* Experience using Git and industry build tools
Equal Employment Opportunity
Diverse Lynx LLC is an Equal Employment Opportunity employer. All qualified applicants will receive due consideration for employment without any discrimination. All applicants will be evaluated solely on the basis of their ability, competence and their proven capability to perform the functions outlined in the corresponding role. We promote and support a diverse workforce across all levels in the company.