About The Role
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, distributed, fault-tolerant systems.
The SRE culture of diversity, intellectual curiosity, continual learning, problem solving and openness is key to its success.
Our SREs are passionate about ensuring that services, both internal and external, have extreme reliability, uptime appropriate to users' needs, resiliency and architectural simplicity, and they continually seek improvements.
You will have the opportunity to tackle complex challenges using your expertise in coding, algorithms, complexity/forensic analysis and various system design strategies.
Much of your software development focus will be on optimising existing systems on public clouds such as GCP and Azure, building new GCP/Azure PaaS and IaaS solutions for services not currently running on public clouds, and eliminating manual work through automation wherever feasible. You will work with people from a wide variety of backgrounds, experiences and perspectives.
We encourage collaboration and the mentoring of people in DevOps, chaos engineering and other positive practices. Self-direction is important, and we strive to create an environment that supports this through trust and mentorship.
Role Specialties
A role should not define you; we therefore prefer to work with people who have well-honed skills, experience and knowledge in a variety of specialisms. That is hard to convey in a job description, so we have listed a few areas that are key to the role, and you would be expected to be strong in some of them:
• Define SRE Strategy
• Define Error Budgets and Execution Model
• Cost Management Model across Dev/Test and Operations/Maintenance (FinOps)
• Strategize:
- Automated Unit/Integration/Load Testing
- Performance Testing & Monitoring
- Logging, Monitoring & Alerting
- Cloud Security
• Container Image Management & Security
• Networks & Service Mesh
• API Gateway Management
• GCP/AWS/Azure Policy Management
- IAM - Identity & Access Management
• Bring an SRE mindset to the areas of:
- CI/CD Automation (Build & Release)
- Microservices Development
- Application services