Job Description
About the Role:
Please note: this is a fully onsite role (5 days a week in the office).
We’re looking for an experienced Sr. DevOps/Site Reliability Engineer to build and optimize scalable, resilient cloud infrastructure. You’ll partner with development teams to improve automation and CI/CD, while also owning incident response and system reliability. This includes monitoring, troubleshooting, and ensuring our services remain highly available and performant.
A day in the life of our Sr. DevOps/SRE:
1. Respond to monitoring alerts, participate in incident calls, and drive incidents to resolution.
2. Collaborate with software development teams to support their day-to-day operations.
3. Design, configure, and optimize CI/CD pipelines.
4. Build, monitor, and maintain resilient, scalable infrastructure.
5. Maintain documentation for processes, architectures, and configurations.
Qualifications
Who we are looking for:
1. Strong analytical and troubleshooting skills.
2. Hands-on experience with AWS CloudOps.
3. Understanding of cloud security best practices and industry standards.
4. Willingness to participate in an on-call rotation.
5. Minimum of 7 years in a DevOps/SRE role.
6. Minimum of 7 years working with Linux and Windows systems.
7. Minimum of 3 years of advanced experience in Terraform module development.
8. Minimum of 3 years of production experience with Docker and Kubernetes (EKS).
9. Minimum of 5 years of expertise in AWS services (EC2, RDS, S3, ElastiCache, WAF, CDN, Route 53).
10. Experience with cloud networking (Transit Gateway, subnets, routing, security groups).
11. Strong knowledge of Jenkins and GitLab.
12. Hands-on experience configuring IIS, NGINX, or other web servers.
13. Proficiency with monitoring solutions (Zabbix, Prometheus, Grafana, etc.).