Job Description:
* As an SRE, you'll collaborate closely with Application Development and Operations teams to build and maintain scalable systems.
* Your core focus will be to automate processes and ensure the highest levels of service reliability, specifically by reducing manual effort (toil).
* You'll bring a strong passion for continually improving the reliability, availability, and performance of our services.
* Primary Skill – Experience with cloud platforms (e.g., AWS, GCP, Azure), primarily AWS, and container orchestration (e.g., Kubernetes, Docker).
* Proficiency in Monitoring and Logging Tools: Datadog, Splunk, Dynatrace, AppDynamics, Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), CloudWatch, Gremlin, ThousandEyes.
* Terraform, Jenkins, GitLab CI, PostgreSQL, Redis, Kong API.
* Infrastructure, networking, and security skills; AWS (Atlas); ECS-based internal tooling.
* Lucidchart, PlantUML
* Secondary Skill – ServiceNow (SNOW), Jira, shell scripting, Linux, Bitbucket, Akamai, DevOps.