Halian Technology is seeking an experienced Site Reliability Engineer for a full-time opportunity within our client’s Platform Engineering team, based remotely in the U.S.
We’re looking for a technically skilled, automation-driven individual with strong experience in cloud infrastructure and observability tools to help scale our client’s services to millions of endpoints globally. This is an exciting opportunity to work at the core of platform reliability and infrastructure automation within a fast-growing SaaS company.
Key Responsibilities:
* Diagnose and resolve complex application and infrastructure issues across distributed systems.
* Participate in a 24x7 on-call rotation, sprint planning, and Scrum ceremonies.
* Perform root cause analysis (RCA) and create technical documentation and SOPs.
* Develop scripts and tools to automate infrastructure provisioning and application deployment.
* Implement best practices for observability and monitoring using tools like New Relic, Datadog, or Splunk.
* Influence design decisions to ensure scalable, secure architecture and high availability.
Key Requirements:
* 5+ years in Site Reliability Engineering and/or DevOps roles.
* Strong Linux administration and scripting skills.
* Hands-on experience with AWS core services (EC2, ECS, Route 53, Fargate, etc.).
* Proficient in infrastructure-as-code (CloudFormation, Terraform, Helm, Ansible).
* Experience with containers, microservices, Kubernetes, and CI/CD pipelines.
* Strong communication skills and a passion for automation and reliability.
Apply now to be part of a leading IT automation platform and help drive the next generation of scalable and secure infrastructure!