Are you a passionate Site Reliability Engineer? We’re hiring for a company specializing in distributed systems, content delivery, and video streaming at scale. This fast-growing tech company is transforming in-transit entertainment with an intelligent caching platform that enables airlines and cruise lines to deliver personalized, high-quality video content, even without internet access. Join a global team building the next-generation content delivery system for aircraft and maritime environments.
Location: 100% Remote
Type: Full-time
Contract type: B2B
Your responsibilities
* Design, deploy, and maintain Kubernetes-based infrastructure using Terraform and Infrastructure-as-Code principles
* Lead software deployment efforts for two new international content delivery sites
* Build and optimize observability systems (metrics, logging, alerting) to monitor service health and performance
* Collaborate with engineering teams to develop and automate CI/CD pipelines (GitLab CI, Argo CD, etc.)
* Operate and improve service mesh technology (e.g., Istio) to ensure secure, reliable service-to-service communication
* Troubleshoot production systems with a focus on distributed services and networking (HTTP/S, DNS, QUIC)
* Contribute to post-incident reviews, root cause analyses, and long-term stability initiatives
* Participate in on-call rotations for incident response and site uptime
Our requirements
* 3+ years of experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles
* Strong hands-on experience with:
    * Kubernetes (Helm, Operators, workload and networking management)
    * Terraform and other Infrastructure-as-Code tools
    * Containers and orchestration at scale
    * CI/CD systems (e.g., GitLab CI, Argo CD)
    * Observability tools like Prometheus, Grafana, Loki, Alertmanager
* Familiarity with service meshes such as Istio or Linkerd
* Deep understanding of networking protocols (TCP/IP, HTTPS, DNS, QUIC)
* Experience with distributed systems principles (consistency, fault tolerance, horizontal scaling)
* Ability to diagnose and resolve production issues effectively in high-availability systems
* Bachelor’s degree in Computer Science or equivalent professional experience
Optional
* Experience with performance tuning in Go, Rust, or C/C++
* Background in content delivery, video/media platforms, or caching technologies
* Proven contributions to reliability engineering or developer platform improvements