About the role
We’re hiring a Founding Infrastructure Engineer to build and secure the platform foundations for an AI-enabled product used by real customers. This is a hands-on role with significant ownership over reliability, scalability, security, and developer velocity.
You’ll join a small engineering team where speed matters, pragmatism wins, and infrastructure is treated as a core product.
What you’ll do
You will take the lead on the systems that keep the product stable, fast, and safe as usage grows:
* Own cloud infrastructure end-to-end: design, operate, and continuously improve a production AWS environment with infrastructure-as-code, focusing on resilience, performance, and cost control.
* Enable multi-region and isolated environments: build patterns for deploying fully separated customer environments across multiple regions, including support for single-tenant setups when required.
* Build and evolve CI/CD workflows: develop streamlined pipelines that allow the team to ship frequently and safely using modern tooling (e.g., GitHub Actions, containers, progressive deployments).
* Implement observability and reliability standards: set up strong monitoring and alerting across logs, metrics, and traces so issues are detected early and resolved quickly.
* Improve performance under demanding workloads: tune autoscaling, routing, load balancing, and runtime behaviour to meet latency and throughput expectations, especially under compute-heavy usage.
* Raise the bar on operational maturity: establish best practices for incident response, fault tolerance, deployment safety, and infrastructure hygiene.
* Partner with application engineers: work closely with product and engineering teammates to ensure the platform supports rapid iteration without sacrificing stability.
What you bring
For this founding infrastructure and security role, you’ve solved difficult technical problems and delivered systems that perform reliably in production, as shown by past roles, notable projects, or open-source contributions.
Must haves
* 5+ years owning cloud infrastructure in production environments (AWS preferred)
* Strong expertise in Infrastructure as Code (Terraform or equivalent)
* A proven record of building and maintaining CI/CD pipelines
* Deep understanding of cloud networking, autoscaling, and load balancing
* Solid security fundamentals across IAM, network controls, and incident response practices
Preferred
* Experience with ECS or container orchestration platforms
* Familiarity with distributed worker / job systems (e.g., Celery, Sidekiq, Resque, SQS)
* Experience with observability stacks (e.g., Datadog, Prometheus, Grafana, Sentry)
* Strong Python skills
Nice to haves
* Experience supporting AI/ML workloads in production
* Background in offensive security, CTFs, or security research