Requirements
* Practical experience developing in Go
* Familiarity with cloud services (AWS preferred)
* Experience managing or developing in Linux environments
* Understanding of CI/CD principles
* Strong experience with Kubernetes (k8s) development and deployment
* (Desirable) Experience developing Kubernetes Controllers
* (Desirable) Experience with Infrastructure as Code (IaC) tools (e.g. Terraform/OpenTofu)
* (Desirable) Experience with GitHub Actions
* (Desirable) Experience with distributed HPC systems
* (Desirable) Experience with modern observability tooling (e.g. Prometheus)
* (Desirable) Knowledge of Python, C++, or a similar language
What the job involves
* Join our dynamic Software Infrastructure team and take a pivotal role in scaling and managing our infrastructure
* You will develop essential tools and services that empower our broader software team. Your contributions will enhance the build, test, deployment, and productisation processes of our Machine Learning Software components. You will work with our High-Performance Computing (HPC) AI platforms and gain invaluable experience in distributed systems
* The Software Infrastructure team provides critical platforms and services for software development teams across the business. Our responsibilities include managing the CI platform and services, build engineering, component integration, and packaging and release systems
* We operate in squads, fostering a culture of service ownership and empowerment for our engineers. We focus on long-term engineering solutions and strive to eliminate toil wherever possible
* Develop, own, and maintain tools and services to support the software org
* Deploy and maintain Kubernetes infrastructure to develop, test, and scale Graphcore hardware and its software stack
* Manage our Cloud Infrastructure using tools such as Terraform