Graphcore is one of the world’s leading innovators in Artificial Intelligence compute.
It develops hardware, software, and systems infrastructure to enable the next generation of AI breakthroughs and promote widespread AI adoption across industries.
As part of the SoftBank Group, Graphcore belongs to an elite family of companies responsible for transformative technologies. The group's shared vision is to enable Artificial Super Intelligence and make its benefits accessible to all.
Graphcore’s teams are diverse, comprising AI research specialists, silicon designers, software engineers, and systems architects, fostering a culture of continuous learning and innovation.
Job Summary
We are seeking a Staff Engineer to join our Cloud Development Team and develop and deploy services. Collaborating with the Platform Engineering, Data Centre Operations, and Product Development teams, you will deploy these services on our advanced AI systems, which combine in-house hardware with off-the-shelf servers, switches, and storage solutions. This hands-on role requires a strong background in cloud infrastructure, Infrastructure-as-Code deployment, networking, and storage systems. Experience in IT, data centres, cloud providers, or orchestration/cloud services is desirable.
The Platform Engineering Team at Graphcore
We integrate Graphcore products into large-scale AI solutions for internal and external customers. We often work with pre-release hardware and software, so team members must be comfortable with unproven components.
Responsibilities and Duties
* Develop and operate end-user services on private clouds, supporting internal users and translating requirements into deployed services.
* Build automation for metrics collection and analysis to identify and report issues, working with users and engineering teams.
* Maintain and operate AI system fleets in private clouds in collaboration with Data Centre Operations Engineers.
* Configure and test new hardware and systems in data centres using Continuous Deployment and Infrastructure-as-Code.
* Integrate third-party hardware solutions into our Cloud Reference Design in partnership with vendors.
Skills and Experience
* Bachelor's degree or equivalent in a relevant field.
* Proven software engineering or IT experience with a track record of delivering results.
* Experience working within Agile and Scrum frameworks.
* Strong Linux scripting skills (Bash, Python, awk, sed).
* Linux system administration experience (Ubuntu, RHEL).
* Experience with version control systems (preferably Git).
* Familiarity with CI/CD pipelines (GitLab, GitHub).
* Understanding of cloud service technologies (APIs, virtualization, networks, storage, resource management).
* Experience with Infrastructure-as-Code tools (Terraform, Ansible, Packer).
* Experience with container management (Docker).
* Knowledge of monitoring and observability tools (Grafana, Prometheus, Elasticsearch, Loki).
* Good communication and end-user support skills.
* Ability to work independently on critical infrastructure with minimal oversight.
Desirable Skills
* Experience with OpenStack cloud platforms.
* Managing production Kubernetes clusters.
* Python 3 programming, including object-oriented design with classes and inheritance.
Graphcore offers a competitive salary, flexible working, generous leave, private medical insurance, pension contributions, and a commitment to diversity and inclusion. Note: Applicants must have the right to work in the UK; visa sponsorship is not available at this time.