CommonAI CIC is a non-profit membership organisation founded on a belief in collaborative engineering for the safe and responsible development of foundational AI technologies. It is a place where AI startups, enterprises large and small, public sector bodies and academia can share resources and knowledge to co-develop and grow businesses quickly.
We support technology-focused startups, each with unique data management challenges, and are seeking an experienced Storage Architect to help them design, deploy and maintain high-performance storage systems for their AI and data-driven workloads. The successful candidate will combine deep experience architecting and managing distributed, cloud, and tiered storage solutions with strong Linux and automation skills.
In this role you will:
* Design, implement, and maintain storage platforms that support large-scale AI and data pipelines.
* Manage distributed storage systems such as Ceph, Lustre, or BeeGFS.
* Oversee tiered storage architectures, optimising data movement across high-performance, object, and archival tiers.
* Ensure data integrity, availability, and security across on-premises and cloud environments.
* Develop automation and monitoring tools using Bash, Python, or similar scripting languages.
* Manage and secure container images and related storage used for AI and ML workloads.
* Integrate storage systems with public cloud services (AWS, Azure, GCP) and hybrid environments.
* Troubleshoot complex storage and data flow issues, collaborating closely with AI platform and infrastructure teams.
* Contribute to ongoing architecture improvements, performance tuning, and capacity planning.
To be considered, candidates should meet most of the following requirements:
* Strong Linux system administration background.
* Proven experience installing, configuring, and maintaining Ceph clusters or similar technologies in a production environment.
* Familiarity with distributed filesystems (e.g. Lustre, BeeGFS) and cloud-based storage services (e.g. Amazon S3).
* Experience with tiered storage management and data lifecycle policies.
* Scripting and automation proficiency (e.g. Bash, Python, Terraform/OpenTofu, Ansible).
* Understanding of data security best practices and compliance considerations.
* Experience working with container technologies (e.g. Docker, Kubernetes) and image storage registries.
* Strong analytical, troubleshooting, communication and documentation skills.
We also value:
* Knowledge of GPU compute environments or AI training infrastructure.
* Experience with monitoring and observability tools (Prometheus, Grafana, etc.).
* Contributions to open-source storage, data management, or infrastructure projects.
* Familiarity with object storage systems (S3, RADOS Gateway, MinIO, etc.).
We offer:
* A collaborative and supportive work environment.
* The opportunity to have a high impact in a growing organisation.
* Competitive salary package and pension.
* Professional development opportunities.
* Networking opportunities with influential people from across the tech sector and academia.
* A vibrant office environment a few minutes' walk from Cambridge train station.
CommonAI CIC is an equal opportunity employer and is committed to creating an inclusive and diverse workplace.