Are you interested in helping us craft exceptional experiences for our clients that deliver genuine social impact? Are you ready to join a small and experienced team of innovators and make a significant contribution to a fast-growing technology company?
About the Company
Our client is a fast-paced startup with the goal of making the world a more inclusive place for the Deaf community. As a technology-for-good organisation, they are using AI to create sign language translations across video, transportation and website platforms.
The company is expanding rapidly across these sectors, providing an exciting opportunity to grow its technical team. By joining as an ML Systems Engineer, you will be at the forefront of expansion into new markets, helping shape infrastructure strategy and ensuring systems remain scalable, secure and at the cutting edge of technology.
The Role
We are looking for an ML Systems Engineer to help design and optimise the systems that power real-time AI video generation.
The company’s models generate sign language video using generative AI pipelines deployed on GPU infrastructure across both cloud and on-prem environments. A key challenge is reducing generation latency and maximising GPU utilisation so the system can deliver real-time video streams.
You will work across the full ML inference stack, from model optimisation to deployment infrastructure, ensuring models run efficiently in production environments.
This role is ideal for engineers who enjoy performance optimisation, distributed systems, and building production ML infrastructure.
Responsibilities
* ML Inference Optimisation
  * Profile and optimise deep learning models used for sign language video generation
  * Reduce inference latency using techniques such as quantisation, pruning, mixed precision, and kernel optimisation
  * Improve GPU utilisation and throughput across inference pipelines
  * Work closely with ML researchers to ensure models are production-ready
* ML Infrastructure & Deployment
  * Build and maintain scalable model serving systems
  * Deploy and operate inference services on GPU clusters
  * Design autoscaling infrastructure to meet real-time SLAs
  * Contribute to model deployment pipelines, versioning, and rollback strategies
* Performance Engineering
  * Develop benchmarking frameworks for tracking inference performance
  * Identify bottlenecks across the ML pipeline and eliminate latency hotspots
  * Implement performance monitoring and alerting for production systems
  * Evaluate new hardware accelerators and inference runtimes
The infrastructure currently includes:
* Python, Go and Rust production services
* PyTorch-based generative models
* GPU inference workloads
* Kubernetes clusters (cloud and on-prem)
* AWS infrastructure including SageMaker
* Real-time streaming systems using protocols such as HLS, LL-HLS, RTMP and SRT
Requirements
* 3+ years of experience in ML systems engineering, ML infrastructure, or backend systems
* Strong programming skills in Python (Rust is a plus)
* Experience working with production ML models
* Experience optimising ML inference performance
* Familiarity with containerised systems such as Docker and Kubernetes
* Strong debugging, profiling, and performance analysis skills
* Interest in building latency-critical systems
Nice to Have
* Experience with inference optimisation tools such as TensorRT, ONNX, or similar frameworks
* Experience with model serving systems such as Triton, TorchServe, or Ray Serve
* Familiarity with GPU architecture and performance optimisation
* Experience working with video, graphics, or real-time streaming systems
* Experience deploying ML workloads at scale
* Experience contributing to open-source ML infrastructure projects
Why Join This Company
* Work on technology that directly improves accessibility for Deaf communities
* Help build one of the first real-time AI sign language generation systems
* Join a small, experienced engineering team solving challenging technical problems
* Opportunity to take ownership of critical systems as an early engineering hire
* Work across a modern ML infrastructure stack
* 24 days’ holiday plus bank holidays, and a company pension scheme
* Competitive compensation and high-value equity packages
* Opportunity to work on cutting-edge technologies and be involved in the early stages of a high-growth business
Hours
This is a full-time position with normal virtual office hours of 9am to 6pm, although flexibility is offered to suit reasonable personal circumstances. What matters most is strong collaboration, meeting agreed milestones and delivering high-quality work.
Please note you must have the right to work and live full-time in the UK when applying for this position.
Equality and Diversity
The company is committed to eliminating discrimination and encouraging diversity within its team. The aim is to build a workforce that is truly representative of all sections of society, where every employee feels respected and able to give their best.
A culture of encouragement and support has been created to enable employees to focus on what they want to achieve for successful career development. Work-life policies and flexible working practices help employees feel more in control of their personal and professional lives.
Any qualified applicants who are native sign language users are guaranteed an interview.