Systems Research Engineer (AI Infrastructure) - Edinburgh - Global Leading Firm
As large language models reshape the foundational software stack, AI-native infrastructure is changing how large-scale models are trained, served, and deployed. We are advancing AI infrastructure and agent-oriented serving architectures that will shape the next generation of data centres and distributed AI systems.
Key Responsibilities
* Distributed Systems R&D: Architect and implement distributed system components for AI workloads across heterogeneous clusters (CPU, GPU, accelerators).
* Performance Optimisation: Conduct deep profiling and performance tuning of large-scale inference pipelines, focusing on KV cache management and memory scheduling.
* Scalable Serving Infrastructure: Develop frameworks for multi-tenant, low-latency, and fault-tolerant AI serving, researching techniques for cache sharing and data locality.
* Research & Publications: Translate novel designs into publishable contributions for leading systems and ML venues (e.g., OSDI, SOSP, EuroSys, NeurIPS).
* Cross-Team Collaboration: Communicate technical insights to multidisciplinary teams and align on long-term infrastructure strategy.
Qualifications
* Education: BSc, MSc, or PhD in Computer Science, Electrical Engineering, or a related field.
* Systems Expertise: Strong knowledge of Operating Systems, Distributed Systems, and AI inference serving.
* Technical Stack: Proficiency in C/C++ for systems development and Python for research prototyping.
* Hands-on Experience: Experience with LLM serving frameworks, distributed cache optimisation, and performance profiling tools.
* Preferred: A track record of publications in top-tier systems or ML conferences and practical experience in load balancing or resource scheduling for inference clusters.