Job Description
About Us
Turing is one of the world’s fastest-growing AI companies, pushing the boundaries of AI-assisted software development. Our mission is to empower the next generation of AI systems to reason about and work with real-world software repositories. You’ll be working at the intersection of software engineering, open-source ecosystems, and frontier AI.
Project Overview
We're building high-quality evaluation and training datasets to improve how Large Language Models (LLMs) interact with realistic software engineering tasks. A key focus of this project is curating verifiable software engineering challenges from public GitHub repository histories using a human-in-the-loop process.
Why This Role Is Unique
1. Collaborate directly with AI researchers shaping the future of AI-powered software development.
2. Work with high-impact open-source projects and evaluate how LLMs perform on real bugs, issues, and developer tasks.
3. Influence dataset design that will train and benchmark next-gen LLMs.
What Your Day-to-Day Looks Like
1. Review and compare 3–4 model-generated code responses for each task using a structured ranking system.
2. Evaluate code diffs for correctness, code quality, style, and efficiency.
3. Provide clear, detailed rationales explaining the reasoning behind each ranking.