Freelance agent evaluation engineer

Birmingham (West Midlands)
Freelance
Mindrift
Engineer
Posted: 27 April
Offer description

Please submit your CV in English and indicate your level of English proficiency.

Mindrift connects specialists with project-based AI opportunities at leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project-based, not permanent employment.

What this opportunity involves

We're building a dataset to evaluate AI coding agents, measuring how well a model handles real-world developer tasks. You'll create challenging tasks and evaluation criteria within realistic simulated environments:

  • Build realistic developer environments: a virtual company with a codebase, infrastructure, and context (tickets, docs, conversations) that together form a believable development history.
  • Design tasks from intermediate states of these environments: craft the prompt, define what "solved" means, and ensure the task is solvable by an AI agent.
  • Write tests that verify agent solutions: accept all valid approaches and reject incorrect ones, neither too strict nor too lenient.
  • Iterate on tasks and tests based on QA feedback: review agent solutions, analyze failures, and refine until the evaluation is fair and robust.

What this is NOT

  • Not data labeling
  • Not prompt engineering
  • Not writing code from scratch: the agent writes most of the code; you guide and evaluate it.

What we look for

  • 5 years of experience in software development
  • Core stack: Python (FastAPI), JavaScript/TypeScript (React), Docker, Postgres, Kafka, Redis
  • Experience writing tests (functional, integration)
  • English proficiency: B2

Why this is hard

Frontier models are already good at coding, so creating a task that genuinely challenges the best models is non-trivial. You need to understand deeply where models fail and which scenarios reveal the difference between a good solution and a bad one. Tasks also have many valid solutions, so writing tests that accept every correct solution and reject every incorrect one is harder than it sounds.
How it works

Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid

Effort estimate

Tasks for this project are estimated to take about 20 hours each, depending on complexity. This is an estimate, not a schedule requirement: you choose when and how to work. To be accepted, tasks must be submitted by the deadline and meet the listed acceptance criteria.

Compensation

Up to $50/hr equivalent, depending on level and pace.

Similar jobs

  • Test rig engineer: Trescal, Wolverhampton (West Midlands)
  • Senior pega devops engineer: DWP Digital, Birmingham (West Midlands), £80,000 a year
  • Engineer gas & heating systems: Mitchell Maguire, Nuneaton, £50,000 a year

© 2026 Jobijoba - All Rights Reserved