Freelance agent evaluation engineer

Manchester

Freelance

UK

Engineer

€30,855 - €33,000 a year

Posted: 28 April

Offer description

Job Overview

Mindrift connects specialists with project‑based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project‑based, not permanent employment.

Responsibilities

You’ll develop a dataset to evaluate AI coding agents by creating challenging tasks and evaluation criteria within realistic simulated environments:

* Build virtual companies following a high‑level plan – codebase, infrastructure, and context (conversations, documentation, tickets) that form a realistic environment with development history.
* Assemble and calibrate tasks from intermediate states of the virtual company: craft the prompt, define evaluation criteria, and ensure the task is solvable and the evaluation is fair.
* Design tasks set in isolated environments – emulations of a developer’s workstation: a Linux machine with development tools (terminal, CLI), MCP servers (repository, task tracker, messenger, documentation, etc.), and a real web application codebase.
* Write tests that accept all correct solutions and reject incorrect ones – neither too strict (breaking on valid approaches) nor too lenient (passing bad ones).
* Iterate with an AI agent on tests – verifying they catch real problems, don’t miss bad solutions, and don’t break on good ones.
* Review code written by agents, analyze why an agent failed or succeeded, and design edge cases and adversarial scenarios.
* Iterate based on feedback from expert QA reviewers who score your work on quality criteria.

What this is NOT

* Data labeling
* Prompt engineering
* Writing code from scratch – the agent writes most of the code; you guide and evaluate.

Qualifications

* Degree in Computer Science, Software Engineering, or related fields.
* 5+ years in software development, primarily Python (FastAPI, pytest, async/await, subprocess, file operations).
* Background in full‑stack development, with experience building React‑based interfaces (JavaScript/TypeScript) and robust back‑end systems.
* Experience writing tests (functional, integration), not just running them.
* Docker containerization and familiarity with infrastructure tools (Postgres, Kafka, Redis).
* CI/CD understanding (GitHub Actions as a user: triggers, labels, reading results).
* English proficiency – B2.
* Comfortable reading and reasoning about code across the stack; expertise in every area is not required.

Compensation

On this project, contributors can earn up to $50 per hour equivalent, depending on their level and pace of contribution. Compensation varies across projects based on scope, complexity, and required expertise.
