Job Overview
Mindrift connects specialists with project‑based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project‑based, not permanent employment.
Responsibilities
You’ll develop a dataset to evaluate AI coding agents by creating challenging tasks and evaluation criteria within realistic simulated environments:
* Build virtual companies from a high‑level plan: a codebase, infrastructure, and context (conversations, documentation, tickets) that together form a realistic environment with a development history.
* Assemble and calibrate tasks from intermediate states of the virtual company: craft the prompt, define evaluation criteria, and ensure the task is solvable and the evaluation is fair.
* Design tasks set in isolated environments – emulations of a developer’s workstation: a Linux machine with development tools (terminal, CLI), MCP servers (repository, task tracker, messenger, documentation, etc.), and a real web application codebase.
* Write tests that accept all correct solutions and reject incorrect ones – neither too strict (breaking on valid approaches) nor too lenient (passing bad ones).
* Iterate on tests with an AI agent, verifying that they catch real problems, don’t miss bad solutions, and don’t break on good ones.
* Review code written by agents, analyze why an agent failed or succeeded, and design edge cases and adversarial scenarios.
* Iterate based on feedback from expert QA reviewers who score your work on quality criteria.
What this is NOT
* Data labeling
* Prompt engineering
* Writing code from scratch – the agent writes most of the code; you guide and evaluate.
Qualifications
* Degree in Computer Science, Software Engineering, or a related field.
* 5+ years of software development experience, primarily in Python (FastAPI, pytest, async/await, subprocess, file operations).
* Background in full‑stack development, with experience building React‑based interfaces (JavaScript/TypeScript) and robust back‑end systems.
* Experience writing tests (functional, integration), not just running them.
* Experience with Docker containerization and familiarity with infrastructure tools (Postgres, Kafka, Redis).
* CI/CD understanding (GitHub Actions as a user: triggers, labels, reading results).
* English proficiency – B2.
* Comfortable reading and reasoning about code across the stack; expertise in every area is not required.
Compensation
On this project, contributors can earn the equivalent of up to $50 per hour, depending on their level and pace of contribution. Compensation varies across projects based on scope, complexity, and required expertise.