STEM Computational Scientific Software & Evaluation Design - Pharmacokinetics & Systems Biology
We're building a large-scale evaluation benchmark for advanced AI reasoning across scientific and engineering domains. Our task designers create challenging computational problems that test whether AI systems can use real scientific software tools to solve research-grade problems, from querying simulations and interpreting outputs to designing experimental strategies and recovering hidden information from data.
This is not a typical annotation or labeling role. You'll be designing original, graduate-level computational problems grounded in real scientific workflows, calibrating them against frontier AI models, and iterating on problem design until the difficulty is right.
What You'll Do
You'll design problems that require sophisticated use of domain-specific scientific software libraries. Some problems will require computing precise outputs from fully specified setups, testing whether a solver can correctly implement complex multi-step scientific workflows. Others will require something harder: designing a sequence of queries or experiments to uncover information that isn't directly visible, demanding strategic reasoning about what to measure, how to interpret partial observations, and how to narrow down possibilities efficiently.
Each task goes through a calibration loop where it's tested against state-of-the-art AI models, and you'll refine the problem design until the difficulty hits the target range.
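To make the second problem type concrete, here is a minimal sketch of a hidden-information task, assuming a one-compartment IV bolus model solved analytically with the Python standard library. The names `make_oracle` and `recover_k` and all parameter values are hypothetical illustrations, not part of the benchmark; a real task would typically wrap an SBML model and be considerably harder.

```python
import math


def make_oracle(dose_mg, volume_l, k_elim_per_h):
    """Return a query function that hides k_elim_per_h from the solver (hypothetical helper)."""
    def concentration(t_h):
        # One-compartment IV bolus: C(t) = (dose / V) * exp(-k * t)
        return (dose_mg / volume_l) * math.exp(-k_elim_per_h * t_h)
    return concentration


def recover_k(oracle, t1_h, t2_h):
    """Estimate the elimination rate constant from two concentration queries."""
    c1, c2 = oracle(t1_h), oracle(t2_h)
    # The log-ratio of two samples cancels the unknown dose / V factor.
    return math.log(c1 / c2) / (t2_h - t1_h)


# Illustrative values only: 100 mg dose, 20 L volume, hidden k = 0.173 per hour.
oracle = make_oracle(dose_mg=100.0, volume_l=20.0, k_elim_per_h=0.173)
k_est = recover_k(oracle, t1_h=1.0, t2_h=5.0)
half_life_h = math.log(2) / k_est
```

In this toy case two queries are enough; the design work lies in building setups where the choice and number of queries is far less obvious.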
Domains & Tools We're Hiring For
We're especially interested in experts with deep, hands-on experience in the following area:
* Pharmacokinetics & Systems Biology: working with libRoadRunner, Tellurium, or SBML-based tools for compartmental PK/PD modeling, enzyme kinetics, or systems biology simulations.
* Experience with other specialized software for the above domain will also be considered.
What Makes a Strong Candidate
You have graduate-level expertise (MS or PhD preferred) in the domain listed above, with real hands-on experience using the specific software tools, not just theoretical knowledge of the field. You've written code that calls these libraries to solve actual research problems, and you understand where they break, what their edge cases are, and what makes a problem genuinely hard versus superficially complex.
Beyond domain expertise, the strongest candidates will be able to think like a puzzle designer: constructing problems where the difficulty comes from reasoning strategy rather than brute computation, where there are multiple plausible approaches but only careful analysis reveals the right one, and where surface-level pattern matching won't get you to the answer.
Requirements
* Graduate-level training in a relevant STEM domain (MS, PhD, or equivalent research experience)
* Demonstrated proficiency with at least one of the listed scientific software libraries, evidenced by research publications, open-source contributions, or professional work
* Strong Python programming skills: you'll be writing problem setups, oracle functions, and solution validators
* Ability to work independently and iterate on problem designs based on calibration feedback
* Comfortable working in a Linux/terminal environment with remote compute sandboxes
* Available for at least 15–20 hours per week
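As a rough illustration of what a solution validator might look like, the sketch below checks a submitted numeric answer against a reference value within a relative tolerance. The function name `validate_answer`, the tolerance, and the reference quantity (AUC for a one-compartment IV bolus, with made-up parameter values) are assumptions for illustration only, not the benchmark's actual interface.

```python
import math


def validate_answer(submitted, reference, rel_tol=1e-3):
    """Return True if submitted parses to a finite number within rel_tol of reference."""
    try:
        value = float(submitted)
    except (TypeError, ValueError):
        # Non-numeric submissions (None, free text) are rejected, not raised.
        return False
    if not math.isfinite(value):
        return False
    return math.isclose(value, reference, rel_tol=rel_tol)


# Hypothetical reference: AUC = dose / (V * k) for an IV bolus, in mg*h/L.
reference_auc = 100.0 / (20.0 * 0.173)

accepted = validate_answer("28.9", reference_auc)
rejected = validate_answer(29.5, reference_auc)
```

Rejecting malformed input instead of raising keeps a calibration run going when a model returns something unparseable.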
Nice to Have
* Experience across multiple listed domains or tools
* Familiarity with benchmark or evaluation design
* Background in scientific pedagogy or exam/problem-set design
* Experience with computational reproducibility and containerized environments