Qualifications
* 6–10+ years of experience with ML/AI systems, including 3+ years owning human data / labeling / RLHF / evaluation workflows
* Strong conceptual understanding of LLM training stacks and post-training methods (e.g. SFT, RLHF, preference learning, DPO, RLAIF)
* Hands‑on experience designing human data tasks: specs, rubrics, rating scales, and annotation guidelines
* Proven track record of working with annotation vendors or internal labeling teams at scale (multi‑country, multi‑language, multi‑project)
* Strong background in data quality: sampling, inter‑rater agreement, disagreement analysis, bias detection, and QA strategies
* Experience building or heavily using tools for labeling, RLHF, or evals (internal platforms, Scale/Surge/Appen‑style tools, or custom pipelines)
* Comfortable collaborating with research, product, infra, and ops stakeholders and translating between them
* Solid understanding of privacy, compliance, and data‑provenance concerns for human data in AI systems
* Ability to define metrics and KPIs that connect human‑data work to model performance and business outcomes
* Strong communication skills and the ability to write clear specs, rubrics, and internal documentation
* Bonus: prior experience leading a team (data ops, RLHF ops, annotation ops, or evals) at an AI lab or data vendor
Responsibilities
* Own the end‑to‑end strategy and roadmap for human data across training, post‑training, and evaluation
* Design and iterate high‑signal human data tasks, specs, and rubrics in partnership with research and product teams
* Architect and oversee human data pipelines across vendors, geographies, and internal teams
* Define and implement QA strategies, calibration processes, and monitoring to ensure label and feedback quality
* Work closely with researchers to turn human data into reward models, fine‑tuning datasets, and eval suites
* Build and maintain a clear provenance and governance layer for all human data (sources, contracts, restrictions, jurisdictions)
* Evaluate and onboard external vendors and tools; benchmark their performance and cost against internal options
* Develop metrics and dashboards that connect human data investments to model improvements and key product metrics
* Identify and address operational bottlenecks in human data workflows; propose and drive process or tooling changes
* Collaborate on the design of new internal products / infra for task authoring, routing, QA, and evaluation
* Represent the "human data" perspective in cross‑functional planning, ensuring it is treated as core infrastructure, not an afterthought