We're working with a fast-growing, VC-backed healthtech startup on a mission to build a platform that blends human insight with advanced AI for personalised, continuous care. As part of this growth, they're looking for an AI Data Scientist to join a distributed team on a hybrid working model.
Your Impact
As one of their earliest AI hires, you'll shape the backend foundation powering secure data flows, AI-driven insights, and provider tools. Expect hands-on ownership in a collaborative, mission-focused team tackling real-world health challenges.
Key Responsibilities
* Design and execute experiments using large datasets, including LLM-generated or LLM-augmented data
* Analyze and validate the reliability, consistency, and bias of LLM outputs across healthcare use cases
* Fine-tune and evaluate LLMs (e.g., OpenAI models, Claude, Llama) while managing risks like overfitting and hallucination
* Build scalable pipelines to preprocess, structure, and extract insights from unstructured or semi-structured data
* Work closely with clinicians and product teams to align AI insights with real-world healthcare needs
* Develop metrics and evaluation strategies for model performance, safety, and explainability
* Investigate and mitigate risks related to synthetic data and model-induced artifacts
* Help shape how healthcare AI tech can be safe, fast, and deeply human
What You'll Bring
* 5+ years of experience in data science or machine learning, including work with LLMs or large generative models
* Deep understanding of LLM internals: tokenization, attention mechanisms, fine-tuning, prompt engineering, and embeddings
* Strong Python skills, including libraries like PyTorch, HuggingFace Transformers, LangChain, or similar
* Hands-on experience fine-tuning models or building applications with LLM-generated data
* Strong statistical and experimental design skills, especially around model evaluation and failure analysis
* Familiarity with cloud-based ML pipelines (e.g., AWS/GCP/Azure), versioned datasets, and reproducible experimentation
* Experience navigating data quality issues such as bias, hallucination, and inconsistency
For further details and immediate consideration, please get in touch.