Title: Data Science Engineer (or Applied Data Scientist)
Seniority: Mid‐Senior
Location: Remote (EU or US timezone preferred)
Reports to: AI/Innovation Tech Lead
The Opportunity
This is a greenfield data science role: the chance to build the data science and modelling capability for a production AI platform from scratch. Solvace is in the middle of a major AI transformation: a multi‐agent copilot platform (KAI) is already live, serving manufacturing clients globally. The AI roadmap includes three capabilities that require statistical modelling and advanced analytics — KPI prediction and correlation, off‐spec prediction, and RCA recommendation — and this role will build them.
A Databricks lakehouse POC is being evaluated and the team is assessing the right long‐term analytics platform strategy. The production platform generates rich, real‐world manufacturing data — quality inspections, KPI time series, safety observations, process parameters — across multiple global clients. This data is ripe for modelling and analysis.
This role defines the entire data science and modelling capability for the platform. While modernisation of the underlying engineering platform is underway, this role’s focus is entirely forward‐looking: building new ML models and analytics capabilities that plug into the AI agent layer. Collaboration with the platform engineering team is important (they maintain the transactional systems that generate the data), but the day‐to‐day work is research, modelling, and shipping predictions, not maintaining application code.
Core Technical Requirements
* Python — primary language; NumPy, pandas, scikit‐learn, and the standard ML stack. This is a Python‐first team
* ML/DL frameworks — production experience with PyTorch, XGBoost, and at least one of: TensorFlow, LightGBM, CatBoost, JAX. Must be comfortable selecting the right framework for the problem, not just defaulting to one
* Statistical modelling and machine learning — regression, classification, time series forecasting, anomaly detection. Must have strong fundamentals, not just framework familiarity
* SQL — must be comfortable writing complex queries against both SQL Server and PostgreSQL
* Databricks / Spark MLlib — a lakehouse POC is being evaluated as the analytics platform. Experience with Databricks or equivalent platforms (Snowflake, BigQuery) for model training and serving is important
* Feature engineering — ability to extract meaningful signals from noisy industrial/manufacturing data across multiple data sources and time scales
* Model evaluation and validation — rigorous approach to accuracy measurement, A/B testing, and production monitoring
* Linux / CLI proficiency — must be comfortable working in Linux environments and command‐line tooling
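To make the feature-engineering and anomaly-detection expectations above concrete, here is a minimal, self-contained sketch. All data, column names, and parameter choices are invented for illustration, not drawn from the Solvace platform: rolling-window features over a noisy KPI series feed a scikit-learn IsolationForest, which flags an injected off-spec excursion.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical KPI time series: daily readings from one production line.
rng = np.random.default_rng(42)
ts = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=500, freq="D"),
    "kpi": rng.normal(100.0, 5.0, 500),
})
ts.loc[400:410, "kpi"] += 40.0  # inject a short off-spec excursion

# Rolling-window feature engineering: level, volatility, short-term change.
ts["roll_mean"] = ts["kpi"].rolling(24, min_periods=1).mean()
ts["roll_std"] = ts["kpi"].rolling(24, min_periods=1).std().fillna(0.0)
ts["delta"] = ts["kpi"].diff().fillna(0.0)

features = ts[["kpi", "roll_mean", "roll_std", "delta"]]

# Unsupervised anomaly detection; contamination is a tuning assumption.
model = IsolationForest(contamination=0.05, random_state=0)
ts["anomaly"] = model.fit_predict(features)  # -1 marks anomalous rows

print(ts.loc[ts["anomaly"] == -1, "timestamp"].min())
```

In practice the interesting work is in the feature design (shift patterns, sensor alignment, multi-source joins); the detector itself is often the smallest part.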
AI-Assisted Development Methodology
Solvace is transitioning towards AI‐assisted development as a core engineering practice. Candidates should demonstrate:
* Hands‐on experience with AI coding tools — Claude Code, OpenAI Codex, GitHub Copilot, Cursor, or similar. Engineers who have integrated these tools into their professional workflow, not just experimented casually
* Spec‐driven development — ability to write clear technical specifications that can be used to drive both human and AI‐assisted implementation, with strong evaluation criteria and test coverage
* Portfolio evidence — professional projects or side projects that demonstrate AI‐assisted development practices. Contributions to or experimentation with emerging projects like OpenClaw are a strong signal
* Testing and evaluation rigour — experience building robust test suites, automated quality gates, and evaluation frameworks that ensure AI‐assisted code meets production standards
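As an illustration of the automated quality gates meant above, a hedged sketch (the synthetic data and the error budget are invented for the example): a candidate model must beat a fixed holdout-error threshold before it can ship, regardless of whether a human or an AI assistant wrote the training code.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical regression task standing in for a KPI-prediction model.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = X @ np.array([1.5, -2.0, 0.5, 0.0, 1.0]) + rng.normal(0.0, 0.2, 400)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = Ridge(alpha=1.0).fit(X_tr, y_tr)
mae = mean_absolute_error(y_te, model.predict(X_te))

ERROR_BUDGET = 0.5  # assumption: a per-KPI budget agreed with stakeholders
assert mae < ERROR_BUDGET, f"model rejected: MAE {mae:.3f} exceeds budget"
print(f"gate passed: MAE {mae:.3f}")
```

Wired into CI, an assertion like this turns "evaluation rigour" from a review-time opinion into a hard gate.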
Nice-to-Have
* Manufacturing / industrial domain experience — OEE, SPC, quality control, process optimisation, predictive maintenance
* Go or Rust — valued as evidence of strong engineering fundamentals, even though Python is the primary language
* MLOps / model serving — experience deploying models to production (MLflow, SageMaker, Databricks Model Serving, or equivalent)
* NLP / text analytics — useful for the RCA recommendation engine (analysing historical root cause analysis text)
* LLM fine‐tuning — understanding of when to fine‐tune vs. prompt‐engineer for domain‐specific tasks
* Correlation analysis at scale — KPI prediction requires cross‐series correlation across multiple data sources and time windows
* Time series databases or tools — experience with temporal data patterns common in manufacturing (shift patterns, seasonal production cycles, maintenance windows)
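The cross-series correlation requirement above can be sketched in a few lines: a lagged-correlation scan between two hypothetical series, where a downstream KPI trails an upstream process parameter by six steps (both series are synthetic, invented for this example).

```python
import numpy as np
import pandas as pd

# Hypothetical: does an upstream process parameter lead a downstream KPI?
rng = np.random.default_rng(7)
n = 300
upstream = pd.Series(rng.normal(0.0, 1.0, n))
# Downstream KPI responds to the upstream signal with a 6-step delay plus noise.
downstream = upstream.shift(6).fillna(0.0) + rng.normal(0.0, 0.3, n)

# Scan candidate lags and keep the strongest correlation.
# Series.corr drops the NaN pairs introduced by shifting.
corrs = {lag: downstream.corr(upstream.shift(lag)) for lag in range(0, 13)}
best_lag = max(corrs, key=lambda k: abs(corrs[k]))

print(best_lag, round(corrs[best_lag], 2))
```

At production scale this becomes a cross-product over many KPI pairs, sources, and windows, which is where a Spark- or lakehouse-backed platform starts to matter.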
Candidate USP / Why Join
* Build production AI models from zero — this person defines the entire data science and modelling capability for the platform
* Rich, real‐world manufacturing data — quality inspections, KPI time series, safety observations, 5W2H root cause analysis records, process parameters — across multiple global clients. This is not synthetic data; these are real operational datasets from factory floors
* Analytics platform taking shape — a lakehouse POC is being evaluated and the production analytics architecture is being defined. This person will influence platform decisions and focus on modelling, not plumbing
* Direct product impact — models they build become AI agent capabilities used by manufacturing teams daily. The feedback loop from model to product is weeks, not years
* Manufacturing AI is a blue ocean — most data science talent is concentrated in adtech, fintech, and social; manufacturing AI is an underserved domain with massive commercial potential and genuinely interesting data problems
* Small team, high autonomy — they own the full lifecycle from exploration to production