Senior Data Engineer / Scientist
Location: Edinburgh, Scotland (office‑based, hybrid 3:2).
Lenovo is a global technology leader. We build smarter technology for all and are driven by an AI‑first vision. The Lenovo AI Technology Center (LATC) is a world‑class center of excellence gathering researchers, engineers, and innovators to deliver AI across Lenovo’s entire product portfolio. This role is critical to the success of our machine learning initiatives, focusing on the creation, quality control, and governance of the datasets that power our models. You will bridge the gap between raw data and model readiness, working closely with model developers to understand their needs and deliver high‑quality, reliable data.
Responsibilities
* Design, build, and implement processes for creating task‑specific training datasets, including labeling, annotation, and data augmentation.
* Leverage tools and technologies, including scripting, automation, and data labeling platforms, to accelerate dataset creation and improvement.
* Perform thorough data analysis to assess data quality, identify anomalies, and ensure data integrity; use machine learning tools to evaluate dataset performance and identify areas for improvement.
* Utilize database systems (SQL and NoSQL) and big data tools (Spark, Hadoop, cloud data warehouses such as Snowflake, Redshift, BigQuery) to process, transform, and store large datasets.
* Implement and maintain data governance best practices, including data source tracking, lineage documentation, and license management; ensure compliance with data privacy regulations.
* Work closely with machine learning engineers and data scientists to understand their data requirements, provide clean and well‑documented datasets, and iterate on data solutions based on model performance feedback.
* Create and maintain clear and concise documentation for data pipelines, data quality checks, and data governance procedures.
* Keep up to date with the latest advancements in data engineering, machine learning, and data governance.
Qualifications
* Bachelor’s or Master’s degree in Computer Science, Computer Engineering, Electrical Engineering, Statistics, Mathematics, or a related field.
* 15+ years of experience in a data engineering or data science role.
* Mastery of Python and SQL; experience with Java, Scala, or similar languages is a plus.
* Strong experience with relational databases (PostgreSQL, MySQL) and NoSQL databases (MongoDB, Cassandra).
* Experience with big data technologies such as Spark, Hadoop, or cloud data warehousing solutions (Snowflake, Redshift, BigQuery).
* Proficiency in data manipulation and cleaning techniques using Pandas, NumPy, and other data processing libraries.
* Solid understanding of machine learning concepts and techniques, including data preprocessing, feature engineering, and model evaluation.
* Understanding of data governance principles and practices, including data lineage, data quality, and data security.
* Excellent written and verbal communication skills, with the ability to explain complex technical concepts to both technical and non‑technical audiences.
* Strong analytical and problem‑solving skills.
Bonus Points
* Experience with data labeling platforms (Labelbox, Scale AI, Amazon SageMaker Ground Truth).
* Experience with MLOps practices and tools (Kubeflow, MLflow).
* Experience with cloud platforms (AWS, Azure, GCP).
* Experience with data visualization tools (Tableau, Power BI).
* Experience with building and maintaining data pipelines using orchestration tools (Airflow, Prefect).
What We Offer
* Opportunities for career advancement and personal development.
* Access to a diverse range of training programs.
* Performance‑based rewards that celebrate your achievements.
* Flexibility with a hybrid work model (3:2) that blends home and office life.
* Electric car salary sacrifice scheme.
* Life insurance.
We are an Equal Opportunity Employer and do not discriminate against any employee or applicant for employment on the basis of race, color, sex, age, religion, sexual orientation, gender identity, national origin, veteran status, disability, or any other federal, state, or local protected class.