Scientific data engineer

Boston

Arrayo

Data engineer

€60,000 - €80,000 a year

Posted: 23h ago

Offer description

We are looking for a Data Engineer to join our cross-disciplinary team and help shape the future of scientific data management in drug discovery. In this role, you’ll play a central part in developing scalable, metadata-rich data products that adhere to FAIR principles, working hand-in-hand with platform, scientific, and AI/ML teams.

Your contributions will accelerate cheminformatics research by building and maintaining curated datasets and robust data pipelines. These pipelines support analytics, modeling, and deep learning on platforms such as AWS, Azure Databricks, and Domino Data Lab.

Key Responsibilities

Develop, deploy, and maintain scalable, cloud-native data pipelines for molecular modeling and related domains.

Operationalize architectural blueprints using modern orchestration frameworks and cloud services (AWS/Azure).

Partner with scientists, cheminformaticians, and data scientists to understand domain-specific requirements and deliver efficient, reusable data solutions.

Process and integrate molecular property datasets, embedding rich metadata to maximize downstream value for AI/ML applications.

Establish data quality, lineage, and governance standards in line with FAIR principles, ensuring reproducibility, traceability, and compliance.

Enable interactive dataset exploration through tools like Spotfire.

Shape schema design, enrich metadata, and develop APIs for reliable and flexible data access.

Optimize storage and compute performance across data lakes and warehouses (e.g., Delta Lake, Parquet, Redshift).

Document data contracts, pipeline logic, and operational best practices to ensure long-term sustainability and effective collaboration.

Required Qualifications

Demonstrated experience as a data engineer in the biopharmaceutical or life sciences industry, particularly supporting drug discovery or translational research.

Hands-on work with molecular structure data, computed properties, simulation outputs, or imaging datasets.

Proficiency in Python (including Pandas or PySpark) and SQL, with exposure to ETL/orchestration tools such as Airflow or dbt.

Strong knowledge of cloud-native services on AWS (e.g., S3, Glue, Lambda, Athena) and Azure (Data Factory, Data Lake).

Track record of collaborating with scientific teams and translating research needs into scalable data solutions.

Preferred Qualifications

Experience with cheminformatics libraries (e.g., RDKit, Open Babel, CDK).

Familiarity with scientific data standards, ontologies, and best practices for metadata capture.

Understanding of data science workflows in computational chemistry, bioinformatics, or AI/ML-driven research.

Orchestration & ETL: Apache Airflow, Prefect

Scientific Libraries (Preferred): RDKit, Open Babel, CDK

Seniority level

Mid-Senior level

Employment type

Full-time

Job function

Engineering, Research, and Information Technology

Industries

Biotechnology Research, Pharmaceutical Manufacturing, and IT Services and IT Consulting
