Lead PySpark Engineer
As a Lead PySpark Engineer, you will design, develop, and optimise complex data processing solutions on AWS. You will work hands-on with PySpark, modernise legacy data workflows, and support large-scale SAS-to-PySpark migration programmes. This role requires strong engineering discipline, deep data expertise, and the ability to deliver production-ready data pipelines within a financial services environment.
Skill Profile:
* PySpark - P3 (Advanced)
* AWS - P3 (Advanced)
* SAS - P1 (Foundational)
Key Responsibilities
Technical Delivery
* Design, develop, and debug complex PySpark code for ETL/ELT and data-mart workloads.
* Convert and refactor SAS code into PySpark using SAS2PY tooling and manual optimisation.
* Build production-ready PySpark solutions that are scalable, maintainable, and reliable.
* Modernise and stabilise legacy data workflows into cloud-native architectures.
* Ensure accuracy, quality, and reliability across data transformation processes.
Cloud & Data Engineering (AWS-Focused)
* Build and deploy data pipelines using AWS services such as EMR, Glue, S3, and Athena.
* Optimise Spark workloads for performance, partitioning, cost efficiency, and scalability.
* Use CI/CD pipelines and Git-based version control for deployment and automation.
* Collaborate with engineers, architects, and stakeholders to deliver cloud data solutions.
Core Technical Skills
PySpark & Data Engineering
* 5+ years of hands-on PySpark experience (P3).
* Ability to write production-grade data engineering code.
* Strong understanding of:
o ETL/ELT
o Data modelling
o Facts & dimensions
o Data marts
o Slowly Changing Dimensions (SCDs)
Spark Performance & Optimisation
* Expertise in Spark execution, partitioning, performance tuning, and optimisation.
* Troubleshooting distributed data pipelines at scale.
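Tuning work of this kind often starts at the job-configuration level. A hedged, spark-defaults-style sketch follows: the property names are standard Spark settings, but the values are purely illustrative and would be sized per workload:

```
# Right-size shuffle parallelism for the data volume (illustrative value)
spark.sql.shuffle.partitions          400
# Let adaptive query execution coalesce partitions and mitigate join skew
spark.sql.adaptive.enabled            true
# Broadcast small dimension tables instead of shuffling them
spark.sql.autoBroadcastJoinThreshold  64m
# Scale executors with load for cost efficiency
spark.dynamicAllocation.enabled       true
```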
Python & Engineering Quality
* Strong Python coding capability with emphasis on clean code and maintainability.
* Experience applying engineering best practices including:
o Parameterisation
o Configuration management
o Structured logging
o Exception handling
o Modular design
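The practices listed above can be sketched together in a few lines of plain Python. This is an illustrative fragment, not a mandated pattern: `JobConfig` and `load_config` are hypothetical names, but they show parameterisation via a typed config object, validation at the boundary, structured logging, and exception handling with chained context.

```python
import json
import logging
from dataclasses import dataclass

logger = logging.getLogger("pipeline")

@dataclass(frozen=True)
class JobConfig:
    """Parameterisation: no hard-coded paths or dates in the job body."""
    source_path: str
    target_path: str
    run_date: str

def load_config(raw: str) -> JobConfig:
    """Configuration management: validate inputs once, at the boundary."""
    try:
        return JobConfig(**json.loads(raw))
    except (json.JSONDecodeError, TypeError) as exc:
        # structured logging: message template plus the underlying cause
        logger.error("invalid job config: %s", exc)
        raise ValueError("job config could not be parsed") from exc
```

Keeping configuration parsing in one small, testable function like this is one way of expressing the modular-design point: the Spark job body receives a validated `JobConfig` and never touches raw JSON.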
SAS & Legacy Analytics (P1)
* Foundational knowledge of SAS (Base SAS, Macros, DI Studio).
* Ability to understand and interpret legacy SAS code for migration.
Data Engineering & Testing
* Understanding of end-to-end data flows, orchestration, pipelines, and change data capture (CDC).
* Experience executing ETL test cases and building unit/data comparison tests.
Engineering Practices
* Proficient with Git workflows, branching, pull requests, and code reviews.
* Ability to document technical decisions, data flows, and architecture.
* Exposure to CI/CD tooling for data engineering pipelines.
AWS & Platform Skills (P3)
* Strong hands-on experience with:
o S3
o EMR/Glue
o Glue Workflows
o Athena
o IAM
* Understanding of distributed computing and big data processing on AWS.
* Experience deploying and operating data pipelines in cloud environments.
Desirable Skills
* Experience in banking or financial services environments.
* Background in SAS modernisation or cloud migration programmes.
* Familiarity with DevOps practices and infrastructure-as-code tools (Terraform, CloudFormation).
* Experience working in Agile/Scrum delivery teams.