Job Description
This is an exciting opportunity for an experienced developer of large-scale data solutions. You will join a team delivering a transformative cloud-hosted data platform for a key Version 1 customer.
The ideal candidate will have a proven track record as a senior, self-starting data engineer implementing data ingestion and transformation pipelines for large-scale organisations. We are seeking someone with deep technical skills across a variety of technologies, specifically Spark performance tuning and optimisation and Databricks, to play an important role in developing and delivering early proofs of concept and production implementations.
You will ideally have experience building solutions using a variety of open-source tools and Microsoft Azure services, and a demonstrated ability to deliver high-quality work to tight deadlines.
Your main responsibilities will be:
1. Designing and implementing highly performant data ingestion and transformation pipelines from multiple sources using Databricks and Spark
2. Building streaming and batch processes in Databricks
3. Tuning and optimising Spark performance
4. Providing technical guidance for complex geospatial problems and Spark DataFrames
5. Developing scalable and reusable frameworks for ingestion and transformation of large data sets
6. Designing and implementing data quality systems and processes
7. Integrating the end-to-end data pipeline to take data from source systems to target data repositories, ensuring the quality and consistency of data is maintained at all times
8. Working with other members of the project team to support delivery of additional project components (reporting tools, API interfaces, search)
9. Evaluating the performance and applicability of multiple tools against customer requirements
10. Working within an Agile delivery/DevOps methodology to deliver proofs of concept and production implementations in iterative sprints
Qualifications
1. Direct experience of building data pipelines using Azure Data Factory and Spark on Databricks
2. Experience building data integrations in Python
3. Hands-on experience designing and delivering solutions using the Azure Data Analytics platform
4. Experience building data warehouse solutions using ETL/ELT tools such as Informatica and Talend
5. Comprehensive understanding of data management best practices, including demonstrated experience with data profiling, sourcing, and cleansing routines using typical data quality functions such as standardisation, transformation, rationalisation, linking, and matching