Technical Expertise:
* Solid experience in Python programming, particularly with data manipulation and processing libraries such as Pandas, NumPy, and PySpark (the Python API for Apache Spark).
* Practical experience in designing, developing, and maintaining robust data ingestion pipelines (a minimal Pandas-based sketch appears after this list).
* Demonstrated ability to optimize code, database queries, and overall system performance.
* Hands-on experience with open-source data frameworks such as Apache Spark and Apache Kafka.
* Strong proficiency in SQL, including advanced query development and performance tuning (see the window-function example after this list).
* Good understanding of distributed computing principles and big data ecosystems.
* Familiarity with version control (Git) and CI/CD automation pipelines.
* Experience working with relational databases such as PostgreSQL, MySQL, or equivalent platforms.
* Proficiency with containerization technologies, including Docker and Kubernetes.
* Experience with workflow orchestration tools such as Apache Airflow or Dagster (a minimal Airflow DAG sketch follows this list).
* Strong grasp of data warehousing methodologies, including dimensional modelling and schema design (see the star-schema sketch after this list).
* Understanding of cloud infrastructure management, preferably using Infrastructure-as-Code (IaC) tools.
* Familiarity with streaming data pipelines and real-time analytics solutions (a Structured Streaming sketch closes this section).
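
To make the Python and ingestion-pipeline expectations concrete, here is a minimal sketch using Pandas and NumPy. The file path, column names, and derived metric are illustrative assumptions, not part of any specific role:

```python
import numpy as np
import pandas as pd

# Hypothetical source file and column names; adjust to the real feed.
SOURCE_PATH = "events.csv"

def ingest(path: str) -> pd.DataFrame:
    """Load raw events, clean obvious issues, and derive one metric."""
    df = pd.read_csv(path, parse_dates=["event_time"])
    # Drop rows missing key fields and deduplicate on the event id.
    df = df.dropna(subset=["event_id", "event_time"]).drop_duplicates("event_id")
    # Vectorized NumPy transform: log-scale a skewed numeric column.
    df["amount_log"] = np.log1p(df["amount"].clip(lower=0))
    return df

if __name__ == "__main__":
    print(ingest(SOURCE_PATH).head())
```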
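"Advanced query development" in practice often means window functions. Below is a self-contained sketch using Python's built-in sqlite3 module (window functions require an SQLite build of 3.25 or newer); the table and rows are invented for illustration:

```python
import sqlite3

# In-memory database with invented example rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, '2024-01-05', 120.0),
        (1, '2024-02-11',  80.0),
        (2, '2024-01-20', 200.0);
""")

# Window function: running total of spend per customer, ordered by date.
query = """
    SELECT customer_id, order_date, amount,
           SUM(amount) OVER (
               PARTITION BY customer_id ORDER BY order_date
           ) AS running_total
    FROM orders
    ORDER BY customer_id, order_date;
"""
for row in conn.execute(query):
    print(row)
```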
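For orchestration, a minimal Airflow DAG sketch follows. It assumes Airflow 2.4 or later (older releases spell the schedule argument `schedule_interval`); the DAG id and task bodies are placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Task callables are stubs; real extract/load logic would replace them.
def extract():
    print("pulling from source")

def load():
    print("writing to warehouse")

with DAG(
    dag_id="daily_ingest",            # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                # Airflow 2.4+ spelling
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task         # load runs only after extract succeeds
```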
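Dimensional modelling usually means a star schema: a central fact table keyed to dimension tables. A tiny sketch, again via sqlite3, with invented table and column names:

```python
import sqlite3

# Minimal star schema: one fact table referencing two dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (
        date_key  INTEGER PRIMARY KEY,  -- surrogate key, e.g. 20240105
        full_date TEXT NOT NULL
    );
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,
        name         TEXT NOT NULL
    );
    CREATE TABLE fact_sales (
        date_key     INTEGER REFERENCES dim_date (date_key),
        customer_key INTEGER REFERENCES dim_customer (customer_key),
        amount       REAL NOT NULL
    );
""")
```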
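Finally, a Structured Streaming sketch showing a Kafka-fed, windowed count in PySpark. It assumes the spark-sql-kafka connector package is available at submit time; the broker address and topic name are placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("stream-demo").getOrCreate()

# Placeholder broker and topic; the "kafka" source requires the
# spark-sql-kafka connector package on the classpath.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
)

# Count messages per one-minute window of their Kafka timestamp.
counts = (
    events.select(F.col("timestamp"))
    .groupBy(F.window("timestamp", "1 minute"))
    .count()
)

# Print running counts to the console; "complete" re-emits all windows.
query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```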