Job Description
Mandatory Skills: Python, PySpark, AWS, Cloud, AWS Services, AWS Components
* Designing and developing scalable, testable data pipelines using Python and Apache Spark
* Orchestrating data workflows with AWS tools like Glue, EMR Serverless, Lambda, and S3
* Applying modern software engineering practices: version control, CI/CD, modular design, and automated testing
* Contributing to the development of a lakehouse architecture using Apache Iceberg
* Collaborating with business teams to translate requirements into data-driven solutions
* Building observability into data flows and implementing basic quality checks
* Participating in code reviews, pair programming, and architecture discussions
* Continuously learning about the financial indices domain and sharing insights with the team
What you'll bring:
* Writes clean, maintainable Python code (ideally with type hints, linters, and tests written in a framework like pytest)
* Understands data engineering basics: batch processing, schema evolution, and building ETL pipelines
* Has experience with or is eager to learn Apache Spark for large-scale data processing
* Is familiar with the AWS data stack (e.g., S3, Glue, Lambda, EMR)
* Enjoys learning the business context and working closely with stakeholders
* Works well in Agile teams and values collaboration over solo heroics
Nice-to-haves:
It's great (but not required) if you also bring:
* Experience with Apache Iceberg or similar table formats
* Familiarity with CI/CD tools like GitLab CI, Jenkins, or GitHub Actions
* Exposure to data quality frameworks like Great Expectations or Deequ
* Curiosity about financial markets, index data, or investment analytics
Note: Hybrid role (2 or 3 days a week onsite)