Overview
At PwC, our people in data and analytics focus on leveraging data to drive insights and make informed business decisions. They use advanced analytics techniques to help clients optimise their operations and achieve their strategic goals. In data analysis at PwC, you will apply advanced analytical techniques to extract insights from large datasets and enable data-driven decision-making, using skills in data manipulation, visualisation, and statistical modelling to help clients solve complex business problems.
Experience
8+ years
Position
PwC US - Acceleration Center is looking for an experienced and visionary GenAI Data Engineer to join our team as a Manager. This leadership role involves overseeing the development and maintenance of data pipelines, the implementation of machine learning models, and the optimization of data infrastructure for our GenAI projects. The ideal candidate will have an extensive background in data engineering, with a deep focus on GenAI technologies, and a solid understanding of data processing, event-driven architectures, containerization, and cloud computing.
Responsibilities
* Lead the design, development, and maintenance of robust data pipelines and ETL processes for GenAI projects.
* Manage and guide a team of data scientists and data engineers in implementing complex data and machine learning systems.
* Strategize and optimize data infrastructure and storage solutions to ensure efficient, scalable, and reliable data processing across projects.
* Implement and optimize real-time data streaming solutions using platforms such as Kafka, Spark Streaming, or similar.
* Oversee the deployment of containerization technologies like Kubernetes and Docker to enhance scalability and operational efficiency.
* Direct the development and governance of data lakes, ensuring effective management of large volumes of structured and unstructured data.
* Lead the integration of LLM frameworks (such as Langchain and Semantic Kernel) to advance language processing and analytical capabilities.
* Collaborate with cross-functional teams to architect and implement solution frameworks that align with GenAI project goals.
* Develop and deploy solutions on multiple cloud platforms (Azure, AWS, GCP, Databricks), leveraging cloud-native services and containerization (Kubernetes, Docker).
* Monitor, diagnose, and resolve issues within data pipelines and systems to maintain continuous and smooth operations.
* Stay current with GenAI and data engineering trends; recommend and implement innovative solutions.
* Implement CI/CD pipelines and version control (Git) for efficient development and deployment.
* Translate complex business requirements into effective technical solutions, driving project success and technological innovation.
* Document and standardize data engineering processes, methodologies, and best practices across teams.
* Support team members' professional development and certification in solution architecture, maintaining alignment with industry best practices.
Requirements
* 8+ years of relevant technical/technology experience, with a strong emphasis on GenAI projects.
* Proficiency in Python (minimum 3 years) and SQL (required), with hands-on experience in Scala, Java, and shell scripting.
* Experience with Spark and/or Hadoop for distributed data processing.
* Solid understanding of designing and architecting scalable Python applications, particularly for GenAI use cases, including the components and systems architecture patterns needed to build cohesive, decoupled, and scalable applications.
* Familiarity with Python web frameworks (Flask, FastAPI) for building web applications around AI models.
* Demonstrated ability to design applications with modularity, reusability, and security best practices in mind (session management, vulnerability prevention, etc.).
* Familiarity with cloud-native development patterns and tools (e.g., REST APIs, microservices, serverless functions).
* Experience deploying and managing containerized applications on Azure, AWS, GCP, or Databricks (e.g., Azure Kubernetes Service, Azure Container Instances, or similar).
* Strong proficiency in Git for effective code collaboration and management.
* Proficiency in SQL and database management systems.
* Excellent collaboration and communication skills.
Nice-to-Have Skills
* Experience in setting up data pipelines for model training and real-time inference.
* Exposure to LLM frameworks and tools for interacting with large language models.
* Experience developing and deploying machine learning applications in production environments.
* Understanding of data privacy and compliance regulations.
* Practical knowledge of ML/DL frameworks such as TensorFlow, PyTorch, and scikit-learn.
* Proficient in object-oriented programming with languages such as Java, C++, or C#.
Educational Background
* BE / B.Tech / MCA / M.Sc / M.E / M.Tech / Master’s Degree / MBA / Any degree
Preferred Qualifications
* Relevant certifications in Databricks, cloud platforms, or data engineering.
* A finance background, or experience working in the finance or banking domain.