JOB TITLE: PRINCIPAL DATA ENGINEER (DATACLOUD)
LOCATION: PARSIPPANY, NJ (100% ONSITE ROLE, 5 DAYS/WEEK, FROM DAY 1; NO OPTION FOR REMOTE/HYBRID)
JOB TYPE: CONTRACT
EXP LEVEL: MIN 14 YEARS.
NOTE: ONLY US CITIZENS AND GC, GC EAD, TN, AND H1B VISA HOLDERS ARE ELIGIBLE.
PLEASE DO NOT SHARE H1 TRANSFER/OPT/CPT/E3 VISA HOLDERS FOR THIS ROLE.
RESPONSIBILITIES:
* Work closely with cross-functional teams, including product managers, data scientists, and engineers, to understand project requirements and objectives, ensuring alignment with overall business goals.
* Build a data ingestion framework and data pipelines to ingest unstructured and structured data from various data sources, such as SharePoint, Confluence, chatbots, Jira, and external sites, into our existing OneData platform.
* Design a scalable target-state architecture for data processing based on document content (data types may include, but are not limited to, XML, HTML, DOC, PDF, XLS, JPEG, TIFF, and PPT), including PII/CII handling, policy-based hierarchy rules, and metadata tagging.
* Design, develop, and deploy optimal data pipelines, including an incremental data ingestion strategy, by taking advantage of leading-edge technologies through experimentation and iterative refinement.
* Design and implement vector databases to efficiently store and retrieve high-dimensional vectors.
* Conduct research to stay up to date with the latest advancements in generative AI services and identify opportunities to integrate them into our products and services.
* Implement data quality and validation checks to ensure accuracy and consistency of data.
* Build automation that effectively and repeatably ensures the quality, security, integrity, and maintainability of our solutions.
* Monitor and troubleshoot data pipeline performance, identifying and resolving bottlenecks and issues.
* Define and implement data access policies; implement and maintain data security measures and access policies for cloud storage buckets and vector databases.
QUALIFICATIONS REQUIRED:
* Bachelor's degree in Engineering, Computer Science, or a related field; Master's degree is a plus.
* 10 YEARS OF RELEVANT INDUSTRY AND FUNCTIONAL EXPERIENCE IN DATABASE AND CLOUD-BASED TECHNOLOGIES.
* Experience working with MACHINE LEARNING AND AI CONCEPTS RELATED TO RAG ARCHITECTURE, LLMS, EMBEDDING, AND DATA INSERTION INTO A VECTOR DATASTORE.
* Experience in building data ingestion pipelines for structured and unstructured data, both for storage and optimal retrieval.
* Experience working with cloud data stores and NoSQL, graph, and vector databases.
* Proficiency with languages such as PYTHON, SQL, AND PYSPARK.
* Experience working with DATABRICKS AND SNOWFLAKE TECHNOLOGIES.
* Experience with relevant code repository and project tools such as GitHub, JIRA, and Confluence.
* Working experience with Continuous Integration & Continuous Deployment, with hands-on expertise in Jenkins, Terraform, Splunk, and Dynatrace.
* Highly innovative, with an aptitude for foresight, systems thinking, and design thinking, and a bias towards simplifying processes.
* Detail-oriented individual with strong analytical, problem-solving, and organizational skills.
* Ability to communicate clearly with both technical and business teams.
Python, SQL