Description

a. Data Source Analysis and Integration
• Analyze existing data sources, including SharePoint Online (documents) and Databricks SQL Warehouse (structured data).
• Design and implement methods for extracting and integrating data from these sources.
b. Data Enrichment Layer Design
• Develop techniques for text extraction, metadata standardization, and semantic enrichment.
• Design a scalable enrichment layer to process and enhance data for downstream AI/ML use.
c. Embedding Data Pipeline Development
• Build and automate ETL/ELT pipelines for embedding data in Vertex AI.
• Apply NLP and data processing techniques to ensure data is suitable for AI/ML models.
d. Documentation and Collaboration
• Prepare comprehensive technical documentation for all developed processes and pipelines.
• Collaborate with AI/ML teams to ensure enriched data meets project requirements.
Skills/Experience Required
• Bachelor's degree in computer science, data engineering, or a related field with 3+ years of experience, or a Master's degree in computer science, data engineering, or a related field.
• Proven experience in data engineering and enrichment, especially with unstructured and structured data.
• Proficiency in Python, SQL, and cloud platforms (preferably GCP and Vertex AI).
• Experience with NLP techniques and data processing frameworks.
• Strong problem-solving, communication, and documentation skills.

Education

Any Graduate