Description

Responsibilities:  

  • Design and develop data pipelines for agentic systems, building robust data flows to handle complex interactions between AI agents and data sources.
  • Train and fine-tune large language models (LLMs).
  • Design and build the data architecture, including databases and data lakes, to support various data engineering tasks.
  • Develop and manage Extract, Load, Transform (ELT) processes to ensure data is moved accurately and efficiently from source systems to the analytical platforms used in data science.
  • Implement data pipelines that facilitate feedback loops, allowing human input to improve system performance in human-in-the-loop systems. 
  • Work with vector databases to store and retrieve embeddings efficiently. 
  • Collaborate with data scientists and engineers to preprocess data, train models, and integrate AI into applications. 
  • Optimize data storage and retrieval for high performance.
  • Perform statistical analysis to identify trends and patterns, and create data formats from multiple sources.

   

Qualifications:           

  • Strong data engineering fundamentals.
  • Experience with big data frameworks such as Spark and Databricks.
  • Experience training LLMs with structured and unstructured data sets.
  • Understanding of graph databases.
  • Experience with Azure Blob Storage, Azure Data Lake, and Azure Databricks.
  • Experience implementing Azure Machine Learning, Azure Computer Vision, Azure Video Indexer, Azure OpenAI models, Azure Media Services, and Azure AI Search.
  • Ability to determine effective data partitioning criteria.
  • Ability to use Spark to implement partitioning schemes.
  • Understanding of core machine learning concepts and algorithms.
  • Familiarity with cloud computing.
  • Strong programming skills in Python and experience with AI/ML frameworks. 
  • Proficiency in vector databases and embedding models for retrieval tasks. 
  • Expertise in integrating with AI agent frameworks. 
  • Experience with cloud AI services (Azure AI). 
  • Experience with GIS spatial data to create markers on maps (for example, matching latitude/longitude to the nearest road topology, geolocating across datasets, and correlation).
  • Experience with Department of Transportation data domains, developing a composite agentic AI solution designed to identify and analyze data models, connect and correlate information to validate hypotheses, forecast, predict, and recommend potential strategies, and conduct what-if analysis.
  • Bachelor's or master's degree in computer science, AI, data science, or a related field.

Education

Bachelor's or master's degree