Description

Job Description:

We are seeking a highly skilled data scientist for a full-time, short-term contract. The candidate must have recent, extensive experience with Databricks and Large Language Models (LLMs) to help us build a chatbot.

The candidate must have demonstrated the ability to:

  • Develop and fine-tune Natural Language Processing (NLP) models to power our chatbot
  • Set up and manage a vector database to ensure efficient information retrieval
  • Create robust data pipelines to ingest and process diverse text data from various sources, including the web
  • Deploy and maintain machine learning applications within an Azure cloud environment
  • Evaluate and iterate on Retrieval-Augmented Generation (RAG) system performance to achieve high accuracy and relevance

Key Expertise and Experience Required:

  • 5+ years of recent Databricks experience and strong MLOps/LLMOps proficiency
  • Deep understanding of Natural Language Processing (NLP) fundamentals (tokenization, embeddings) and LLM chatbot development
  • Hands-on experience with RAG and information retrieval systems (indexing, ranking)
  • Expertise in Python and its data science libraries (Pandas, NumPy)
  • Ability to integrate REST APIs to deliver seamless chatbot functionality
  • Proficiency with vector databases (e.g., Mosaic AI Vector Search, Chroma DB, Elasticsearch) and embeddings
  • Experience with ingesting, processing, cleaning, and chunking data from various sources (documents, websites, databases)
  • Strong command of Azure Cloud for development and deployment

AI Tools Experience Required:

  • Databricks, Unity Catalog, Mosaic AI Vector Search, and other vector databases such as Chroma DB, Elasticsearch, or similar
  • Hugging Face
  • Apache Nutch or a similar tool for crawling and scraping websites
  • Azure cloud

Education

Any Graduate