Job Description:
We are seeking a highly skilled data scientist for a full-time, short-term contract to help us create a chatbot. Candidates need recent and extensive experience with Databricks and Large Language Models (LLMs).
Candidates must have demonstrated the ability to:
- Develop and fine-tune Natural Language Processing (NLP) models to power our chatbot
- Set up and manage a vector database to ensure efficient information retrieval
- Create robust data pipelines to ingest and process diverse text data from various sources, including the web
- Deploy and maintain machine learning applications within an Azure cloud environment
- Evaluate and iterate on Retrieval-Augmented Generation (RAG) system performance to achieve high accuracy and relevance
Key Expertise and Experience Required:
- 5+ years of recent Databricks experience and strong MLOps/LLMOps proficiency
- Deep understanding of Natural Language Processing (NLP) fundamentals (tokenization, embeddings) and LLM chatbot development
- Hands-on experience with RAG and information retrieval systems (indexing, ranking)
- Expertise in Python and its data science libraries (Pandas, NumPy)
- Ability to integrate REST APIs for seamless chatbot functionality
- Proficiency with vector databases (e.g., Mosaic AI Vector Search, Chroma DB, Elasticsearch) and embeddings
- Experience with data ingestion, processing, cleaning, and chunking from various sources (documents, websites, databases)
- Strong command of Azure Cloud for development and deployment
AI Tools Experience Required:
- Databricks, Unity Catalog, and Mosaic AI Vector Search or other vector databases such as Chroma DB or Elasticsearch
- Hugging Face
- Apache Nutch or a similar tool for scraping websites
- Azure cloud