Key Responsibilities:
- Lead the development and implementation of the GenAI Product platform, including core components such as StreamSync, RAGCore, and DocForge.
- Manage and mentor ML scientists.
- Design and optimize Retrieval-Augmented Generation (RAG) systems for processing enterprise architecture documents.
- Develop and implement advanced NLP solutions for document analysis and knowledge extraction.
- Architect scalable ML pipelines for processing and analyzing Solution Architecture Documents (SADs).
- Collaborate with enterprise architects and stakeholders to understand requirements and deliver AI-powered solutions.
- Drive technical decision-making for ML infrastructure, model selection, and system architecture.
- Ensure compliance with enterprise standards and security requirements.
- Lead ML model evaluation, optimization, and deployment strategies.
- Establish best practices for ML development and documentation.
Required Qualifications:
- Master's degree in Computer Science, Machine Learning, or a related field.
- 5+ years of experience in machine learning; at least 2 years in leadership roles is preferred but not required.
- Expertise in Natural Language Processing and Large Language Models.
- Knowledge of RAG systems and vector databases.
- Knowledge of enterprise software development and system architecture.
- Expert knowledge of Python and ML frameworks.
- Experience with cloud platforms (AWS) and MLOps practices.
- Understanding of enterprise architecture principles and documentation.
Preferred Qualifications:
- Familiarity with architecture documentation tools (e.g., Lucid).
- Background in enterprise solution architecture.
- Experience with vector databases (e.g., Azure Cosmos DB).
- Experience building GenAI systems (e.g., RAG).
Technical Skills:
- Machine Learning: Advanced NLP, LLMs, RAG systems, vector embeddings.
- Programming: Python, API development.
- Cloud & Infrastructure: AWS, Azure, containerization.
- Data Processing: Document processing pipelines, text analytics.
- Tools & Frameworks: LlamaIndex, LangChain, PyTorch/TensorFlow, vector databases, MLOps tools.