We are seeking an experienced ML Engineer with hands-on CUDA SDK experience to join our team in enhancing, porting and validating PyTorch-based Large Language Models (LLMs) using CUDA SDK APIs. The successful candidate will extend, replace and debug the underlying CUDA code so that these models run correctly and efficiently on company-specific AI processors.
Key Responsibilities
Enhance, port and validate PyTorch-based LLMs on company-specific AI processors using CUDA SDK APIs
Debug and troubleshoot issues related to CUDA code integration with PyTorch models
Extend and modify CUDA code to optimize performance on company-specific AI processors
Replace existing CUDA code with custom implementations to meet hardware-specific requirements
Collaborate with cross-functional teams to ensure successful integration of LLMs with company-specific AI processors
Develop and maintain validation frameworks and tools for PyTorch-based LLMs
Analyze and optimize the performance of LLMs on company-specific AI processors
Requirements
Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field
Strong experience with CUDA programming and the PyTorch framework
In-depth knowledge of deep learning models, particularly Large Language Models (LLMs)
Proficiency in C++ and Python
Experience debugging and troubleshooting complex software issues
Excellent problem-solving skills and attention to detail
Strong communication and collaboration skills
Nice to Have
Experience with AI processor architecture and design
Knowledge of other deep learning frameworks, such as TensorFlow
Experience profiling and tuning GPU kernels for speed, memory usage, and scalability
Personal Attributes
Passion for high-performance computing and AI/ML innovation
Strong problem-solving and analytical skills
Excellent communication and teamwork abilities
Self-motivated, with a drive to push boundaries and deliver results