Description

Responsibilities

 

  • Build and maintain efficient, scalable, distributed training systems (including data preprocessing, training orchestration, and model evaluation) for training large-scale AI models.
  • Optimize training procedures to improve performance and resource utilization while maintaining scalability and reliability.
  • Collaborate with researchers to create training and evaluation pipelines for state-of-the-art algorithms.
  • Design and develop benchmarks for evaluating ML models.
  • Train and fine-tune foundation models for robotic applications.
  • Monitor and analyze pipelines, identifying bottlenecks and proposing solutions to improve efficiency and performance.
  • Ensure the robustness and reliability of the training infrastructure, including automated testing and continuous integration.

Education

Any Graduate