Description

  • 5+ years of experience in distributed systems, with deep knowledge of computer science fundamentals
  • Deep understanding of Ray and KubeRay
  • Experience troubleshooting Ray (the team runs Ray as a service)
  • Experience with containerization and orchestration technologies, such as Docker and Kubernetes
  • Experience in delivering data and machine learning infrastructure in production environments
  • Experience configuring, deploying and troubleshooting large-scale production environments
  • Experience in designing, building, and maintaining scalable, highly available systems that prioritize ease of use
  • Experience with alerting, monitoring and remediation automation in a large-scale distributed environment
  • Extensive programming experience in Java, Python or Go
  • Strong collaboration and communication (verbal and written) skills
  • B.S., M.S., or Ph.D. in Computer Science, Computer Engineering, or equivalent practical experience
  • Experience with ML Training/Inference profiling and optimization

Education

Any Graduate