Description

  • Deep understanding of Ray.
  • Operate, monitor, and triage all aspects of our production and non-production environments.
  • Automate deployment and orchestration of services into the cloud environment, as well as other routine processes.
  • Work across multiple cloud environments such as AWS and GCP.
  • Actively participate in capacity planning, scale testing, and disaster recovery exercises.
  • Interact with and support partner teams, including Engineering, QA, and program management.
  • Troubleshoot customer concerns for ML tuning and inference endpoints on Ray.
  • Design and implement RESTful/RPC APIs and services using Golang or Python.
  • Implement SLOs/SLIs and error budget reporting for various customers.

Education

Any Graduate