AI/Machine Learning Engineer

Enormous Enterprise LLC
Cupertino, CA, USA

Description

Responsibilities -

· Design, implement, and maintain distributed systems to build world-class ML platforms and products at scale

· Diagnose and fix complex issues across the entire stack, and automate their remediation, to ensure maximum uptime and performance

· Design and extend services to improve the functionality and reliability of the platform

· Monitor system performance, optimize for cost and efficiency, and resolve any issues that arise

· Build relationships with stakeholders across the organization to better understand internal customer needs and improve our products for end users

Required Skills -

· 3+ years of experience in distributed systems, with deep knowledge of computer science fundamentals

· Experience with containerization and orchestration technologies, such as Docker and Kubernetes

· Experience in delivering data and machine learning infrastructure in production environments

· Experience configuring, deploying, and troubleshooting large-scale production environments

· Experience in designing, building, and maintaining scalable, highly available systems that prioritize ease of use

· Experience with alerting, monitoring, and remediation automation in a large-scale distributed environment

· Extensive programming experience in Java, Python or Go

· Strong collaboration and communication (verbal and written) skills

· B.S., M.S., or Ph.D. in Computer Science, Computer Engineering, or equivalent practical experience

Preferred Skills -

· Understanding of the ML lifecycle and state-of-the-art ML infrastructure technologies

· Experience with GPUs and other types of HPC infrastructure

· Experience with training frameworks such as PyTorch, TensorFlow, and JAX

· Deep understanding of Ray and KubeRay

· Experience with ML training and inference profiling and optimization

Key Skills

PyTorch, TensorFlow, JAX, Java, Python, Docker, Kubernetes

Education

Any Graduate

Posted On: 30+ Days Ago
Experience: 3+ years of experience
Openings: 1
Category: AI / Machine Learning Engineer
Tenure: Contract - Corp-to-Corp Position