Key Responsibilities:
Provide technical support for AI infrastructure, including AI as a Service (AIaaS) solutions.
Manage and optimize GPU-based compute environments for AI workloads.
Oversee containerization strategies and orchestration tools for AI applications.
Troubleshoot and resolve performance and infrastructure-related issues.
Collaborate with AI engineers, DevOps, and infrastructure teams to enhance system efficiency.
Stay current with industry best practices for AI infrastructure support.
Work with platforms similar to Penguin Computing, Nimbix, PBS Works, Rescale, and Adaptive Computing.
Required Skills & Qualifications:
Strong experience with AI infrastructure and high-performance computing (HPC) environments.
Proficiency in managing and optimizing GPU clusters.
Hands-on experience with containerization and orchestration tools such as Docker and Kubernetes.
Familiarity with AI workload orchestration and job scheduling.
Experience with cloud-based AI infrastructure solutions.
Ability to diagnose and troubleshoot hardware and software issues.
Knowledge of automation and scripting (Python, Bash, etc.).
Strong communication and problem-solving skills.
Preferred Qualifications:
Experience with AIaaS platforms and enterprise AI deployment.
Previous experience working with AI infrastructure providers or platforms such as Penguin Computing, Nimbix, PBS Works, Rescale, or Adaptive Computing.
Understanding of networking and storage solutions for AI workloads.
Education: Graduate in any discipline.