Job Description
1. Should have worked as an HPC infrastructure solution architect for 5+ years, with exposure to HPC hardware deployment, and should have 12+ years of overall industry experience.
2. Should have Linux administration and infrastructure knowledge
3. Should have worked closely with application teams on use-case deployment in HPC clusters and should be able to discuss those engagements in detail.
4. Should have worked with Singularity containers and the SLURM workload scheduler
5. Good exposure to parallel performance testing and tuning, including HPC cluster-wide tuning and performance troubleshooting
6. Automation skills
Requirements
· Demonstrated hands-on expertise working with container technologies (Kubernetes, Docker) and developing complex solutions
· Demonstrated technical architecture and design skills
· Demonstrated understanding of infrastructure design; hands-on experience working with hardware (e.g. server, storage, networking) preferred
· Strong Linux OS and performance tuning experience
· Demonstrated workload experience in one or more key use-case segments (AI/ML, Analytics, HPC)
· Strong analytical and problem-solving skills
· Advanced skills in Python, PowerShell, JavaScript, Node.js or similar languages
· Ansible, Terraform and other scripting / automation toolkits
· Experience with Agile development practices
Education: Any Graduate