Description

  • Optimize performance and scalability of HPC applications running in containerized environments.
  • Stay up to date with the latest advancements in HPC and cloud technologies.
  • Collaborate with other DevOps engineers and developers to ensure seamless integration of HPC solutions.
  • Configure Linux OS for HPC needs.
  • Implement and maintain Kubernetes clusters for HPC workloads.
  • Explore, qualify, and tune open-source, cloud-based technology stacks for high-performance computing demands.
  • Design robust, high-performance, cloud-based software architectures involving CPU/GPU workloads, scalable and robust storage, and high-bandwidth interconnects.
  • Strong knowledge of HPC systems and cloud computing technologies (gRPC, Kafka, Kubernetes, ZeroMQ, Redis, Ceph, etc.).
  • Strong Linux performance tuning skills.
  • CPU and GPU performance tuning.
  • Proven experience with Kubernetes and container orchestration.
  • Proven, in-depth, distribution-agnostic knowledge of Linux systems (SUSE, Red Hat, Rocky, Ubuntu).
  • Experience with boot and availability technologies such as systemd, network boot/PXE, and Linux HA.
  • Strong understanding of TCP/IP fundamentals and of protocols such as DNS and DHCP.
  • Strong fundamentals in Linux networking and storage.
  • Proficiency in scripting and automation tools such as Ansible, Python, and Bash.
  • Working proficiency in a low-level language such as C.
  • Experience with CI/CD tools like Jenkins, GitLab or similar.
  • Familiarity with HPC workload managers and schedulers (e.g., Slurm, PBS).
  • Experience with one or more configuration management tools (Salt, Chef, Puppet, etc.).

Education

BE/BTech/MCA in Electrical Engineering/Computer Engineering