Description

  • Proven ability to orchestrate bare metal linux systems at scale including building automation for firmware updates, bios config management, configuring PXE environments.
  • Deep Linux systems experience including troubleshooting network interfaces, developing and applying configuration management, security best practices and monitoring and alerting.
  • Strong automation mindset.
  • Expert knowledge in 1 or more orchestration tools such as MaaS, Salt, Chef, Ansible or Puppet.
  • Strong communication skills.
  • Your job will involve writing detailed documentation for others to pick up or leading knowledge sharing sessions with operations teams.
  • Bonus skills include:
  • Hands-on experience in High Performance Computing (HPC) clustered environments from Nvidia or AMD.
  • Experience in performing automated wide scale testing on NCCL or other frameworks.
  • Network engineering experience with VyOS platforms.

What You'll Be Working On:

  • Provisioning and automating GPU Bare Metal Deployments
  • DevOps - Assist customer support and Cloud Ops teams with GPU specific knowledge/debugging during customer escalations.
  • Performance testing, analysis and monitoring
  • Firmware, BIOS, Kernel Upgrades and Testing

Education

Any Gradute