Key Duties and Responsibilities
- Deploy, configure, and maintain High-Performance Computing (HPC) systems to ensure optimal performance and reliability.
- Customize, configure, and troubleshoot job schedulers such as SLURM, PBS, LSF, and others to meet specific operational requirements.
- Collaborate with others to identify the best techniques for resolving complex infrastructure, build, or packaging problems.
- Perform moderately complex debugging and testing of code, processes, and deployments to streamline execution and minimize errors.
- May build and package open-source third-party software.
- Maintain tools, infrastructure, and the build environment for product creation staff.
- Employ best practices and help maintain them through technical reviews.
Minimum Education/Certification Requirements and Experience
- BS in Engineering, Computer Science, or a related field with 5 years’ experience, or MS with 3 years’ experience
- At least 3 years of hands-on experience in HPC deployment and configuration
Preferred Qualifications and Skills
- Thorough knowledge of software development tools, compilers, and packaging software
- Extensive knowledge of Windows and/or Linux operating systems and of cloud and DevOps technologies such as Kubernetes, Terraform, and Ansible
- Proficiency in multiple programming languages, such as Python and Go
- Passion for crafting robust and efficient automated build systems
- Good communication and interpersonal skills
- Ability to learn quickly and to collaborate with others in a geographically distributed team