Description

Responsibilities:
• Administration of HPC and VDI clusters
• User account management for HPC onboarding and offboarding
• Creation and maintenance of AMIs in AMI accounts
• Install, configure, and maintain Linux operating systems on HPC clusters.
• Support the necessary HPC components and native services of the platform by coordinating with the respective providers (e.g., EFPortal, AWS RES, CycleCloud, AWS ParallelCluster).
• AWS Managed Active Directory support and management
• Continuous upgrades to the HPC platform and related components - OS, Java, Python, EFPortal, etc.
• Implement and maintain necessary compliance controls, i.e., US Export Control and confidentiality. Conduct regular audits, share the findings, and implement corrective actions as required.
• Coordinate with other teams, such as the v-drive team, in testing and migrating/installing engineering applications to the platform.
• Manage job schedulers such as Slurm or LSF.
• Use node provisioning tools like Warewulf.
• Troubleshoot system issues and provide technical support to users.
• Monitor system performance and ensure optimal operation of the HPC environment.
• Collaborate with other IT professionals to integrate new technologies into the existing infrastructure.

Qualifications:
• Progressive experience in HPC system administration, preferably in a Red Hat/CentOS Linux environment.
• Experience with AWS CloudFormation templates to build infrastructure for HPC and storage (Amazon FSx for NetApp ONTAP and Amazon FSx for Lustre).
• Experience with parallel file systems and storage solutions.
• Strong knowledge of job schedulers such as Slurm or LSF.
• Familiarity with node provisioning tools like Warewulf.
• Proficiency in Linux OS administration
• Excellent problem-solving abilities
• Linux+ certification preferred
• Top Secret Clearance: TS/SCI preferred
• On-site presence at customer location in Stennis, MS
• Availability for some on-call/weekend work
• Hands-on experience setting up HPC compute clusters.
• Experience setting up the PBS job scheduler and supporting PBS servers
• Experience with Red Hat and Rocky Linux; Bash scripting
• Nice to have: Docker and Kubernetes experience
• Nice to have: storage knowledge
• Nice to have: networking and DevOps knowledge

Education

Bachelor's degree