Description

A minimum of 5 years’ experience working in a large enterprise HPC environment comprising thousands of servers, large storage solutions, tape, and tape automation

Proficient in the installation, configuration, and management of Linux-based operating systems, preferably RHEL, CentOS, or Rocky Linux

Experience with IBM’s xCAT distributed computing management software

Experience with installation and maintenance of computer hardware, including servers, tape drives, robotic tape libraries, GPGPUs, SSDs, and disk arrays

Experience with containerization

Knowledge of networking and datacenter technologies, including switching, routing, high availability, LAN/WAN/WLAN topologies, and system configuration for Ethernet, InfiniBand, and Fibre Channel SAN

Experience with HPC storage solutions, for example the configuration and operation of HPE ClusterStor, NetApp, Dell Isilon, and Pure Storage systems

Ability to write and troubleshoot Bourne shell, Bash, C shell, Perl, Python, and Ruby scripts, as well as scripts for MRTG

Experience with PostgreSQL, including database installation and support

Experience with the Google Cloud Platform and Azure public clouds, including the ability to provision and manage instances, build images, and write installation scripts

Experience with configuration and provisioning tools such as Ansible and Terraform

Experience with backup and recovery tools such as IBM Spectrum and Dell NetWorker

Good knowledge of Linux security, including configuration of endpoint security tools

Ability to evaluate HPC system environments and recommend improvements in performance and manageability

Ability to investigate, debug, and diagnose system-level issues

Education

Any Graduate