Description

We are looking for a highly skilled systems engineer with experience in virtualization, Linux, Kubernetes, and server infrastructure. The engineer will be responsible for designing, deploying, and maintaining enterprise-grade cloud infrastructure using Apache CloudStack or a similar technology, along with Kubernetes, on Linux.

 

The Work -


Hypervisor Administration & Engineering

• Architect, deploy, and manage Apache CloudStack for private and hybrid cloud environments.

• Manage and optimize KVM or a similar virtualization technology.

• Implement high-availability cloud services using redundant networking, storage, and compute.

• Automate infrastructure provisioning using OpenTofu, Ansible, and API scripting.

• Troubleshoot and optimize hypervisor networking (virtual routers, isolated networks), storage, and API integrations.

• Work with shared storage technologies such as GFS2 and NFS.

 

Kubernetes & Container Orchestration

 

• Deploy and manage Kubernetes clusters in on-premises and hybrid environments.

• Integrate Cluster API (CAPI) for automated K8s provisioning.

• Manage Helm, Azure DevOps, and ingress controllers (NGINX/Citrix) for application deployment.

• Implement container security best practices, policy-based access control, and resource optimization.

Linux Administration

• Configure and maintain Red Hat HA Clustering (Pacemaker, Corosync) for mission-critical applications.

• Manage GFS2 shared storage, cluster fencing, and high-availability networking.

• Ensure seamless failover and data consistency across cluster nodes.

• Perform Linux OS hardening, security patching, performance tuning, and troubleshooting.

Physical Server Maintenance & Hardware Management

• Perform physical server installation, diagnostics, firmware upgrades, and maintenance.

• Work with SAN/NAS storage, network switches, and power management in data centers.

• Implement out-of-band management (IPMI/iLO/DRAC) for remote server monitoring and recovery.

• Ensure hardware resilience, failure prediction, and proper capacity planning.

Automation, Monitoring & Performance Optimization

• Automate infrastructure provisioning, monitoring, and self-healing capabilities.

• Implement Prometheus, Grafana, and custom scripting via API for proactive monitoring.

• Optimize compute, storage, and network performance in large-scale environments.

• Implement disaster recovery (DR) and backup solutions for cloud workloads.

Collaboration & Documentation

• Work closely with DevOps, Enterprise Support, and software developers to streamline cloud workflows.

• Maintain detailed infrastructure documentation, playbooks, and incident reports.

• Train and mentor junior engineers on CloudStack, Kubernetes, and HA Clustering.

 

The Must-Haves -


• 5+ years of experience with CloudStack or a similar virtualization platform, Kubernetes, and Linux system administration.

• Strong expertise in Apache CloudStack (4.19+) or a similar virtualization platform, the KVM hypervisor, and Cluster API (CAPI).

• Extensive experience with Red Hat HA Clustering (Pacemaker, Corosync) and GFS2 shared storage.

• Proficiency in OpenTofu, Ansible, Bash, Python, and Go for infrastructure automation.

• Experience with networking (VXLAN, SDN, BGP) and security best practices.

• Hands-on expertise in physical server maintenance, IPMI/iLO, RAID, and SAN storage.

• Strong troubleshooting skills in Linux performance tuning, logs, and kernel debugging.

• Knowledge of monitoring tools (Prometheus, Grafana, Alertmanager).

 

Preferred Qualifications

• Experience with multi-cloud (AWS, Azure, GCP) or hybrid cloud environments.

• Familiarity with CloudStack API customization and plugin development.

• Strong background in disaster recovery (DR) and backup solutions for cloud environments.

• Understanding of service meshes, ingress, and SSO.

• Experience with Cisco UCS platform management.

Education

Any Graduate