We are looking for a highly skilled systems engineer with experience in virtualization, Linux, Kubernetes, and server infrastructure. The engineer will be responsible for designing, deploying, and maintaining enterprise-grade cloud infrastructure built on Apache CloudStack (or similar technology) and Kubernetes running on Linux.
The Work -
Hypervisor Administration & Engineering
• Architect, deploy, and manage Apache CloudStack for private and hybrid cloud environments.
• Manage and optimize KVM or similar virtualization technologies.
• Implement high-availability cloud services using redundant networking, storage, and compute.
• Automate infrastructure provisioning using OpenTofu, Ansible, and API scripting.
• Troubleshoot and optimize hypervisor networking (virtual routers, isolated networks), storage, and API integrations.
• Work with shared storage technologies such as GFS and NFS.
Kubernetes & Container Orchestration
• Deploy and manage Kubernetes clusters in on-premises and hybrid environments.
• Integrate Cluster API (CAPI) for automated K8s provisioning.
• Manage Helm, Azure DevOps, and ingress controllers (NGINX/Citrix) for application deployment.
• Implement container security best practices, policy-based access control, and resource optimization.
Linux Administration
• Configure and maintain Red Hat HA Clustering (Pacemaker, Corosync) for mission-critical applications.
• Manage GFS2 shared storage, cluster fencing, and high-availability networking.
• Ensure seamless failover and data consistency across cluster nodes.
• Perform Linux OS hardening, security patching, performance tuning, and troubleshooting.
Physical Server Maintenance & Hardware Management
• Perform physical server installation, diagnostics, firmware upgrades, and maintenance.
• Work with SAN/NAS storage, network switches, and power management in data centers.
• Implement out-of-band management (IPMI/iLO/DRAC) for remote server monitoring and recovery.
• Ensure hardware resilience, failure prediction, and proper capacity planning.
Automation, Monitoring & Performance Optimization
• Automate infrastructure provisioning, monitoring, and self-healing capabilities.
• Implement Prometheus, Grafana, and custom API-driven scripts for proactive monitoring.
• Optimize compute, storage, and network performance in large-scale environments.
• Implement disaster recovery (DR) and backup solutions for cloud workloads.
Collaboration & Documentation
• Work closely with DevOps, Enterprise Support, and software developers to streamline cloud workflows.
• Maintain detailed infrastructure documentation, playbooks, and incident reports.
• Train and mentor junior engineers on CloudStack, Kubernetes, and HA Clustering.
The Must-Haves -
• 5+ years of experience with CloudStack or a similar virtualization platform, Kubernetes, and Linux system administration.
• Strong expertise in Apache CloudStack (4.19+) or a similar virtualization platform, the KVM hypervisor, and Cluster API (CAPI).
• Extensive experience with Red Hat HA Clustering (Pacemaker, Corosync) and GFS2 shared storage.
• Proficiency in OpenTofu, Ansible, Bash, Python, and Go for infrastructure automation.
• Experience with networking (VXLAN, SDN, BGP) and security best practices.
• Hands-on expertise in physical server maintenance, IPMI/iLO, RAID, and SAN storage.
• Strong Linux troubleshooting skills, including performance tuning, log analysis, and kernel debugging.
• Knowledge of monitoring tools (Prometheus, Grafana, Alertmanager).
Preferred Qualifications
• Experience with multi-cloud (AWS, Azure, GCP) or hybrid cloud environments.
• Familiarity with CloudStack API customization and plugin development.
• Strong background in disaster recovery (DR) and backup solutions for cloud environments.
• Understanding of service meshes, ingress, and SSO.
• Experience with Cisco UCS platform management.
Any graduate degree.