Description

Must Have:
Strong AWS experience (ECS, EC2, Lambda, IAM, or CloudFormation), plus Kubernetes (EKS clusters), infrastructure as code (Terraform preferred), and support experience handling a large volume of tickets through any ticketing platform.
 
Key Responsibilities:

  • Deliver incident management and advanced-level L1/L2 support for internal applications across public cloud platforms, with a strong emphasis on AWS.
  • Serve as the initial point of contact for application developers via a ticketing system.
  • Communicate effectively with users at various organizational levels.
  • Implement and utilize automation to support the scalability of the environment.
  • Optimize operational processes to enhance efficiency, reliability, and security.
  • Train users to self-diagnose and troubleshoot issues for expedited resolution.
  • Conduct thorough investigations into issues to identify root causes and document strategies to prevent recurrence.
  • Provide support for public cloud environments, particularly AWS.
  • Manage events and incidents efficiently.
  • Develop and implement scalable automation processes to handle tasks in a large-scale environment.
  • Analyze and debug incidents, following up to gather feedback and prevent future issues.
  • Support different development environments, including Unix, Linux, Mainframe, and Windows.

Required Skills and Experience:

  • Proficiency with the software development life cycle (SDLC) and the ability to read Java and Python code.
  • Hands-on scripting experience (Unix shell, Python).
  • Extensive cloud experience, particularly with AWS.
  • Expertise in Kubernetes.
  • Strong troubleshooting and diagnostic skills for security and access issues in a large enterprise environment.
  • Database administration skills (Oracle, Cassandra, CockroachDB), including performance tuning, connectivity, backups, indexes, and monitoring alarms.
  • Middleware and messaging experience (Kafka, MQ).
  • Experience with Tomcat.
  • System engineering and administration skills (Unix/Linux).
  • Familiarity with monitoring tools and ticketing systems.
  • Commitment to automating processes for continuous improvement.
  • Excellent communication skills.
  • Ability to analyze details, understand incident causation, and implement preventive measures to ensure reliability and security.

Education:

Graduate degree holder in any discipline.