Description

Job Description:

  • Monitoring and Alerting: Implement and maintain monitoring systems to proactively identify potential issues and alert engineers to problems before they impact users.
  • Incident Response: Respond to incidents and outages, diagnose problems, and implement solutions to minimize downtime and restore service.
  • Automation: Automate repetitive tasks and processes to improve efficiency and reduce manual effort.
  • Performance Optimization: Identify and address performance bottlenecks to keep systems running efficiently.
  • Infrastructure Management: Manage and maintain the underlying infrastructure, including servers, networks, and cloud resources.
  • Capacity Planning: Plan for future capacity needs to ensure systems can handle anticipated workloads.
  • Release Engineering: Develop and maintain processes for deploying software updates and releases.
  • Collaboration: Work closely with developers, operations teams, and other stakeholders to ensure system reliability and availability.
  • Documentation: Maintain clear and concise documentation of systems, processes, and procedures.
  • Continuous Improvement: Identify areas for improvement and implement changes to enhance system reliability and performance.

Skills and Qualifications:

  • Cloud Platforms (AWS, Microsoft Azure).
  • Automation (DevOps, CI/CD, Terraform).
  • Operating Systems (Windows, Linux).
  • Scripting (Shell Scripting, Python, PowerShell).
  • Databases (MySQL, Oracle, SQL database management).
  • Application Deployment (WildFly, JBoss, Apache Tomcat).
  • Container Services (Kubernetes, Docker, Helm).

Education

Graduate in any discipline