Responsibilities:
Implement best practices in cloud infrastructure, emphasizing Site Reliability, Observability, and Scalability.
Collaborate closely with Product, DBAs, Developers, DevOps, SRE, and Data Engineers to implement AWS best practices, Infrastructure as Code (IaC), and cost optimizations early in the design process.
Identify and eliminate bottlenecks, manage issues, and allocate support as necessary.
Architect and implement technical solutions that improve the service level of the on-premises and cloud platforms
Investigate, diagnose, and resolve incidents
Manage cluster provisioning, performance tuning, and security configuration
Monitor systems and remediate platform-related issues
Identify recurring problems and perform root cause analysis
Ensure continuous availability of data platform services.
Document and maintain environment architecture
Identify opportunities for platform improvements and present recommendations to management
Qualifications:
12+ years of experience in IT
5+ years’ experience working with AWS and Azure (professional certification or equivalent work experience).
5+ years’ experience in application deployment automation and DevOps practice
5+ years’ experience installing, administering, and maintaining Linux-based software
3+ years’ experience with IT automation tools such as Ansible, Salt, and Terraform
Extensive knowledge of Azure DevOps
Experience designing and implementing redundant systems, including data backup and recovery, high availability, load balancing, and disaster recovery
Experience designing, analyzing, and repairing large-scale distributed systems
Exceptional documentation and communication skills, with the ability to present technical content effectively to both management and engineering teams
Knowledge of core IT infrastructure technologies, including virtualization, networking, and storage management
Bachelor's degree in Computer Science