Key Responsibilities
- Architect, implement, and maintain scalable, resilient, and secure infrastructure using AWS.
- Develop and manage infrastructure as code (IaC) using Terraform to automate deployments and streamline infrastructure management.
- Design and implement CI/CD pipelines for automated deployments and smooth application delivery.
- Contribute to and maintain monitoring, logging, and alerting systems for comprehensive visibility into infrastructure health.
- Troubleshoot system performance issues, identify bottlenecks, and implement solutions to enhance reliability and scalability.
- Participate in sustainable incident response, perform root cause analyses (RCAs), and ensure prompt resolution of incidents with minimal disruption.
- Work with development teams to ensure applications are designed for reliability, scalability, and performance.
- Assist internal and customer-facing teams with deployments, including VPN and other security-related infrastructure.
- Support AutoRABIT services through on-call rotations, ensuring timely resolution of critical issues.
- Automate manual tasks, such as user provisioning in production and test environments.
- Drive automation initiatives to improve efficiency, reliability, and deployment speed.
- Mentor peers and team members through knowledge sharing, training, and collaboration.
- Foster a culture of continuous improvement and blameless postmortems to learn from incidents.
- Ensure security best practices are followed across infrastructure and deployments.
- Adhere to established internal controls and compliance requirements, ensuring all infrastructure aligns with security and regulatory standards.
Required Skills and Experience
- Technical Expertise:
- Proven experience designing and managing AWS-based infrastructure.
- Strong hands-on experience with Terraform for infrastructure as code.
- Working knowledge of CI/CD pipelines and related tools like Jenkins, AWS CodePipeline, or equivalent.
- Proficiency in scripting and programming languages such as Bash or Python for automation tasks.
- Experience with configuration management tools like Ansible or AWS SSM.
- Expertise in monitoring tools such as Grafana or Elasticsearch.
- Soft Skills:
- Strong problem-solving and troubleshooting abilities.
- Excellent written and verbal communication skills, particularly in working with global teams.
- Leadership qualities with the ability to challenge the status quo and drive innovation.
- A collaborative mindset with a focus on mentoring and knowledge sharing.