Description

Key Skills: AWS, Dynatrace, Grafana, Python, Jenkins

Job Description

• Create an SRE backlog – the broad set of work that must be done right to implement SRE at scale
• Work with senior stakeholders to agree and drive the backlog
• Liaise with teams and manage cross-team dependencies while implementing the backlog, staying on course and highlighting any derailment early on
• Incorporate various software engineering aspects to develop and implement services that improve IT and support teams - ranging from production code changes to alerting and monitoring adjustments.
• CI/CD pipeline development and optimization
• Building proprietary tools from scratch to mitigate weaknesses in incident management or software delivery.
• Troubleshooting support escalations and routing them to the relevant teams.
• On-Call Process Optimization via automation
• Documenting Knowledge
• Optimizing SDLC

Skills:


• Deep working knowledge of monitoring and log analytics tools like Dynatrace, Splunk, Grafana, CloudWatch, X-Ray
• Coding – proficiency in scripting languages such as Python or Ruby for developing automation tools.
• Mastery of cloud infrastructure, particularly AWS services (EC2, S3, RDS, VPC, Lambda, CloudFormation, AWS CLI)
• CI/CD Expertise - setting up and maintaining CI/CD pipelines to automate testing and deployment processes. Skills in using tools like Jenkins, AWS CodeBuild, AWS CodeDeploy, and AWS CodePipeline are essential.
• Communication - communicate with different teams to report and address incidents, explain technical concepts, negotiate reliability standards, and manage team relationships. Must be able to interact with software engineers, product teams, managers, CEOs, and CTOs.

Education

Any Graduate