Description

The Senior Site Reliability Engineer (SRE) Consultant will be responsible for ensuring the reliability, scalability, and performance of [Company Name]'s services and infrastructure. This role combines deep technical expertise in application architecture, SRE principles, and DevOps best practices to build and maintain robust, highly available, and efficient systems. The ideal candidate will have experience building SRE dashboards, defining and tracking SLAs, SLOs, and SLIs, and developing microservices in cloud environments such as AWS.
 

Qualifications:

  • Experience: At least 10 years of hands-on experience as an SRE, DevOps Engineer, or in a related role. Proven experience with cloud infrastructure, microservices, and application reliability.
  • Technical Skills:
    • Strong experience with AWS infrastructure and services (EC2, S3, Lambda, CloudWatch, etc.).
    • Proficiency in SRE practices such as incident management, reliability metrics (SLA/SLO/SLI), and automation.
    • Expertise in CI/CD pipeline setup and management (Jenkins, GitLab CI, CircleCI, etc.).
    • Experience in microservices development and architecture.
    • Familiarity with monitoring and observability tools (Prometheus, Grafana, ELK stack, New Relic, etc.) for building custom dashboards.
    • System integration (SI) experience connecting multiple systems and platforms.
    • Strong understanding of distributed systems and application troubleshooting in production environments.
  • Skills:
    • Advanced troubleshooting and problem-solving skills.
    • Strong knowledge of scripting and programming (Python, Go, Bash, etc.).
    • Excellent communication and collaboration skills to work with cross-functional teams.
    • Ability to mentor and guide junior team members.

Education

Any Graduate