Description

1. Python Development for Automation & Reliability

  • Design, develop, and maintain Python-based automation tools and scripts to enhance system reliability.
  • Build APIs, microservices, and infrastructure automation solutions.
  • Write efficient, scalable, and maintainable Python code for operations and monitoring.

2. Site Reliability Engineering (SRE) & System Performance Optimization

  • Ensure the availability, scalability, and resilience of IT infrastructure and services.
  • Implement self-healing mechanisms and proactive monitoring for system stability.
  • Optimize application performance, database queries, and system resources.

3. Infrastructure as Code (IaC) & Cloud Automation

  • Implement Infrastructure as Code (IaC) using Terraform, Ansible, or CloudFormation.
  • Automate cloud infrastructure provisioning on AWS, GCP, or Azure.
  • Manage and optimize containerized workloads using Docker and Kubernetes.

4. CI/CD & Deployment Management

  • Develop and maintain CI/CD pipelines to automate deployment processes.
  • Ensure zero-downtime deployments and improve rollback strategies.
  • Collaborate with DevOps teams to enhance software release processes.

5. Observability, Monitoring & Incident Management

  • Implement logging, monitoring, and alerting solutions (Prometheus, Grafana, ELK, Datadog).
  • Define and track Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
  • Participate in on-call rotations and resolve production incidents efficiently.

6. Security & Compliance

  • Implement automated security scanning and compliance checks.
  • Enforce secure coding practices and conduct vulnerability assessments.
  • Collaborate with security teams to improve system hardening and access controls.

Required Skills & Qualifications:

Technical Skills:

  • Strong proficiency in Python for automation, scripting, and backend development.
  • Experience in Linux/Unix system administration and troubleshooting.
  • Hands-on experience with cloud platforms (AWS, GCP, or Azure).
  • Proficiency in CI/CD tools (Jenkins, GitLab CI/CD, ArgoCD).
  • Knowledge of Kubernetes and Docker for containerized deployments.
  • Experience with monitoring and logging tools (Prometheus, Grafana, ELK, Datadog).
  • Expertise in Infrastructure as Code (IaC) using Terraform, Ansible, or CloudFormation.
  • Understanding of networking, DNS, load balancing, and security best practices.

Soft Skills:

  • Strong analytical and problem-solving skills.
  • Excellent communication and collaboration skills.
  • Passion for automation, DevOps, and reliability engineering.

Education

Any graduate (a bachelor's degree in any discipline)