Site Reliability Engineer

MIZ Tech Solutions LLC
Scottsdale, AZ, USA

Description

Must have:
1. Experience writing automation scripts in programming languages such as Go, Python, Java, or Rust, and working with one or more databases (Oracle, SQL Server, Redis, ClickHouse, PostgreSQL, MongoDB, or any time-series database).
2. Experience transitioning platforms to the cloud and to containers using GCP, AWS, and Rancher (or CloudFormation, Azure, and OpenShift), and maintaining containerized applications in GKE/RKE/AKS environments.
3. Work experience with a specific GraphQL framework (Apollo, Prisma, Hasura, etc.) and with implementing cloud observability using OpenTelemetry (OTEL) to enable real-time monitoring, distributed tracing, and incident resolution.

Summary: The associate is responsible for ensuring the reliability, availability, and performance of enterprise-critical systems and services. The role collaborates with cross-functional teams to design, implement, and maintain scalable, resilient infrastructure solutions that support business objectives, promotes adoption of best practices in site reliability and incident management, and identifies opportunities for continuous improvement.
3-5 years of service reliability/operations experience running large-scale, high-performance applications in a hybrid environment (on-prem and cloud). Develop dashboards for application performance management to manage transaction journeys.

Preferred Skills:
• Proven experience managing application availability, building creative solutions to automate repetitive activities, and improving gating and detection for applications at every touchpoint of a 24x7 high-availability platform exposed to critical clients and customers.
• Working knowledge of monitoring tools: Splunk, AppDynamics, Grafana/Prometheus, and Dynatrace.
• Experience with tools like Rally, Confluence and other CI/CD extenders.
• Hands-on experience implementing in-memory caching solutions. Experience with Redis is a plus.
• Excellent debugging skills across a variety of integrated technical platforms on an API gateway.
• Hands-on experience with GCS, Cloud SQL, Spanner, and Firestore.
• Extensive experience in enterprise-level infrastructure and operations.
• Experience in high availability and distributed systems, and in Linux and Windows administration, troubleshooting, and support.
• Monitor and troubleshoot HashiCorp Vault environments, ensuring minimal downtime and rapid recovery from incidents.
• Working knowledge of Vertex AI, generative AI, and BigQuery.

Key Skills

Go, Python, Java, Rust, GCP, AWS, GraphQL Framework, Splunk, Prometheus, Dynatrace

Education

Any Graduate

Posted On: 15 days ago
Experience: 10+ years
Openings: 1
Category: Site Reliability Engineer
Tenure: Flexible Position