Description

What you’ll do

Platform Leadership: Drive the strategic direction of all applications across cloud-native (GCP) and hybrid environments, including the design and implementation of new environments, software upgrades, and performance tuning.

Automation: Lead the creation and enhancement of automation scripts to streamline operational processes. Develop and maintain comprehensive supporting documentation for all automated solutions.

Deployment Services: Provide reliable and efficient software deployment services for both Quality Assurance (QA) and production environments.

Performance Management: Actively direct application performance monitoring and tuning to ensure optimal user experience and system efficiency.

Uptime and Reliability: Take a proactive role in safeguarding the uptime and performance of all applications, implementing best practices for high availability and fault tolerance.

GCP Expertise: Serve as a subject matter expert in the development, operational support, and maintenance of all applications hosted on the Google Cloud Platform (GCP).

CI/CD Pipeline Development: Build and maintain robust CI/CD pipelines for the automated build, testing, and deployment of applications and cloud architecture patterns, utilizing Jenkins and other cloud-native toolchains.

Production Environment Management: Define and manage updates to the production environment, ensuring stability and minimal disruption.

Third-Party Software Support: Oversee the installation, configuration, and ongoing support of third-party software running on internally managed environments.

Process and Procedure Development: Design and establish deployment procedures, set up robust operational processes, monitor production systems, and manage configuration effectively.

Tooling and Automation: Build automated tooling to facilitate service requests and push changes into production seamlessly.

Incident Management and Remediation: Develop and maintain detailed runbooks for detecting and remediating incidents and restoring services. Participate in an on-call rotation for high-severity application incidents and continuously improve runbooks to reduce Mean Time to Resolution (MTTR).

Problem Solving: Triage and solve complex problems within a distributed microservices architecture.

What experience you need

BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent job experience required

5+ years of experience in software engineering, systems administration, database administration, and networking.

2+ years of experience developing and/or administering software in public cloud

Experience in monitoring infrastructure and application uptime and availability to ensure functional and performance objectives are met.

Experience in languages such as Python, Bash, Java, Go, JavaScript, and/or Node.js

Demonstrable cross-functional knowledge of systems, storage, networking, security, and databases

System administration skills, including automation and orchestration of Linux/Windows using Terraform, Chef, Ansible and/or containers (Docker, Kubernetes, etc.)

Proficiency with continuous integration and continuous delivery tooling and practices

Cloud certification strongly preferred

What could set you apart

Proven experience in a Platform Engineer, DevOps Engineer, or similar role.

Strong hands-on experience with Google Cloud Platform (GCP) services (e.g., GKE, Compute Engine, Cloud Storage, Cloud SQL).

In-depth knowledge of and experience with CI/CD principles and tools, particularly Jenkins.

Proficiency in automation scripting using languages such as Python, Bash, or Go.

Demonstrated experience in application performance monitoring and tuning.

Solid understanding of infrastructure as code (IaC) concepts and tools (e.g., Terraform, Ansible).

Experience with containerization technologies such as Docker and container orchestration with Kubernetes.

Familiarity with incident management best practices and a proven ability to improve MTTR.

Experience with the installation, configuration, and support of third-party software.

Excellent problem-solving and troubleshooting skills in complex distributed systems.

Strong written and verbal communication skills.

Education

Any Graduate