DevOps Engineer (AWS/Azure/GCP, CI/CD, Kubernetes, Coding (Python))
The Business Operations (Biz Ops) team is seeking a Business Operations Engineer.
The role of the Business Operations organization is to be the production readiness steward for Client products.
As Business Operations Engineers, we support platform stability through monitoring.
We support software run principles that include change implementation, operational design, automation, and monitoring, leading to fault-tolerant, scalable products.
We support daily operations with a sharp focus on triage and root cause analysis, understanding the business impact of our products and then performing blameless post-mortems.
The goal of every Business Operations team is to proactively manage production and change activities to maximize customer experience and increase the overall value of supported applications. Business Operations teams also focus on risk management by tying all our activities together with an overarching responsibility for compliance and risk mitigation across all our environments.
Ultimately, the role of Business Operations is to align product and customer-focused priorities with operational needs by providing continuous feedback throughout the product lifecycle.
Team Specific Skills:
It is not expected that any single candidate would have expertise across all these areas, but a Biz Ops engineer will spend a bit of time throughout their career with all these aspects of the role:
Operational Readiness Architect:
Serve as the primary contact responsible for the overall application health, performance, and capacity.
Support services before they go live through activities such as system design consulting, capacity planning and launch reviews.
Partner with the development and product team of a new application to establish the right monitoring and alerting strategy and create the framework to achieve zero downtime during deployment.
Increase automation and tooling to reduce toil and manual intervention.
Tackle complex development, automation, and business process problems.
Engage in and improve the whole lifecycle of services—from inception and design, through deployment, operation, and refinement.
Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns.
Practice sustainable incident response and blameless post-mortems while taking a holistic approach to problem solving and optimizing time to recover.
Experience with monitoring tools such as Splunk and Dynatrace.
Understanding of client-server relationships, network concepts, stack trace analysis (TCP dumps, heap dumps, CPU/memory analysis, thread dumps), logging and monitoring tools, high availability, and business continuity planning.
Role Qualifications:
The ideal candidate will have experience in many of these areas:
BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience.
Coding or scripting exposure.
Appetite for change and pushing the boundaries of what can be done with automation.
Be curious about new technology, infrastructure, and practices to scale our architecture and prepare for future growth.
Basic to intermediate understanding of algorithms, data structures, scripting, pipeline management, and software design.
Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
Any Graduate