Description

What is the opportunity?

Senior SRE for our Clear Technology. You will help shape the future technology landscape by delivering key business value for a transformational project in our Banking Technology, implementing strategic components that serve all functions defined in our roadmap. This role is responsible for the development, implementation, and support of Site Reliability Engineering (SRE) solutions for applications supported by Retail Banking Payments Technology and Integrations (RBPTI).

What will you do?

• Provide hands-on SRE technical support at the squad level, including 24x7 coverage
• Drive transformation by continuously looking for ways to automate existing processes
• Track, audit, monitor, and deliver on technical work streams
• Act as portfolio SME (Subject Matter Expert) – understand and document the common components, core functionality, and infrastructure of supported applications
• Be an escalation point in the on-call rotation, and support our maintenance, scheduled work, support, and release deployment requirements
• Help with incident management and problem management for applications in scope, and own the fulfillment of root cause analysis (RCA) action items
• Focus on continuous improvement and technical standards – drive improvements in productivity, monitoring, tooling, and best practices
• Manage technology currency (server patching, certificate renewal, compliance, etc.) with a keen eye for automation opportunities
• Drive best-in-class technical solutions by closely tracking industry-leading solutions and applying them to our environment and needs
• Leverage the value in unit, department, and enterprise-wide teams to develop better solutions and achieve a cross-enterprise mindset
• Help drive the overall SRE strategy and own the roadmap build

What do you need to succeed?

Must have:

• 2-5 years of experience as an SRE
• A Bachelor's degree in Computer Science or a related technical field (for example, Mathematics, Engineering, or Physics), or equivalent practical experience
• Advanced knowledge of the following SRE practices and technologies:
◦ Python, YAML, Shell scripting
◦ Azure, Linux
◦ Dynatrace, Prometheus, PagerDuty, Moog, Splunk, Elastic, Azure Monitor
◦ Chaos Engineering
◦ MQ, Kafka
◦ Production support, including off-hours support
• Ability to influence at the Senior and/or Principal level

What will SREs do?

• Provide hands-on, 24x7 SRE support, including incident management, problem management, root cause analysis, monitoring, alerting, infrastructure maintenance, and compliance
• Track, audit, monitor, and deliver on technical work streams
• Act as portfolio SME (Subject Matter Expert) – understand and document the common components, core functionality, and infrastructure of supported applications
• Be an escalation point in the on-call rotation, and support our maintenance, scheduled work, support, and release deployment requirements
• Lead incident management and problem management for applications in scope, and own the fulfillment of RCA action items
• Focus on continuous improvement and technical standards – drive improvements in productivity, monitoring, tooling, and best practices
• Manage technology currency (server patching, certificate renewal, compliance, etc.) with a keen eye for automation opportunities
• Drive best-in-class technical solutions by closely tracking industry-leading solutions and applying them to our environment and needs
• Leverage the value in unit, department, and enterprise-wide teams to develop better solutions and achieve a cross-enterprise mindset

Engineering:

• Develop SRE solutions (monitoring and alerting, machine learning anomaly detection, self-healing, and reliability testing)
• Apply design thinking and an agile mindset in working with SREs, Scrum Masters, and Incident Leads
• Contribute to and leverage best practices in SRE
• Simplify development by building repeatable solutions to manual tasks
• Support the unit's goals to adopt automation solutions for applications in scope

Production Support:

• Perform a production support role, including off-hours support and rotational on-call support, compensated accordingly with overtime pay, lieu time, and on-call allowance
• Assist in incident management and problem management for applications in scope
• Evaluate continuously – what went well, what went wrong, and what can be done to improve and to prevent issues in future
• Maintain technology currency (perform server patching, certificate renewal, etc.) with a keen eye for automation opportunities
• Ensure availability and uptime of applications in scope, as per service level objectives
• Ensure compliance of all systems and applications in scope, including maintaining segregation of duties

Technical Consultation:

• Support initiatives outside of application or squad level scope
• Consult on product builds for other teams in RBPT and across the enterprise

Innovation and Learning:

• Stay abreast of technology change and learn constantly, through official training assignments and self-assigned learning
• Provide demos of new technology findings to the team at large

Must have:

• A Bachelor's degree in Computer Science or a related technical field (for example, Mathematics, Engineering, or Physics), or equivalent practical experience
• 4-5 years of experience in a related field
• Advanced knowledge of the following SRE practices and technologies:
◦ Python, YAML, Shell scripting
◦ Azure, Linux
◦ Dynatrace, Prometheus, PagerDuty, Moog, Splunk, Elastic, Azure Monitor
◦ Chaos Engineering
◦ MQ, Kafka
◦ Production support, including off-hours support
• In-depth hands-on experience with a variety of SRE tools (Ansible, Azure Automation, Catchpoint)

Good to have:

• Dynatrace (less than a year)
• Kafka (less than a year)
• Network programming (Perl, Python, Java, etc.) (less than a year)
• Microsoft Azure (less than a year)

Education

Any Graduate