OSG is looking for DevOps professionals to join our Cloud Operations team. As a Cloud Engineer, you'll be part of a growing cross-functional technology team that leads the design, implementation, and operation of client workloads on the OSG Cloud platform.
Responsibilities
- Function as a key member of the team responsible for the day-to-day, 24x7x365 on-call engineering, system administration, and operation of Paybox Cloud systems.
- Provide review and input into the design of new technical features and architectural changes to the systems.
- Define and implement best practices for operational processes.
- Perform periodic software updates to systems and address security vulnerabilities.
- Scale infrastructure to meet growing capacity and launch new applications in both private and public clouds.
- Lead troubleshooting efforts to find root causes and corrective actions throughout the life of a project.
- Develop tools to automate builds and continuous integration using Octopus Deploy, Jenkins, Terraform, Docker, Kubernetes, etc.
- Develop monitoring solutions and appropriate metrics to measure performance and efficiency of applications using the ELK stack and/or Datadog.
- Own the day-to-day health, uptime, monitoring, and reliability of services.
- Participate in an on-call rotation for after-hours coverage as needed.
- Other duties as assigned.
- Position may require some travel.
Desired Background
- Certification: AWS Certified SysOps Administrator Associate.
- Bachelor's degree in Computer Science, Information Technology, Management Information Systems, or a related field, or equivalent experience.
- 3+ years’ experience in a 24×7 high-availability production environment.
- 3+ years’ experience with systems engineering in private and/or public cloud data center environments.
- 5+ years’ experience working with Windows- and Linux-based server systems such as Windows Server 2008 through 2016, CentOS, CoreOS, and Ubuntu.
- In-depth understanding of TCP/IP and LAN/WAN networking technologies, and of network troubleshooting techniques.
- Experience with infrastructure health monitoring and capacity planning using tools such as SolarWinds Orion, Prometheus, Grafana, Elasticsearch, and Datadog.
- Experience with hardware or software-based firewalls, load balancers and proxy servers.
- Experience with intrusion detection systems and network and server security hardening.
- Experience with virtualization technologies such as VMware and KVM.
- Experience with container management platforms (Kubernetes) and container runtimes (Docker).
- Good knowledge of programming and scripting languages such as Python, JavaScript, and PowerShell.
- Excellent organizational skills and strong oral and written communication skills.
- Ability to work with minimal supervision, making decisions based upon priorities, schedules and an understanding of business initiatives.
- Critical attention to detail, thoroughness and documentation.
Preferred Qualifications
- Certification: AWS Certified DevOps Engineer Professional.
- Previous DevOps experience acting as a liaison between development and quality assurance teams as part of the SDLC process.
- Experience with public cloud service providers such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).
- Experience with deploying server workloads in a multi-tier architecture.
- Experience troubleshooting distributed applications that communicate over REST and SOAP.
- Experience with cloud data protection and backup operations to support DR/BCP solutions for cloud workloads.
- Experience with relational and NoSQL database technologies.
- Extensive experience with configuration management and infrastructure as code using YAML and tools such as SaltStack, Ansible, Terraform, and CloudFormation.
- Experience with CI/CD and automated software delivery pipelines using tools such as Octopus Deploy, Jenkins and Terraform.
- Experience developing monitoring solutions and metrics to measure the performance and efficiency of applications using tools such as the ELK stack and/or Datadog.
- Experience with source control systems such as Azure DevOps (formerly VSTS) and Git.
- Experience with monitoring, metrics collection, and reporting using open source tools.
- Good knowledge of automating the management of data center environments.