Description

Responsibilities:
Performance Testing: Conduct load/performance testing to assess application scalability and performance under various conditions, identify bottlenecks, and optimize system resources.
Monitoring: Implement and maintain monitoring solutions leveraging the MDE toolset to track application health, performance metrics, SLAs, and system behavior in real time. Detect and resolve issues early, before they impact users, to minimize downtime and maintain high availability.
Troubleshooting: Investigate and troubleshoot incidents, outages, and performance issues, utilizing diagnostic tools and techniques to identify root causes and implement effective solutions. Restore service functionality quickly and efficiently to minimize impact on users and business operations.
Error Management: Design and implement error management strategies, including error handling, logging, and alerting mechanisms, to effectively capture and address application errors and anomalies. Improve application stability and reliability by minimizing error rates and providing timely alerts for critical issues.
Automation
Developer Support

Key Skills:
Proficiency in load and performance testing tools (e.g., JMeter, Gatling) and techniques for assessing application scalability and performance.
Experience with monitoring tools (e.g., Datadog, Prometheus, Grafana, New Relic) for real-time monitoring and alerting.
Strong troubleshooting and problem-solving skills, with the ability to diagnose and resolve complex technical issues.
Knowledge of error management strategies, including error handling, logging, and alerting techniques.
Automation skills, including programming and scripting languages (e.g., Go, Bash) and configuration management and infrastructure-as-code tools (e.g., Ansible, Terraform).
Collaboration and communication skills to work effectively within stream-aligned teams and coordinate with other stakeholders.
Understanding of Agile and DevOps principles, focusing on continuous improvement and delivery.

Technologies/Tools:
Performance testing tools (e.g., JMeter, Gatling) for assessing application scalability and performance.
Monitoring tools (e.g., Datadog, New Relic) for real-time monitoring and alerting.
Diagnostic and troubleshooting tools (e.g., Datadog) for investigating incidents and performance issues.
Automation and deployment tools (e.g., Terraform, Docker, Helm) for automating deployment and configuration tasks.
Collaboration platforms (e.g., Slack, Microsoft Teams, Jira, Confluence) for communication and coordination within stream-aligned teams.

Education

Any Graduate