Description

A BigPanda Monitoring Engineer manages and optimizes monitoring systems and workflows, typically integrating BigPanda’s incident management platform with other monitoring tools. Responsibilities typically include:

1. Monitoring System Integration:

• Integrate BigPanda with various monitoring, alerting, and ticketing tools such as Datadog, New Relic, Splunk, and Jira.

• Ensure seamless data flow from monitoring tools into BigPanda for centralized alerting.

2. Incident Management:

• Configure and manage BigPanda’s incident correlation and automation features to reduce alert noise and identify root causes.

• Set up and fine-tune alert correlation logic so that related alerts are grouped into a single incident.
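
As a rough illustration of the correlation work described above, the following minimal Python sketch groups alerts that share a host and arrive close together in time. It is not BigPanda's correlation engine; the Alert class, the correlate function, and the 300-second window are hypothetical names and values chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    host: str        # hypothetical correlation tag
    check: str       # name of the failing check
    timestamp: int   # seconds since epoch


def correlate(alerts, window_seconds=300):
    """Group alerts by host into incidents.

    An alert joins the most recent incident for its host if it arrives
    within window_seconds of that incident's latest alert; otherwise it
    starts a new incident.
    """
    incidents = []
    latest_by_host = {}  # host -> most recent incident (a list of alerts)
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = latest_by_host.get(alert.host)
        if current and alert.timestamp - current[-1].timestamp <= window_seconds:
            current.append(alert)
        else:
            current = [alert]
            incidents.append(current)
            latest_by_host[alert.host] = current
    return incidents
```

Tuning work in practice usually comes down to choosing which tags to group on and how wide the time window should be: too narrow and one outage becomes many incidents, too wide and unrelated alerts are merged.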

3. Alert Management:

• Define alert thresholds, suppression rules, and escalation policies to ensure relevant and actionable alerts.

• Monitor alert fatigue and reduce redundant or low-value alerts by optimizing alerting thresholds.

4. Collaboration with DevOps and SRE Teams:

• Collaborate with DevOps, Site Reliability Engineering (SRE), and IT Operations teams to ensure system reliability and uptime.

• Assist teams in resolving incidents quickly by providing insights into correlated alerts and incidents.

5. Automation and Orchestration:

• Automate repetitive monitoring tasks and incident responses through BigPanda’s automation capabilities.

• Develop runbooks and workflows for common operational incidents.

6. Reporting and Analytics:

• Generate reports and dashboards that provide visibility into the health of systems, incident trends, and mean time to recovery (MTTR).

• Analyze incident data to identify areas for improvement in system performance and alert configurations.
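
For reference, MTTR as used above is commonly computed as the average time between an incident starting and being resolved. The sketch below shows that arithmetic in Python; the function name and the (started_at, resolved_at) input shape are assumptions for illustration and do not reflect any BigPanda export format.

```python
from datetime import datetime


def mean_time_to_recovery(incidents):
    """Return the mean recovery time in minutes.

    incidents is a list of (started_at, resolved_at) datetime pairs.
    """
    if not incidents:
        raise ValueError("at least one resolved incident is required")
    total_seconds = sum(
        (resolved_at - started_at).total_seconds()
        for started_at, resolved_at in incidents
    )
    return total_seconds / len(incidents) / 60
```

For example, two incidents that took 30 and 90 minutes to resolve give an MTTR of 60 minutes.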

7. Onboarding and Training:

• Help onboard teams onto the BigPanda platform by providing training, documentation, and ongoing support.

• Ensure monitoring and alerting are configured correctly and consistently across teams.

8. Compliance and Security:

• Ensure that monitoring practices are compliant with security policies and regulatory requirements.

• Implement access controls and audit logs in BigPanda to track changes to monitoring and alerting configurations.

9. Continuous Improvement:

• Regularly review and refine incident management and monitoring practices to improve system reliability and reduce downtime.

• Stay updated with BigPanda’s latest features and implement them where applicable to enhance the monitoring system.

10. Collaboration with Vendors:

• Work closely with BigPanda and other tool vendors to resolve issues, evaluate new features, and get the most out of the platform.

Education

Any Graduate