A curated list of Site Reliability and Production Engineering resources.
Updated Aug 28, 2025
A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)
🫖 Status page with uptime monitoring & API monitoring as code 🫖
Complete open-source monitoring and observability platform.
📟✨ PagerDuty on-call widget for monitoring dashboards. Compatible with Datadog and Grafana.
AI-powered SRE platform for automated incident investigation
A curated list of awesome Site Reliability and Production Engineering resources.
A collection of awesome tools, software, libraries, learning tutorials & videos, frameworks, best practices and technical resources about Incident Response & Management in Cybersecurity
Terraform provider for Rootly - manage incident management, on-call schedules, workflows, and alerts as code
💯 A curated list of Site Reliability and Production Engineering resources
opsway mono backend for API, Probes etc.
Simple custom UI for searching PagerDuty incidents
A simple notification architecture that reminds employees of their on-call shifts by email or SMS. No need to keep a document and still forget your on-call schedule! 🤓