Commit fc90b20

Add Grafana-to-GitHub alert bridge with Claude SRE agent
Cloudflare Worker that converts Grafana Alertmanager webhooks into categorized GitHub issues, triggering a Claude Code agent (Haiku) to diagnose and act on bridge alerts.

- Classifies 28 bridge alerts into 8 categories (finality-lag, delivery-lag, confirmation-lag, reward-lag, relay-down, version-guard, headers-mismatch, low-balance)
- Detects environment and bridge pair from alert names
- GitHub Action triggers on issue label:claude, uses Grafana API
- Located in deployments/local-scripts/grafana-github-bridge/
1 parent d638238 commit fc90b20

6 files changed: +531 -0 lines changed

.github/workflows/claude-sre.yml

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
name: Claude SRE Agent

on:
  issues:
    types: [opened, labeled]

jobs:
  claude:
    # Run when issue is opened/labeled with "claude"
    if: |
      (github.event_name == 'issues' && contains(github.event.issue.labels.*.name, 'claude'))
    runs-on: ubuntu-latest
    permissions:
      contents: read
      issues: write
    env:
      # Grafana API access for live metric queries.
      # Claude can use: curl -H "Authorization: Bearer $GRAFANA_TOKEN" "$GRAFANA_URL/api/..."
      GRAFANA_URL: ${{ secrets.GRAFANA_URL }}
      GRAFANA_TOKEN: ${{ secrets.GRAFANA_TOKEN }}
    steps:
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          model: claude-haiku-4-5-20251001
          label_trigger: "claude"
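
The `env` comment in this workflow only hints at how the agent reaches Grafana. As a rough sketch of what one such request might look like, the snippet below builds an instant PromQL query routed through Grafana's data source proxy. The helper name `buildPrometheusQuery`, the data source UID `prometheus-uid`, the host `grafana.example.com`, and the choice of the proxy endpoint are all assumptions for illustration; the commit does not say which Grafana endpoints the agent actually calls.

```javascript
// Hypothetical helper: builds (but does not send) a Grafana API request that
// proxies an instant PromQL query to a Prometheus data source.
function buildPrometheusQuery(grafanaUrl, token, datasourceUid, promql) {
  // Trim trailing slashes so the path joins cleanly.
  const base = grafanaUrl.replace(/\/+$/, "");
  const url = new URL(
    `${base}/api/datasources/proxy/uid/${encodeURIComponent(datasourceUid)}/api/v1/query`
  );
  // URLSearchParams takes care of escaping braces and quotes in the PromQL.
  url.searchParams.set("query", promql);
  return {
    url: url.toString(),
    headers: { Authorization: `Bearer ${token}` },
  };
}

// Placeholder values; in the workflow these would come from
// $GRAFANA_URL and $GRAFANA_TOKEN.
const request = buildPrometheusQuery(
  "https://grafana.example.com/",
  "example-token",
  "prometheus-uid",
  'up{container="bridges-common-relay"}'
);
console.log(request.url);
```

Passing the result to `fetch(request.url, { headers: request.headers })` would send the query; the sketch stops short of that so it runs without network access.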
Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
# grafana-github-bridge

Cloudflare Worker that converts Grafana Alertmanager webhook POSTs into GitHub issues, triggering a Claude Code agent to diagnose and act on bridge alerts.

## Flow

```mermaid
flowchart LR
    subgraph Monitoring
        P["Prometheus<br>:9615"] --> G["Grafana<br>Alert Rules"]
        G --> AM["Alertmanager"]
    end

    subgraph Routing
        AM -->|webhook| W["Cloudflare Worker<br>grafana-github-bridge"]
        AM -->|webhook| M["Matrix"]
        W -->|"classifies alert<br>(8 categories)"| I["GitHub Issue<br>labels: alert, bridge-alert, claude"]
    end

    subgraph "Claude Code (GitHub Action)"
        I -->|"triggers on<br>label: claude"| D["Diagnose<br>Grafana API + Logs"]
        D -->|known fix| A["Suggest Action<br>& Post Comment"]
        D -->|unknown| H["Page Human<br>& Summarize Findings"]
    end

    A --> V["Engineer Reviews<br>& Closes Issue"]
    H --> V
```
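
The Worker step of this flow (webhook in, labeled issue out) can be sketched roughly as follows. This is not the worker's actual code, which is not part of this excerpt: the helper names, the issue title format, and the decision to skip resolved alerts are assumptions. The payload fields used (`alerts`, `status`, `labels.alertname`, `annotations.summary`, `startsAt`) follow the usual Alertmanager-style webhook shape, and the returned object is the kind of JSON body GitHub's create-issue endpoint (`POST /repos/{owner}/{repo}/issues`) accepts.

```javascript
// Labels shown on the "GitHub Issue" node of the flowchart.
const ISSUE_LABELS = ["alert", "bridge-alert", "claude"];

// Hypothetical helper: turns one alert from the webhook payload into the
// JSON body for a GitHub "create an issue" request.
function alertToIssue(alert) {
  const name = (alert.labels && alert.labels.alertname) || "unknown alert";
  const summary = (alert.annotations && alert.annotations.summary) || "(no summary)";
  return {
    title: `[alert] ${name}`,
    body: [
      `**Status:** ${alert.status}`,
      `**Started:** ${alert.startsAt}`,
      "",
      summary,
    ].join("\n"),
    labels: ISSUE_LABELS,
  };
}

// Assumption: only firing alerts become issues; resolved ones are skipped.
function payloadToIssues(payload) {
  return (payload.alerts || [])
    .filter((alert) => alert.status === "firing")
    .map(alertToIssue);
}

// Made-up sample payload, trimmed to the fields the sketch reads.
const samplePayload = {
  status: "firing",
  alerts: [
    {
      status: "firing",
      labels: { alertname: "Example bridge headers mismatch" },
      annotations: { summary: "Example summary text" },
      startsAt: "2025-01-01T00:00:00Z",
    },
    {
      status: "resolved",
      labels: { alertname: "Example bridge alert" },
      annotations: {},
      startsAt: "2025-01-01T00:00:00Z",
    },
  ],
};

const issues = payloadToIssues(samplePayload);
console.log(JSON.stringify(issues, null, 2));
```

Attaching the `claude` label at creation time is what lets the `issues: opened` event satisfy the workflow's label condition without a separate labeling step.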

## Alert categories

| Category | Metric Pattern | Suggested Action |
|----------|---------------|------------------|
| `relay-down` | `up{container="bridges-common-relay"}` | Check relay pod status and restart |
| `version-guard` | Loki: `"Aborting"` in relay logs | Redeploy relay with new runtime |
| `headers-mismatch` | `*_is_source_and_source_at_target_using_different_forks` | Re-sync headers from canonical fork |
| `finality-lag` | `*_Sync_best_source_at_target_block_number` | Check relay logs and source chain finality |
| `delivery-lag` | `*_MessageLane_*_lane_state_nonces` (generated > received) | Check message relay process |
| `confirmation-lag` | `*_lane_state_nonces` (received vs confirmed) | Check confirmation relay |
| `reward-lag` | `*_lane_state_nonces` (confirmed src vs confirmed tgt) | Check reward mechanism |
| `low-balance` | `at_*_relay_*Messages_balance` | Top up relay account |
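
To make the classification step concrete, here is a minimal sketch of an ordered, first-match-wins classifier over alert names. The eight category names come from the table; the keyword patterns and the `unknown` fallback are hypothetical, since the worker's real rules for its 28 alerts are not shown in this excerpt.

```javascript
// Hypothetical keyword rules, checked in order; the first match wins.
// Only the category names are taken from the table above.
const CATEGORY_RULES = [
  ["relay-down", /relay.*down/i],
  ["version-guard", /version guard|abort/i],
  ["headers-mismatch", /headers mismatch|different forks/i],
  ["finality-lag", /finality/i],
  ["delivery-lag", /delivery/i],
  ["confirmation-lag", /confirmation/i],
  ["reward-lag", /reward/i],
  ["low-balance", /balance/i],
];

function classifyAlert(alertName) {
  for (const [category, pattern] of CATEGORY_RULES) {
    if (pattern.test(alertName)) return category;
  }
  // Assumed fallback for names that match no rule.
  return "unknown";
}

// Made-up alert names, for illustration only.
for (const name of [
  "Example relay is down",
  "Bridge headers mismatch",
  "Message delivery lag too high",
  "Relay balance too low",
]) {
  console.log(`${classifyAlert(name)} <- ${name}`);
}
```

Ordering the rules matters in a scheme like this: a name such as "Relay balance too low" mentions the relay but should still land in `low-balance`, so the `relay-down` pattern requires "down" as well.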

## Grafana configuration

### Contact point

```yaml
- orgId: 1
  name: GitHub parity-bridges-common
  receivers:
    - uid: github_parity_bridges_common
      type: webhook
      settings:
        url: https://grafana-github-bridge.parity-bridges.workers.dev
      disableResolveMessage: false
```

### Notification policy

Route bridge alerts to GitHub **and** continue to Matrix:

```yaml
- receiver: GitHub parity-bridges-common
  matchers:
    - alertname =~ ".*Bridge.*|.*bridge.*|.*headers mismatch"
  continue: true
```

`continue: true` tells Grafana to keep evaluating the policies that follow this one instead of stopping at the first match, so the alert is still delivered to Matrix as well as GitHub.
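
It can help to check which alert names the matcher actually selects. Alertmanager-style regex matchers are anchored, meaning the pattern has to match the whole label value. The sketch below reproduces that in JavaScript by wrapping the pattern in `^(?:...)$`. The sample alert names are invented for illustration, and JavaScript's regex engine stands in for the RE2 syntax Grafana uses, which behaves the same for a pattern this simple.

```javascript
// Matcher pattern copied from the notification policy.
const MATCHER = ".*Bridge.*|.*bridge.*|.*headers mismatch";

// Emulate anchored matching: the pattern must cover the entire alert name.
function matchesPolicy(alertname) {
  return new RegExp(`^(?:${MATCHER})$`).test(alertname);
}

// Hypothetical alert names, for illustration only.
const samples = [
  "Example Bridge finality lag",
  "example bridge low balance",
  "Example headers mismatch",
  "Example headers mismatch (critical)",
  "Example relay down",
];
for (const name of samples) {
  console.log(`${matchesPolicy(name) ? "match   " : "no match"}  ${name}`);
}
```

Two things stand out under these assumptions: the third alternative only matches names that end in "headers mismatch", and a name containing neither "Bridge" nor "bridge" (such as a relay-down alert named without that word) would not be routed to GitHub at all.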

## Deploy

```bash
cd deployments/local-scripts/grafana-github-bridge
npm install
npx wrangler secret put GITHUB_TOKEN    # PAT with issues:write scope
npx wrangler secret put WEBHOOK_SECRET  # optional, shared secret
npx wrangler deploy
```
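
How the worker uses `WEBHOOK_SECRET` is not shown in this excerpt. One plausible shape, sketched below, assumes Grafana is configured to send the secret as an `Authorization: Bearer <secret>` header and that a missing secret disables the check (matching the "optional" note). Both function names and the header convention are assumptions.

```javascript
// Compares two strings without returning early on the first differing
// character, so the check does not reveal how much of the secret matched.
function constantTimeEqual(a, b) {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Hypothetical check: `authorizationHeader` is the incoming request's
// Authorization header value (or null); `secret` is WEBHOOK_SECRET (or undefined).
function isAuthorized(authorizationHeader, secret) {
  // The secret is optional: when it is not configured, accept every request.
  if (!secret) return true;
  if (!authorizationHeader) return false;
  return constantTimeEqual(authorizationHeader, `Bearer ${secret}`);
}

console.log(isAuthorized("Bearer s3cret", "s3cret")); // true
console.log(isAuthorized("Bearer wrong!", "s3cret")); // false
console.log(isAuthorized(null, undefined)); // true (no secret configured)
```

In a Worker, a failed check would typically be answered with a 401 response before any GitHub call is made. Leaving the secret unset means anyone who knows the URL can open issues, so setting it is worth considering for a public `workers.dev` endpoint.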

Deployed at `https://grafana-github-bridge.parity-bridges.workers.dev`.

## Test

```bash
# Local (leave `wrangler dev` running and run the test from a second terminal)
npx wrangler dev
WORKER_URL=http://localhost:8787 node test.js

# Production (not a dry run: creates a real issue)
WORKER_URL=https://grafana-github-bridge.parity-bridges.workers.dev node test.js
```

## Monitor

- **Worker metrics**: Cloudflare dashboard → Workers → grafana-github-bridge
- **Logs**: `npx wrangler tail`
- **GitHub side**: search `label:alert label:claude` in the repo issues
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
{
  "name": "grafana-github-bridge",
  "private": true,
  "scripts": {
    "dev": "wrangler dev",
    "deploy": "wrangler deploy",
    "test": "node test.js"
  },
  "devDependencies": {
    "wrangler": "^3"
  }
}
