Skip to content

fix(webhook): add dlq_dropped metric + warn on DLQ ring eviction (PILOT-313)#31

Open
matthew-pilot wants to merge 1 commit into
mainfrom
openclaw/pilot-313-20260531-073000
Open

fix(webhook): add dlq_dropped metric + warn on DLQ ring eviction (PILOT-313)#31
matthew-pilot wants to merge 1 commit into
mainfrom
openclaw/pilot-313-20260531-073000

Conversation

@matthew-pilot
Copy link
Copy Markdown
Collaborator

What

The webhook dead-letter queue silently dropped the oldest entry when its 100-entry ring buffer was full, with no metric or log to detect the data loss.

Root cause

addToDLQ at webhook/webhook.go:200-201 used d.dlqItems[1:] to evict the oldest entry without incrementing any counter or emitting a log line.

Fix

  • Added dlqDropped atomic.Uint64 field to the dispatcher struct
  • Increment the counter and emit slog.Warn when an entry is evicted from the DLQ ring
  • Expose dlq_dropped in the HandleGetWebhook response alongside existing delivered/failed/dropped stats
  • Added test assertion for the counter after ring eviction (TestDispatcher_AddToDLQ_DropsOldestWhenFull)

Verification

go build ./...   ✓
go vet ./...     ✓
go test ./...    ✓ (all packages, including webhook 10s)

Files changed

2 files, +20 / -11

Closes PILOT-313

…OT-313)

The dead-letter queue silently dropped the oldest entry when the
100-entry ring was full, with no metric or log to detect the loss.
This adds:
- atomic.Uint64 dlqDropped counter on the dispatcher
- slog.Warn when an entry is evicted from the DLQ ring
- dlq_dropped field in HandleGetWebhook response for monitoring
- Test assertion verifying the counter after ring eviction

Closes PILOT-313
@codecov
Copy link
Copy Markdown

codecov Bot commented May 31, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@matthew-pilot
Copy link
Copy Markdown
Collaborator Author

📊 PR Status — #31 PILOT-313

Field Value
State OPEN
Mergeable ✅ MERGEABLE (CLEAN)
Draft No
Branch openclaw/pilot-313-20260531-073000main
Files 2 files, +20/−11 (webhook/webhook.go, webhook/zz_more_test.go)
Labels (none)

CI Checks (2/2 passing)

Check Result
test ✅ pass
codecov/patch ✅ pass

Canary

🧪 Runningrun #26706697221 (status: queued)

Jira — PILOT-313

Ticket claimed → fix implemented → PR opened. Awaiting canary results.

Last Activity

  • PR created: 2026-05-31T07:36:52Z by matthew-pilot
  • Last watcher scan: 2026-05-31T07:38:25Z

@matthew-pilot
Copy link
Copy Markdown
Collaborator Author

🔍 PR Explanation — #31 PILOT-313

What this does

Adds a dlq_dropped counter + warning log to the webhook dead-letter queue so that silent ring-buffer eviction is detectable.

The problem

addToDLQ in webhook/webhook.go uses a fixed-size 100-entry ring buffer for failed webhook events. When the buffer fills, the oldest entry is silently dropped via d.dlqItems[1:] — no metric, no log, no visibility into data loss.

The fix

1. New dlqDropped atomic counter (webhook/webhook.go)

  • Added dlqDropped atomic.Uint64 to the dispatcher struct
  • Incremented when an entry is evicted from the DLQ ring

2. Warning log on eviction

  • slog.Warn("DLQ ring buffer evicted oldest entry", ...) emitted each time an entry is dropped
  • Includes context: event ID, signature type, and IP source

3. Exposed in status endpoint

  • HandleGetWebhook response now includes dlq_dropped alongside existing delivered, failed, and dropped stats

4. Test coverage

  • TestDispatcher_AddToDLQ_DropsOldestWhenFull extended with assertion that dlqDropped count matches expected evictions (5 pushes into cap 2 = 3 drops)

Impact

Aspect Before After
DLQ eviction Silent drop Counter + Warn log
Metric visibility None dlq_dropped in webhook status
Lines changed +20 / −11 (2 files)

CI Checks (2/2 passing)

Check Result
test ✅ pass
codecov/patch ✅ success

1 similar comment
@matthew-pilot
Copy link
Copy Markdown
Collaborator Author

🔍 PR Explanation — #31 PILOT-313

What this does

Adds a dlq_dropped counter + warning log to the webhook dead-letter queue so that silent ring-buffer eviction is detectable.

The problem

addToDLQ in webhook/webhook.go uses a fixed-size 100-entry ring buffer for failed webhook events. When the buffer fills, the oldest entry is silently dropped via d.dlqItems[1:] — no metric, no log, no visibility into data loss.

The fix

1. New dlqDropped atomic counter (webhook/webhook.go)

  • Added dlqDropped atomic.Uint64 to the dispatcher struct
  • Incremented when an entry is evicted from the DLQ ring

2. Warning log on eviction

  • slog.Warn("DLQ ring buffer evicted oldest entry", ...) emitted each time an entry is dropped
  • Includes context: event ID, signature type, and IP source

3. Exposed in status endpoint

  • HandleGetWebhook response now includes dlq_dropped alongside existing delivered, failed, and dropped stats

4. Test coverage

  • TestDispatcher_AddToDLQ_DropsOldestWhenFull extended with assertion that dlqDropped count matches expected evictions (5 pushes into cap 2 = 3 drops)

Impact

Aspect Before After
DLQ eviction Silent drop Counter + Warn log
Metric visibility None dlq_dropped in webhook status
Lines changed +20 / −11 (2 files)

CI Checks (2/2 passing)

Check Result
test ✅ pass
codecov/patch ✅ success

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant