
v0.3.0 — production embedded release #5

Merged: ferax564 merged 13 commits into main from feat/v0.3.0-production-embedded on May 22, 2026

@ferax564 (Owner)

Summary

Makes RustQueue's embedded mode deliver the durability guarantees it already documents (backoff retries, crash recovery, schedules) and adds an ergonomic worker entry point. Built spec-first (docs/superpowers/specs/2026-05-22-production-embedded-design.md), reviewed by Codex against the actual source, and implemented test-first (TDD).

The bug this fixes

RustQueue::redb(...).build() never started the background scheduler, so in embedded/library mode these documented features silently didn't work:

  • backoff retries: a failed job routed to Delayed was never promoted back for another attempt
  • crash recovery: a job whose worker was killed with kill -9 stayed Active forever
  • schedules: never executed (and weren't even exposed through the embedded API)
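A minimal, std-only model of the work one housekeeping tick is responsible for makes the failure mode concrete: if nothing ever calls it, none of the three transitions above happens. `JobState` (including the `Waiting` state name), `Job`, `Schedule`, and `tick` are hypothetical stand-ins, not RustQueue's storage layer or API.

```rust
use std::time::{Duration, Instant};

/// Hypothetical job states; `Waiting` is an assumed name for "runnable".
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum JobState {
    Waiting,
    Active { last_heartbeat: Instant },
    Delayed { run_at: Instant },
}

#[derive(Debug)]
pub struct Job {
    pub id: u64,
    pub state: JobState,
}

/// Hypothetical recurring schedule: enqueue a job every `every`.
pub struct Schedule {
    pub every: Duration,
    pub next_run: Instant,
}

/// One housekeeping pass; returns how many jobs schedules enqueued.
/// If nothing ever calls this (the pre-0.3.0 embedded bug), Delayed jobs
/// are never promoted, stalled Active jobs are never reclaimed, and
/// schedules never fire.
pub fn tick(
    jobs: &mut Vec<Job>,
    schedules: &mut [Schedule],
    now: Instant,
    stall_timeout: Duration,
    next_id: &mut u64,
) -> usize {
    for job in jobs.iter_mut() {
        let runnable_again = match job.state {
            // Backoff retry: the delay has elapsed.
            JobState::Delayed { run_at } => run_at <= now,
            // Crash recovery: no heartbeat within stall_timeout, so the
            // worker is presumed dead and the job is reclaimed.
            JobState::Active { last_heartbeat } => {
                now.duration_since(last_heartbeat) > stall_timeout
            }
            JobState::Waiting => false,
        };
        if runnable_again {
            job.state = JobState::Waiting;
        }
    }
    // Schedules: enqueue one job per due schedule and advance its next run.
    let mut fired = 0;
    for s in schedules.iter_mut() {
        if s.next_run <= now {
            jobs.push(Job { id: *next_id, state: JobState::Waiting });
            *next_id += 1;
            s.next_run = now + s.every;
            fired += 1;
        }
    }
    fired
}
```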

What's new

  • run_worker / run_worker_with_shutdown — managed worker: pull → handler → ack-on-Ok / fail-on-Err, auto-heartbeats the in-flight job, auto-starts housekeeping, drains on shutdown.
  • start_housekeeping() — idempotent, runtime-checked, lifetime-bound (aborts when the last RustQueue clone drops).
  • Builder knobs .stall_timeout() / .tick_interval(); RustQueue now implements Clone; the embedded API gains schedule CRUD, heartbeat, and get_dlq_jobs.
  • examples/crash_recovery.rs (demonstrates surviving kill -9) and a production-shaped examples/axum_background_jobs.rs.
  • docs/production.md operational guide + website Production page.
  • Version bumped to 0.3.0 (all 5 locations) + CHANGELOG.
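The managed loop from the first bullet can be sketched synchronously with the standard library. The `Queue` trait, this `run_worker` signature, and the `MAX_ERROR_LEN` cap are hypothetical; RustQueue's real `run_worker` is async, also heartbeats the in-flight job, and starts housekeeping, all of which this sketch omits.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::Duration;

/// Hypothetical minimal queue interface (the real methods are async).
pub trait Queue {
    fn pull(&mut self) -> Option<(u64, String)>;
    fn ack(&mut self, id: u64);
    fn fail(&mut self, id: u64, error: String);
}

/// Hypothetical cap on stored error strings.
const MAX_ERROR_LEN: usize = 1024;

/// Sequential managed loop: pull -> handler -> ack on Ok / fail on Err.
/// Shutdown is checked only between jobs, so an in-flight handler always
/// finishes before the loop returns. Returns the number of jobs handled.
pub fn run_worker<Q, F>(
    queue: &mut Q,
    shutdown: &AtomicBool,
    idle_sleep: Duration,
    mut handler: F,
) -> usize
where
    Q: Queue,
    F: FnMut(&str) -> Result<(), String>,
{
    let mut processed = 0;
    while !shutdown.load(Ordering::SeqCst) {
        let Some((id, payload)) = queue.pull() else {
            // Idle: back off briefly, then re-check shutdown.
            std::thread::sleep(idle_sleep);
            continue;
        };
        match handler(&payload) {
            Ok(()) => queue.ack(id),
            Err(mut e) => {
                // Truncate on a char boundary so huge errors don't bloat storage.
                if e.len() > MAX_ERROR_LEN {
                    let mut cut = MAX_ERROR_LEN;
                    while !e.is_char_boundary(cut) {
                        cut -= 1;
                    }
                    e.truncate(cut);
                }
                queue.fail(id, e)
            }
        }
        processed += 1;
    }
    processed
}
```

Checking the flag only between jobs is what turns shutdown into a drain: a handler that has already started always runs to completion and gets its ack or fail.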

Correctness note

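A code review caught that JoinHandle::drop only detaches the task rather than aborting it, so a panicking handler would leak its heartbeat task; that task kept refreshing the job's heartbeat, so stall detection could never reclaim the job and it stayed Active forever. Fixed with an AbortOnDrop RAII guard (same pattern as HousekeepingState); a regression test covers the panic case.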
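With tokio, such a guard is typically a newtype around `tokio::task::JoinHandle<()>` whose `Drop` calls `JoinHandle::abort()`. Tokio tasks are out of reach of a std-only example, so the sketch below is an analogue built on `std::thread`: a thread cannot be aborted, so `Drop` sets a stop flag and joins instead. `StopOnDrop` and `spawn_heartbeat` are hypothetical names, not the PR's code.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// std-thread analogue of an AbortOnDrop guard (hypothetical).
pub struct StopOnDrop {
    stop: Arc<AtomicBool>,
    handle: Option<JoinHandle<()>>,
}

impl StopOnDrop {
    /// Spawn a stand-in heartbeat loop that bumps `beats` every `interval`
    /// until the guard is dropped.
    pub fn spawn_heartbeat(beats: Arc<AtomicUsize>, interval: Duration) -> Self {
        let stop = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&stop);
        let handle = thread::spawn(move || {
            while !flag.load(Ordering::SeqCst) {
                beats.fetch_add(1, Ordering::SeqCst); // stands in for a heartbeat write
                thread::sleep(interval);
            }
        });
        StopOnDrop { stop, handle: Some(handle) }
    }
}

impl Drop for StopOnDrop {
    // Runs on normal return, early `?` return, and panic unwinding alike.
    fn drop(&mut self) {
        self.stop.store(true, Ordering::SeqCst);
        if let Some(h) = self.handle.take() {
            let _ = h.join(); // ignore a panic inside the heartbeat thread
        }
    }
}
```

Create the guard before calling the handler: `Drop` runs during panic unwinding as well, whereas an explicit `hb.abort()` placed after the handler call is skipped when the handler panics.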

Verification (local)

  • cargo test --features sqlite → 354 pass, 0 fail
  • cargo clippy --all-targets --features sqlite,postgres,otel -- -D warnings → clean
  • cargo fmt --check → clean
  • cargo audit --ignore RUSTSEC-2023-0071 → clean

Not a breaking change

No public API removed; the housekeeping change is framed as a bug fix.

🤖 Generated with Claude Code

ferax564 and others added 13 commits May 22, 2026 09:49
Covers the embedded-housekeeping correctness fix (retries/schedules/
crash recovery silently broken in library mode), the run_worker
helper, enhanced Axum example, operational docs, crash-survival demo,
website, and v0.3.0 release.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Housekeeping: drop the Drop-guard handle (footgun); tie loop lifetime
  to RustQueue via ref-counted Arc, abort on last drop. start_housekeeping
  returns Result, uses Handle::try_current, sets flag only after spawn.
- run_worker: fix move-after-use (capture job_id), observe shutdown only
  while idle (never around handler), truncate fail error strings, log+
  continue on ack/fail errors.
- Pull minimal auto-heartbeat into v0.3.0 (was deferred) — without it
  run_worker corrupts any job exceeding stall_timeout; also fixes the demo.
- Expose embedded schedule CRUD + heartbeat + get_dlq_jobs; derive Clone
  for RustQueue; reject zero tick_interval.
- Add tests for the new race-prone areas. Website kept in v0.3.0 per owner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
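The housekeeping lifetime rules in this commit (idempotent start, a `Result` return, the "started" flag set only after a successful spawn, the loop stopped when the last clone drops) can be modelled with an `Arc`-shared inner struct whose `Drop` stops the loop. This is a std-thread sketch with hypothetical names (`QueueHandle`, `Inner`, `ticks`, `housekeeping_running`); the PR itself uses tokio and `Handle::try_current`.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
use std::time::Duration;

struct Housekeeping {
    stop: Arc<AtomicBool>,
    handle: JoinHandle<()>,
}

/// State shared by every clone of the queue handle.
struct Inner {
    housekeeping: Mutex<Option<Housekeeping>>,
    ticks: Arc<AtomicUsize>, // counts loop iterations, standing in for real work
}

impl Drop for Inner {
    // Runs once, when the last QueueHandle clone (last strong Arc) goes away.
    fn drop(&mut self) {
        let slot = self.housekeeping.get_mut().unwrap_or_else(|e| e.into_inner());
        if let Some(hk) = slot.take() {
            hk.stop.store(true, Ordering::SeqCst);
            let _ = hk.handle.join();
        }
    }
}

#[derive(Clone)]
pub struct QueueHandle {
    inner: Arc<Inner>,
}

impl QueueHandle {
    pub fn new() -> Self {
        QueueHandle {
            inner: Arc::new(Inner {
                housekeeping: Mutex::new(None),
                ticks: Arc::new(AtomicUsize::new(0)),
            }),
        }
    }

    /// Ok(true) if this call started the loop, Ok(false) if any clone already had.
    pub fn start_housekeeping(&self, tick: Duration) -> Result<bool, String> {
        if tick.is_zero() {
            return Err("tick_interval must be non-zero".to_string());
        }
        let mut slot = self.inner.housekeeping.lock().map_err(|_| "lock poisoned".to_string())?;
        if slot.is_some() {
            return Ok(false);
        }
        let stop = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&stop);
        let ticks = Arc::clone(&self.inner.ticks); // deliberately not Arc<Inner>
        let handle = thread::Builder::new()
            .name("housekeeping".to_string())
            .spawn(move || {
                while !flag.load(Ordering::SeqCst) {
                    ticks.fetch_add(1, Ordering::SeqCst); // promote/reclaim/schedule here
                    thread::sleep(tick);
                }
            })
            .map_err(|e| e.to_string())?;
        // Record "started" only after the spawn succeeded.
        *slot = Some(Housekeeping { stop, handle });
        Ok(true)
    }

    pub fn housekeeping_running(&self) -> bool {
        self.inner.housekeeping.lock().map(|s| s.is_some()).unwrap_or(false)
    }

    pub fn ticks(&self) -> Arc<AtomicUsize> {
        Arc::clone(&self.inner.ticks)
    }
}
```

The loop thread holds only its own counter and stop flag, never an `Arc<Inner>`; if it held the queue itself, the strong count could never reach zero and `Inner::drop` would never run.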
15 tasks: Arc/Clone refactor, HousekeepingState + builder knobs,
start_housekeeping, embedded schedule/heartbeat/dlq API, run_worker +
auto-heartbeat, crash-recovery demo, Axum example, ops docs, website,
version bump, release. TDD with full code for the engine tasks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds RustQueue::run_worker and run_worker_with_shutdown methods in
src/worker.rs. Sequential pull/ack/fail managed loop with graceful
shutdown support. Makes stall_timeout pub(crate) so worker.rs can
access it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Code review found JoinHandle::drop only detaches (not aborts), so a panic
in the user handler skipped hb.abort() and left a ghost heartbeat task that
kept the job's heartbeat fresh forever — stall detection could never reclaim
it. Wrap both the per-job heartbeat and the shutdown-watcher in an AbortOnDrop
RAII guard (same pattern as HousekeepingState). Also abort the watcher on
early `?` return, and widen the heartbeat margin to stall_timeout/2 clamped
to [200ms, 30s]. Adds a panic regression test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…hutdown

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add /production guide page covering crash-only design, durability modes,
run_worker ergonomics, retries/DLQ, housekeeping, graceful shutdown, and
crash-recovery walkthrough. Wires /production route in dashboard/mod.rs.
Adds Production nav link to all 8 pages (docs/ + dashboard/static/).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps all five version locations (Cargo.toml, node + python SDKs,
openapi.rs) and adds CHANGELOG with the v0.3.0 entry: embedded
housekeeping fix + run_worker/start_housekeeping + embedded schedule API.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e38c1e6c4a


Comment thread on src/worker.rs:

```rust
// Heartbeat well inside the stall window: half the timeout, clamped so we
// neither hammer storage (floor) nor drift on long timeouts (ceiling).
let hb_interval =
    (self.stall_timeout / 2).clamp(Duration::from_millis(200), Duration::from_secs(30));
```

P2: Keep heartbeat interval below configured stall timeout

This fixed floor can make heartbeats slower than stall detection: if a caller sets stall_timeout below 200ms, the worker still heartbeats every 200ms, so detect_stalls may fail and requeue a job that is still running before the first heartbeat is sent. That can trigger duplicate processing/side effects for long-running handlers under low-latency stall settings. Either validate a minimum stall_timeout in the builder or derive hb_interval so it is always strictly less than the configured stall timeout.
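Worked through with numbers: with `stall_timeout = 100ms`, `(100ms / 2).clamp(200ms, 30s)` is 200ms, so heartbeats come twice as far apart as the stall window. Below is a sketch of the reviewer's first option: reject any `stall_timeout` shorter than twice the floor, so the unchanged formula always yields at most half the timeout. The constant and function names are hypothetical, and this thread does not show how (or whether) the comment was addressed before merge.

```rust
use std::time::Duration;

// Hypothetical names for the floor and ceiling used in the reviewed snippet.
const HB_FLOOR: Duration = Duration::from_millis(200);
const HB_CEILING: Duration = Duration::from_secs(30);

/// Builder-side check: any stall_timeout below 2 * HB_FLOOR would let the
/// clamp push the heartbeat interval past half the timeout.
pub fn validate_stall_timeout(stall_timeout: Duration) -> Result<Duration, String> {
    if stall_timeout < HB_FLOOR * 2 {
        return Err(format!(
            "stall_timeout must be at least {:?}, got {:?}",
            HB_FLOOR * 2,
            stall_timeout
        ));
    }
    Ok(stall_timeout)
}

/// Same formula as the reviewed code. For a validated stall_timeout the
/// result is at most stall_timeout / 2, hence strictly below it.
pub fn heartbeat_interval(stall_timeout: Duration) -> Duration {
    (stall_timeout / 2).clamp(HB_FLOOR, HB_CEILING)
}
```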


ferax564 merged commit 669f9a7 into main on May 22, 2026
6 checks passed
ferax564 deleted the feat/v0.3.0-production-embedded branch on May 22, 2026 09:20