fix(server): prevent consumer offset skip during concurrent produce+consume#2958

Merged
hubcio merged 1 commit into master from skipping-offset
Mar 23, 2026

Conversation

@hubcio
Contributor

@hubcio hubcio commented Mar 17, 2026

After server restart, MemoryMessageJournal was created via
Default with base_offset=0. The three-tier message routing
used this value as in_memory_floor, which prevented all disk
reads when the journal had data. Consumers using
PollingStrategy::Next with auto_commit permanently skipped
the last few disk messages when new messages arrived in the
journal concurrently.
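
The routing failure described above can be sketched as follows. This is a minimal illustration, not the actual iggy routing code: only `in_memory_floor` comes from the description; the function and parameter names are hypothetical.

```rust
/// Sketch of the three-tier routing floor check. Offsets at or above
/// `in_memory_floor` are served from the in-memory journal; everything
/// below it should fall through to a disk read.
fn served_from_disk(start_offset: u64, in_memory_floor: u64, journal_has_data: bool) -> bool {
    !journal_has_data || start_offset < in_memory_floor
}

fn main() {
    // Bug: after a restart the journal's floor is a stale 0, so a request
    // for disk offset 97 never reaches disk while the journal has data.
    assert!(!served_from_disk(97, 0, true));
    // With the floor initialized to the true journal base (say 100),
    // the same request correctly goes to disk.
    assert!(served_from_disk(97, 100, true));
    println!("ok");
}
```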

Root cause: journal.init() existed but was never called in
production code - only in tests. After restart with N
messages on disk, the first journal append set
current_offset = 0 + batch_count - 1 instead of
N + batch_count - 1. slice_by_offset then silently returned
messages from the wrong range (clamping to index 0 when
start_offset < first_offset).
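
The broken offset arithmetic can be reproduced with a toy journal. This is a simplified sketch, assuming a journal reduced to two counters; the field and method names are illustrative, not the real `MemoryMessageJournal` API.

```rust
/// Toy journal: `base_offset` should be the offset of the first message
/// held in memory, but `Default` leaves it at 0 after a restart because
/// `init()` is never called in production code.
#[derive(Default)]
struct Journal {
    base_offset: u64,
    messages_count: u64,
}

impl Journal {
    /// First append after a restart computes the new current offset.
    fn append(&mut self, batch_count: u64) -> u64 {
        let current_offset = self.base_offset + self.messages_count + batch_count - 1;
        self.messages_count += batch_count;
        current_offset
    }
}

fn main() {
    let n_disk = 100u64; // messages already persisted before the restart
    let mut buggy = Journal::default(); // bug: base_offset = 0, not n_disk
    assert_eq!(buggy.append(5), 4); // 0 + 5 - 1, should have been 104
    let mut fixed = Journal { base_offset: n_disk, messages_count: 0 };
    assert_eq!(fixed.append(5), 104); // n_disk + 5 - 1
    println!("ok");
}
```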

The fix has six layers:

  1. Initialize journal base_offset at both bootstrap paths
    (shard/mod.rs and the lazy init_partition_inner path in
    shard/system/partitions.rs). Fix current_offset
    computation to use max end_offset across segments with
    data, not just the active segment. Fix
    should_increment_offset to check any segment has data,
    not just current_offset > 0.

  2. Self-heal base_offset in journal.append() on first
    append when messages_count==0, with debug_assert
    validation.

  3. Change slice_by_offset to return None when
    start_offset < first_offset instead of silently
    returning data from a higher offset range.

  4. Remove Default from MemoryMessageJournal so the bug
    class is structurally impossible. Add explicit
    empty()/at_offset() constructors. Add typed query
    methods (first_offset, last_offset, first_timestamp,
    last_timestamp) on the Journal trait. Delete dead
    Clone impl.

  5. Clamp consumer offsets that are ahead of partition
    offset after crash (OOM, SIGKILL) at both bootstrap
    paths. Prevents permanent empty polls when
    auto_commit persisted an offset beyond what was
    flushed to disk.

  6. Add consumer offset barrier to time-based expiry
    (delete_expired_segments_for_partition), matching the
    existing size-based barrier. Log a warning when the
    barrier blocks segment deletion, identifying the
    blocking consumer kind, ID, and offset. Fix
    is_expired to treat end_timestamp=0 as non-expired
    (prevents instant deletion of segments with empty
    indexes during bootstrap).
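
Layers 3 and 4 can be sketched together: explicit constructors instead of `Default`, and a slice lookup that refuses out-of-range requests instead of clamping. The constructor names `empty()`/`at_offset()` and `slice_by_offset` follow the description above; the internal representation here is a deliberately simplified stand-in, not the real journal.

```rust
/// Simplified in-memory journal: messages are modeled as plain u64s,
/// stored contiguously starting at `first_offset`.
struct MemoryMessageJournal {
    first_offset: u64,
    messages: Vec<u64>,
}

impl MemoryMessageJournal {
    // Explicit constructors replace the removed Default impl.
    fn empty() -> Self {
        Self::at_offset(0)
    }
    fn at_offset(first_offset: u64) -> Self {
        Self { first_offset, messages: Vec::new() }
    }

    /// Layer 3: return None when the request is below first_offset,
    /// instead of silently clamping to index 0 and serving the wrong
    /// range. The caller must then fall back to a disk read.
    fn slice_by_offset(&self, start_offset: u64) -> Option<&[u64]> {
        if start_offset < self.first_offset {
            return None;
        }
        let idx = (start_offset - self.first_offset) as usize;
        self.messages.get(idx..)
    }
}

fn main() {
    let mut journal = MemoryMessageJournal::at_offset(100);
    journal.messages.extend([100, 101, 102]);
    assert_eq!(journal.slice_by_offset(95), None); // below floor: disk read
    assert_eq!(journal.slice_by_offset(101), Some(&[101u64, 102][..]));
    let _ = MemoryMessageJournal::empty();
    println!("ok");
}
```

Returning `None` makes the fallback explicit at the call site, which is what makes the silent-skip bug class visible instead of latent.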

Reference issues: #2715 and #2924.

@codecov

codecov bot commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 77.63578% with 70 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.07%. Comparing base (6180d88) to head (42b58b1).
⚠️ Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| core/server/src/streaming/partitions/ops.rs | 69.36% | 30 Missing and 4 partials ⚠️ |
| core/server/src/shard/system/partitions.rs | 42.10% | 20 Missing and 2 partials ⚠️ |
| core/server/src/streaming/partitions/journal.rs | 85.00% | 6 Missing ⚠️ |
| core/server/src/shard/system/segments.rs | 92.85% | 3 Missing and 1 partial ⚠️ |
| ...ore/common/src/types/message/messages_batch_set.rs | 77.77% | 1 Missing and 1 partial ⚠️ |
| core/server/src/shard/mod.rs | 93.54% | 2 Missing ⚠️ |
Additional details and impacted files
```
@@             Coverage Diff              @@
##             master    #2958      +/-   ##
============================================
+ Coverage     72.02%   72.07%   +0.04%
  Complexity      930      930
============================================
  Files          1124     1124
  Lines         93669    93832     +163
  Branches      71017    71192     +175
============================================
+ Hits          67469    67627     +158
+ Misses        23631    23612      -19
- Partials       2569     2593      +24
```
| Flag | Coverage Δ |
|---|---|
| csharp | 67.43% <ø> (-0.21%) ⬇️ |
| go | 38.68% <ø> (ø) |
| java | 62.08% <ø> (ø) |
| node | 91.37% <ø> (-0.04%) ⬇️ |
| python | 81.43% <ø> (ø) |
| rust | 72.78% <77.63%> (+0.05%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| ...ore/common/src/types/message/messages_batch_mut.rs | 50.20% <100.00%> (+2.45%) ⬆️ |
| core/common/src/types/segment.rs | 84.21% <100.00%> (+8.34%) ⬆️ |
| core/server/src/bootstrap.rs | 80.90% <100.00%> (ø) |
| core/server/src/shard/system/messages.rs | 89.18% <100.00%> (ø) |
| ...server/src/streaming/partitions/local_partition.rs | 100.00% <100.00%> (ø) |
| core/server/src/streaming/partitions/log.rs | 72.50% <100.00%> (-7.32%) ⬇️ |
| ...ore/common/src/types/message/messages_batch_set.rs | 72.11% <77.77%> (+7.89%) ⬆️ |
| core/server/src/shard/mod.rs | 83.92% <93.54%> (+1.36%) ⬆️ |
| core/server/src/shard/system/segments.rs | 89.40% <92.85%> (-0.21%) ⬇️ |
| core/server/src/streaming/partitions/journal.rs | 85.36% <85.00%> (+12.03%) ⬆️ |
... and 2 more

... and 29 files with indirect coverage changes


@hubcio
Contributor Author

hubcio commented Mar 18, 2026

This PR waits for @lukaszzborek's comment. Once he confirms that his problem is no longer visible, we can merge.

@hubcio hubcio force-pushed the skipping-offset branch 5 times, most recently from 5a0958b to 3394c61 on March 20, 2026 at 14:20
@lukaszzborek
Contributor

After those changes, I don't see the offset-skipping problem anymore, so I think it is fixed now.

…mer message skip

After server restart, MemoryMessageJournal was created via
Default with next_offset=0. The three-tier message routing
used this value as in_memory_floor, which prevented all disk
reads when the journal had data. Consumers using
PollingStrategy::Next with auto_commit permanently skipped
the last few disk messages when new messages arrived in the
journal concurrently.

Root cause: journal.init() existed but was never called in
production code - only in tests. After restart with N
messages on disk, the first journal append set
current_offset = 0 + batch_count - 1 instead of
N + batch_count - 1. slice_by_offset then silently returned
messages from the wrong range (clamping to index 0 when
start_offset < first_offset).

The fix has four layers:

1. Initialize journal next_offset at both bootstrap paths
   (shard/mod.rs and the lazy init_partition_inner path in
   shard/system/partitions.rs, which also lacked the
   current_offset, should_increment, and consumer clamping
   fixes from the previous commit).

2. Self-heal next_offset in journal.append() on first
   append when messages_count==0, with debug_assert
   validation.

3. Change slice_by_offset to return None when
   start_offset < first_offset instead of silently
   returning data from a higher offset range.

4. Remove Default from MemoryMessageJournal so the bug
   class is structurally impossible. Add explicit
   empty()/at_offset() constructors. Replace inner()
   exposure with typed query methods (first_offset,
   last_offset, first_timestamp, last_timestamp) on
   the Journal trait. Rename base_offset to next_offset.
   Delete dead Clone impl. Document single-threaded
   safety invariant on the snapshot-then-read pattern.

Fixes #2715
Fixes #2924
@hubcio hubcio merged commit 298cd32 into master Mar 23, 2026
79 checks passed
@hubcio hubcio deleted the skipping-offset branch March 23, 2026 13:09

4 participants