fix(server): prevent consumer offset skip during concurrent produce+consume#2958
Merged
Codecov Report ❌ Patch coverage is
Additional details and impacted files:

@@            Coverage Diff             @@
##             master    #2958      +/-   ##
============================================
+ Coverage     72.02%   72.07%    +0.04%
  Complexity      930      930
============================================
  Files          1124     1124
  Lines         93669    93832      +163
  Branches      71017    71192      +175
============================================
+ Hits          67469    67627      +158
+ Misses        23631    23612       -19
- Partials       2569     2593       +24
============================================

Flags with carried forward coverage won't be shown.
12acd1e to 1d05ba5
Contributor (Author):
This PR waits for @lukaszzborek's comment. Once he says that his problem is no longer visible, we can merge.
5a0958b to 3394c61
Contributor:
After those changes, I no longer see the offset-skipping problem, so I think it is fixed now.
numinnex reviewed Mar 21, 2026
…mer message skip

After server restart, MemoryMessageJournal was created via Default with next_offset=0. The three-tier message routing used this value as in_memory_floor, which prevented all disk reads when the journal had data. Consumers using PollingStrategy::Next with auto_commit permanently skipped the last few disk messages when new messages arrived in the journal concurrently.

Root cause: journal.init() existed but was never called in production code, only in tests. After restart with N messages on disk, the first journal append set current_offset = 0 + batch_count - 1 instead of N + batch_count - 1. slice_by_offset then silently returned messages from the wrong range (clamping to index 0 when start_offset < first_offset).

The fix has four layers:

1. Initialize journal next_offset at both bootstrap paths (shard/mod.rs and the lazy init_partition_inner path in shard/system/partitions.rs, which also lacked the current_offset, should_increment, and consumer clamping fixes from the previous commit).
2. Self-heal next_offset in journal.append() on first append when messages_count == 0, with debug_assert validation.
3. Change slice_by_offset to return None when start_offset < first_offset instead of silently returning data from a higher offset range.
4. Remove Default from MemoryMessageJournal so the bug class is structurally impossible. Add explicit empty()/at_offset() constructors. Replace inner() exposure with typed query methods (first_offset, last_offset, first_timestamp, last_timestamp) on the Journal trait. Rename base_offset to next_offset. Delete dead Clone impl. Document the single-threaded safety invariant on the snapshot-then-read pattern.

Fixes #2715
Fixes #2924
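A minimal sketch of what "explicit constructors plus typed query methods instead of exposing the inner storage" could look like. All type, field, and method names here are assumptions taken from the commit message's description, not the actual iggy source:

```rust
// Hypothetical sketch; field and type names are assumptions, not real iggy code.
#[derive(Debug)]
struct Batch {
    base_offset: u64,
    last_offset: u64,
    first_timestamp: u64,
    last_timestamp: u64,
}

// Typed queries replace exposing the inner batch storage via inner().
trait Journal {
    fn first_offset(&self) -> Option<u64>;
    fn last_offset(&self) -> Option<u64>;
    fn first_timestamp(&self) -> Option<u64>;
    fn last_timestamp(&self) -> Option<u64>;
}

struct MemoryMessageJournal {
    next_offset: u64,
    batches: Vec<Batch>,
}

impl MemoryMessageJournal {
    // No Default impl: every caller must state the starting offset explicitly.
    fn empty() -> Self {
        Self::at_offset(0)
    }
    fn at_offset(next_offset: u64) -> Self {
        Self { next_offset, batches: Vec::new() }
    }
}

impl Journal for MemoryMessageJournal {
    fn first_offset(&self) -> Option<u64> {
        self.batches.first().map(|b| b.base_offset)
    }
    fn last_offset(&self) -> Option<u64> {
        self.batches.last().map(|b| b.last_offset)
    }
    fn first_timestamp(&self) -> Option<u64> {
        self.batches.first().map(|b| b.first_timestamp)
    }
    fn last_timestamp(&self) -> Option<u64> {
        self.batches.last().map(|b| b.last_timestamp)
    }
}
```

Removing Default forces the bootstrap code to decide the starting offset, so an accidental zero-initialized journal can no longer type-check.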
numinnex approved these changes Mar 23, 2026
mmodzelewski approved these changes Mar 23, 2026
After server restart, MemoryMessageJournal was created via
Default with base_offset=0. The three-tier message routing
used this value as in_memory_floor, which prevented all disk
reads when the journal had data. Consumers using
PollingStrategy::Next with auto_commit permanently skipped
the last few disk messages when new messages arrived in the
journal concurrently.
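The routing failure above can be illustrated with a hypothetical two-way tier decision (names like `route` and `in_memory_floor` are illustrative, not the actual iggy code): when the journal wrongly reports its floor as 0, every poll is served from memory and the disk tier is never consulted.

```rust
// Illustrative sketch of the in_memory_floor routing decision; names are assumptions.
#[derive(Debug, PartialEq)]
enum Tier {
    Memory,
    Disk,
}

/// `in_memory_floor` is the first offset held by the in-memory journal.
/// Polls for offsets below it must be served from disk.
fn route(start_offset: u64, in_memory_floor: u64) -> Tier {
    if start_offset >= in_memory_floor {
        Tier::Memory
    } else {
        Tier::Disk
    }
}
```

With the buggy post-restart floor of 0, a poll for offset 5 is routed to memory even if offsets 0..=9 live only on disk; with a correctly initialized floor of 10, the same poll goes to disk.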
Root cause: journal.init() existed but was never called in
production code - only in tests. After restart with N
messages on disk, the first journal append set
current_offset = 0 + batch_count - 1 instead of
N + batch_count - 1. slice_by_offset then silently returned
messages from the wrong range (clamping to index 0 when
start_offset < first_offset).
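The silent-clamping behavior and its fix can be sketched as follows (a simplified model, not the real slice_by_offset signature): requests below the journal's first offset now return None so the caller can fall back to disk, instead of receiving data from a higher offset range.

```rust
// Hypothetical simplified model of the slice_by_offset fix.
struct Journal {
    first_offset: u64,
    messages: Vec<u64>, // offsets first_offset .. first_offset + len
}

impl Journal {
    /// Fixed behavior: a start_offset below first_offset returns None
    /// instead of silently clamping to index 0 and returning messages
    /// from the wrong range.
    fn slice_by_offset(&self, start_offset: u64) -> Option<&[u64]> {
        if start_offset < self.first_offset {
            return None; // caller must read the missing range from disk
        }
        let idx = (start_offset - self.first_offset) as usize;
        self.messages.get(idx..)
    }
}
```

Returning None makes the "range not present in memory" case explicit, which is what lets the three-tier routing fall through to disk rather than skipping messages.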
The fix has six layers:

1. Initialize journal base_offset at both bootstrap paths (shard/mod.rs and the lazy init_partition_inner path in shard/system/partitions.rs). Fix current_offset computation to use the max end_offset across segments with data, not just the active segment. Fix should_increment_offset to check whether any segment has data, not just current_offset > 0.
2. Self-heal base_offset in journal.append() on first append when messages_count == 0, with debug_assert validation.
3. Change slice_by_offset to return None when start_offset < first_offset instead of silently returning data from a higher offset range.
4. Remove Default from MemoryMessageJournal so the bug class is structurally impossible. Add explicit empty()/at_offset() constructors. Add typed query methods (first_offset, last_offset, first_timestamp, last_timestamp) on the Journal trait. Delete dead Clone impl.
5. Clamp consumer offsets that are ahead of the partition offset after a crash (OOM, SIGKILL) at both bootstrap paths. This prevents permanent empty polls when auto_commit persisted an offset beyond what was flushed to disk.
6. Add a consumer offset barrier to time-based expiry (delete_expired_segments_for_partition), matching the existing size-based barrier. Log a warning when the barrier blocks segment deletion, identifying the blocking consumer kind, ID, and offset. Fix is_expired to treat end_timestamp=0 as non-expired (prevents instant deletion of segments with empty indexes during bootstrap).
Reference issues: #2715 and #2924.
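A condensed sketch of how layers 1, 2, and 5 above might fit together. Every name here (Segment, bootstrap_base_offset, clamp_consumer_offset, the field layout) is an assumption for illustration and is not verified against the iggy source:

```rust
// Hypothetical sketch; segment and journal types are simplified assumptions.
struct Segment {
    end_offset: u64,
    has_data: bool,
}

struct MemoryMessageJournal {
    base_offset: u64,
    messages_count: u64,
}

/// Layer 1: derive the journal's starting offset at bootstrap from the
/// max end_offset across *all* segments that hold data, not just the
/// active segment.
fn bootstrap_base_offset(segments: &[Segment]) -> u64 {
    segments
        .iter()
        .filter(|s| s.has_data)
        .map(|s| s.end_offset + 1)
        .max()
        .unwrap_or(0)
}

impl MemoryMessageJournal {
    /// Layer 2: self-heal base_offset on the first append if the journal
    /// is still empty, so a missed init cannot corrupt offset arithmetic.
    /// Returns the resulting current_offset.
    fn append(&mut self, partition_next_offset: u64, batch_count: u64) -> u64 {
        if self.messages_count == 0 {
            debug_assert!(self.base_offset <= partition_next_offset);
            self.base_offset = partition_next_offset;
        }
        self.messages_count += batch_count;
        self.base_offset + self.messages_count - 1
    }
}

/// Layer 5: clamp a persisted consumer offset that ran ahead of the
/// partition offset (e.g. auto_commit persisted before the data was
/// flushed, then the server was killed), so the consumer does not
/// poll an empty range forever.
fn clamp_consumer_offset(consumer_offset: u64, partition_offset: u64) -> u64 {
    consumer_offset.min(partition_offset)
}
```

In this model, a restart with offsets 0..=9 on disk yields a bootstrap base offset of 10, so the first append of 3 messages lands at current_offset 12 (N + batch_count - 1) rather than 2.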