
[retention] Fix hybrid table retention: respect configured retention strategy and guard invalid endTimeMs#18186

Open
deeppatel710 wants to merge 1 commit into apache:master from
deeppatel710:fix/hybrid-table-retention-invalid-endtime-and-retention-strategy

Conversation

@deeppatel710
Contributor

Summary

manageRetentionForHybridTable in RetentionManager has two bugs that cause the retention policy to silently misbehave for hybrid tables:

Bug 1 — Segments with invalid endTimeMs are incorrectly deleted

SegmentZKMetadata.getEndTimeMs() returns -1 by default when the END_TIME field was never written to ZK (e.g. older segments, or segments whose metadata was not fully populated). The existing check:

if (segmentZKMetadata.getEndTimeMs() < timeBoundaryMs) {
    segmentsToDelete.add(...);
}

evaluates to true for -1 since any valid timeBoundaryMs is positive, causing those segments to be incorrectly marked for deletion.

TimeRetentionStrategy.isPurgeable() already guards this case with TimeUtils.timeValueInValidRange() for non-hybrid tables. This fix brings the hybrid path to parity.
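For illustration, the guarded check can be sketched in isolation. This is a standalone sketch, not the actual Pinot `TimeUtils` implementation; the range bounds below are assumptions chosen for the example:

```java
// Standalone sketch of the Bug 1 guard. The bounds are illustrative
// assumptions, not Pinot's actual TimeUtils valid-range constants.
public final class EndTimeGuardSketch {
  static final long MIN_VALID_TIME_MS = 1L;                 // assumption
  static final long MAX_VALID_TIME_MS = 4_102_444_800_000L; // ~year 2100, assumption

  static boolean timeValueInValidRange(long timeMs) {
    return timeMs >= MIN_VALID_TIME_MS && timeMs <= MAX_VALID_TIME_MS;
  }

  static boolean shouldDeleteByTimeBoundary(long endTimeMs, long timeBoundaryMs) {
    // endTimeMs = -1 (END_TIME never written to ZK) fails the range check,
    // so the segment is skipped instead of being marked for deletion.
    return timeValueInValidRange(endTimeMs) && endTimeMs < timeBoundaryMs;
  }
}
```

With this guard in place, `-1 < timeBoundaryMs` can no longer delete a segment on its own: the invalid end time short-circuits the comparison first.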

Bug 2 — Configured retentionTimeUnit / retentionTimeValue is completely ignored

In manageRetentionForTable, a RetentionStrategy is constructed from the table's configured retention, then:

  • passed to manageRetentionForRealtimeTable
  • never passed to manageRetentionForHybridTable

The hybrid path deletes segments based solely on timeBoundaryMs. This causes two failure modes:

Scenario → Effect
  • Offline table not updated (stale time boundary) → Realtime segments accumulate past their configured retention indefinitely
  • Time boundary advances into recent data → Segments within the retention window are prematurely deleted

Fix

  1. manageRetentionForHybridTable now accepts RetentionStrategy retentionStrategy
  2. Adds TimeUtils.timeValueInValidRange(endTimeMs) guard before the time boundary comparison
  3. Adds retentionStrategy.isPurgeable(...) as an additional required condition — a segment is only deleted if it is both covered by offline data AND beyond the configured retention period
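The combined decision can be sketched as follows. Names are hypothetical stand-ins (the real method operates on `SegmentZKMetadata` and Pinot's `RetentionStrategy` interface; the predicate below models `isPurgeable`):

```java
import java.util.function.LongPredicate;

// Hedged sketch of the fixed hybrid-path decision: a segment is deleted only
// if its end time is valid, it is covered by offline data (below the time
// boundary), AND the configured retention strategy says it is purgeable.
public final class HybridRetentionSketch {
  // Simplified stand-in for TimeUtils.timeValueInValidRange (assumption).
  static boolean timeValueInValidRange(long timeMs) {
    return timeMs > 0;
  }

  static boolean shouldDelete(long endTimeMs, long timeBoundaryMs,
                              LongPredicate isPurgeableByEndTime) {
    if (!timeValueInValidRange(endTimeMs)) {
      return false; // Bug 1 guard: skip segments with a missing end time
    }
    // Bug 2 fix: both conditions are required, not just the time boundary.
    return endTimeMs < timeBoundaryMs && isPurgeableByEndTime.test(endTimeMs);
  }
}
```

Requiring both conditions makes the two failure modes impossible by construction: a stale boundary alone cannot retain expired segments forever, and an advancing boundary alone cannot delete segments still inside the retention window.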

Tests

Two new regression tests added to RetentionManagerTest:

  • testManageRetentionForHybridTableSkipsSegmentWithInvalidEndTime — ensures segments with endTimeMs = -1 are not deleted
  • testManageRetentionForHybridTableDeletesSegmentBeyondRetentionWhenTimeBoundaryIsStale — ensures only the segment beyond retention (30d old) is deleted, not the one within the retention window (3d old), even when both fall below the time boundary
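The second scenario can be walked through with plain arithmetic. The numbers here are assumptions for illustration (7-day retention, boundary at now − 1 day), not the values the actual test uses:

```java
import java.util.ArrayList;
import java.util.List;

// Walkthrough of the stale-time-boundary scenario: both segments fall below
// the time boundary, but only the one beyond retention should be deleted.
public final class StaleBoundaryScenario {
  static final long DAY_MS = 86_400_000L;

  // segments: each entry is {segmentId, endTimeMs}
  static List<String> segmentsToDelete(long nowMs, long retentionMs,
                                       long timeBoundaryMs, List<long[]> segments) {
    List<String> toDelete = new ArrayList<>();
    for (long[] seg : segments) {
      long endTimeMs = seg[1];
      boolean validEndTime = endTimeMs > 0;                   // simplified guard
      boolean coveredByOffline = endTimeMs < timeBoundaryMs;  // below time boundary
      boolean beyondRetention = endTimeMs < nowMs - retentionMs;
      if (validEndTime && coveredByOffline && beyondRetention) {
        toDelete.add("segment_" + seg[0]);
      }
    }
    return toDelete;
  }
}
```

With a 30-day-old segment, a 3-day-old segment, and a segment with `endTimeMs = -1`, only the 30-day-old one satisfies all three conditions.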

…strategy and deleting segments with invalid endTimeMs

Two bugs in manageRetentionForHybridTable:

1. Segments with missing end time metadata (endTimeMs = -1, the default
   when END_TIME is never written to ZK) satisfy the condition
   `endTimeMs < timeBoundaryMs` unconditionally, causing premature deletion.
   TimeRetentionStrategy already guards this with TimeUtils.timeValueInValidRange()
   for non-hybrid tables; this fix brings parity to the hybrid path.

2. The retentionStrategy built from retentionTimeUnit/retentionTimeValue was
   passed to manageRetentionForRealtimeTable but never to
   manageRetentionForHybridTable. The hybrid path deleted segments solely
   based on the time boundary, ignoring the configured retention window.
   This causes two failure modes:
   - If the time boundary stalls (offline table not updated), realtime
     segments accumulate indefinitely past their configured retention.
   - If the time boundary is recent, segments within the retention window
     can be prematurely deleted.

Fix: add TimeUtils.timeValueInValidRange() guard and require
retentionStrategy.isPurgeable() to return true before deleting a segment
in the hybrid path. Update method signature and add two regression tests.
@codecov-commenter

codecov-commenter commented Apr 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 55.26%. Comparing base (31eac83) to head (6afca15).

❗ There is a different number of reports uploaded between BASE (31eac83) and HEAD (6afca15). Click for more details.

HEAD has 16 fewer uploads than BASE

Flag                  BASE (31eac83)   HEAD (6afca15)
java-21               5                2
unittests             4                2
temurin               10               6
java-11               5                4
unittests2            2                0
integration           6                4
integration1          2                1
custom-integration1   2                1
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18186      +/-   ##
============================================
- Coverage     63.29%   55.26%   -8.04%     
+ Complexity     1627      844     -783     
============================================
  Files          3226     2534     -692     
  Lines        196636   145611   -51025     
  Branches      30401    23407    -6994     
============================================
- Hits         124466    80472   -43994     
+ Misses        62192    58211    -3981     
+ Partials       9978     6928    -3050     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 55.23% <ø> (-8.04%) ⬇️
java-21 55.20% <ø> (-8.06%) ⬇️
temurin 55.26% <ø> (-8.04%) ⬇️
unittests 55.26% <ø> (-8.04%) ⬇️
unittests1 55.26% <ø> (+<0.01%) ⬆️
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.
