[retention] Fix hybrid table retention: respect configured retention strategy and guard invalid endTimeMs #18186
Open
deeppatel710 wants to merge 1 commit into apache:master from
…strategy and deleting segments with invalid endTimeMs
Two bugs in manageRetentionForHybridTable:
1. Segments with missing end time metadata (endTimeMs = -1, the default
when END_TIME is never written to ZK) satisfy the condition
`endTimeMs < timeBoundaryMs` unconditionally, causing premature deletion.
TimeRetentionStrategy already guards this with TimeUtils.timeValueInValidRange()
for non-hybrid tables; this fix brings parity to the hybrid path.
2. The retentionStrategy built from retentionTimeUnit/retentionTimeValue was
passed to manageRetentionForRealtimeTable but never to
manageRetentionForHybridTable. The hybrid path deleted segments solely
based on the time boundary, ignoring the configured retention window.
This causes two failure modes:
- If the time boundary stalls (offline table not updated), realtime
segments accumulate indefinitely past their configured retention.
- If the time boundary is recent, segments within the retention window
can be prematurely deleted.
Fix: add TimeUtils.timeValueInValidRange() guard and require
retentionStrategy.isPurgeable() to return true before deleting a segment
in the hybrid path. Update method signature and add two regression tests.
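The difference between the old and new deletion conditions can be sketched in isolation. This is a minimal, hypothetical stand-in and not the code in RetentionManager: the class and method names are invented for illustration, the 1971 to 2071 bounds are an assumption about what TimeUtils.timeValueInValidRange() accepts, and the `nowMs - endTimeMs > retentionMs` test is an assumed approximation of retentionStrategy.isPurgeable().

```java
import java.time.Instant;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the hybrid-table deletion decision; not Pinot's actual code.
class HybridRetentionSketch {
  // Assumed validity bounds, standing in for TimeUtils.timeValueInValidRange().
  static final long MIN_VALID_MS = Instant.parse("1971-01-01T00:00:00Z").toEpochMilli();
  static final long MAX_VALID_MS = Instant.parse("2071-01-01T00:00:00Z").toEpochMilli();

  static boolean inValidRange(long timeMs) {
    return timeMs >= MIN_VALID_MS && timeMs <= MAX_VALID_MS;
  }

  // Old behavior: only the time boundary is consulted, so endTimeMs = -1 always matches.
  static boolean shouldDeleteOld(long endTimeMs, long timeBoundaryMs) {
    return endTimeMs < timeBoundaryMs;
  }

  // New behavior: valid end time AND covered by offline data AND beyond retention.
  static boolean shouldDeleteNew(long endTimeMs, long timeBoundaryMs, long retentionMs, long nowMs) {
    if (!inValidRange(endTimeMs)) {
      return false; // missing or garbage end time: never delete on that basis
    }
    boolean coveredByOffline = endTimeMs < timeBoundaryMs;
    boolean beyondRetention = nowMs - endTimeMs > retentionMs; // assumed isPurgeable() stand-in
    return coveredByOffline && beyondRetention;
  }

  public static void main(String[] args) {
    long nowMs = System.currentTimeMillis();
    long timeBoundaryMs = nowMs - TimeUnit.DAYS.toMillis(1);
    long retentionMs = TimeUnit.DAYS.toMillis(7);

    System.out.println(shouldDeleteOld(-1L, timeBoundaryMs));                     // true
    System.out.println(shouldDeleteNew(-1L, timeBoundaryMs, retentionMs, nowMs)); // false
  }
}
```

Taking `nowMs` as a parameter rather than reading the clock inside the predicate keeps the sketch deterministic under test; whether the real strategy does the same is not stated in this PR.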
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## master #18186 +/- ##
============================================
- Coverage 63.29% 55.26% -8.04%
+ Complexity 1627 844 -783
============================================
Files 3226 2534 -692
Lines 196636 145611 -51025
Branches 30401 23407 -6994
============================================
- Hits 124466 80472 -43994
+ Misses 62192 58211 -3981
+ Partials 9978 6928 -3050
Summary
`manageRetentionForHybridTable` in `RetentionManager` has two bugs that cause the retention policy to silently misbehave for hybrid tables.

Bug 1: Segments with invalid `endTimeMs` are incorrectly deleted
`SegmentZKMetadata.getEndTimeMs()` returns `-1` by default when the `END_TIME` field was never written to ZK (e.g. older segments, or segments whose metadata was not fully populated). The existing check, `endTimeMs < timeBoundaryMs`, evaluates to `true` for `-1` since any valid `timeBoundaryMs` is positive, so those segments are incorrectly marked for deletion. `TimeRetentionStrategy.isPurgeable()` already guards this case with `TimeUtils.timeValueInValidRange()` for non-hybrid tables. This fix brings the hybrid path to parity.

Bug 2: Configured `retentionTimeUnit`/`retentionTimeValue` is completely ignored
In `manageRetentionForTable`, a `RetentionStrategy` is constructed from the table's configured retention, then:
- passed to `manageRetentionForRealtimeTable` ✅
- never passed to `manageRetentionForHybridTable` ❌

The hybrid path deletes segments based solely on `timeBoundaryMs`. This causes two failure modes:
- If the time boundary stalls (offline table not updated), realtime segments accumulate indefinitely past their configured retention.
- If the time boundary is recent, segments within the retention window can be prematurely deleted.

Fix
- `manageRetentionForHybridTable` now accepts `RetentionStrategy retentionStrategy`
- `TimeUtils.timeValueInValidRange(endTimeMs)` guard before the time boundary comparison
- `retentionStrategy.isPurgeable(...)` as an additional required condition: a segment is only deleted if it is both covered by offline data AND beyond the configured retention period

Tests
Two new regression tests added to `RetentionManagerTest`:
- `testManageRetentionForHybridTableSkipsSegmentWithInvalidEndTime`: ensures segments with `endTimeMs = -1` are not deleted
- `testManageRetentionForHybridTableDeletesSegmentBeyondRetentionWhenTimeBoundaryIsStale`: ensures only the segment beyond retention (30d old) is deleted, not the one within the retention window (3d old), even when both fall below the time boundary
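
The scenario these tests exercise can be approximated without any Pinot classes. The sketch below is hypothetical: the class name, segment names, the 7-day retention, and the 1-day-old time boundary are all assumptions chosen for illustration, and `endTimeMs > 0` is a simplified stand-in for the valid-range guard. It is not the code in RetentionManagerTest.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical walk-through of the regression scenario; not the actual test code.
class HybridRetentionScenario {
  // Returns the names of segments that satisfy all three deletion conditions.
  static List<String> segmentsToDelete(Map<String, Long> endTimesMs, long timeBoundaryMs,
      long retentionMs, long nowMs) {
    List<String> toDelete = new ArrayList<>();
    for (Map.Entry<String, Long> entry : endTimesMs.entrySet()) {
      long endTimeMs = entry.getValue();
      boolean validEndTime = endTimeMs > 0;                        // simplified validity guard
      boolean coveredByOffline = endTimeMs < timeBoundaryMs;       // below the time boundary
      boolean beyondRetention = nowMs - endTimeMs > retentionMs;   // assumed isPurgeable() stand-in
      if (validEndTime && coveredByOffline && beyondRetention) {
        toDelete.add(entry.getKey());
      }
    }
    return toDelete;
  }

  public static void main(String[] args) {
    long nowMs = System.currentTimeMillis();
    long timeBoundaryMs = nowMs - TimeUnit.DAYS.toMillis(1); // all real segments fall below this
    long retentionMs = TimeUnit.DAYS.toMillis(7);            // assumed retention window

    Map<String, Long> segments = new LinkedHashMap<>();
    segments.put("seg_30d", nowMs - TimeUnit.DAYS.toMillis(30));
    segments.put("seg_3d", nowMs - TimeUnit.DAYS.toMillis(3));
    segments.put("seg_missing_end_time", -1L);

    // Only the 30-day-old segment passes all three conditions.
    System.out.println(segmentsToDelete(segments, timeBoundaryMs, retentionMs, nowMs)); // [seg_30d]
  }
}
```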