Fix parallel segment reload race on IndexLoadingConfig tier; add IndexLoadingConfig.copy() to avoid per-segment ZK fetches #18174
Codecov Report: ✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #18174 +/- ##
============================================
+ Coverage 63.18% 63.23% +0.05%
Complexity 1616 1616
============================================
Files 3214 3214
Lines 195838 195842 +4
Branches 30251 30251
============================================
+ Hits 123734 123836 +102
+ Misses 62236 62105 -131
- Partials 9868 9901 +33
Requesting review on this. The integration test failure seems unrelated to this PR.
- private void reloadSegments(List<SegmentDataManager> segmentDataManagers, IndexLoadingConfig indexLoadingConfig,
+ private void reloadSegmentDataManagersInParallel(List<SegmentDataManager> segmentDataManagers,
I think we can keep the original name and mention the parallelism in this method's javadoc if you like.
  IndexLoadingConfig copy = new IndexLoadingConfig(_instanceDataManagerConfig, _tableConfig, _schema);
  copy.setTableDataDir(_tableDataDir);
  return copy;
}
Let's not add this method. It doesn't copy every field of this object (e.g. _readMode is not copied), so it doesn't meet the semantics of "copy", unless you make sure every field is honored when performing the copy.
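To meet the "copy" semantics described above, every field would have to be carried over explicitly. The sketch below uses a hypothetical, heavily simplified `LoadingConfigSketch` class (not Pinot's `IndexLoadingConfig`; the field set, including `readMode` as a String, is an assumption) to show a private copy constructor that transfers each field, leaving the caller to overwrite the per-segment tier afterwards.

```java
import java.util.Objects;

// Hypothetical stand-in for IndexLoadingConfig; the fields are assumptions, not Pinot's real field list.
final class LoadingConfigSketch {
  private final Object tableConfig; // shared by reference, assumed immutable
  private final Object schema;      // shared by reference, assumed immutable
  private String tableDataDir;
  private String readMode;
  private String segmentTier;       // per-segment, mutable

  LoadingConfigSketch(Object tableConfig, Object schema) {
    this.tableConfig = Objects.requireNonNull(tableConfig);
    this.schema = Objects.requireNonNull(schema);
  }

  // Copy constructor: every field is transferred explicitly, so a field added later
  // has exactly one place that must be updated to keep "copy" honest.
  private LoadingConfigSketch(LoadingConfigSketch other) {
    this.tableConfig = other.tableConfig;
    this.schema = other.schema;
    this.tableDataDir = other.tableDataDir;
    this.readMode = other.readMode;
    this.segmentTier = other.segmentTier;
  }

  LoadingConfigSketch copy() {
    return new LoadingConfigSketch(this);
  }

  Object getTableConfig() { return tableConfig; }
  Object getSchema() { return schema; }
  String getTableDataDir() { return tableDataDir; }
  void setTableDataDir(String dir) { this.tableDataDir = dir; }
  String getReadMode() { return readMode; }
  void setReadMode(String mode) { this.readMode = mode; }
  String getSegmentTier() { return segmentTier; }
  void setSegmentTier(String tier) { this.segmentTier = tier; }
}
```

With this shape, a caller that wants a "clean" per-segment config would call `copy()` and then `setSegmentTier(...)`, rather than relying on `copy()` to drop fields.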
  _segmentReloadSemaphore.acquire(segmentName, _logger);
  try {
- reloadSegment(segmentDataManager, indexLoadingConfig, forceDownload);
+ reloadSegment(segmentDataManager, indexLoadingConfigTemplate.copy(), forceDownload);
If we have 100k segments, we'll create as many IndexLoadingConfig copies here, even though only the segment tier differs between them.
A server is expected to have only a few segment tiers. Could we, for example, keep a map from tier to IndexLoadingConfig so we only create as many copies as there are tiers?
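A minimal sketch of this suggestion, assuming the tier is the only per-segment field the reload paths mutate: a `ConcurrentHashMap` keyed by tier and populated with `computeIfAbsent`, so the number of config objects is bounded by the number of distinct tiers rather than the number of segments. `TierConfig` and `TierConfigCache` are hypothetical names; the factory stands in for "copy the template and set its tier".

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical per-tier config; in Pinot this role would be played by an IndexLoadingConfig.
final class TierConfig {
  final String tier;
  TierConfig(String tier) { this.tier = tier; }
}

// One config per distinct tier instead of one per segment.
final class TierConfigCache {
  // ConcurrentHashMap rejects null keys, so "no tier" is mapped to a sentinel key.
  private static final String NO_TIER = "";
  private final Map<String, TierConfig> byTier = new ConcurrentHashMap<>();
  private final Function<String, TierConfig> factory;

  TierConfigCache(Function<String, TierConfig> factory) {
    this.factory = factory;
  }

  // computeIfAbsent runs the factory at most once per key, even under concurrent calls.
  TierConfig forTier(String tier) {
    String key = (tier == null) ? NO_TIER : tier;
    return byTier.computeIfAbsent(key, k -> factory.apply(tier));
  }

  int size() {
    return byTier.size();
  }
}
```

If the reload paths also apply other per-segment updates to the config, sharing one instance per tier would reintroduce the race for those fields, so this only holds when those updates are fully determined by the tier.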
Problem
When multiple segments were reloaded in parallel (reloadAllSegments / batched reloadSegments), all tasks shared a single IndexLoadingConfig from one fetchIndexLoadingConfig() call. Each reload path calls setSegmentTier(...) (and related updates) on that shared instance, so concurrent tasks could overwrite each other’s tier. With tier overrides in table config, that could apply the wrong preprocessing / loading settings (#18164).
Fix
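The core idea behind the fix, one config instance per task instead of one shared instance, can be seen by replaying a single interleaving of two reload tasks on one thread, which makes the outcome deterministic. This is a minimal sketch: `SharedConfig` is a hypothetical stand-in for `IndexLoadingConfig`, and the tier names and the particular interleaving are illustrative assumptions.

```java
// Hypothetical mutable config handed to reload tasks (stands in for IndexLoadingConfig).
final class SharedConfig {
  private String segmentTier;
  void setSegmentTier(String tier) { this.segmentTier = tier; }
  String getSegmentTier() { return segmentTier; }
}

final class TierRaceDemo {
  // Replays one possible interleaving of tasks A and B step by step.
  // Returns the tier task A observes when it applies its loading settings.
  static String tierSeenByTaskA(boolean shareOneInstance) {
    SharedConfig template = new SharedConfig();
    SharedConfig configA = template;
    SharedConfig configB = shareOneInstance ? template : new SharedConfig();

    configA.setSegmentTier("hot");   // task A: its segment belongs to the "hot" tier
    configB.setSegmentTier("cold");  // task B runs in between: its segment belongs to "cold"
    return configA.getSegmentTier(); // task A reads back its tier
  }
}
```

`tierSeenByTaskA(true)` returns "cold", meaning task B's tier leaked into task A, while `tierSeenByTaskA(false)` returns "hot".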
BaseTableDataManager.reloadSegments: Renamed to reloadSegmentDataManagersInParallel. It now calls fetchIndexLoadingConfig() once per batch and then passes indexLoadingConfigTemplate.copy() into reloadSegment for each parallel task, so every segment gets its own config for the tier and any other per-segment mutation.
IndexLoadingConfig.copy(): New method that builds a new IndexLoadingConfig with the same instance / table / schema references and tableDataDir, without copying segmentTier (each copy starts clean, like a fresh fetch). This keeps correctness while avoiding N repeated ZK reads (one fetch plus N lightweight copies instead of N fetches).
Tests
IndexLoadingConfigTest: Asserts that the copy shares the TableConfig / Schema, matches tableDataDir, and does not inherit segmentTier from the template, and that tier changes on the copy do not affect the original.
Fixes #18164
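The fetch-once, copy-per-task pattern described under Fix can be sketched with a plain `ExecutorService`. Everything here is hypothetical: `ReloadConfig` stands in for `IndexLoadingConfig`, `fetchConfig()` for the ZK-backed `fetchIndexLoadingConfig()`, and the fixed thread pool is an assumption rather than how `BaseTableDataManager` actually schedules reloads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical config: table-level state is shared, the tier belongs to one instance.
final class ReloadConfig {
  final String tableDataDir;
  private String segmentTier;
  ReloadConfig(String tableDataDir) { this.tableDataDir = tableDataDir; }
  // Mirrors the PR's description of copy(): same table-level state, no tier carried over.
  ReloadConfig copy() { return new ReloadConfig(tableDataDir); }
  void setSegmentTier(String tier) { this.segmentTier = tier; }
  String getSegmentTier() { return segmentTier; }
}

final class ParallelReloadSketch {
  final AtomicInteger fetchCount = new AtomicInteger();

  // Stand-in for the expensive fetch; counts how often it is called.
  ReloadConfig fetchConfig() {
    fetchCount.incrementAndGet();
    return new ReloadConfig("/data/myTable");
  }

  // Fetches the template once, then gives every parallel task its own copy.
  // Returns segment name -> tier observed by that segment's task (tiers must be non-null).
  Map<String, String> reloadInParallel(Map<String, String> segmentToTier, int threads) {
    ReloadConfig template = fetchConfig();
    Map<String, String> observed = new ConcurrentHashMap<>();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (Map.Entry<String, String> e : segmentToTier.entrySet()) {
        futures.add(pool.submit(() -> {
          ReloadConfig config = template.copy(); // per-task instance: no cross-task overwrite
          config.setSegmentTier(e.getValue());
          Thread.yield();                        // widens the window a shared instance would expose
          observed.put(e.getKey(), config.getSegmentTier());
        }));
      }
      for (Future<?> f : futures) {
        try {
          f.get(); // surface any task failure
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("Interrupted while waiting for reloads", ie);
        } catch (ExecutionException ee) {
          throw new IllegalStateException("Reload task failed", ee.getCause());
        }
      }
    } finally {
      pool.shutdown();
    }
    return observed;
  }
}
```

With 200 segments split across two tiers, this sketch performs a single fetch and every segment observes its own tier. It still allocates one copy per segment, which is the trade-off raised in review.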