Skip to content

virtual storage improvements#18683

Merged
clintropolis merged 5 commits into
apache:masterfrom
clintropolis:vsf-evict-instead-of-drop-and-no-missing-segments
Oct 24, 2025
Merged

virtual storage improvements#18683
clintropolis merged 5 commits into
apache:masterfrom
clintropolis:vsf-evict-instead-of-drop-and-no-missing-segments

Conversation

@clintropolis
Copy link
Copy Markdown
Member

Description

This PR makes some behavioral changes to how historical servers manage their cache files when operating in the 'virtual storage' mode added in #18176, to provide a better experience and more flexibility to callers.

The main change is that in virtual storage mode, it is now impossible for a segment to be 'missing' in the SegmentManager if a caller has a DataSegment. By 'missing' segments, I am referring to the specific Druid response a historical server provides in the query response context to cover the case of a segment drop command occurring before the query engine has a chance to obtain a reference to the files (or even before ever receiving the request), so that the broker can retry these requests to the new location the segment has been moved to (if retries are enabled the the segment is available elsewhere).

Now, in virtual storage mode, the historical will simply download the segment on demand and process it for any DataSegment object. So for example, when using ServerManager to get segments, this means that after resolving the set of DataSegments participating in a query from the timeline, the query engine can always obtain a SegmentReference so long as there is adequate disk space to download (and it is present in deep storage of course).

To complement this behavior, the segment cache itself no longer actively unmounts weakly held segment cache entries, making the coordinator drop command a near no-op. Instead, files will remain on disk until they are evicted because some other segment needs to reclaim the space so that it can be loaded. This does mean that the disk will 'fill up' over time, but is worth the benefits. To accommodate this and ensure that segment files are tracked across process boundaries (in the event of a failure or whatever), segment info files are now deleted via an 'unmount' hook on the segment files cache entry, meaning that they will not be removed until the segment files are also removed.

changes:
* in virtual storage mode, it is now impossible for a segment to be 'missing' after a caller has obtained a `DataSegment`. Now, the caller can still mount and use a segment (downloading it if necessary) even if the coordinator has issued a drop command
* to support there being no such thing as a missing segment in vsf mode, drops of weakly held cache entries in a storage location are now a no-op. instead, these cache entries will remain on disk until an eviction needs to reclaim the space for some other entry
// write the segment info file if it doesn't exist. this can happen if we are loading after a drop
final File segmentInfoCacheFile = new File(getEffectiveInfoDir(), dataSegment.getId().toString());
if (!segmentInfoCacheFile.exists()) {
jsonMapper.writeValue(segmentInfoCacheFile, dataSegment);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use FileUtils.writeAtomically to avoid leaving a partially-written file after a crash.

for (StorageLocation location : locations) {
final SegmentCacheEntry cacheEntry = location.getCacheEntry(cacheEntryIdentifier);
if (cacheEntry != null) {
cacheEntry.onUnmount.set(null);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this race with the call in removeInfoFile? That one doesn't seem to acquire a lock. Same comment for the other various calls to set(null).

final SegmentCacheEntry cacheEntry = location.getCacheEntry(entryId);
if (cacheEntry != null) {
isCached = true;
cacheEntry.onUnmount.set(delete);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this race with an actual unmounting? A lock doesn't seem to be acquired at this point in time.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, this one isn't under a segment lock so could, fixed to just make methods on cache entry to synchronize with mount/unmount

@clintropolis clintropolis merged commit c44eb23 into apache:master Oct 24, 2025
131 of 133 checks passed
@clintropolis clintropolis deleted the vsf-evict-instead-of-drop-and-no-missing-segments branch October 24, 2025 18:54
@kgyrtkirk kgyrtkirk added this to the 36.0.0 milestone Jan 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants