Add split_key_prefix_len to index config to shard S3 object keys #6529
Draft
fulmicoton-dd wants to merge 1 commit into
Recent splits share a ULID timestamp prefix, which creates S3 key hotspots under high read load. Setting `split_key_prefix_len` (e.g. `2`) on an index takes N characters from the random portion of the ULID (positions 10–25) and uses them as a subdirectory prefix, distributing new splits across 32^N S3 partitions. Old splits (no prefix set in `SplitMetadata`) keep using the legacy flat path, so no migration is needed. `SplitMetadata` will store a `prefix` string, computed once at creation time via `compute_split_key_prefix()`.
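A minimal Rust sketch of the key layout described above, assuming split IDs are 26-character Crockford base32 ULIDs. The names `compute_split_key_prefix` and `split_storage_path` come from this PR, but the bodies, the `String` return types and the `.split` suffix handling are illustrative assumptions, not the actual `quickwit-common` code. The helpers `validate_split_key_prefix_len` and `partition_count` are hypothetical, and the rate-limited warning is omitted.

```rust
/// A ULID is 26 Crockford base32 characters: 10 for the timestamp, 16 random.
const ULID_TIMESTAMP_LEN: usize = 10;
const ULID_RANDOM_LEN: usize = 16;

/// Upper bound for `split_key_prefix_len`: the length of the random portion.
const MAX_SPLIT_KEY_PREFIX_LEN: u8 = ULID_RANDOM_LEN as u8;

/// Hypothetical config-load check mirroring the `<= 16` validation.
fn validate_split_key_prefix_len(prefix_len: u8) -> Result<(), String> {
    if prefix_len > MAX_SPLIT_KEY_PREFIX_LEN {
        return Err(format!(
            "split_key_prefix_len must be <= {} (ULID random portion length), got {}",
            MAX_SPLIT_KEY_PREFIX_LEN, prefix_len
        ));
    }
    Ok(())
}

/// Sketch: returns `prefix_len` characters starting at position 10 (the
/// random portion of the ULID). An empty string means "legacy flat layout";
/// it is returned when `prefix_len` is 0 or the split ID is too short.
/// (The PR's version also logs a rate-limited warning in the latter case.)
fn compute_split_key_prefix(split_id: &str, prefix_len: u8) -> String {
    if prefix_len == 0 {
        return String::new();
    }
    let end = ULID_TIMESTAMP_LEN + usize::from(prefix_len);
    // `str::get` returns None instead of panicking when the range is out of bounds.
    split_id
        .get(ULID_TIMESTAMP_LEN..end)
        .map(String::from)
        .unwrap_or_default()
}

/// Sketch: builds the object key from a precomputed prefix. Built as a
/// string because S3 keys always use '/' (assumption: the real function
/// may return a path type instead).
fn split_storage_path(split_id: &str, prefix: &str) -> String {
    if prefix.is_empty() {
        format!("{}.split", split_id)
    } else {
        format!("{}/{}.split", prefix, split_id)
    }
}

/// Number of distinct prefixes: 32 possible symbols per base32 character.
fn partition_count(prefix_len: u8) -> u128 {
    32u128.pow(u32::from(prefix_len))
}
```

With this sketch, the made-up split ID `01HGW2N7Q8XKZ4M5P6R7S8T9VA` and a prefix length of 2 give the prefix `XK` and the key `XK/01HGW2N7Q8XKZ4M5P6R7S8T9VA.split`. A prefix length of 0, or a malformed ID, gives an empty prefix and the legacy flat key. Reading from the random half rather than the timestamp half is what spreads the load: splits created around the same time share their leading timestamp characters, but their random suffixes are independent.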
Summary
- Add `split_key_prefix_len: u8` to `IndexConfig` (and `IndexTemplate`) to configure S3 key sharding per index.
- Add `compute_split_key_prefix(split_id, prefix_len)` to `quickwit-common`: extracts N characters from the ULID random portion (positions 10–25) as a prefix; logs a rate-limited warning and falls back to the flat scheme if the split ID is too short.
- Add `split_storage_path(split_id, prefix)` to `quickwit-common`: builds the storage path from a precomputed prefix string; an empty prefix means the legacy flat scheme.
- Validate `split_key_prefix_len <= 16` (the length of the ULID random portion) at config load time.
- `SplitMetadata` will gain a `prefix: String` field in a follow-up PR that wires up the full pipeline (uploader, leaf search, merge, GC).

Backward compatibility: the field defaults to
`0` (serde default), so all existing indexes and splits are unaffected. New splits on indexes with `split_key_prefix_len: 2` will land at `ND/01ARZ3.../01ARZ3....split` paths, distributed across 1024 (32^2) S3 partitions instead of one.

Test plan
- `cargo nextest run -p quickwit-common -p quickwit-config --all-features`: 221 tests pass
- `cargo clippy --workspace --all-features --tests`: no warnings
- `cargo +nightly fmt --all -- --check`: no issues
- Set `split_key_prefix_len: 2` in an index config and verify it round-trips through serde correctly
- Set `split_key_prefix_len: 17` and verify it is rejected with a clear error message

🤖 Generated with Claude Code