Add split_key_prefix_len to index config to shard S3 object keys #6529

Draft
fulmicoton-dd wants to merge 1 commit into main from paul-masurel/split-key-prefix-sharding
fulmicoton-dd commented Jun 19, 2026

Collaborator

Summary

  • Adds split_key_prefix_len: u8 to IndexConfig (and IndexTemplate) to configure S3 key sharding per index
  • Adds compute_split_key_prefix(split_id, prefix_len) to quickwit-common — extracts N characters from the ULID random portion (positions 10–25) as a prefix; logs a rate-limited warning and falls back to the flat scheme if the split ID is too short
  • Adds split_storage_path(split_id, prefix) to quickwit-common — builds the storage path from a precomputed prefix string; empty prefix = legacy flat scheme
  • Validates that split_key_prefix_len <= 16 (ULID random portion length) at config load time
  • SplitMetadata will gain a prefix: String field in a follow-up PR to wire the full pipeline (uploader, leaf search, merge, GC)
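
The two helpers described above could look roughly like the sketch below. Only the function names and argument order come from this PR; the return types, the silent fallback (the PR logs a rate-limited warning instead), and the `prefix/split_id.split` layout are assumptions made for illustration, not the code in this branch.

```rust
/// Length of the timestamp portion of a ULID, in characters.
/// The random portion occupies the 16 characters that follow (positions 10 to 25).
const ULID_TIMESTAMP_LEN: usize = 10;

/// Sketch: returns `prefix_len` characters taken from the start of the ULID
/// random portion. Returns an empty string (legacy flat scheme) when
/// `prefix_len` is 0 or when the split ID is too short to slice.
/// Assumption: the real helper also emits a rate-limited warning on fallback.
pub fn compute_split_key_prefix(split_id: &str, prefix_len: u8) -> String {
    let prefix_len = usize::from(prefix_len);
    if prefix_len == 0 {
        return String::new();
    }
    // `str::get` returns `None` if the range is out of bounds, which covers
    // split IDs that are shorter than expected.
    split_id
        .get(ULID_TIMESTAMP_LEN..ULID_TIMESTAMP_LEN + prefix_len)
        .map(String::from)
        .unwrap_or_default()
}

/// Sketch: builds the storage path from a precomputed prefix.
/// An empty prefix yields the legacy flat path.
/// Assumption: the path is relative to the index storage root.
pub fn split_storage_path(split_id: &str, prefix: &str) -> String {
    if prefix.is_empty() {
        format!("{split_id}.split")
    } else {
        format!("{prefix}/{split_id}.split")
    }
}
```

With a made-up split ID such as `01HZX3K9QWABCDEFGHJKMNPQRS` and a prefix length of 2, this sketch yields the prefix `AB` and the path `AB/01HZX3K9QWABCDEFGHJKMNPQRS.split`. A prefix length of 0 keeps the flat `01HZX3K9QWABCDEFGHJKMNPQRS.split`.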

Backward compatibility: the field defaults to 0 (serde default), so all existing indexes and splits are unaffected. New splits on indexes with split_key_prefix_len: 2 will land at ND/01ARZ3.../01ARZ3....split paths, distributing across 1024 S3 partitions instead of one.
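
The 1024 figure follows from the ULID alphabet: each character is one of 32 Crockford base32 symbols, so a prefix of N characters has 32^N possible values. A minimal sketch of that arithmetic, using a hypothetical helper name that is not part of this PR:

```rust
/// Number of distinct key prefixes for a given prefix length: 32^N.
/// Returns `None` if the result would not fit in a `u128`.
/// Hypothetical helper, shown only to illustrate the arithmetic.
pub fn prefix_partition_count(prefix_len: u8) -> Option<u128> {
    32u128.checked_pow(u32::from(prefix_len))
}
```

Here `prefix_partition_count(0)` is `Some(1)` (the legacy flat scheme) and `prefix_partition_count(2)` is `Some(1024)`. How S3 actually maps key prefixes to internal partitions is up to S3, so these counts are an upper bound on the spread rather than a guarantee.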

Test plan

  • cargo nextest run -p quickwit-common -p quickwit-config --all-features — 221 tests pass
  • cargo clippy --workspace --all-features --tests — no warnings
  • cargo +nightly fmt --all -- --check — no issues
  • Set split_key_prefix_len: 2 in an index config, verify it round-trips through serde correctly
  • Set split_key_prefix_len: 17, verify it is rejected with a clear error message
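
The last two manual checks exercise the bound on the prefix length. A minimal sketch of such a check, assuming a standalone function and a `String` error type (the actual validation in quickwit-config may be structured differently and report errors through its own types):

```rust
/// Maximum prefix length: the ULID random portion is 16 characters long.
const MAX_SPLIT_KEY_PREFIX_LEN: u8 = 16;

/// Sketch: rejects prefix lengths longer than the ULID random portion.
/// The function name and error type are assumptions for illustration.
pub fn validate_split_key_prefix_len(prefix_len: u8) -> Result<(), String> {
    if prefix_len > MAX_SPLIT_KEY_PREFIX_LEN {
        return Err(format!(
            "`split_key_prefix_len` must be at most {}, got {}",
            MAX_SPLIT_KEY_PREFIX_LEN, prefix_len
        ));
    }
    Ok(())
}
```

Under this sketch, 0 and 16 are accepted, while 17 is rejected with a message that names both the limit and the offending value.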

🤖 Generated with Claude Code

fulmicoton force-pushed the paul-masurel/split-key-prefix-sharding branch 5 times, most recently from a0bce5d to 45cf1a3 (June 19, 2026 08:21)
fulmicoton-dd force-pushed the paul-masurel/split-key-prefix-sharding branch 2 times, most recently from 3aeecaf to c64b82f (June 19, 2026 11:55)

Recent splits share a ULID timestamp prefix, causing S3 key hotspots
under high read load. Setting split_key_prefix_len (e.g. 2) on an index
extracts N characters from the ULID random portion (positions 10–25) as
a subdirectory prefix, distributing new splits across 32^N S3 partitions.

Old splits (prefix not set in SplitMetadata) continue using the legacy
flat path; no migration needed. SplitMetadata will store a `prefix`
string computed once at creation time via compute_split_key_prefix().

fulmicoton-dd force-pushed the paul-masurel/split-key-prefix-sharding branch from c64b82f to a6488c9 (June 19, 2026 13:40)