feat: support manifest version hint#5997

Closed
jackye1995 wants to merge 21 commits into lance-format:main from jackye1995:manifest-commit

@jackye1995 jackye1995 commented Feb 24, 2026

Contributor

Based on the benchmarking results in #5947 (comment).

For now I have kept only the JSON format for the manifest version hint. Which format to adopt still needs further discussion.

jackye1995 and others added 21 commits February 21, 2026 09:15
- Benchmark tests performance degradation with many small fragments
- Measures write (commit) and load (manifest open) latencies
- Outputs CSV time series data for graphing
- Calculates linear regression to show per-fragment overhead
- Supports S3, local disk via DATASET_PREFIX env var
- Configurable via NUM_ITERATIONS and ROWS_PER_FRAGMENT
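
The per-fragment overhead mentioned above is the slope of a least-squares line through (fragment count, latency) samples. A minimal sketch of that calculation, assuming plain `f64` slices as input; the function name `linear_fit` is hypothetical and not taken from the benchmark:

```rust
/// Ordinary least-squares fit of y = slope * x + intercept.
/// Hypothetical helper; returns None when a slope cannot be computed.
fn linear_fit(xs: &[f64], ys: &[f64]) -> Option<(f64, f64)> {
    let n = xs.len();
    if n < 2 || n != ys.len() {
        return None;
    }
    let n_f = n as f64;
    let mean_x = xs.iter().sum::<f64>() / n_f;
    let mean_y = ys.iter().sum::<f64>() / n_f;

    // Covariance and variance numerators (the 1/n factors cancel).
    let sxy: f64 = xs
        .iter()
        .zip(ys)
        .map(|(x, y)| (x - mean_x) * (y - mean_y))
        .sum();
    let sxx: f64 = xs.iter().map(|x| (x - mean_x).powi(2)).sum();
    if sxx == 0.0 {
        // All x values are identical, so the slope is undefined.
        return None;
    }

    let slope = sxy / sxx;
    Some((slope, mean_y - slope * mean_x))
}
```

With x as the number of fragments and y as the measured latency in milliseconds, the slope reads as milliseconds added per fragment.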

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Add a version hint file (`latest_version_hint.bin`) that encodes the
latest version number via its file size. This enables O(1) latest
version lookup via HEAD request instead of O(n) listing.
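
The file-size encoding can be sketched as follows. This is an illustration only: an in-memory map stands in for the object store, `head` mimics a HEAD request that returns an object's size without its body, and every type and function name here is hypothetical rather than taken from the PR.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for an object store.
#[derive(Default)]
struct MockStore {
    objects: HashMap<String, Vec<u8>>,
}

impl MockStore {
    fn put(&mut self, path: &str, data: Vec<u8>) {
        self.objects.insert(path.to_string(), data);
    }

    /// Mimics a HEAD request: returns only the object's size, never its body.
    fn head(&self, path: &str) -> Option<u64> {
        self.objects.get(path).map(|data| data.len() as u64)
    }
}

const HINT_PATH: &str = "latest_version_hint.bin";

/// Writes a hint whose length in bytes equals the version number.
fn write_size_hint(store: &mut MockStore, version: u64) {
    store.put(HINT_PATH, vec![0u8; version as usize]);
}

/// Recovers the hinted version from the object's size alone.
fn read_size_hint(store: &MockStore) -> Option<u64> {
    store.head(HINT_PATH)
}
```

One consequence of this encoding is that the hint file grows by one byte per version, so the hint for version 5000 is a 5000-byte object.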

Key changes:
- Write version hint after each successful commit (optimistic, non-blocking)
- Race hint-based lookup vs listing, use whichever completes first
- HEAD request (~10ms) is much faster than LIST (~200ms+) at scale
- Works on all object stores (S3 Standard, S3 Express, GCS, Azure)

The optimization can be disabled via environment variable:
  LANCE_USE_VERSION_HINT=0

Performance improvement:
- S3 Express: ~10ms (HEAD) vs ~200ms+ (LIST at 5000 versions)
- S3 Standard: ~20ms (HEAD + probe) vs ~200ms+ (LIST at 5000 versions)

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
- Add LANCE_VERSION_HINT_FORMAT env var: "file_size" (default) or "json"
- Add LANCE_VERSION_HINT_WRITE_MODE env var: "async" (default) or "sync"
- Default async mode uses fire-and-forget pattern to avoid commit latency
- JSON format stores version as {"version": N} for human readability
- File-size format uses file size = version number for O(1) HEAD lookup
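
A sketch of how these two settings might be read, using the defaults stated in this commit (`file_size` and `async`). The enum and function names are hypothetical, and treating unrecognized values as the default is an assumption; the PR may handle them differently.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum HintFormat {
    FileSize,
    Json,
}

#[derive(Debug, PartialEq, Clone, Copy)]
enum HintWriteMode {
    Async,
    Sync,
}

/// Parses LANCE_VERSION_HINT_FORMAT; anything other than "json" maps to the default.
fn parse_hint_format(raw: Option<&str>) -> HintFormat {
    match raw.map(|s| s.trim().to_ascii_lowercase()).as_deref() {
        Some("json") => HintFormat::Json,
        _ => HintFormat::FileSize,
    }
}

/// Parses LANCE_VERSION_HINT_WRITE_MODE; anything other than "sync" maps to the default.
fn parse_write_mode(raw: Option<&str>) -> HintWriteMode {
    match raw.map(|s| s.trim().to_ascii_lowercase()).as_deref() {
        Some("sync") => HintWriteMode::Sync,
        _ => HintWriteMode::Async,
    }
}

/// Reads both settings from the process environment.
fn hint_config_from_env() -> (HintFormat, HintWriteMode) {
    let format = std::env::var("LANCE_VERSION_HINT_FORMAT").ok();
    let mode = std::env::var("LANCE_VERSION_HINT_WRITE_MODE").ok();
    (
        parse_hint_format(format.as_deref()),
        parse_write_mode(mode.as_deref()),
    )
}
```

Keeping the parsing separate from the environment lookup makes the defaults easy to unit-test without mutating process state.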

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
…bing

Add list_manifest_locations_since method that uses version hint to avoid
O(n) manifest listing on non-lexically ordered stores (e.g., S3 Express).

Instead of listing all manifests, the optimization:
1. Reads the version hint to get approximate latest version
2. Probes upward from hint to find true latest (sequential HEADs)
3. Parallel HEADs for versions between since_version and hint
4. Returns all found manifests in descending order

This changes commit-time complexity from O(n) to O(k) where k is the
number of new versions since the read version.
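
The four steps above can be sketched as follows. This is a simplified model, not the PR's code: a set of version numbers stands in for the store, `exists` mimics a HEAD request, the gap HEADs run sequentially rather than in parallel, and all names are hypothetical.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the store: the set of manifest versions that exist.
struct MockManifests {
    versions: BTreeSet<u64>,
}

impl MockManifests {
    /// Mimics a HEAD request on one manifest path.
    fn exists(&self, version: u64) -> bool {
        self.versions.contains(&version)
    }
}

/// Step 2: walk upward from `start` until a version is missing.
/// Returns None when `start` itself does not exist.
fn probe_upward(store: &MockManifests, start: u64) -> Option<u64> {
    if !store.exists(start) {
        return None;
    }
    let mut latest = start;
    while store.exists(latest + 1) {
        latest += 1;
    }
    Some(latest)
}

/// Steps 1-4, with the hint value (step 1) passed in by the caller.
/// Returns None when the hinted version is missing, in which case the
/// caller would fall back to a full listing.
fn versions_since(store: &MockManifests, hint: u64, since_version: u64) -> Option<Vec<u64>> {
    let latest = probe_upward(store, hint)?;

    // Step 3: HEAD the versions strictly between the read version and the hint.
    let mut found: Vec<u64> = (since_version + 1..hint)
        .filter(|v| store.exists(*v))
        .collect();

    // Versions from the hint up to `latest` were already confirmed by the probe.
    found.extend(hint.max(since_version + 1)..=latest);

    // Step 4: newest first.
    found.reverse();
    Some(found)
}
```

The number of HEADs is proportional to the distance between `since_version` and the true latest version, which is where the O(k) bound comes from.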

The feature is gated by LANCE_USE_VERSION_HINT env var and supports
both file_size and json formats via LANCE_VERSION_HINT_FORMAT.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
When list_manifests_since_version_with_hint found that hint_version <= since_version,
it called probe_versions_upward(since_version + 1). If no version existed at
since_version + 1, probe_versions_upward returned None, which caused the function
to fall back to full O(n) listing instead of returning an empty list (the fast path).

This fix properly handles the None case by returning an empty list, achieving O(1)
performance when there are no new transactions since the read version.
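
A sketch of the fixed branch, assuming a closure in place of HEAD requests. `probe_versions_upward` is named in the commit message, but the signature used here and the wrapper function are hypothetical.

```rust
/// Returns the highest consecutive existing version starting at `start`,
/// or None when `start` itself does not exist. Signature is an assumption.
fn probe_versions_upward(exists: &dyn Fn(u64) -> bool, start: u64) -> Option<u64> {
    if !exists(start) {
        return None;
    }
    let mut latest = start;
    while exists(latest + 1) {
        latest += 1;
    }
    Some(latest)
}

/// Handles the case where the hint is not ahead of the reader's version.
fn versions_when_hint_not_ahead(exists: &dyn Fn(u64) -> bool, since_version: u64) -> Vec<u64> {
    match probe_versions_upward(exists, since_version + 1) {
        // Newer versions exist: return them newest first.
        Some(latest) => (since_version + 1..=latest).rev().collect(),
        // The fix: nothing exists past the read version, so the answer is
        // an empty list rather than a fallback to full listing.
        None => Vec::new(),
    }
}
```

When nothing is new, this path costs a single existence check at `since_version + 1`.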

Also:
- Remove slope calculation from manifest_commit benchmark (just report averages)
- Add test for version hint optimization with non-lexical stores

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
…D timing

- Revert stagger start approach (didn't help with contention)
- Change default hint format from file_size to JSON
- Add more precise HEAD request timing in debug logs
Add environment variable LANCE_HINT_ONLY to enable hint-only mode that
bypasses tokio::select! racing with listing. This helps isolate whether
the slow load path hint reads are caused by connection contention from
racing, or by something else.

When LANCE_HINT_ONLY=1, the load path will:
1. Only use hint+HEAD approach
2. Fall back to listing only if hint fails (no racing)

This is for debugging/benchmarking purposes to investigate why load path
hint reads are ~50ms while commit path hint reads are ~5ms.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Add WARM_SESSION=true environment variable to test load latency with
warm connections by reusing the same session. This helps compare:
- Cold connection performance (WARM_SESSION=false, default)
- Warm connection performance (WARM_SESSION=true)

This helps identify whether slow load latency is due to:
- TCP/TLS connection establishment overhead (cold connection)
- tokio::select! racing contention
- Other factors

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Add environment variable LANCE_CONNECTION_WARMUP=1 to enable connection
warmup before hint/listing race. This makes a cheap HEAD request first
to establish TCP/TLS connection, then subsequent operations use warm
connections.

Without warmup (cold session): ~95-100ms load latency
With warmup (cold session): ~50-60ms load latency (estimated)
With warm session: ~14ms load latency (optimal)

The warmup helps cold starts (serverless, CLI tools) by reducing the
impact of TCP/TLS connection establishment overhead.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
- Use checkout_latest() instead of load() for load measurements
- Share ObjectStoreRegistry to reuse warm TCP/TLS connections
- Use zero cache size session to avoid manifest caching
- Remove unused env vars (WARM_SESSION, SHARED_REGISTRY, DIRECT_CHECKOUT)
- Clean up debug statements from commit.rs
- Remove connection warmup feature (it just moves latency, doesn't help)

This approach properly isolates storage read latency from cold start overhead.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
- Auto-detect S3 Express buckets by --x-s3 suffix
- Pass s3_express=true storage option for S3 Express buckets

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
S3 Express buckets don't support GetBucketLocation API, so we need
to pass the region explicitly from AWS_DEFAULT_REGION or AWS_REGION.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
- Remove LANCE_HINT_ONLY env var and sequential hint+HEAD code path
- Remove storage options handling from benchmark (auto-detected)
- Add ENABLE_CACHE config (default: false) for benchmark
- Use single shared session for both commit and load
- Keep LANCE_USE_VERSION_HINT to enable/disable version hint feature

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
@github-actions github-actions Bot added the enhancement New feature or request label Feb 24, 2026
@github-actions

Contributor

Code Review for PR #5997: feat: support manifest version hint

Summary

This PR adds a version hint optimization for faster manifest lookup on non-lexically ordered object stores (e.g., S3 Express). The approach uses a JSON hint file + HEAD-based probing instead of full directory listing.

P1 Issues

1. Inefficiency when hint file doesn't exist

In current_manifest_path (rust/lance-table/src/io/commit.rs:46-62), when read_version_hint_and_probe returns None quickly (e.g., new dataset, no hint file), the original resolve_version_from_listing future is cancelled and then re-invoked in the fallback path. This doubles the listing cost for datasets without a hint file.

Consider restructuring to avoid this:

// Start list, check hint, if hint fails continue with existing list
let list_fut = resolve_version_from_listing(object_store, base);
tokio::pin!(list_fut);

tokio::select! {
    biased;
    hint_result = read_version_hint_and_probe(object_store, base) => {
        if let Some(location) = hint_result {
            return Ok(location);
        }
        list_fut.await  // Continue with same future
    }
    list_result = &mut list_fut => list_result
}

2. block_in_place usage in write_version_hint

The pattern at lines 152-156 spawns a task and then immediately blocks on it:

let handle = tokio::spawn(async move { ... });
if sync_write {
    let _ = tokio::task::block_in_place(|| tokio::runtime::Handle::current().block_on(handle));
}

block_in_place panics when called on a current-thread (single-threaded) runtime. Consider using .await directly when sync_write is true, or documenting this limitation.

Minor Observations

  • The test test_commit_uses_version_hint_on_non_lexical_store is thorough and validates the optimization effectively using throttled stores.
  • JSON parsing in parse_version_from_json is simple but adequate, since the format is controlled by the writer.

🤖 Generated with Claude Code

touch-of-grey added a commit to touch-of-grey/lance that referenced this pull request May 13, 2026
On object stores where listing is not lexicographically ordered (e.g. S3
Express, the local filesystem), resolving the latest manifest version is
O(n) in the number of versions. After every successful commit on such a
store, write a small JSON file `_versions/latest_version_hint.json`
(`{"version":N}`); readers use it as a starting point and probe a few
higher versions with HEAD requests (O(k), k = versions added since the
hint was written), falling back to a full listing if the hint is missing
(older datasets) or stale, or if a transient object-store error makes the
probed range untrustworthy.

The hint is written/read only on non-lexically-ordered stores — on S3
Standard / GCS / Azure / DynamoDB / memory the ordered listing already
resolves the latest version in roughly one request. The write is awaited
as part of the commit (no fire-and-forget mode) and is best-effort:
failures are logged and ignored, since the hint only accelerates reads
and never affects correctness. Detached versions are never hinted.

`current_manifest_path` uses the hint for non-lexically-ordered, non-local
stores (the local filesystem keeps its single-directory-read fast path);
`CommitHandler::list_manifest_locations_since` (used by
`load_new_transactions`) follows the same strategy, with the gap-fill
HEADs bounded by `io_parallelism()` and a fallback to a single paginated
listing once a reader is more than 1000 versions behind.

Carries on lance-format#5997 / discussion lance-format#5947, and follows up on lance-format#6728 where moving
S3 Express to a version hint was raised.
touch-of-grey added a commit to touch-of-grey/lance that referenced this pull request May 19, 2026
jackye1995 pushed a commit to touch-of-grey/lance that referenced this pull request May 19, 2026
jackye1995 added a commit that referenced this pull request May 19, 2026
Carries on #5997 (and the benchmarking in discussion #5947), and follows
up on #6728 where moving S3 Express away from O(n) manifest listing to a
version hint was raised — picking that up here.

## What

On object stores where `list` is **not** lexicographically ordered (e.g.
S3 Express, the local filesystem), resolving the latest manifest version
is O(n) in the number of versions. To avoid this, after every successful
commit on such a store we write a small JSON file
`_versions/latest_version_hint.json` with content `{"version":N}`. A
reader then does a GET on the hint file plus a few HEAD probes (O(k),
where k = versions added since the hint was written), and falls back to
a full listing if the hint is missing (older datasets) or stale.
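
A minimal sketch of writing and reading the `{"version":N}` payload using only the standard library. The function names are hypothetical, and the hand-rolled parser is an assumption made to keep the example dependency-free; it accepts only this one single-key shape and is not a general JSON parser.

```rust
/// Serializes the hint body exactly as described: {"version":N}.
fn format_version_hint(version: u64) -> String {
    format!("{{\"version\":{}}}", version)
}

/// Parses a hint body, tolerating surrounding whitespace.
/// Returns None for anything that is not a single "version" key with a
/// non-negative integer value, so a corrupt hint is treated as missing.
fn parse_version_hint(body: &str) -> Option<u64> {
    let inner = body.trim().strip_prefix('{')?.strip_suffix('}')?;
    let (key, value) = inner.split_once(':')?;
    if key.trim() != "\"version\"" {
        return None;
    }
    value.trim().parse::<u64>().ok()
}
```

Returning `None` on any malformed body fits the fallback described above: a reader that cannot parse the hint can behave as if the hint were missing and list instead.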

- The hint is written/read **only on non-lexically-ordered stores**. On
S3 Standard / GCS / Azure / OSS / Tencent / DynamoDB / memory the
ordered listing already resolves the latest version in roughly one
request, so the hint would only add a PUT per commit for nothing.
- `current_manifest_path` uses the hint for non-lexically-ordered,
non-local stores (the local filesystem keeps its existing
single-directory-read fast path);
`CommitHandler::list_manifest_locations_since` (used by
`load_new_transactions`) follows the same strategy.
- The hint write is **awaited** as part of the commit (no
fire-and-forget mode). It is best-effort: failures are logged and
ignored, since the hint only accelerates reads and never affects
correctness — readers always verify the hinted version and probe upward
from it. Detached versions are never written to the hint.
- A transient (non-`NotFound`) object-store error while probing abandons
the hint path so the caller falls back to a full listing rather than
trust a possibly-stale or incomplete result. The gap-fill HEADs are
bounded by `io_parallelism()`, and a far-behind reader (gap > 1000)
falls back to a single paginated listing.
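
The error handling in the last bullet can be sketched like this. A closure stands in for HEAD requests, the error type is reduced to two cases, and all names are hypothetical. The threshold of 1000 is the value stated above; how the real code measures the gap is an assumption.

```rust
#[derive(Debug, PartialEq)]
enum HeadError {
    /// The object definitely does not exist.
    NotFound,
    /// Timeout, throttling, or any other non-NotFound failure.
    Transient,
}

#[derive(Debug, PartialEq)]
enum HintOutcome {
    Latest(u64),
    FallBackToListing,
}

/// Verifies the hinted version, then probes upward one version at a time.
fn resolve_latest(head: impl Fn(u64) -> Result<(), HeadError>, hint: u64) -> HintOutcome {
    // A missing hinted version (stale hint) or a transient error both
    // mean the hint cannot be trusted.
    if head(hint).is_err() {
        return HintOutcome::FallBackToListing;
    }
    let mut latest = hint;
    loop {
        match head(latest + 1) {
            Ok(()) => latest += 1,
            // A clean NotFound marks the end of the version chain.
            Err(HeadError::NotFound) => return HintOutcome::Latest(latest),
            // Anything else leaves the probed range untrustworthy.
            Err(HeadError::Transient) => return HintOutcome::FallBackToListing,
        }
    }
}

/// Gap above which a single paginated listing replaces per-version HEADs.
const MAX_HEAD_GAP: u64 = 1000;

fn should_list_instead(since_version: u64, latest: u64) -> bool {
    latest.saturating_sub(since_version) > MAX_HEAD_GAP
}
```

Treating only `NotFound` as "no such version" is what keeps a timeout from being mistaken for the end of the chain.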

## Differences from #5997

- Only the JSON hint format is kept (the alternative file-size-encoded
format and its env var are dropped).
- The fire-and-forget / async hint-write mode is removed — the hint is
always written synchronously, which keeps concurrent writes simpler with
no meaningful latency cost.
- The hint is gated to non-lexically-ordered stores, where it's actually
read.
- `current_manifest_path` picks one strategy based on the store rather
than racing a HEAD-probe against a listing, keeping IO behavior
deterministic.
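
The store-based selection in the last bullet might look like the sketch below. The two boolean properties and all names are hypothetical; the mapping itself follows the description of `current_manifest_path` above.

```rust
#[derive(Debug, PartialEq)]
enum LatestVersionStrategy {
    /// Ordered listing resolves the latest version in roughly one request.
    OrderedListing,
    /// The local filesystem keeps its single-directory-read fast path.
    LocalDirectoryRead,
    /// Read the hint file, then probe upward with HEAD requests.
    VersionHint,
}

/// Hypothetical description of the store properties that matter here.
struct StoreTraits {
    lexically_ordered_list: bool,
    is_local: bool,
}

/// Picks exactly one strategy up front instead of racing two of them.
fn choose_strategy(store: &StoreTraits) -> LatestVersionStrategy {
    if store.lexically_ordered_list {
        LatestVersionStrategy::OrderedListing
    } else if store.is_local {
        LatestVersionStrategy::LocalDirectoryRead
    } else {
        LatestVersionStrategy::VersionHint
    }
}
```

Because the choice depends only on static properties of the store, the same dataset always produces the same sequence of requests.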

A `manifest_commit` benchmark is included to measure commit/load latency
growth with many small fragments.

Co-Authored-By: Jack Ye <yezhaoqin@gmail.com>
@jackye1995 jackye1995 closed this May 23, 2026
