
Support remote cache CDC #28437

Closed
tyler-french wants to merge 1 commit into bazelbuild:master from tyler-french:tfrench/chunked-remote-cache

@tyler-french
Contributor

@tyler-french tyler-french commented Jan 26, 2026

TLDR: This PR enables support for content-defined chunking (FastCDC) for large uploads/downloads to remote cache, saving ~40% on storage and upload bandwidth, and making builds faster by deduplicating similar artifacts across builds.

RELNOTES[NEW]: Added --experimental_remote_cache_chunking flag to read and write large blobs to/from the remote cache in chunks. Requires server support.

Motivation

Actions like GoLink and CppLink produce very large output files that are often similar between builds. A small source change can cause a cache miss, wasting storage, bandwidth, and time on nearly-identical artifacts.

Content-Defined Chunking (CDC) addresses this by splitting files at content-determined cut points. Because cut points are derived from the file content itself, small changes, even ones that shift bytes around, tend to affect only a few chunks. This makes action outputs effectively incremental: even though the action must re-run, the upload, download, and storage costs shrink dramatically.
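
To make the shift-resistance concrete, here is a minimal sketch of content-defined chunking. It is a simplified gear-hash chunker written for illustration, not FastCDC itself and not the implementation in this PR; the size limits, the mask, and the gear table are all assumed values.

```python
import hashlib
import random

# Illustrative parameters only; not the values Bazel or any server uses.
MIN_SIZE, MAX_SIZE = 64, 1024
MASK = (1 << 8) - 1  # cut when the low 8 hash bits are zero

# Deterministic pseudo-random "gear" table, one 32-bit value per byte value.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes) -> list[bytes]:
    """Split data at content-defined cut points using a rolling gear hash."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # The low bits of h depend only on the most recent bytes.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if size < MIN_SIZE:
            continue
        if (h & MASK) == 0 or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def digests(data: bytes) -> set[str]:
    return {hashlib.sha256(c).hexdigest() for c in chunk(data)}


original = random.Random(1).randbytes(64 * 1024)
# Insert 10 bytes near the front: every later byte shifts position.
edited = original[:1000] + b"0123456789" + original[1000:]

old, new = digests(original), digests(edited)
print(f"{len(new & old)} of {len(new)} chunks are unchanged after the edit")
```

Because a cut depends only on the bytes just before it, the chunker falls back onto the old boundaries shortly after the inserted bytes, so most chunk digests match again. A fixed-size splitter would shift every later boundary and share almost nothing.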

Results

Benchmarked across the last 50 commits of the BuildBuddy repo (server and client on the same host):

| Scenario | Upload | Download | RPCs | Disk Cache | Avg Build Time |
|---|---|---|---|---|---|
| chunking + disk cache | 52.0 GB | 0 B | 626K | 146.6 GB | 55s |
| chunking, no disk cache | 49.2 GB | 343.2 GB | 4.1M | - | 54s |
| no chunking + disk cache | 85.6 GB | 0 B | 273K | 246.5 GB | 100s |
| no chunking, no disk cache | 89.7 GB | 343.8 GB | 2.5M | - | 97s |

Key takeaways:

  • ~40% less data uploaded (52.0 GB vs 85.6 GB with a disk cache; 49.2 GB vs 89.7 GB without)
  • ~40% smaller disk cache (147 GB vs 247 GB)
  • Download size is essentially unchanged (343.2 GB vs 343.8 GB, a difference of ~0.2%) because we don't yet store downloaded chunks in the output base. Using a disk cache is recommended for full benefit; output-base chunk reuse is planned.
  • RPC count increases as expected since requests become smaller and more granular (626K vs 273K with a disk cache; 4.1M vs 2.5M without).
  • Shorter average build time (55s vs 100s with a disk cache; 54s vs 97s without). The gain depends on conditions such as cache async settings, compression, and network speed.
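
For readers who want to check the percentages, the arithmetic can be reproduced from the benchmark figures. This is only a quick sketch: the numbers are copied from the results above, and the helper name is made up for illustration.

```python
# Figures copied from the benchmark results: (with chunking, without chunking).
measurements = {
    "upload GB, disk cache": (52.0, 85.6),
    "upload GB, no disk cache": (49.2, 89.7),
    "disk cache GB": (146.6, 246.5),
    "download GB, no disk cache": (343.2, 343.8),
    "avg build s, disk cache": (55, 100),
    "avg build s, no disk cache": (54, 97),
}


def reduction_percent(chunked: float, baseline: float) -> float:
    """Relative reduction of the chunked figure versus the baseline."""
    return 100.0 * (1.0 - chunked / baseline)


for name, (chunked, baseline) in measurements.items():
    pct = reduction_percent(chunked, baseline)
    print(f"{name}: {pct:.1f}% lower with chunking")
```

With these inputs the upload reduction comes out at about 39% with a disk cache and about 45% without, and the disk cache is about 40% smaller, which matches the rounded "~40%" figures.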

Additional benefits: better load balancing across distributed clusters (fewer long-running RPCs) and more granular retries on unstable networks.

Try It Out

Anyone can try chunking today using BuildBuddy:

  1. Sign up for a free account at buildbuddy.io
  2. Get an API key with write access
  3. Use the Bazel fork from [9.1.0] Support remote cache CDC #28903
  4. Build!
```
USE_BAZEL_VERSION="tyler-french/9.1.0-cdc" bazel build //... \
  --experimental_remote_cache_chunking \
  --remote_header=x-buildbuddy-cdc-enabled=true \
  --remote_cache=grpcs://remote.buildbuddy.io
```

How It Works

Write path:

  1. Check if blob exceeds the chunking threshold.
  2. Run FastCDC to compute chunk boundaries.
  3. Call FindMissingBlobs to identify which chunks the server already has.
  4. Upload only the missing chunks.
  5. Call SpliceBlob to register the blob-to-chunks mapping on the server.
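
The five write-path steps can be sketched against an in-memory stand-in for the cache. This is a rough illustration, not the code in this PR: `FakeCas`, `write_blob`, `fixed_split`, and the threshold are hypothetical, and a fixed-size splitter is swapped in for FastCDC to keep the example short. Only the RPC names FindMissingBlobs and SpliceBlob come from the description above.

```python
import hashlib
import random


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class FakeCas:
    """In-memory stand-in for a remote cache; not a real client."""

    def __init__(self):
        self.blobs = {}        # chunk or blob digest -> bytes
        self.manifests = {}    # blob digest -> ordered chunk digests
        self.bytes_uploaded = 0

    def find_missing_blobs(self, digests):
        return [d for d in digests if d not in self.blobs]

    def upload(self, data):
        self.blobs[digest(data)] = data
        self.bytes_uploaded += len(data)

    def splice_blob(self, blob_digest, chunk_digests):
        self.manifests[blob_digest] = list(chunk_digests)


def fixed_split(data, size=1024):
    # Stand-in for FastCDC: fixed-size pieces, used only to keep this short.
    return [data[i:i + size] for i in range(0, len(data), size)]


def write_blob(cas, data, split, threshold):
    if len(data) <= threshold:                      # 1. below threshold: plain upload
        cas.upload(data)
        return
    chunks = split(data)                            # 2. compute chunk boundaries
    by_digest = {digest(c): c for c in chunks}
    missing = cas.find_missing_blobs(list(by_digest))   # 3. ask what is missing
    for d in missing:                               # 4. upload only missing chunks
        cas.upload(by_digest[d])
    cas.splice_blob(digest(data), [digest(c) for c in chunks])  # 5. register mapping


cas = FakeCas()
v1 = random.Random(0).randbytes(8192)
v2 = v1[:-1] + bytes([v1[-1] ^ 1])   # same content except the final byte
write_blob(cas, v1, fixed_split, threshold=4096)
first_upload = cas.bytes_uploaded
write_blob(cas, v2, fixed_split, threshold=4096)
print(first_upload, cas.bytes_uploaded - first_upload)  # prints "8192 1024"
```

The second write uploads a single 1024-byte chunk even though the blob digest changed, which is the effect the write path is after.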

Read path:

  1. Check if blob exceeds the chunking threshold.
  2. Call SplitBlob to get the chunk list for this blob.
  3. Download and reassemble the chunks.

If --disk_cache is enabled, previously downloaded chunks are served locally.
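
A comparable sketch of the read path, including local chunk reuse, might look like the following. Everything here is a hypothetical stand-in (`FakeRemote`, `read_blob`, the dictionary playing the role of the disk cache, the threshold), and the final digest check is an assumption added for the example rather than something the description states.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class FakeRemote:
    """In-memory stand-in for a chunk-aware remote cache; not a real client."""

    def __init__(self):
        self.blobs = {}       # digest -> bytes
        self.manifests = {}   # blob digest -> ordered chunk digests
        self.downloads = 0

    def put_chunked(self, chunks):
        # Helper that seeds the fake server with an already-chunked blob.
        for c in chunks:
            self.blobs[digest(c)] = c
        blob_digest = digest(b"".join(chunks))
        self.manifests[blob_digest] = [digest(c) for c in chunks]
        return blob_digest

    def split_blob(self, blob_digest):
        return self.manifests[blob_digest]

    def download(self, d):
        self.downloads += 1
        return self.blobs[d]


def read_blob(remote, blob_digest, size, local_chunks, threshold):
    if size <= threshold:                            # 1. small blobs: plain download
        return remote.download(blob_digest)
    parts = []
    for d in remote.split_blob(blob_digest):         # 2. get the chunk list
        if d not in local_chunks:                    # serve known chunks locally
            local_chunks[d] = remote.download(d)
        parts.append(local_chunks[d])
    data = b"".join(parts)                           # 3. reassemble
    if digest(data) != blob_digest:                  # assumed sanity check
        raise ValueError("reassembled blob does not match its digest")
    return data


remote = FakeRemote()
shared = [b"a" * 100, b"b" * 100]
first = remote.put_chunked(shared + [b"c" * 100])
second = remote.put_chunked(shared + [b"d" * 100])

local = {}  # plays the role of the disk cache
blob1 = read_blob(remote, first, 300, local, threshold=128)
after_first = remote.downloads
blob2 = read_blob(remote, second, 300, local, threshold=128)
print(after_first, remote.downloads)  # prints "3 4"
```

The second blob needs only one extra download because two of its three chunks are already held locally.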

@tyler-french tyler-french force-pushed the tfrench/chunked-remote-cache branch from 8a45f14 to dbc1af6 Compare January 27, 2026 04:17
@tyler-french tyler-french force-pushed the tfrench/chunked-remote-cache branch 11 times, most recently from 789ab23 to 4349030 Compare January 28, 2026 06:00
@tyler-french tyler-french force-pushed the tfrench/chunked-remote-cache branch 12 times, most recently from 6e3f676 to e795b34 Compare February 3, 2026 04:40
@tyler-french tyler-french changed the title from "PROTOTYPE/WIP: support CDC" to "WIP: Support cache chunking" Feb 3, 2026
@tyler-french tyler-french changed the title from "WIP: Support cache chunking" to "WIP: Support cache chunking with FastCDC" Feb 3, 2026
@tyler-french tyler-french force-pushed the tfrench/chunked-remote-cache branch from e795b34 to e02742a Compare February 3, 2026 04:56
@tyler-french tyler-french force-pushed the tfrench/chunked-remote-cache branch from 51d119f to aaeb1b9 Compare February 20, 2026 19:35
@tyler-french tyler-french force-pushed the tfrench/chunked-remote-cache branch 2 times, most recently from 9d508a6 to 23b5f14 Compare February 22, 2026 01:41
tyler-french added a commit to buildbuddy-io/buildbuddy that referenced this pull request Feb 23, 2026
This is needed to run `bb print` to see Split/Splice calls, to test
bazelbuild/bazel#28437

`bb print` just reads the gRPC log directly from what's in this file, so
updating this file is sufficient.
@tyler-french
Contributor Author

@tjgq @fmeum If possible (depends on timing), I would like to include this in 8 and 9 LTS versions. It seems like 8.6 is close, so maybe we can target 8.7.

It doesn't patch cleanly on 8.x, so I'll need to create a separate PR once it's ready. Please let me know if you'd like me to do that.

Thanks again for all the help with this!

tyler-french added a commit to buildbuddy-io/buildbuddy that referenced this pull request Mar 2, 2026
Now anyone can try out chunking using Buildbuddy to test:
1. Sign up for a trial/free account at https://www.buildbuddy.io/
2. Get a token with write access
3. Use Bazel Fork from bazelbuild/bazel#28437
4. Build!
```
USE_BAZEL_VERSION="tyler-french/9.1.0-cdc" bazel build //... \
  --disk_cache= \
  --experimental_remote_cache_chunking \
  --remote_header=x-buildbuddy-cdc-enabled=true \
  --check_direct_dependencies=off \
  --remote_cache=grpcs://remote.buildbuddy.io
```
@tyler-french tyler-french force-pushed the tfrench/chunked-remote-cache branch 4 times, most recently from 9da9f90 to d84edef Compare March 4, 2026 18:25
@tyler-french tyler-french requested a review from tjgq March 4, 2026 19:54
@tyler-french
Contributor Author

@tjgq I have seen some flakiness on Windows for:

//src/test/java/com/google/devtools/build/lib/authandtls/credentialhelper:credentialhelper FAILED in 3 out of 3 in 51.7s

@tjgq
Contributor

tjgq commented Mar 5, 2026

> @tjgq I have seen some flakiness on Windows for:
>
> //src/test/java/com/google/devtools/build/lib/authandtls/credentialhelper:credentialhelper FAILED in 3 out of 3 in 51.7s

Thanks for the heads up - I'll see what I can do about it.

Contributor

@tjgq tjgq left a comment

LGTM. I'll import this one myself.

@tyler-french
Contributor Author

Rebased again

@iancha1992
Member

@bazel-io fork 8.7.0

@iancha1992
Member

@bazel-io fork 9.1.0

@tjgq
Contributor

tjgq commented Mar 17, 2026

This will be submitted momentarily, but with some changes relative to the state of this PR.

The most important modification is that ChunkingConfig/ChunkedDownloader/ChunkedUploader are now instantiated lazily on first use by the CombinedCache, to avoid blocking on the server capabilities on startup. The rest is linter appeasement (static imports, missing @Nullable annotations, Cdc instead of CDC in class names).
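
The lazy-instantiation idea can be sketched generically as follows. This is not the Bazel code: `Lazy` and `fetch_chunking_config` are hypothetical names, and the factory merely stands in for whatever blocking capabilities request the real classes depend on.

```python
import threading

_UNSET = object()


class Lazy:
    """Builds a value on first use, at most once, even with several threads."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._value = _UNSET

    def get(self):
        if self._value is _UNSET:
            with self._lock:
                if self._value is _UNSET:  # re-check after taking the lock
                    self._value = self._factory()
        return self._value


calls = []


def fetch_chunking_config():
    # Hypothetical stand-in for a blocking server capabilities round trip.
    calls.append("fetch")
    return {"chunking_supported": True}


config = Lazy(fetch_chunking_config)
startup_calls = len(calls)   # constructing the holder does not contact the server
first = config.get()
second = config.get()
print(startup_calls, len(calls), first is second)  # prints "0 1 True"
```

Startup stays cheap because the expensive call is deferred until some caller actually needs chunking, and later callers reuse the same instance.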

We also noticed two opportunities for optimization (as a followup):

  1. uploadFile should probably go through casUploadCache in the chunked case, to avoid network I/O entirely in the case where all chunks are known to have already been uploaded
  2. downloadAndReassembleChunks could download blobs in parallel
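
Both follow-up ideas can be illustrated in a few lines. This is only a sketch under assumptions: the function names, the thread pool, and the `known_uploaded` set are made up for the example and do not describe how `casUploadCache` or `downloadAndReassembleChunks` are actually implemented.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def download_and_reassemble(fetch_chunk, chunk_digests, max_workers=8):
    """Fetch chunks concurrently; Executor.map yields results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return b"".join(pool.map(fetch_chunk, chunk_digests))


def chunks_to_upload(chunk_digests, known_uploaded):
    """Digests not yet recorded as uploaded; an empty list means no RPC is needed."""
    return [d for d in chunk_digests if d not in known_uploaded]


chunks = [b"alpha", b"beta", b"gamma"]
store = {digest(c): c for c in chunks}   # stand-in for the remote cache
order = [digest(c) for c in chunks]

blob = download_and_reassemble(store.__getitem__, order)
pending = chunks_to_upload(order, known_uploaded=set(order))
print(blob, pending)  # prints "b'alphabetagamma' []"
```

Ordering matters for reassembly, which is why the sketch relies on `map` returning results in the order of its inputs rather than in completion order.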


Labels

team-Remote-Exec Issues and PRs for the Execution (Remote) team

5 participants