
perf(CDC): upload local/remote at the same time #11425

Merged
tyler-french merged 1 commit into master from tfrench/sametime
Mar 13, 2026

Conversation

Contributor

tyler-french commented Feb 26, 2026

From the task list: https://github.com/buildbuddy-io/buildbuddy-internal/issues/6426

This creates a new uploader. When enabled (via experiment), the chunk no longer needs to be stored locally before the remote upload begins; instead, we run the upload asynchronously, using a FindMissingBlobs (FMB) call per chunk and at most 8 concurrent batch uploads.

This avoids re-opening the file and starts the remote upload immediately:

goos: linux
goarch: amd64
cpu: AMD Ryzen 9 9950X3D 16-Core Processor          
                                      │ /tmp/write_chunked_off.txt │      /tmp/write_chunked_on.txt      │
                                      │           sec/op           │   sec/op     vs base                │
WriteChunkedWithDedup/overlap=100%-32                  64.63m ± 0%   64.47m ± 0%   -0.24% (p=0.015 n=10)
WriteChunkedWithDedup/overlap=75%-32                   84.08m ± 1%   80.99m ± 1%   -3.67% (p=0.000 n=10)
WriteChunkedWithDedup/overlap=50%-32                  100.90m ± 2%   85.99m ± 1%  -14.77% (p=0.000 n=10)
WriteChunkedWithDedup/overlap=25%-32                  104.59m ± 1%   86.84m ± 1%  -16.97% (p=0.000 n=10)
geomean                                                87.02m        79.02m        -9.19%

                                      │ /tmp/write_chunked_off.txt │       /tmp/write_chunked_on.txt       │
                                      │            B/op            │     B/op       vs base                │
WriteChunkedWithDedup/overlap=100%-32                 26.89Mi ± 5%   27.85Mi ±  7%        ~ (p=0.143 n=10)
WriteChunkedWithDedup/overlap=75%-32                  32.47Mi ± 8%   34.37Mi ±  9%        ~ (p=0.280 n=10)
WriteChunkedWithDedup/overlap=50%-32                  33.69Mi ± 9%   39.20Mi ± 10%  +16.34% (p=0.005 n=10)
WriteChunkedWithDedup/overlap=25%-32                  38.78Mi ± 9%   44.04Mi ±  6%  +13.58% (p=0.000 n=10)
geomean                                               32.68Mi        35.86Mi         +9.71%

                                      │ /tmp/write_chunked_off.txt │      /tmp/write_chunked_on.txt      │
                                      │         allocs/op          │  allocs/op   vs base                │
WriteChunkedWithDedup/overlap=100%-32                  4.497k ± 0%   4.880k ± 0%   +8.51% (p=0.000 n=10)
WriteChunkedWithDedup/overlap=75%-32                   8.204k ± 3%   6.487k ± 2%  -20.93% (p=0.000 n=10)
WriteChunkedWithDedup/overlap=50%-32                  10.986k ± 1%   7.420k ± 3%  -32.45% (p=0.000 n=10)
WriteChunkedWithDedup/overlap=25%-32                  13.352k ± 2%   8.547k ± 4%  -35.98% (p=0.000 n=10)
geomean                                                8.577k        6.694k       -21.96%
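The per-chunk flow can be sketched with stdlib primitives. This is a minimal illustration, not the actual ByteStreamServerProxy code: the types and the `remoteHas` map (standing in for a FindMissingBlobs call) are hypothetical. Each chunk is written locally as it is produced, while a bounded pool of goroutines checks remote presence and uploads only missing chunks.

```go
package main

import (
	"fmt"
	"sync"
)

// chunk stands in for a CDC chunk; digest is its content hash.
type chunk struct {
	digest string
	data   []byte
}

// uploadConcurrently writes each chunk locally as it is produced and, in
// parallel, checks a (simulated) FindMissingBlobs set and uploads only the
// missing chunks remotely. maxInFlight bounds concurrent remote uploads
// (the PR uses 8).
func uploadConcurrently(chunks []chunk, remoteHas map[string]bool, maxInFlight int) (localWrites, remoteUploads int) {
	var mu sync.Mutex
	var wg sync.WaitGroup
	sem := make(chan struct{}, maxInFlight) // bounded upload concurrency

	for _, c := range chunks {
		// The local write happens inline, as the chunk is produced.
		localWrites++

		wg.Add(1)
		go func(c chunk) {
			defer wg.Done()
			sem <- struct{}{} // acquire an upload slot
			defer func() { <-sem }()
			mu.Lock()
			defer mu.Unlock()
			if !remoteHas[c.digest] {
				remoteHas[c.digest] = true // remote now has this digest
				remoteUploads++
			}
		}(c)
	}
	wg.Wait()
	return localWrites, remoteUploads
}

func main() {
	chunks := []chunk{{"a", nil}, {"b", nil}, {"a", nil}, {"c", nil}}
	lw, ru := uploadConcurrently(chunks, map[string]bool{"b": true}, 8)
	fmt.Println(lw, ru) // 4 local writes; only "a" and "c" uploaded remotely
}
```

The key property is that the remote path never re-reads the chunk from local cache: the data is still in memory when the upload goroutine takes it.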


Copilot AI left a comment


Pull request overview

This PR updates the chunked write path in ByteStreamServerProxy to upload each chunk to local cache and remote cache concurrently as chunks are produced, using a per-chunk FindMissingBlobs call to skip remote uploads for chunks already present. This is intended to simplify the flow and avoid re-opening/reading chunk data from local cache when uploading to remote.

Changes:

  • Perform per-chunk parallel local write + remote (FindMissingBlobs + conditional upload) instead of “write all locally, then batch FMB + upload missing.”
  • Remove the configurable missing-chunk upload concurrency flag and the uploadMissingChunks / uploadChunk helpers.
  • Compute dedupe metrics during chunk processing rather than after a batch FindMissing response.


@tyler-french tyler-french force-pushed the tfrench/sametime branch 2 times, most recently from c82a647 to 38bada4 on February 26, 2026 19:12
@tyler-french tyler-french marked this pull request as draft February 26, 2026 19:17
@tyler-french tyler-french force-pushed the tfrench/sametime branch 6 times, most recently from 73a234a to 4ced674 on February 26, 2026 20:35
@tyler-french tyler-french marked this pull request as ready for review February 26, 2026 21:07
@tyler-french tyler-french force-pushed the tfrench/sametime branch 2 times, most recently from ee8864a to a4a2f63 on February 26, 2026 21:12
@tyler-french
Contributor Author

Having trouble seeing perf gains, so I'm going to hold off until I can.

@tyler-french tyler-french marked this pull request as draft February 26, 2026 21:40
tyler-french added a commit that referenced this pull request Feb 27, 2026
Want to get some metrics before #11425, to see if there are better ways to tune this.

This adds tracing and some other duration metrics.
@tyler-french tyler-french force-pushed the tfrench/sametime branch 6 times, most recently from 51cb9ba to 09220b9 on March 3, 2026 18:03
@tyler-french tyler-french force-pushed the tfrench/sametime branch 2 times, most recently from 5fb3cce to 53af603 on March 10, 2026 13:38
@tyler-french tyler-french changed the title from "perf: upload local/remote at the same time" to "perf(CDC): upload local/remote at the same time" on Mar 11, 2026
@tyler-french tyler-french marked this pull request as ready for review March 11, 2026 15:01
@tyler-french tyler-french requested a review from vanja-p March 11, 2026 15:05
@tyler-french tyler-french requested a review from Copilot March 11, 2026 17:35

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.



DigestFunction: repb.DigestFunction_BLAKE3,
})
require.NoError(t, err)
require.Greater(t, len(splitResp.GetChunkDigests()), *chunkUploadConcurrency)

Copilot AI Mar 11, 2026


This assertion assumes *chunkUploadConcurrency is smaller than the number of produced chunks; if the test binary is run with -cache_proxy.chunk_upload_concurrency set higher, it will fail unrelated to the batching logic. Set cache_proxy.chunk_upload_concurrency to a fixed value within the test (and/or choose input size based on that value) to avoid flag-dependent failures.

Suggested change:
- require.Greater(t, len(splitResp.GetChunkDigests()), *chunkUploadConcurrency)
+ chunkCount := len(splitResp.GetChunkDigests())
+ require.Greater(t, chunkCount, 0)
+ if *chunkUploadConcurrency >= chunkCount {
+ 	t.Skipf("test requires cache_proxy.chunk_upload_concurrency (%d) to be smaller than produced chunk count (%d)", *chunkUploadConcurrency, chunkCount)
+ }
+ require.Greater(t, chunkCount, *chunkUploadConcurrency)

Comment on lines 826 to 829
  poolBuf := s.bufPool.Get(chunking.MaxChunkSizeBytes())
  _, compressSpn := tracing.StartNamedSpan(chunkCtx, "CompressZstd")
- compressedData := compression.CompressZstd(compressBuf, chunkData)
+ compressedData := compression.CompressZstd(poolBuf, chunkData)
  compressSpn.End()

Copilot AI Mar 11, 2026


poolBuf is allocated at MaxChunkSizeBytes(), which equals the max input chunk size. Zstd can slightly expand data, and compression.CompressZstd will allocate a new buffer when dst is too small. In that case, the uploader still retains poolBuf (unused) until upload completion, increasing memory and partially defeating pooling. Consider sizing the buffer to the zstd max-encoded size (or detecting when CompressZstd allocates and returning poolBuf immediately / pooling the actual compressed buffer).
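One way to address this, sketched below under the assumption that the pooled buffer can be sized up front: compute zstd's worst-case compressed size (mirroring the ZSTD_COMPRESSBOUND formula from the zstd headers) and request that from the pool, so the destination is always large enough and CompressZstd never has to allocate an unpooled buffer for incompressible data.

```go
package main

import "fmt"

// zstdCompressBound mirrors zstd's ZSTD_COMPRESSBOUND macro: the worst-case
// compressed size for an input of srcSize bytes. Sizing the pooled buffer to
// this bound (rather than MaxChunkSizeBytes) guarantees the destination can
// hold the output even when zstd slightly expands the data.
func zstdCompressBound(srcSize int) int {
	margin := 0
	if srcSize < 128*1024 {
		// Small inputs get extra headroom, tapering off at 128 KiB.
		margin = (128*1024 - srcSize) >> 11
	}
	return srcSize + srcSize>>8 + margin
}

func main() {
	for _, n := range []int{0, 1024, 128 * 1024, 4 * 1024 * 1024} {
		fmt.Printf("src=%d bound=%d\n", n, zstdCompressBound(n))
	}
}
```

The call site would then become something like `s.bufPool.Get(zstdCompressBound(chunking.MaxChunkSizeBytes()))` (hypothetical; the real fix could equally detect when CompressZstd allocated a new buffer and return poolBuf to the pool immediately).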

Comment on lines +1048 to +1052
fmbG *errgroup.Group
fmbCtx context.Context
batchG *errgroup.Group
batchCtx context.Context


Copilot AI Mar 11, 2026


chunkUploader uses two separate errgroups/contexts (fmbCtx and batchCtx). As a result, an error in batch uploads won’t cancel in-flight / future FindMissingBlobs calls (and vice versa), and flush() may still wait for ongoing uploads even if an FMB error already makes the overall operation fail. Consider using a single shared cancelable context (or wiring cancellation between the two groups) so any error cancels all outstanding work and flush() can return promptly.

Comment on lines +231 to +233
sort.Ints(fmbSizes)
require.Equal(t, []int{1, *chunkUploadConcurrency}, fmbSizes)
require.Len(t, fmbDigests, len(uniqueChunks), "expected only unique digests to hit FindMissingBlobs")

Copilot AI Mar 11, 2026


This test’s expected FindMissingBlobs grouping depends on the global cache_proxy.chunk_upload_concurrency flag value; running tests with a different flag value (or if another test modifies the flag) can make the assertion fail even when the uploader is correct. Set cache_proxy.chunk_upload_concurrency explicitly in the test (and size uniqueChunks accordingly) to keep it deterministic.
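A sketch of the pinning the comment asks for, using only the stdlib flag package (the flag declaration here is a hypothetical stand-in; the real test would reuse the already-registered cache_proxy.chunk_upload_concurrency flag rather than declaring its own):

```go
package main

import (
	"flag"
	"fmt"
)

// Hypothetical mirror of the proxy's flag for illustration only.
var chunkUploadConcurrency = flag.Int("cache_proxy.chunk_upload_concurrency", 8, "max concurrent chunk uploads")

// pinConcurrency fixes the flag to a known value so assertions on FMB batch
// sizes don't depend on how the test binary was invoked.
func pinConcurrency(value string) error {
	return flag.Set("cache_proxy.chunk_upload_concurrency", value)
}

func main() {
	if err := pinConcurrency("4"); err != nil {
		panic(err)
	}
	fmt.Println(*chunkUploadConcurrency) // 4
}
```

Restoring the previous value in t.Cleanup would keep the pinned flag from leaking into other tests in the same binary.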

@tyler-french tyler-french merged commit 0fa7af4 into master Mar 13, 2026
13 checks passed
@tyler-french tyler-french deleted the tfrench/sametime branch March 13, 2026 01:58