
perf(CDC): upload local/remote at the same time #11425

Merged
tyler-french merged 1 commit into master from tfrench/sametime
Mar 13, 2026

Conversation

Contributor

tyler-french commented Feb 26, 2026

From the task list: https://github.com/buildbuddy-io/buildbuddy-internal/issues/6426

This creates a new uploader. When enabled (via experiment), the chunk no longer needs to be stored locally before the remote upload begins; instead, we run the upload asynchronously, using a FindMissingBlobs (FMB) call per chunk and at most 8 concurrent batch uploads.

This avoids re-opening the file and starts the remote upload immediately:

goos: linux
goarch: amd64
cpu: AMD Ryzen 9 9950X3D 16-Core Processor          
                                      │ /tmp/write_chunked_off.txt │      /tmp/write_chunked_on.txt      │
                                      │           sec/op           │   sec/op     vs base                │
WriteChunkedWithDedup/overlap=100%-32                  64.63m ± 0%   64.47m ± 0%   -0.24% (p=0.015 n=10)
WriteChunkedWithDedup/overlap=75%-32                   84.08m ± 1%   80.99m ± 1%   -3.67% (p=0.000 n=10)
WriteChunkedWithDedup/overlap=50%-32                  100.90m ± 2%   85.99m ± 1%  -14.77% (p=0.000 n=10)
WriteChunkedWithDedup/overlap=25%-32                  104.59m ± 1%   86.84m ± 1%  -16.97% (p=0.000 n=10)
geomean                                                87.02m        79.02m        -9.19%

                                      │ /tmp/write_chunked_off.txt │       /tmp/write_chunked_on.txt       │
                                      │            B/op            │     B/op       vs base                │
WriteChunkedWithDedup/overlap=100%-32                 26.89Mi ± 5%   27.85Mi ±  7%        ~ (p=0.143 n=10)
WriteChunkedWithDedup/overlap=75%-32                  32.47Mi ± 8%   34.37Mi ±  9%        ~ (p=0.280 n=10)
WriteChunkedWithDedup/overlap=50%-32                  33.69Mi ± 9%   39.20Mi ± 10%  +16.34% (p=0.005 n=10)
WriteChunkedWithDedup/overlap=25%-32                  38.78Mi ± 9%   44.04Mi ±  6%  +13.58% (p=0.000 n=10)
geomean                                               32.68Mi        35.86Mi         +9.71%

                                      │ /tmp/write_chunked_off.txt │      /tmp/write_chunked_on.txt      │
                                      │         allocs/op          │  allocs/op   vs base                │
WriteChunkedWithDedup/overlap=100%-32                  4.497k ± 0%   4.880k ± 0%   +8.51% (p=0.000 n=10)
WriteChunkedWithDedup/overlap=75%-32                   8.204k ± 3%   6.487k ± 2%  -20.93% (p=0.000 n=10)
WriteChunkedWithDedup/overlap=50%-32                  10.986k ± 1%   7.420k ± 3%  -32.45% (p=0.000 n=10)
WriteChunkedWithDedup/overlap=25%-32                  13.352k ± 2%   8.547k ± 4%  -35.98% (p=0.000 n=10)
geomean                                                8.577k        6.694k       -21.96%
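The per-chunk flow can be sketched with stdlib primitives. This is a minimal illustration, not the actual ByteStreamServerProxy code: the types and the `remoteHas` map (standing in for a FindMissingBlobs call) are hypothetical. Each chunk is written locally as it is produced, while a bounded pool of goroutines checks remote presence and uploads only missing chunks.

```go
package main

import (
	"fmt"
	"sync"
)

// chunk stands in for a CDC chunk; digest is its content hash.
type chunk struct {
	digest string
	data   []byte
}

// uploadConcurrently writes each chunk locally as it is produced and, in
// parallel, checks a (simulated) FindMissingBlobs set and uploads only the
// missing chunks remotely. maxInFlight bounds concurrent remote uploads
// (the PR uses 8).
func uploadConcurrently(chunks []chunk, remoteHas map[string]bool, maxInFlight int) (localWrites, remoteUploads int) {
	var mu sync.Mutex
	var wg sync.WaitGroup
	sem := make(chan struct{}, maxInFlight) // bounded upload concurrency

	for _, c := range chunks {
		// The local write happens inline, as the chunk is produced.
		localWrites++

		wg.Add(1)
		go func(c chunk) {
			defer wg.Done()
			sem <- struct{}{} // acquire an upload slot
			defer func() { <-sem }()
			mu.Lock()
			defer mu.Unlock()
			if !remoteHas[c.digest] {
				remoteHas[c.digest] = true // remote now has this digest
				remoteUploads++
			}
		}(c)
	}
	wg.Wait()
	return localWrites, remoteUploads
}

func main() {
	chunks := []chunk{{"a", nil}, {"b", nil}, {"a", nil}, {"c", nil}}
	lw, ru := uploadConcurrently(chunks, map[string]bool{"b": true}, 8)
	fmt.Println(lw, ru) // 4 local writes; only "a" and "c" uploaded remotely
}
```

The key property is that the remote path never re-reads the chunk from local cache: the data is still in memory when the upload goroutine takes it.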


Copilot AI left a comment


Pull request overview

This PR updates the chunked write path in ByteStreamServerProxy to upload each chunk to local cache and remote cache concurrently as chunks are produced, using a per-chunk FindMissingBlobs call to skip remote uploads for chunks already present. This is intended to simplify the flow and avoid re-opening/reading chunk data from local cache when uploading to remote.

Changes:

  • Perform per-chunk parallel local write + remote (FindMissingBlobs + conditional upload) instead of “write all locally, then batch FMB + upload missing.”
  • Remove the configurable missing-chunk upload concurrency flag and the uploadMissingChunks / uploadChunk helpers.
  • Compute dedupe metrics during chunk processing rather than after a batch FindMissing response.


@tyler-french tyler-french force-pushed the tfrench/sametime branch 2 times, most recently from c82a647 to 38bada4 on February 26, 2026 19:12
@tyler-french tyler-french marked this pull request as draft February 26, 2026 19:17
@tyler-french tyler-french force-pushed the tfrench/sametime branch 6 times, most recently from 73a234a to 4ced674 on February 26, 2026 20:35
@tyler-french tyler-french marked this pull request as ready for review February 26, 2026 21:07
@tyler-french tyler-french force-pushed the tfrench/sametime branch 2 times, most recently from ee8864a to a4a2f63 on February 26, 2026 21:12
@tyler-french
Contributor Author

Having trouble seeing perf gains, so I'm going to hold off until I can.

@tyler-french tyler-french marked this pull request as draft February 26, 2026 21:40
tyler-french added a commit that referenced this pull request Feb 27, 2026
Want to get some metrics before #11425, to see if there are better ways to tune this.

This adds tracing and some other duration metrics.
@tyler-french tyler-french force-pushed the tfrench/sametime branch 6 times, most recently from 51cb9ba to 09220b9 on March 3, 2026 18:03
@tyler-french tyler-french force-pushed the tfrench/sametime branch 2 times, most recently from 5fb3cce to 53af603 on March 10, 2026 13:38
@tyler-french tyler-french changed the title from "perf: upload local/remote at the same time" to "perf(CDC): upload local/remote at the same time" on Mar 11, 2026
@tyler-french tyler-french marked this pull request as ready for review March 11, 2026 15:01
@tyler-french tyler-french requested a review from vanja-p March 11, 2026 15:05
@tyler-french tyler-french requested a review from Copilot March 11, 2026 17:35

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.



DigestFunction: repb.DigestFunction_BLAKE3,
})
require.NoError(t, err)
require.Greater(t, len(splitResp.GetChunkDigests()), *chunkUploadConcurrency)

Copilot AI Mar 11, 2026


This assertion assumes *chunkUploadConcurrency is smaller than the number of produced chunks; if the test binary is run with -cache_proxy.chunk_upload_concurrency set higher, it will fail unrelated to the batching logic. Set cache_proxy.chunk_upload_concurrency to a fixed value within the test (and/or choose input size based on that value) to avoid flag-dependent failures.

Suggested change:
- require.Greater(t, len(splitResp.GetChunkDigests()), *chunkUploadConcurrency)
+ chunkCount := len(splitResp.GetChunkDigests())
+ require.Greater(t, chunkCount, 0)
+ if *chunkUploadConcurrency >= chunkCount {
+ 	t.Skipf("test requires cache_proxy.chunk_upload_concurrency (%d) to be smaller than produced chunk count (%d)", *chunkUploadConcurrency, chunkCount)
+ }
+ require.Greater(t, chunkCount, *chunkUploadConcurrency)

Comment on lines 826 to 829
  poolBuf := s.bufPool.Get(chunking.MaxChunkSizeBytes())
  _, compressSpn := tracing.StartNamedSpan(chunkCtx, "CompressZstd")
- compressedData := compression.CompressZstd(compressBuf, chunkData)
+ compressedData := compression.CompressZstd(poolBuf, chunkData)
  compressSpn.End()

Copilot AI Mar 11, 2026


poolBuf is allocated at MaxChunkSizeBytes(), which equals the max input chunk size. Zstd can slightly expand data, and compression.CompressZstd will allocate a new buffer when dst is too small. In that case, the uploader still retains poolBuf (unused) until upload completion, increasing memory and partially defeating pooling. Consider sizing the buffer to the zstd max-encoded size (or detecting when CompressZstd allocates and returning poolBuf immediately / pooling the actual compressed buffer).
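One way to address this, sketched below under the assumption that the pooled buffer can be sized up front: compute zstd's worst-case compressed size (mirroring the ZSTD_COMPRESSBOUND formula from the zstd headers) and request that from the pool, so the destination is always large enough and CompressZstd never has to allocate an unpooled buffer for incompressible data.

```go
package main

import "fmt"

// zstdCompressBound mirrors zstd's ZSTD_COMPRESSBOUND macro: the worst-case
// compressed size for an input of srcSize bytes. Sizing the pooled buffer to
// this bound (rather than MaxChunkSizeBytes) guarantees the destination can
// hold the output even when zstd slightly expands the data.
func zstdCompressBound(srcSize int) int {
	margin := 0
	if srcSize < 128*1024 {
		// Small inputs get extra headroom, tapering off at 128 KiB.
		margin = (128*1024 - srcSize) >> 11
	}
	return srcSize + srcSize>>8 + margin
}

func main() {
	for _, n := range []int{0, 1024, 128 * 1024, 4 * 1024 * 1024} {
		fmt.Printf("src=%d bound=%d\n", n, zstdCompressBound(n))
	}
}
```

The call site would then become something like `s.bufPool.Get(zstdCompressBound(chunking.MaxChunkSizeBytes()))` (hypothetical; the real fix could equally detect when CompressZstd allocated a new buffer and return poolBuf to the pool immediately).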

Comment on lines +1048 to +1052
fmbG *errgroup.Group
fmbCtx context.Context
batchG *errgroup.Group
batchCtx context.Context


Copilot AI Mar 11, 2026


chunkUploader uses two separate errgroups/contexts (fmbCtx and batchCtx). As a result, an error in batch uploads won’t cancel in-flight / future FindMissingBlobs calls (and vice versa), and flush() may still wait for ongoing uploads even if an FMB error already makes the overall operation fail. Consider using a single shared cancelable context (or wiring cancellation between the two groups) so any error cancels all outstanding work and flush() can return promptly.

Comment on lines +231 to +233
sort.Ints(fmbSizes)
require.Equal(t, []int{1, *chunkUploadConcurrency}, fmbSizes)
require.Len(t, fmbDigests, len(uniqueChunks), "expected only unique digests to hit FindMissingBlobs")

Copilot AI Mar 11, 2026


This test’s expected FindMissingBlobs grouping depends on the global cache_proxy.chunk_upload_concurrency flag value; running tests with a different flag value (or if another test modifies the flag) can make the assertion fail even when the uploader is correct. Set cache_proxy.chunk_upload_concurrency explicitly in the test (and size uniqueChunks accordingly) to keep it deterministic.
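A sketch of the pinning the comment asks for, using only the stdlib flag package (the flag declaration here is a hypothetical stand-in; the real test would reuse the already-registered cache_proxy.chunk_upload_concurrency flag rather than declaring its own):

```go
package main

import (
	"flag"
	"fmt"
)

// Hypothetical mirror of the proxy's flag for illustration only.
var chunkUploadConcurrency = flag.Int("cache_proxy.chunk_upload_concurrency", 8, "max concurrent chunk uploads")

// pinConcurrency fixes the flag to a known value so assertions on FMB batch
// sizes don't depend on how the test binary was invoked.
func pinConcurrency(value string) error {
	return flag.Set("cache_proxy.chunk_upload_concurrency", value)
}

func main() {
	if err := pinConcurrency("4"); err != nil {
		panic(err)
	}
	fmt.Println(*chunkUploadConcurrency) // 4
}
```

Restoring the previous value in t.Cleanup would keep the pinned flag from leaking into other tests in the same binary.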

@tyler-french tyler-french merged commit 0fa7af4 into master Mar 13, 2026
13 checks passed
@tyler-french tyler-french deleted the tfrench/sametime branch March 13, 2026 01:58