exporter: fix snapshot GC race during image unpack #6558
Merged
tonistiigi merged 1 commit into moby:master on Mar 9, 2026
Use ApplyLayers instead of per-layer ApplyLayer loop to allow recursive parent rebuild when GC collects a parent snapshot between Stat and Prepare calls.

Pre-lease the top chain ID snapshot before calling ApplyLayers so that GC cannot collect it during the Stat shortcut path which does not add snapshots to the lease.

Fixes moby#6521

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
stasadev
approved these changes
Mar 6, 2026
stasadev
left a comment
Thank you very much @tonistiigi !
I confirm that this fix works. I ran the repro https://github.com/rfay/buildx-6521-repro for 30 minutes without any errors (it usually failed after ~3-5 minutes before).
I tested it on Arch-based Linux by patching the docker package with:
diff --git a/PKGBUILD b/PKGBUILD
index 1f05678..945c246 100644
--- a/PKGBUILD
+++ b/PKGBUILD
@@ -20,11 +20,13 @@ _TINI_COMMIT=de40ad007797e0dcd8b7126f27bb87401d224240
source=("git+https://github.com/docker/cli.git#tag=v$pkgver"
"git+https://github.com/moby/moby.git#tag=docker-v$pkgver"
"git+https://github.com/krallin/tini.git#commit=$_TINI_COMMIT"
- "$pkgname.sysusers")
+ "$pkgname.sysusers"
+ "https://patch-diff.githubusercontent.com/raw/moby/buildkit/pull/6558.patch")
sha256sums=('b3e8d552827376a3ac74ac48d6a31cba7d61d30330f562f0db833e62c739bcc2'
'654584d3b1b3a890a9af4aee9e23182d10f7c0b15bcff92d2d211a9892ed9e63'
'28a6641d508f60d47315efb3c85d97360188750a45bd6d3c8737d3f1a2b44121'
- '541826011a9836d05a2f42293d5f1beadf2ca8d89fb604487d61a013505678eb')
+ '541826011a9836d05a2f42293d5f1beadf2ca8d89fb604487d61a013505678eb'
+ 'SKIP')
# create a fake go path directory and pushd into it
# $1 real directory
@@ -40,6 +42,10 @@ _fake_gopath_popd() {
popd >/dev/null
}
+prepare() {
+ patch -Np1 -d moby/vendor/github.com/moby/buildkit < 6558.patch
+}
+
build() {
### check my mistakes on commit version
echo 'Checking commit mismatch'
I ran it for another 3 hours and everything was fine.
crazy-max
approved these changes
Mar 6, 2026
Use ApplyLayers instead of per layer ApplyLayer loop to allow recursive parent rebuild when GC collects a parent snapshot between Stat and Prepare calls.
Pre-lease the top chain ID snapshot before calling ApplyLayers so that GC cannot collect it during the Stat shortcut path, which does not add snapshots to the lease.
Fixes #6521
@rfay @stasadev @dmcg
Afaics, both containerd rootfs.ApplyLayer and rootfs.ApplyLayers are unsafe. The former can cause the "parent snapshot does not exist" error that was reported here, and the latter can return a chainid that may already be deleted by the time the function returns.