
gateway: fix exec process lifecycle ordering #6531

Merged
tonistiigi merged 2 commits into moby:master from tonistiigi:exec-order-fix on Mar 20, 2026

Conversation

@tonistiigi
Member

Send Started before any async Exit/Done paths run, to preserve protocol order. Close all tracked processIO pipe endpoints during Close so that pio.done can always drain, avoiding hangs in gateway exec teardown.

Hope this fixes some flakiness/hangs we sometimes see in exec tests in CI.

        }
    }
    for fd, w := range pio.processWriters {
        delete(pio.processWriters, fd)
Member


https://github.com/moby/buildkit/actions/runs/22342178945/job/64648494627#step:8:1223

    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|<--- NewContainer 5csf58g7gk7jgcoi687dpr6k2" spanID=3af04cd4fa4357d3 traceID=5f061102d7b4ceebbd34d62ee276bcb5
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|<--- Init Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="Starting new container for 5csf58g7gk7jgcoi687dpr6k2 with args: [\"sh\"]"
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="returning network namespace it86wk5ub34v0a8u7gefehkyd from pool"
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> Started Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> File Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd, fd=1, 2 bytes" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|<--- Init Message 5csf58g7gk7jgcoi687dpr6k2:j1n4pirkltsmzt7frqmipuoo6" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> Started Message 5csf58g7gk7jgcoi687dpr6k2:j1n4pirkltsmzt7frqmipuoo6" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="Execing into container 5csf58g7gk7jgcoi687dpr6k2 with args: [\"cat\" \"/data\"]"
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> File Message 5csf58g7gk7jgcoi687dpr6k2:j1n4pirkltsmzt7frqmipuoo6, fd=1, 26 bytes" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> Exit Message 5csf58g7gk7jgcoi687dpr6k2:j1n4pirkltsmzt7frqmipuoo6, code=0, error=%!s(<nil>)" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> Done Message 5csf58g7gk7jgcoi687dpr6k2:j1n4pirkltsmzt7frqmipuoo6" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> File Message 5csf58g7gk7jgcoi687dpr6k2:j1n4pirkltsmzt7frqmipuoo6, fd=1, EOF" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|<--- File Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd, fd=0, 7 bytes" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> File Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd, fd=1, 8 bytes" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> Exit Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd, code=0, error=%!s(<nil>)" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> Done Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> File Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd, fd=2, EOF" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> File Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd, fd=1, EOF" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|<--- ReleaseContainer 5csf58g7gk7jgcoi687dpr6k2" spanID=8b5a1eae515c69eb traceID=865855f6e5868780bffbcc583d63ed69
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|<--- File Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd, fd=0, EOF" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="returning network namespace it86wk5ub34v0a8u7gefehkyd from pool" span="sh -c echo cyo5arhgx70su9rgnxqtv28vb > /data && echo cyo5arhgx70su9rgnxqtv28vb > /rw/data && fail" spanID=99e8311a5093af6f traceID=0cfd195b1cbac41cc54c8addd9374e68

From the logs above, this seems related to Done being emitted before all per-fd EOF messages for a process. The next Init then arrives, followed by the hang.

Pushed an extra commit so that the writers are closed but remain tracked until the output goroutines send EOF and remove them.

Member


Still deadlocks

@tonistiigi tonistiigi closed this Feb 24, 2026

The proc.Wait goroutine could send Exit and call
pio.Close() before the Started message was sent and
output reader goroutines were spawned. pio.Close()
deletes serverReaders, causing the range loop to see
an empty map, so processWriters were never cleaned up,
pio.done never closed, and the client hung waiting for
the Done message.

Gate the proc.Wait goroutine on a startedSent channel
that is closed after Started is sent and output readers
are set up.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
The Close() method called Send() which could race with msgs channel
closure, causing "send on closed channel" panic. Simplify by removing
closeOnce and the close(b.msgs) call; the done channel already
provides shutdown signaling. After close(b.done), Send's <-b.done
case fires immediately, so the unbuffered msgs channel is never
selected.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
@tonistiigi tonistiigi marked this pull request as ready for review March 10, 2026 05:14
@tonistiigi
Member Author

@crazy-max 3 green runs

Member

@crazy-max crazy-max left a comment


LGTM

@crazy-max crazy-max modified the milestones: v0.28.0, v0.29.0 Mar 12, 2026
@tonistiigi tonistiigi merged commit 676a48c into moby:master Mar 20, 2026
189 checks passed


3 participants