Hotfix optout of bake builder in CI #16464

Merged: stevejalim merged 2 commits into mozilla:main from janbrasna:fix/docker-compose-bake-cockup on Aug 1, 2025

Conversation

@janbrasna
Collaborator

@janbrasna janbrasna commented Aug 1, 2025

One-line summary

Disable the new bake behavior until GitHub Actions updates Compose to a patched version in the runner image and finishes rolling that image out to 100% of runners.

Significant changes and points to review

actions/runner-images#12669 made a breaking update that enabled COMPOSE_BAKE by default. Follow-up fixes that are necessary for successful runs have since been released, but the GHA runner images currently deployed ship a Compose version that leaves the build process hanging indefinitely (docker/compose#12998).

We need to disable it until a new runner image ships with Compose bumped further. A version released some time ago is believed to cover this issue, so the hang should go away here in CI too once GitHub updates the runner to a more recent tool version. Also note that COMPOSE_BAKE=false is already reported as deprecated and is due to be ignored or removed completely in an upcoming release.
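
As a rough illustration of the opt-out described above, the sketch below runs a single command with `COMPOSE_BAKE=false` in its environment. The helper name `without_bake` is hypothetical, and where this PR actually sets the variable (workflow `env`, Makefile, or wrapper script) is not shown in this conversation. Per the deprecation note, a future Compose release may ignore the variable entirely.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the PR): run any command with
# Compose Bake opted out, leaving the calling shell's environment untouched.
without_bake() {
  env COMPOSE_BAKE=false "$@"
}

# Example usage (assumes the repo's bin/docker-compose.sh wrapper exists):
#   without_bake bin/docker-compose.sh build --pull release
```

Scoping the variable to one command keeps the opt-out easy to delete once the runner image ships a fixed Compose.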

Issue / Bugzilla link

actions/runner-images#12685

(Supersedes #16463)

Testing

/actions/runs/16663214340/job/47164637902 💚

@slightlyoffbeat
Contributor

@janbrasna incredible! All of those green tests are great to see :)

Thank you for your work on this. We are very grateful.

@stevejalim added the "WMO and FXC" label (Code relevant to both mozilla/bedrock (www.mozilla.org) and mozmeao/springfield (www.firefox.com)) on Aug 1, 2025
@stevejalim
Contributor

Thank you so much @janbrasna - above and beyond! You are a star.

I'm happy for us to merge this now, with the caveat that support for COMPOSE_BAKE=false may be removed/ignored (as noted in the conversation on docker/compose#12998), so we'll need to keep an eye out for that. If COMPOSE_BAKE gets removed and "baking" becomes the default, then we're stuck again. But given that you've flagged it clearly, @janbrasna, I hope they take notice before removing the flag.

Contributor

@stevejalim stevejalim left a comment

Thank you again @janbrasna!

This is a good stopgap to keep us moving, and there's hope that Docker will ensure the hanging GHA issue is resolved before making bake the default behaviour.

If we end up back in the same situation, then moving to a build pattern where we don't use docker compose for building unit test images (nor release images) looks like the way forward. Indeed, we might do well to look at that approach anyway, if there are efficiency gains there.

make clean test-image
CONTAINER_ID=$(docker ps -alq)
docker cp $CONTAINER_ID:/app/python_coverage .
timeout-minutes: 30
Contributor

No harm in that - our builds are done well before 30 mins

Collaborator Author

The idea is — if it blows up again, let everyone know by failing the CI while they're still at work, not six hours later when the runner times out.

(Once this gets the container deployed in prod build, I'm porting this over to fxc.)
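
The same fail-fast idea can be tried outside the workflow. The sketch below is a local analogue of `timeout-minutes: 30`, not what the PR does: it assumes GNU coreutils `timeout` is available, and the helper name `run_bounded` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: stop a command that runs longer than the given limit.
# GNU `timeout` exits with status 124 when the limit is hit.
run_bounded() {
  limit="$1"
  shift
  status=0
  timeout "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "command exceeded $limit and was stopped" >&2
  fi
  return "$status"
}

# Example usage, mirroring the 30-minute limit discussed above:
#   run_bounded 30m make clean test-image
```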

Collaborator Author

… aaand it's built, incl. gotoprod pipelines.

This is the failing CI:

"bin/docker-compose.sh" build --pull release
#1 [internal] load local bake definitions
#1 reading from stdin 599B done

This is the legacy kicking in now in the green CIs:

"bin/docker-compose.sh" build --pull release
#0 building with "builder-218dd43c-893e-45d1-b3f4-15b44da5b582" instance using docker-container driver
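
To tell the two cases apart in a saved build log, one option is to look for the bake marker line quoted above. This is only a heuristic sketch: the marker string is copied from the failing excerpt in this thread, other Compose versions may word it differently, and the function name `uses_bake` is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: read a compose build log on stdin and succeed
# (exit status 0) if it contains the bake marker seen in the failing CI run.
uses_bake() {
  grep -q 'load local bake definitions'
}

# Example usage (build.log is an assumed file name):
#   if uses_bake < build.log; then echo "bake builder was used"; fi
```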

@stevejalim stevejalim merged commit 9277b28 into mozilla:main Aug 1, 2025
5 checks passed
@janbrasna janbrasna deleted the fix/docker-compose-bake-cockup branch August 1, 2025 07:08
@janbrasna
Collaborator Author

When I get more time to wrestle with dependency management on the runner itself, I want to confirm with a PoC that just updating to their v2.38 branch resolves this, as confirmation that shipping a GH image update with that version bump would resolve it for everyone.

(I think the current notion is that this issue is already fixed in some of the released v2.38–39 versions, so they are preparing to remove the flag in v2.40 and beyond. But the issue is still open and actively being investigated. If any of the already landed patches mentioned in the thread are confirmed to resolve this, I think it's safe to just ride the trains, as any version that no longer takes this flag is understood to have the fix at the same time. So, with the ticket still open, making this confirmation on a very reproducible public STR would help the confidence in that.)

Basically just this:

- uses: docker/setup-compose-action@v1
  with:
    version: v2.38.1

gets us green again, if need be.
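
A job could also check which Compose version the runner actually provides before relying on it. The sketch below compares version strings with `sort -V` (GNU coreutils). The helper name `version_at_least` is hypothetical, and the v2.38.1 threshold is simply the pin from the snippet above; as noted, whether that version fully resolves the hang is still to be confirmed.

```shell
#!/bin/sh
# Hypothetical helper: succeed when version $1 is at least version $2.
# A leading "v" is optional on either argument.
version_at_least() {
  have=${1#v}
  want=${2#v}
  # With version sort, the smaller of the two versions comes first.
  lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
  [ "$lowest" = "$want" ]
}

# Example usage (requires Docker Compose v2 on the machine):
#   version_at_least "$(docker compose version --short)" v2.38.1 \
#     || echo "Compose is older than the pinned version" >&2
```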

