From 3140fb4be35277debb7a746d22c0f4f638ab8bfd Mon Sep 17 00:00:00 2001
From: Jarek Potiuk
Date: Tue, 4 Mar 2025 10:24:13 +0100
Subject: [PATCH] Decrease size of docker context by two orders of magnitude

When building a docker image, local source files are sent as context.
Unfortunately node and pnpm add a lot of cache inside the source tree,
and if we are not careful about excluding it, we end up with GBs of
context being sent to docker before the build even starts (which takes
minutes).

This PR excludes the .pnpm-store folders that were the root cause of
sending 1.5 GB of context. It also adds simple instructions on how to
check which files are in the context and how to see the size of the
context.

With this change, the context is down from 1.5 GB to 90 MB, cutting
the time to send the docker build context from about a minute to under
a second.
---
 .dockerignore                   | 15 ++++++++++++---
 dev/MANUALLY_BUILDING_IMAGES.md | 26 ++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/.dockerignore b/.dockerignore
index dd90a9258180d..f561bc836f8e5 100644
--- a/.dockerignore
+++ b/.dockerignore
@@ -80,9 +80,10 @@
 # Git version is dynamically generated
 airflow/git_version
 
-airflow/ui/node_modules
-airflow/auth/managers/simple/ui/node_modules
+# Exclude node/pnpm caches
+**/.pnpm-store
+**/node_modules
 
 # Exclude link to docs
 airflow/ui/static/docs
@@ -91,6 +92,14 @@ airflow/www/static/docs
 airflow/www/static/dist
 airflow/www/node_modules
 
+# Exclude any .venv and .ruff_cache
+**/.venv
+**/.ruff_cache/
+
+# Exclude docs artifacts
+**/_inventory_cache/
+docs/**/_api/**
+
 # Exclude python generated files
 **/__pycache__/
 **/*.py[cod]
@@ -99,7 +108,7 @@ airflow/www/node_modules
 **/env/
 **/build/
 **/develop-eggs/
-/dist/
+**/dist/
 **/downloads/
 **/eggs/
 **/.eggs/
diff --git a/dev/MANUALLY_BUILDING_IMAGES.md b/dev/MANUALLY_BUILDING_IMAGES.md
index 2a07f8b0c084e..b0537fe6c5b80 100644
--- a/dev/MANUALLY_BUILDING_IMAGES.md
+++ b/dev/MANUALLY_BUILDING_IMAGES.md
@@ -22,6 +22,7 @@
 **Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
 
 - [Building docker images](#building-docker-images)
+- [Keeping your docker context small](#keeping-your-docker-context-small)
 - [Setting environment with emulation](#setting-environment-with-emulation)
 - [Setting up cache refreshing with hardware ARM/AMD support](#setting-up-cache-refreshing-with-hardware-armamd-support)
 
@@ -37,6 +38,31 @@ you do not have those two installed.
 You also need to have the right permissions to push the images,
 so you should run `docker login` before and authenticate with your DockerHub token.
 
+## Keeping your docker context small
+
+Sometimes, especially when you generate node assets, some of the generated files are kept in the source
+directory. This can make the docker context very large when building images, because the whole context
+is transferred to the docker daemon. To avoid this, we have a `.dockerignore` file where we exclude certain
+paths from being treated as part of the context, similar to how `.gitignore` keeps them out of git.
+
+If your context gets large, you will see a long (minutes) preliminary step before the docker build runs,
+during which the context is being transmitted.
+
+You can see all the context files by running:
+
+```shell script
+printf 'FROM scratch\nCOPY . /' | DOCKER_BUILDKIT=1 docker build -q -f- -o- . | tar t
+```
+
+Once you see something that should be excluded from the context, you should add it to the `.dockerignore` file.
+
+You can also check the size of the context by running:
+
+```shell script
+printf 'FROM scratch\nCOPY . /' | DOCKER_BUILDKIT=1 docker build -q -f- -o- . | wc -c | numfmt --to=iec --suffix=B
+```
+
+
 ## Setting environment with emulation
 
 According to the [official installation instructions](https://docs.docker.com/buildx/working-with-buildx/#build-multi-platform-images)
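
As a possible companion to the two commands the patch documents, the verbose tar listing can also be grouped by top-level path to show where the bulk of the context comes from. This is only a sketch under stated assumptions: `context_size_by_top_dir` is a hypothetical helper that is not part of this patch or of Airflow, and it assumes GNU `tar` verbose listing columns (mode, owner/group, size, date, time, name) and paths without spaces.

```shell
# Hypothetical helper (not part of the patch): reads `tar tv` output on stdin
# and prints "<bytes><TAB><top-level path>", largest first.
# Assumes GNU tar columns: mode, owner/group, size, date, time, name.
context_size_by_top_dir() {
    awk '{
        name = $6
        sub(/^\.\//, "", name)        # drop a leading "./" if the archive uses one
        split(name, parts, "/")       # parts[1] is the top-level path component
        sizes[parts[1]] += $3         # $3 is the entry size in bytes
    }
    END {
        for (d in sizes) printf "%.0f\t%s\n", sizes[d], d
    }' | sort -rn
}

# Usage (needs docker with BuildKit, as in the commands from the patch):
# printf 'FROM scratch\nCOPY . /' | DOCKER_BUILDKIT=1 docker build -q -f- -o- . \
#     | tar tv | context_size_by_top_dir | head -n 20
```

Any path that dominates this output and is not needed by the image build is a candidate for `.dockerignore`.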