Optimize Docker Build Layers and Add Sudo Privileges for Fast-LLM Container #2

Merged
jlamypoirier merged 8 commits into main from tscholak/tune_dockerfile
Oct 21, 2024

Conversation

@tscholak
Collaborator

@tscholak tscholak commented Oct 16, 2024

I'd like to refine the Dockerfile slightly to improve build efficiency and add runtime flexibility for the Fast-LLM container. The changes are small but impactful, focusing on two main improvements:

  1. Improved Build Layering for Faster Rebuilds:
    • The build process is now split into two distinct stages:
      1. Dependency installation (based on setup.py, setup.cfg, pyproject.toml) is done first.
      2. Fast-LLM code installation is done last, by using the new --exclude= option enabled by Dockerfile syntax version 1.7-labs.
    • With this change the dependencies don't need to be reinstalled when the Fast-LLM source code changes. That can reduce rebuild times significantly since code changes land in different Docker image layers than dependencies.
  2. Added Sudo Privileges for Fast-LLM User:
    • Granted the fast_llm user password-less sudo privileges. This allows system adjustments (e.g., modifying system limits or adjusting host settings) directly from within the container.
    • I found this very useful in bare Kubernetes environments (like LambdaLabs), where I needed to frequently make changes to system configurations (such as those controllable with ulimit) that do not persist across container restarts.
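
The layering described above can be sketched as a Dockerfile. This is a reconstruction from the build log in this description, not the merged file: the instructions and their order are taken from the log steps, while the comments and grouping are illustrative. It assumes the build runs from the repository root so that the copied paths exist.

```dockerfile
# syntax=docker/dockerfile:1.7-labs
# The directive above must be the first line; it enables COPY --exclude.
FROM nvcr.io/nvidia/pytorch:24.07-py3

# System packages, including sudo for runtime adjustments.
RUN apt-get update \
    && apt-get install --no-install-recommends -y git-lfs sudo util-linux \
    && rm -rf /var/lib/apt/lists/* \
    && git lfs install

# Unprivileged user with password-less sudo.
ARG FAST_LLM_USER_ID=1000
RUN useradd -m -u $FAST_LLM_USER_ID -s /bin/bash fast_llm \
    && echo 'fast_llm ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers

WORKDIR /app

# Compiled sources: rebuilt only when fast_llm/csrc/ changes.
COPY --chown=fast_llm ./fast_llm/csrc/ fast_llm/csrc/
RUN make -C ./fast_llm/csrc/

# Dependencies: reinstalled only when the packaging files change.
COPY --chown=fast_llm setup.py setup.cfg ./
RUN PIP_NO_INPUT=1 pip3 install --no-cache-dir ".[CORE,OPTIONAL,DEV]"

# (The build log also copies Megatron-LM, examples, tests and tools here.)

# Python sources last, skipping the csrc directory that was already copied.
COPY --exclude=./fast_llm/csrc/ --chown=fast_llm ./fast_llm/ fast_llm/
COPY --chown=fast_llm pyproject.toml ./
RUN PIP_NO_INPUT=1 pip3 install --no-deps -e .
```

With this ordering, editing a Python file under fast_llm/ invalidates only the last two COPY steps and the final editable install. The make and dependency-install layers are served from cache.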

Here's a breakdown of the build time:

docker build --platform linux/amd64 -t torstenscholak663/fast-llm:latest --build-arg FAST_LLM_USER_ID=1000 .
[+] Building 48.3s (23/23) FINISHED  docker:desktop-linux
 => [internal] load build definition from Dockerfile  0.0s
 => => transferring dockerfile: 1.48kB  0.0s
 => resolve image config for docker-image://docker.io/docker/dockerfile:1.7-labs  0.7s
 => [auth] docker/dockerfile:pull token for registry-1.docker.io  0.0s
 => CACHED docker-image://docker.io/docker/dockerfile:1.7-labs@sha256:b99fecfe00268a8b556fad7d9c37ee25d716ae08a5d7320e6d51c4dd83246894  0.0s
 => [internal] load metadata for nvcr.io/nvidia/pytorch:24.07-py3  1.0s
 => [internal] load .dockerignore  0.0s
 => => transferring context: 163B  0.0s
 => [ 1/15] FROM nvcr.io/nvidia/pytorch:24.07-py3@sha256:f47441c102a810a27758b0b6274d46012ac15fd467119b2e1f0467be82bc8af3  0.0s
 => [internal] load build context  0.0s
 => => transferring context: 12.73kB  0.0s
 => CACHED [ 2/15] RUN apt-get update && apt-get install --no-install-recommends -y git-lfs sudo util-linux && rm -rf /var/lib/apt/lists/* && git lfs install  0.0s
 => CACHED [ 3/15] RUN useradd -m -u 1000 -s /bin/bash fast_llm && echo 'fast_llm ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers  0.0s
 => CACHED [ 4/15] WORKDIR /app  0.0s
 => [ 5/15] COPY --chown=fast_llm ./fast_llm/csrc/ fast_llm/csrc/  0.1s
 => [ 6/15] RUN make -C ./fast_llm/csrc/  4.9s
 => [ 7/15] COPY --chown=fast_llm setup.py setup.cfg ./  0.0s
 => [ 8/15] RUN PIP_NO_INPUT=1 pip3 install --no-cache-dir ".[CORE,OPTIONAL,DEV]"  35.7s
 => [ 9/15] COPY --chown=fast_llm ./Megatron-LM Megatron-LM  0.0s
 => [10/15] COPY --chown=fast_llm ./examples examples  0.0s
 => [11/15] COPY --chown=fast_llm ./tests tests  0.0s
 => [12/15] COPY --chown=fast_llm ./tools tools  0.0s
 => [13/15] COPY --exclude=./fast_llm/csrc/ --chown=fast_llm ./fast_llm/ fast_llm/  0.0s
 => [14/15] COPY --chown=fast_llm pyproject.toml ./  0.0s
 => [15/15] RUN PIP_NO_INPUT=1 pip3 install --no-deps -e .  4.6s
 => exporting to image  1.0s
 => => exporting layers  1.0s
 => => writing image sha256:f9b20cc3ca3c99ad8d3788cb6eacf5f48d518f48aec3bc3250f8d1d0d7cedeb3  0.0s
 => => naming to docker.io/torstenscholak663/fast-llm:latest  0.0s

…reserve compiled C++ artifacts, and add sudo for runtime adjustments
@tscholak tscholak requested a review from jlamypoirier October 16, 2024 14:12
Comment thread Dockerfile Outdated
Comment thread Dockerfile
Comment thread Dockerfile Outdated

# Copy the main source code for Fast-LLM and install in editable mode
COPY --exclude=./fast_llm/csrc/ --chown=fast_llm ./fast_llm/ fast_llm/
RUN PIP_NO_INPUT=1 pip3 install --no-deps -e .
Collaborator

Why the need for another install here? What was wrong with the previous version?

Collaborator Author

This is the most important change. In the previous version, dependencies and code were installed at the same time. Now we install the dependencies first (see above) and do the editable install only at the very end. This also ensures that pip can find and link all code in the fast-llm folder.

Collaborator

Sure, but what's the difference in practice? All setuptools really does is add a symlink to the fast_llm directory, so it shouldn't make any real difference.

Collaborator Author

ok, looks like you are right.

Comment thread Dockerfile
COPY --chown=fast_llm fast_llm/csrc/ ./fast_llm/csrc/
RUN make -C ./fast_llm/csrc/
# Copy the dependency files and install dependencies
COPY --chown=fast_llm setup.py setup.cfg pyproject.toml ./
Collaborator

Why add the toml? It's not used for the installation.

Collaborator Author

It appeared to me that the setup.* files and pyproject.toml fulfil similar purposes and could be grouped together. It is not uncommon to specify dependencies in the pyproject.toml file, because this is how setuptools usually works.

Collaborator

We don't do that in fast-llm though; the toml file is just there because black needs it.

Collaborator Author

ok, I'll split it up then and copy pyproject.toml somewhere else

Collaborator

I don't really think it's worth copying at all... But if we do keep it, let's keep it here so we don't add another line.

Comment thread Dockerfile

# Add a user for Fast-LLM with sudo privileges for runtime adjustments
ARG FAST_LLM_USER_ID=1000
RUN useradd -m -u $FAST_LLM_USER_ID -s /bin/bash fast_llm \
Collaborator

🥇

@jlamypoirier
Collaborator

  • With this change the dependencies don't need to be reinstalled when the Fast-LLM source code changes. That can reduce rebuild times significantly since code changes land in different Docker image layers than dependencies.

Not sure I'm following here, was it not the case already?

@tscholak
Collaborator Author

Not sure I'm following here, was it not the case already?

People were telling me it was not. I never checked those claims, I just reworked the Dockerfile such that it was clear and sure that we wouldn't always rebuild everything on small code changes. Looks like it wasn't truly necessary. I removed those changes.

@tscholak tscholak closed this Oct 16, 2024
@tscholak tscholak reopened this Oct 16, 2024
Collaborator

@jlamypoirier jlamypoirier left a comment

I double-checked the build times: they were ~100-500 ms before this PR, even with code changes. I suspect the complaints about rebuilding happened because of the multiple recent changes to dependencies, etc., that do force a re-install.

The rest of this PR looks useful though, I have one minor comment and then we can merge.

Comment thread Dockerfile

# Copy the dependency files and install dependencies
COPY --chown=fast_llm setup.py setup.cfg pyproject.toml ./
RUN PIP_NO_INPUT=1 pip3 install --no-cache-dir -e ".[CORE,OPTIONAL,DEV]"
Collaborator

Let's move this back above the csrc compile, since this installation takes much longer.

Collaborator Author

done
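
The reordering requested in this thread can be sketched as follows. The two COPY/RUN pairs are the instructions quoted in the review. The FROM, useradd, and WORKDIR lines are copied from the build log in the PR description only so that the fragment builds from the repository root, and the sudo setup is left out for brevity. Whether the editable install succeeds before the rest of the package is copied depends on setup.py and setup.cfg, which are not shown here.

```dockerfile
FROM nvcr.io/nvidia/pytorch:24.07-py3

# Minimal user setup so that --chown=fast_llm resolves (sudo omitted here).
RUN useradd -m -u 1000 -s /bin/bash fast_llm
WORKDIR /app

# Slow step first: its cache is invalidated only when the packaging files change.
COPY --chown=fast_llm setup.py setup.cfg pyproject.toml ./
RUN PIP_NO_INPUT=1 pip3 install --no-cache-dir -e ".[CORE,OPTIONAL,DEV]"

# Faster step second: a change under fast_llm/csrc/ reruns only the compile.
COPY --chown=fast_llm fast_llm/csrc/ ./fast_llm/csrc/
RUN make -C ./fast_llm/csrc/
```

In the build log from the PR description, the dependency install took 35.7s and the make step took 4.9s. Placing the install earlier therefore keeps the expensive layer cached when only the C++ sources change.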

@jlamypoirier jlamypoirier merged commit a21f9b5 into main Oct 21, 2024
@jlamypoirier jlamypoirier deleted the tscholak/tune_dockerfile branch October 21, 2024 12:31
jlamypoirier added a commit that referenced this pull request May 20, 2026
- Add `_sdp_dim`/`_sdp_active` to `LanguageModelLoss.__init__` so GSPO's
  SDP branch doesn't AttributeError on the first non-test call.
- Replace `document_index.max().item()` (and the SDP MAX all-reduce) with
  `len(kwargs[BlockKwargs.lengths])`: CPU-side, identical across SDP ranks,
  removes two GPU→CPU syncs per microbatch.
- Decorate `fused_gspo_loss_forward_backward` with `@torch.compile` for
  parity with GRPO. The `num_segments == 1` test case skips on CPU since
  torch._inductor's CPU codegen mishandles `index_add_` into a size-1
  buffer (atomic_add scatter).
- Make `divisor` a required arg on `fused_gspo_loss_forward_backward`:
  the wrapper always overrides it with the global document count, and
  the previous local-rank default would silently mis-normalize under SDP.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jlamypoirier added a commit that referenced this pull request May 20, 2026
- Add `sp_group` arg to fused_gspo_loss_forward_backward and all-reduce the
  three segment buffers over it when sequence-parallel shards the sequence
  across the TP group; otherwise per-segment ratios use partial sums and
  produce silent corruption under SP. Wrapper passes `self._parallel_dim.group`
  when `_sequence_parallel` is active.
- Wire `num_labels_in_seq` through the GSPO test and assert
  `new_logprobs_fused` against the reference. Required aligning the reference
  to use scaled logits for new_logprobs (reusing `target_log_probabilities`),
  matching the kernel's behavior of reporting the loss-path log-probs.
- Drop the unreachable `max(num_segments, 1)` guard in the GSPO reference and
  the matching `divisor=max(num_segments, 1)` at the test call site.

SDP all-reduce branch coverage (review item 3) deferred to a follow-up adding
a `gspo_loss` flag to `tests/layers/test_lm_head.py` alongside the existing
GRPO config, with an SDP distributed variant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2 participants