Merged
Conversation
windreamer
approved these changes
Oct 30, 2025
windreamer
approved these changes
Nov 15, 2025
windreamer
approved these changes
Nov 25, 2025
grimoire
reviewed
Nov 25, 2025
grimoire
reviewed
Nov 25, 2025
lvhan028
reviewed
Nov 25, 2025
windreamer
approved these changes
Nov 25, 2025
Collaborator
@CUHKSZzxy test_docker workflow failed
lvhan028
reviewed
Nov 26, 2025
docker/install.sh
Outdated
rm -rf /var/lib/apt/lists/*

# install GDRCopy
GDRCOPY_VERSION=2.5.1
Collaborator
Will it affect the A100 platform?
Collaborator
Author
GDRCopy is only used with DeepEP, so in theory it won't affect non-Hopper devices. However, we haven't tested on A100 yet.
lvhan028
reviewed
Nov 26, 2025
lvhan028
approved these changes
Nov 27, 2025
Related
DeepEP mode switching and buffer clear updates, credit to @SHshenhao
The fixes in the current PR refer to the following PR
Huge thanks also to the collaborators who upgraded DeepEP and DeepGEMM in DLBlas
Modifications
According to the official guide, there are two ways to enable NVSHMEM IBGDA support: (1) modify the host driver configuration and reboot, or (2) install GDRCopy and load the gdrdrv kernel module. When modifying the driver is not an option, GDRCopy should be installed. Some related concerns are discussed in "Questions about environment setups" (deepseek-ai/DeepEP#486).
The vLLM Dockerfile and the SGLang Dockerfile both install GDRCopy.
vLLM ep kernels emphasize that enabling IBGDA is crucial for multinode DeepEP deployment. vLLM expert parallel deployment lists installing GDRCopy as a necessary step.
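Since option (2) depends on the gdrdrv kernel module being loaded, a quick check can help before launching a multi-node job. The following is a minimal sketch, not part of this PR: it assumes a Linux system where loaded modules are listed in /proc/modules, and the helper names `loaded_modules` and `gdrdrv_loaded` are hypothetical.

```python
from pathlib import Path


def loaded_modules(proc_modules_text: str) -> set:
    """Return the module names found in /proc/modules-style text.

    Each non-empty line starts with the module name, followed by
    whitespace-separated fields (size, use count, and so on).
    """
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def gdrdrv_loaded(proc_modules_path: str = "/proc/modules") -> bool:
    """Hypothetical helper: True if a module named gdrdrv is listed."""
    path = Path(proc_modules_path)
    if not path.is_file():
        # No module list available (for example, not a Linux system).
        return False
    return "gdrdrv" in loaded_modules(path.read_text())
```

Calling `gdrdrv_loaded()` on the target node returns a boolean; a `False` result only suggests that the module is not loaded and says nothing about whether the driver-config route (1) is in use instead.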
The default DeepEP buffer num_sms raises the following errors on multi-node H200 deployments. Therefore, we expose this setting to users through an environment variable. A feasible value on H200 is
DEEPEP_BUFFER_NUM_SMS=16. This is a known issue in DeepEP
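For illustration, reading such an override could look like the sketch below. This is not the code from this PR: only the variable name DEEPEP_BUFFER_NUM_SMS and the value 16 come from the text above, while the helper `read_num_sms`, its `default` parameter, and the validation rules are assumptions.

```python
import os

ENV_NAME = "DEEPEP_BUFFER_NUM_SMS"


def read_num_sms(env=None, default=None):
    """Hypothetical helper: parse the SM-count override from the environment.

    Returns `default` when the variable is unset or empty, so the caller
    can fall back to whatever the library uses by default.
    """
    env = os.environ if env is None else env
    raw = env.get(ENV_NAME)
    if raw is None or raw.strip() == "":
        return default
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{ENV_NAME} must be an integer, got {raw!r}") from None
    if value <= 0:
        raise ValueError(f"{ENV_NAME} must be positive, got {value}")
    return value
```

With `DEEPEP_BUFFER_NUM_SMS=16` exported before launch, `read_num_sms()` returns `16`; when the variable is unset it returns `default`, which leaves the existing behavior untouched.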
Flip the DeepEP mode between prefill and decode, and also clear the buffer (performed on the DLBlas side when switching to low-latency mode). Otherwise, it triggers a CUDA illegal memory access in DeepEP or in the subsequent DeepGEMM kernel, as known in
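The switching rule described above can be sketched as a small state holder. This is an illustrative toy, not the DLBlas or DeepEP implementation: the names `Mode`, `ModeSwitcher`, and `clear_buffer` are hypothetical, and mapping prefill to a "normal" mode and decode to a "low latency" mode is an assumption made for the example.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"            # assumed to be used for prefill
    LOW_LATENCY = "low_latency"  # assumed to be used for decode


class ModeSwitcher:
    """Tracks the current mode and clears the buffer on entry to low latency."""

    def __init__(self, clear_buffer):
        # clear_buffer: a callable supplied by the caller (hypothetical hook).
        self._mode = Mode.NORMAL
        self._clear_buffer = clear_buffer

    @property
    def mode(self):
        return self._mode

    def set_mode(self, new_mode):
        """Switch modes; return True only if the mode actually changed."""
        if new_mode is self._mode:
            return False
        if new_mode is Mode.LOW_LATENCY:
            # Clear stale buffer contents before low-latency kernels run.
            self._clear_buffer()
        self._mode = new_mode
        return True

    def for_step(self, is_decoding):
        """Pick the mode for the next forward step."""
        return self.set_mode(Mode.LOW_LATENCY if is_decoding else Mode.NORMAL)
```

Calling `for_step(is_decoding)` before each forward pass keeps the mode in step with the workload, and the clear hook fires only on a transition into low-latency mode rather than on every decode step.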