[NVRTC] Add NVSHMEM support to NVRTC compilation path #18681
spectrometerHBH merged 5 commits into apache:main from
Conversation
Summary of Changes (Gemini Code Assist): This pull request enhances TVM's CUDA compilation capabilities by integrating NVSHMEM support into the NVRTC backend. It allows users to compile and run CUDA kernels that use NVSHMEM features with the NVRTC compiler, offering an alternative to the traditional NVCC path. The changes adapt the NVRTC compilation process to include the necessary linking steps, handle NVRTC-specific CUDA header requirements, and manage different binary output formats, enabling distributed GPU programming with NVSHMEM.
Code Review
This pull request introduces NVSHMEM support for the NVRTC compilation path, which is a valuable addition. The implementation is well-structured, including necessary workarounds for NVRTC and comprehensive end-to-end tests for both nvcc and nvrtc paths. My main concern is a potential resource leak related to CUDA context management, which I've detailed in a specific comment.
```python
# Check if there's already a CUDA context; create one if not
result, context = cu.cuCtxGetCurrent()
if result != cu.CUresult.CUDA_SUCCESS or context is None or int(context) == 0:
    result, device = cu.cuDeviceGet(0)
    if result != cu.CUresult.CUDA_SUCCESS:
        raise RuntimeError(f"Failed to get CUDA device: {result}")
    result, context = cu.cuCtxCreate(None, 0, device)
    if result != cu.CUresult.CUDA_SUCCESS:
        raise RuntimeError(f"Failed to create CUDA context: {result}")
```
The CUDA context created here if one doesn't already exist is not destroyed. This can lead to a resource leak, as CUDA contexts hold significant GPU resources. It would be safer to ensure that if a context is created within this function, it is also destroyed before the function returns.
A `try...finally` block could be used to manage the context's lifecycle:

```python
context_created = False
context = None
try:
    # get or create context
    # ...
    # linking logic
    # ...
finally:
    if context_created and context:
        cu.cuCtxDestroy(context)
```

This would ensure that any context created specifically for this compilation is properly cleaned up, preventing resource leaks. Additionally, the condition `context is None or int(context) == 0` can be simplified to `not context`, since the cuda-python `CUcontext` object has a `__bool__` method that handles this check.
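The reviewer's suggestion can be fleshed out into a small runnable sketch. Here `cu` is passed in explicitly (standing in for the `cuda.cuda` module from the cuda-python package) and `link_fn` is a placeholder for the actual linking logic; both are illustrative assumptions, not TVM's actual code.

```python
def link_with_managed_context(cu, link_fn):
    """Run link_fn with a current CUDA context, destroying any context
    this function itself had to create (sketch of the review suggestion)."""
    context_created = False
    context = None
    try:
        result, context = cu.cuCtxGetCurrent()
        if result != cu.CUresult.CUDA_SUCCESS or not context:
            result, device = cu.cuDeviceGet(0)
            if result != cu.CUresult.CUDA_SUCCESS:
                raise RuntimeError(f"Failed to get CUDA device: {result}")
            result, context = cu.cuCtxCreate(None, 0, device)
            if result != cu.CUresult.CUDA_SUCCESS:
                raise RuntimeError(f"Failed to create CUDA context: {result}")
            # Remember that this function owns the context, so it must clean up
            context_created = True
        return link_fn()
    finally:
        # Only destroy a context we created; never tear down the caller's
        if context_created and context:
            cu.cuCtxDestroy(context)
```

An existing context from the caller is left untouched, so the change is safe for both the standalone and the already-initialized case.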
@tvm-bot rerun
```python
        return bytearray(binary_buf)

    # link stage for NVSHMEM
    if use_nvshmem:
        import ctypes  # pylint: disable=import-outside-toplevel
```
Consider moving this into a separate function.
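A sketch of the kind of helper this refactor might produce. The `cuLink*` driver-API calls are real cuda-python bindings, but the helper name, the injected `cu` module parameter, and the NVSHMEM device-library path are illustrative assumptions, not what the PR actually ships.

```python
import ctypes

def link_nvshmem(cu, cubin: bytes, nvshmem_device_lib: str) -> bytearray:
    """Relink a compiled cubin against the NVSHMEM device library (sketch).

    `cu` stands in for the `cuda.cuda` module from cuda-python;
    `nvshmem_device_lib` is an assumed path to e.g. libnvshmem_device.a.
    """
    err, state = cu.cuLinkCreate(0, [], [])
    if err != cu.CUresult.CUDA_SUCCESS:
        raise RuntimeError(f"cuLinkCreate failed: {err}")
    try:
        # Add the already-compiled kernel cubin to the link job
        (err,) = cu.cuLinkAddData(
            state, cu.CUjitInputType.CU_JIT_INPUT_CUBIN,
            cubin, len(cubin), b"tvm_kernel", 0, [], [])
        if err != cu.CUresult.CUDA_SUCCESS:
            raise RuntimeError(f"cuLinkAddData failed: {err}")
        # Pull in the NVSHMEM device library
        (err,) = cu.cuLinkAddFile(
            state, cu.CUjitInputType.CU_JIT_INPUT_LIBRARY,
            nvshmem_device_lib.encode(), 0, [], [])
        if err != cu.CUresult.CUDA_SUCCESS:
            raise RuntimeError(f"cuLinkAddFile failed: {err}")
        # Finalize the link and copy the resulting image out of driver memory
        err, linked_ptr, linked_size = cu.cuLinkComplete(state)
        if err != cu.CUresult.CUDA_SUCCESS:
            raise RuntimeError(f"cuLinkComplete failed: {err}")
        return bytearray(ctypes.string_at(int(linked_ptr), int(linked_size)))
    finally:
        # The link state owns the output buffer, so copy before destroying
        cu.cuLinkDestroy(state)
```

Isolating the link stage this way also keeps the `ctypes` import and the driver-API error handling out of the main compile path.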
@tvm-bot re-run
This PR adds NVSHMEM support to the NVRTC path in `python/tvm/contrib/nvcc.py`. This is enabled by a compile stage and a link stage for NVSHMEM programs. Tested locally via `tests/python/disco/test_nvshmem.py`. Results show the NVRTC path is about 10-35% faster in compilation speed; kernel performance is the same.
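A claim like "10-35% faster compilation" can be checked with a small timing harness. The sketch below is illustrative only: `compile_fn` is a placeholder for whichever compilation entry point is being measured (e.g. the NVCC vs. NVRTC paths of `compile_cuda`), and nothing here reflects the PR's actual benchmark setup.

```python
import time

def time_compile(compile_fn, src, repeats=3):
    """Return the best-of-N wall-clock time (seconds) for compile_fn(src).

    Taking the minimum over several runs reduces noise from caches and
    OS scheduling, which matters when comparing two compiler backends.
    """
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        compile_fn(src)
        best = min(best, time.perf_counter() - t0)
    return best
```

One would call this once per backend on the same kernel source and compare the two timings; per-kernel runtime should be measured separately, since the linked binaries are expected to perform identically.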