
[Relax][Pass] Lowering passes for GPU IPC memory and allreduce#16759

Merged
tqchen merged 1 commit intoapache:mainfrom
MasterJH5574:tvm-dev/2024-03-20-lowering-ipc-mem
Mar 21, 2024
Conversation

@MasterJH5574
Contributor

This PR introduces the lowering passes for GPU IPC memory and all-reduce. It contains the following changes:

  1. A pass `IPCAllreduceRewrite` that rewrites `"runtime.disco.allreduce"` to `"runtime.disco.cuda_ipc.custom_allreduce"`, and rewrites the storage scope of the all-reduce inputs from `"global"` to `"ipc_memory"` accordingly.

  2. A memory planning enhancement that makes the planner aware of storage scopes, so that each storage scope is planned independently.

  3. A pass `LowerGPUIPCAllocStorage` that rewrites the storage allocation of IPC memory from builtin ops to calls to the function `"runtime.disco.cuda_ipc.alloc_storage"`.

  4. Support for the op `relax.builtin.alloc_tensor` with a storage scope argument; the default storage scope is `"global"`.
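The op rewriting in steps 1 and 3 can be illustrated with a toy sketch. This is NOT the real TVM Relax pass or API — it models a call site as a hypothetical `(op_name, [(buffer, scope), ...])` tuple purely to show the transformation: allreduce calls are renamed to the CUDA IPC variant, and their input buffers' storage scope is switched from `"global"` to `"ipc_memory"`.

```python
# Hypothetical miniature IR: each call is (op_name, [(buffer, scope), ...]).
# A simplified sketch of the IPCAllreduceRewrite idea, not the actual pass.

def ipc_allreduce_rewrite(calls):
    rewritten = []
    for op_name, args in calls:
        if op_name == "runtime.disco.allreduce":
            # Redirect to the custom IPC allreduce kernel ...
            op_name = "runtime.disco.cuda_ipc.custom_allreduce"
            # ... and move its inputs into IPC memory.
            args = [(buf, "ipc_memory") for buf, _scope in args]
        rewritten.append((op_name, args))
    return rewritten
```

In the real pass the scope change then propagates to the allocation sites, which is why step 3 lowers those allocations to `"runtime.disco.cuda_ipc.alloc_storage"` instead of the builtin alloc op.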

We write the new passes in Python for experimentation and fast development; they are good demos of the efficient development enabled by TVM's architecture.
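The scope-aware planning of step 2 can be sketched as follows, under a toy model that is only an illustration of the per-scope separation (the actual planner also performs liveness-based reuse): each allocation request carries a storage scope, requests are grouped by scope, and each scope is planned into its own pool independently of the others.

```python
from collections import defaultdict

# Toy model: a request is (name, scope, size_bytes). Each storage scope
# gets its own pool, planned independently with a trivial bump allocator.
def plan_by_scope(requests):
    pools = defaultdict(int)   # scope -> total pool size so far
    placements = {}            # name -> (scope, offset within that pool)
    for name, scope, size in requests:
        placements[name] = (scope, pools[scope])
        pools[scope] += size
    return dict(pools), placements
```

The point of the separation is that a buffer in `"ipc_memory"` can never share storage with a `"global"` buffer, since the two scopes are backed by different allocators.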

@MasterJH5574 MasterJH5574 force-pushed the tvm-dev/2024-03-20-lowering-ipc-mem branch from 0211a3a to 3b8183f Compare March 21, 2024 04:02
@tqchen tqchen merged commit 858486f into apache:main Mar 21, 2024
thaisacs pushed a commit to thaisacs/tvm that referenced this pull request Apr 3, 2024