Skip to content

Add complex64 and complex128 gpu support for tensor_scatter_nd_add#40585

Merged
tensorflow-copybara merged 1 commit into
tensorflow:masterfrom
yongtang:40577-tensor_scatter_nd_add-gpu
Jul 8, 2020
Merged

Add complex64 and complex128 gpu support for tensor_scatter_nd_add#40585
tensorflow-copybara merged 1 commit into
tensorflow:masterfrom
yongtang:40577-tensor_scatter_nd_add-gpu

Conversation

@yongtang

Copy link
Copy Markdown
Member

This PR adds complex64 and complex128 gpu support for tensor_scatter_nd_add,
as was raised in #40577.

This PR fixes #40577.

Signed-off-by: Yong Tang yong.tang.github@outlook.com

This PR adds complex64 and complex128 gpu support for tensor_scatter_nd_add,
as was raised in 40577.

This PR fixes 40577.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
@google-ml-butler google-ml-butler Bot added the size:XS CL Change Size: Extra Small label Jun 18, 2020
@gbaned gbaned self-assigned this Jun 18, 2020
@gbaned gbaned added the prtype:bugfix PR to fix a bug label Jun 18, 2020
@gbaned gbaned requested a review from chsigg June 18, 2020 17:16
@gbaned gbaned added the awaiting review Pull request awaiting review label Jun 23, 2020

@jaingaurav jaingaurav left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test case that covers this already?

@gbaned gbaned added the comp:core issues related to core part of tensorflow label Jul 3, 2020
@tensorflowbutler tensorflowbutler removed the awaiting review Pull request awaiting review label Jul 5, 2020
@google-ml-butler google-ml-butler Bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Jul 6, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Jul 6, 2020
@tensorflow-copybara tensorflow-copybara merged commit 014cdf6 into tensorflow:master Jul 8, 2020
copybara-service Bot pushed a commit that referenced this pull request Apr 14, 2026
…to PreparedTransfer

Imported from GitHub PR openxla/xla#40585

📝 Summary of Changes

This PR is the first in a sequence of PRs that will refactor cross-host data transfer implementations to eventually rely on a shared helper function `CrossHostTransferBuffers`. `CrossHostTransferBuffers` is planned to eventually be integrated into the PJRT APIs to enable receiving data into preallocated receive buffers (this feature is being planned in collaboration with @gspschmid, @emilyfertig, and @pschuh). As a first step, this PR unifies `Prepared{Send,Receive}` structs into a single `PreparedTransfer` struct.

This PR also moves waiting for dependency events out of NCCL group calls since the benefit of a NCCL group section is to aggregate the collectives launched inside of it (unrelated to waiting on dependency events).

🎯 Justification

It is difficult to achieve good comm/compute overlap with cross-host data transfers as the current implementation always allocates receive-buffers 'just-in-time', and because the GPU memory allocator blocks on the compute stream. `CrossHostTransferBuffers` will enable users to receive into preallocated receive buffers, making it easier to avoid the allocator blocking issue. This PR is a first step towards implementing `CrossHostTransferBuffers`.

🚀 Kind of Contribution
 ♻️ Cleanup (eventually ✨ New Feature)

🧪 Unit Tests:
This PR only refactors the implementation of `CrossHost{Send/Receive}Buffers`, so the pre-existing unit tests for those methods already test this PR.

🧪 Execution Tests:
Verified that [these 4 correctness tests](https://gist.github.com/rao-ashish/24ac0df0cb18243c649ac535964b31b8) continue to pass.
Copybara import of the project:

--
8d21ba239d7f46fabaf864c48b52d1cb655d0f10 by Ashish Rao <asrao@nvidia.com>:

Unify Prepared{Send,Receive} into PreparedTransfer

Merging this change closes #40585

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#40585 from rao-ashish:asrao/cross_host_refactor_v2_1 8d21ba239d7f46fabaf864c48b52d1cb655d0f10
PiperOrigin-RevId: 899628303
copybara-service Bot pushed a commit that referenced this pull request Apr 14, 2026
…to PreparedTransfer

Imported from GitHub PR openxla/xla#40585

📝 Summary of Changes

This PR is the first in a sequence of PRs that will refactor cross-host data transfer implementations to eventually rely on a shared helper function `CrossHostTransferBuffers`. `CrossHostTransferBuffers` is planned to eventually be integrated into the PJRT APIs to enable receiving data into preallocated receive buffers (this feature is being planned in collaboration with @gspschmid, @emilyfertig, and @pschuh). As a first step, this PR unifies `Prepared{Send,Receive}` structs into a single `PreparedTransfer` struct.

This PR also moves waiting for dependency events out of NCCL group calls since the benefit of a NCCL group section is to aggregate the collectives launched inside of it (unrelated to waiting on dependency events).

🎯 Justification

It is difficult to achieve good comm/compute overlap with cross-host data transfers as the current implementation always allocates receive-buffers 'just-in-time', and because the GPU memory allocator blocks on the compute stream. `CrossHostTransferBuffers` will enable users to receive into preallocated receive buffers, making it easier to avoid the allocator blocking issue. This PR is a first step towards implementing `CrossHostTransferBuffers`.

🚀 Kind of Contribution
 ♻️ Cleanup (eventually ✨ New Feature)

🧪 Unit Tests:
This PR only refactors the implementation of `CrossHost{Send/Receive}Buffers`, so the pre-existing unit tests for those methods already test this PR.

🧪 Execution Tests:
Verified that [these 4 correctness tests](https://gist.github.com/rao-ashish/24ac0df0cb18243c649ac535964b31b8) continue to pass.
Copybara import of the project:

--
8d21ba239d7f46fabaf864c48b52d1cb655d0f10 by Ashish Rao <asrao@nvidia.com>:

Unify Prepared{Send,Receive} into PreparedTransfer

Merging this change closes #40585

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#40585 from rao-ashish:asrao/cross_host_refactor_v2_1 8d21ba239d7f46fabaf864c48b52d1cb655d0f10
PiperOrigin-RevId: 899628303
copybara-service Bot pushed a commit that referenced this pull request Apr 14, 2026
…to PreparedTransfer

Imported from GitHub PR openxla/xla#40585

📝 Summary of Changes

This PR is the first in a sequence of PRs that will refactor cross-host data transfer implementations to eventually rely on a shared helper function `CrossHostTransferBuffers`. `CrossHostTransferBuffers` is planned to eventually be integrated into the PJRT APIs to enable receiving data into preallocated receive buffers (this feature is being planned in collaboration with @gspschmid, @emilyfertig, and @pschuh). As a first step, this PR unifies `Prepared{Send,Receive}` structs into a single `PreparedTransfer` struct.

This PR also moves waiting for dependency events out of NCCL group calls since the benefit of a NCCL group section is to aggregate the collectives launched inside of it (unrelated to waiting on dependency events).

🎯 Justification

It is difficult to achieve good comm/compute overlap with cross-host data transfers as the current implementation always allocates receive-buffers 'just-in-time', and because the GPU memory allocator blocks on the compute stream. `CrossHostTransferBuffers` will enable users to receive into preallocated receive buffers, making it easier to avoid the allocator blocking issue. This PR is a first step towards implementing `CrossHostTransferBuffers`.

🚀 Kind of Contribution
 ♻️ Cleanup (eventually ✨ New Feature)

🧪 Unit Tests:
This PR only refactors the implementation of `CrossHost{Send/Receive}Buffers`, so the pre-existing unit tests for those methods already test this PR.

🧪 Execution Tests:
Verified that [these 4 correctness tests](https://gist.github.com/rao-ashish/24ac0df0cb18243c649ac535964b31b8) continue to pass.
Copybara import of the project:

--
8d21ba239d7f46fabaf864c48b52d1cb655d0f10 by Ashish Rao <asrao@nvidia.com>:

Unify Prepared{Send,Receive} into PreparedTransfer

Merging this change closes #40585

PiperOrigin-RevId: 899784485
copybara-service Bot pushed a commit that referenced this pull request Apr 15, 2026
…vice into ScheduleTransfersOnLocalDevice

Imported from GitHub PR openxla/xla#40919

📝 Summary of Changes
This PR builds on [XLA #40585](openxla/xla#40585), and is the next in a sequence of PRs that will refactor cross-host data transfer implementations to eventually rely on a shared helper function `CrossHostTransferBuffers`. `CrossHostTransferBuffers` is planned to eventually be integrated into the PJRT APIs to enable receiving data into preallocated receive buffers (this feature is being planned in collaboration with @gspschmid, @emilyfertig, and @pschuh). As part of implementing `CrossHostTransferBuffers`, this PR introduces a `CrossHostTransferSpec` struct and refactors `ScheduleSendsOnLocalDevice` into a more general `ScheduleTransfersOnLocalDevice`.

This PR also cleans up some of the error handling around the `prepare_transfers` closure inside `ScheduleTransfersOnLocalDevice` (formerly the `setup_sends` closure inside `ScheduleSendsOnLocalDevice`). Previously, if we got an error when we tried to extract the `LocalDeviceState`, we failed to set the `transfer_event` as an error. The current changes make sure that the error from `prepare_transfers` is always propagated through the transfer event.

🎯 Justification
It is difficult to achieve good comm/compute overlap with cross-host data transfers as the current implementation always allocates receive-buffers 'just-in-time', and because the GPU memory allocator blocks on the compute stream. `CrossHostTransferBuffers` will enable users to receive into preallocated receive buffers, making it easier to avoid the allocator blocking issue. This PR is a step towards implementing `CrossHostTransferBuffers`.

🚀 Kind of Contribution
♻️ Cleanup (eventually ✨ New Feature)

🧪 Unit Tests:
This PR only refactors the implementation of `CrossHost{Send/Receive}Buffers`, so the pre-existing unit tests for those methods already test this PR.

🧪 Execution Tests:
Verified that [these 4 correctness tests](https://gist.github.com/rao-ashish/24ac0df0cb18243c649ac535964b31b8) continue to pass.
Copybara import of the project:

--
fcf2fa1139441fda4b3ce339bc1c222f93ae7023 by Ashish Rao <asrao@nvidia.com>:

Refactor ScheduleSendsOnLocalDevice into more generic ScheduleTransfersOnLocalDevice

--
426c90fd3ef312eb161b6a9575383ed6c733ec2e by Ashish Rao <asrao@nvidia.com>:

Add comment clarifying rank ids; add const keywords

Merging this change closes #40919

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#40919 from rao-ashish:asrao/cross_host_refactor_v2_2 426c90fd3ef312eb161b6a9575383ed6c733ec2e
PiperOrigin-RevId: 900247155
copybara-service Bot pushed a commit that referenced this pull request Apr 15, 2026
…vice into ScheduleTransfersOnLocalDevice

Imported from GitHub PR openxla/xla#40919

📝 Summary of Changes
This PR builds on [XLA #40585](openxla/xla#40585), and is the next in a sequence of PRs that will refactor cross-host data transfer implementations to eventually rely on a shared helper function `CrossHostTransferBuffers`. `CrossHostTransferBuffers` is planned to eventually be integrated into the PJRT APIs to enable receiving data into preallocated receive buffers (this feature is being planned in collaboration with @gspschmid, @emilyfertig, and @pschuh). As part of implementing `CrossHostTransferBuffers`, this PR introduces a `CrossHostTransferSpec` struct and refactors `ScheduleSendsOnLocalDevice` into a more general `ScheduleTransfersOnLocalDevice`.

This PR also cleans up some of the error handling around the `prepare_transfers` closure inside `ScheduleTransfersOnLocalDevice` (formerly the `setup_sends` closure inside `ScheduleSendsOnLocalDevice`). Previously, if we got an error when we tried to extract the `LocalDeviceState`, we failed to set the `transfer_event` as an error. The current changes make sure that the error from `prepare_transfers` is always propagated through the transfer event.

🎯 Justification
It is difficult to achieve good comm/compute overlap with cross-host data transfers as the current implementation always allocates receive-buffers 'just-in-time', and because the GPU memory allocator blocks on the compute stream. `CrossHostTransferBuffers` will enable users to receive into preallocated receive buffers, making it easier to avoid the allocator blocking issue. This PR is a step towards implementing `CrossHostTransferBuffers`.

🚀 Kind of Contribution
♻️ Cleanup (eventually ✨ New Feature)

🧪 Unit Tests:
This PR only refactors the implementation of `CrossHost{Send/Receive}Buffers`, so the pre-existing unit tests for those methods already test this PR.

🧪 Execution Tests:
Verified that [these 4 correctness tests](https://gist.github.com/rao-ashish/24ac0df0cb18243c649ac535964b31b8) continue to pass.
Copybara import of the project:

--
fcf2fa1139441fda4b3ce339bc1c222f93ae7023 by Ashish Rao <asrao@nvidia.com>:

Refactor ScheduleSendsOnLocalDevice into more generic ScheduleTransfersOnLocalDevice

--
426c90fd3ef312eb161b6a9575383ed6c733ec2e by Ashish Rao <asrao@nvidia.com>:

Add comment clarifying rank ids; add const keywords

Merging this change closes #40919

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#40919 from rao-ashish:asrao/cross_host_refactor_v2_2 426c90fd3ef312eb161b6a9575383ed6c733ec2e
PiperOrigin-RevId: 900247155
copybara-service Bot pushed a commit that referenced this pull request Apr 15, 2026
…vice into ScheduleTransfersOnLocalDevice

Imported from GitHub PR openxla/xla#40919

📝 Summary of Changes
This PR builds on [XLA #40585](openxla/xla#40585), and is the next in a sequence of PRs that will refactor cross-host data transfer implementations to eventually rely on a shared helper function `CrossHostTransferBuffers`. `CrossHostTransferBuffers` is planned to eventually be integrated into the PJRT APIs to enable receiving data into preallocated receive buffers (this feature is being planned in collaboration with @gspschmid, @emilyfertig, and @pschuh). As part of implementing `CrossHostTransferBuffers`, this PR introduces a `CrossHostTransferSpec` struct and refactors `ScheduleSendsOnLocalDevice` into a more general `ScheduleTransfersOnLocalDevice`.

This PR also cleans up some of the error handling around the `prepare_transfers` closure inside `ScheduleTransfersOnLocalDevice` (formerly the `setup_sends` closure inside `ScheduleSendsOnLocalDevice`). Previously, if we got an error when we tried to extract the `LocalDeviceState`, we failed to set the `transfer_event` as an error. The current changes make sure that the error from `prepare_transfers` is always propagated through the transfer event.

🎯 Justification
It is difficult to achieve good comm/compute overlap with cross-host data transfers as the current implementation always allocates receive-buffers 'just-in-time', and because the GPU memory allocator blocks on the compute stream. `CrossHostTransferBuffers` will enable users to receive into preallocated receive buffers, making it easier to avoid the allocator blocking issue. This PR is a step towards implementing `CrossHostTransferBuffers`.

🚀 Kind of Contribution
♻️ Cleanup (eventually ✨ New Feature)

🧪 Unit Tests:
This PR only refactors the implementation of `CrossHost{Send/Receive}Buffers`, so the pre-existing unit tests for those methods already test this PR.

🧪 Execution Tests:
Verified that [these 4 correctness tests](https://gist.github.com/rao-ashish/24ac0df0cb18243c649ac535964b31b8) continue to pass.
Copybara import of the project:

--
fcf2fa1139441fda4b3ce339bc1c222f93ae7023 by Ashish Rao <asrao@nvidia.com>:

Refactor ScheduleSendsOnLocalDevice into more generic ScheduleTransfersOnLocalDevice

--
426c90fd3ef312eb161b6a9575383ed6c733ec2e by Ashish Rao <asrao@nvidia.com>:

Add comment clarifying rank ids; add const keywords

Merging this change closes #40919

PiperOrigin-RevId: 900297557
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: yes comp:core issues related to core part of tensorflow prtype:bugfix PR to fix a bug ready to pull PR ready for merge process size:XS CL Change Size: Extra Small

Projects

None yet

Development

Successfully merging this pull request may close these issues.

tensor scatter nd add doesn't support complex64 in tf 2.3-dev

7 participants