Add complex64 and complex128 gpu support for tensor_scatter_nd_add by yongtang · Pull Request #40585 · tensorflow/tensorflow

yongtang · 2020-06-18T16:19:17Z

This PR adds complex64 and complex128 gpu support for tensor_scatter_nd_add,
as was raised in #40577.

This PR fixes #40577.

Signed-off-by: Yong Tang yong.tang.github@outlook.com

This PR adds complex64 and complex128 gpu support for tensor_scatter_nd_add, as was raised in 40577. This PR fixes 40577. Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

jaingaurav

Is there a test case that covers this already?

@gspschmid

…to PreparedTransfer Imported from GitHub PR openxla/xla#40585 📝 Summary of Changes This PR is the first in a sequence of PRs that will refactor cross-host data transfer implementations to eventually rely on a shared helper function `CrossHostTransferBuffers`. `CrossHostTransferBuffers` is planned to eventually be integrated into the PJRT APIs to enable receiving data into preallocated receive buffers (this feature is being planned in collaboration with @gspschmid, @emilyfertig, and @pschuh). As a first step, this PR unifies `Prepared{Send,Receive}` structs into a single `PreparedTransfer` struct. This PR also moves waiting for dependency events out of NCCL group calls since the benefit of a NCCL group section is to aggregate the collectives launched inside of it (unrelated to waiting on dependency events). 🎯 Justification It is difficult to achieve good comm/compute overlap with cross-host data transfers as the current implementation always allocates receive-buffers 'just-in-time', and because the GPU memory allocator blocks on the compute stream. `CrossHostTransferBuffers` will enable users to receive into preallocated receive buffers, making it easier to avoid the allocator blocking issue. This PR is a first step towards implementing `CrossHostTransferBuffers`. 🚀 Kind of Contribution ♻️ Cleanup (eventually ✨ New Feature) 🧪 Unit Tests: This PR only refactors the implementation of `CrossHost{Send/Receive}Buffers`, so the pre-existing unit tests for those methods already test this PR. 🧪 Execution Tests: Verified that [these 4 correctness tests](https://gist.github.com/rao-ashish/24ac0df0cb18243c649ac535964b31b8) continue to pass. Copybara import of the project: -- 8d21ba239d7f46fabaf864c48b52d1cb655d0f10 by Ashish Rao <asrao@nvidia.com>: Unify Prepared{Send,Receive} into PreparedTransfer Merging this change closes #40585 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#40585 from rao-ashish:asrao/cross_host_refactor_v2_1 8d21ba239d7f46fabaf864c48b52d1cb655d0f10 PiperOrigin-RevId: 899628303

@gspschmid

…to PreparedTransfer Imported from GitHub PR openxla/xla#40585 📝 Summary of Changes This PR is the first in a sequence of PRs that will refactor cross-host data transfer implementations to eventually rely on a shared helper function `CrossHostTransferBuffers`. `CrossHostTransferBuffers` is planned to eventually be integrated into the PJRT APIs to enable receiving data into preallocated receive buffers (this feature is being planned in collaboration with @gspschmid, @emilyfertig, and @pschuh). As a first step, this PR unifies `Prepared{Send,Receive}` structs into a single `PreparedTransfer` struct. This PR also moves waiting for dependency events out of NCCL group calls since the benefit of a NCCL group section is to aggregate the collectives launched inside of it (unrelated to waiting on dependency events). 🎯 Justification It is difficult to achieve good comm/compute overlap with cross-host data transfers as the current implementation always allocates receive-buffers 'just-in-time', and because the GPU memory allocator blocks on the compute stream. `CrossHostTransferBuffers` will enable users to receive into preallocated receive buffers, making it easier to avoid the allocator blocking issue. This PR is a first step towards implementing `CrossHostTransferBuffers`. 🚀 Kind of Contribution ♻️ Cleanup (eventually ✨ New Feature) 🧪 Unit Tests: This PR only refactors the implementation of `CrossHost{Send/Receive}Buffers`, so the pre-existing unit tests for those methods already test this PR. 🧪 Execution Tests: Verified that [these 4 correctness tests](https://gist.github.com/rao-ashish/24ac0df0cb18243c649ac535964b31b8) continue to pass. Copybara import of the project: -- 8d21ba239d7f46fabaf864c48b52d1cb655d0f10 by Ashish Rao <asrao@nvidia.com>: Unify Prepared{Send,Receive} into PreparedTransfer Merging this change closes #40585 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#40585 from rao-ashish:asrao/cross_host_refactor_v2_1 8d21ba239d7f46fabaf864c48b52d1cb655d0f10 PiperOrigin-RevId: 899628303

@gspschmid

…to PreparedTransfer Imported from GitHub PR openxla/xla#40585 📝 Summary of Changes This PR is the first in a sequence of PRs that will refactor cross-host data transfer implementations to eventually rely on a shared helper function `CrossHostTransferBuffers`. `CrossHostTransferBuffers` is planned to eventually be integrated into the PJRT APIs to enable receiving data into preallocated receive buffers (this feature is being planned in collaboration with @gspschmid, @emilyfertig, and @pschuh). As a first step, this PR unifies `Prepared{Send,Receive}` structs into a single `PreparedTransfer` struct. This PR also moves waiting for dependency events out of NCCL group calls since the benefit of a NCCL group section is to aggregate the collectives launched inside of it (unrelated to waiting on dependency events). 🎯 Justification It is difficult to achieve good comm/compute overlap with cross-host data transfers as the current implementation always allocates receive-buffers 'just-in-time', and because the GPU memory allocator blocks on the compute stream. `CrossHostTransferBuffers` will enable users to receive into preallocated receive buffers, making it easier to avoid the allocator blocking issue. This PR is a first step towards implementing `CrossHostTransferBuffers`. 🚀 Kind of Contribution ♻️ Cleanup (eventually ✨ New Feature) 🧪 Unit Tests: This PR only refactors the implementation of `CrossHost{Send/Receive}Buffers`, so the pre-existing unit tests for those methods already test this PR. 🧪 Execution Tests: Verified that [these 4 correctness tests](https://gist.github.com/rao-ashish/24ac0df0cb18243c649ac535964b31b8) continue to pass. Copybara import of the project: -- 8d21ba239d7f46fabaf864c48b52d1cb655d0f10 by Ashish Rao <asrao@nvidia.com>: Unify Prepared{Send,Receive} into PreparedTransfer Merging this change closes #40585 PiperOrigin-RevId: 899784485

@gspschmid

…vice into ScheduleTransfersOnLocalDevice Imported from GitHub PR openxla/xla#40919 📝 Summary of Changes This PR builds on [XLA #40585](openxla/xla#40585), and is the next in a sequence of PRs that will refactor cross-host data transfer implementations to eventually rely on a shared helper function `CrossHostTransferBuffers`. `CrossHostTransferBuffers` is planned to eventually be integrated into the PJRT APIs to enable receiving data into preallocated receive buffers (this feature is being planned in collaboration with @gspschmid, @emilyfertig, and @pschuh). As part of implementing `CrossHostTransferBuffers`, this PR introduces a `CrossHostTransferSpec` struct and refactors `ScheduleSendsOnLocalDevice` into a more general `ScheduleTransfersOnLocalDevice`. This PR also cleans up some of the error handling around the `prepare_transfers` closure inside `ScheduleTransfersOnLocalDevice` (formerly the `setup_sends` closure inside `ScheduleSendsOnLocalDevice`). Previously, if we got an error when we tried to extract the `LocalDeviceState`, we failed to set the `transfer_event` as an error. The current changes make sure that the error from `prepare_transfers` is always propagated through the transfer event. 🎯 Justification It is difficult to achieve good comm/compute overlap with cross-host data transfers as the current implementation always allocates receive-buffers 'just-in-time', and because the GPU memory allocator blocks on the compute stream. `CrossHostTransferBuffers` will enable users to receive into preallocated receive buffers, making it easier to avoid the allocator blocking issue. This PR is a step towards implementing `CrossHostTransferBuffers`. 🚀 Kind of Contribution ♻️ Cleanup (eventually ✨ New Feature) 🧪 Unit Tests: This PR only refactors the implementation of `CrossHost{Send/Receive}Buffers`, so the pre-existing unit tests for those methods already test this PR. 🧪 Execution Tests: Verified that [these 4 correctness tests](https://gist.github.com/rao-ashish/24ac0df0cb18243c649ac535964b31b8) continue to pass. Copybara import of the project: -- fcf2fa1139441fda4b3ce339bc1c222f93ae7023 by Ashish Rao <asrao@nvidia.com>: Refactor ScheduleSendsOnLocalDevice into more generic ScheduleTransfersOnLocalDevice -- 426c90fd3ef312eb161b6a9575383ed6c733ec2e by Ashish Rao <asrao@nvidia.com>: Add comment clarifying rank ids; add const keywords Merging this change closes #40919 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#40919 from rao-ashish:asrao/cross_host_refactor_v2_2 426c90fd3ef312eb161b6a9575383ed6c733ec2e PiperOrigin-RevId: 900247155

@gspschmid

…vice into ScheduleTransfersOnLocalDevice Imported from GitHub PR openxla/xla#40919 📝 Summary of Changes This PR builds on [XLA #40585](openxla/xla#40585), and is the next in a sequence of PRs that will refactor cross-host data transfer implementations to eventually rely on a shared helper function `CrossHostTransferBuffers`. `CrossHostTransferBuffers` is planned to eventually be integrated into the PJRT APIs to enable receiving data into preallocated receive buffers (this feature is being planned in collaboration with @gspschmid, @emilyfertig, and @pschuh). As part of implementing `CrossHostTransferBuffers`, this PR introduces a `CrossHostTransferSpec` struct and refactors `ScheduleSendsOnLocalDevice` into a more general `ScheduleTransfersOnLocalDevice`. This PR also cleans up some of the error handling around the `prepare_transfers` closure inside `ScheduleTransfersOnLocalDevice` (formerly the `setup_sends` closure inside `ScheduleSendsOnLocalDevice`). Previously, if we got an error when we tried to extract the `LocalDeviceState`, we failed to set the `transfer_event` as an error. The current changes make sure that the error from `prepare_transfers` is always propagated through the transfer event. 🎯 Justification It is difficult to achieve good comm/compute overlap with cross-host data transfers as the current implementation always allocates receive-buffers 'just-in-time', and because the GPU memory allocator blocks on the compute stream. `CrossHostTransferBuffers` will enable users to receive into preallocated receive buffers, making it easier to avoid the allocator blocking issue. This PR is a step towards implementing `CrossHostTransferBuffers`. 🚀 Kind of Contribution ♻️ Cleanup (eventually ✨ New Feature) 🧪 Unit Tests: This PR only refactors the implementation of `CrossHost{Send/Receive}Buffers`, so the pre-existing unit tests for those methods already test this PR. 🧪 Execution Tests: Verified that [these 4 correctness tests](https://gist.github.com/rao-ashish/24ac0df0cb18243c649ac535964b31b8) continue to pass. Copybara import of the project: -- fcf2fa1139441fda4b3ce339bc1c222f93ae7023 by Ashish Rao <asrao@nvidia.com>: Refactor ScheduleSendsOnLocalDevice into more generic ScheduleTransfersOnLocalDevice -- 426c90fd3ef312eb161b6a9575383ed6c733ec2e by Ashish Rao <asrao@nvidia.com>: Add comment clarifying rank ids; add const keywords Merging this change closes #40919 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#40919 from rao-ashish:asrao/cross_host_refactor_v2_2 426c90fd3ef312eb161b6a9575383ed6c733ec2e PiperOrigin-RevId: 900247155

@gspschmid

…vice into ScheduleTransfersOnLocalDevice Imported from GitHub PR openxla/xla#40919 📝 Summary of Changes This PR builds on [XLA #40585](openxla/xla#40585), and is the next in a sequence of PRs that will refactor cross-host data transfer implementations to eventually rely on a shared helper function `CrossHostTransferBuffers`. `CrossHostTransferBuffers` is planned to eventually be integrated into the PJRT APIs to enable receiving data into preallocated receive buffers (this feature is being planned in collaboration with @gspschmid, @emilyfertig, and @pschuh). As part of implementing `CrossHostTransferBuffers`, this PR introduces a `CrossHostTransferSpec` struct and refactors `ScheduleSendsOnLocalDevice` into a more general `ScheduleTransfersOnLocalDevice`. This PR also cleans up some of the error handling around the `prepare_transfers` closure inside `ScheduleTransfersOnLocalDevice` (formerly the `setup_sends` closure inside `ScheduleSendsOnLocalDevice`). Previously, if we got an error when we tried to extract the `LocalDeviceState`, we failed to set the `transfer_event` as an error. The current changes make sure that the error from `prepare_transfers` is always propagated through the transfer event. 🎯 Justification It is difficult to achieve good comm/compute overlap with cross-host data transfers as the current implementation always allocates receive-buffers 'just-in-time', and because the GPU memory allocator blocks on the compute stream. `CrossHostTransferBuffers` will enable users to receive into preallocated receive buffers, making it easier to avoid the allocator blocking issue. This PR is a step towards implementing `CrossHostTransferBuffers`. 🚀 Kind of Contribution ♻️ Cleanup (eventually ✨ New Feature) 🧪 Unit Tests: This PR only refactors the implementation of `CrossHost{Send/Receive}Buffers`, so the pre-existing unit tests for those methods already test this PR. 🧪 Execution Tests: Verified that [these 4 correctness tests](https://gist.github.com/rao-ashish/24ac0df0cb18243c649ac535964b31b8) continue to pass. Copybara import of the project: -- fcf2fa1139441fda4b3ce339bc1c222f93ae7023 by Ashish Rao <asrao@nvidia.com>: Refactor ScheduleSendsOnLocalDevice into more generic ScheduleTransfersOnLocalDevice -- 426c90fd3ef312eb161b6a9575383ed6c733ec2e by Ashish Rao <asrao@nvidia.com>: Add comment clarifying rank ids; add const keywords Merging this change closes #40919 PiperOrigin-RevId: 900297557

Add complex64 and complex128 gpu support for tensor_scatter_nd_add

9616883

This PR adds complex64 and complex128 gpu support for tensor_scatter_nd_add, as was raised in 40577. This PR fixes 40577. Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

google-ml-butler Bot added the size:XS CL Change Size: Extra Small label Jun 18, 2020

googlebot added the cla: yes label Jun 18, 2020

yongtang mentioned this pull request Jun 18, 2020

tensor scatter nd add doesn't support complex64 in tf 2.3-dev #40577

Closed

gbaned self-assigned this Jun 18, 2020

gbaned added the prtype:bugfix PR to fix a bug label Jun 18, 2020

gbaned requested a review from chsigg June 18, 2020 17:16

gbaned added the awaiting review Pull request awaiting review label Jun 23, 2020

jaingaurav reviewed Jul 3, 2020

View reviewed changes

gbaned added the comp:core issues related to core part of tensorflow label Jul 3, 2020

tensorflowbutler removed the awaiting review Pull request awaiting review label Jul 5, 2020

jaingaurav approved these changes Jul 6, 2020

View reviewed changes

google-ml-butler Bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Jul 6, 2020

kokoro-team removed the kokoro:force-run Tests on submitted change label Jul 6, 2020

tensorflow-copybara merged commit 014cdf6 into tensorflow:master Jul 8, 2020

copybara-service Bot mentioned this pull request Apr 14, 2026

PR #40585: Cross Host Data Transfers: Unify Prepared{Send,Receive} into PreparedTransfer #115915

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add complex64 and complex128 gpu support for tensor_scatter_nd_add#40585

Add complex64 and complex128 gpu support for tensor_scatter_nd_add#40585
tensorflow-copybara merged 1 commit into
tensorflow:masterfrom
yongtang:40577-tensor_scatter_nd_add-gpu

yongtang commented Jun 18, 2020

Uh oh!

jaingaurav left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Uh oh!

Conversation

yongtang commented Jun 18, 2020

Uh oh!

jaingaurav left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants