@@ -336,25 +336,68 @@ for(int i = 0; i < N; ++i) {
336336
337337Note that the CUDA device that is current when creating a ` device_memory_resource ` must also be
338338current any time that ` device_memory_resource ` is used to deallocate memory, including in a
339- destructor. This affects RAII classes like ` rmm::device_buffer ` and ` rmm::device_uvector ` . Here's an
340- (incorrect) example that assumes the above example loop has been run to create a
341- ` pool_memory_resource ` for each device. A correct example adds a call to ` cudaSetDevice(0) ` on the
342- line of the error comment.
339+ destructor. The RAII class `rmm::device_buffer` and classes that use it as a backing store
340+ (`rmm::device_scalar` and `rmm::device_uvector`) handle this by storing the active device when the
341+ constructor is called, and then ensuring that the stored device is active whenever an allocation or
342+ deallocation is performed (including in the destructor). The user therefore only needs to ensure
343+ that the device active during _creation_ of an `rmm::device_buffer` matches the device that was
344+ active when the memory resource being used was created.
345+
346+ Here is an incorrect example that creates a memory resource on device zero and then uses it to
347+ allocate a ` device_buffer ` on device one:
343348
344349``` c++
345350{
346351 RMM_CUDA_TRY(cudaSetDevice(0));
347- rmm::device_buffer buf_a(16);
348-
352+ auto mr = rmm::mr::cuda_memory_resource{};
349353 {
350354 RMM_CUDA_TRY(cudaSetDevice(1));
351- rmm::device_buffer buf_b(16);
355+ // Invalid, current device is 1, but MR is only valid for device 0
356+ rmm::device_buffer buf(16, rmm::cuda_stream_default, &mr);
352357 }
358+ }
359+ ```
360+
361+ A correct example creates the device buffer with device zero active. After that it is safe to switch
362+ devices and let the buffer go out of scope and destruct with a different device active. For example,
363+ this code is correct:
364+
365+ ``` c++
366+ {
367+ RMM_CUDA_TRY(cudaSetDevice(0));
368+ auto mr = rmm::mr::cuda_memory_resource{};
369+ rmm::device_buffer buf(16, rmm::cuda_stream_default, &mr);
370+ RMM_CUDA_TRY(cudaSetDevice(1));
371+ ...
372+ // No need to switch back to device 0 before ~buf runs
373+ }
374+ ```
375+
376+ #### Use of ` rmm::device_vector ` with multiple devices
377+
378+ > [!CAUTION] In contrast to the uninitialized `rmm::device_uvector`, `rmm::device_vector`
379+ > **DOES NOT** store the active device during construction, and therefore cannot arrange for it
380+ > to be active when the destructor runs. It is the user's responsibility to ensure that the
381+ > correct device is active when the vector is destroyed.
382+
383+ `rmm::device_vector` is therefore slightly less ergonomic in a multi-device setting, since the
384+ caller must ensure that the same device is active on allocation and on deallocation. Repeating the
385+ previous example with `rmm::device_vector`:
353386
354- // Error: when buf_a is destroyed, the current device must be 0, but it is 1
387+ ``` c++
388+ {
389+ RMM_CUDA_TRY(cudaSetDevice(0));
390+ auto mr = rmm::mr::cuda_memory_resource{};
391+ rmm::device_vector<int> vec(16, rmm::mr::thrust_allocator<int>(rmm::cuda_stream_default, &mr));
392+ RMM_CUDA_TRY(cudaSetDevice(1));
393+ ...
394+ // ERROR: ~vec runs with device 1 active, but needs device 0 to be active
355395}
356396```
357397
398+ A correct example adds a call to `cudaSetDevice(0)` on the line of the error comment, so that
399+ device 0 is active again when the destructor of `vec` runs.
400+
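The difference between the two container behaviours described above can be illustrated with a small host-only toy model. This is only a sketch under stated assumptions: everything in the `toy` namespace (`current_device`, `set_device`, `device_guard`, `tracked_buffer`, `untracked_vector`) is hypothetical and stands in for the CUDA runtime and RMM types. It is not RMM's implementation and calls no CUDA or RMM API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these are CUDA or RMM APIs.
namespace toy {

// Models the CUDA runtime's notion of a "current device".
inline int& current_device() {
  static int device = 0;
  return device;
}
inline void set_device(int id) { current_device() = id; }

// Records which device was active each time a deallocation happened.
inline std::vector<int>& dealloc_log() {
  static std::vector<int> log;
  return log;
}

// Switches to `target` for the guard's lifetime, then restores the previous device.
class device_guard {
 public:
  explicit device_guard(int target) : previous_{current_device()} { set_device(target); }
  ~device_guard() { set_device(previous_); }
  device_guard(device_guard const&)            = delete;
  device_guard& operator=(device_guard const&) = delete;

 private:
  int previous_;
};

// Remembers the device active at construction and re-activates it while deallocating.
class tracked_buffer {
 public:
  tracked_buffer() : device_{current_device()} {}
  ~tracked_buffer() {
    device_guard guard{device_};
    dealloc_log().push_back(current_device());
  }

 private:
  int device_;
};

// Does not remember its device: deallocates on whatever device is active at destruction.
class untracked_vector {
 public:
  ~untracked_vector() { dealloc_log().push_back(current_device()); }
};

}  // namespace toy

// Each function returns the device that was active during the deallocation.
inline int tracked_dealloc_device() {
  toy::set_device(0);
  {
    toy::tracked_buffer buf;
    toy::set_device(1);
  }  // destructor switches to device 0, then restores device 1
  assert(toy::current_device() == 1);
  return toy::dealloc_log().back();
}

inline int untracked_dealloc_device() {
  toy::set_device(0);
  {
    toy::untracked_vector vec;
    toy::set_device(1);
  }  // destructor runs with device 1 active (the error case)
  return toy::dealloc_log().back();
}

inline int untracked_with_manual_switch() {
  toy::set_device(0);
  {
    toy::untracked_vector vec;
    toy::set_device(1);
    toy::set_device(0);  // caller switches back before the destructor runs
  }
  return toy::dealloc_log().back();
}
```

In this model `tracked_dealloc_device()` deallocates on device 0 even though device 1 was active at scope exit, while `untracked_dealloc_device()` deallocates on device 1 unless the caller switches back first, as in `untracked_with_manual_switch()`.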
358401## ` cuda_stream_view ` and ` cuda_stream `
359402
360403` rmm::cuda_stream_view ` is a simple non-owning wrapper around a CUDA ` cudaStream_t ` . This wrapper's