[Bug] Using numba to detect gpu availability breaks Dask-CUDA worker pinning #144

@rjzamora

Description

While attempting to benchmark NVIDIA-Merlin/NVTabular#1687, I discovered that the dask-criteo benchmark does not work with the latest version of NVTabular/Merlin-core.

As far as I can tell, the problem is that #98 added the following logic to detect GPU availability: HAS_GPU = len(cuda.gpus.lst) > 0. This logic works just fine within a local process, but it breaks Dask-CUDA device pinning whenever it runs at import time (or anywhere else in the global scope of the program): accessing cuda.gpus.lst creates a CUDA context in the importing process before Dask-CUDA has a chance to spawn and pin its worker processes. In other words, a statement like from merlin.core.compat import HAS_GPU should never end up executing this check.

The problem becomes apparent in a simple (Merlin-free) reproducer:

# reproducer.py
from dask_cuda import LocalCUDACluster
from numba import cuda # This is fine

HAS_GPU = len(cuda.gpus.lst) > 0  # This is not fine

if __name__ == "__main__":
    cluster = LocalCUDACluster()

If you execute python ./reproducer.py, you will see warnings like:

/.../distributed/distributed/comm/ucx.py:67: UserWarning: Worker with process ID 49507 should have a CUDA context assigned to device 1, but instead the CUDA context is on device 0. This is often the result of a CUDA-enabled library calling a CUDA runtime function before Dask-CUDA can spawn worker processes. Please make sure any such function calls don't happen at import time or in the global scope of a program.
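One way to avoid this class of bug (a sketch, not part of the original report; the has_gpu name and caching approach are assumptions) is to defer the numba call into a lazily evaluated, cached function, so that merely importing the module never creates a CUDA context:

```python
# compat_sketch.py -- hypothetical replacement for a module-level HAS_GPU
from functools import lru_cache


@lru_cache(maxsize=None)
def has_gpu() -> bool:
    """Detect GPU availability on first call, not at import time.

    Because the numba import and the cuda.gpus.lst access happen inside
    the function body, importing this module creates no CUDA context,
    so Dask-CUDA can still pin each spawned worker to its own device.
    The result is cached, so the context is created at most once per
    process, and only in processes that actually ask.
    """
    try:
        from numba import cuda  # deferred import: safe at module import time

        return len(cuda.gpus.lst) > 0
    except Exception:  # numba missing, no driver, or no visible devices
        return False
```

Call sites would then use has_gpu() instead of reading a module-level constant, which keeps the detection out of every from ... import path.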

Metadata

Assignees: no one assigned

Labels: P0, bug (Something isn't working)
