Update imports in dispatch module to handle missing GPU #99

Merged
karlhigley merged 1 commit into NVIDIA-Merlin:main from oliverholworthy:dispatch-handle-runtime-error
Jun 8, 2022

oliverholworthy (Contributor) commented Jun 7, 2022

Use HAS_GPU in the dispatch module to skip imports that raise exceptions when no GPU is available. Follow-up to #98.

Motivation

Importing the dispatch module when no GPU is available (e.g. in a Docker container started without GPU configuration) raises an exception.

This module is imported by merlin.systems, so the exception makes it harder to experiment with Triton when GPU features are not enabled.

```python
import merlin.core.dispatch
```

This results in the following error (partial output):

```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
File /usr/local/lib/python3.8/dist-packages/cuda/_cuda/ccuda.pyx:3553, in cuda._cuda.ccuda._cuInit()

File /usr/local/lib/python3.8/dist-packages/cuda/_cuda/ccuda.pyx:424, in cuda._cuda.ccuda.cuPythonInit()

RuntimeError: Failed to dlopen libcuda.so
Exception ignored in: 'cuda._lib.ccudart.utils.cudaPythonGlobal.lazyInit'
Traceback (most recent call last):
  File "cuda/_cuda/ccuda.pyx", line 3553, in cuda._cuda.ccuda._cuInit
  File "cuda/_cuda/ccuda.pyx", line 424, in cuda._cuda.ccuda.cuPythonInit
RuntimeError: Failed to dlopen libcuda.so
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
[...]
RuntimeError: Function "cuDeviceGetCount" not found
```

Implementation Details

The dask_cudf import is responsible for the exception, and the resulting RuntimeError cannot be caught with a simple try/except.

This PR wraps some of these imports in a check of the HAS_GPU variable from the compat module added in #98.

oliverholworthy force-pushed the dispatch-handle-runtime-error branch from e9dd7d3 to 8068cfe June 7, 2022 17:49
oliverholworthy force-pushed the dispatch-handle-runtime-error branch from 8068cfe to eaf8cae June 7, 2022 17:58
github-actions bot commented Jun 7, 2022

Documentation preview

https://nvidia-merlin.github.io/core/review/pr-99

oliverholworthy added the bug (Something isn't working) label Jun 7, 2022
