Skip to content

Support Dataset cpu-mode in environment with GPUs that have not been detected #236

Merged
karlhigley merged 33 commits intoNVIDIA-Merlin:mainfrom
oliverholworthy:dataset-support-without-visible-gpu
Mar 22, 2023
Merged

Support Dataset cpu-mode in environment with GPUs that have not been detected #236
karlhigley merged 33 commits intoNVIDIA-Merlin:mainfrom
oliverholworthy:dataset-support-without-visible-gpu

Conversation

@oliverholworthy
Copy link
Contributor

@oliverholworthy oliverholworthy commented Mar 8, 2023

  • Support Dataset cpu-mode in environment with GPUs that have not been detected.

When CUDA visible devices environment variable is empty CUDA_VISIBLE_DEVICES=""

Currently you'll get an error like the following in the case where cudf has imported sucessfully while HAS_GPU is False.

AttributeError: 'NoneType' object has no attribute 'DataFrame'
  • Update TensorTable Tests to run in non-GPU environment
  • Adds workflow to run tests in environment where GPUs are not visible

@oliverholworthy oliverholworthy added the enhancement New feature or request label Mar 8, 2023
@oliverholworthy oliverholworthy self-assigned this Mar 8, 2023
Copy link
Contributor

@karlhigley karlhigley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything outside dispatch.py LGTM, but I have concerns about using the GPU classes there

python -m pytest --cov-report term --cov merlin -rxs tests/unit


[testenv:test-gpu-not-visible]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@oliverholworthy oliverholworthy marked this pull request as ready for review March 13, 2023 14:38
output_col_types: List[Type] = []

if cp:
if cp and HAS_GPU:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way we could put the HAS_GPU in merlin.core.compat so we don't have to repeat it in so many places?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you suggesting making cp return None if HAS_GPU=False or something else?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so? Can we import CuPy successfully if we don't have a GPU?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that depends on the extent of what "don't have a GPU" means

If CuPy or cudf fail to find relevant cuda drivers they raise an ImportError which we catch in compat. and cp will be None. However, if we set CUDA_VISIBLE_DEVICES= in an environment that has the relevant cuda drivers and libs available then the import is successful, cp will have the value of the cupy module, but if you try to use it e.g. cupy.array(...), then you get a CUDARuntimeError

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible this is a misuse of CUDA_VISIBLE_DEVICES and not a scenario we would like to support.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the latter case, is HAS_GPU false? If so, then seems like we might be able to use it in compat

@karlhigley
Copy link
Contributor

This test environment looks like it breaks some of the Dask cluster tests

@karlhigley karlhigley merged commit 3e3da45 into NVIDIA-Merlin:main Mar 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants