
Remove use of HAS_GPU from dispatch functions#244

Merged
karlhigley merged 9 commits into NVIDIA-Merlin:main from oliverholworthy:dispatch-remove-has-gpu
Mar 17, 2023

Conversation

@oliverholworthy
Contributor

Remove use of HAS_GPU from dispatch functions.

Following #211, the variable HAS_GPU no longer reflects whether or not cudf is installed. This can result in confusing errors when using the dispatch functions in a GPU-enabled environment without cudf available (`'NoneType' object has no attribute 'DataFrame'`).
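A minimal sketch of the failure mode being fixed (not the actual merlin.core.dispatch source; the `make_df_old`/`make_df_new` names are illustrative): when HAS_GPU is true but the cudf import failed, keying dispatch off HAS_GPU dereferences a `None` module.

```python
# Hypothetical sketch, assuming a GPU-enabled env where cudf failed to import.
import pandas as pd

cudf = None      # stands in for a failed `import cudf`
HAS_GPU = True   # GPU detected (e.g. via pynvml), but cudf unavailable

def make_df_old(data):
    # Buggy pattern: keys off HAS_GPU, so df_lib is None here.
    df_lib = cudf if HAS_GPU else pd
    return df_lib.DataFrame(data)  # AttributeError: 'NoneType' object has no attribute 'DataFrame'

def make_df_new(data):
    # Fixed pattern: keys off whether cudf actually imported.
    df_lib = cudf if cudf is not None else pd
    return df_lib.DataFrame(data)
```

With the fixed pattern, `make_df_new({"a": [1]})` falls back to pandas cleanly instead of raising.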

@oliverholworthy oliverholworthy added the chore Maintenance for the repository label Mar 13, 2023
@oliverholworthy oliverholworthy self-assigned this Mar 13, 2023
oliverholworthy added a commit to oliverholworthy/core that referenced this pull request Mar 13, 2023
```diff
 rmm = None

-if HAS_GPU:
+if cudf:
```
Contributor

I could be missing something, but it seems like cudf would need to be imported from merlin.core.compat for this to work

Contributor Author

I missed this commit when moving from the other PR. Updated

```python
cupy = None

try:
    import cudf
```
Contributor Author

Based on your comment here ("may blow up in CPU-only environments") @karlhigley

This is currently the case: if you try to import cudf in our containers without GPUs available, you'll get a segmentation fault, which the `except ImportError` can't catch:

```
Exception ignored in: 'cuda._lib.ccudart.utils.cudaPythonGlobal.lazyInitGlobal'
Traceback (most recent call last):
  File "cuda/_cuda/ccuda.pyx", line 3671, in cuda._cuda.ccuda._cuInit
  File "cuda/_cuda/ccuda.pyx", line 435, in cuda._cuda.ccuda.cuPythonInit
RuntimeError: Failed to dlopen libcuda.so
Segmentation fault (core dumped)
```
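For context, the guarded-import pattern under discussion can be sketched generically as below (`optional_import` is a hypothetical helper, not project code). The key limitation noted above: a segmentation fault inside the CUDA driver kills the process outright, so the `except ImportError` never gets a chance to run.

```python
def optional_import(name):
    """Return the module if importable, else None. Catches ImportError
    only; a hard crash (segfault) in the imported library's native code
    bypasses Python exception handling entirely."""
    try:
        return __import__(name)
    except ImportError:
        return None

cudf = optional_import("cudf")      # None in environments without cudf
json_mod = optional_import("json")  # stdlib module, always importable
```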

Contributor Author

I was, and still am, considering wrapping this with a HAS_GPU check, but I've been going back and forth on that because it's slightly confusing for this variable to represent both the successful import of cudf and what HAS_GPU represents (the successful import of pynvml and its initialization).

It's possible to be in a situation where cudf can be imported but HAS_GPU=False (pynvml not installed, or it fails to initialize for some reason)
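The two signals are independent, which is the source of the confusion. A hedged sketch of how they diverge (mirroring the compat-module idea described above, not its exact code):

```python
# Signal 1: HAS_GPU -- pynvml imports, initializes, and sees a device.
try:
    import pynvml
    pynvml.nvmlInit()
    HAS_GPU = pynvml.nvmlDeviceGetCount() > 0
except Exception:
    # pynvml missing, or NVML failed to initialize
    HAS_GPU = False

# Signal 2: cudf importability -- can succeed even when HAS_GPU is False
# (e.g. pynvml not installed), and fail even when HAS_GPU is True.
try:
    import cudf
except ImportError:
    cudf = None
```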

Contributor Author

The segmentation fault was fixed in cuda-python 11.7.1; however, that requires cudf 22.12 or above. With that in mind, I think it might be OK to keep the import as it is and say that support for CPU-only environments that have cudf installed requires cudf >= 22.12?
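If one ever wanted to enforce that constraint in code rather than documentation, a version gate could look like the following (illustrative only; `cudf_safe_on_cpu` is a hypothetical helper, and cudf uses CalVer strings like "22.12.00"):

```python
def cudf_safe_on_cpu(cudf_version: str) -> bool:
    """Return True if this cudf version can be imported safely in a
    CPU-only environment, per the >= 22.12 threshold discussed above."""
    # Compare (major, minor) numerically rather than as strings,
    # so "22.2" doesn't sort after "22.12".
    major, minor = (int(part) for part in cudf_version.split(".")[:2])
    return (major, minor) >= (22, 12)
```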

Contributor

Yeah, that sounds good. You're way ahead of me (as usual).

@karlhigley
Contributor

It looks like some of the NVT test failures might actually be related to these changes

@oliverholworthy
Contributor Author

> It looks like some of the NVT test failures might actually be related to these changes

I've updated and resolved these issues. The type hint for read_dispatch suggested that it might only accept a dataframe type, but it is also called with strings. Updated to handle that case.

And the functions {random_state, arange, array, zeros} all had the same pattern. Updated to handle a RangeIndex type passed as like_df, and the base case where like_df is None and we're simply picking between cupy and numpy. The code is more verbose, but it's clearer about what it's doing now. Hopefully a further refactor later will remove the need for these altogether.
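A rough sketch of the shared pattern described above, assuming guarded cupy/cudf imports; the function body is illustrative, not the PR's actual code:

```python
import numpy as np

try:
    import cupy
except ImportError:
    cupy = None

try:
    import cudf
except ImportError:
    cudf = None

def zeros(size, like_df=None, dtype=None):
    """Allocate zeros with the array backend implied by `like_df`."""
    if like_df is None:
        # Base case: no reference object, prefer cupy when it imported.
        xp = cupy if cupy is not None else np
    elif cudf is not None and isinstance(
        like_df, (cudf.DataFrame, cudf.Series, cudf.RangeIndex)
    ):
        # GPU reference object (including RangeIndex) -> cupy.
        xp = cupy
    else:
        # CPU reference object (e.g. pandas) -> numpy.
        xp = np
    return xp.zeros(size, dtype=dtype)
```

Passing a pandas object as `like_df` takes the numpy branch regardless of whether cupy is installed, which is the point of keying off the reference object rather than HAS_GPU.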

@karlhigley karlhigley merged commit 20bf15f into NVIDIA-Merlin:main Mar 17, 2023
karlhigley added a commit that referenced this pull request Mar 22, 2023
…n detected (#236)

* Run tests in GPU environment with no GPUs visible

* Update TensorTable tests with checks for HAS_GPU

* Remove unused `_HAS_GPU` variable from `test_utils`

* Wrap cupy/cudf imports in HAS_GPU check in `compat`

* Update tests to use HAS_GPU from compat module

* Reformat test_tensor_table.py

* Move HAS_GPU import to compat module

* Add pynvml dependency

* Update functions in `dispatch` to not use HAS_GPU

* Raise RuntimeError in Dataset if we can't run on GPU when cpu=False

* Update `convert_data` to handle unavailable cudf and dask_cudf

* Remove use of `HAS_GPU` from dispatch

* Keep cudf and cupy values representing presence of package

* Revert changes to `dataset.py`. Now part of #243

* Revert changes to `dispatch.py`. Now part of #244

* Use branch-name action for branch selection

* Remove unused ref_type variable

* Extend reason in `test_tensor_column.py`

Co-authored-by: Karl Higley <kmhigley@gmail.com>

* Extend reason in `tests/unit/table/test_tensor_column.py`

Co-authored-by: Karl Higley <kmhigley@gmail.com>

* Remove cudf import from compat. Now unrelated to this PR

* Remove use of branch-name action. `docker` not available in runner

* Add HAS_GPU checks with cupy to support env without visible devices

* Correct value of empty visible devices

* Update deps for GPU envs to match others

* Update get_lib to account for missing visible GPU

* Check HAS_GPU in `make_df` to handle visible GPU devices

* Update Dataset to handle default case when no visible GPUs are found

* Update fixtures to handle cudf with no visible devices

* Update tests to handle case of no visible GPUs

---------

Co-authored-by: Karl Higley <kmhigley@gmail.com>
Co-authored-by: Karl Higley <karlb@nvidia.com>
