Remove use of HAS_GPU from dispatch functions #244
karlhigley merged 9 commits into NVIDIA-Merlin:main
rmm = None

- if HAS_GPU:
+ if cudf:
I could be missing something, but it seems like cudf would need to be imported from merlin.core.compat for this to work.
I missed this commit when moving the changes over from the other PR. Updated.
cupy = None

try:
    import cudf
@karlhigley, regarding your comment here ("may blow up in CPU-only environments"):
This is currently the case. If you try to import cudf in our containers without GPUs available, you get a segmentation fault, which the `except ImportError` can't catch:
Exception ignored in: 'cuda._lib.ccudart.utils.cudaPythonGlobal.lazyInitGlobal'
Traceback (most recent call last):
File "cuda/_cuda/ccuda.pyx", line 3671, in cuda._cuda.ccuda._cuInit
File "cuda/_cuda/ccuda.pyx", line 435, in cuda._cuda.ccuda.cuPythonInit
RuntimeError: Failed to dlopen libcuda.so
Segmentation fault (core dumped)
I was (and still am) considering wrapping this with a HAS_GPU check, but I have been going back and forth on it, because it is slightly confusing for this variable to represent both the successful import of cudf and what HAS_GPU represents (the successful import of pynvml and its initialization).
It's possible to be in a situation where cudf can be imported but HAS_GPU=False (pynvml is not installed or fails to initialize for some reason).
The segmentation fault was fixed in cuda-python 11.7.1; however, that requires cudf 22.12 or above. With that in mind, I think it might be OK to keep the import as it is and say that support for CPU-only environments that have cudf installed requires cudf >=22.12?
Yeah, that sounds good. You're way ahead of me (as usual.)
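For readers following the thread, here is a minimal sketch of the optional-import pattern being discussed. It is not the actual contents of `merlin.core.compat`: the helper names `optional_import` and `detect_gpu` are hypothetical, and the device-count check is an assumption added for illustration. It shows why "cudf is importable" and "a GPU was detected" are kept as two independent facts.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if it is not installed.

    Only ImportError is handled here. A crash inside a native extension
    (such as the segmentation fault shown above) cannot be caught this way.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Each name records only whether that package could be imported.
cudf = optional_import("cudf")
pynvml = optional_import("pynvml")


def detect_gpu():
    """Hypothetical stand-in for HAS_GPU: pynvml imports and initializes.

    The device-count check is an assumption, not taken from the PR.
    """
    if pynvml is None:
        return False
    try:
        pynvml.nvmlInit()
        return pynvml.nvmlDeviceGetCount() > 0
    except Exception:  # NVML library missing or failed to initialize
        return False


HAS_GPU = detect_gpu()

# All four combinations are possible, e.g. cudf importable but HAS_GPU False.
print("cudf available:", cudf is not None, "| HAS_GPU:", HAS_GPU)
```

Under this sketch, code that needs cudf would check `cudf` itself, and code that needs a device would check `HAS_GPU`.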
Used in the Target Encoding NVTabular Operator
It looks like some of the NVT test failures might actually be related to these changes
I've updated and resolved these issues. The typehint for And the functions
…n detected (#236)

* Run tests in GPU environment with no GPUs visible
* Update TensorTable tests with checks for HAS_GPU
* Remove unused `_HAS_GPU` variable from `test_utils`
* Wrap cupy/cudf imports in HAS_GPU check in `compat`
* Update tests to use HAS_GPU from compat module
* Reformat test_tensor_table.py
* Move HAS_GPU import to compat module
* Add pynvml dependency
* Update functions in `dispatch` to not use HAS_GPU
* Raise RuntimeError in Dataset if we can't run on GPU when cpu=False
* Update `convert_data` to handle unavailable cudf and dask_cudf
* Remove use of `HAS_GPU` from dispatch
* Keep cudf and cupy values representing presence of package
* Revert changes to `dataset.py`. Now part of #243
* Revert changes to `dispatch.py`. Now part of #244
* Use branch-name action for branch selection
* Remove unused ref_type variable
* Extend reason in `test_tensor_column.py`

Co-authored-by: Karl Higley <kmhigley@gmail.com>

* Extend reason in `tests/unit/table/test_tensor_column.py`

Co-authored-by: Karl Higley <kmhigley@gmail.com>

* Remove cudf import from compat. Now unrelated to this PR
* Remove use of branch-name action. `docker` not available in runner
* Add HAS_GPU checks with cupy to support env without visible devices
* Correct value of empty visible devices
* Update deps for GPU envs to match others
* Update get_lib to account for missing visible GPU
* Check HAS_GPU in `make_df` to handle visible GPU devices
* Update Dataset to handle default case when no visible GPUs are found
* Update fixtures to handle cudf with no visible devices
* Update tests to handle case of no visible GPUs

---------

Co-authored-by: Karl Higley <kmhigley@gmail.com>
Co-authored-by: Karl Higley <karlb@nvidia.com>
Remove use of HAS_GPU from dispatch functions.

Following #211, the variable HAS_GPU no longer includes the information about whether or not cudf is installed. This can result in confusing errors when using the dispatch functions in a GPU-enabled environment without cudf available ('NoneType' object has no attribute 'DataFrame').
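The failure mode described above can be reproduced with a small, self-contained sketch. The function names `build_frame_old` and `build_frame_new` are hypothetical and do not appear in the PR, and a plain `dict` stands in for the pandas code path so the example needs no third-party packages. The values of `HAS_GPU` and `cudf` are set by hand to simulate a GPU-enabled environment without cudf installed.

```python
HAS_GPU = True  # assumption: a GPU was detected
cudf = None     # assumption: cudf is not installed


def build_frame_old(data):
    # Guarding on HAS_GPU reaches cudf.DataFrame even though cudf is None.
    if HAS_GPU:
        return cudf.DataFrame(data)
    return dict(data)  # stand-in for the pandas path


def build_frame_new(data):
    # Guarding on the module itself falls back cleanly when cudf is missing.
    if cudf:
        return cudf.DataFrame(data)
    return dict(data)  # stand-in for the pandas path


error_message = None
try:
    build_frame_old({"a": [1, 2]})
except AttributeError as exc:
    error_message = str(exc)

result = build_frame_new({"a": [1, 2]})

print(error_message)  # 'NoneType' object has no attribute 'DataFrame'
print(result)         # {'a': [1, 2]}
```

Checking the module rather than the flag means the guard tests exactly the thing the branch is about to use.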