separate cupy import from rapids#211
Conversation
merlin/core/dispatch.py
Outdated
| try: | ||
| import cupy as cp # type: ignore[no-redef] | ||
| except ImportError: | ||
| ... |
There was a problem hiding this comment.
while functionally equivalent, in a case like this I think the pass statement is a clearer about the intent compared with this ellipsis literal which tends to be used to indicate a placeholder for something.
There was a problem hiding this comment.
replaced with pass
| from cudf.utils.dtypes import is_string_dtype as cudf_is_string_dtype | ||
|
|
||
| except ImportError: | ||
| HAS_GPU = False |
There was a problem hiding this comment.
Removing this line will change the meaning of this HAS_GPU variable. Currently it represents a combination both whether or not a GPU is available and whether cudf is installed and available.
And I think this is expected by it's use both here in Core, NVTabular and possibly other places.
We've tried changing this line to pass in #99 , but then subsequently reverted in #112 after various errors started showing up in places where the assumptions about this variable started to breakdown.
I think changing this line or changing the name of this variable would be an improvement, since it's name does not capture the full sense of what it is intended to capture. A clearer name would be something like HAS_GPU_AND_CUDF. However, making this change will require furtther changes to where this variable is referenced across our libraries.
There was a problem hiding this comment.
This is good insight, but I have not seen this breakdown. And I think it is ok because we have a change for dispatch make_df that checks both HAS_GPU and if cudf is not None. So by breaking it up, it makes it more clear I think. Where to use cudf you have to have both HAS_GPU and cudf needs to be available. Maybe you can show me where it actually trips up. I think in dispatch (and anywhere else we use it) we should check for both HAS_GPU and cudf before using cudf. to do something. We should really not be using cudf directly anywhere since, making a DF or a Series is covered in this dispatch. I have run some secondary tests to check the other libraries and I have not hit any errors. It would be great if you could tell me where you hit the errors that required #112 that would be helpful maybe we can fix those in other parts of the code. But I believe that HAS_GPU, in its current state in merlin.core.compat only detects if a GPU is available. So whether or not you have CUDF, that parameter should always represent if GPUs are available in the environment.
There was a problem hiding this comment.
One example from this file are these lines where we check this HAS_GPU variable and assume that cudf will be availble if it's True:
https://github.com/NVIDIA-Merlin/core/blob/v0.10.0/merlin/core/dispatch.py#L79-L81
And one other use as you mentioned is the make_df function with the device parameter.
In general the environment we need to check against is where GPUs exist (found by NVML device get count - resulting in the compat HAS_GPU variable being True), and where cudf is not installed. This is a setup we don't currently test against. If we can get all the libraries to work in this environment with this change then we should be ok to make this change.
There was a problem hiding this comment.
That was an excellent example. I have gone ahead and changed the code to match. Fortunately this is just used for typing and it is not referenced in dispatch or any other part of core. It seems to be used heavily by models and nvtabular for typing. I did a quick search and couldnt find any other locations in merlin where an import of cudf changes HAS_GPU or where they depend on each other. Let me know if you can find anymore.
This PR separates the import of cupy from rapids imports. Given that this package is the base for all other merlin packages. You can find yourself in an environment where cupy is available but cudf/rapids is not. In that case we should not restrict the usage of cupy because cudf/rapids is not available.