Skip to content

separate cupy import from rapids#211

Merged
karlhigley merged 7 commits intoNVIDIA-Merlin:mainfrom
jperez999:change-dispatch-imports
Feb 13, 2023
Merged

separate cupy import from rapids#211
karlhigley merged 7 commits intoNVIDIA-Merlin:mainfrom
jperez999:change-dispatch-imports

Conversation

@jperez999
Copy link
Collaborator

This PR separates the import of cupy from rapids imports. Given that this package is the base for all other merlin packages. You can find yourself in an environment where cupy is available but cudf/rapids is not. In that case we should not restrict the usage of cupy because cudf/rapids is not available.

@jperez999 jperez999 self-assigned this Feb 6, 2023
@jperez999 jperez999 added enhancement New feature or request clean up labels Feb 6, 2023
@jperez999 jperez999 added this to the Merlin 23.02 milestone Feb 6, 2023
try:
import cupy as cp # type: ignore[no-redef]
except ImportError:
...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while functionally equivalent, in a case like this I think the pass statement is a clearer about the intent compared with this ellipsis literal which tends to be used to indicate a placeholder for something.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced with pass

from cudf.utils.dtypes import is_string_dtype as cudf_is_string_dtype

except ImportError:
HAS_GPU = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this line will change the meaning of this HAS_GPU variable. Currently it represents a combination both whether or not a GPU is available and whether cudf is installed and available.

And I think this is expected by it's use both here in Core, NVTabular and possibly other places.

We've tried changing this line to pass in #99 , but then subsequently reverted in #112 after various errors started showing up in places where the assumptions about this variable started to breakdown.

I think changing this line or changing the name of this variable would be an improvement, since it's name does not capture the full sense of what it is intended to capture. A clearer name would be something like HAS_GPU_AND_CUDF. However, making this change will require furtther changes to where this variable is referenced across our libraries.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good insight, but I have not seen this breakdown. And I think it is ok because we have a change for dispatch make_df that checks both HAS_GPU and if cudf is not None. So by breaking it up, it makes it more clear I think. Where to use cudf you have to have both HAS_GPU and cudf needs to be available. Maybe you can show me where it actually trips up. I think in dispatch (and anywhere else we use it) we should check for both HAS_GPU and cudf before using cudf. to do something. We should really not be using cudf directly anywhere since, making a DF or a Series is covered in this dispatch. I have run some secondary tests to check the other libraries and I have not hit any errors. It would be great if you could tell me where you hit the errors that required #112 that would be helpful maybe we can fix those in other parts of the code. But I believe that HAS_GPU, in its current state in merlin.core.compat only detects if a GPU is available. So whether or not you have CUDF, that parameter should always represent if GPUs are available in the environment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One example from this file are these lines where we check this HAS_GPU variable and assume that cudf will be availble if it's True:

https://github.com/NVIDIA-Merlin/core/blob/v0.10.0/merlin/core/dispatch.py#L79-L81

And one other use as you mentioned is the make_df function with the device parameter.

In general the environment we need to check against is where GPUs exist (found by NVML device get count - resulting in the compat HAS_GPU variable being True), and where cudf is not installed. This is a setup we don't currently test against. If we can get all the libraries to work in this environment with this change then we should be ok to make this change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was an excellent example. I have gone ahead and changed the code to match. Fortunately this is just used for typing and it is not referenced in dispatch or any other part of core. It seems to be used heavily by models and nvtabular for typing. I did a quick search and couldnt find any other locations in merlin where an import of cudf changes HAS_GPU or where they depend on each other. Let me know if you can find anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clean up enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants