Skip to content
Merged
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 6 additions & 4 deletions merlin/core/dispatch.py
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,6 @@
if HAS_GPU:
try:
import cudf # type: ignore[no-redef]
import cupy as cp # type: ignore[no-redef]
import dask_cudf
import rmm # type: ignore[no-redef]
from cudf.core.column import as_column, build_column
Expand All @@ -47,9 +46,12 @@
# cudf < 21.08
from cudf.utils.dtypes import is_list_dtype as cudf_is_list_dtype
from cudf.utils.dtypes import is_string_dtype as cudf_is_string_dtype

except ImportError:
HAS_GPU = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this line will change the meaning of this HAS_GPU variable. Currently it represents a combination both whether or not a GPU is available and whether cudf is installed and available.

And I think this is expected by it's use both here in Core, NVTabular and possibly other places.

We've tried changing this line to pass in #99 , but then subsequently reverted in #112 after various errors started showing up in places where the assumptions about this variable started to breakdown.

I think changing this line or changing the name of this variable would be an improvement, since it's name does not capture the full sense of what it is intended to capture. A clearer name would be something like HAS_GPU_AND_CUDF. However, making this change will require furtther changes to where this variable is referenced across our libraries.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good insight, but I have not seen this breakdown. And I think it is ok because we have a change for dispatch make_df that checks both HAS_GPU and if cudf is not None. So by breaking it up, it makes it more clear I think. Where to use cudf you have to have both HAS_GPU and cudf needs to be available. Maybe you can show me where it actually trips up. I think in dispatch (and anywhere else we use it) we should check for both HAS_GPU and cudf before using cudf. to do something. We should really not be using cudf directly anywhere since, making a DF or a Series is covered in this dispatch. I have run some secondary tests to check the other libraries and I have not hit any errors. It would be great if you could tell me where you hit the errors that required #112 that would be helpful maybe we can fix those in other parts of the code. But I believe that HAS_GPU, in its current state in merlin.core.compat only detects if a GPU is available. So whether or not you have CUDF, that parameter should always represent if GPUs are available in the environment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One example from this file are these lines where we check this HAS_GPU variable and assume that cudf will be availble if it's True:

https://github.com/NVIDIA-Merlin/core/blob/v0.10.0/merlin/core/dispatch.py#L79-L81

And one other use as you mentioned is the make_df function with the device parameter.

In general the environment we need to check against is where GPUs exist (found by NVML device get count - resulting in the compat HAS_GPU variable being True), and where cudf is not installed. This is a setup we don't currently test against. If we can get all the libraries to work in this environment with this change then we should be ok to make this change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was an excellent example. I have gone ahead and changed the code to match. Fortunately this is just used for typing and it is not referenced in dispatch or any other part of core. It seems to be used heavily by models and nvtabular for typing. I did a quick search and couldnt find any other locations in merlin where an import of cudf changes HAS_GPU or where they depend on each other. Let me know if you can find anymore.

pass
try:
import cupy as cp # type: ignore[no-redef]
except ImportError:
pass


try:
Expand All @@ -76,7 +78,7 @@ def inner2(*args, **kwargs):
return inner1


if HAS_GPU:
if HAS_GPU and cudf:
DataFrameType = Union[pd.DataFrame, cudf.DataFrame] # type: ignore
SeriesType = Union[pd.Series, cudf.Series] # type: ignore
else:
Expand Down