Run with import without gpu #261
Conversation
```diff
 def _register_tensor_table_from_cudf_df():
-    import cudf
+    from merlin.core.compat import cudf
```
This import forced me, via interrogate, to add comments for all public methods in the file.
Could changing this cause unexpected runtime errors later, in the case where cudf is installed but no GPUs are visible?
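To make the discussion concrete, here is a minimal sketch of what a guarded import in a compat module might look like. The names (`_device_count`, `HAS_GPU`) and the use of dask's NVML wrapper are illustrative assumptions, not the actual `merlin.core.compat` implementation:

```python
# Sketch of a compat-style guarded import. Names and the detection helper
# are assumptions for illustration, not Merlin's real implementation.

def _device_count() -> int:
    """Best-effort GPU count; returns 0 when NVML/driver is unavailable."""
    try:
        # dask's NVML wrapper (assumed available); any failure means "no GPU"
        from distributed.diagnostics import nvml
        return nvml.device_get_count()
    except Exception:
        return 0

HAS_GPU = _device_count() > 0

cudf = None
if HAS_GPU:
    try:
        import cudf  # only attempted when a device is actually visible
    except ImportError:
        cudf = None

print("HAS_GPU:", HAS_GPU, "cudf importable:", cudf is not None)
```

With this shape, downstream code imports `cudf` from the compat module and simply checks it for `None`, so a machine with the package installed but no visible devices never triggers the import at all.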
karlhigley left a comment
This seems like a good direction to go in, and I wonder if we can take it even a little farther.
The segmentation fault was fixed in cuda-python 11.7.1; however, that requires cudf 22.12 and above. We had a discussion about this before here: #244 (comment)
Part of those changes involved adding checks for ... One reason we might consider keeping compat.cudf separate from compat.HAS_GPU is mostly about edge cases and error reporting; here, for example, in the Dataset. And I've seen cases where pynvml fails to initialize for some reason, but cudf is still available, which may result in some confusion if
I mean, we can't ever use cudf without having a GPU, so that seems like the correct usage. Do you see a time when we can use cudf without a GPU?
So if there is ever a time we cannot detect a GPU, we should not import cudf; chances are it would also fail on internal imports inside cudf. We should work to make sure that we detect GPUs, and dask's nvml should not have problems detecting them; I have not seen that happen. We are no longer using pynvml to detect GPUs: since we are now using nvml from dask, we are essentially offloading device detection to dask. I think that is OK. I would like to get a consensus. Thoughts?
@oliverholworthy, in terms of the edge case you outlined (Lines 249 to 261 in ec9a360):
Yeah, that's true. Aside from the segfault in earlier versions of cudf, you still can't use cudf without a GPU even if it's importable.
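The edge case above (cudf importable but no usable GPU, or vice versa) is mostly about giving users a clear error. A hedged sketch of how separate `cudf` and `HAS_GPU` signals could produce distinct messages; the values and the `require_gpu_engine` helper are hypothetical stand-ins, not Merlin API:

```python
# Hypothetical stand-ins for merlin.core.compat.cudf / HAS_GPU on a
# machine with no visible devices.
cudf = None
HAS_GPU = False

def require_gpu_engine():
    """Illustrative helper: distinguish 'no GPU' from 'cudf broken'."""
    if not HAS_GPU:
        raise RuntimeError("GPU engine requested, but no GPU was detected")
    if cudf is None:
        raise RuntimeError("GPU detected, but cudf failed to import")
    return cudf

try:
    require_gpu_engine()
except RuntimeError as err:
    print(err)
```

Collapsing both signals into a single check would still be safe, but the error message could no longer tell the user which of the two situations they are in.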
I'm on board with this change to wrap the imports in
Force-pushed b80e9e2 to b83b159 (Compare)
```python
        translator=NumpyPreprocessor("cudf", cudf_translator, attrs=["_categories"]),
    )
    _dtype_registry.register("cudf", cudf_dtypes)
except ImportError as exc:
```
Do we still need this try/except here? What kind of ImportError can we get now that we're checking cudf?
```python
if cudf:
    dask_cudf = pytest.importorskip("dask_cudf")
else:
    pytest.mark.skip(reason="cudf did not import successfully")
```
One interpretation of this could suggest to someone reading it that it's only the cudf import that matters here. Could be worth mentioning that we also need at least one visible CUDA device too?
This reverts commit 1011afb.
This PR enables a user to run our GPU Docker container without GPUs and successfully run all tests. Previously we had try/except blocks in a few places in the code for importing GPU-specific packages. This PR removes those try/except blocks, because in an environment where you have the package but no GPU, you end up getting a ccuda.pyx init failure. This is because the packages (i.e. cudf, dask_cudf, rmm) do exist, but when they try to access information about GPUs they fail and throw an error something like:
This PR leverages the compat file, making it the single point of import for the main packages (cudf, cupy), and adds a safeguard that ensures those packages are only imported when GPUs are detected. So if you find yourself in a scenario where the package is installed but no GPUs are detected, you can now still safely use the core package. Therefore, we can run our containers both with and without the `--gpus=all` flag in Docker. This was a customer ask, and it helps developers when trying to test a CPU-only environment on a machine that has GPUs.
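From the consumer side, the pattern the PR enables looks roughly like the sketch below. The `make_frame` helper and the pandas fallback are illustrative assumptions; only the `from merlin.core.compat import cudf` shape comes from the PR itself:

```python
# Hypothetical consumer of the compat shim: falls back to pandas when the
# shim yields None (package missing, or installed but no visible GPU).
try:
    from merlin.core.compat import cudf  # the real shim, when Merlin is installed
except ImportError:
    cudf = None  # e.g. Merlin not installed in this environment

import pandas as pd

def make_frame(data):
    """Build a DataFrame on the GPU when cudf is usable, else on the CPU."""
    if cudf is not None:
        return cudf.DataFrame(data)  # GPU path
    return pd.DataFrame(data)        # CPU fallback

df = make_frame({"a": [1, 2, 3]})
print(type(df).__name__, len(df))
```

Because the shim resolves to `None` rather than raising at import time, the same code path runs in containers started with or without `--gpus=all`.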