Restore HAS_GPU in merlin.core.dispatch so that it also requires the packages needed for GPU usage ({cudf, cupy, rmm, dask_cudf}) to be installed. This avoids an error when a GPU is available but the required packages are not installed.
File ~/.virtualenv/evalrs/lib/python3.8/site-packages/merlin/core/dispatch.py:79, in <module>
     75     return inner1
     78 if HAS_GPU:
---> 79     DataFrameType = Union[pd.DataFrame, cudf.DataFrame]  # type: ignore
     80     SeriesType = Union[pd.Series, cudf.Series]  # type: ignore
     81 else:
AttributeError: 'NoneType' object has no attribute 'DataFrame'
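The failure happens because cudf is bound to None when it cannot be imported, while the bare `if HAS_GPU:` guard only checks GPU visibility. A minimal sketch of a guarded alias definition that avoids the AttributeError (illustrative only, not the actual merlin patch; the try/except import guard here is an assumption):

```python
from typing import Union

import pandas as pd

try:
    import cudf  # may be absent on CPU-only machines
except ImportError:
    cudf = None

# Only reference cudf attributes when the module actually imported,
# so a missing cudf can never raise
# "'NoneType' object has no attribute 'DataFrame'".
if cudf is not None:
    DataFrameType = Union[pd.DataFrame, cudf.DataFrame]
    SeriesType = Union[pd.Series, cudf.Series]
else:
    DataFrameType = pd.DataFrame
    SeriesType = pd.Series
```

On a CPU-only machine this falls back to the pandas types instead of crashing at import time.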
merlin.core.compat.HAS_GPU
returns True if numba is installed and at least one GPU is visible to the process.
merlin.core.dispatch.HAS_GPU
returns True if merlin.core.compat.HAS_GPU is True and all of the following packages are installed: {cudf, cupy, rmm, dask_cudf}.
tmpdir = local('/tmp/pytest-of-jenkins/pytest-12/test_dask_dataset_from_datafra4')
origin = 'cudf', cpu = True
@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):
# Generate a DataFrame-based input
if origin in ("pd", "dd"):
df = pd.DataFrame({"a": range(100)})
if origin == "dd":
df = dask.dataframe.from_pandas(df, npartitions=4)
elif origin in ("cudf", "dask_cudf"):
df = cudf.DataFrame({"a": range(100)})
if origin == "dask_cudf":
df = dask_cudf.from_cudf(df, npartitions=4)
# Convert to an NVTabular Dataset and back to a ddf
dataset = merlin.io.Dataset(df, cpu=cpu)
result = dataset.to_ddf()
# Check resulting data
assert_eq(df, result)
# Check that the cpu kwarg is working correctly
if cpu:
assert isinstance(result.compute(), pd.DataFrame)
# Should still work if we move to the GPU
# (test behavior after repetitive conversion)
dataset.to_gpu()
dataset.to_cpu()
dataset.to_cpu()
dataset.to_gpu()
result = dataset.to_ddf()
assert isinstance(result.compute(), cudf.DataFrame)
dataset.to_cpu()
else:
assert isinstance(result.compute(), cudf.DataFrame)
# Should still work if we move to the CPU
# (test behavior after repetitive conversion)
dataset.to_cpu()
dataset.to_gpu()
dataset.to_gpu()
dataset.to_cpu()
result = dataset.to_ddf()
assert isinstance(result.compute(), pd.DataFrame)
dataset.to_gpu()
# Write to disk and read back
path = str(tmpdir)
dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func((_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func((_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in call
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???
???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-12/test_dask_dataset_from_datafra4/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-12/test_dask_dataset_from_datafra4/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?
tmpdir = local('/tmp/pytest-of-jenkins/pytest-12/test_dask_dataset_from_datafra5')
origin = 'dask_cudf', cpu = True
@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):
# Generate a DataFrame-based input
if origin in ("pd", "dd"):
df = pd.DataFrame({"a": range(100)})
if origin == "dd":
df = dask.dataframe.from_pandas(df, npartitions=4)
elif origin in ("cudf", "dask_cudf"):
df = cudf.DataFrame({"a": range(100)})
if origin == "dask_cudf":
df = dask_cudf.from_cudf(df, npartitions=4)
# Convert to an NVTabular Dataset and back to a ddf
dataset = merlin.io.Dataset(df, cpu=cpu)
result = dataset.to_ddf()
# Check resulting data
assert_eq(df, result)
# Check that the cpu kwarg is working correctly
if cpu:
assert isinstance(result.compute(), pd.DataFrame)
# Should still work if we move to the GPU
# (test behavior after repetitive conversion)
dataset.to_gpu()
dataset.to_cpu()
dataset.to_cpu()
dataset.to_gpu()
result = dataset.to_ddf()
assert isinstance(result.compute(), cudf.DataFrame)
dataset.to_cpu()
else:
assert isinstance(result.compute(), cudf.DataFrame)
# Should still work if we move to the CPU
# (test behavior after repetitive conversion)
dataset.to_cpu()
dataset.to_gpu()
dataset.to_gpu()
dataset.to_cpu()
result = dataset.to_ddf()
assert isinstance(result.compute(), pd.DataFrame)
dataset.to_gpu()
# Write to disk and read back
path = str(tmpdir)
dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func((_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func((_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in call
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???
???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-12/test_dask_dataset_from_datafra5/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-12/test_dask_dataset_from_datafra5/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?
tmpdir = local('/tmp/pytest-of-jenkins/pytest-12/test_dask_dataset_from_datafra6')
origin = 'pd', cpu = True
@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):
# Generate a DataFrame-based input
if origin in ("pd", "dd"):
df = pd.DataFrame({"a": range(100)})
if origin == "dd":
df = dask.dataframe.from_pandas(df, npartitions=4)
elif origin in ("cudf", "dask_cudf"):
df = cudf.DataFrame({"a": range(100)})
if origin == "dask_cudf":
df = dask_cudf.from_cudf(df, npartitions=4)
# Convert to an NVTabular Dataset and back to a ddf
dataset = merlin.io.Dataset(df, cpu=cpu)
result = dataset.to_ddf()
# Check resulting data
assert_eq(df, result)
# Check that the cpu kwarg is working correctly
if cpu:
assert isinstance(result.compute(), pd.DataFrame)
# Should still work if we move to the GPU
# (test behavior after repetitive conversion)
dataset.to_gpu()
dataset.to_cpu()
dataset.to_cpu()
dataset.to_gpu()
result = dataset.to_ddf()
assert isinstance(result.compute(), cudf.DataFrame)
dataset.to_cpu()
else:
assert isinstance(result.compute(), cudf.DataFrame)
# Should still work if we move to the CPU
# (test behavior after repetitive conversion)
dataset.to_cpu()
dataset.to_gpu()
dataset.to_gpu()
dataset.to_cpu()
result = dataset.to_ddf()
assert isinstance(result.compute(), pd.DataFrame)
dataset.to_gpu()
# Write to disk and read back
path = str(tmpdir)
dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func((_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func((_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in call
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???
???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-12/test_dask_dataset_from_datafra6/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-12/test_dask_dataset_from_datafra6/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?
tmpdir = local('/tmp/pytest-of-jenkins/pytest-12/test_dask_dataset_from_datafra7')
origin = 'dd', cpu = True
@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):
# Generate a DataFrame-based input
if origin in ("pd", "dd"):
df = pd.DataFrame({"a": range(100)})
if origin == "dd":
df = dask.dataframe.from_pandas(df, npartitions=4)
elif origin in ("cudf", "dask_cudf"):
df = cudf.DataFrame({"a": range(100)})
if origin == "dask_cudf":
df = dask_cudf.from_cudf(df, npartitions=4)
# Convert to an NVTabular Dataset and back to a ddf
dataset = merlin.io.Dataset(df, cpu=cpu)
result = dataset.to_ddf()
# Check resulting data
assert_eq(df, result)
# Check that the cpu kwarg is working correctly
if cpu:
assert isinstance(result.compute(), pd.DataFrame)
# Should still work if we move to the GPU
# (test behavior after repetitive conversion)
dataset.to_gpu()
dataset.to_cpu()
dataset.to_cpu()
dataset.to_gpu()
result = dataset.to_ddf()
assert isinstance(result.compute(), cudf.DataFrame)
dataset.to_cpu()
else:
assert isinstance(result.compute(), cudf.DataFrame)
# Should still work if we move to the CPU
# (test behavior after repetitive conversion)
dataset.to_cpu()
dataset.to_gpu()
dataset.to_gpu()
dataset.to_cpu()
result = dataset.to_ddf()
assert isinstance(result.compute(), pd.DataFrame)
dataset.to_gpu()
# Write to disk and read back
path = str(tmpdir)
dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func((_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func((_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in call
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???
???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-12/test_dask_dataset_from_datafra7/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-12/test_dask_dataset_from_datafra7/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?
pyarrow/error.pxi:99: ArrowInvalid
=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 71 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]
tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40139 instead
warnings.warn(
tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40105 instead
warnings.warn(
tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35611 instead
warnings.warn(
tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34703 instead
warnings.warn(
tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45457 instead
warnings.warn(
tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43757 instead
warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-cudf]
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-dask_cudf]
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-pd] - ...
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-dd] - ...
============ 4 failed, 339 passed, 1 skipped, 82 warnings in 52.11s ============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins4290429288741292030.sh
Follow up to #99