Adds fail_on_nonconvergence option to pagerank to provide pagerank results even on non-convergence #3639
…agerank call to not converge yet still return a result with an additional flag indicating if the results converged or not.
Title changed from "error_on_nonconvergence option to pagerank to provide pagerank results even on non-convergence" to "fail_on_nonconvergence option to pagerank to provide pagerank results even on non-convergence"
…edToConvergeError exception type, adds tests for MG pagerank and personalization options.
…hub.com:rlratzel/cugraph into branch-23.08-python_pagerank_convergence_option
…_algorithms.pxd, adds exceptions module to PLC, remaining updates to PLC and cugraph code for initial passing tests.
…converged bool separately.
…8-python_pagerank_convergence_option
VibhuJawa
left a comment
Approving from the Python/Dask cugraph layer.
…hub.com:rlratzel/cugraph into branch-23.08-python_pagerank_convergence_option
seunghwak
left a comment
LGTM except for one additional complaint.
cpp/include/cugraph/algorithms.hpp
Outdated
raft::handle_t const& handle,
graph_view_t<vertex_t, edge_t, true, multi_gpu> const& graph_view,
std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view,
std::optional<weight_t const*> precomputed_vertex_out_weight_sums,
Shouldn't this better be std::optional<device_span<>>?
Never mind. Addressed in this PR since we had to make other changes.
eriknw
left a comment
For user-facing API, I wonder whether fail_on_nonconvergence is the clearest and most convenient:
pagerank(..., max_iter=3, fail_on_nonconvergence=False)
I think I would prefer a more direct, affirmative argument, such as:
pagerank(..., num_iter=3)

result_tuples = [
    client.submit(convert_to_return_tuple, cp_arrays) for cp_arrays in result
]

wait(cudf_result)
# Convert the futures to dask delayed objects so the tuples can be
# split. nout=2 is passed since each tuple/iterable is a fixed length of 2.
result_tuples = [dask.delayed(r, nout=2) for r in result_tuples]

# Create the ddf and get the converged bool from the delayed objs. Use a
# meta DataFrame to pass the expected dtypes for the DataFrame to prevent
# another compute to determine them automatically.
meta = cudf.DataFrame(columns=["vertex", "pagerank"])
meta = meta.astype({"pagerank": "float64", "vertex": vertex_dtype})
ddf = dask_cudf.from_delayed([t[0] for t in result_tuples], meta=meta).persist()
converged = all(dask.compute(*[t[1] for t in result_tuples]))
An alternative implementation to this could be something like:
import operator as op
...
result_tuples = client.map(convert_to_return_tuple, result)
meta = cudf.DataFrame(columns=["vertex", "pagerank"])
meta = meta.astype({"pagerank": "float64", "vertex": vertex_dtype})
ddf = dask_cudf.from_delayed(client.map(op.itemgetter(0), result_tuples), meta=meta).persist()
converged = client.submit(all, client.map(op.itemgetter(1), result_tuples)).result()
Oh nice, I did not know we could use op.itemgetter like this. Very cool to learn. Thanks!
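For readers unfamiliar with the itemgetter idiom discussed above, here is a minimal sketch of the same split done on plain Python tuples. It does not use Dask or cuDF: `fake_worker_result` and its values are hypothetical stand-ins for the per-partition (data, converged) tuples, and the built-in `map` stands in for `client.map`.

```python
import operator as op


def fake_worker_result(part_id):
    # Hypothetical stand-in for one partition's (data, converged) tuple.
    data = [0.25 * part_id, 0.5 * part_id]
    # Pretend partition 2 did not converge.
    return data, part_id != 2


result_tuples = [fake_worker_result(p) for p in (1, 2, 3)]

# itemgetter(0) picks the data element of every tuple,
# itemgetter(1) picks the converged flag of every tuple.
data_parts = list(map(op.itemgetter(0), result_tuples))
converged = all(map(op.itemgetter(1), result_tuples))
```

The overall flag is True only when every partition reports convergence, which mirrors the `all(...)` call in both versions of the code above.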
… exceptions using proper exception chaining.
…8-python_pagerank_convergence_option
…ps://github.com/rlratzel/cugraph into branch-23.08-python_pagerank_convergence_option
…hub.com:rlratzel/cugraph into branch-23.08-python_pagerank_convergence_option
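One of the commits above mentions exception chaining. As a rough illustration of what that usually means in Python, the sketch below re-raises a low-level error as a more specific one with `raise ... from`. The names `NonConvergenceError`, `low_level_call`, and `high_level_call` are hypothetical and are not taken from the cugraph or pylibcugraph code.

```python
class NonConvergenceError(RuntimeError):
    """Hypothetical user-facing exception, used only in this sketch."""


def low_level_call():
    # Stand-in for a lower layer that reports a generic error.
    raise RuntimeError("solver stopped before reaching tolerance")


def high_level_call():
    try:
        low_level_call()
    except RuntimeError as err:
        # "from err" keeps the original error attached as __cause__,
        # so the full traceback shows both exceptions.
        raise NonConvergenceError("pagerank did not converge") from err
```

A caller can catch the specific exception and still inspect the underlying error through its `__cause__` attribute.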
/merge
closes #3613
Prior to this PR, pagerank would raise a RuntimeError if it failed to converge, often because the max_iter param was set too small (intentionally or otherwise). This PR adds the optional parameter fail_on_nonconvergence, which defaults to True (i.e. the current behavior, to ensure backwards compatibility). Setting it to False allows a caller to run pagerank and get results even if it did not converge. When fail_on_nonconvergence is False, pagerank will return a tuple containing the pagerank results and a bool indicating whether the results converged.
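The behavior described above can be sketched with a small pure-Python power iteration. This is only an illustration of the calling convention, not the cugraph implementation: `pagerank_sketch`, its edge-list input, its default values, and the `NonConvergenceError` type are all assumptions made for this example.

```python
class NonConvergenceError(RuntimeError):
    """Hypothetical exception for this sketch; not the cugraph type."""


def pagerank_sketch(edges, num_vertices, alpha=0.85, max_iter=100,
                    tol=1e-6, fail_on_nonconvergence=True):
    # edges is a list of (src, dst) pairs with integer vertex ids.
    out_degree = [0] * num_vertices
    for src, _ in edges:
        out_degree[src] += 1

    ranks = [1.0 / num_vertices] * num_vertices
    converged = False
    for _ in range(max_iter):
        # Rank held by vertices without out-edges is spread uniformly.
        dangling = sum(r for r, d in zip(ranks, out_degree) if d == 0)
        base = (1.0 - alpha + alpha * dangling) / num_vertices
        new_ranks = [base] * num_vertices
        for src, dst in edges:
            new_ranks[dst] += alpha * ranks[src] / out_degree[src]
        # L1 change between successive iterations.
        delta = sum(abs(n - o) for n, o in zip(new_ranks, ranks))
        ranks = new_ranks
        if delta < tol:
            converged = True
            break

    if fail_on_nonconvergence:
        if not converged:
            raise NonConvergenceError(
                f"no convergence after {max_iter} iterations")
        return ranks
    # Caller opted out of the exception: return results plus the flag.
    return ranks, converged


edges = [(0, 1), (1, 2), (2, 0), (2, 1)]
# max_iter is deliberately too small, so the flag comes back False.
ranks, converged = pagerank_sketch(
    edges, 3, max_iter=1, fail_on_nonconvergence=False)
```

With the default fail_on_nonconvergence=True, the same call with max_iter=1 raises instead of returning, which matches the backwards-compatible behavior the description mentions.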