Update Uniform Neighborhood Sampling API#2997
Update Uniform Neighborhood Sampling API#2997rapids-bot[bot] merged 23 commits intorapidsai:branch-23.02from
Conversation
Codecov ReportBase: 55.31% // Head: 55.20% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## branch-23.02 #2997 +/- ##
================================================
- Coverage 55.31% 55.20% -0.11%
================================================
Files 148 142 -6
Lines 9423 9229 -194
================================================
- Hits 5212 5095 -117
+ Misses 4211 4134 -77 Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
|
rerun tests |
|
rerun tests |
seunghwak
left a comment
There was a problem hiding this comment.
I will add more reviews later.
cpp/include/cugraph/algorithms.hpp
Outdated
| std::optional< | ||
| edge_property_view_t<edge_t, | ||
| thrust::zip_iterator<thrust::tuple<edge_t const*, edge_type_t const*>>>> | ||
| edge_type_view, |
There was a problem hiding this comment.
Shouldn't this be edge_id_type_view?
There was a problem hiding this comment.
Fixed in next push
cpp/include/cugraph/algorithms.hpp
Outdated
| * This function traverses from a set of starting vertices, traversing outgoing edges and | ||
| * randomly selects from these outgoing neighbors to extract a subgraph. | ||
| * | ||
| * Output from this function a set of tuples (src, dst, edge_id, edge_type, weight, hop, label), |
There was a problem hiding this comment.
Output from this function a set of tuples=>Output from this function is a tuple of vectors?
There was a problem hiding this comment.
Updated in next push
| d_result_edge_id.resize(new_sz, handle.get_stream()); | ||
| d_result_hop.resize(new_sz, handle.get_stream()); | ||
| if (d_result_weight) d_result_weight->resize(new_sz, handle.get_stream()); | ||
| if (d_result_edge_type) d_result_edge_type->resize(new_sz, handle.get_stream()); |
There was a problem hiding this comment.
Shouldn't we resize d_result_label as well?
i.e.
if (d_result_label) d_result_label->resize(new_sz, handle.get_stream());
There was a problem hiding this comment.
And note that up-sizing involves 1) reallocating a new array and 2) copying the old contents. Copying can add significant overhead if there are many levels.
We may better store the data for each level in a separate array and concatenate once at the end to avoid frequent resizing (and copying the same data multiple times).
There was a problem hiding this comment.
Added a FIXME to mention this performance issue.
| * @param graph_view Non-owning graph object. | ||
| * @param active_majors Device vector containing all the vertex id that are processed by | ||
| * gpus in the column communicator | ||
| * @return A tuple of device vector containing the majors, minors and weights gathered locally |
There was a problem hiding this comment.
@return A tuple of device vectors containing the majors, minors, ids and optional weights, types and labels
There was a problem hiding this comment.
Fixed in next push.
| } else if constexpr (cugraph::is_thrust_tuple_of_arithmetic<EdgeProperties>::value && | ||
| (thrust::tuple_size<EdgeProperties>::value == 3)) { | ||
| return thrust::make_optional(thrust::make_tuple(src, |
There was a problem hiding this comment.
I wonder, what if we have EdgeProperties are tuple of size 4 or more?
There was a problem hiding this comment.
Use thrust_tuple_cat & to_thrust_tuple.
https://github.com/rapidsai/cugraph/blob/branch-23.02/cpp/include/cugraph/utilities/thrust_tuple_utils.hpp#L215
https://github.com/rapidsai/cugraph/blob/branch-23.02/cpp/include/cugraph/utilities/thrust_tuple_utils.hpp#L201
Something like
return thrust::make_optional(thrust_tuple_cat(thrust::make_tuple(src), thrust::make_tuple(dst), to_thrust_tuple(edge_properties)));
There was a problem hiding this comment.
I didn't think that thrust_tuple_cat worked on device. At least I couldn't get it to work on device, and the examples I saw were all in host code.
There was a problem hiding this comment.
Clearly a working thrust_tuple_cat would be a better long-term solution.
In our current code base, this set of if statements cover all of the possible cases. Since this is resolved at compile time, I'm comfortable leaving this as tech debt, unless you think there's a way to make thrust_tuple_cat do what we want.
There was a problem hiding this comment.
Added a FIXME about using thrust_tuple_cat
|
I think there may be a logic error somewhere; When testing on MG, I'm getting intermittent frontier out of range errors for valid vertex ids, and sometimes arrays of incorrect size back from |
alexbarghi-nv
left a comment
There was a problem hiding this comment.
Found a few things, might have more comments later
|
There are also a couple C++ bugs related to edge properties I've fixed and working on a PR for that will stop this from working correctly. |
|
Finally, I'm still seeing the frontier out of range errors which are blocking further testing. |
…incorporate bug fix from @seungwak
cpp/include/cugraph/algorithms.hpp
Outdated
| * This function traverses from a set of starting vertices, traversing outgoing edges and | ||
| * randomly selects from these outgoing neighbors to extract a subgraph. | ||
| * | ||
| * Output from this function is a tuple of vectors (src, dst, edge_id, edge_type, weight, hop, label), |
There was a problem hiding this comment.
(src, dst, edge_id, edge_type, wieght, ...)=>(src, dst, weight, edge_id, edge_type, ...) to match the actual return order.
There was a problem hiding this comment.
Updated documentation
cpp/include/cugraph/algorithms.hpp
Outdated
| * | ||
| * Output from this function is a tuple of vectors (src, dst, edge_id, edge_type, weight, hop, label), | ||
| * identifying the randomly selected edges. src is the source vertex, dst is the destination | ||
| * vertex, edge_id identifies the edge id, edge_type identifies the edge type, weight is the edge |
There was a problem hiding this comment.
edge_id identifies the edge id, edge_type identifies the edge type, weight is the edge weight,
=>
weight is the edge weight, edge_id identifies the edge ID, edge_type identifies the edge type,
There was a problem hiding this comment.
Updated documentation
cpp/include/cugraph/algorithms.hpp
Outdated
| * Output from this function is a tuple of vectors (src, dst, edge_id, edge_type, weight, hop, label), | ||
| * identifying the randomly selected edges. src is the source vertex, dst is the destination | ||
| * vertex, edge_id identifies the edge id, edge_type identifies the edge type, weight is the edge | ||
| * weight, hop identifies which hop the edge was encountered in. Label is optional, if input labels |
There was a problem hiding this comment.
Edge weights, edge IDs, and edge types are optional, too, right? Shouldn't we document this as well.
There was a problem hiding this comment.
This may mislead users that only label is optional.
There was a problem hiding this comment.
Updated documentation
| * | ||
| * @tparam vertex_t Type of vertex identifiers. Needs to be an integral type. | ||
| * @tparam edge_t Type of edge identifiers. Needs to be an integral type. | ||
| * @tparam weight_t Type of edge weights. Needs to be a floating point type. |
There was a problem hiding this comment.
documentation for edge_type_t & store_transposed are missing
There was a problem hiding this comment.
Updated documentation
cpp/include/cugraph/algorithms.hpp
Outdated
| * handles to various CUDA libraries) to run graph algorithms. | ||
| * @param graph_view Graph View object to generate NBR Sampling on. | ||
| * @param edge_weight_view Optional view object holding edge weights for @p graph_view. | ||
| * @param edge_type_view Optional view object holding edge types for @p graph_view. |
There was a problem hiding this comment.
edge_type_view=>edge_id_type_view
holding edge types=>holding edge IDs and types
There was a problem hiding this comment.
Updated documentation
cpp/include/cugraph/algorithms.hpp
Outdated
| * (true); or, without replacement (false); default = true; | ||
| * @param seed A seed to initialize the random number generator | ||
| * @return tuple device vectors (vertex_t source_vertex, vertex_t destination_vertex, | ||
| * optional weight_t weight, optional edge_t edge id, optional edge_type_t edge type, int32 hop, |
There was a problem hiding this comment.
Updated documentation
| @@ -78,26 +78,63 @@ count_and_remove_duplicates(raft::handle_t const& handle, | |||
| std::move(result_src), std::move(result_dst), std::move(result_wgt), std::move(result_count)); | |||
| } | |||
There was a problem hiding this comment.
Yeah... I guess better be deleted.
| thrust::optional<thrust::tuple<vertex_t, vertex_t, W>>> | ||
| __device__ operator()(vertex_t src, vertex_t dst, thrust::nullopt_t, thrust::nullopt_t, W wgt) | ||
| template <typename vertex_t> | ||
| struct sample_edges_op_t { |
| rmm::device_uvector<vertex_t> d_start_vs(starting_vertices.size(), handle.get_stream()); | ||
| raft::copy( | ||
| d_start_vs.data(), starting_vertices.data(), starting_vertices.size(), handle.get_stream()); | ||
|
|
||
| std::optional<rmm::device_uvector<int32_t>> d_start_labels{std::nullopt}; | ||
| if (starting_labels) { | ||
| d_start_labels = std::make_optional( | ||
| rmm::device_uvector<int32_t>(starting_labels->size(), handle.get_stream())); | ||
| raft::copy(d_start_labels->data(), | ||
| starting_labels->data(), | ||
| starting_labels->size(), | ||
| handle.get_stream()); | ||
| } |
There was a problem hiding this comment.
So, if I am not mistaken,
If the intention is to move them and overwrite, shouldn't we better pass d_start_labels as an R-value? If we're passing d_start_labels as an L-value reference, that implies that we're willing to use d_start_labels after the function call and d_start_labels will be in a certain well-defined state after the function call.
And I guess we don't call the shuffle function in single-GPU, so this overhead can be avoided in signle-GPU, right? Then, shouldn't we better create a copy before calling a shuffle function rather than creating a copy here? And I guess it will make the intention clearer. To groupby elements based on destination GPUs, we need to modify the input, but 'starting_labels is const, so we need to make a copy. This is clear. But the intention of creating a copy to call a detail space implementation function here is not very clear I think.
There was a problem hiding this comment.
I have a question which i think is unrelated to this PR but i still think is worth asking
We currently fail for isolated vertices in the main-line version, like check out below example. Would this PR or a followup be able to handle sampling on such a vertex. I just except us to not return any samples rather than erroring like above .
CC: @ChuckHastings , @seunghwak , @alexbarghi-nv
edges_df = dask_cudf.from_cudf(cudf.DataFrame({'src':[0,1,3],'dst':[0,1,3]}), npartitions=2)
g = cugraph.MultiGraph(directed=True)
g.from_dask_cudf_edgelist(edges_df,source='src',destination='dst',legacy_renum_only=True, renumber=True)
sample_d = cugraph.dask.uniform_neighbor_sample(g, cudf.Series([2]), with_replacement=False, fanout_vals=[10])
sample_d.compute()RuntimeError: non-success value returned from cugraph_uniform_neighbor_sample: CUGRAPH_UNKNOWN_ERROR cuGraph failure at file=/home/nfs/vjawa/dgl/cugraph/cpp/src/prims/per_v_random_select_transform_outgoing_e.cuh line=328: Invalid input argument: frontier includes out-of-range keys.
Obtained 32 stack frames
#0 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/lib/libcugraph.so(+0x85f434) [0x7f038c094434]
#1 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/lib/libcugraph.so(+0xb8579d) [0x7f038c3ba79d]
#2 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/lib/libcugraph.so(_ZN7cugraph6detail12sample_edgesIllfLb1EEESt5tupleIJN3rmm14device_uvectorIT_EES6_St8optionalINS4_IT1_EEEEERKN4raft8handle_tERKNS_12graph_view_tIS5_T0_Lb0EXT2_EvEES7_INS_20edge_property_view_tISH_PKS8_EEERNSC_6random8RngStateERKS6_mb+0x3f41) [0x7f038ce566f1]
#3 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/lib/libcugraph.so(_ZN7cugraph6detail23uniform_nbr_sample_implIllfLb0ELb1EEESt5tupleIJN3rmm14device_uvectorIT_EES6_NS4_IT1_EENS4_IT0_EEEERKN4raft8handle_tERKNS_12graph_view_tIS5_S9_XT2_EXT3_EvEESt8optionalINS_20edge_property_view_tIS9_PKS7_EEERS6_NSC_4spanIKiLb0ELm18446744073709551615EEEbm+0x4bf) [0x7f038ceeab6f]
#4 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/lib/libcugraph.so(_ZN7cugraph18uniform_nbr_sampleIllfLb0ELb1EEESt5tupleIJN3rmm14device_uvectorIT_EES5_NS3_IT1_EENS3_IT0_EEEERKN4raft8handle_tERKNS_12graph_view_tIS4_S8_XT2_EXT3_EvEESt8optionalINS_20edge_property_view_tIS8_PKS6_EEENSB_4spanIS4_Lb1ELm18446744073709551615EEENSP_IKiLb0ELm18446744073709551615EEEbm+0x135) [0x7f038ceebc45]
#5 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/lib/libcugraph_c.so(+0x1f5b37) [0x7f039a5a0b37]
#6 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/lib/libcugraph_c.so(cugraph_uniform_neighbor_sample+0x114) [0x7f039a5aa734]
#7 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/lib/python3.9/site-packages/pylibcugraph/uniform_neighbor_sample.cpython-39-x86_64-linux-gnu.so(+0x7679) [0x7f04993a6679]
#8 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/bin/python(_PyObject_MakeTpCall+0x347) [0x5576d07daa57]
#9 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/bin/python(_PyEval_EvalFrameDefault+0x55fc) [0x5576d07d6dac]
#10 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/bin/python(+0x12a7d7) [0x5576d07d07d7]
#11 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/bin/python(_PyFunction_Vectorcall+0xb9) [0x5576d07e2bb9]
#12 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/bin/python(PyObject_Call+0xb4) [0x5576d07f2254]
#13 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/bin/python(_PyEval_EvalFrameDefault+0x3d99) [0x5576d07d5549]
#14 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/bin/python(+0x13cec3) [0x5576d07e2ec3]
#15 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/bin/python(_PyEval_EvalFrameDefault+0x3c3) [0x5576d07d1b73]
#16 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/bin/python(+0x13cec3) [0x5576d07e2ec3]
#17 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/bin/python(_PyEval_EvalFrameDefault+0x3d99) [0x5576d07d5549]
#18 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/bin/python(+0x13cec3) [0x5576d07e2ec3]
#19 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/bin/python(_PyEval_EvalFrameDefault+0x672) [0x5576d07d1e22]
#20 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/bin/python(+0x13cec3) [0x5576d07e2ec3]
#21 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/bin/python(_PyEval_EvalFrameDefault+0x3d99) [0x5576d07d5549]
#22 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/bin/python(+0x13cec3) [0x5576d07e2ec3]
#23 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/bin/python(_PyEval_EvalFrameDefault+0x672) [0x5576d07d1e22]
#24 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/bin/python(+0x13cec3) [0x5576d07e2ec3]
#25 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/bin/python(_PyEval_EvalFrameDefault+0x672) [0x5576d07d1e22]
#26 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/bin/python(+0x13cec3) [0x5576d07e2ec3]
#27 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/bin/python(+0x14bc75) [0x5576d07f1c75]
#28 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/bin/python(+0x22f7f5) [0x5576d08d57f5]
#29 in /datasets/vjawa/miniconda3/envs/all_cuda-115_arch-x86_64/bin/python(+0x22f7a4) [0x5576d08d57a4]
#30 in /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f05caef86db]
#31 in /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f05ca27461f]
Edit: never mind, this is an isolated vertex case. Isolated vertices aren't supported because they are not in the edgelist. It is up to the caller to make sure the vertex ids they input as start vertices are valid. This is correct behavior. |
rlratzel
left a comment
There was a problem hiding this comment.
I only looked at the CMakeLists.txt change on behalf of cmake-codeowners
Can we support skipping isolated vertices, from a GNN standpoint the train/test node ids of common datasets have node_ids which are isolated vertices and i dont have a straight forward efficient way of filtering it out. |
|
@VibhuJawa removing isolated vertices from a list is something we would have to take up with the C++ team as a separate issue. That would have to be its own algorithm. |
cpp/include/cugraph/algorithms.hpp
Outdated
| * This function traverses from a set of starting vertices, traversing outgoing edges and | ||
| * randomly selects from these outgoing neighbors to extract a subgraph. | ||
| * | ||
| * Output from this function is a tuple of vectors (src, dst, weight_t, edge_id, edge_type, hop, |
If you want to create an issue for this, please do. The C API could be modified to filter out the vertices that are not present in the graph. But I would definitely rather do that as a separate activity at this point so we can merge this PR and unblock Alex. |
|
/merge |
Resolves #3073 Resolves rapidsai/graph_dl#72 Resolves #2562 Resolves rapidsai/graph_dl#49 Resolves #2871 Takes Chuck's changes from #2997 and implements the necessary changes in Python. Adds tests in Python for the handling of edge id, edge type, etc. Also updates Python tests to reflect the removal of `count_and_remove_duplicates` Authors: - Alex Barghi (https://github.com/alexbarghi-nv) - Rick Ratzel (https://github.com/rlratzel) - Chuck Hastings (https://github.com/ChuckHastings) - Vyas Ramasubramani (https://github.com/vyasr) - Vibhu Jawa (https://github.com/VibhuJawa) Approvers: - Chuck Hastings (https://github.com/ChuckHastings) - Rick Ratzel (https://github.com/rlratzel) URL: #3082
Update the Uniform Neighborhood Sampling C API to address some shortcomings:
closes #2994
closes #2597
closes #2774