Extract BFS paths SG implementation #1838
rapids-bot[bot] merged 17 commits into rapidsai:branch-21.12
Conversation
seunghwak
left a comment
Review part 1 (I am still reading extract_bfs_paths_impl.cuh).
vertex_t const* distances_;
vertex_t const* destinations_;

size_t __device__ operator()(vertex_t idx)

Shouldn't this be size_t idx?

Fixed in next push
###################################################################################################
# - MST tests ----------------------------------------------------------------------------
# - MST tests -------------------------------------------------------------------------------------

Thanks for fixing these :-)
vertex_t const* distances_;

thrust::tuple<vertex_t, vertex_t, int> __device__
operator()(thrust::tuple<vertex_t, vertex_t, int> const& tuple)

What's the third tuple element (type int)?
The original rank. Each rank has to retrieve results from another rank. The current implementation is:
- Construct the tuple (current_vertex, offset_in_current_frontier_array, my_rank)
- Shuffle the tuples based on which rank contains the predecessor information for current_vertex
- Replace current_vertex with previous_vertex
- Shuffle the tuples back to the original rank where the data is required
- Update current_frontier in the proper location with the predecessor value
I could achieve the same result without the int parameter by writing a more sophisticated communication pattern (sort the vertices by which rank they are assigned to, do block transfers keeping track of the offsets for each rank, etc). I suspect this would be more time efficient as well as memory efficient. But I was trying to reuse existing functions as much as possible to get something working. We can optimize this more later.
Ah... OK, so I guess this code can go away if we implement collect_values_for_vertices.
So, is this functor still used? I can't find a place where this function is actually used.
vertex_t const* predecessors_;

thrust::tuple<vertex_t, vertex_t, int> __device__
operator()(thrust::tuple<vertex_t, vertex_t, int> const& tuple)

What's the third tuple element (type int)?

So, is this functor still used? I can't find a place where this function is actually used.
rmm::device_uvector<vertex_t> const& d_predecessors,
size_t num_paths_to_check,
uint64_t seed)
{

So, if I am not mistaken, this randomly selects num_paths_to_check vertices (that are not sources).
I think we can achieve this by
thrust::copy_if();
thrust::shuffle();

And as this is only for tests, another option is to do this on the CPU, so there's no need to add this additional file, keeping our codebase size in check.
I can look into shuffle; that's one I haven't used before.
I can certainly look at a CPU implementation. I was imagining that we might have other tests that could benefit from a function to randomly select vertices, although I started with it here because it was easier.
Yeah... so, for personalized PageRank, we're randomly selecting personalization vertices as well (it's currently implemented as host code).
https://github.com/rapidsai/cugraph/blob/branch-21.10/cpp/tests/link_analysis/pagerank_test.cpp#L178
https://github.com/rapidsai/cugraph/blob/branch-21.10/cpp/tests/link_analysis/mg_pagerank_test.cpp#L104
We may better combine these and add something under tests/utilities?
* @param graph_view Graph view object.
* @param distances Pointer to the distance array constructed by bfs.
* @param predecessors Pointer to the predecessor array constructed by bfs.
* @param destinations Destination vertices, extract path from source to each of these destinations
So, in multi-GPU, destinations should be local to this GPU, right? Shouldn't we explain this here?
And I think this should be mentioned in
https://github.com/rapidsai/cugraph/blob/branch-21.10/cpp/include/cugraph/algorithms.hpp#L1106
https://github.com/rapidsai/cugraph/blob/branch-21.10/cpp/include/cugraph/algorithms.hpp#L1185
as well.
Initially I started with this assumption. However, once I started implementing I realized I had to handle off-GPU vertices anyway, so I dropped that assumption. If you specify on GPU 0 a destination that resides on GPU 1, the implementation will just go to GPU 1 to get the predecessor of that destination.
I agree that should be mentioned in the other places. I will add those in this PR.
I think we may still enforce this for consistency with other APIs (and we can avoid communication for the first step).
Changed this back to this assumption.
- Added a check to validate that destinations are on the correct gpu
- Eliminated the initial shuffle logic (outside the loop in the implementation)
- Updated the documentation (including the other locations)
> 3. Updated the documentation (including the other locations)

I can see updates in other locations, but I can't find documentation updates for extract_bfs_paths. Could you point out the exact line?
Somehow missed this one, updated in next push.
size_t count =
  multi_gpu
    ? cugraph::host_scalar_allreduce(
        handle.get_comms(), current_frontier.size(), raft::comms::op_t::MAX, handle.get_stream())
Wouldn't a SUM reduction be better here (to minimize the difference between the SG and MG logic)?
I think MAX is semantically what I want, although SUM would suffice.
I'm actually thinking I'll change this loop to a for loop over the range [0, max_path_length), since we already know how many iterations are required.
> I'm actually thinking I'll change this loop to a for loop over the range [0, max_path_length), since we already know how many iterations are required.

+1 for this; it will be clearer.
Changed to a for loop, definitely much clearer and saves the allreduce cost.
count = multi_gpu ? cugraph::host_scalar_allreduce(handle.get_comms(),
                                                   current_frontier.size(),
                                                   raft::comms::op_t::MAX,
Wouldn't a SUM reduction be better here (to minimize the difference between the SG and MG logic)?
Same as above.

Same fix as above.
Codecov Report

@@           Coverage Diff            @@
##           branch-21.12    #1838   +/- ##
==========================================
  Coverage              ?   70.11%
==========================================
  Files                 ?      143
  Lines                 ?     8827
  Branches              ?        0
==========================================
  Hits                  ?     6189
  Misses                ?     2638
  Partials              ?        0

Continue to review the full report at Codecov.
Pushing out to 21.12 to incorporate more of the discussed changes.
template <typename VertexIterator,
          typename ValueIterator,
          typename KeyToGPUIdOp,
I think we'd better rename this to VertexToGPUIdOp to clarify that this function's behavior is different from collect_values_for_keys.
There was a problem hiding this comment.
Fixed in next push
VertexIterator map_vertex_first,
VertexIterator map_vertex_last,
ValueIterator map_value_first,
KeyToGPUIdOp key_to_gpu_id_op,
Better be VertexToGPUIdOp vertex_to_gpu_id_op?
Fixed in next push
Changed to vertex_partition_lasts
raft::update_host(h_gpu_counts.data(), gpu_counts.data(), gpu_counts.size(), stream_view);

std::tie(shuffled_vertices, shuffled_counts) =
auto [shuffled_vertices, shuffled_counts] = (so, no need to explicitly define shuffled_vertices & shuffled_counts)
And actually, input_vertices here will not be used after this line, so better to invent a name that covers both "shuffled_vertices" and "input_vertices" and do something like std::tie(new_name, ...) = shuffle_values(..., std::move(new_name), ...); (so if this function becomes a memory bottleneck, we can free the memory held by input_vertices).
input_vertices is used at the end of the function as the keys for a lookup.
using vertex_t = typename std::iterator_traits<VertexIterator>::value_type;
using value_t  = typename std::iterator_traits<ValueIterator>::value_type;

size_t input_size = thrust::distance(map_vertex_first, map_vertex_last);
If I am not mistaken, [map_vertex_first, map_vertex_last) are vertices we're querying values (not the vertices in local GPU).
In the collect_values_for_keys function,
VertexIterator0 map_key_first,
VertexIterator0 map_key_last,
ValueIterator map_value_first,
store (key, value) pairs local to this GPU and
VertexIterator1 collect_key_first,
VertexIterator1 collect_key_last,
are keys we want to query values for. So, I think map_key_first & last here are inconsistent (and misnomers).
collect_values_for_vertices(raft::comms::comms_t const& comm,
VertexPartitionDeviceViewType vertex_partition_device_view (or local_vertex_first, local_vertex_last to better mirror the above collect_values_for_keys function),
ValueIterator map_value_first,
VertexToGPUIdOp vertex_to_gpu_id_op,
VertexIterator1 collect_unique_vertex_first,
VertexIterator1 collect_unique_vertex_last,
...
Or even better as vertex to GPU ID mapping is pretty much pre-determined by partitioning, we may take std::vector<vertex_t> const& vertex_partition_lasts instead of vertex_to_gpu_id_op (then no need to take vertex_partition_device_view).
collect_values_for_vertices(raft::comms::comms_t const& comm,
ValueIterator local_value_first,
VertexIterator1 collect_unique_vertex_first, (or sorted_unique_vertex_first, if we want to ask caller to sort if not already sorted)
VertexIterator1 collect_unique_vertex_last,
std::vector<vertex_t> const& vertex_partition_lasts, (size == # GPUs),
...
Restructured in the push I just made considering some of these ideas.
thrust::sort(rmm::exec_policy(stream_view), input_vertices.begin(), input_vertices.end());

// 1: Shuffle vertices to the correct node
auto gpu_counts = groupby_and_count(
  input_vertices.begin(), input_vertices.end(), key_to_gpu_id_op, comm.get_size(), stream_view);
So, this is an optimization issue, and we may not need to address it right now (having at least a FIXME statement would be nice), but there are a few issues we need to think about.
In some cases, users may pass an already-sorted vertex list to query values for, and this sort is redundant in those cases. For sorted input_vertices, we can just run thrust::lower_bound for # GPUs values instead of visiting and group-bying every element in input_vertices (and if we're running groupby_and_count, I think the sort above is unnecessary; please double check). See the code around https://github.com/rapidsai/cugraph/blob/branch-21.12/cpp/src/structure/renumber_utils_impl.cuh#L258 for reference.
Restructured the code in the next push to accommodate this concern and some of the other comments above.
The sort here is necessary to use the keys for the lookup at the end of this function.
I added a call to thrust::unique, since there's no point in a particular GPU asking for the same key more than once in a single call.
vertex_t const* distances_;

thrust::tuple<vertex_t, vertex_t, int> __device__
operator()(thrust::tuple<vertex_t, vertex_t, int> const& tuple)

So, is this functor still used? I can't find a place where this function is actually used.
vertex_t const* predecessors_;

thrust::tuple<vertex_t, vertex_t, int> __device__
operator()(thrust::tuple<vertex_t, vertex_t, int> const& tuple)

So, is this functor still used? I can't find a place where this function is actually used.
rmm::device_uvector<vertex_t> d_vertex_partition_offsets(0, handle.get_stream());

if constexpr (multi_gpu) {
  // FIXME: There should be a better way to do this
Yeah... I think we'd better add a member function to graph_view_t that returns vertex_partition_lasts as a std::vector<vertex_t> (size == # GPUs).
Added the function to graph_view_t in next push
vertex_t number_of_vertices,
vertex_t local_vertex_first,
rmm::device_uvector<vertex_t> const& d_predecessors,
size_t num_paths_to_check)
Just an issue to think about.
So, for testing, should we use CUDA code for this kind of input-preparation function (and add additional files, since we cannot use thrust from .cpp files), or would we be better off with a slower host function, as this is only for testing? Performance for this part may not matter much unless it becomes a bottleneck in large-scale benchmarking, and the host version might help keep our test code simpler.
Definitely worth considering. Not sure how frequent this pattern will be.
…values_for_vertices, some code cleanup
seunghwak
left a comment
Looks good to me besides a minor complaint.
cpp/include/cugraph/algorithms.hpp
Outdated
* @param distances Pointer to the distance array constructed by bfs.
* @param predecessors Pointer to the predecessor array constructed by bfs.
* @param destinations Destination vertices, extract path from source to each of these destinations
* In a multi-gpu context the source vertex should be local to this GPU.
In a multi-gpu context the source vertex should be local to this GPU.
=>
In a multi-gpu context the destination vertex should be local to this GPU.
vertex_t num_vertices,
ValueIterator local_value_first,
std::vector<vertex_t> const& vertex_partition_lasts,
VertexPartitionDeviceViewType vertex_partition_device_view,
Just a minor point, I feel like vertex_partition_device_view is a bit redundant as we can easily compute local_vertex_first from vertex_partition_lasts (0 if comm_rank == 0 and vertex_partition_lasts[comm_rank - 1] otherwise).
I guess since collect_comm.cuh is part of the primitive functionality it's reasonable to have it know the internal workings of the data structures. I have been trying to avoid having things outside the primitives (most of what I work on) rely on understanding the inner workings of the data structures.
I'll make these two changes, hopefully we can merge later today.
detail::update_paths<vertex_t>{paths.data(), invalid_vertex});
}

return std::make_tuple(std::move(paths), max_path_length);
Just want to say that this code looks beautiful :-) The MG part is just slightly more complex than the SG path.
It does look nice. Creating the collect_values_for_vertices function really cleaned this up well.
@gpucibot merge
Partially addresses #1753
Creates a new C++ function to extract BFS paths.
This PR includes the SG implementation and the SG unit tests. A separate PR will provide the MG unit test.