[REVIEW] adding test graphs - part 2#1603
[REVIEW] adding test graphs - part 2#1603rapids-bot[bot] merged 11 commits intorapidsai:branch-21.06from
Conversation
Codecov Report
@@ Coverage Diff @@
## branch-21.06 #1603 +/- ##
===============================================
Coverage ? 60.03%
===============================================
Files ? 80
Lines ? 3551
Branches ? 0
===============================================
Hits ? 2132
Misses ? 1419
Partials ? 0 Continue to review full report at Codecov.
|
| std::vector<rmm::device_uvector<vertex_t>> srcs_v{}; | ||
| std::vector<rmm::device_uvector<vertex_t>> dsts_v{}; | ||
|
|
||
| srcs_v.push_back(std::move(d_src_v)); |
There was a problem hiding this comment.
Consider using emplace_back() instead of push_back(), as it's more efficient for rvalue references (which is the case here, because of std::move()).
There was a problem hiding this comment.
OBE with next push.
| raft::copy( | ||
| copy_weights_v.begin(), d_weights_v.begin(), d_weights_v.size(), handle.get_stream()); | ||
|
|
||
| weights_v.push_back(std::move(d_weights_v)); |
There was a problem hiding this comment.
Consider using emplace_back() instead of push_back(), as it's more efficient for rvalue references (which is the case here, because of std::move()).
There was a problem hiding this comment.
OBE with next push.
|
|
||
| rmm::device_uvector<size_t> indices_v(count, handle.get_stream()); | ||
|
|
||
| handle.get_stream_view().synchronize(); |
There was a problem hiding this comment.
Residual from some earlier debugging. Deleted.
|
|
||
| size_t count = thrust::count_if(rmm::exec_policy(handle.get_stream()), | ||
| random_iterator, | ||
| random_iterator + num_vertices * num_vertices, |
There was a problem hiding this comment.
Just want to note that the Gnp model can be way more expensive than the Gnm model if num_vertices is large (but much simpler to implement as we don't need to worry about duplicates).
Shouldn't we add gnp in the function name so users can expect this will go over num_vertices * num_vertices potential edges?
There was a problem hiding this comment.
Changed function name. Add a gnm function as well, although not implemented in this PR.
| thrust::make_zip_iterator(thrust::make_tuple(src_v.begin(), src_v.end())), | ||
| [num_vertices] __device__(size_t index) { | ||
| size_t src = index / num_vertices; | ||
| size_t dst = index % num_vertices; |
There was a problem hiding this comment.
This will break if vertex_t is 64 bit.
There was a problem hiding this comment.
Technically, it will break if num_vertices > (2^31 - 1). If vertex_t is 64 bits but could be stored in 32 bits it would still work.
Since we don't want this variation called if num_vertices > (2^31 - 1)... even if it worked we don't want to do that much computation... I just added a CUGRAPH_EXPECTS at the beginning of the function.
| rmm::device_uvector<vertex_t> &d_src_v, | ||
| rmm::device_uvector<vertex_t> &d_dst_v, | ||
| vertex_t vertex_id_offset, | ||
| uint64_t seed) |
There was a problem hiding this comment.
Have you checked that this function works if num_vertices is not a power of two?
IIRC, this function works if vertices are in the range [0, 2^scale), but I am not sure this still works otherwise. Need to check.
There was a problem hiding this comment.
I guess this will still work (but the scrambled vertex IDs are no longer contiguous integers but does not matter if we renumber or allow having many isolated vertices... but needs to double check).
There was a problem hiding this comment.
Added a unit test for the scramble function. The result is correct (it validates properly). Your assertion is correct regarding the resulting ids no longer being contiguous. This will (as you observe) result in a collection of isolated vertices if we do not renumber; although if we renumber that will be corrected.
| template <typename T> | ||
| void append_all(raft::handle_t const &handle, | ||
| std::vector<rmm::device_uvector<T>> &&input, | ||
| rmm::device_uvector<T> &output) |
There was a problem hiding this comment.
Should we better return rmm::device_uvector instead of taking rmm::device_uvector<T>& output. Unless the ouput is in-out or there is another strong reason to do so, returning is more functional and side-effects-free than taking an lvalue reference.
There was a problem hiding this comment.
Done in next push
|
|
||
| size_t number_of_edges{0}; | ||
|
|
||
| if (optional_d_weights) { |
There was a problem hiding this comment.
So, should we always remove all multi-edges or better make this as an option? Graph500 input edge lists can possibly have multi-edges (and self-loops) and removing those is a task in the graph construction step.
There was a problem hiding this comment.
Added a flag to control this behavior in next push.
| rmm::device_uvector<vertex_t> &&d_src_v, | ||
| rmm::device_uvector<vertex_t> &&d_dst_v, | ||
| std::optional<rmm::device_uvector<weight_t>> &&optional_d_weights_v) | ||
| { |
There was a problem hiding this comment.
I think the code below is simpler and more efficient (in memory footprint & the amount of data movement).
auto offset = d_src_v.size();
d_src_v.resize(offset * 2, handle.get_stream_view());
d_dst_v.resize(d_src_v.size(), handle.get_stream_view());
thrust::copy(rmm::exec_policy(handle.get_stream_view()), d_dst_v.begin(), d_dst_v.begin() + offset, d_src_v.begin() + offset);
thrust::copy(rmm::exec_policy(handle.get_stream_view()), d_src_v.begin(), d_src_v.begin() + offset, d_dst_v.begin() + offset);
if (optional_d_weights_v) {
optional_d_weights_v.resize(d_src_v.size(), handle.get_stream_view());
thrust::copy(rmm::exec_policy(handle.get_stream_view()), optional_d_weight_v.begin(), optinoal_d_weight_v.begin() + offset, optional_d_weight_v.begin() + offset);
}
return std::make_tuple(std::move(d_src_v), std::move(d_dst_v), optional_d_weights_v ? std::move(optional_d_weights_v) : std::nullopt);
There was a problem hiding this comment.
Implemented this version.
| thrust::make_counting_iterator<size_t>(0), | ||
| [base_vertex_id, num_vertices, invalid_vertex] __device__(size_t index) { | ||
| size_t graph_index = index / (num_vertices * num_vertices); | ||
| size_t local_index = index % (num_vertices * num_vertices); |
There was a problem hiding this comment.
This will break if vertex_t is 64 bit.
There was a problem hiding this comment.
Added a check (like above)
|
@gpucibot merge |
This effort was originally targeted toward the WCC effort, but has been expanded a bit. This supersedes #1545 which I will close.
The goal here is to create a means for constructing test graphs in an easier fashion. Testing the capabilities of different graph algorithms might require a variety of graphs. The objective of this PR is to better organize the graph generation components and to introduce some components to help in composing graphs out of multiple components.
This PR introduces the following capabilities:
Closes #1543