Some MTMG code cleanup and small optimizations #3894

Merged
rapids-bot[bot] merged 5 commits into rapidsai:branch-23.12 from ChuckHastings:mtmg_small_improvements
Oct 24, 2023
Conversation

@ChuckHastings
Collaborator

Added some missing documentation.

A couple of optimizations:

  • Modified the append logic to keep the mutex lock only long enough to compute what needs to be copied and where.
  • Modified the handle the resource manager creates for each GPU to include a stream pool, so that different threads can operate on different streams.
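For illustration, the first optimization can be sketched in plain C++ (hypothetical names, not the actual cuGraph code): the shared buffer is pre-allocated, so the mutex only needs to be held long enough to claim a destination range; the copy itself runs outside the lock.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Illustrative sketch: appending threads hold the lock only for bookkeeping.
struct shared_appender {
  std::vector<int> buffer;   // pre-allocated to full capacity
  std::size_t write_pos{0};  // next free slot, guarded by mtx
  std::mutex mtx;

  void append(std::vector<int> const& input)
  {
    std::size_t dest{};
    {
      std::lock_guard<std::mutex> lock(mtx);
      // Under the lock: only compute where this thread's data goes.
      dest = write_pos;
      write_pos += input.size();
      assert(write_pos <= buffer.size());
    }
    // Outside the lock: the (potentially large) copy runs concurrently
    // with other threads' appends.
    std::copy(input.begin(), input.end(), buffer.begin() + dest);
  }
};
```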

@ChuckHastings ChuckHastings self-assigned this Sep 27, 2023
@ChuckHastings ChuckHastings added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Sep 27, 2023
@ChuckHastings ChuckHastings added this to the 23.12 milestone Sep 27, 2023
@ChuckHastings ChuckHastings marked this pull request as ready for review October 18, 2023 21:37
@ChuckHastings ChuckHastings requested a review from a team as a code owner October 18, 2023 21:37
std::vector<std::tuple<vertex_t*, vertex_t const*, size_t>> dst_copies;
std::vector<std::tuple<weight_t*, weight_t const*, size_t>> wgt_copies;
std::vector<std::tuple<edge_t*, edge_t const*, size_t>> edge_id_copies;
std::vector<std::tuple<edge_type_t*, edge_type_t const*, size_t>> edge_type_copies;
Contributor

Should we maintain 5 variables, or would one variable storing (input_start_offset, output_start_offset, size) triplets be sufficient?

Collaborator Author

Good suggestion, I'll look into that for next push.
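The reviewer's suggestion could look roughly like this (an illustrative sketch, not the merged code): instead of five parallel vectors of (dst*, src*, size) pointer tuples, record each chunk once as an (output_offset, input_offset, size) triplet and apply the same triplets to every column that is present.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// One triplet per chunk: (output_start_offset, input_start_offset, size).
using copy_triplet = std::tuple<std::size_t, std::size_t, std::size_t>;

// Apply the same copy plan to any column (src, dst, wgt, edge_id, ...).
template <typename T>
void apply_copies(std::vector<copy_triplet> const& copies,
                  std::vector<T>& dst,
                  std::vector<T> const& src)
{
  for (auto const& [out_off, in_off, n] : copies) {
    std::copy(src.begin() + in_off, src.begin() + in_off + n, dst.begin() + out_off);
  }
}
```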

Comment on lines +140 to +160
while (count > 0) {
size_t copy_count = std::min(count, (src_.back().size() - current_pos_));

src_copies.push_back(
std::make_tuple(src_.back().begin() + current_pos_, src.begin() + pos, copy_count));
dst_copies.push_back(
std::make_tuple(dst_.back().begin() + current_pos_, dst.begin() + pos, copy_count));
if (wgt)
wgt_copies.push_back(
std::make_tuple(wgt_->back().begin() + current_pos_, wgt->begin() + pos, copy_count));
if (edge_id)
edge_id_copies.push_back(std::make_tuple(
edge_id_->back().begin() + current_pos_, edge_id->begin() + pos, copy_count));
if (edge_type)
edge_type_copies.push_back(std::make_tuple(
edge_type_->back().begin() + current_pos_, edge_type->begin() + pos, copy_count));

count -= copy_count;
pos += copy_count;
current_pos_ += copy_count;
}
Contributor

What happens if count = 1000, src_.back().size() = 100, and current_pos_ = 0?

At the end of the first iteration, copy_count = 100, count = 900, pos = 100, current_pos_ = 100. From the second iteration on, copy_count = 0 and this loop won't finish, or am I missing something?

Shouldn't we allocate additional buffers and reset current_pos_ to 0 for this loop to finish?

Collaborator Author

Yes... not sure how I missed that. The original code had that logic; I imagine I accidentally deleted it. I'll add it back in.
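A host-side sketch of the fix being described (illustrative names and plain std::vector buffers, not the actual device code): when the tail buffer is exhausted, allocate a fresh buffer and reset current_pos_, so copy_count can never stay at zero and the loop terminates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of a chunked append buffer with the missing reallocation step.
struct chunked_buffer {
  std::vector<std::vector<int>> chunks_;
  std::size_t current_pos_{0};
  std::size_t chunk_size_;

  explicit chunked_buffer(std::size_t chunk_size) : chunk_size_(chunk_size)
  {
    chunks_.emplace_back(chunk_size_);
  }

  void append(std::vector<int> const& src)
  {
    std::size_t pos{0};
    std::size_t count{src.size()};
    while (count > 0) {
      if (current_pos_ == chunks_.back().size()) {
        // The step the review caught: without a new buffer and a reset of
        // current_pos_, copy_count below would be 0 forever.
        chunks_.emplace_back(chunk_size_);
        current_pos_ = 0;
      }
      std::size_t copy_count = std::min(count, chunks_.back().size() - current_pos_);
      std::copy(src.begin() + pos,
                src.begin() + pos + copy_count,
                chunks_.back().begin() + current_pos_);
      count -= copy_count;
      pos += copy_count;
      current_pos_ += copy_count;
    }
  }
};
```

With the reviewer's example (count = 1000, chunk size 100), this fills ten chunks and exits.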

});

-  handle.raft_handle().sync_stream();
+  handle.raft_handle().sync_stream(handle.get_stream());
Contributor

If we add get_stream() to mtmg::handle, what about adding sync_stream to mtmg::handle as well?


raft::handle_t tmp_handle;

size_t n_streams{16};
Contributor

Why 16?

Collaborator Author

I needed that many for one of the tests I ran :-)

I'll make that a parameter. Any suggestion on a good default?

Contributor

Maybe # of GPUs? (assuming that 1 stream per thread and # threads == # GPUs)

Collaborator Author

Each GPU will have its own pool of streams. The pool so far is used by different thread ranks copying data to the GPU independently.

Collaborator Author

I've added it as a parameter.
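The idea under discussion can be sketched in plain C++ (a hedged sketch; the real code builds a raft::handle_t backed by a CUDA stream pool): each GPU's handle owns a pool of n_streams streams, and each thread rank maps onto a pool slot round-robin, so more threads than streams simply share streams.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative stand-in for a per-GPU stream pool (not the raft/rmm API):
// only the rank-to-stream mapping is modeled here.
struct stream_pool_sketch {
  std::size_t n_streams_;

  explicit stream_pool_sketch(std::size_t n_streams) : n_streams_(n_streams) {}

  // Which pool slot a given thread rank would use.
  std::size_t stream_index(std::size_t thread_rank) const
  {
    return thread_rank % n_streams_;
  }
};
```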

});

-  thrust::gather(handle.raft_handle().get_thrust_policy(),
+  thrust::gather(rmm::exec_policy(handle.get_stream()),
Contributor

What about adding (mtmg::)handle.get_thrust_policy()?

Contributor

@seunghwak seunghwak left a comment

Looks good to me (aside from the 4 in the code; some documentation of the reason behind it would be helpful).


   auto instance_manager = resource_manager.create_instance_manager(
-    resource_manager.registered_ranks(), instance_manager_id);
+    resource_manager.registered_ranks(), instance_manager_id, 4);
Contributor

What is 4 here?

Collaborator Author

Made this a constant, added a comment describing why it's 4 in the latest push.

@ChuckHastings
Collaborator Author

/merge

@rapids-bot rapids-bot bot merged commit 9b28458 into rapidsai:branch-23.12 Oct 24, 2023
@ChuckHastings ChuckHastings deleted the mtmg_small_improvements branch December 1, 2023 21:58