ARROW-12560: [C++] Add scheduling option for Future callbacks#10258

Closed
westonpace wants to merge 9 commits into apache:master from westonpace:feature/ARROW-12560--c-investigate-utilizing-aggressive-thread-task

westonpace commented May 6, 2021

Member

Previously a future's callbacks always ran synchronously, either as part of Future::MarkFinished or as part of Future::AddCallback. Executor::Transfer made it possible to schedule continuations on a new thread, but it only took effect if the transferred future's callbacks were added before the source future finished. There are times when the desired behavior is to spawn a new thread task even if the source future has already finished.

This PR adds three scheduling options:

  • Never - The default (and existing) behavior, never spawn a new task
  • IfUnfinished - Spawn a new task only if the future isn't already finished when the callback is added
  • Always - Always spawn a new task, on both finished and unfinished futures, regardless of destination thread pool idleness.

The Never option makes no sense for transferring, so Transfer only has two choices (Always or IfUnfinished).
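A minimal sketch of the decision each option implies, assuming a hypothetical free function `ShouldSpawnTask` (not part of Arrow) that is consulted when a callback is about to run. Here `finished_when_added` says whether the future had already finished at the moment AddCallback was called; the enum names mirror the list above, but the code is illustrative rather than the PR's implementation.

```cpp
#include <cassert>

// Hypothetical sketch: the three options from the PR description, expressed
// as a decision function. This is not Arrow's implementation.
enum class ShouldSchedule { Never, IfUnfinished, Always };

// Returns true if the callback should be submitted to an executor as a new
// task, false if it should run synchronously on the current thread.
// `finished_when_added` is whether the future had already finished when
// AddCallback was called.
bool ShouldSpawnTask(ShouldSchedule option, bool finished_when_added) {
  switch (option) {
    case ShouldSchedule::Never:
      // Existing behavior: run inside MarkFinished or AddCallback.
      return false;
    case ShouldSchedule::IfUnfinished:
      // Spawn only when the callback had to wait for MarkFinished.
      return !finished_when_added;
    case ShouldSchedule::Always:
      // Spawn even when the future is already finished.
      return true;
  }
  return false;  // unreachable for valid enum values
}
```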

github-actions Bot commented May 6, 2021

@westonpace

Member Author

CC @pitrou, I think you referenced this in your latest execution engine PR.

westonpace marked this pull request as draft May 6, 2021 09:35

pitrou commented May 10, 2021

Member

I'm not sure why you're suggesting adding so much sophistication. To me there are only two interesting options: "always" and "if unfinished". So we could have Transfer (transfer always) vs. TransferUnfinished.

westonpace force-pushed the feature/ARROW-12560--c-investigate-utilizing-aggressive-thread-task branch from c95244a to e992bb6 May 24, 2021 22:05
@westonpace

Member Author

Since I'm working on work stealing at the thread pool level, I agree that the idle option is no longer needed. I've cleaned this up and rebased. It's much simpler than it was before.

westonpace marked this pull request as ready for review May 25, 2021 00:33
@westonpace

Member Author

Also, I ran into a bit of trouble with the future callback's weak reference to the future. Previously we could just assume it was valid, since all callbacks completed before MarkFinished returned. Now a future can schedule a callback that far outlives the call to MarkFinished, so when a callback is scheduled (run on an executor) we make a copy of the FutureImpl's shared_ptr to keep it alive until that callback has a chance to run.
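The lifetime fix described above can be sketched with `std::enable_shared_from_this`: the scheduled task captures a `shared_ptr` to the future's state (obtained via `shared_from_this()`), so that state survives even after every user-held handle is gone. `ToyFutureImpl` and `ToyExecutor` are hypothetical single-threaded stand-ins, not Arrow's FutureImpl or Executor.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical executor: Spawn() only queues the task; nothing runs yet.
struct ToyExecutor {
  std::vector<std::function<void()>> tasks;
  void Spawn(std::function<void()> task) { tasks.push_back(std::move(task)); }
};

// Hypothetical future state that must outlive MarkFinished while a
// scheduled callback is still waiting in an executor queue.
class ToyFutureImpl : public std::enable_shared_from_this<ToyFutureImpl> {
 public:
  explicit ToyFutureImpl(int value) : value_(value) {}

  void ScheduleCallback(ToyExecutor* executor, std::function<void(int)> cb) {
    // Intentionally extend the lifetime of this state until the task runs:
    // the lambda owns a strong reference, not just a raw or weak pointer.
    std::shared_ptr<ToyFutureImpl> self = shared_from_this();
    executor->Spawn([self, cb = std::move(cb)]() { cb(self->value_); });
  }

 private:
  int value_;
};
```

Once the queued task is destroyed, the last strong reference goes with it and the state is freed.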

pitrou left a comment

Member

Sorry for the delay. This looks good to me on the principle.

Comment thread cpp/src/arrow/util/future.cc Outdated
Comment thread cpp/src/arrow/util/future.cc Outdated

Member

nullptr

Member Author

Fixed.

Comment thread cpp/src/arrow/util/future.cc Outdated

Member

Why "a copy"? It's not clear to me where a copy is being made.

Member Author

The copy is a few lines down when we call shared_from_this. I'll move the comment and make it more explicit.

Member

If it's only the shared_ptr copy, then I'm not sure it's worth mentioning.

Member Author

I think the more important thing is that we are intentionally extending the lifetime of the future. I reworded the comment a bit and dropped the "copy". I can always remove it if we want.

Comment thread cpp/src/arrow/util/future.cc Outdated
Comment thread cpp/src/arrow/util/future.cc Outdated

Member

The coding conventions prohibit passing mutable lvalue references. You could make this a CallbackRecord&&, for example.
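A small sketch of the suggested convention, assuming a hypothetical toy `CallbackRecord` and a vector standing in for an executor queue: the record is consumed through an rvalue reference, so the caller must write `std::move`, and the queue that is mutated is passed by pointer, which is one common way to keep mutation visible at the call site under such a rule. Arrow's real `CallbackRecord` may look different.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy record; not Arrow's actual CallbackRecord.
struct CallbackRecord {
  std::function<void()> callback;
  std::string label;
};

// Discouraged: `void ScheduleCallback(CallbackRecord& record)` hides, at the
// call site, that the argument may be modified or emptied.
//
// Preferred: take the record by rvalue reference so callers must write
// std::move(...), and pass the mutated queue by pointer (`&queue`).
void ScheduleCallback(std::vector<CallbackRecord>* queue, CallbackRecord&& record) {
  queue->push_back(std::move(record));  // take ownership of the record
}
```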

Member Author

Done.

Comment thread cpp/src/arrow/util/future.h Outdated

Member

We should avoid using ALL_CAPS names, because of potential clashes with macros (this is a common issue with Windows headers, unfortunately).

Member Author

Hmm, technically the style guide (https://google.github.io/styleguide/cppguide.html#Enumerator_Names) prefers kAlways, but I see Always used more often in Arrow, although some of the Gandiva code uses kAlways. Any preference?

Member

Always sounds fine to me.

Member Author

Ok. I'll make a PR to add this to the style guide docs as well.
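To illustrate the macro-clash concern behind avoiding ALL_CAPS enumerators, here is a minimal sketch. It simulates a platform header by defining an `ERROR` macro (on Windows, `<windows.h>` pulls one in via `wingdi.h`); the `Outcome` enum is hypothetical, while `ShouldSchedule` uses the CamelCase spelling agreed on above.

```cpp
#include <cassert>

// Stand-in for a platform header: on Windows, <windows.h> (via wingdi.h)
// defines a macro named ERROR. We define it here to simulate that.
#define ERROR 0

// With ALL_CAPS enumerators, the preprocessor would rewrite
//   enum class Outcome { OK, ERROR };
// into
//   enum class Outcome { OK, 0 };
// which does not compile. CamelCase enumerators are not macro names,
// so they are unaffected.
enum class Outcome { Ok, Error };
enum class ShouldSchedule { Never, IfUnfinished, Always };

// Code that still needs the macro keeps working alongside the enums.
constexpr int kLegacyErrorCode = ERROR;
```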

Comment thread cpp/src/arrow/util/future_test.cc Outdated
Comment thread cpp/src/arrow/util/future_test.cc Outdated

Member

It's a bit weird to have this in a private test file, and the mock executor in a .h.

Member Author

My only rationale was that DelayedExecutor is used only in future_test.cc, while MockExecutor is used in both future_test.cc and thread_pool_test.cc, but I see your point. I'll move this into test_common.h.
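For readers unfamiliar with the pattern, a test-only "delayed" executor can be sketched as follows. This assumes a shape for DelayedExecutor that the thread does not show: Spawn() only records the task, and the test decides when to run it, which lets a test tell a scheduled callback apart from one run inline. It is not the code that landed in test_common.h.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical test-only executor: captured tasks run only when the test
// calls RunAll(), so "was this callback scheduled?" becomes observable.
class DelayedExecutor {
 public:
  void Spawn(std::function<void()> task) { captured_.push_back(std::move(task)); }

  std::size_t num_pending() const { return captured_.size(); }

  // Run every task captured so far; tasks spawned while running are kept
  // for the next call.
  void RunAll() {
    std::vector<std::function<void()>> to_run;
    to_run.swap(captured_);
    for (auto& task : to_run) task();
  }

 private:
  std::vector<std::function<void()>> captured_;
};
```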

Comment thread cpp/src/arrow/util/future_test.cc Outdated

Member

NEVER is never tested?

Member Author

Added a test.

Comment thread cpp/src/arrow/util/thread_pool_test.cc Outdated

Member

Nit, but it would probably be nicer to be able to spell this as TransferAlways(fut).

Member Author

Done.
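The split between Transfer and TransferAlways can be sketched with a toy single-threaded future. The names Transfer, TransferAlways and ShouldSchedule follow the discussion, but `ToyFuture`, `QueueExecutor` and `DoTransfer` are hypothetical stand-ins with assumed signatures, not Arrow's types. Transfer requests IfUnfinished and TransferAlways requests Always; Never is included only to show the inline path.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

enum class ShouldSchedule { Never, IfUnfinished, Always };

// Hypothetical executor: queues tasks until RunAll() is called.
struct QueueExecutor {
  std::vector<std::function<void()>> tasks;
  void Spawn(std::function<void()> task) { tasks.push_back(std::move(task)); }
  void RunAll() {
    std::vector<std::function<void()>> to_run;
    to_run.swap(tasks);
    for (auto& task : to_run) task();
  }
};

// Hypothetical single-threaded future holding an int.
class ToyFuture {
 public:
  bool is_finished() const { return finished_; }
  int value() const { return value_; }

  void AddCallback(std::function<void(int)> cb, ShouldSchedule option,
                   QueueExecutor* executor) {
    if (finished_) {
      // Already finished: only Always forces a hop to the executor.
      Dispatch(std::move(cb), option == ShouldSchedule::Always, executor);
    } else {
      pending_.push_back({std::move(cb), option, executor});
    }
  }

  void MarkFinished(int value) {
    finished_ = true;
    value_ = value;
    std::vector<Pending> to_run;
    to_run.swap(pending_);
    for (auto& p : to_run) {
      // Added before completion: IfUnfinished and Always both hop.
      Dispatch(std::move(p.cb), p.option != ShouldSchedule::Never, p.executor);
    }
  }

 private:
  struct Pending {
    std::function<void(int)> cb;
    ShouldSchedule option;
    QueueExecutor* executor;
  };

  void Dispatch(std::function<void(int)> cb, bool spawn, QueueExecutor* executor) {
    if (spawn && executor != nullptr) {
      int v = value_;
      executor->Spawn([cb = std::move(cb), v]() { cb(v); });
    } else {
      cb(value_);  // run inline on the current thread
    }
  }

  bool finished_ = false;
  int value_ = 0;
  std::vector<Pending> pending_;
};

std::shared_ptr<ToyFuture> DoTransfer(const std::shared_ptr<ToyFuture>& source,
                                      QueueExecutor* executor, ShouldSchedule option) {
  auto transferred = std::make_shared<ToyFuture>();
  source->AddCallback([transferred](int v) { transferred->MarkFinished(v); }, option,
                      executor);
  return transferred;
}

// Transfer: hop to `executor` only if `source` has not finished yet.
std::shared_ptr<ToyFuture> Transfer(const std::shared_ptr<ToyFuture>& source,
                                    QueueExecutor* executor) {
  return DoTransfer(source, executor, ShouldSchedule::IfUnfinished);
}

// TransferAlways: hop to `executor` even if `source` is already finished.
std::shared_ptr<ToyFuture> TransferAlways(const std::shared_ptr<ToyFuture>& source,
                                          QueueExecutor* executor) {
  return DoTransfer(source, executor, ShouldSchedule::Always);
}
```

With an already-finished source, Transfer completes inline while TransferAlways leaves a task on the executor; with an unfinished source, both hop.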

@westonpace

Member Author

@pitrou Don't worry about the delay; I've been plenty busy elsewhere. I have just a few follow-up questions and then I'll make the changes.

westonpace force-pushed the feature/ARROW-12560--c-investigate-utilizing-aggressive-thread-task branch 3 times, most recently from c038210 to a4bcf0f June 4, 2021 22:41
@westonpace

Member Author

Ok, I've addressed the comments and this is ready for review again.

westonpace requested a review from pitrou June 5, 2021 02:37
pitrou changed the title from "ARROW-12560: [C++] Investigate utilizing aggressive thread task creation when adding callback to finished future" to "ARROW-12560: [C++] Add scheduling option for Future callbacks" Jun 7, 2021
pitrou force-pushed the feature/ARROW-12560--c-investigate-utilizing-aggressive-thread-task branch from be5aa79 to 747c498 June 7, 2021 13:30
pitrou commented Jun 7, 2021

Member

Thanks for the update @westonpace. I'll merge once CI passes.

pitrou closed this in e7b6c4a Jun 7, 2021
westonpace deleted the feature/ARROW-12560--c-investigate-utilizing-aggressive-thread-task branch January 6, 2022 08:17

2 participants