[fix](load_stream) Fix flush token deadlock by ensuring wait_for_flush_tasks is called before destruction#60284
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
…ait_for_flush_tasks is called before destruction apache#60284 Problem: When TabletStream is destroyed without pre_close() being called (e.g., on_idle_timeout scenario), the _flush_token destructor calls shutdown() which triggers deadlock detection if called from the pool thread. Root cause: - on_idle_timeout() directly calls brpc::StreamClose() without calling LoadStream::close() - This triggers the destruction chain without calling pre_close() on TabletStreams - If flush tasks are still running, TabletStream may be destroyed in pool thread Solution: - Add IndexStream::~IndexStream() to ensure wait_for_flush_tasks() is called on all TabletStreams - Add TabletStream::wait_for_flush_tasks() to wait for all flush tasks to complete - This ensures _flush_token is properly handled before TabletStream destruction - Revert commit c18ef17 (shared_from_this) as it is no longer needed
|
run buildall |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
There was a problem hiding this comment.
Pull request overview
This PR fixes a deadlock issue that occurs when TabletStream is destroyed without pre_close() being called (e.g., in the on_idle_timeout scenario). The deadlock happened because the flush token's destructor calls shutdown(), which performs deadlock detection and throws if called from the pool thread itself.
Changes:
- Reverts PR #60148's use of
shared_from_this()in async lambdas, replacing with rawthispointer - Adds
TabletStream::wait_for_flush_tasks()to ensure flush tasks complete before destruction - Adds
IndexStream::~IndexStream()to callwait_for_flush_tasks()on all tablet streams - Refactors
TabletStream::pre_close()to call the newwait_for_flush_tasks()method
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| be/src/runtime/load_stream.h | Removes std::enable_shared_from_this from TabletStream, adds wait_for_flush_tasks() method and _flush_tasks_done flag, adds IndexStream destructor |
| be/src/runtime/load_stream.cpp | Implements wait_for_flush_tasks() and IndexStream destructor, reverts lambda captures from shared_from_this() to raw this, refactors pre_close() |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
86e21d6 to
c36ec0e
Compare
…ait_for_flush_tasks is called before destruction apache#60284 Problem: When TabletStream is destroyed without pre_close() being called (e.g., on_idle_timeout scenario), the _flush_token destructor calls shutdown() which triggers deadlock detection if called from the pool thread. Root cause: - on_idle_timeout() directly calls brpc::StreamClose() without calling LoadStream::close() - This triggers the destruction chain without calling pre_close() on TabletStreams - If flush tasks are still running, TabletStream may be destroyed in pool thread Solution: - Add IndexStream::~IndexStream() to ensure wait_for_flush_tasks() is called on all TabletStreams - Add TabletStream::wait_for_flush_tasks() to wait for all flush tasks to complete - This ensures _flush_token is properly handled before TabletStream destruction - Revert commit c18ef17 (shared_from_this) as it is no longer needed
|
run buildall |
TPC-H: Total hot run time: 32864 ms |
TPC-H: Total hot run time: 32400 ms |
ClickBench: Total hot run time: 28.29 s |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
…h_tasks is called before destruction Problem: When TabletStream is destroyed without pre_close() being called (e.g., on_idle_timeout scenario), the _flush_token destructor calls shutdown() which triggers deadlock detection if called from the pool thread. Root cause: - on_idle_timeout() directly calls brpc::StreamClose() without calling LoadStream::close() - This triggers the destruction chain without calling pre_close() on TabletStreams - If flush tasks are still running, TabletStream may be destroyed in pool thread Solution: - Add IndexStream::~IndexStream() to ensure wait_for_flush_tasks() is called on all TabletStreams - Add TabletStream::wait_for_flush_tasks() to wait for all flush tasks to complete - This ensures _flush_token is properly handled before TabletStream destruction - Revert commit c18ef17 (shared_from_this) as it is no longer needed
c36ec0e to
cc77be5
Compare
|
run buildall |
TPC-H: Total hot run time: 33039 ms |
ClickBench: Total hot run time: 28.37 s |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
PR approved by at least one committer and no changes requested. |
…ait_for_flush_tasks is called before destruction apache#60284 (apache#60285) pick from:apache#60284
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #60148
Problem Summary:
Problem:
When TabletStream is destroyed without pre_close() being called (e.g., on_idle_timeout scenario), the _flush_token destructor calls shutdown() which triggers deadlock detection if called from the pool thread.
Root cause:
Solution:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)