Skip to content

branch-4.0: [fix](load_stream) Fix flush token deadlock by ensuring wait_for_flush_tasks is called before destruction #60284#60285

Merged
yiguolei merged 1 commit into
apache:branch-4.0from
liaoxin01:fix-load-stream-flush-token-deadlock
Jan 28, 2026
Merged

branch-4.0: [fix](load_stream) Fix flush token deadlock by ensuring wait_for_flush_tasks is called before destruction #60284#60285
yiguolei merged 1 commit into
apache:branch-4.0from
liaoxin01:fix-load-stream-flush-token-deadlock

Conversation

@liaoxin01

Copy link
Copy Markdown
Contributor

pick from:#60284

@liaoxin01 liaoxin01 requested a review from yiguolei as a code owner January 27, 2026 14:46
Copilot AI review requested due to automatic review settings January 27, 2026 14:46
@hello-stephen

Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@liaoxin01

Copy link
Copy Markdown
Contributor Author

run buildall

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a deadlock issue in load_stream by ensuring flush tasks are properly waited for before TabletStream destruction. The fix reverts the use of shared_from_this() and instead relies on explicit lifetime management through the IndexStream destructor.

Changes:

  • Removed std::enable_shared_from_this inheritance from TabletStream and all shared_from_this() usages
  • Added wait_for_flush_tasks() method to TabletStream to safely wait for all flush tasks before destruction
  • Added IndexStream destructor to ensure all TabletStreams wait for flush tasks before cleanup

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
be/src/runtime/load_stream.h Removed shared_from_this inheritance, added wait_for_flush_tasks() declaration, added IndexStream destructor, added _flush_tasks_done flag
be/src/runtime/load_stream.cpp Replaced shared_from_this with this in lambdas, implemented wait_for_flush_tasks() and IndexStream destructor, refactored pre_close() to use wait_for_flush_tasks()

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread be/src/runtime/load_stream.cpp
@liaoxin01 liaoxin01 force-pushed the fix-load-stream-flush-token-deadlock branch from 1529b65 to ed0add4 Compare January 27, 2026 15:08
@liaoxin01

Copy link
Copy Markdown
Contributor Author

run buildall

@hello-stephen

Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 77.78% (28/36) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.72% (18902/35852)
Line Coverage 35.84% (175554/489826)
Region Coverage 32.43% (135628/418238)
Branch Coverage 33.33% (58864/176591)

@liaoxin01 liaoxin01 force-pushed the fix-load-stream-flush-token-deadlock branch from ed0add4 to 3e507a7 Compare January 28, 2026 01:36
…h_tasks is called before destruction

Problem:
When TabletStream is destroyed without pre_close() being called (e.g., on_idle_timeout scenario),
the _flush_token destructor calls shutdown() which triggers deadlock detection if called from
the pool thread.

Root cause:
- on_idle_timeout() directly calls brpc::StreamClose() without calling LoadStream::close()
- This triggers the destruction chain without calling pre_close() on TabletStreams
- If flush tasks are still running, TabletStream may be destroyed in pool thread

Solution:
- Add IndexStream::~IndexStream() to ensure wait_for_flush_tasks() is called on all TabletStreams
- Add TabletStream::wait_for_flush_tasks() to wait for all flush tasks to complete
- This ensures _flush_token is properly handled before TabletStream destruction
- Revert commit c18ef17 (shared_from_this) as it is no longer needed
@liaoxin01 liaoxin01 force-pushed the fix-load-stream-flush-token-deadlock branch from 3e507a7 to 2a1ef37 Compare January 28, 2026 01:37
@liaoxin01

Copy link
Copy Markdown
Contributor Author

run buildall

@doris-robot

Copy link
Copy Markdown

BE UT Coverage Report

Increment line coverage 78.38% (29/37) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.71% (18899/35852)
Line Coverage 35.82% (175477/489826)
Region Coverage 32.48% (135823/418238)
Branch Coverage 33.33% (58860/176591)

@yiguolei

Copy link
Copy Markdown
Contributor

skip buildall

@github-actions

Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions Bot added approved Indicates a PR has been approved by one committer. reviewed labels Jan 28, 2026
@github-actions

Copy link
Copy Markdown
Contributor

PR approved by anyone and no changes requested.

@yiguolei yiguolei merged commit 947f495 into apache:branch-4.0 Jan 28, 2026
27 of 29 checks passed
ybtsdst pushed a commit to ybtsdst/doris that referenced this pull request Feb 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants