Fix race condition that causes TimerThread to hang during shutdown #2986

Merged: wwbmmm merged 3 commits into apache:master from gitccl:fix_timer_thread on Jun 16, 2025

gitccl (Contributor) commented Jun 7, 2025

What problem does this PR solve?

Problem Summary:

During shutdown, stop_and_join() sets _stop = true, sets _nearest_run_time = 0, and calls futex_wake_private() to wake up the timer thread. However, due to a race condition, the timer thread may:

  1. See _stop == false and enter the loop.
  2. Block on acquiring _mutex while stop_and_join() holds it.
  3. After stop_and_join() releases the mutex, the timer thread acquires it and sets _nearest_run_time = int64_max, overwriting the earlier _nearest_run_time = 0.
  4. Miss the wake-up signal because at the time stop_and_join() called futex_wake_private(), the timer thread had not yet entered futex wait.

As a result, after finishing all remaining tasks, the timer thread may enter futex wait with no further wake-up, causing the program to hang on pthread_join.
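
The interleaving above, and one way to close it, can be sketched in portable C++. This is a simplified illustration under stated assumptions, not the patch merged in this PR: `MiniTimer`, `kNever`, and `run_shutdown_trials` are hypothetical names, and `std::condition_variable` with a signal counter stands in for the futex wait on the real timer thread. The idea shown is to re-check the stop flag while holding the mutex, before resetting the nearest run time, so a stop request that slipped in while the thread was blocked on the mutex is not overwritten.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <limits>
#include <mutex>
#include <thread>

// Hypothetical sentinel meaning "no task scheduled".
constexpr int64_t kNever = std::numeric_limits<int64_t>::max();

// Hypothetical stand-in for a timer thread; NOT brpc's TimerThread.
class MiniTimer {
public:
    void start() { thread_ = std::thread([this] { run(); }); }

    void stop_and_join() {
        stop_.store(true, std::memory_order_relaxed);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            nearest_run_time_ = 0;  // ask the worker not to sleep
            ++nsignals_;            // changes the value a waiter compares against
        }
        cv_.notify_one();
        thread_.join();
    }

private:
    void run() {
        while (!stop_.load(std::memory_order_relaxed)) {
            {
                std::lock_guard<std::mutex> lock(mutex_);
                // Re-check under the mutex. Without this check, the reset
                // below could overwrite the 0 written by stop_and_join()
                // (step 3 of the description).
                if (stop_.load(std::memory_order_relaxed)) {
                    break;
                }
                nearest_run_time_ = kNever;
            }

            // A real timer would run due tasks here and compute the next
            // deadline; this sketch has no tasks.
            const int64_t next_run_time = kNever;

            std::unique_lock<std::mutex> lock(mutex_);
            if (next_run_time > nearest_run_time_) {
                continue;  // an earlier deadline or a stop request arrived
            }
            nearest_run_time_ = next_run_time;
            const int expected = nsignals_;
            // Sleeps until nsignals_ changes, like a futex wait on the counter.
            cv_.wait(lock, [&] { return nsignals_ != expected; });
        }
    }

    std::atomic<bool> stop_{false};
    std::mutex mutex_;
    std::condition_variable cv_;
    int64_t nearest_run_time_ = kNever;
    int nsignals_ = 0;
    std::thread thread_;
};

// Starts and stops the timer repeatedly; returns only if no round hangs.
bool run_shutdown_trials(int rounds) {
    for (int i = 0; i < rounds; ++i) {
        MiniTimer timer;
        timer.start();
        timer.stop_and_join();
    }
    return true;
}
```

Because `stop_` is written before `stop_and_join()` takes the mutex, any worker that acquires the mutex afterwards is guaranteed to observe `stop_ == true`, even with relaxed loads. Removing the re-check reintroduces the window described in steps 1 to 4, although a stress loop such as `run_shutdown_trials(200)` may need many rounds to hit it.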

What is changed and the side effects?

Changed:

Side effects:

  • Performance effects:

  • Breaking backward compatibility:


Check List:

Comment thread src/bthread/timer_thread.cpp Outdated
wwbmmm (Contributor) commented Jun 14, 2025

LGTM

chenBright (Contributor) left a comment

LGTM

wwbmmm merged commit d4e46bd into apache:master on Jun 16, 2025
15 checks passed
gitccl deleted the fix_timer_thread branch on June 16, 2025 02:49