
chore: fix race condition on git checkout #701

Merged
Skn0tt merged 18 commits into microsoft:main from Skn0tt:race-config on Oct 29, 2025

Conversation

@Skn0tt
Member

@Skn0tt Skn0tt commented Oct 16, 2025

Closes microsoft/playwright#37764. Actions like git checkout update several config files in such quick succession that they expose a race condition in our current code: we trigger one _rebuildModels call per changed config file, these calls run in parallel, and each of them adds the same configs again.

To fix this, I'm coalescing rebuilds triggered by rapid file changes. This still leaves room for a config file change to go unnoticed, but I think that risk is mostly theoretical.
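A minimal sketch of the coalescing idea, assuming the rebuild can be modelled as a single async function. RebuildCoalescer and its rebuild callback are hypothetical names, not the PR's code. Calls that arrive while a rebuild is running share its promise instead of starting a parallel one. The sketch also shows the gap mentioned above: a change that lands after the running rebuild has already read the config files joins that rebuild and goes unnoticed.

```typescript
// Hypothetical helper (not the PR's code): callers that arrive while a
// rebuild is in flight share that rebuild's promise instead of starting
// a parallel one.
class RebuildCoalescer {
  private _inFlight?: Promise<void>;

  constructor(private readonly _rebuild: () => Promise<void>) {}

  rebuild(): Promise<void> {
    // Changes arriving while a rebuild runs join it rather than racing it.
    if (this._inFlight)
      return this._inFlight;
    this._inFlight = this._rebuild().finally(() => {
      // Clear the slot once the rebuild settles; the next change starts a fresh one.
      this._inFlight = undefined;
    });
    return this._inFlight;
  }
}
```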

I also considered some other options:

  1. Add to the general command queue. This means config updates can be delayed until all current actions are done, which can take too long.
  2. Add a second command queue just for _rebuildModels. I didn't like this because, in the git checkout case, we'd run one rebuild per changed config file sequentially and call models.clear() for each, so the rebuild would take longer than necessary.
  3. Debounce _rebuildModels. This is technically more correct than the chosen approach, but it means either added delays or duplicate builds, and it definitely adds complexity.
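For comparison, option 2 could look roughly like this. SerialQueue is a hypothetical name, not code from the PR. It removes the overlap, but three config files changed by one checkout still produce three full rebuilds back to back, each of which would start with models.clear().

```typescript
// Hypothetical sketch of option 2: a dedicated queue that runs one rebuild
// at a time, in the order requested.
class SerialQueue {
  // Settles once every task enqueued so far has settled.
  private _tail: Promise<unknown> = Promise.resolve();

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    // Run `task` after the previous one, whether it fulfilled or rejected.
    const result = this._tail.then(task, task);
    // Swallow errors on the chain so one failed rebuild doesn't block later ones;
    // the caller still sees the rejection through `result`.
    this._tail = result.catch(() => undefined);
    return result;
  }
}
```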

@Skn0tt Skn0tt requested a review from dgozman October 16, 2025 13:37
@Skn0tt Skn0tt self-assigned this Oct 16, 2025
src/extension.ts Outdated
this._models.clear();
this._testTree.startedLoading();
private async _rebuildModels(userGesture: boolean): Promise<void> {
// Coalesce rapid rebuilds triggered by file changes,
Contributor

I think this is error-prone: a user-gesture rebuild will run concurrently with a non-user-gesture one and again populate the models multiple times. Missing a rebuild after the last config change would also be unfortunate.

How about we introduce _needsRebuild?: { userGesture: boolean } and _isRebuilding: boolean flags, and rebuild again at the end of _rebuildModels when _needsRebuild is set?
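A minimal sketch of this flag-based proposal, reusing the reviewer's _needsRebuild and _isRebuilding names; RebuildScheduler and _doRebuild are hypothetical. Requests that arrive during a rebuild are folded into a single follow-up run, and userGesture is OR-ed so a user-triggered request is not downgraded. Note that a folded-in caller's promise resolves right away, and a user-gesture request has to wait behind the running rebuild.

```typescript
// Sketch of the reviewer's proposal: never run two rebuilds at once, and
// rerun once at the end if more requests arrived meanwhile.
class RebuildScheduler {
  private _isRebuilding = false;
  private _needsRebuild?: { userGesture: boolean };

  // `_doRebuild` stands in for the real rebuild (hypothetical).
  constructor(private readonly _doRebuild: (userGesture: boolean) => Promise<void>) {}

  async rebuildModels(userGesture: boolean): Promise<void> {
    if (this._isRebuilding) {
      // Record the request; keep userGesture if any pending request had it.
      this._needsRebuild = { userGesture: userGesture || (this._needsRebuild?.userGesture ?? false) };
      return;
    }
    this._isRebuilding = true;
    try {
      let next: { userGesture: boolean } | undefined = { userGesture };
      while (next) {
        this._needsRebuild = undefined;
        await this._doRebuild(next.userGesture);
        // Pick up whatever was requested while we were rebuilding.
        next = this._needsRebuild;
      }
    } finally {
      this._isRebuilding = false;
    }
  }
}
```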

Member Author

The user-gesture rebuild is invoked via the "reload" button in the test UI, and today it behaves like a last-resort "restart" that you can invoke when the test runner is in some buggy state. I have to use it pretty often, so personally I'd like this to continue working.

This means we can't enqueue a user-gesture rebuild. Having a little raciness here is a fine tradeoff, I think.

I'll think about how we can ensure we don't miss a rebuild, though. Sounds like debouncing is the way to go.

Member Author

I implemented something debouncing-style.

@Skn0tt Skn0tt requested a review from dgozman October 17, 2025 09:59
src/batched.ts Outdated

this._batch = new Promise<T>(async (resolve, reject) => {
try {
await new Promise(res => setTimeout(res, this._delay));
Contributor

Can we avoid waiting for the first invoke()? Why do we need the delay at all?

Member Author

Without a delay, two file changes in rapid succession trigger two reloads where one would have sufficed. That's not a big deal for performance, but each reload also clears the current models, and I worry this could make the UI flash, even though I've never seen that happen in practice.
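A sketch of delay-based batching as discussed in this thread, assuming a hypothetical DelayedBatch helper (not the PR's Batched class, whose full source isn't shown here). Every call made within delayMs of the first one joins the same run, and a run never starts before the previous one has settled, so rebuilds can't overlap and a change made during a run still triggers a follow-up. The cost the reviewer asks about is visible too: even a single, isolated change waits delayMs before anything happens.

```typescript
// Hypothetical helper: calls made within `delayMs` of the first call share
// one execution of `fn`; executions never overlap.
class DelayedBatch<T> {
  // Batch that is still accepting callers (waiting for its delay or its turn).
  private _open?: Promise<T>;
  // Most recently scheduled batch; the next batch waits for it to settle.
  private _last: Promise<unknown> = Promise.resolve();

  constructor(private readonly _fn: () => Promise<T>, private readonly _delayMs: number) {}

  invoke(): Promise<T> {
    if (this._open)
      return this._open;
    const previous = this._last;
    const batch = (async () => {
      // Give rapid follow-up changes a chance to join this batch.
      await new Promise(resolve => setTimeout(resolve, this._delayMs));
      // Don't overlap with an earlier run that is still going.
      await previous.catch(() => undefined);
      // Close the batch: calls from now on start a new one, so a change made
      // while `fn` runs is not lost.
      this._open = undefined;
      return this._fn();
    })();
    this._open = batch;
    this._last = batch;
    return batch;
  }
}
```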

@Skn0tt Skn0tt requested a review from dgozman October 20, 2025 13:51
@Skn0tt Skn0tt requested a review from dgozman October 24, 2025 10:04
@Skn0tt
Member Author

Skn0tt commented Oct 28, 2025

Mirroring an offline discussion: let's get rid of the Batched abstraction and put a small three-liner into extension.ts.

@Skn0tt
Member Author

Skn0tt commented Oct 29, 2025

I've implemented the above. One cancellation token wasn't enough, because I want to build only once when multiple file changes arrive within a 10ms window, so there are two now.

src/extension.ts Outdated
await this._modelRebuild?.result;

const cancel = new this._vscode.CancellationTokenSource();
const rebuild = { result: this._innerRebuildModels(userGesture, cancel.token), token: cancel, needsAnother: false };
Contributor

  • If we call rebuildModelsImmediately twice in a row, both calls will start the rebuild process concurrently.
  • I think we can make this method and the previous one synchronous; that would rule out race conditions entirely. Perhaps the two methods could even be merged.

Member Author

The bug should be fixed now.
I made the previous method synchronous, but I can't do that for this one: at L123, refreshHandler uses the returned Promise as the signal for showing the spinner in the UI, so this method has to keep returning one.
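A minimal sketch of one way to keep the returned Promise for the spinner while ruling out the concurrent start: record the new rebuild synchronously, before any await, chain it on the previous one, and cancel the previous one. TokenSource is a hypothetical stand-in for vscode.CancellationTokenSource so the sketch runs outside VS Code, and ModelRebuilder and _inner are hypothetical names. Like the excerpt above, it awaits the previous rebuild, so a stalled previous rebuild also delays the new one.

```typescript
// Hypothetical stand-in for vscode.CancellationTokenSource (only what's needed here).
class TokenSource {
  readonly token = { isCancellationRequested: false };
  cancel(): void {
    this.token.isCancellationRequested = true;
  }
}
type Token = TokenSource['token'];

class ModelRebuilder {
  private _current?: { result: Promise<void>; source: TokenSource };

  // `_inner` stands in for the real rebuild; it has to check the token itself.
  constructor(private readonly _inner: (userGesture: boolean, token: Token) => Promise<void>) {}

  rebuildImmediately(userGesture: boolean): Promise<void> {
    const previous = this._current;
    // Ask the running (or still waiting) rebuild to stop early.
    previous?.source.cancel();
    const source = new TokenSource();
    const result = (async () => {
      // Wait for the previous rebuild, ignoring its errors, so two rebuilds never run at once.
      if (previous)
        await previous.result.catch(() => undefined);
      // A newer call may have superseded this one while it waited; skip the work then.
      if (!source.token.isCancellationRequested)
        await this._inner(userGesture, source.token);
    })();
    // Recorded synchronously, so a back-to-back second call already sees this one.
    this._current = { result, source };
    return result;
  }
}
```

Here the returned Promise resolves once this call's rebuild has finished or been skipped, which is what a spinner needs.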

@Skn0tt Skn0tt requested a review from dgozman October 29, 2025 15:00
@Skn0tt Skn0tt merged commit 1ae356d into microsoft:main Oct 29, 2025
7 checks passed
@Skn0tt Skn0tt requested a review from pavelfeldman November 18, 2025 11:59
void this._rebuildModelsImmediately(false);
}

private async _rebuildModelsImmediately(userGesture: boolean) {
Member

The async return value of this method doesn't make sense, because it spawns another build when needsAnother is set. The method is stuck between "do the work" (the inner model rebuild) and "schedule the work" (the current _rebuildModels). Inline it into _rebuildModels (scheduleRebuildModels in the new naming).

private _watchFilesBatch?: vscodeTypes.TestItem[];
private _watchItemsBatch?: vscodeTypes.TestItem[];

private _modelRebuild?: { result: Promise<void>; token: vscodeTypes.CancellationTokenSource; needsAnother: boolean; };
Member

Do you have a good sense of how much this saves? I'm worried that once a rebuild has started, all the commands that produce work have already been issued and all the processes have already started. You'll think the rebuild was canceled, but in reality you'll keep piling up more async background work.

@Skn0tt
Member Author

Skn0tt commented Nov 19, 2025

@pavelfeldman Thanks for the review, I addressed most of it in #715. I need _rebuildModelsImmediately to be asynchronous because of how it's used in this._testController.refreshHandler (L123).

The reload button wired to refreshHandler works as an escape hatch. If anything goes wrong, hitting it reloads almost everything and gets you back into a working state. This PR puts that behaviour at risk, because it introduces a potentially stalled queue. I need _rebuildModelsImmediately to run immediately, so we don't risk waiting on a stalled queue, and I need it to be async because that's how VS Code knows when the reload is done.

More and more async background work can pile up, but only if the user keeps hitting that button. The cancellation token is a best-effort guard against race conditions; it isn't meant to prevent background work.
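The disagreement above comes down to what a cancellation token can actually stop. A minimal sketch, assuming a rebuild made of sequential steps and a token shaped like vscode.CancellationToken (Token, rebuildInSteps and loadConfig are hypothetical names): checking the token between steps stops new work from being issued, but a step that has already started, such as a spawned process, runs to completion regardless.

```typescript
// Hypothetical token shape, mirroring vscode.CancellationToken's flag.
interface Token {
  readonly isCancellationRequested: boolean;
}

// Loads configs one at a time; cancellation is only observed between steps.
async function rebuildInSteps(
  configs: string[],
  loadConfig: (config: string) => Promise<void>,
  token: Token,
): Promise<string[]> {
  const loaded: string[] = [];
  for (const config of configs) {
    // Stop issuing new work once cancellation is requested...
    if (token.isCancellationRequested)
      break;
    // ...but a step that has already started is awaited to completion.
    await loadConfig(config);
    loaded.push(config);
  }
  return loaded;
}
```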


Development

Successfully merging this pull request may close these issues.

[Bug]: VS Code Extension shows configs multiple times

3 participants