chore: fix race condition on git checkout#701
Conversation
src/extension.ts
Outdated
| this._models.clear(); | ||
| this._testTree.startedLoading(); | ||
| private async _rebuildModels(userGesture: boolean): Promise<void> { | ||
| // Coalesce rapid rebuilds triggered by file changes, |
There was a problem hiding this comment.
I think this is error-prone, because user-gesture rebuild will run concurrently to non-user-gesture one, and again populate models multiple times. Similarly, missing a rebuild after the last config has changed would be unfortunate.
How about we introduce _needsRebuild?: { userGesture: boolean } and _isRebuliding: boolean flags, and rebulid again at the end of _rebuildModels when _needsRebuild is set?
There was a problem hiding this comment.
The user-gesture rebuild is invoked via the "reload" button in the test UI, and today it behaves like a last-resort "restart" that you can invoke when the test runner is in some buggy state. I have to use it pretty often, so personally I'd like this to continue working.
This means we can't enqueue a user-gesture rebuild. Having a little raciness here is a fine tradeoff, I think.
I'll think about how we can ensure we don't miss a rebuild, though. Sounds like debouncing is the way to go.
There was a problem hiding this comment.
I implemented something debouncing-style.
src/batched.ts
Outdated
|
|
||
| this._batch = new Promise<T>(async (resolve, reject) => { | ||
| try { | ||
| await new Promise(res => setTimeout(res, this._delay)); |
There was a problem hiding this comment.
Can we avoid waiting for the first invoke()? Why do we need the delay at all?
There was a problem hiding this comment.
Without a delay, a rapid succession of two file changes will trigger two reloads where one would've sufficed. Not a big deal from a performance standpoint, but reload also means clearing the current models. I worry this leads to flashing UI, even though i've never seen that happen in practice.
|
mirroring offline discussion: let's get rid of the |
|
I've implemented the above. One cancellation token wasn't enough because I really want to build only once if there's multiple file changes within a 10ms window, so there's two now. |
src/extension.ts
Outdated
| await this._modelRebuild?.result; | ||
|
|
||
| const cancel = new this._vscode.CancellationTokenSource(); | ||
| const rebuild = { result: this._innerRebuildModels(userGesture, cancel.token), token: cancel, needsAnother: false }; |
There was a problem hiding this comment.
- If we call
rebuildModelsImmediatelytwice in a row, both will start the rebuild process concurrently. - I think we can make this and previous method sync - this will ensure there are no race conditions ever. Perhaps the two methods will even merge...
There was a problem hiding this comment.
The bug should be fixed now.
I made the previous method sync, but can't for this one. In L123, refreshHandler uses the returned Promise as a signal for showing the spinner in the UI, so I need the Promise to continue working.
| void this._rebuildModelsImmediately(false); | ||
| } | ||
|
|
||
| private async _rebuildModelsImmediately(userGesture: boolean) { |
There was a problem hiding this comment.
The async return value of this method does not make sense because it spawns off another build upon needsAnother. It is stuck between 'do the work' (inner rebuild model) and 'schedule the work' (present _rebuildModels). Inline it in _rebuildModels (scheduleRebuildModels in the new naming).
| private _watchFilesBatch?: vscodeTypes.TestItem[]; | ||
| private _watchItemsBatch?: vscodeTypes.TestItem[]; | ||
|
|
||
| private _modelRebuild?: { result: Promise<void>; token: vscodeTypes.CancellationTokenSource; needsAnother: boolean; }; |
There was a problem hiding this comment.
Do you have a good idea on how much you are saving? I'm worried that if your rebuild has already started, all the commands that produce work are already issued and all the processes have started. You'll think that it canceled, but in reality you'll issue more and more async background work.
|
@pavelfeldman thanks for the review, I addressed most of it in #715. I need
This button works as an escape hatch. If anything goes wrong, hitting that button reloads almost everything and gets you back into a working state. This PR is a risk to that behaviour, because we're introducing a potentially stalled queue. I need There's possibility for scheduling more and more async background work, but only if the user keeps on hitting that button. The cancellation token is a best-effort to prevent race conditions, but it's not meant for preventing background work. |

Closes microsoft/playwright#37764. Actions like
git checkouttrigger config file updates so close to one another that it leads to a race condition in our current code. We trigger one_rebuildModelscall per changed config file, and then they run in parallel and add the same configs multiple times.To fix this, I'm coalescing rebuilds from rapid file changes. While this still leaves potential for a config file change going unnoticed, I think that's more theoretical.
I also considered some other options:
git checkoutcase, we'd sequentially be running one rebuild per changed config file, and we'd callmodels.clear()for each - so the rebuild would take longer than necessary._rebuildModels. Is technically more correct than the chosen approach, but either means added delays, or duplicate builds. Definitely means complexity.