You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This PR changes the optimistically aggressive 10ms yield time for HalfJoin to a more conservative 1M records, which ensures that at least some amount of work gets done, at the expense of potentially stalling for longer. This has the potential to mitigate some "spinning" behavior that folks are seeing with this operator.
It is also possible that there is just a bug in the HalfJoin implementation, rather than the yielding policy. At the moment we don't have a reproduction of this issue, so it is hard to know. This PR is meant to be something they can use to test.
This reportedly fixes the issue for at least one instance of 100% spinning with no progress.
Some more diagnosis: the removed 10ms yielding policy has the defect that it (due to the half join implementation) will yield after an initial consolidate of the work to do, and if that work takes 10ms or more will result in no work being done. That consolidation .. shouldn't need to be expensive once things are sorted, but you still need to go and do a whole bunch of memory dereferences to confirm everything.
The issue could also be ameliorated in the half_join implementation by stashing consolidated data in a representation that makes it clear that it has been consolidated (e.g. an enum), which had no value when we wouldn't revisit that work, but now makes sense if we will repeatedly yield and revisit the work (consider 1B inbound records; reviewing them each time we pluck off 1M work items means a massive amount of redundant work).
Merging with permission from @philip-stoev. Follow-up work on introducing tests to the nightly stress tests to look for this sort of issue, and others like it.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR changes the optimistically aggressive 10ms yield time for
HalfJointo a more conservative 1M records, which ensures that at least some amount of work gets done, at the expense of potentially stalling for longer. This has the potential to mitigate some "spinning" behavior that folks are seeing with this operator.It is also possible that there is just a bug in the
HalfJoinimplementation, rather than the yielding policy. At the moment we don't have a reproduction of this issue, so it is hard to know. This PR is meant to be something they can use to test.fixes MaterializeInc/database-issues#2699