[HLK] Modify Wave match test logic to support modifications in different lanes and vector position #7991

Merged
joaosaffran merged 20 commits into microsoft:main from joaosaffran:hlk/improve-wave-match-test-coverage on Jan 14, 2026

@joaosaffran
Collaborator

This patch modifies the Wave Match test to cover modifications in different lanes and at different vector
indices. This is achieved by forcing lanes 0, WAVE_SIZE/2 and WAVE_SIZE - 1 to modify
the vector at indices 0, WAVE_SIZE/2 and WAVE_SIZE - 1, respectively.
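
As a rough illustration of what the test has to expect, here is a minimal C++ sketch of the mask a lane should get back from WaveMatch when a few lanes hold a value that no other lane shares. `ExpectedMatchMask` and its parameters are hypothetical names, not code from this PR, and the sketch assumes a wave of at most 128 lanes in which every changed lane ends up with a value unique within the wave.

```cpp
#include <bitset>
#include <vector>

// Hypothetical helper, not taken from the PR.
// Assumes WaveSize <= 128 and that every lane listed in ChangedLanes holds a
// value that no other lane in the wave shares.
std::bitset<128> ExpectedMatchMask(unsigned Lane, unsigned WaveSize,
                                   const std::vector<unsigned> &ChangedLanes) {
  // Start with one bit per lane in the wave.
  std::bitset<128> Unchanged;
  for (unsigned I = 0; I < WaveSize; ++I)
    Unchanged.set(I);

  // Changed lanes never match the unchanged ones, so clear their bits.
  bool LaneIsChanged = false;
  for (unsigned Changed : ChangedLanes) {
    Unchanged.reset(Changed);
    LaneIsChanged = LaneIsChanged || (Changed == Lane);
  }

  // A changed lane only matches itself.
  if (LaneIsChanged) {
    std::bitset<128> Self;
    Self.set(Lane);
    return Self;
  }
  return Unchanged;
}
```

For an 8-lane wave with changed lanes 0, 4 and 7, lane 4 would expect only bit 4, while lane 1 would expect bits 1, 2, 3, 5 and 6.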

Comment thread tools/clang/unittests/HLSLExec/LongVectors.cpp Outdated
Comment thread tools/clang/unittests/HLSLExec/LongVectors.cpp Outdated
Comment thread tools/clang/unittests/HLSLExec/LongVectors.cpp Outdated
Comment thread tools/clang/unittests/HLSLExec/LongVectors.cpp Outdated
@joaosaffran joaosaffran requested a review from damyanp December 9, 2025 19:57
Contributor

@alsepkow alsepkow left a comment

Made some comments. It looks like we need a few fixes on this iteration still.

Comment thread tools/clang/unittests/HLSLExec/LongVectors.cpp Outdated
const UINT HighWaves = NumWaves - LowWaves;
LowWaveMask = (LowWaves < 64) ? (1ULL << LowWaves) - 1 : ~0ULL;
HighWaveMask = (HighWaves < 64) ? (1ULL << HighWaves) - 1 : ~0ULL;
LowBits &= LowWaveMask;
Contributor

The assignment to LowBits and HighBits seems redundant? They're initialized to zero so the result of these operations will still always just be 0?

Comment thread tools/clang/unittests/HLSLExec/LongVectors.cpp Outdated
Comment thread tools/clang/unittests/HLSLExec/LongVectors.cpp Outdated
Comment thread tools/clang/unittests/HLSLExec/LongVectors.cpp Outdated
Comment thread tools/clang/unittests/HLSLExec/LongVectors.cpp Outdated
Comment thread tools/clang/unittests/HLSLExec/LongVectors.cpp Outdated
Comment thread tools/clang/unittests/HLSLExec/LongVectors.cpp Outdated
@github-project-automation github-project-automation Bot moved this from New to In progress in HLSL Roadmap Dec 12, 2025
@joaosaffran joaosaffran requested a review from alsepkow January 5, 2026 23:35
Comment thread tools/clang/unittests/HLSLExec/LongVectors.cpp Outdated
@joaosaffran joaosaffran force-pushed the hlk/improve-wave-match-test-coverage branch from d884ec7 to f0d2cfd Compare January 7, 2026 20:04
@joaosaffran joaosaffran force-pushed the hlk/improve-wave-match-test-coverage branch from 79cc228 to 2138c2a Compare January 7, 2026 20:05
@joaosaffran joaosaffran requested a review from alsepkow January 7, 2026 20:40
Comment thread tools/clang/unittests/HLSLExec/LongVectors.cpp Outdated
Comment thread tools/clang/unittests/HLSLExec/LongVectors.cpp Outdated
Comment thread tools/clang/unittests/HLSLExec/LongVectors.cpp Outdated
Comment thread tools/clang/unittests/HLSLExec/LongVectors.cpp Outdated
return Word;
}

void StoreWords(UINT *Dest, std::bitset<128> LanesState) {
Contributor

Update LanesState to be a const reference to avoid the copy every time this is called.

Comment thread tools/clang/unittests/HLSLExec/LongVectors.cpp

const uint64_t LowExpected = ~1ULL & LowWaveMask;
const uint64_t HighExpected = ~0ULL & HighWaveMask;
const uint64_t LowActiveLanes = (LanesState & LowWaveMask).to_ullong();
Contributor

I think LanesState is intended to represent whether we expect the values on 'this' lane and other lanes to match? So I don't think it actually represents a state. In other words, 'ChangedLanes' and 'UnchangedLanes' are intended to represent the expected values on a lane for the comparison of each vector element across lanes (the definition of the WaveMatch intrinsic).

You could convey this better by renaming 'UnchangedLanes' to 'DefaultExpectedValue' and 'ChangedLanes' to 'ExpectedValue'.

const UINT MidLaneID = WaveSize / 2;
const UINT LastLaneID = std::min(WaveSize - 1, VectorSize - 1);

std::bitset<128> UnchangedLanes;
Contributor

@alsepkow alsepkow Jan 9, 2026

nit: I suggest adding a comment to further clarify the reason for a bitset.
// Use a std::bitset<128> to represent the uint4 returned by WaveMatch, as it's convenient this way in C++ #Resolved
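
To make the bitset-to-uint4 mapping concrete, the following is a minimal sketch of splitting a `std::bitset<128>` into four 32-bit words, lowest lanes first. `ToUint4` is a hypothetical name, and `uint32_t` stands in for the `UINT` type used in the test; neither is taken from the PR.

```cpp
#include <array>
#include <bitset>
#include <cstdint>

// Hypothetical helper: converts a 128-bit lane mask into the four 32-bit
// components of a uint4, with component 0 holding lanes 0..31.
std::array<uint32_t, 4> ToUint4(const std::bitset<128> &Mask) {
  // to_ullong() throws if any bit above bit 63 is set, so isolate each
  // 64-bit half before converting.
  const std::bitset<128> Low64(~0ULL);
  const uint64_t Lo = (Mask & Low64).to_ullong();
  const uint64_t Hi = ((Mask >> 64) & Low64).to_ullong();

  // The upper word of each half is obtained with a right shift.
  return {static_cast<uint32_t>(Lo), static_cast<uint32_t>(Lo >> 32),
          static_cast<uint32_t>(Hi), static_cast<uint32_t>(Hi >> 32)};
}
```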

@joaosaffran joaosaffran requested a review from alsepkow January 9, 2026 20:00
}

void WriteExpectedValueForLane(UINT *Dest, const UINT Lane,
const std::bitset<128> &LanesState) {
Contributor

Suggested change
- const std::bitset<128> &LanesState) {
+ const std::bitset<128> &ExpectedValue) {

HighWaveMask = ComputeWaveMask(HighWaves);
}

void WriteExpectedValueForLane(UINT *Dest, const UINT Lane,
Contributor

Suggested change
- void WriteExpectedValueForLane(UINT *Dest, const UINT Lane,
+ void WriteExpectedValueForLane(UINT *Dest, const UINT LaneID,

// WaveMatch as well.
// For this test, the shader arranges it so that lanes 0, WAVE_SIZE/2 and
// WAVE_SIZE-1 are different from all the other lanes, also those
// lanes modify the vector at positions 0, WAVE_SIZE/2 and WAVE_SIZE-1.
Contributor

'WAVE_SIZE-1' isn't accurate if the wave size is larger than the vector size.


struct WaveMatchExpectedResultWritter {
private:
UINT LowWaves;
Contributor

LowWaves and HighWaves don't need to be members. You only use them to compute the wave mask when constructing the WaveMatchExpectedResultWriter.

const uint64_t HighActiveLanes =
((LanesState >> 64) & HighWaveMask).to_ullong();

const UINT LaneIndex = 4 * Lane;
Contributor

nit: Naming, this isn't a 'LaneIndex'.
Just change it to something simple like 'I'


const UINT LaneIndex = 4 * Lane;
Dest[LaneIndex + 0] = static_cast<UINT>(LowActiveLanes);
Dest[LaneIndex + 1] = static_cast<UINT>(LowActiveLanes << 32);
Contributor

Did you test this with values in the 'upper' half of the low/high lanes?
I think this and the subsequent shift should be a right shift, not a left shift?

Collaborator Author

The 32-bit shifts are correct, I just investigated it. I did fix another bug that I found while investigating this, though.

Contributor

You still swapped them to RHS though?

Collaborator Author

You are right, I changed those to right shifts. Sorry for the confusion.
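
The difference between the two shift directions can be checked in isolation. The sketch below uses hypothetical helper names (`HighWord`, `LeftShiftThenTruncate`) and `uint32_t` in place of `UINT`; it is only meant to show why the left shift could never produce the upper word.

```cpp
#include <cstdint>

// Upper 32 bits of a 64-bit value: shift right, then truncate.
constexpr uint32_t HighWord(uint64_t Value) {
  return static_cast<uint32_t>(Value >> 32);
}

// The pattern questioned in the review: after a left shift by 32 the low
// 32 bits are all zero, so the truncated result is 0 for every input.
constexpr uint32_t LeftShiftThenTruncate(uint64_t Value) {
  return static_cast<uint32_t>(Value << 32);
}

static_assert(HighWord(0x0000000500000003ULL) == 5, "right shift keeps the upper word");
static_assert(LeftShiftThenTruncate(0x0000000500000003ULL) == 0, "left shift loses it");
```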

@joaosaffran joaosaffran requested a review from alsepkow January 9, 2026 23:50
WaveMatchExpectedResultWritter(UINT WaveSize) {
const UINT LowWaves = std::min(64U, WaveSize);
const UINT HighWaves = WaveSize - LowWaves;
LowWaveMask = ComputeWaveMask(LowWaves);
Contributor

Given that we only ever call ComputeWaveMask with a value of 64 or less we have no need for the helper anymore.

Collaborator Author

I removed the wrapper and made that a one-liner. However, I still need to check that the shift amount is below 64, because there is a bug when shifting by 64 bits that causes the number to be zeroed instead of being filled with ones. Damyan and I investigated and found this issue together.
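
For context on why the guard is needed: in C++, shifting a 64-bit integer by 64 or more is undefined behavior, so `(1ULL << 64) - 1` cannot be relied on to produce all ones. A minimal sketch of the guarded computation, using the hypothetical name `LowBitsMask`:

```cpp
#include <cstdint>

// Mask with the lowest Count bits set, for Count in [0, 64].
// The Count < 64 check avoids the undefined shift by the full bit width.
constexpr uint64_t LowBitsMask(unsigned Count) {
  return (Count < 64) ? (1ULL << Count) - 1 : ~0ULL;
}

static_assert(LowBitsMask(0) == 0, "no bits set");
static_assert(LowBitsMask(4) == 0xF, "four low bits set");
static_assert(LowBitsMask(64) == ~0ULL, "all 64 bits set");
```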

(LowWaves < 64) ? (1ULL << LowWaves) - 1 : ~0ULL;
void WriteExpectedValueForLane(UINT *Dest, const UINT LaneID,
const std::bitset<128> &ExpectedValue) {
const uint64_t LowActiveLanes = (ExpectedValue & LowWaveMask).to_ullong();
Contributor

nit: naming. While there is a relationship between the active lanes and the results, these values don't explicitly represent active/inactive lanes.

I would just call these low/high (or lo/hi as I've seen that in multiple places referencing similar things).

// all the other lanes. Besides that all other lines write their result of
// WaveMatch as well.
static constexpr std::bitset<128> ComputeWaveMask(UINT NumWaves) {
return (NumWaves < 64) ? (1ULL << NumWaves) - 1 : ~0UL;
Contributor

Just a note that ~0UL should have been ~0ULL, but we're getting rid of that anyways.

@joaosaffran joaosaffran requested a review from alsepkow January 12, 2026 20:32
Contributor

@alsepkow alsepkow left a comment

LGTM

Member

@damyanp damyanp left a comment

LGTM, some suggestions and a question.

Comment thread tools/clang/unittests/HLSLExec/LongVectors.cpp
Comment thread tools/clang/unittests/HLSLExec/LongVectors.cpp Outdated
Comment thread tools/clang/unittests/HLSLExec/LongVectors.cpp
Comment on lines +4426 to +4427
// Making sure all lanes finish updating their vectors.
AllMemoryBarrierWithGroupSync();
Member

Out of interest, did you find that this was needed for the test to work?

I was under the impression that this test arranged for only a single wave to run at a time, so I wouldn't think that this would have any effect.

Collaborator Author

Agree that the small size, and the test complexity, do make that likely unnecessary.

I just didn't want to rely on chance, I added that as a precaution to make sure my test results are always consistent.

Member

If this test is designed to run a single wave then things will go wrong if more than one wave ends up running. Sprinkling memory barriers in because they might be needed can hide other bugs as well as make it harder to understand what the code is doing.

Seeing this, I spent a good few minutes trying to read through the code to figure out where more than one wave might be dispatched.

Member

@damyanp damyanp left a comment

LGTM, one editorial suggestion.


static void WriteExpectedValueForLane(UINT *Dest, const UINT LaneID,
const std::bitset<128> &ExpectedValue) {
// We need the mask to always be 32 bits, this calculation assurers that.
Member

Fix typo:

Suggested change
- // We need the mask to always be 32 bits, this calculation assurers that.
+ // We need the mask to always be 32 bits, this calculation assures that.

My preference would be to remove this comment; it's pretty clear what the code is doing without it, and IMO the comment makes it harder to read.

Suggested change
- // We need the mask to always be 32 bits, this calculation assurers that.

@joaosaffran joaosaffran requested a review from alsepkow January 13, 2026 21:55
@joaosaffran joaosaffran requested a review from damyanp January 13, 2026 21:55
@joaosaffran joaosaffran enabled auto-merge (squash) January 13, 2026 22:13
Contributor

@alsepkow alsepkow left a comment

LGTM

@joaosaffran joaosaffran merged commit f255809 into microsoft:main Jan 14, 2026
12 checks passed
@github-project-automation github-project-automation Bot moved this from In progress to Done in HLSL Roadmap Jan 14, 2026