This repository was archived by the owner on Jun 26, 2020. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 199
Improve extractlane and replacelane on x86 #943
Merged
Merged
Changes from all commits
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
4e33dc2
Log compiled and legalized functions
abrown 80af95d
Enable SSSE3 setting when detected on CPU
abrown 9c24a35
Use raw_bitcast when legalizing splat
abrown 5216c2f
Translate the sign-extended and zero-extended versions of extract_lane
abrown b5e6a6c
Avoid extra register movement when lowering the x86 extractlane of a …
abrown 2a03f24
Avoid extra register movement when lowering the x86 scalar_to_vector …
abrown ef7bf77
Avoid extra register movement when lowering an x86 insertlane to a fl…
abrown 8255ef0
Fix documentation
abrown File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I do:
Then my understanding is that, because of the previous commit and this change, the high bits of v3 will be non-zeroes, in contrary to what the comment of scalar_to_vector suggests. Am I understanding correctly? If so, I think this code should be reworked, and it'd be nice to add a test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good point; I think I added this because when legalizing splat I needed to get ints/booleans into an XMM register but I realized that floats already were in the right place. Perhaps the scalar_to_vector definition should change to not specify that the higher bits are zeroed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good question: it would introduce more nondeterminism, which wouldn't be confusing if it's properly documented.
One question is, if we decided to do this, what would be the difference between insertlane 0 and scalar_to_vector? I think they'd do the same thing, conceptually, so we could:
For the sake of having a minimal IR, and if having a different instruction doesn't bring any new optimization opportunity (I can't think of any brought by scalar_to_vector with zeroing semantics), I think removing scalar_to_vector would be fine. @sunfishcode, any opinions here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree that scalar_to_vector zeroing the high lanes has overlap with insertlane.
What if make
scalar_to_vectorleave nondeterministic values in the high lanes, but document it as a low-level legalization instruction, and that frontends should generally preferinsertlane 0? Nondeterminism is worth avoiding when we can, but there are a variety of situations where it's useful to be able to get a scalar value into a vector register, where one knows that subsequent operations won't care about the high lanes of the vector, and the extra zeroing would be needless overhead.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I changed the documentation of
scalar_to_vectorto look like:See b2e9e09#diff-7c1b843a5d2e8c61f75e0d350b3f3914R2774-R2778.