x86.man/manx86/cmpps.x86 at master · edd255/x86.man · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
'\" t
.nh
.TH "X86-CMPPS" "7" "May 2019" "TTMO" "Intel x86-64 ISA Manual"
.SH NAME
CMPPS - COMPARE PACKED SINGLE PRECISION FLOATING-POINT VALUES
.TS
allbox;
l l l l l
l l l l l .
\fBOpcode/Instruction\fP	\fBOp / En\fP	\fB64/32 bit Mode Support\fP	\fBCPUID Feature Flag\fP	\fBDescription\fP
T{
NP 0F C2 /r ib CMPPS xmm1, xmm2/m128, imm8
T}	A	V/V	SSE	T{
Compare packed single precision floating-point values in xmm2/m128 and xmm1 using bits 2:0 of imm8 as a comparison predicate.
T}
T{
VEX.128.0F.WIG C2 /r ib VCMPPS xmm1, xmm2, xmm3/m128, imm8
T}	B	V/V	AVX	T{
Compare packed single precision floating-point values in xmm3/m128 and xmm2 using bits 4:0 of imm8 as a comparison predicate.
T}
T{
VEX.256.0F.WIG C2 /r ib VCMPPS ymm1, ymm2, ymm3/m256, imm8
T}	B	V/V	AVX	T{
Compare packed single precision floating-point values in ymm3/m256 and ymm2 using bits 4:0 of imm8 as a comparison predicate.
T}
T{
EVEX.128.0F.W0 C2 /r ib VCMPPS k1 {k2}, xmm2, xmm3/m128/m32bcst, imm8
T}	C	V/V	AVX512VL AVX512F	T{
Compare packed single precision floating-point values in xmm3/m128/m32bcst and xmm2 using bits 4:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1.
T}
T{
EVEX.256.0F.W0 C2 /r ib VCMPPS k1 {k2}, ymm2, ymm3/m256/m32bcst, imm8
T}	C	V/V	AVX512VL AVX512F	T{
Compare packed single precision floating-point values in ymm3/m256/m32bcst and ymm2 using bits 4:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1.
T}
T{
EVEX.512.0F.W0 C2 /r ib VCMPPS k1 {k2}, zmm2, zmm3/m512/m32bcst{sae}, imm8
T}	C	V/V	AVX512F	T{
Compare packed single precision floating-point values in zmm3/m512/m32bcst and zmm2 using bits 4:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1.
T}
.TE

.SH INSTRUCTION OPERAND ENCODING
.TS
allbox;
l l l l l l
l l l l l l .
\fBOp/En\fP	\fBTuple Type\fP	\fBOperand 1\fP	\fBOperand 2\fP	\fBOperand 3\fP	\fBOperand 4\fP
A	N/A	ModRM:reg (r, w)	ModRM:r/m (r)	imm8	N/A
B	N/A	ModRM:reg (w)	VEX.vvvv (r)	ModRM:r/m (r)	imm8
C	Full	ModRM:reg (w)	EVEX.vvvv (r)	ModRM:r/m (r)	imm8
.TE

.SH DESCRIPTION
Performs a SIMD compare of the packed single precision floating-point
values in the second source operand and the first source operand and
returns the result of the comparison to the destination operand. The
comparison predicate operand (immediate byte) specifies the type of
comparison performed on each of the pairs of packed values.

.PP
EVEX encoded versions: The first source operand (second operand) is a
ZMM/YMM/XMM register. The second source operand can be a ZMM/YMM/XMM
register, a 512/256/128-bit memory location or a 512/256/128-bit vector
broadcasted from a 32-bit memory location. The destination operand
(first operand) is an opmask register. Comparison results are written to
the destination operand under the writemask k2. Each comparison result
is a single mask bit of 1 (comparison true) or 0 (comparison false).

.PP
VEX.256 encoded version: The first source operand (second operand) is a
YMM register. The second source operand (third operand) can be a YMM
register or a 256-bit memory location. The destination operand (first
operand) is a YMM register. Eight comparisons are performed with results
written to the destination operand. The result of each comparison is a
doubleword mask of all 1s (comparison true) or all 0s (comparison
false).

.PP
128-bit Legacy SSE version: The first source and destination operand
(first operand) is an XMM register. The second source operand (second
operand) can be an XMM register or 128-bit memory location. Bits
(MAXVL-1:128) of the corresponding ZMM destination register remain
unchanged. Four comparisons are performed with results written to bits
127:0 of the destination operand. The result of each comparison is a
doubleword mask of all 1s (comparison true) or all 0s (comparison
false).

.PP
VEX.128 encoded version: The first source operand (second operand) is an
XMM register. The second source operand (third operand) can be an XMM
register or a 128-bit memory location. Bits (MAXVL-1:128) of the
destination ZMM register are zeroed. Four comparisons are performed with
results written to bits 127:0 of the destination operand.

.PP
The comparison predicate operand is an 8-bit immediate:
.IP \(bu 2
For instructions encoded using the VEX prefix and EVEX prefix, bits
4:0 define the type of comparison to be performed (see Table 3-1).
Bits 5 through 7 of the immediate are reserved.
.IP \(bu 2
For instruction encodings that do not use VEX prefix, bits 2:0 define
the type of comparison to be made (see the first 8 rows of Table 3-1).
Bits 3 through 7 of the immediate are reserved.

.PP
The unordered relationship is true when at least one of the two source
operands being compared is a NaN; the ordered relationship is true when
neither source operand is a NaN.

.PP
A subsequent computational instruction that uses the mask result in the
destination operand as an input operand will not generate an exception,
because a mask of all 0s corresponds to a floating-point value of +0.0
and a mask of all 1s corresponds to a QNaN.

.PP
Note that processors with “CPUID.1H:ECX.AVX =0” do not implement the
“greater-than”, “greater-than-or-equal”, “not-greater than”, and
“not-greater-than-or-equal relations” predicates. These comparisons can
be made either by using the inverse relationship (that is, use the
“not-less-than-or-equal” to make a “greater-than” comparison) or by
using software emulation. When using software emulation, the program
must swap the operands (copying registers when necessary to protect the
data that will now be in the destination), and then perform the compare
using a different predicate. The predicate to be used for these
emulations is listed in the first 8 rows of Table 3-7 (Intel® 64 and
IA-32 Architectures Software Developer’s Manual, Volume 2A) under the
heading Emulation.

.PP
Compilers and assemblers may implement the following two-operand
pseudo-ops in addition to the three-operand CMPPS instruction, for
processors with “CPUID.1H:ECX.AVX =0”. See Table 3-4. The compiler
should treat reserved imm8 values as illegal syntax.

.PP
The greater-than relations that the processor does not implement require
more than one instruction to emulate in software and therefore should
not be implemented as pseudo-ops. (For these, the programmer should
reverse the operands of the corresponding less than relations and use
move instructions to ensure that the mask is moved to the correct
destination register and that the source operand is left intact.)
.TP
32 predicates shown in Table 3-5.

.TP
32 predicates shown in Table 3-5.

.TP
32 predicates shown in Table 3-5.

.TP
-5.

.PP
:

.SH OPERATION
.EX
CASE (COMPARISON PREDICATE) OF
    0: OP3 := EQ_OQ; OP5 := EQ_OQ;
    1: OP3 := LT_OS; OP5 := LT_OS;
    2: OP3 := LE_OS; OP5 := LE_OS;
    3: OP3 := UNORD_Q; OP5 := UNORD_Q;
    4: OP3 := NEQ_UQ; OP5 := NEQ_UQ;
    5: OP3 := NLT_US; OP5 := NLT_US;
    6: OP3 := NLE_US; OP5 := NLE_US;
    7: OP3 := ORD_Q; OP5 := ORD_Q;
    8: OP5 := EQ_UQ;
    9: OP5 := NGE_US;
    10: OP5 := NGT_US;
    11: OP5 := FALSE_OQ;
    12: OP5 := NEQ_OQ;
    13: OP5 := GE_OS;
    14: OP5 := GT_OS;
    15: OP5 := TRUE_UQ;
    16: OP5 := EQ_OS;
    17: OP5 := LT_OQ;
    18: OP5 := LE_OQ;
    19: OP5 := UNORD_S;
    20: OP5 := NEQ_US;
    21: OP5 := NLT_UQ;
    22: OP5 := NLE_UQ;
    23: OP5 := ORD_S;
    24: OP5 := EQ_US;
    25: OP5 := NGE_UQ;
    26: OP5 := NGT_UQ;
    27: OP5 := FALSE_OS;
    28: OP5 := NEQ_OS;
    29: OP5 := GE_OQ;
    30: OP5 := GT_OQ;
    31: OP5 := TRUE_US;
    DEFAULT: Reserved
ESAC;
.EE

.SS VCMPPS (EVEX Encoded Versions)
.EX
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j := 0 TO KL-1
    i := j * 32
    IF k2[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1) AND (SRC2 *is memory*)
                THEN
                    CMP := SRC1[i+31:i] OP5 SRC2[31:0]
                ELSE
                    CMP := SRC1[i+31:i] OP5 SRC2[i+31:i]
            FI;
            IF CMP = TRUE
                THEN DEST[j] := 1;
                ELSE DEST[j] := 0; FI;
        ELSE DEST[j] := 0
                        ; zeroing-masking onlyFI;
    FI;
ENDFOR
DEST[MAX_KL-1:KL] := 0
.EE

.SS VCMPPS (VEX.256 Encoded Version)
.EX
CMP0 := SRC1[31:0] OP5 SRC2[31:0];
CMP1 := SRC1[63:32] OP5 SRC2[63:32];
CMP2 := SRC1[95:64] OP5 SRC2[95:64];
CMP3 := SRC1[127:96] OP5 SRC2[127:96];
CMP4 := SRC1[159:128] OP5 SRC2[159:128];
CMP5 := SRC1[191:160] OP5 SRC2[191:160];
CMP6 := SRC1[223:192] OP5 SRC2[223:192];
CMP7 := SRC1[255:224] OP5 SRC2[255:224];
IF CMP0 = TRUE
    THEN DEST[31:0] :=FFFFFFFFH;
    ELSE DEST[31:0] := 000000000H; FI;
IF CMP1 = TRUE
    THEN DEST[63:32] := FFFFFFFFH;
    ELSE DEST[63:32] :=000000000H; FI;
IF CMP2 = TRUE
    THEN DEST[95:64] := FFFFFFFFH;
    ELSE DEST[95:64] := 000000000H; FI;
IF CMP3 = TRUE
    THEN DEST[127:96] := FFFFFFFFH;
    ELSE DEST[127:96] := 000000000H; FI;
IF CMP4 = TRUE
    THEN DEST[159:128] := FFFFFFFFH;
    ELSE DEST[159:128] := 000000000H; FI;
IF CMP5 = TRUE
    THEN DEST[191:160] := FFFFFFFFH;
    ELSE DEST[191:160] := 000000000H; FI;
IF CMP6 = TRUE
    THEN DEST[223:192] := FFFFFFFFH;
    ELSE DEST[223:192] :=000000000H; FI;
IF CMP7 = TRUE
    THEN DEST[255:224] := FFFFFFFFH;
    ELSE DEST[255:224] := 000000000H; FI;
DEST[MAXVL-1:256] := 0
.EE

.SS VCMPPS (VEX.128 Encoded Version)
.EX
CMP0 := SRC1[31:0] OP5 SRC2[31:0];
CMP1 := SRC1[63:32] OP5 SRC2[63:32];
CMP2 := SRC1[95:64] OP5 SRC2[95:64];
CMP3 := SRC1[127:96] OP5 SRC2[127:96];
IF CMP0 = TRUE
    THEN DEST[31:0] :=FFFFFFFFH;
    ELSE DEST[31:0] := 000000000H; FI;
IF CMP1 = TRUE
    THEN DEST[63:32] := FFFFFFFFH;
    ELSE DEST[63:32] := 000000000H; FI;
IF CMP2 = TRUE
    THEN DEST[95:64] := FFFFFFFFH;
    ELSE DEST[95:64] := 000000000H; FI;
IF CMP3 = TRUE
    THEN DEST[127:96] := FFFFFFFFH;
    ELSE DEST[127:96] :=000000000H; FI;
DEST[MAXVL-1:128] := 0
.EE

.SS CMPPS (128-bit Legacy SSE Version)
.EX
CMP0 := SRC1[31:0] OP3 SRC2[31:0];
CMP1 := SRC1[63:32] OP3 SRC2[63:32];
CMP2 := SRC1[95:64] OP3 SRC2[95:64];
CMP3 := SRC1[127:96] OP3 SRC2[127:96];
IF CMP0 = TRUE
    THEN DEST[31:0] :=FFFFFFFFH;
    ELSE DEST[31:0] := 000000000H; FI;
IF CMP1 = TRUE
    THEN DEST[63:32] := FFFFFFFFH;
    ELSE DEST[63:32] := 000000000H; FI;
IF CMP2 = TRUE
    THEN DEST[95:64] := FFFFFFFFH;
    ELSE DEST[95:64] := 000000000H; FI;
IF CMP3 = TRUE
    THEN DEST[127:96] := FFFFFFFFH;
    ELSE DEST[127:96] :=000000000H; FI;
DEST[MAXVL-1:128] (Unmodified)
.EE

.SH INTEL C/C++ COMPILER INTRINSIC EQUIVALENT
.EX
VCMPPS __mmask16 _mm512_cmp_ps_mask( __m512 a, __m512 b, int imm);

VCMPPS __mmask16 _mm512_cmp_round_ps_mask( __m512 a, __m512 b, int imm, int sae);

VCMPPS __mmask16 _mm512_mask_cmp_ps_mask( __mmask16 k1, __m512 a, __m512 b, int imm);

VCMPPS __mmask16 _mm512_mask_cmp_round_ps_mask( __mmask16 k1, __m512 a, __m512 b, int imm, int sae);

VCMPPS __mmask8 _mm256_cmp_ps_mask( __m256 a, __m256 b, int imm);

VCMPPS __mmask8 _mm256_mask_cmp_ps_mask( __mmask8 k1, __m256 a, __m256 b, int imm);

VCMPPS __mmask8 _mm_cmp_ps_mask( __m128 a, __m128 b, int imm);

VCMPPS __mmask8 _mm_mask_cmp_ps_mask( __mmask8 k1, __m128 a, __m128 b, int imm);

VCMPPS __m256 _mm256_cmp_ps(__m256 a, __m256 b, int imm)

CMPPS __m128 _mm_cmp_ps(__m128 a, __m128 b, int imm)
.EE

.SH SIMD FLOATING-POINT EXCEPTIONS
Invalid if SNaN operand and invalid if QNaN and predicate as listed in
Table 3-1, Denormal.

.SH OTHER EXCEPTIONS
VEX-encoded instructions, see Table
2-19, “Type 2 Class Exception Conditions.”

.PP
EVEX-encoded instructions, see Table
2-46, “Type E2 Class Exception Conditions.”

.SH SEE ALSO
x86-manpages(7) for a list of other x86-64 man pages.

.SH COLOPHON
This UNOFFICIAL, mechanically-separated, non-verified reference is
provided for convenience, but it may be
incomplete or
broken in various obvious or non-obvious ways.
Refer to Intel® 64 and IA-32 Architectures Software Developer’s Manual
for anything serious.

.br
This page is generated by scripts; therefore may contain visual or semantical bugs. Please report them (or better, fix them) on https://github.com/ttmo-O/x86-manpages.

.br
MIT licensed by TTMO 2025 (Turkish Unofficial Chamber of Reverse Engineers - https://ttmo.re).