Merged
Changes from 1 commit
102 commits
ca22e28
Rename sgemm_direct_sme1.S to sgemm_direct_sme1_2VLx2VL.S
martin-frbg Aug 18, 2025
22c6607
Use ASMNAME to get symbol name from build system; leave x18 unused as…
martin-frbg Aug 18, 2025
89898fc
Add sgemm_direct_performant for switching between direct and regular …
martin-frbg Aug 18, 2025
08a0032
Build symbol name from build system variables
martin-frbg Aug 18, 2025
53d3bb5
Get symbol name from build system; change b.first to b.mi for AppleCl…
martin-frbg Aug 18, 2025
731f4dd
Add VORTEXM4 settings
martin-frbg Aug 18, 2025
e82bcd2
Update ARM64 sgemm_direct object generation
martin-frbg Aug 18, 2025
0203657
Add sgemm_direct_performant for ARM64
martin-frbg Aug 18, 2025
de91afd
Move SGEMM_DIRECT after the CBLAS parameter check and add sgemm_direc…
martin-frbg Aug 18, 2025
202a7a0
Separate VORTEXM4 from VORTEX and ARMV9SME
martin-frbg Aug 18, 2025
e76c390
Add sgemm_direct_performant for ARM64
martin-frbg Aug 18, 2025
ef0b883
Add sgemm_direct_performant for ARM64
martin-frbg Aug 18, 2025
ccfd017
Enable SME on MacOS and add VORTEXM4 to DYNAMIC_ARCH list
martin-frbg Aug 18, 2025
b0a00fb
Add minimal compiler flags for VORTEXM4
martin-frbg Aug 18, 2025
3097046
Add VORTEXM4 target
martin-frbg Aug 18, 2025
4e2a8c1
Split VORTEXM4 from VORTEX target due to SME support
martin-frbg Aug 18, 2025
18f9582
Add VORTEXM4
martin-frbg Aug 18, 2025
ca542f3
Add VORTEXM4
martin-frbg Aug 18, 2025
a4f5fec
Add compiler options for VORTEXM4
martin-frbg Aug 18, 2025
c794d0a
Add VORTEXM4
martin-frbg Aug 18, 2025
4328c91
relax requirements in compiler SME capability check
martin-frbg Aug 18, 2025
426b5f2
Add compiler options for VORTEXM4
martin-frbg Aug 18, 2025
0bc19a1
Update SME kernel details
martin-frbg Aug 18, 2025
bf98e44
Add VORTEXM4 to DYNAMIC_ARCH list
martin-frbg Aug 18, 2025
4609732
Relax version number requirement for AppleClang
martin-frbg Aug 18, 2025
05dbb54
Delete misplaced file
martin-frbg Aug 19, 2025
107c883
Update SME-related kernels
martin-frbg Aug 19, 2025
501728a
adjust register 20 accesses to 21 after moving x18
martin-frbg Aug 20, 2025
edaa73f
Hide the local 2VLx2VL symbol as static is insufficient for this with…
martin-frbg Aug 20, 2025
1ee8879
Add VORTEXM4
martin-frbg Aug 20, 2025
7f89c6f
smh-based direct sgemm currently requires leading dimensions to be sa…
martin-frbg Aug 23, 2025
8e50b8d
Add d8 to d15 to clobber lists as the code does not expressly save them
martin-frbg Aug 23, 2025
b4fc09e
Add registers d8 to d15 to clobber lists as the code does not express…
martin-frbg Aug 23, 2025
1b88c9c
remove debugging printouts
martin-frbg Aug 24, 2025
2b5d8c7
remove debugging printout
martin-frbg Aug 24, 2025
fc516af
Merge branch 'develop' into issue5414
martin-frbg Oct 1, 2025
ba9d2d2
remove sme from M4 Fortran flags as gfortran couples it with sve
martin-frbg Oct 2, 2025
b3d0bc4
Update Makefile.L3
martin-frbg Oct 2, 2025
4ae3e37
restore 2VLx2VL naming
martin-frbg Oct 2, 2025
c889558
Rework for DYNAMIC_ARCH use and use of SGEMM functions by SSYMM
martin-frbg Oct 2, 2025
20f5ed1
Merge branch 'OpenMathLib:develop' into issue5414
martin-frbg Oct 8, 2025
47a66ae
Update limits based on benchmarking the SME code on Apple M4
martin-frbg Oct 8, 2025
9bfc361
Merge branch 'OpenMathLib:develop' into issue5414
martin-frbg Oct 12, 2025
8211db6
Don't enable SME for VortexM4 when the compiler is gcc (which does no…
martin-frbg Oct 19, 2025
2346d0b
Add HAVE_SME for VortexM4 only with non-gcc compilers
martin-frbg Oct 19, 2025
d7b0fcc
Enable SME-based kernels for VortexM4 with clang-based compilers only
martin-frbg Oct 19, 2025
643a0b5
Allow VortexM4 on the direct_SME fast path only for clang-based compi…
martin-frbg Oct 19, 2025
e01b109
Allow VortexM4 on the same fast path only with non-gcc compilers
martin-frbg Oct 19, 2025
f4ee3ae
Allow VortexM4 on the SME fast path only with non-gcc compilers
martin-frbg Oct 19, 2025
1b591ea
export HAVE_SME setting and exclude VortexM4 from DYNAMIC_ARCH if gcc…
martin-frbg Oct 19, 2025
83d3e0e
fix copy/paste
martin-frbg Oct 19, 2025
682f61e
Add prototype for gotoblas_corename
martin-frbg Oct 19, 2025
ea85b66
Merge branch 'OpenMathLib:develop' into issue5414
martin-frbg Nov 23, 2025
9c0965b
Merge branch 'OpenMathLib:develop' into issue5414
martin-frbg Nov 23, 2025
8c0b13c
Merge branch 'OpenMathLib:develop' into issue5414
martin-frbg Nov 23, 2025
7d35bf6
Add cpuid for Apple M5 (from a PR to the archspec project)
martin-frbg Nov 24, 2025
7e44f62
fix sequence of arm64 sgemm_direct_performance and sgemm_direct_ab
martin-frbg Nov 24, 2025
b0bd49a
Add compiler guard around the M4 HAVE_SME property
martin-frbg Nov 24, 2025
4af1870
Only add dedicated VORTEXM4 if building with LLVM
martin-frbg Nov 24, 2025
b185c9a
small fixes for separating sme and dummy parts
martin-frbg Nov 24, 2025
a683287
rework for dynamic_arch
martin-frbg Nov 24, 2025
705259c
remove redundant HAVE_SME
martin-frbg Nov 24, 2025
7ab8dc1
rework ARM64 SME dependency handling
martin-frbg Nov 24, 2025
c3c857c
fix sequence
martin-frbg Nov 24, 2025
825d3ad
AppleClang does not define feature local_streaming
martin-frbg Nov 28, 2025
e85efb8
remove za from clobber lists
martin-frbg Dec 3, 2025
275eb6f
Add workaround for current LLVM SME bug on Windows
martin-frbg Dec 31, 2025
5c8cf37
Add workaround for current LLVM SME bug on Windows
martin-frbg Dec 31, 2025
b183182
Add workaround for current LLVM SME bug on Windows
martin-frbg Dec 31, 2025
7beba94
Add workaround for current LLVM SME bug on Windows
martin-frbg Dec 31, 2025
f4383d0
syntax fix
martin-frbg Dec 31, 2025
67fd33e
syntax fix
martin-frbg Dec 31, 2025
618bcbd
adjust M4 options to avoid undefined references with non-Apple LLVM
martin-frbg Jan 5, 2026
a18a536
Adjust M4 options to avoid unresolved reference with non-Apple LLVM
martin-frbg Jan 5, 2026
02bc005
reset SVE and SME capabilities between targets
martin-frbg Jan 5, 2026
e384396
Use the armv9 capability set in the compiler test for SME
martin-frbg Jan 5, 2026
2d46f1e
Merge branch 'develop' into issue5414
martin-frbg Jan 9, 2026
a9a6eda
Adapt for DYNAMIC_ARCH with multiple ...preprocess symbols
martin-frbg Jan 9, 2026
6de062c
Merge branch 'OpenMathLib:develop' into issue5414
martin-frbg Jan 11, 2026
aafd3cb
Merge branch 'OpenMathLib:develop' into issue5414
martin-frbg Jan 11, 2026
0a53d91
Move early exit up; don't rely on support_sme() for now
martin-frbg Jan 12, 2026
31150eb
Move early exit up; don't rely on support_sme() for now
martin-frbg Jan 12, 2026
3149408
Merge branch 'OpenMathLib:develop' into issue5414
martin-frbg Jan 12, 2026
10ba0e6
fix missing parentheses on endif
martin-frbg Jan 12, 2026
770ad68
Distinguish AppleClang from LLVM on ARM64
martin-frbg Jan 13, 2026
5e5f9a3
Apple Clang absolutely needs the +sme in the arch string
martin-frbg Jan 13, 2026
31bb6ca
Apple Clang requires +sme in the arch string for M4
martin-frbg Jan 13, 2026
533cab2
add prototype
martin-frbg Jan 13, 2026
bdcb9b7
add prototype
martin-frbg Jan 13, 2026
fa021e1
fix missing endif() and add AppleClang options for M4
martin-frbg Jan 13, 2026
6137236
fix os variable reference
martin-frbg Jan 13, 2026
6735872
drop the cpu=apple-m4 part as nonessential
martin-frbg Jan 14, 2026
d3e4b41
remove cpu=apple-m4 as not required and less portable
martin-frbg Jan 14, 2026
88c583e
Update Makefile
martin-frbg Jan 14, 2026
7ffce1c
fix spurious change of (S)BGEMM parameters for NeoverseV1
martin-frbg Jan 14, 2026
d49df4c
force linking to clang_rt_builtins when using LLVM for AppleM4
martin-frbg Jan 14, 2026
93cd7b9
Force linking to clang_rt_builtins when using LLVM for AppleM4
martin-frbg Jan 14, 2026
7acf919
typo
martin-frbg Jan 14, 2026
faa1875
typo fix
martin-frbg Jan 14, 2026
5133aac
Make VORTEXM4 available in DYNAMIC_ARCH on Apple
martin-frbg Jan 15, 2026
55a10c7
Make VortexM4 available in DYNAMIC_ARCH on MacOS only
martin-frbg Jan 15, 2026
6f225da
make VORTEXM4 MacOS-only for now
martin-frbg Jan 15, 2026
Get symbol name from build system; change b.first to b.mi for AppleClang compatibility
martin-frbg authored Aug 18, 2025
commit 53d3bb50cc643c741442c74213ce6568396af33b
8 changes: 4 additions & 4 deletions kernel/arm64/sgemm_direct_sme1_preprocess.S
@@ -37,9 +37,9 @@
 #define C6 x15 //Constant6: 3*ncol
 
 .text
-.global sgemm_direct_sme1_preprocess
+.global ASMNAME //sgemm_direct_sme1_preprocess
 
-sgemm_direct_sme1_preprocess:
+ASMNAME: //sgemm_direct_sme1_preprocess:
 
 stp x19, x20, [sp, #-48]!
 stp x21, x22, [sp, #16]
@@ -114,14 +114,14 @@
 
 addvl mat_ptr0, mat_ptr0, #1 //mat_ptr0 += SVLb
 whilelt p8.b, mat_ptr0, inner_loop_exit
-b.first .Loop_process
+b.mi .Loop_process
 
 add mat_mod, mat_mod, C3, lsl #2 //mat_mod+=SVLs*nbc FP32 elements
 add mat, mat, C3, lsl #2 //mat+=SVLs*nbc FP32 elements
 incw outer_loop_cntr
 
 whilelt p0.s, outer_loop_cntr, nrow
-b.first .M_Loop
+b.mi .M_Loop
 
 smstop
 
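The two changes in this commit can be illustrated with a small, standalone C sketch. It is not part of the PR. It assumes the build system supplies the symbol name as a preprocessor definition (for example `-DASMNAME=...`); the fallback value and the helper names `asm_symbol_name`, `asm_symbol_is` and `sve_alias_to_base` are hypothetical and exist only for this illustration. The table lists the SVE condition-code aliases alongside the base AArch64 condition each one stands for, which is why replacing `b.first` with `b.mi` should not change behavior for assemblers that do not accept the alias spelling.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: the real build passes -DASMNAME=<symbol>. This fallback is
 * only here so the sketch compiles on its own. */
#ifndef ASMNAME
#define ASMNAME sgemm_direct_sme1_preprocess
#endif

/* Two-level stringification so ASMNAME is expanded before being quoted. */
#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)

/* Returns the symbol name that ASMNAME expands to, as a string. */
static const char *asm_symbol_name(void) { return STRINGIFY(ASMNAME); }

/* Returns 1 if ASMNAME expands to the expected symbol, 0 otherwise. */
static int asm_symbol_is(const char *expected) {
    return strcmp(asm_symbol_name(), expected) == 0;
}

/* SVE condition aliases and the base condition codes they are synonyms for. */
struct cond_alias { const char *alias; const char *base; };

static const struct cond_alias sve_cond_aliases[] = {
    {"none",  "eq"}, {"any",   "ne"},
    {"nlast", "hs"}, {"last",  "lo"},
    {"first", "mi"}, {"nfrst", "pl"},
    {"pmore", "hi"}, {"plast", "ls"},
    {"tcont", "ge"}, {"tstop", "lt"},
};

/* Returns the base condition for an SVE alias, or NULL if it is not one. */
static const char *sve_alias_to_base(const char *alias) {
    size_t n = sizeof(sve_cond_aliases) / sizeof(sve_cond_aliases[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(sve_cond_aliases[i].alias, alias) == 0)
            return sve_cond_aliases[i].base;
    }
    return NULL;
}
```

With the fallback definition, `asm_symbol_name()` yields the original hard-coded label, and `sve_alias_to_base("first")` yields `"mi"`, matching the substitution made in both loop branches of the diff.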