DanielShuey/amx_apple_m4_benchmarks.txt

AMX on Apple M4

This repo provides evidence that Apple AMX instructions remain functional on M4 hardware.

Thanks to corsix/amx and dougallj, I was able to verify this on a 10-core M4 iMac.

M4 iMac

fma16_mat_f16f16_x*y+z (far z)

| ZACs (Z accumulators) | 1 Thread | 2 Threads | 3 Threads | 4 Threads | 5 Threads | 6 Threads |
| --- | --- | --- | --- | --- | --- | --- |
| 1 per thread | 2009.4 GFLOPS | 2454.8 GFLOPS | 3649.5 GFLOPS | 3653.2 GFLOPS | 3701.1 GFLOPS | 4162.0 GFLOPS |
| 2 per thread | 3986.7 GFLOPS | 4706.4 GFLOPS | 4527.6 GFLOPS | 4571.6 GFLOPS | 4606.0 GFLOPS | 4621.4 GFLOPS |
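
To make the thread scaling in the M4 figures easier to read, here is a minimal Python sketch that divides each cell by the 1-thread figure in its row. The numbers are copied from the table above; the names `M4_GFLOPS` and `speedups` are made up for this illustration and are not part of the benchmark code.

```python
# GFLOPS figures copied from the M4 iMac table above.
# Key: Z accumulators per thread; list index 0..5 = 1..6 threads.
M4_GFLOPS = {
    1: [2009.4, 2454.8, 3649.5, 3653.2, 3701.1, 4162.0],
    2: [3986.7, 4706.4, 4527.6, 4571.6, 4606.0, 4621.4],
}


def speedups(row):
    """Return each thread count's throughput relative to the 1-thread figure."""
    base = row[0]
    return [value / base for value in row]


for zacs, row in M4_GFLOPS.items():
    cells = ", ".join(
        f"{threads}T: {s:.2f}x" for threads, s in enumerate(speedups(row), start=1)
    )
    print(f"{zacs} ZAC(s) per thread -> {cells}")
```

With one accumulator per thread, six threads reach roughly 2.07x the single-thread figure; with two accumulators per thread, the total stays within about 1.2x of the single-thread figure at every thread count.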

M1 Max

Reference figures from corsix/amx/fma.md:

fma16 in matrix mode, each Z accumulator being f16[32][32]

| ZACs (Z accumulators) | 1 Thread | 2 Threads | 3 Threads | 4 Threads | 5 Threads | 6 Threads |
| --- | --- | --- | --- | --- | --- | --- |
| 1 per thread | 1453.0 GFLOPS | 2958.4 GFLOPS | 2705.5 GFLOPS | 3553.5 GFLOPS | 4609.2 GFLOPS | 5268.5 GFLOPS |
| 2 per thread | 2958.9 GFLOPS | 5915.7 GFLOPS | 4862.3 GFLOPS | 5355.6 GFLOPS | 5546.6 GFLOPS | 6263.4 GFLOPS |
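
The two tables can be compared cell by cell. The Python sketch below computes the M4 / M1 Max ratio for each configuration and converts a GFLOPS figure into an implied instruction rate. The conversion rests on an assumption not stated in this repo: that one matrix-mode fma16 updates a full f16[32][32] accumulator and is counted as one multiply plus one add per element (2048 FLOPs per instruction). All names here (`M4`, `M1_MAX`, `ratios`, `implied_ginstr_per_sec`) are hypothetical helpers for illustration.

```python
# GFLOPS figures copied from the two tables above.
# Key: Z accumulators per thread; list index 0..5 = 1..6 threads.
M4 = {
    1: [2009.4, 2454.8, 3649.5, 3653.2, 3701.1, 4162.0],
    2: [3986.7, 4706.4, 4527.6, 4571.6, 4606.0, 4621.4],
}
M1_MAX = {
    1: [1453.0, 2958.4, 2705.5, 3553.5, 4609.2, 5268.5],
    2: [2958.9, 5915.7, 4862.3, 5355.6, 5546.6, 6263.4],
}

# Assumption: one matrix-mode fma16 touches 32 x 32 f16 elements,
# and each element counts as 2 FLOPs (one multiply, one add).
FLOPS_PER_FMA16 = 32 * 32 * 2


def ratios(zacs):
    """M4 throughput divided by M1 Max throughput, per thread count."""
    return [m4 / m1 for m4, m1 in zip(M4[zacs], M1_MAX[zacs])]


def implied_ginstr_per_sec(gflops):
    """Billions of fma16 instructions per second implied by a GFLOPS figure."""
    return gflops / FLOPS_PER_FMA16


for zacs in (1, 2):
    cells = ", ".join(
        f"{threads}T: {r:.2f}" for threads, r in enumerate(ratios(zacs), start=1)
    )
    print(f"{zacs} ZAC(s) per thread, M4 / M1 Max -> {cells}")

print(f"M4, 1 thread, 1 ZAC: {implied_ginstr_per_sec(M4[1][0]):.3f} G fma16/s")
```

On these figures the M4 is ahead at one thread (about 1.38x with one accumulator) and behind at six threads (about 0.74x with two accumulators).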

About

Apple M4 AMX benchmarks: AMX lives! multi-TFLOP compute
