Adsk contrib - Add support for neon intrinsic integration #1828
It seems this PR fixes the universal build on Intel-based macOS, so it could also fix the failing nightly Python wheels build. It would be good to check once this is merged.
michdolan
left a comment
Added a couple really minor notes which could be cleaned up later.
Turning off SSE2NEON on Apple arm64, I'm getting 5 failed test_cpu_exec tests. I don't think they are related to this pull request, as I get the same failures on the current main branch. I haven't dug into it, but they look like rounding errors. All the CPU tests pass under Rosetta.
remia
left a comment
Commented on a few minor points. Looking at @markreidvfx's latest comment, do we need to add or update the macOS build variants in CI to cover all the new configurations?
Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
…on unifying the way related cmake variables Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
…ey serve the same purpose Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
…n false positive. Stubbing cpuinfo for Apple ARM platform. Handling Apple M1 correctly and adding support for SSE2NEON. Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
…cmake Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
… code and fixed CheckSupport compiler flags. Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
Signed-off-by: Mark Reid <mindmark@gmail.com> Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
This isolates the compilation units and avoids executing illegal hardware instructions Signed-off-by: Mark Reid <mindmark@gmail.com> Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
… on APPLE platform Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
bcd853f to 5bb952a
…logic into CPUInfoConfig.h.in as well as fixing issue on ARM when building with OCIO_USE_SSE2NEON=OFF. Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
…clearer. Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
Thanks @remia, @markreidvfx and @michdolan for your comments. Everything should be covered by the new commits.
At a minimum, I think adding an entry for OCIO_USE_SSE2NEON=OFF to the macOS build variants would be good.
@cedrik-fuoco-adsk I think only 20 jobs can run in parallel in a workflow, but nothing prevents us from having more than 20; it will just take longer. Another possibility is to add the new variants to a nightly build if these are rare build settings.
…takes double the time to do the universal build. OCIO no longer builds a universal binary by default Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
…integration Signed-off-by: Cédrik Fuoco <105517825+cedrik-fuoco-adsk@users.noreply.github.com>
#cmakedefine01 OCIO_USE_SSE3
#cmakedefine01 OCIO_USE_SSSE3
#cmakedefine01 OCIO_USE_SSE4
#cmakedefine01 OCIO_USE_SSE42
Building non-universal arm64, I still needed to specify -DOCIO_USE_SSE2=ON etc. to turn all these on, even with -DOCIO_USE_SSE2NEON=ON. Setting -DOCIO_USE_SSE2NEON=ON should probably imply that all these SSE features are ON.
There was a problem hiding this comment.
They should already be ON by default if building a non-universal arm64.
See in CMakeLists.txt
They are not turned off, since CheckSupportX86SIMD.cmake is not run on an arm64-only build. I've tested on my M1 and I get the expected CPUInfoConfig.h. It should go in that path here:

I've tested with ocioperf using only OCIO_USE_SSE2NEON=ON and OFF. With my heavy_transforms.clf and the line by line test, I get about 226 ms with -DOCIO_USE_SSE2NEON=ON and about 467 ms with -DOCIO_USE_SSE2NEON=OFF.
What do you mean by "still needed"? Did you do another build beforehand, or was it from a clean directory? @markreidvfx
There was a problem hiding this comment.
They are not turning on for me when I build natively on arm64. I've tried cleaning my env several times.
I'm just doing the basic mkdir build && cd build && cmake .. && cmake --build .
and my CPUInfoConfig.h shows
#if (OCIO_ARCH_X86 && !defined(__aarch64__)) || (defined(__aarch64__) && OCIO_USE_SSE2NEON)
#define OCIO_USE_SSE2 0
#define OCIO_USE_SSE3 0
#define OCIO_USE_SSSE3 0
#define OCIO_USE_SSE4 0
#define OCIO_USE_SSE42 0
It looks like it's because CMAKE_OSX_ARCHITECTURES isn't set to anything when I build it this way.
That section in CMakeLists.txt needs it to be arm64.
(APPLE AND "${CMAKE_SYSTEM_PROCESSOR}" MATCHES "arm64" AND "${CMAKE_OSX_ARCHITECTURES}" MATCHES "arm64")
cmake .. -DCMAKE_OSX_ARCHITECTURES=arm64 works.
Should we set CMAKE_OSX_ARCHITECTURES = CMAKE_SYSTEM_PROCESSOR if undefined?
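The fallback suggested above can be pictured with a small shell sketch. This is not the CMake change that went into the PR, only an illustration of the idea: when no architecture is requested, fall back to the host processor reported by `uname -m`. The function `resolve_arch` and the variable `OSX_ARCHITECTURES` are hypothetical names invented for this example.

```shell
# Hypothetical helper: echo the requested architecture list, or the host
# processor (from `uname -m`) when nothing was requested.
resolve_arch() {
    requested="$1"
    if [ -z "$requested" ]; then
        uname -m
    else
        printf '%s\n' "$requested"
    fi
}

# Example configure line, commented out so the sketch runs without cmake:
# cmake .. -DCMAKE_OSX_ARCHITECTURES="$(resolve_arch "$OSX_ARCHITECTURES")"

resolve_arch ""        # prints the host processor as reported by uname -m
resolve_arch "x86_64"  # prints x86_64
```

Note that a shell running under Rosetta reports x86_64 from `uname -m`, so a wrapper like this would pick the translated architecture there.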
There was a problem hiding this comment.
Oh right, I get it now. I'll prepare a new commit.
There was a problem hiding this comment.
The latest commit should have fixed the issue (I've tested it on my M1). @markreidvfx
There was a problem hiding this comment.
That fixes it for me on my M2 mac too.
… documentations. Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
On MacOS, the default is to build for the native architecture. The ``-DCMAKE_OSX_ARCHITECTURES`` option
may be set to ``arm64;x86_64`` to build the universal binaries.
I double-checked: the order of the architectures doesn't cause any issues, but we might want to reword this to show that you need quotes around the architectures: -DCMAKE_OSX_ARCHITECTURES="arm64;x86_64". It took me like 10 minutes to realize that the semicolon ; ends a command in bash or zsh :p
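The quoting point can be checked with a minimal shell sketch. `show_args` is a hypothetical stand-in for cmake that only prints each argument it receives, so the example runs without cmake installed.

```shell
# Hypothetical stand-in for cmake: print each received argument on its own line.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Quoted: the shell passes the whole architecture list as a single argument.
show_args -DCMAKE_OSX_ARCHITECTURES="arm64;x86_64"
# prints [-DCMAKE_OSX_ARCHITECTURES=arm64;x86_64]

# Unquoted, the shell would treat ';' as a command separator: it would run
#   show_args -DCMAKE_OSX_ARCHITECTURES=arm64
# and then try to execute a separate command named x86_64.
```

Single quotes work just as well here, since the value contains nothing the shell needs to expand.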
…' of https://github.com/autodesk-forks/OpenColorIO into adsk_contrib/add-support-for-neon-intrinsic-integration
Thanks for your help with this, @markreidvfx! We updated the install instructions accordingly. Going to merge this, as the expected tests passed before we updated the documentation.
…twareFoundation#1828)
* Merging the previous ARM Neon branch
  Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
* Testing each SIMD variant using a small code snippet and first pass on unifying the way related cmake variables
  Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
* Removing the usage of USE_SSE in favor of the new OCIO_USE_SSE2 as they serve the same purpose
  Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
* Using try_compile instead of check_cxx_source_compiles as it was giving false positives. Stubbing cpuinfo for the Apple ARM platform. Handling Apple M1 correctly and adding support for SSE2NEON.
  Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
* Comments clean up and refactor some comments and documentation
  Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
* Added something in the documentation for Rosetta and small change in cmake
  Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
* Fixing the build under Rosetta and fixing issue in the MacOS CI.
  Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
* Using try_compile for CheckSupportSSEUsingSSE2NEON to standardize the code and fixed CheckSupport compiler flags.
  Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
* use emmintrin.h for only sse2 features
  Signed-off-by: Mark Reid <mindmark@gmail.com>
  Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
* Allow F16C to be completely turned off
  Signed-off-by: Mark Reid <mindmark@gmail.com>
  Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
* Separate SIMD test code from code that adds unit tests. This isolates the compilation units and avoids executing illegal hardware instructions
  Signed-off-by: Mark Reid <mindmark@gmail.com>
  Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
* remove unneeded checks
  Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
* use software implementations of f16c intrinsics for SSE2
  Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
* Added preprocessor checks for ARM as it is needed for universal build on APPLE platform
  Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
* Adding missing checks for "not arm64"
  Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
* Ease the future maintainability of the new OCIO_USE_xyz by moving the logic into CPUInfoConfig.h.in as well as fixing issue on ARM when building with OCIO_USE_SSE2NEON=OFF.
  Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
* Fixing some spacing, documentation and making some cmake conditions clearer.
  Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
* Adding a build in ci_workflow for macos USE_OCIO_SSE2NEON=OFF
  Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
* typo
  Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
* Changing back all the macOS (except one) builds to x86_64 only as it takes double the time to do the universal build. OCIO no longer builds a universal binary by default
  Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
* Update the CMakeLists.txt logic to accommodate all scenarios and fixing documentation.
  Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
* Update documentation and remove ocio_use_sse2neon from the CI matrix
  Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
---------
Signed-off-by: Cédrik Fuoco <cedrik.fuoco@autodesk.com>
Signed-off-by: Mark Reid <mindmark@gmail.com>
Signed-off-by: Cédrik Fuoco <105517825+cedrik-fuoco-adsk@users.noreply.github.com>
Co-authored-by: Mark Reid <mindmark@gmail.com>
Co-authored-by: Doug Walker <doug.walker@autodesk.com>
Signed-off-by: Brooke <beg9562@rit.edu>
This PR replaces the earlier ARM Neon PR, #1775.
It contains all the code changes that were already accepted in that PR, plus the new commits needed to integrate it with the code from the PR "Add AVX2/AVX/SSE2 SIMD accelerated 1D/3D LUTS" (already merged into main).
Therefore, please ignore the first commit, named "Merging the previous ARM Neon branch".