Skip to content

Add AVX512 accelerated 1D/3D LUTS#1932

Merged
doug-walker merged 9 commits into
AcademySoftwareFoundation:mainfrom
markreidvfx:avx512_v1
Mar 21, 2024
Merged

Add AVX512 accelerated 1D/3D LUTS#1932
doug-walker merged 9 commits into
AcademySoftwareFoundation:mainfrom
markreidvfx:avx512_v1

Conversation

@markreidvfx
Copy link
Copy Markdown
Contributor

@markreidvfx markreidvfx commented Jan 17, 2024

ocioperf.exe --transform tests/data/files/clf/lut1d_32f_example.clf

Line by Line Average, lut dim 65, 3840x2160 image, Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz
image

ocioperf.exe --transform tests/data/files/clf/lut3d_preview_tier_test.clf

Line by Line Average, lut dim 33x33x33, 3840x2160 image, Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz
image

I've only been able to test on one machine with AVX512. Not exactly the performance gains I was hoping for. I'm still new to the instructions set, maybe there are some more optimizations we could do. There are quite a few AVX512 extensions. I've limited this implementation to just the AVX512F (foundation) instructions. That basically means any AVX512 capable CPU should be able run it.

Github actions use to have more intel CPU's with AVX512 available. Lately I've been getting only AMD EPYC CPU's without AVX512 for CI. I don't think there is anyway to request a specific cpu. This is very frustrating and will make this more difficult to maintain and test.

Copy link
Copy Markdown
Collaborator

@doug-walker doug-walker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Mark!

I'd like to clarify the F16C option in relation to this. I guess if AVX512 is supported then we should assume F16C is always supported too, right? I have a few comments below related to this.

Yes, it's surprising the timings are not faster. Maybe it's bound by memory accesses? I've seen cases where rearranging how the LUTs are stored in memory (at the cost of taking more space) resulted in a speed-up, though not sure if that would help here.

Comment thread tests/cpu/CMakeLists.txt
Comment thread src/OpenColorIO/ops/lut1d/Lut1DOpCPU_AVX512.cpp Outdated
Comment thread tests/cpu/UnitTestMain.cpp Outdated
Comment thread tests/cpu/AVX512_tests.cpp Outdated
Comment thread tests/cpu/AVX512_tests.cpp Outdated
Comment thread src/OpenColorIO/AVX512.h Outdated
Comment thread src/OpenColorIO/AVX512.h Outdated
@markreidvfx
Copy link
Copy Markdown
Contributor Author

markreidvfx commented Jan 22, 2024

I'd like to clarify the F16C option in relation to this. I guess if AVX512 is supported then we should assume F16C is always supported too, right? I have a few comments below related to this.

Yes, the half float conversion instructions are all part of the AVX512F (Foundation) extension.

The exact overlap between AVX and AVX2 and F16c support has never been exactly clear to me. I think AVX2 pretty much guarantees F16c but I think its best to check with those extensions.

@markreidvfx
Copy link
Copy Markdown
Contributor Author

I did a bit more perf testing of this with my old lut3d_perf tool

image

It also turns out that github runners on a private repos are different then the public repo ones. The private ones can have avx512.

I was able to test this pull request on windows with avx512 by setting up a private fork. I kinda used up all my free minutes for the month doing it but all the tests pass 😆

Copy link
Copy Markdown
Collaborator

@doug-walker doug-walker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@doug-walker doug-walker requested a review from remia March 18, 2024 04:19
Comment thread tests/cpu/AVX512_tests.cpp Outdated
Signed-off-by: Mark Reid <mindmark@gmail.com>
Signed-off-by: Mark Reid <mindmark@gmail.com>
Signed-off-by: Mark Reid <mindmark@gmail.com>
Signed-off-by: Mark Reid <mindmark@gmail.com>
Signed-off-by: Mark Reid <mindmark@gmail.com>
Signed-off-by: Mark Reid <mindmark@gmail.com>
Signed-off-by: Mark Reid <mindmark@gmail.com>
Signed-off-by: Mark Reid <mindmark@gmail.com>
Signed-off-by: Mark Reid <mindmark@gmail.com>
@markreidvfx
Copy link
Copy Markdown
Contributor Author

@remia I added your suggestion to all the SIMD tests. I also rebased on top of the current main.

@doug-walker doug-walker merged commit 91e8826 into AcademySoftwareFoundation:main Mar 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants