ACES 2.0 Output Transform performance optimisation by KevinJW · Pull Request #2119 · AcademySoftwareFoundation/OpenColorIO

KevinJW · 2025-02-19T21:32:51Z

As discussed in the meeting, this pull request is not complete, but we have a deadline, so I am submitting it in its current state. I would encourage others to check the code.

There are some test failures; some come from not recalculating the golden values in the CPU path.

There must still be some lingering differences between the GPU and CPU code paths as the GPU/CPU comparisons fail with larger deltas than typical.

Overall the main algorithm changes come from:

Precomputing a number of the scaling factors, etc during the init stage
Some micro optimisations, reformulating things to use fewer divisions
Redistribution of the sampled hues to be more linear whilst still including explicit sampling of the corners of the gamut. This allows us to have a much narrower search range for the hue table lookup.
Some calculations have been migrated up the "call stack" to allow the results to be reused
Adjustment of the upper hull slope calculation parameters so that a call to pow can be eliminated and replaced with a multiply
Splitting some functions to allow exposure of the Achromatic and using that in the tonescale calculation
The gamut compression was calculating various slopes which nominally should all have been the same value
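
The first two items in this list can be illustrated with a minimal sketch. The struct and function names below are hypothetical and are not taken from the OCIO source; the only point is that a division by a per-transform constant can be replaced by a multiply with a reciprocal computed once during init.

```cpp
#include <cassert>

// Hypothetical parameter block; names are illustrative, not OCIO's.
struct ScaleParams
{
    float limit;     // value fixed when the transform is initialised
    float inv_limit; // 1 / limit, computed once during init
};

inline ScaleParams init_scale_params(float limit)
{
    assert(limit != 0.0f);
    return ScaleParams{limit, 1.0f / limit};
}

// Naive per-pixel form: one division per call.
inline float normalise_div(float v, const ScaleParams & p)
{
    return v / p.limit;
}

// Reformulated per-pixel form: the division already happened in init.
inline float normalise_mul(float v, const ScaleParams & p)
{
    return v * p.inv_limit;
}
```

The two forms are not guaranteed to agree in the last bit for arbitrary inputs, which is one reason a change of this kind can require regenerating golden test values.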

Things not really done, but which could be in the future, mostly on the CPU side and primarily by vectorising:

A number of the functions could be looked at and reformulated to take/return only the attributes they change, e.g. J or M rather than full JMh, might help with a vectorised implementation
The 3x3 matrix uses could be much faster; we may want to try 3x4 and simple vectorised code, alternatively...
Generally the whole transform is run per pixel; it may be worth looking at running smaller parts of the code on lines of pixels and using intermediate buffers for partial results.
It may be worth reformulating the pow() calls into an appropriate logarithm/exponent form
The Hue lookup binary search whilst heavily improved could probably be unrolled for many of the common gamuts as the search range is often within 3 positions
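
As a rough illustration of why a more linear hue distribution narrows the search, here is a minimal sketch of a windowed lookup. It is not the OCIO implementation: the function name, the table layout (sorted hues spanning 0 to 360) and the `radius` parameter (the largest distance between the uniform guess and the true index, assumed to be precomputed at init) are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns i such that hues[i] <= h < hues[i + 1], searching only a small
// window around the index a perfectly uniform table would give.
// Assumes 0 <= h < 360, hues sorted ascending, and radius large enough.
inline std::size_t find_hue_interval(const std::vector<float> & hues,
                                     float h,
                                     std::size_t radius)
{
    assert(hues.size() >= 2);
    const std::size_t last = hues.size() - 2; // index of the last interval start

    // Uniform guess: where h would sit if the samples were evenly spaced.
    const float scaled = h / 360.0f * static_cast<float>(hues.size() - 1);
    const std::size_t guess =
        scaled <= 0.0f ? 0 : std::min(static_cast<std::size_t>(scaled), last);

    // Restrict the binary search to [guess - radius, guess + radius].
    const std::size_t lo = guess > radius ? guess - radius : 0;
    const std::size_t hi = std::min(guess + radius, last);

    const auto first = hues.begin() + static_cast<std::ptrdiff_t>(lo);
    const auto end = hues.begin() + static_cast<std::ptrdiff_t>(hi + 1);
    const auto it = std::upper_bound(first, end, h); // first entry > h

    const std::size_t idx = static_cast<std::size_t>(it - hues.begin());
    return idx == 0 ? 0 : std::min(idx - 1, last);
}
```

With a window of only a few entries, the `std::upper_bound` call could be replaced by a short unrolled sequence of comparisons, which is the kind of unrolling suggested above.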

General observations

The algorithm makes heavy use of pow(), so it is heavily influenced by the system C library implementation.
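
For reference, the logarithm/exponent form of pow mentioned in the list above relies on the identity pow(x, y) = exp2(y * log2(x)) for positive x. The helper below is a hypothetical sketch, not code from this PR; whether it is actually faster than the library pow depends on the platform and would need measuring.

```cpp
#include <cassert>
#include <cmath>

// pow(x, y) expressed through log2/exp2; valid only for x > 0.
inline float pow_via_exp2(float x, float y)
{
    assert(x > 0.0f);
    return std::exp2(y * std::log2(x));
}
```

One potential benefit of this form is that when the same base is raised to several exponents, the log2 term can be computed once and shared.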

We need finer grained testing for C++ functions rather than at an "Op/Transform" level

GPU shader generation notes:

We may want to add wrappers for compatibility to the shader generator

log10()
atan2()

We have a sign() wrapper but none of the code uses it.
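
To make the wrapper idea concrete: GLSL has no log10 built-in and spells the two-argument arctangent atan(y, x), whereas HLSL provides log10 and atan2. The sketch below shows one way such wrappers could be emitted as shader text. The enum, the function names and the wrap_ prefix are all hypothetical and do not correspond to the existing OCIO shader generator API.

```cpp
#include <cassert>
#include <string>

// Hypothetical language tag for this sketch only.
enum class ShaderLang { GLSL, HLSL };

// Emits a log10 wrapper; GLSL uses log10(x) = log2(x) * log10(2).
inline std::string log10_wrapper(ShaderLang lang)
{
    switch (lang)
    {
    case ShaderLang::HLSL:
        return "float wrap_log10(float x) { return log10(x); }\n";
    case ShaderLang::GLSL:
        return "float wrap_log10(float x) { return log2(x) * 0.30102999566; }\n";
    }
    assert(false && "unknown shader language");
    return std::string();
}

// Emits an atan2 wrapper; GLSL spells it atan(y, x).
inline std::string atan2_wrapper(ShaderLang lang)
{
    switch (lang)
    {
    case ShaderLang::HLSL:
        return "float wrap_atan2(float y, float x) { return atan2(y, x); }\n";
    case ShaderLang::GLSL:
        return "float wrap_atan2(float y, float x) { return atan(y, x); }\n";
    }
    assert(false && "unknown shader language");
    return std::string();
}
```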

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

replace 100.0 entries when referring to the scale of J Extract calculation of nonlinear compression into functions Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

…te values Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

…CHANGES PIXEL OUTPUT Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

…th opponent calculation Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Avoid looking up cusp twice during inverse Whilst searching for the cusp we have already constrained the search so we do not need to clamp Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Migrate rescaling into tonescale s_2 parameter Rename model_gamma to reflect it is actually the inverse Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

…evel to avoid repeat init of precomputed values. Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

…e recomputation during inverse gamut mapping Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

…known. also reduces size of object on stack by not passing the whole table. Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

mark up conversion points from external inputs etc Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

move more magic constants into const variables factor some of the complex expressions into function (temporarily makes things slower) Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

restructure find_gamut_boundary_intersection to highlight common patterns. Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

rework get_focus_gain to directly computer the slope_gain Share calculation of analytical thereshold Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

…ulate J Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

next steps would be to factor hue into separate table to improve cache hits followed by redistribution to more uniform hues which should narrow search range Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

presmooth cusp values Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

…er pixel Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

…100% the same GPU still recalculates some values Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

…ional precomputation is now applied during shader generation Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Some minor micro optimisations. Further alignment of GPU with CPU code, Tests values need evaluating Some GPU results are different - TBD Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

alexfry · 2025-02-20T02:09:02Z

Getting some build errors on my side; Doug thinks it might be related to AVX intrinsics:

[ 17%] Building CXX object src/OpenColorIO/CMakeFiles/OpenColorIO.dir/ops/fixedfunction/FixedFunctionOpCPU.cpp.o
[ 17%] Building CXX object src/OpenColorIO/CMakeFiles/OpenColorIO.dir/ops/fixedfunction/FixedFunctionOpData.cpp.o
[ 17%] Building CXX object src/OpenColorIO/CMakeFiles/OpenColorIO.dir/ops/fixedfunction/FixedFunctionOpGPU.cpp.o
[ 17%] Building CXX object src/OpenColorIO/CMakeFiles/OpenColorIO.dir/ops/fixedfunction/FixedFunctionOp.cpp.o
In file included from /Users/afry/GitHub/OpenColorIO_kevin/src/OpenColorIO/ops/fixedfunction/ACES2/Transform.cpp:4:
In file included from /Users/afry/GitHub/OpenColorIO_kevin/src/OpenColorIO/ops/fixedfunction/ACES2/Transform.h:7:
/Users/afry/GitHub/OpenColorIO_kevin/src/OpenColorIO/ops/fixedfunction/ACES2/Common.h:84:44: warning: taking the max of unsigned zero and a value is always equal to the other value [-Wmax-unsigned-zero]
84 | return std::min(nominal_size - 1U, std::max(0U, entry));
| ^~~~~~~~ ~~
/Users/afry/GitHub/OpenColorIO_kevin/src/OpenColorIO/ops/fixedfunction/ACES2/Common.h:84:44: note: remove call to max function and unsigned zero argument
84 | return std::min(nominal_size - 1U, std::max(0U, entry));
| ^~~~~~~~ ~~~
In file included from /Users/afry/GitHub/OpenColorIO_kevin/src/OpenColorIO/ops/fixedfunction/FixedFunctionOpCPU.cpp:10:
In file included from /Users/afry/GitHub/OpenColorIO_kevin/src/OpenColorIO/ops/fixedfunction/ACES2/Transform.h:7:
/Users/afry/GitHub/OpenColorIO_kevin/src/OpenColorIO/ops/fixedfunction/ACES2/Common.h:84:44: warning: taking the max of unsigned zero and a value is always equal to the other value [-Wmax-unsigned-zero]
84 | return std::min(nominal_size - 1U, std::max(0U, entry));
| ^~~~~~~~ ~~
/Users/afry/GitHub/OpenColorIO_kevin/src/OpenColorIO/ops/fixedfunction/ACES2/Common.h:84:44: note: remove call to max function and unsigned zero argument
84 | return std::min(nominal_size - 1U, std::max(0U, entry));
| ^~~~~~~~ ~~~
/Users/afry/GitHub/OpenColorIO_kevin/src/OpenColorIO/ops/fixedfunction/ACES2/Transform.cpp:273:29: error: unknown type name '__m256'
273 | inline float hsum256_ps_avx(__m256 v) { // v = [ H G | F E | D C | B A ]
| ^
1 warning and 1 error generated.
[ 17%] Building CXX object src/OpenColorIO/CMakeFiles/OpenColorIO.dir/ops/gamma/GammaOpCPU.cpp.o
make[2]: *** [src/OpenColorIO/CMakeFiles/OpenColorIO.dir/ops/fixedfunction/ACES2/Transform.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
[ 18%] Building CXX object src/OpenColorIO/CMakeFiles/OpenColorIO.dir/ops/gamma/GammaOpData.cpp.o
In file included from /Users/afry/GitHub/OpenColorIO_kevin/src/OpenColorIO/ops/fixedfunction/FixedFunctionOpGPU.cpp:10:
In file included from /Users/afry/GitHub/OpenColorIO_kevin/src/OpenColorIO/ops/fixedfunction/ACES2/Transform.h:7:
/Users/afry/GitHub/OpenColorIO_kevin/src/OpenColorIO/ops/fixedfunction/ACES2/Common.h:84:44: warning: taking the max of unsigned zero and a value is always equal to the other value [-Wmax-unsigned-zero]
84 | return std::min(nominal_size - 1U, std::max(0U, entry));
| ^~~~~~~~ ~~
/Users/afry/GitHub/OpenColorIO_kevin/src/OpenColorIO/ops/fixedfunction/ACES2/Common.h:84:44: note: remove call to max function and unsigned zero argument
84 | return std::min(nominal_size - 1U, std::max(0U, entry));
| ^~~~~~~~ ~~~
1 warning generated.
1 warning generated.
make[1]: *** [src/OpenColorIO/CMakeFiles/OpenColorIO.dir/all] Error 2
make: *** [all] Error 2
[syd-silver0017:~/GitHub/OpenColorIO_kevin/build] afry%
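
The -Wmax-unsigned-zero warning in this log is separate from the __m256 error. Since `entry` is unsigned, `std::max(0U, entry)` always returns `entry`, so only the upper bound does any work. A minimal stand-alone sketch of the simplified clamp follows; the function name and signature are hypothetical, and only the expression itself is taken from the log.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-alone version of the clamp flagged at Common.h:84.
inline unsigned int clamp_to_table(unsigned int entry, unsigned int nominal_size)
{
    assert(nominal_size > 0U);
    // An unsigned value can never be below zero, so only the upper
    // bound needs clamping.
    return std::min(nominal_size - 1U, entry);
}
```

One caveat: if `entry` was produced by converting a negative signed value to unsigned, it wraps to a very large number and clamps to the top of the table rather than the bottom, with or without the `std::max` call. Any lower-bound handling would have to happen before the conversion.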

doug-walker · 2025-02-20T03:44:10Z

Thank you for this excellent work @KevinJW !

I've created a new branch here named aces2_optimization. I'd like to propose that you make your PR against that branch rather than main. That way we can merge what you've got so far and the group of us could work in parallel to propose further PRs against that branch.

Don't worry about the AVX-related compilation failures that Alex found on macOS; Cuneyt and I will fix those ASAP and pull in the Metal fix.

KevinJW · 2025-02-20T10:22:45Z

I note some of the failures relate to C++14 features, e.g. constexpr std::min/max. I thought we had set C++14 as our assumed base requirement, but it is not explicitly checked for in the CMake files, and the documentation still allows for C++11: https://github.com/AcademySoftwareFoundation/OpenColorIO/blob/main/docs/quick_start/installation.rst

There is a simple fix for this specific failure, but it does open up the supported matrix question.
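
For context, `std::min` and `std::max` only became constexpr in C++14. One C++11-compatible workaround is a pair of single-return constexpr helpers like the sketch below. The names are hypothetical, and this is not necessarily the fix that was applied in the PR.

```cpp
// C++11-compatible constexpr replacements for std::min / std::max.
// A C++11 constexpr function body may contain only a single return statement.
template <typename T>
constexpr T cx_min(T a, T b)
{
    return (b < a) ? b : a;
}

template <typename T>
constexpr T cx_max(T a, T b)
{
    return (a < b) ? b : a;
}

// Evaluated at compile time, so a C++11 compiler can verify them.
static_assert(cx_min(3, 7) == 3, "cx_min picks the smaller value");
static_assert(cx_max(3U, 7U) == 7U, "cx_max picks the larger value");
```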

… to it Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

KevinJW · 2025-02-20T13:24:35Z

I note some of the build failures appear to come from the build requesting certain instruction levels, e.g. sse2 or avx2, while the options that tell the compiler to target that architecture (e.g. -mavx2 in the case of GCC) don't appear on the command lines of the compilation steps. The code has preprocessor guards around the intrinsics, but they are enabled via OCIO_USE_AVX2 even though the compiler has not been allowed to issue those instruction classes.

Trying to force the flags into the specific source file property doesn't seem to work when I've tried it; perhaps somebody else with some familiarity would like to have a go.

Brute forcing the option into the CMAKE_CXX_FLAGS does work for my manual builds with gcc.
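
One defensive pattern for the mismatch described above is to require both the project switch and the compiler's own target macro before using intrinsics. The sketch below assumes GCC/Clang behaviour, where `__AVX2__` is defined only when -mavx2 (or an equivalent -march setting) is active. `SKETCH_HAVE_AVX` and `hsum8` are hypothetical names, and the horizontal sum is a generic formulation rather than the code from Transform.cpp.

```cpp
#include <cassert>

// Use intrinsics only when the build asked for them (OCIO_USE_AVX2) AND the
// compiler has actually been told to target them (__AVX2__).
#if defined(OCIO_USE_AVX2) && defined(__AVX2__)
#include <immintrin.h>
#define SKETCH_HAVE_AVX 1
#else
#define SKETCH_HAVE_AVX 0
#endif

// Sum of eight consecutive floats.
inline float hsum8(const float * v)
{
    assert(v != nullptr);
#if SKETCH_HAVE_AVX
    const __m256 x = _mm256_loadu_ps(v);
    const __m128 lo = _mm256_castps256_ps128(x);   // lower four lanes
    const __m128 hi = _mm256_extractf128_ps(x, 1); // upper four lanes
    __m128 s = _mm_add_ps(lo, hi);                 // four partial sums
    s = _mm_add_ps(s, _mm_movehl_ps(s, s));        // two partial sums
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 0x1));  // final sum in lane 0
    return _mm_cvtss_f32(s);
#else
    // Scalar fallback, also used on non-x86 targets.
    float sum = 0.0f;
    for (int i = 0; i < 8; ++i)
    {
        sum += v[i];
    }
    return sum;
#endif
}
```

With a guard like this, a build that sets OCIO_USE_AVX2 without passing the matching compiler flag falls back to the scalar path instead of failing on an unknown `__m256` type. It does not replace getting the flags onto the command line.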

doug-walker · 2025-02-21T00:08:46Z

Thank you Kevin! I will go ahead and merge this into the aces2_optimization branch. Cuneyt will now work on getting the CPU side finished, with unit tests passing, to unblock Alex. He will then proceed to get the GPU to match the CPU.

I suggest you raise the C++ 11 question on the OCIO Slack. The last time the TSC discussed it, we were open to raising to C++ 14, if there was a good justification.

* ACES 2.0 Output Transform performance optimisation (#2119)
* Extend ocioperf to take config file parameter on CLI Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Extend ocioconvert to take config on command line Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Extract tonescale_fwd function Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Extract inverse tonescale function Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Combine c and Z variables in J calculation exponent replace 100.0 entries when referring to the scale of J Extract calculation of nonlinear compression into functions Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Split RGB<->JMh function into two parts to expose opponent intermediate values Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Use function to compute matrix multiply for LMS calculations Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Remove unused member variable from JMhParams structure Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Combine chromatic adaptation weights into LMS matrix (and inverse) - CHANGES PIXEL OUTPUT Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Use matrix form for transforming cone responses to Aab Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Normalise the F_L parameter Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Remove ra and ba related variables to avoid them being out of sync with opponent calculation Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Make A<->J conversion function generic Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Deduplicate Y<->J conversions Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Factor JMh scaling parameters into Aab matrices Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* factor our references to PI, 360 and 180 constants Avoid looking up cusp twice during inverse Whilst searching for the cusp we have already constrained the search so we do not need to clamp Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Add functions to explain some of the calculations Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Further clarify when 100 means reference luminance Migrate rescaling into tonescale s_2 parameter Rename model_gamma to reflect it is actually the inverse Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* migrate init steps performed within other init functions to the top level to avoid repeat init of precomputed values. Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* extract some of the fixed values that only depend on the hue to reduce recomputation during inverse gamut mapping Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Avoid double lookup for reachMaxM value by resolving once the hue is known. also reduces size of object on stack by not passing the whole table. Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Push wrapping of hues to the boundary, mark up conversion points from external inputs etc Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Store gamma values as reciprocals move more magic constants into const variables factor some of the complex expressions into function (temporarily makes things slower) Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Add some missing includes to headers Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* minor cleanup to use std::array instead of plain array for test samples Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Inline reach boundary finding restructure find_gamut_boundary_intersection to highlight common patterns. Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Extract gamut mapper compression function rework get_focus_gain to directly computer the slope_gain Share calculation of analytical thereshold Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Rework gamut mapper to compress absolute M then only recalculate calculate J Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Precalculate maximum search range for cusp lookup next steps would be to factor hue into separate table to improve cache hits followed by redistribution to more uniform hues which should narrow search range Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Experiment with reusing slope calculations in gamut mapper presmooth cusp values Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Add a collection of TODO's Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Restore function mapping table index to hue Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Minor tweaks to tonescale inverse clamp Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Remove duplicate table whilst calculating upper hull gamma Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Add some additional sample points for the upper hull gamma finder Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Slight tidy up of gamma fitting code Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Experiment with alternate smin implementation Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Remove unused function and tidy up comments Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Extract hue search into separate function Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Extract hues into separate table, merge gamma values into their place (gamma values now sampled on cusp hue intervals). Removes extra texture from GPU path. Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Simplify upper hull gamma hue lookup to avoid unneeded lerping as we are sampling the table entries directly Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Split out tonescale function, minor tweaks to Aab->JMh Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Build tables more uniformly, needs some clean up and lots of testing Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Speed up reach corner finding by switching to testing against the Achromatic rather than J limit Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Speed up hull gamma finding by computing values which depend only on the test points and not the gamma values themselves Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Adjust GPU hue lookup to take advantage of more uniform distribution Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Fix GLSL compatibility with hue lookup Remove compiler warnings for unused parameters Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Attempt to simplify table generation code Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Explicilty allow GCC to perform additional optimisations - Needs some discussion Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Add extra entries to reach table to avoid needing to clamp to range during pixel processing Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* GPU move reach Max M sampling to avoid looking it up multiple times per pixel Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Remove smoothing from GPU path, it is baked into the csup Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Fix bug with reach lookup Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Try only wrap hues on input to the shaders Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* rework GPU camut compressor to follow the same algorithm as CPU. Not 100% the same GPU still recalculates some values Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Rework solve_J_intersect to have fewer div instructions Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Adjust GPU code to better align with CPU code's structure, some additional precomputation is now applied during shader generation Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Precompute more scaling factors into matrices and nonlinear functions Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Experiment with unsigned integers for array access Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Bypass one J-> A conversion by saving the Aab computed earlier Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Test intrinsics for compression Norm calculation Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Attempt to calculate sin/cos only once per pixel. Some minor micro optimisations. Further alignment of GPU with CPU code, Tests values need evaluating Some GPU results are different - TBD Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Remove unused parameters Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Try tree vectoriser for gcc Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Add Vectorise option for MSVC Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Remove unused function Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Constexpr std::max is only available in C++ 14 for now avoid the call to it Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Try to fir intrinsic based errors on osome build configurations Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Another C++ 14 usage fix Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Remove check for CLANG left over from testing Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
--------- Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
* Update ACES2 CPU non-SIMD path (#2122)
* - Commenting out the ACES2 SIMD implementation for now to focus on validity of the scalar math. For SIMD we need to do implement run-time switching logic too.
  - Slight improvements to the unit tests so that we print out the computed error metric as well as the actual and expected values. Helps to see the magnitude of the error.
  - FixedFunctionOpCPU and BuiltinTransform tests now produce error lines with the same structure & syntax, including the computed error.
  - Updated the expected values for ACES2 tests with the values the new optimized code produces, this makes all of the of CPU tests pass now.
  - For ACES2 ops and builtin transforms, the error threshold is increased to 1e-4
  - added few, temporary code snippets that dumps the currently produced results, making it easier to update the golden values if needed again.
  Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>
* - Fixing Linux build Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>
* Making Linux build happy is never easy. Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>
--------- Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>
* Address GPU unit test failures (#2123)
* - Weights for cos(3h) and sin(h) in chroma_compress_norm() looks wrong. Fixing the weights makes the GPU tests pass now (except of the inverse output transform which seems to have a separate issue).
  - If the new weights are correct, I'll need to update the CPU test target values too.
  Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>
* - Updating the expected values in the CPU tests Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>
* - The remaining GPU test failures were caused by a simple typo where we were passing h instead of J to ocio_tonescale_inv() function. With the fix all the unit tests are happy now.
  - Since we decided not to include any SIMD implementation in this version, I removed the conditional code paths and left the current SSE & AVX implementations as commented out for future guidence.
  Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>
--------- Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>
* Remove unused code for old gamut table calculations (#2124) Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com> Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>
* Minor code cleanup Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>
* Adding negative A trap on Aab_to_JMh_Shader() per code review Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>
* Adding copysign to tonescale to make it aligned with the CPU implementation. It's possible that on GPU we may never receive negative J due to prior guarding, but for now aligning with the CPU to be on the safer side. Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>
* Add built-in transform round-trip test Signed-off-by: Doug Walker <doug.walker@autodesk.com>
* Loosen tolerance for other machines Signed-off-by: Doug Walker <doug.walker@autodesk.com>
* Add GPU round-trip tests Signed-off-by: Doug Walker <doug.walker@autodesk.com>
* Loosen tolerances for other GPUs Signed-off-by: Doug Walker <doug.walker@autodesk.com>
---------
Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>
Signed-off-by: Doug Walker <doug.walker@autodesk.com>
Co-authored-by: Kevin Wheatley <kevin.wheatley@framestore.com>
Co-authored-by: Doug Walker <doug.walker@autodesk.com>

Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com> * - Fixing Linux build Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com> * Making Linux build happy is never easy. Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com> --------- Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com> * Address GPU unit test failures (AcademySoftwareFoundation#2123) * - Weights for cos(3h) and sin(h) in chroma_compress_norm() looks wrong. Fixing the weights makes the GPU tests pass now (except of the inverse output transform which seems to have a separate issue). - If the new weights are correct, I'll need to update the CPU test target values too. Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com> * - Updating the expected values in the CPU tests Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com> * - The remaining GPU test failures were caused by a simple typo where we were passing h instead of J to ocio_tonescale_inv() function. With the fix all the unit tests are happy now. - Since we decided not to include any SIMD implementation in this version, I removed the conditional code paths and left the current SSE & AVX implementations as commented out for future guidence. Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com> --------- Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com> * Remove unused code for old gamut table calculations (AcademySoftwareFoundation#2124) Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com> Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com> * Minor code cleanup Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com> * Adding negative A trap on Aab_to_JMh_Shader() per code review Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com> * Adding copysign to tonescale to make it aligned with the CPU implementation. It's possible that on GPU we may never receive negative J due to prior guarding, but for now aligning with the CPU to be on the safer side. 
Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com> * Add built-in transform round-trip test Signed-off-by: Doug Walker <doug.walker@autodesk.com> * Loosen tolerance for other machines Signed-off-by: Doug Walker <doug.walker@autodesk.com> * Add GPU round-trip tests Signed-off-by: Doug Walker <doug.walker@autodesk.com> * Loosen tolerances for other GPUs Signed-off-by: Doug Walker <doug.walker@autodesk.com> --------- Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com> Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com> Signed-off-by: Doug Walker <doug.walker@autodesk.com> Co-authored-by: Kevin Wheatley <kevin.wheatley@framestore.com> Co-authored-by: Doug Walker <doug.walker@autodesk.com> (cherry picked from commit 1931542) Signed-off-by: Doug Walker <doug.walker@autodesk.com>


KevinJW added 30 commits December 23, 2024 15:12

Extend ocioperf to take config file parameter on CLI

5c784aa

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Extend ocioconvert to take config on command line

da0ab98

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Extract tonescale_fwd function

b5ef4f2

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Extract inverse tonescale function

14244a8

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Combine c and Z variables in J calculation exponent

3733698

Replace 100.0 entries when referring to the scale of J. Extract calculation of nonlinear compression into functions. Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
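A minimal sketch of the idea behind this commit, assuming a CAM16-style lightness relation J = J_scale * (A / A_w)^(c*z). The struct, field names and numeric values here are hypothetical and are not taken from the OCIO sources; the point is only that the product c*z can be formed once at init and that the scale of J gets a name instead of a bare 100.0.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical parameter block; names and values are illustrative only.
struct JParams
{
    float A_w;      // achromatic response of the reference white
    float J_scale;  // named constant standing in for a bare 100.0
    float cz;       // surround factor c times exponent z, multiplied once at init
};

inline JParams init_params(float A_w, float c, float z)
{
    assert(A_w > 0.0f);  // the white point must have a positive achromatic response
    return JParams{A_w, 100.0f, c * z};
}

// Achromatic response -> lightness, using the precombined exponent.
inline float A_to_J(float A, const JParams & p)
{
    return p.J_scale * std::pow(A / p.A_w, p.cz);
}

// Lightness -> achromatic response (inverse of A_to_J for non-negative inputs).
inline float J_to_A(float J, const JParams & p)
{
    return p.A_w * std::pow(J / p.J_scale, 1.0f / p.cz);
}
```

With this layout the per-pixel code never multiplies c by z, and the inverse could go one step further by storing 1/cz as well.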

Split RGB<->JMh function into two parts to expose opponent intermedia…

2286c57

…te values Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Use function to compute matrix multiply for LMS calculations

a399575

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Remove unused member variable from JMhParams structure

dab7c7a

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Combine chromatic adaptation weights into LMS matrix (and inverse) - …

bb80581

…CHANGES PIXEL OUTPUT Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
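As a rough illustration of folding per-channel weights into a matrix: scaling each output channel of a 3x3 transform by a weight is the same as scaling the corresponding matrix row once, up front. The types and function names below are assumptions made for this sketch, not OCIO's own. Because the multiplications happen in a different order, floating-point rounding can differ slightly from the unfolded form, which is presumably why the commit flags a change in pixel output.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec3 = std::array<float, 3>;
using Mat3 = std::array<float, 9>;  // row-major 3x3

// Plain 3x3 matrix times vector.
inline Vec3 mul(const Mat3 & m, const Vec3 & v)
{
    return Vec3{{ m[0] * v[0] + m[1] * v[1] + m[2] * v[2],
                  m[3] * v[0] + m[4] * v[1] + m[5] * v[2],
                  m[6] * v[0] + m[7] * v[1] + m[8] * v[2] }};
}

// Returns diag(w) * m, i.e. row i of m scaled by w[i].
// Applying the result to a vector equals applying m and then weighting each channel.
inline Mat3 fold_row_weights(const Mat3 & m, const Vec3 & w)
{
    // Non-zero weights keep the folded matrix invertible whenever m is.
    assert(w[0] != 0.0f && w[1] != 0.0f && w[2] != 0.0f);
    Mat3 out = m;
    for (std::size_t row = 0; row < 3; ++row)
    {
        for (std::size_t col = 0; col < 3; ++col)
        {
            out[3 * row + col] *= w[row];
        }
    }
    return out;
}
```

The folded matrix would be built once during initialisation, so the per-pixel path performs a single matrix multiply with no separate weighting step.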

Use matrix form for transforming cone responses to Aab

2fb497b

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Normalise the F_L parameter

9e70f69

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Remove ra and ba related variables to avoid them being out of sync wi…

26a0881

…th opponent calculation Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Make A<->J conversion function generic

1f3b100

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Deduplicate Y<->J conversions

b621391

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Factor JMh scaling parameters into Aab matrices

593e2ca

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

factor out references to PI, 360 and 180 constants

db07f82

Avoid looking up cusp twice during inverse. Whilst searching for the cusp we have already constrained the search, so we do not need to clamp. Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Add functions to explain some of the calculations

c7415c7

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Further clarify when 100 means reference luminance

3651734

Migrate rescaling into tonescale s_2 parameter. Rename model_gamma to reflect it is actually the inverse. Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

migrate init steps performed within other init functions to the top l…

590b3eb

…evel to avoid repeat init of precomputed values. Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

extract some of the fixed values that only depend on the hue to reduc…

ae7ddaa

…e recomputation during inverse gamut mapping Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Avoid double lookup for reachMaxM value by resolving once the hue is …

f466d84

…known. also reduces size of object on stack by not passing the whole table. Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Push wrapping of hues to the boundary,

34794cd

mark up conversion points from external inputs etc Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Store gamma values as reciprocals

07cab5d

Move more magic constants into const variables. Factor some of the complex expressions into functions (temporarily makes things slower). Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
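One way to read "store gamma values as reciprocals" is sketched below: if the per-pixel code only ever needs pow(x, 1/gamma), the division can be done once when the table is built. The function names are hypothetical and this is not the OCIO implementation.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Convert a table of gamma values to reciprocals once, at init time.
inline std::vector<float> make_reciprocal_table(const std::vector<float> & gammas)
{
    std::vector<float> out;
    out.reserve(gammas.size());
    for (float g : gammas)
    {
        assert(g != 0.0f);  // a zero gamma has no reciprocal
        out.push_back(1.0f / g);
    }
    return out;
}

// Per-pixel use: evaluates pow(x, 1 / gamma) without a per-pixel divide.
inline float apply_inverse_gamma(float x, float inv_gamma)
{
    return std::pow(x, inv_gamma);
}
```

The trade-off is that any code needing the original gamma must now take a reciprocal itself, so this only pays off when the inverse form dominates.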

Add some missing includes to headers

a6c7800

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

minor cleanup to use std::array instead of plain array for test samples

e567701

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Inline reach boundary finding

4a9e1f7

restructure find_gamut_boundary_intersection to highlight common patterns. Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Extract gamut mapper compression function

b0f3962

Rework get_focus_gain to directly compute the slope_gain. Share calculation of analytical threshold. Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Rework gamut mapper to compress absolute M then only recalculate calc…

05da229

…ulate J Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Precalculate maximum search range for cusp lookup

f8498c6

next steps would be to factor hue into separate table to improve cache hits followed by redistribution to more uniform hues which should narrow search range Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
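The search-range idea in this commit can be sketched as follows, under stated assumptions: the hue table is sorted, starts at 0 and ends at 360 degrees, and is close enough to uniform that a direct index guess lands near the right interval. The worst-case distance between the guess and the true interval is measured once at init and bounds the per-pixel scan. `HueTable`, `uniform_guess`, `make_hue_table` and `find_interval` are names invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical table: ascending hues in degrees, first entry 0, last entry 360.
struct HueTable
{
    std::vector<float> hues;
    int max_offset;  // worst-case distance between the uniform guess and the true interval
};

// Interval index a perfectly uniform table would give, clamped to [0, n - 1].
inline int uniform_guess(float h, int n)
{
    const int g = static_cast<int>(h * (static_cast<float>(n) / 360.0f));
    return std::min(std::max(g, 0), n - 1);
}

// Init: measure how far the guess can be from the true interval.
// The guess is monotonic in h, so checking both ends of each interval is enough.
inline HueTable make_hue_table(const std::vector<float> & hues)
{
    assert(hues.size() >= 2 && hues.front() == 0.0f && hues.back() == 360.0f);
    const int n = static_cast<int>(hues.size()) - 1;  // number of intervals
    int max_offset = 0;
    for (int i = 0; i < n; ++i)
    {
        const int lo = std::abs(uniform_guess(hues[static_cast<std::size_t>(i)], n) - i);
        const int hi = std::abs(uniform_guess(hues[static_cast<std::size_t>(i) + 1], n) - i);
        max_offset = std::max(max_offset, std::max(lo, hi));
    }
    return HueTable{hues, max_offset};
}

// Per pixel: scan only the window [guess - max_offset, guess + max_offset].
// Returns i such that hues[i] <= h < hues[i + 1] (h == 360 maps to the last interval).
inline std::size_t find_interval(const HueTable & t, float h)
{
    assert(h >= 0.0f && h <= 360.0f);
    const int n = static_cast<int>(t.hues.size()) - 1;
    const int g = uniform_guess(h, n);
    int i = std::max(0, g - t.max_offset);
    const int last = std::min(n - 1, g + t.max_offset);
    while (i < last && t.hues[static_cast<std::size_t>(i) + 1] <= h)
    {
        ++i;
    }
    return static_cast<std::size_t>(i);
}
```

The more uniform the table, the smaller `max_offset` becomes, which is the motivation the commit message gives for redistributing the sampled hues.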

Experiment with reusing slope calculations in gamut mapper

90c6765

presmooth cusp values Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

KevinJW added 15 commits January 28, 2025 18:26

GPU move reach Max M sampling to avoid looking it up multiple times p…

a40be0c

…er pixel Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Remove smoothing from GPU path, it is baked into the cusp

a77c9f2

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Fix bug with reach lookup

f4d0641

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Try only wrap hues on input to the shaders

caf7b6c

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

rework GPU gamut compressor to follow the same algorithm as CPU. Not …

a4f4596

…100% the same GPU still recalculates some values Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Rework solve_J_intersect to have fewer div instructions

fd981f8

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Adjust GPU code to better align with CPU code's structure, some addit…

903a98d

…ional precomputation is now applied during shader generation Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Precompute more scaling factors into matrices and nonlinear functions

8213ba3

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Experiment with unsigned integers for array access

4a13c36

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Bypass one J-> A conversion by saving the Aab computed earlier

6d609bc

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Test intrinsics for compression Norm calculation

da17d7c

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Attempt to calculate sin/cos only once per pixel.

7e96bdf

Some minor micro optimisations. Further alignment of GPU with CPU code. Test values need evaluating. Some GPU results are different - TBD. Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
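One way to compute sin/cos only once per pixel is to cache the pair alongside the hue and pass it to every stage that converts between M and the (a, b) opponent axes. The sketch below assumes that structure; `HueTrig`, `make_hue_trig` and `M_to_ab` are hypothetical names rather than the functions in the pull request.

```cpp
#include <cmath>

// Cached trigonometric terms for one pixel's hue angle.
struct HueTrig
{
    float cos_h;
    float sin_h;
};

// Called once per pixel, where the hue is first known.
inline HueTrig make_hue_trig(float h_degrees)
{
    const float radians = h_degrees * 3.14159265358979f / 180.0f;
    return HueTrig{ std::cos(radians), std::sin(radians) };
}

// Any later stage that changes M but not h reuses the cached values
// instead of calling sin/cos again.
inline void M_to_ab(float M, const HueTrig & trig, float & a, float & b)
{
    a = M * trig.cos_h;
    b = M * trig.sin_h;
}
```

This only works for stages that leave the hue untouched, which is the case when a stage modifies J or M alone.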

Remove unused parameters

36db212

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Try tree vectoriser for gcc

e3aa39f

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Add Vectorise option for MSVC

07abe0a

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

KevinJW changed the base branch from main to aces2_optimization February 20, 2025 09:46

Remove unused function

abe5658

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

KevinJW added 4 commits February 20, 2025 10:26

Constexpr std::max is only available from C++14; for now avoid the call…

bb59154

… to it Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
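`std::max` only became `constexpr` in C++14, so code that must also build as C++11 cannot use it in constant expressions. A common workaround, shown here as a sketch with hypothetical names (`constexpr_max`, `kTableSize`), is a single-return helper, which satisfies the C++11 `constexpr` rules. The commit itself may simply have removed the call rather than adding a helper.

```cpp
// C++11-compatible replacement for std::max in constant expressions.
// A single return statement keeps it valid under C++11 constexpr rules.
template <typename T>
constexpr T constexpr_max(T a, T b)
{
    return (a < b) ? b : a;
}

// Example use: sizing a table at compile time (values are illustrative).
constexpr int kCornerSamples = 6;
constexpr int kMinimumSamples = 4;
constexpr int kTableSize = constexpr_max(kCornerSamples, kMinimumSamples);

static_assert(kTableSize == 6, "table size is the larger of the two counts");
```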

Try to fix intrinsic-based errors on some build configurations

e7c0dac

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Another C++ 14 usage fix

430043b

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

Remove check for CLANG left over from testing

eab4a65

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

doug-walker merged commit 6befba7 into AcademySoftwareFoundation:aces2_optimization Feb 21, 2025

doug-walker merged 70 commits into AcademySoftwareFoundation:aces2_optimization from KevinJW:ACES2.0_OutputTransformPerformance

KevinJW commented Feb 19, 2025

alexfry commented Feb 20, 2025

doug-walker commented Feb 20, 2025

KevinJW commented Feb 20, 2025

KevinJW commented Feb 20, 2025

doug-walker commented Feb 21, 2025 (edited)

3 participants
