argon2: optimize with AVX2 SIMD#440
Conversation
|
Ok, I was able to verify that I ported all the macros correctly (at least as far as I could tell), and I fixed some stuff. I'm still not entirely sure where I'm going wrong here, as the tests are still failing. |
|
I found out there was a way to tell the compiler to generate code based on a target feature being enabled with the https://godbolt.org/z/843z7eGYb demonstrates the code gen using this method, which is implemented in 70e34de. Running benchmarks on my Ryzen 9 3900X, it shows about a 20-30% performance improvement. Would this solution be acceptable? I can clean up the commit history here if it matters. |
|
So the compress function implementations are identical and the only difference is |
I've tried my hardest to get this to work, but I think I'm in a little over my head. The most I've managed to do is port the macros from the optimized reference version.
My main problem is that I have no idea how these indexes intostatework:https://github.com/P-H-C/phc-winner-argon2/blob/92cd2e1/src/opt.c#L94-L102If someone could give me a hand, I'd be happy to fix/finish this. This is the first time I've ever tried to do stuff with SIMD, so bear with me as I'm still learning.
related to #104