Optimized dataset read #211

Merged: tevador merged 3 commits into tevador:master from SChernykh:opt-dataset-read on May 22, 2021

@SChernykh
Collaborator

There was a false dependency on `readReg2` and `readReg3` (caused by the `xor rbp, rax` instruction) when reading a dataset item (see design.md, 4.6.2 Loop execution, steps 5 and 7). This change uses the `ma` register to read the dataset item before the whole of `rbp` (which holds both `ma` and `mx`) is modified, so a superscalar, out-of-order CPU can start the read earlier.

Results: https://i.imgur.com/Bpeq9mx.png

~1% speedup on modern Intel/AMD CPUs.
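
A minimal C++ sketch of the reordering, not the actual JIT code. It assumes `mx` sits in the low 32 bits and `ma` in the high 32 bits of one packed 64-bit value (standing in for `rbp`), that only the low 32 bits of `readReg2 ^ readReg3` are XORed into `mx`, and that the dataset offset mask is `0x7FFFFFC0`. The function names, the `DatasetStep` struct and the constant are hypothetical. Both orderings produce the same offsets, but in the second one the read offset is computed from `ma` alone and does not wait for `readReg2` or `readReg3`.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative mask (assumption): 64-byte-aligned offset inside a 2 GiB dataset.
constexpr std::uint32_t kDatasetMask = 0x7FFFFFC0u;

struct DatasetStep {
    std::uint32_t prefetchOffset; // derived from the updated mx
    std::uint32_t readOffset;     // derived from ma
    std::uint64_t packedNext;     // packed value after mx and ma are swapped
};

// Swap the two 32-bit halves of a 64-bit value (what `ror rbp, 32` would do).
inline std::uint64_t swapHalves(std::uint64_t v) {
    return (v >> 32) | (v << 32);
}

// Ordering with the false dependency: the read offset is taken from the
// packed value after the XOR, so it waits for readReg2 and readReg3.
inline DatasetStep stepReadAfterXor(std::uint64_t packed,
                                    std::uint64_t readReg2,
                                    std::uint64_t readReg3) {
    packed ^= static_cast<std::uint32_t>(readReg2 ^ readReg3); // changes mx only
    const std::uint32_t prefetch = static_cast<std::uint32_t>(packed) & kDatasetMask;
    packed = swapHalves(packed);                               // swap mx and ma
    const std::uint32_t read = static_cast<std::uint32_t>(packed) & kDatasetMask;
    return {prefetch, read, packed};
}

// Reordered version: ma is copied out before the packed value is modified,
// so the read offset depends only on the incoming packed value.
inline DatasetStep stepReadBeforeXor(std::uint64_t packed,
                                     std::uint64_t readReg2,
                                     std::uint64_t readReg3) {
    const std::uint32_t read = static_cast<std::uint32_t>(packed >> 32) & kDatasetMask;
    packed ^= static_cast<std::uint32_t>(readReg2 ^ readReg3); // changes mx only
    const std::uint32_t prefetch = static_cast<std::uint32_t>(packed) & kDatasetMask;
    packed = swapHalves(packed);                               // swap mx and ma
    return {prefetch, read, packed};
}

// Helper for checking that both orderings agree on every output.
inline bool orderingsAgree(std::uint64_t packed,
                           std::uint64_t readReg2,
                           std::uint64_t readReg3) {
    const DatasetStep a = stepReadAfterXor(packed, readReg2, readReg3);
    const DatasetStep b = stepReadBeforeXor(packed, readReg2, readReg3);
    return a.prefetchOffset == b.prefetchOffset &&
           a.readOffset == b.readOffset &&
           a.packedNext == b.packedNext;
}
```

The results are identical because the XOR never touches the half that holds `ma`. The change only shortens the dependency chain in front of the dataset load.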

SChernykh added 2 commits May 20, 2021 14:25
Break dependency from readReg2 and readReg3.
@tevador
Owner

tevador commented May 22, 2021

I can confirm a 0.6% speedup on Ryzen 3700X.

@tevador
Owner

tevador commented May 22, 2021

However, if you run randomx-tests, test 84 fails with an invalid hash.

@SChernykh
Collaborator Author

This is weird. I got a correct hash after 100k iterations, see screenshot. What's in test 84?

@SChernykh
Collaborator Author

I've just double-checked that randomx-benchmark.exe gives correct hashes. I'll look into it.

@tevador
Owner

tevador commented May 22, 2021

Everything works now.

tevador merged commit 3c8c7ee into tevador:master on May 22, 2021
malbit pushed a commit to malbit/RandomARQ that referenced this pull request Dec 1, 2021
* Optimized dataset read

* ARMv8: optimized dataset read

Break dependency from readReg2 and readReg3.

* Fixed light mode hashing