WilsonGaugeAction deriv #473
Merged
Bugfix/nvtx
* Patched version + modifications to deriv -> staple in qcd/gauge
* Cleaning up and aligning variable naming between action deriv versions
* Removing the regression test files that were also in this branch for a clean PR
* Reverting whitespace changes
* Fixing after reverting too much!

Co-authored-by: Mashy Green <mashy@me.com>
…uge test to be small (#22) Co-authored-by: Mashy Green <mashy@me.com> merging no-su3 patch
Thanks Mashy! Just to summarise the speedups in a table:
This PR includes two performance-related improvements in the WilsonGaugeAction `deriv()` function. The first moves the `PeekIndex` loop over the `GaugeLinkField` vector outside of `Staple()`, so that it is no longer redundantly repeated for each of the `Nd` directions. This change measurably improves the performance of the `deriv` function of the WilsonGaugeAction. The second ensures that the `staple` variable, which `Staple()` uses to update the derivative of the `GaugeLinkField` in each direction, is correctly initialised to zero without needless host-to-device (HtoD) and device-to-host (DtoH) copies. This resulted in dramatic performance improvements in `Staple()` calls. The PR also contains changes from PR #465 and #471, which were used to assess the performance gains in the WilsonFlow and sp2n test cases.
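To make the first change concrete, here is a hypothetical sketch of the two loop structures. Integers stand in for the per-direction link fields and `peekLink` stands in for `PeekIndex`; `PeekCounter`, `peekLink`, `stapleFromLinks` and the two `deriv*` functions are illustrative names, not Grid's API. The point is only the count: peeking inside each per-direction staple computation costs `Nd * Nd` extractions per `deriv()`, while hoisting the loop costs `Nd`.

```cpp
#include <array>
#include <cassert>

// Hypothetical sketch, not Grid's API: integers stand in for the
// per-direction link fields, and we only count how many times each loop
// structure extracts ("peeks") a link from the gauge field.
constexpr int Nd = 4;  // number of space-time directions

struct PeekCounter {
  int peeks = 0;
};

// Stand-in for PeekIndex<LorentzIndex>(U, mu): "extract" link U_mu.
int peekLink(PeekCounter& c, int mu) {
  ++c.peeks;
  return mu;
}

// Stand-in for the staple in direction mu, which needs the links in all
// other directions nu != mu.
int stapleFromLinks(const std::array<int, Nd>& links, int mu) {
  assert(mu >= 0 && mu < Nd);
  int s = 0;
  for (int nu = 0; nu < Nd; ++nu)
    if (nu != mu) s += links[nu];
  return s;
}

// Before: every per-direction staple computation re-extracts all Nd links,
// so one deriv() performs Nd * Nd peeks.
int derivPeekInside(PeekCounter& c) {
  int total = 0;
  for (int mu = 0; mu < Nd; ++mu) {
    std::array<int, Nd> links{};
    for (int nu = 0; nu < Nd; ++nu) links[nu] = peekLink(c, nu);
    total += stapleFromLinks(links, mu);
  }
  return total;
}

// After: the links are extracted once, outside the mu loop, and reused,
// so one deriv() performs only Nd peeks.
int derivPeekHoisted(PeekCounter& c) {
  std::array<int, Nd> links{};
  for (int nu = 0; nu < Nd; ++nu) links[nu] = peekLink(c, nu);
  int total = 0;
  for (int mu = 0; mu < Nd; ++mu) total += stapleFromLinks(links, mu);
  return total;
}
```

Both versions return the same total; with `Nd = 4` the first performs 16 peeks per call and the second 4. Since a real peek produces a lattice-sized link field, cutting the count by a factor of `Nd` is where the saving comes from.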
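The second change is about where the zeroing of `staple` happens. The toy model below assumes a field with a host copy and a device copy that are kept coherent on access: zeroing an accumulator through a host-side read-write view first pulls the current device data back (DtoH) only to overwrite it, and the next device access pushes it over again (HtoD). `ToyField`, `hostView`, `deviceView` and the two zeroing functions are hypothetical names, and the coherence rule is a simplification, not Grid's memory manager or view API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model (not Grid's API): a field with a host copy and a
// "device" copy, plus counters for the transfers a simple coherence scheme
// performs when one side is accessed after the other was modified.
class ToyField {
 public:
  explicit ToyField(std::size_t n) : host_(n, 1.0), device_(n, 1.0) {
    assert(n > 0);
  }

  // Host read-write access: if the device copy is newer, copy it back
  // first (DtoH); the device copy is then stale.
  std::vector<double>& hostView() {
    if (!hostValid_) { host_ = device_; ++dtoh; hostValid_ = true; }
    deviceValid_ = false;
    return host_;
  }

  // Device read-write access: if the host copy is newer, copy it over
  // first (HtoD); the host copy is then stale.
  std::vector<double>& deviceView() {
    if (!deviceValid_) { device_ = host_; ++htod; deviceValid_ = true; }
    hostValid_ = false;
    return device_;
  }

  // Raw read of the device copy for checking results (no coherence logic).
  const std::vector<double>& deviceData() const { return device_; }

  int htod = 0;  // host-to-device copies performed
  int dtoh = 0;  // device-to-host copies performed

 private:
  std::vector<double> host_, device_;
  bool hostValid_ = false;  // assume the field was last written on the device
  bool deviceValid_ = true;
};

// Pattern A: zero the staple accumulator on the host, then accumulate on
// the device. Zeroing costs a DtoH copy whose data is immediately
// overwritten, and the accumulation then needs an HtoD copy.
void zeroOnHostThenAccumulate(ToyField& staple) {
  for (double& x : staple.hostView()) x = 0.0;
  for (double& x : staple.deviceView()) x += 2.0;
}

// Pattern B: zero the accumulator where it already lives; no copies.
void zeroOnDeviceThenAccumulate(ToyField& staple) {
  for (double& x : staple.deviceView()) x = 0.0;
  for (double& x : staple.deviceView()) x += 2.0;
}
```

Run on fresh fields, both patterns leave identical device data, but Pattern A records one DtoH and one HtoD copy per call while Pattern B records none. Repeated for every staple in every direction, those avoidable round trips add up.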
Some minor variable renames were also applied to keep the GaugeActions consistent with the FermionActions and with the variable names modified or returned by the calling functions.
Three test cases were run on Tursa using Nvidia A100-80GB GPUs, each with the following settings:

* Single GPU: `--grid 24.24.24.48 --mpi 1.1.1.1 --Trajectories 5 --Thermalizations 3`
* Multi GPU: `--grid 32.32.32.64 --mpi 1.1.1.4 --Trajectories 5 --Thermalizations 3`

The plaquette value was verified to be identical before and after the changes, up to and including the last decimal place. The tests are:
Results:
* Test 1: As expected, there is no measurable change, since the WilsonGaugeAction `deriv` forms a very small part of this benchmark.
* Test 2: These tests use only the WilsonGaugeAction and show a large improvement of approximately 39% in the single-GPU case and 57% in the multi-GPU case.
* Test 3: These tests mix the WilsonGaugeAction and a FermionAction, with roughly a 50/50 split of the runtime between them on the develop branch. As expected, the improvements are about half of those seen in Test 2: 18% in the single-GPU case and 25% in the multi-GPU case.
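The "about half" for Test 3 follows from an Amdahl-style estimate. This assumes that "improvement" means a fractional reduction in wall-clock time, that a fraction $f$ of the develop-branch runtime is spent in the WilsonGaugeAction, and that this part is reduced by the fraction $r$ measured in Test 2 (where $f \approx 1$). Writing $T$ for the original runtime, the new runtime is $fT(1-r) + (1-f)T$, so:

```math
\frac{T_{\text{before}} - T_{\text{after}}}{T_{\text{before}}} = f\,r,
\qquad f \approx 0.5:\quad 0.5 \times 39\% \approx 19.5\%\ \text{(single GPU)},\qquad 0.5 \times 57\% \approx 28.5\%\ \text{(multi GPU)}.
```

The measured 18% and 25% sit slightly below these estimates.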
Additionally, we tested `/tests/sp2n/Test_hmc_Sp_WilsonFundFermionGauge.cc` with a range of settings, which showed approximately the same gains as Test 3.
For validation, we checked that the checksums of the `ckpoint_lat.1` files are identical before and after the changes for `/tests/sp2n/Test_hmc_Sp_WilsonFundFermionGauge.cc`, `test/hmc/Test_hmc_WilsonFermionGauge.cc` and `test/hmc/Test_hmc_IwasakiGauge.cc`.