WilsonGaugeAction deriv #473

Merged: paboyle merged 19 commits into paboyle:develop from UCL-ARC:gauge_action_deriv on Apr 24, 2025

@qiUip (Contributor) commented Feb 24, 2025

This PR includes two performance-related improvements to the WilsonGaugeAction deriv(). The first moves the PeekIndex loop that fills the GaugeLinkField vector outside of Staple(), so that it is no longer called redundantly for each of the Nd directions. This change measurably improves the performance of deriv() for the WilsonGaugeAction. The second ensures that the staple variable used in Staple() to update the derivative of the GaugeLinkField in each direction is initialised to zero correctly, without needless HtoD and DtoH copies. This resulted in dramatic performance improvements in Staple() calls.
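The first change can be illustrated with a toy model. The sketch below is not Grid code: `ToyGaugeField`, `ToyLinkField`, `peek`, `toy_staple`, `deriv_before` and `deriv_after` are hypothetical stand-ins, and the "staple" is just a sum over the other directions rather than a product of shifted SU(N) links. It only shows why extracting the per-direction links once, outside the staple routine, cuts the number of extractions from Nd × Nd to Nd while leaving the result unchanged.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int Nd = 4;  // number of space-time directions

// Hypothetical toy types: a gauge field holds Nd link values per site,
// a link field holds one value per site. These are NOT Grid's types.
using ToyGaugeField = std::vector<std::array<double, Nd>>;
using ToyLinkField = std::vector<double>;

static int peek_calls = 0;  // counts how often a direction is extracted

// Extract direction mu from the full field (plays the role of PeekIndex).
ToyLinkField peek(const ToyGaugeField& U, int mu) {
    ++peek_calls;
    ToyLinkField out(U.size());
    for (std::size_t s = 0; s < U.size(); ++s) out[s] = U[s][mu];
    return out;
}

// Toy "staple" for direction mu: sums the links of all other directions.
// The accumulator is zero-initialised when it is constructed.
ToyLinkField toy_staple(const std::array<ToyLinkField, Nd>& links, int mu) {
    assert(mu >= 0 && mu < Nd);
    ToyLinkField staple(links[mu].size(), 0.0);
    for (int nu = 0; nu < Nd; ++nu) {
        if (nu == mu) continue;
        for (std::size_t s = 0; s < staple.size(); ++s) staple[s] += links[nu][s];
    }
    return staple;
}

// Before: every staple evaluation re-extracts all Nd directions.
std::vector<ToyLinkField> deriv_before(const ToyGaugeField& U) {
    std::vector<ToyLinkField> result;
    for (int mu = 0; mu < Nd; ++mu) {
        std::array<ToyLinkField, Nd> links;
        for (int nu = 0; nu < Nd; ++nu) links[nu] = peek(U, nu);
        result.push_back(toy_staple(links, mu));
    }
    return result;
}

// After: the directions are extracted once and reused for every mu.
std::vector<ToyLinkField> deriv_after(const ToyGaugeField& U) {
    std::array<ToyLinkField, Nd> links;
    for (int nu = 0; nu < Nd; ++nu) links[nu] = peek(U, nu);
    std::vector<ToyLinkField> result;
    for (int mu = 0; mu < Nd; ++mu) result.push_back(toy_staple(links, mu));
    return result;
}
```

In this toy, `deriv_before` performs Nd × Nd = 16 extractions and `deriv_after` performs Nd = 4, and both return the same values. The toy does not model device memory, so it says nothing about the second change (avoiding HtoD and DtoH copies).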

The PR also contains the changes from PRs #465 and #471, which were used to assess the performance gains in the WilsonFlow and sp2n test cases.

Some minor variable name changes were applied as well to keep the GaugeActions consistent with FermionActions and the modified / returned variable names from the calling functions.

Three test cases were run on Tursa using NVIDIA A100-80GB GPUs, each with --grid 24.24.24.48 --mpi 1.1.1.1 --Trajectories 5 --Thermalizations 3 (single GPU) and --grid 32.32.32.64 --mpi 1.1.1.4 --Trajectories 5 --Thermalizations 3 (multi GPU). The plaquette value was verified to be identical, up to and including the last decimal place.

The tests are:

  1. HMC/Mobius2p1f.cc
  2. tests/smearing/Test_WilsonFlow.cc (using the configuration from the Mobius2p1f case)
  3. tests/hmc/Test_hmc_WilsonFermionGauge.cc

Results:

  • Test 1 Single GPU - Develop = 14093.66s, Gauge action deriv = 14045.44s;
  • Test 1 Multi GPU - Develop = 17834.63s, Gauge action deriv = 17834.75s;

As expected, there is no measurable change, as the WilsonGaugeAction deriv forms a very small part of this benchmark case.

  • Test 2 Single GPU - Develop = 168.96s, Gauge action deriv = 101.72s;
  • Test 2 Multi GPU - Develop = 225.55s, Gauge action deriv = 95.47s;

These tests consist entirely of WilsonGaugeAction work and show a large improvement of approximately 39% in the single-GPU case and 57% in the multi-GPU case.

  • Test 3 Single GPU - Develop = 1961.20s, Gauge action deriv = 1608.12s;
  • Test 3 Multi GPU - Develop = 2439.99s, Gauge action deriv = 1827.85s;

These tests are a mix of WilsonGaugeAction and FermionAction, with an approximately 50/50 split between them in the develop branch. As expected, the improvements are about half of those seen in Test 2: an 18% improvement in the single-GPU case and 25% in the multi-GPU case.
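The percentages quoted above can be reproduced from the timings. The sketch below uses two hypothetical helper functions (not part of Grid): `percent_reduction` computes the relative reduction in run time, and `expected_mixed_reduction` estimates the overall reduction when only a fraction of the run time benefits, taking the 50/50 split stated above as an assumption.

```cpp
#include <cassert>

// Percentage reduction in run time going from t_before to t_after (seconds).
double percent_reduction(double t_before, double t_after) {
    assert(t_before > 0.0);
    return 100.0 * (t_before - t_after) / t_before;
}

// Estimated overall reduction (in percent) when only `fraction` of the run
// time is affected and that part is reduced by `reduction_percent`.
// Assumes the unaffected part of the run time does not change.
double expected_mixed_reduction(double fraction, double reduction_percent) {
    assert(fraction >= 0.0 && fraction <= 1.0);
    return fraction * reduction_percent;
}
```

With the Test 2 timings, `percent_reduction(168.96, 101.72)` is about 39.8 and `percent_reduction(225.55, 95.47)` is about 57.7, so the quoted 39% and 57% appear to be truncated rather than rounded. With a 50% gauge fraction, the estimate for Test 3 is about 19.9% (single GPU) and 28.8% (multi GPU), against measured reductions of about 18.0% and 25.1%, which is consistent with "about half".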

Additionally, we tested /tests/sp2n/Test_hmc_Sp_WilsonFundFermionGauge.cc with a range of settings, which showed approximately the same gains as Test 3.

For validation, we checked that the checksums of the ckpoint_lat.1 files are identical before and after the changes for /tests/sp2n/Test_hmc_Sp_WilsonFundFermionGauge.cc, tests/hmc/Test_hmc_WilsonFermionGauge.cc and tests/hmc/Test_hmc_IwasakiGauge.cc.

@qiUip changed the title from "Gauge action deriv" to "WilsonGaugeAction deriv" Feb 24, 2025
@edbennett (Contributor) commented

Thanks Mashy!

Just to summarise the speedups in a table:

| Test | --mpi | Current develop time (s) | Time after this PR (s) | Speedup |
|---|---|---|---|---|
| Möbius 2+1f HMC | 1.1.1.1 | 14093.66 | 14045.44 | 0.3% |
| Möbius 2+1f HMC | 1.1.1.4 | 17834.63 | 17834.75 | 0.0% |
| Wilson flow | 1.1.1.1 | 168.96 | 101.72 | 39% |
| Wilson flow | 1.1.1.4 | 225.55 | 95.47 | 57% |
| Wilson 2f HMC | 1.1.1.1 | 1961.20 | 1608.12 | 18% |
| Wilson 2f HMC | 1.1.1.4 | 2439.99 | 1827.85 | 25% |

@paboyle paboyle merged commit ab3de50 into paboyle:develop Apr 24, 2025
0 of 3 checks passed

4 participants