klas - kernel launchers and SIMD vectorization #72
Closed
valassi wants to merge 155 commits into
…plify the code.
Prepare to improve kernel launchers by moving the C++ event loops further inside.
= 16384
NumThreadsPerBlock = 32
NumIterations = 1
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = THRUST::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Wavefunction GPU memory = LOCAL
Random number generation = CURAND DEVICE (CUDA code)
-----------------------------------------------------------------------
NumberOfEntries = 1
TotalTime[Rnd+Rmb+ME] (123)= ( 7.378591e-03 ) sec
TotalTime[Rambo+ME] (23)= ( 6.737728e-03 ) sec
TotalTime[RndNumGen] (1)= ( 6.408630e-04 ) sec
TotalTime[Rambo] (2)= ( 5.967797e-03 ) sec
TotalTime[MatrixElems] (3)= ( 7.699310e-04 ) sec
MeanTimeInMatrixElems = ( 7.699310e-04 ) sec
[Min,Max]TimeInMatrixElems = [ 7.699310e-04 , 7.699310e-04 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 524288
EvtsPerSec[Rnd+Rmb+ME](123)= ( 7.105530e+07 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 7.781377e+07 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 6.809545e+08 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 524288
MeanMatrixElemValue = ( 1.371958e-02 +- 1.132119e-05 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374915e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.197419e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
00 CudaFree : 1.084176 sec
0a ProcInit : 0.000522 sec
0b MemAlloc : 0.035510 sec
0c GenCreat : 0.009668 sec
0d SGoodHel : 0.001756 sec
1a GenSeed : 0.000012 sec
1b GenRnGen : 0.000629 sec
2a RamboIni : 0.000041 sec
2b RamboFin : 0.000013 sec
2c CpDTHwgt : 0.000475 sec
2d CpDTHmom : 0.005438 sec
3a SigmaKin : 0.000013 sec
3b CpDTHmes : 0.000757 sec
4a DumpLoop : 0.003222 sec
8a CompStat : 0.003654 sec
9a GenDestr : 0.000053 sec
9b DumpScrn : 0.000229 sec
9c DumpJson : 0.000008 sec
TOTAL : 1.146176 sec
TOTAL (123) : 0.007379 sec
TOTAL (23) : 0.006738 sec
TOTAL (1) : 0.000641 sec
TOTAL (2) : 0.005968 sec
TOTAL (3) : 0.000770 sec
***********************************************************************
./check.exe -p 16384 32 1
***********************************************************************
NumBlocksPerGrid = 16384
NumThreadsPerBlock = 32
NumIterations = 1
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = STD::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Random number generation = CURAND (C++ code)
-----------------------------------------------------------------------
NumberOfEntries = 1
TotalTime[Rnd+Rmb+ME] (123)= ( 1.512748e+00 ) sec
TotalTime[Rambo+ME] (23)= ( 1.477347e+00 ) sec
TotalTime[RndNumGen] (1)= ( 3.540115e-02 ) sec
TotalTime[Rambo] (2)= ( 1.121947e-01 ) sec
TotalTime[MatrixElems] (3)= ( 1.365152e+00 ) sec
MeanTimeInMatrixElems = ( 1.365152e+00 ) sec
[Min,Max]TimeInMatrixElems = [ 1.365152e+00 , 1.365152e+00 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 524288
EvtsPerSec[Rnd+Rmb+ME](123)= ( 3.465798e+05 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 3.548848e+05 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 3.840509e+05 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 524288
MeanMatrixElemValue = ( 1.371958e-02 +- 1.132119e-05 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374915e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.197419e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
0a ProcInit : 0.000329 sec
0b MemAlloc : 0.000044 sec
0c GenCreat : 0.000853 sec
1a GenSeed : 0.000008 sec
1b GenRnGen : 0.035393 sec
2a RamboIni : 0.016423 sec
2b RamboFin : 0.095772 sec
3a SigmaKin : 1.365152 sec
4a DumpLoop : 0.004525 sec
8a CompStat : 0.003041 sec
9a GenDestr : 0.000072 sec
9b DumpScrn : 0.000189 sec
9c DumpJson : 0.000009 sec
TOTAL : 1.521810 sec
TOTAL (123) : 1.512748 sec
TOTAL (23) : 1.477347 sec
TOTAL (1) : 0.035401 sec
TOTAL (2) : 0.112195 sec
TOTAL (3) : 1.365152 sec
***********************************************************************
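A minimal sketch of what moving the event loop inside the computational function could look like on the C++ side, assuming momenta are stored in pages of 4 events as the "AOSOA[4]" line of the log suggests. The names `momIndex`, `eventKernel`, `launchOuterLoop` and `kernelWithInnerLoop`, the layout constants and the index order are all hypothetical and are not taken from this PR's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout constants, for illustration only.
constexpr int npar = 4;  // external particles (e+ e- -> mu+ mu-)
constexpr int np4 = 4;   // components per 4-momentum: E, px, py, pz
constexpr int neppM = 4; // events per page, as in "AOSOA[4]"

// Assumed AOSOA index order: momenta[ipage][ipar][ip4][ieppM].
inline std::size_t momIndex(int ievt, int ipar, int ip4)
{
  const std::size_t ipage = ievt / neppM; // page holding this event
  const std::size_t ieppM = ievt % neppM; // slot of the event inside the page
  return ((ipage * npar + ipar) * np4 + ip4) * neppM + ieppM;
}

// Stand-in for the per-event physics: the sum of the particle energies.
inline double eventKernel(const double* momenta, int ievt)
{
  double sum = 0;
  for (int ipar = 0; ipar < npar; ++ipar) sum += momenta[momIndex(ievt, ipar, 0)];
  return sum;
}

// Before: the launcher owns the event loop and calls a one-event function.
inline void launchOuterLoop(const double* momenta, double* out, int nevt)
{
  for (int ievt = 0; ievt < nevt; ++ievt) out[ievt] = eventKernel(momenta, ievt);
}

// After: the event loop lives inside the function, one page at a time,
// so the innermost loop runs over neppM contiguous doubles.
inline void kernelWithInnerLoop(const double* momenta, double* out, int nevt)
{
  assert(nevt % neppM == 0); // this sketch handles whole pages only
  for (int ipage = 0; ipage < nevt / neppM; ++ipage)
  {
    double sum[neppM] = {};
    for (int ipar = 0; ipar < npar; ++ipar)
    {
      const double* energies = momenta + momIndex(ipage * neppM, ipar, 0);
      for (int ieppM = 0; ieppM < neppM; ++ieppM) sum[ieppM] += energies[ieppM];
    }
    for (int ieppM = 0; ieppM < neppM; ++ieppM) out[ipage * neppM + ieppM] = sum[ieppM];
  }
}
```

Both functions produce the same output; the second form simply gives the compiler an inner loop over contiguous events that it may be able to vectorize.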
…cal variable. This has a smaller performance degradation on GPU? Can get it back by using a local variable in calculate_wavefunctions, if needed.
./gcheck.exe -p 16384 32 1
***********************************************************************
NumBlocksPerGrid = 16384
NumThreadsPerBlock = 32
NumIterations = 1
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = THRUST::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Wavefunction GPU memory = LOCAL
Random number generation = CURAND DEVICE (CUDA code)
-----------------------------------------------------------------------
NumberOfEntries = 1
TotalTime[Rnd+Rmb+ME] (123)= ( 7.408483e-03 ) sec
TotalTime[Rambo+ME] (23)= ( 6.765105e-03 ) sec
TotalTime[RndNumGen] (1)= ( 6.433780e-04 ) sec
TotalTime[Rambo] (2)= ( 5.962489e-03 ) sec
TotalTime[MatrixElems] (3)= ( 8.026160e-04 ) sec
MeanTimeInMatrixElems = ( 8.026160e-04 ) sec
[Min,Max]TimeInMatrixElems = [ 8.026160e-04 , 8.026160e-04 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 524288
EvtsPerSec[Rnd+Rmb+ME](123)= ( 7.076861e+07 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 7.749887e+07 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 6.532240e+08 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 524288
MeanMatrixElemValue = ( 1.371958e-02 +- 1.132119e-05 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374915e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.197419e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
00 CudaFree : 0.687299 sec
0a ProcInit : 0.000427 sec
0b MemAlloc : 0.035461 sec
0c GenCreat : 0.012625 sec
0d SGoodHel : 0.001746 sec
1a GenSeed : 0.000010 sec
1b GenRnGen : 0.000633 sec
2a RamboIni : 0.000017 sec
2b RamboFin : 0.000016 sec
2c CpDTHwgt : 0.000506 sec
2d CpDTHmom : 0.005423 sec
3a SigmaKin : 0.000015 sec
3b CpDTHmes : 0.000787 sec
4a DumpLoop : 0.005516 sec
8a CompStat : 0.003650 sec
9a GenDestr : 0.000055 sec
9b DumpScrn : 0.000293 sec
9c DumpJson : 0.000008 sec
TOTAL : 0.754489 sec
TOTAL (123) : 0.007408 sec
TOTAL (23) : 0.006765 sec
TOTAL (1) : 0.000643 sec
TOTAL (2) : 0.005962 sec
TOTAL (3) : 0.000803 sec
***********************************************************************
./check.exe -p 16384 32 1
***********************************************************************
NumBlocksPerGrid = 16384
NumThreadsPerBlock = 32
NumIterations = 1
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = STD::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Random number generation = CURAND (C++ code)
-----------------------------------------------------------------------
NumberOfEntries = 1
TotalTime[Rnd+Rmb+ME] (123)= ( 1.509930e+00 ) sec
TotalTime[Rambo+ME] (23)= ( 1.475219e+00 ) sec
TotalTime[RndNumGen] (1)= ( 3.471084e-02 ) sec
TotalTime[Rambo] (2)= ( 1.109848e-01 ) sec
TotalTime[MatrixElems] (3)= ( 1.364234e+00 ) sec
MeanTimeInMatrixElems = ( 1.364234e+00 ) sec
[Min,Max]TimeInMatrixElems = [ 1.364234e+00 , 1.364234e+00 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 524288
EvtsPerSec[Rnd+Rmb+ME](123)= ( 3.472268e+05 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 3.553968e+05 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 3.843094e+05 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 524288
MeanMatrixElemValue = ( 1.371958e-02 +- 1.132119e-05 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374915e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.197419e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
0a ProcInit : 0.000314 sec
0b MemAlloc : 0.000042 sec
0c GenCreat : 0.000848 sec
1a GenSeed : 0.000008 sec
1b GenRnGen : 0.034703 sec
2a RamboIni : 0.016324 sec
2b RamboFin : 0.094661 sec
3a SigmaKin : 1.364234 sec
4a DumpLoop : 0.004442 sec
8a CompStat : 0.003013 sec
9a GenDestr : 0.000119 sec
9b DumpScrn : 0.000195 sec
9c DumpJson : 0.000007 sec
TOTAL : 1.518909 sec
TOTAL (123) : 1.509930 sec
TOTAL (23) : 1.475219 sec
TOTAL (1) : 0.034711 sec
TOTAL (2) : 0.110985 sec
TOTAL (3) : 1.364234 sec
***********************************************************************
…ions. There are at least two issues here:
- on both CPU and GPU, dividing by denominators should be done once, not on each helicity
- the helicity filtering in C++ uses a loop that is buggy (it gives up too early)
Note: essentially I am inverting the helicity and event loops, this is the key.
***********************************************************************
NumBlocksPerGrid = 16384
NumThreadsPerBlock = 32
NumIterations = 1
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = THRUST::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Wavefunction GPU memory = LOCAL
Random number generation = CURAND DEVICE (CUDA code)
-----------------------------------------------------------------------
NumberOfEntries = 1
TotalTime[Rnd+Rmb+ME] (123)= ( 7.218796e-03 ) sec
TotalTime[Rambo+ME] (23)= ( 6.566758e-03 ) sec
TotalTime[RndNumGen] (1)= ( 6.520380e-04 ) sec
TotalTime[Rambo] (2)= ( 5.775874e-03 ) sec
TotalTime[MatrixElems] (3)= ( 7.908840e-04 ) sec
MeanTimeInMatrixElems = ( 7.908840e-04 ) sec
[Min,Max]TimeInMatrixElems = [ 7.908840e-04 , 7.908840e-04 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 524288
EvtsPerSec[Rnd+Rmb+ME](123)= ( 7.262818e+07 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 7.983970e+07 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 6.629139e+08 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 524288
MeanMatrixElemValue = ( 1.371958e-02 +- 1.132119e-05 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374915e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.197419e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
00 CudaFree : 0.897224 sec
0a ProcInit : 0.000579 sec
0b MemAlloc : 0.037365 sec
0c GenCreat : 0.009812 sec
0d SGoodHel : 0.001844 sec
1a GenSeed : 0.000011 sec
1b GenRnGen : 0.000641 sec
2a RamboIni : 0.000033 sec
2b RamboFin : 0.000012 sec
2c CpDTHwgt : 0.000482 sec
2d CpDTHmom : 0.005249 sec
3a SigmaKin : 0.000013 sec
3b CpDTHmes : 0.000778 sec
4a DumpLoop : 0.005650 sec
8a CompStat : 0.003652 sec
9a GenDestr : 0.000068 sec
9b DumpScrn : 0.000303 sec
9c DumpJson : 0.000007 sec
TOTAL : 0.963722 sec
TOTAL (123) : 0.007219 sec
TOTAL (23) : 0.006567 sec
TOTAL (1) : 0.000652 sec
TOTAL (2) : 0.005776 sec
TOTAL (3) : 0.000791 sec
***********************************************************************
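The loop inversion and the single division by the denominator described in this commit message can be sketched as follows. This is only an illustration under stated assumptions: `me2ForHelicity` is a made-up stand-in for the real per-helicity calculation, and `ncomb`, `denominator` and both function names are hypothetical rather than copied from the PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr int ncomb = 16;           // helicity combinations (hypothetical value)
constexpr double denominator = 4.0; // averaging factor (hypothetical value)

// Made-up stand-in for the squared amplitude of one helicity for one event.
inline double me2ForHelicity(int ihel, double x)
{
  return (ihel % 2 == 0) ? x * (ihel + 1) : 0.0;
}

// Event loop outside, helicity loop inside, one division per helicity.
inline void sigmaKinEventOuter(const std::vector<double>& x, std::vector<double>& me)
{
  for (std::size_t ievt = 0; ievt < x.size(); ++ievt)
  {
    me[ievt] = 0;
    for (int ihel = 0; ihel < ncomb; ++ihel)
      me[ievt] += me2ForHelicity(ihel, x[ievt]) / denominator;
  }
}

// Inverted: helicity loop outside, event loop inside, one division per event.
inline void sigmaKinHelicityOuter(const std::vector<double>& x, std::vector<double>& me)
{
  for (std::size_t ievt = 0; ievt < x.size(); ++ievt) me[ievt] = 0;
  for (int ihel = 0; ihel < ncomb; ++ihel)
    for (std::size_t ievt = 0; ievt < x.size(); ++ievt)
      me[ievt] += me2ForHelicity(ihel, x[ievt]);
  for (std::size_t ievt = 0; ievt < x.size(); ++ievt) me[ievt] /= denominator;
}
```

The two orderings give the same matrix elements up to floating-point rounding; the inverted form replaces `ncomb` divisions per event with one and leaves the innermost loop running over events.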
The physics results are correct, but performance is degraded by almost a factor of 2 (the MatrixElems time is 2.32 s, up from 1.36 s). It looks like I am calculating too many helicities.
***********************************************************************
NumBlocksPerGrid = 16384
NumThreadsPerBlock = 32
NumIterations = 1
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = STD::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Random number generation = CURAND (C++ code)
-----------------------------------------------------------------------
NumberOfEntries = 1
TotalTime[Rnd+Rmb+ME] (123)= ( 2.463410e+00 ) sec
TotalTime[Rambo+ME] (23)= ( 2.428130e+00 ) sec
TotalTime[RndNumGen] (1)= ( 3.528046e-02 ) sec
TotalTime[Rambo] (2)= ( 1.116047e-01 ) sec
TotalTime[MatrixElems] (3)= ( 2.316525e+00 ) sec
MeanTimeInMatrixElems = ( 2.316525e+00 ) sec
[Min,Max]TimeInMatrixElems = [ 2.316525e+00 , 2.316525e+00 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 524288
EvtsPerSec[Rnd+Rmb+ME](123)= ( 2.128302e+05 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 2.159225e+05 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 2.263252e+05 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 524288
MeanMatrixElemValue = ( 1.371958e-02 +- 1.132119e-05 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374915e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.197419e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
0a ProcInit : 0.000353 sec
0b MemAlloc : 0.000047 sec
0c GenCreat : 0.000870 sec
0d SGoodHel : 0.000162 sec
1a GenSeed : 0.000009 sec
1b GenRnGen : 0.035272 sec
2a RamboIni : 0.016716 sec
2b RamboFin : 0.094888 sec
3a SigmaKin : 2.316525 sec
4a DumpLoop : 0.004477 sec
8a CompStat : 0.002808 sec
9a GenDestr : 0.000077 sec
9b DumpScrn : 0.000211 sec
9c DumpJson : 0.000007 sec
TOTAL : 2.472422 sec
TOTAL (123) : 2.463410 sec
TOTAL (23) : 2.428130 sec
TOTAL (1) : 0.035280 sec
TOTAL (2) : 0.111605 sec
TOTAL (3) : 2.316525 sec
***********************************************************************
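One possible way to stop calculating helicities that never contribute is to build a mask of "good" helicities once and skip the rest afterwards. The sketch below is a guess at the general idea, not the PR's implementation: `me2ForHelicity` is a made-up stand-in, and `findGoodHelicities`, `sumOverGoodHelicities` and `ncomb` are hypothetical names. The point it illustrates is that a helicity should only be declared bad after all events have been scanned, because it may vanish for the first event and still contribute for a later one.

```cpp
#include <cassert>
#include <vector>

constexpr int ncomb = 16; // helicity combinations (hypothetical value)

// Made-up stand-in: helicity 5 always contributes, helicity 10 contributes
// only for some events, every other helicity never contributes.
inline double me2ForHelicity(int ihel, double x)
{
  if (ihel == 5) return x;
  if (ihel == 10) return (x > 1.5) ? 2 * x : 0.0;
  return 0.0;
}

// Mark a helicity as good if ANY event gives a non-zero contribution.
// The break only ends the scan for a helicity already proven good; a
// helicity is never rejected before every event has been checked.
inline std::vector<bool> findGoodHelicities(const std::vector<double>& events)
{
  std::vector<bool> isGood(ncomb, false);
  for (int ihel = 0; ihel < ncomb; ++ihel)
    for (double x : events)
      if (me2ForHelicity(ihel, x) != 0.0)
      {
        isGood[ihel] = true;
        break;
      }
  return isGood;
}

// Sum only over the helicities kept by the mask.
inline double sumOverGoodHelicities(const std::vector<bool>& isGood, double x)
{
  double sum = 0;
  for (int ihel = 0; ihel < ncomb; ++ihel)
    if (isGood[ihel]) sum += me2ForHelicity(ihel, x);
  return sum;
}
```

With the two sample events `{1.0, 2.0}`, helicity 10 is zero for the first event and non-zero for the second, so a filter that stopped after the first event would wrongly drop it.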
./check.exe -p 16384 32 1
***********************************************************************
NumBlocksPerGrid = 16384
NumThreadsPerBlock = 32
NumIterations = 1
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = STD::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Random number generation = CURAND (C++ code)
-----------------------------------------------------------------------
NumberOfEntries = 1
TotalTime[Rnd+Rmb+ME] (123)= ( 1.451416e+00 ) sec
TotalTime[Rambo+ME] (23)= ( 1.423717e+00 ) sec
TotalTime[RndNumGen] (1)= ( 2.769905e-02 ) sec
TotalTime[Rambo] (2)= ( 1.000820e-01 ) sec
TotalTime[MatrixElems] (3)= ( 1.323635e+00 ) sec
MeanTimeInMatrixElems = ( 1.323635e+00 ) sec
[Min,Max]TimeInMatrixElems = [ 1.323635e+00 , 1.323635e+00 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 524288
EvtsPerSec[Rnd+Rmb+ME](123)= ( 3.612252e+05 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 3.682530e+05 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 3.960972e+05 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 524288
MeanMatrixElemValue = ( 1.371958e-02 +- 1.132119e-05 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374915e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.197419e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
0a ProcInit : 0.000317 sec
0b MemAlloc : 0.028313 sec
0c GenCreat : 0.000863 sec
0d SGoodHel : 0.000164 sec
1a GenSeed : 0.000011 sec
1b GenRnGen : 0.027688 sec
2a RamboIni : 0.006959 sec
2b RamboFin : 0.093123 sec
3a SigmaKin : 1.323635 sec
4a DumpLoop : 0.004347 sec
8a CompStat : 0.003050 sec
9a GenDestr : 0.000076 sec
9b DumpScrn : 0.000253 sec
9c DumpJson : 0.000007 sec
TOTAL : 1.488808 sec
TOTAL (123) : 1.451416 sec
TOTAL (23) : 1.423717 sec
TOTAL (1) : 0.027699 sec
TOTAL (2) : 0.100082 sec
TOTAL (3) : 1.323635 sec
***********************************************************************
…functions.
./gcheck.exe -p 16384 32 1
***********************************************************************
NumBlocksPerGrid = 16384
NumThreadsPerBlock = 32
NumIterations = 1
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = THRUST::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Wavefunction GPU memory = LOCAL
Random number generation = CURAND DEVICE (CUDA code)
-----------------------------------------------------------------------
NumberOfEntries = 1
TotalTime[Rnd+Rmb+ME] (123)= ( 7.351215e-03 ) sec
TotalTime[Rambo+ME] (23)= ( 6.713424e-03 ) sec
TotalTime[RndNumGen] (1)= ( 6.377910e-04 ) sec
TotalTime[Rambo] (2)= ( 5.919468e-03 ) sec
TotalTime[MatrixElems] (3)= ( 7.939560e-04 ) sec
MeanTimeInMatrixElems = ( 7.939560e-04 ) sec
[Min,Max]TimeInMatrixElems = [ 7.939560e-04 , 7.939560e-04 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 524288
EvtsPerSec[Rnd+Rmb+ME](123)= ( 7.131991e+07 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 7.809547e+07 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 6.603489e+08 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 524288
MeanMatrixElemValue = ( 1.371958e-02 +- 1.132119e-05 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374915e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.197419e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
00 CudaFree : 0.686566 sec
0a ProcInit : 0.000418 sec
0b MemAlloc : 0.034625 sec
0c GenCreat : 0.012608 sec
0d SGoodHel : 0.001840 sec
1a GenSeed : 0.000013 sec
1b GenRnGen : 0.000625 sec
2a RamboIni : 0.000017 sec
2b RamboFin : 0.000012 sec
2c CpDTHwgt : 0.000512 sec
2d CpDTHmom : 0.005378 sec
3a SigmaKin : 0.000013 sec
3b CpDTHmes : 0.000781 sec
4a DumpLoop : 0.005419 sec
8a CompStat : 0.003564 sec
9a GenDestr : 0.000099 sec
9b DumpScrn : 0.000212 sec
9c DumpJson : 0.000007 sec
TOTAL : 0.752710 sec
TOTAL (123) : 0.007351 sec
TOTAL (23) : 0.006713 sec
TOTAL (1) : 0.000638 sec
TOTAL (2) : 0.005919 sec
TOTAL (3) : 0.000794 sec
***********************************************************************
./check.exe -p 16384 32 1
***********************************************************************
NumBlocksPerGrid = 16384
NumThreadsPerBlock = 32
NumIterations = 1
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = STD::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Random number generation = CURAND (C++ code)
-----------------------------------------------------------------------
NumberOfEntries = 1
TotalTime[Rnd+Rmb+ME] (123)= ( 1.510862e+00 ) sec
TotalTime[Rambo+ME] (23)= ( 1.483222e+00 ) sec
TotalTime[RndNumGen] (1)= ( 2.763983e-02 ) sec
TotalTime[Rambo] (2)= ( 9.924585e-02 ) sec
TotalTime[MatrixElems] (3)= ( 1.383976e+00 ) sec
MeanTimeInMatrixElems = ( 1.383976e+00 ) sec
[Min,Max]TimeInMatrixElems = [ 1.383976e+00 , 1.383976e+00 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 524288
EvtsPerSec[Rnd+Rmb+ME](123)= ( 3.470125e+05 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 3.534790e+05 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 3.788273e+05 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 524288
MeanMatrixElemValue = ( 1.371958e-02 +- 1.132119e-05 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374915e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.197419e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
0a ProcInit : 0.000333 sec
0b MemAlloc : 0.027811 sec
0c GenCreat : 0.000846 sec
0d SGoodHel : 0.000151 sec
1a GenSeed : 0.000009 sec
1b GenRnGen : 0.027630 sec
2a RamboIni : 0.006760 sec
2b RamboFin : 0.092486 sec
3a SigmaKin : 1.383976 sec
4a DumpLoop : 0.004520 sec
8a CompStat : 0.003015 sec
9a GenDestr : 0.000075 sec
9b DumpScrn : 0.000257 sec
9c DumpJson : 0.000010 sec
TOTAL : 1.547881 sec
TOTAL (123) : 1.510862 sec
TOTAL (23) : 1.483222 sec
TOTAL (1) : 0.027640 sec
TOTAL (2) : 0.099246 sec
TOTAL (3) : 1.383976 sec
***********************************************************************
…ixx/oxx functions." This reverts commit b63320f.
Note: with this older implementation, there are 55 lines from "objdump -d -C CPPProcess.o | egrep 'vaddpd|vmul|vfmadd132pd|ymm' | wc -l"
…hin the ixx/oxx functions."" This reverts commit c075db4.
Note: with this newer implementation, there are 126 lines from "objdump -d -C CPPProcess.o | egrep 'vaddpd|vmul|vfmadd132pd|ymm' | wc -l"
A positive effect of SIMD vectorization on performance is not there yet (one would also have to migrate the FFV functions, which requires RRRRIIII), but this is a first proof of concept that the changes go in the right direction.
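For readers unfamiliar with the "RRRRIIII" shorthand, here is a minimal sketch of one way to read it: a vector of 4 complex numbers stored as 4 real parts followed by 4 imaginary parts, instead of interleaved real/imaginary pairs. The type `ComplexV`, the constant `neppV` and the `operator*` below are hypothetical illustrations, not the types used in the PR.

```cpp
#include <cassert>

constexpr int neppV = 4; // doubles in one 256-bit AVX2 register

// "RRRRIIII": all real parts first, then all imaginary parts.
struct alignas(32) ComplexV
{
  double re[neppV]; // RRRR
  double im[neppV]; // IIII
};
static_assert(sizeof(ComplexV) == 2 * neppV * sizeof(double), "no padding expected");

// Element-wise complex product. Each statement works on whole lanes of
// reals or imaginaries, with no shuffling between them, which is the kind
// of loop a compiler may turn into packed multiplies and adds.
inline ComplexV operator*(const ComplexV& a, const ComplexV& b)
{
  ComplexV c;
  for (int k = 0; k < neppV; ++k)
  {
    c.re[k] = a.re[k] * b.re[k] - a.im[k] * b.im[k];
    c.im[k] = a.re[k] * b.im[k] + a.im[k] * b.re[k];
  }
  return c;
}
```

With an interleaved layout (RIRIRIRI, as in an array of `std::complex<double>`) the same product needs the real and imaginary parts to be separated first, which is why a split layout is usually friendlier to SIMD.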
This was referenced Nov 29, 2020
./gcheck.exe -p 16384 32 1
***********************************************************************
NumBlocksPerGrid = 16384
NumThreadsPerBlock = 32
NumIterations = 1
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = THRUST::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Wavefunction GPU memory = LOCAL
Random number generation = CURAND DEVICE (CUDA code)
-----------------------------------------------------------------------
NumberOfEntries = 1
TotalTime[Rnd+Rmb+ME] (123)= ( 7.653209e-03 ) sec
TotalTime[Rambo+ME] (23)= ( 7.009831e-03 ) sec
TotalTime[RndNumGen] (1)= ( 6.433780e-04 ) sec
TotalTime[Rambo] (2)= ( 6.199672e-03 ) sec
TotalTime[MatrixElems] (3)= ( 8.101590e-04 ) sec
MeanTimeInMatrixElems = ( 8.101590e-04 ) sec
[Min,Max]TimeInMatrixElems = [ 8.101590e-04 , 8.101590e-04 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 524288
EvtsPerSec[Rnd+Rmb+ME](123)= ( 6.850564e+07 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 7.479324e+07 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 6.471421e+08 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 524288
MeanMatrixElemValue = ( 1.371958e-02 +- 1.132119e-05 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374915e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.197419e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
00 CudaFree : 0.687987 sec
0a ProcInit : 0.000422 sec
0b MemAlloc : 0.034919 sec
0c GenCreat : 0.011849 sec
0d SGoodHel : 0.001837 sec
1a GenSeed : 0.000013 sec
1b GenRnGen : 0.000631 sec
2a RamboIni : 0.000019 sec
2b RamboFin : 0.000013 sec
2c CpDTHwgt : 0.000519 sec
2d CpDTHmom : 0.005649 sec
3a SigmaKin : 0.000015 sec
3b CpDTHmes : 0.000795 sec
4a DumpLoop : 0.005283 sec
8a CompStat : 0.003659 sec
9a GenDestr : 0.000051 sec
9b DumpScrn : 0.000226 sec
9c DumpJson : 0.000007 sec
TOTAL : 0.753894 sec
TOTAL (123) : 0.007653 sec
TOTAL (23) : 0.007010 sec
TOTAL (1) : 0.000643 sec
TOTAL (2) : 0.006200 sec
TOTAL (3) : 0.000810 sec
***********************************************************************
./check.exe -p 16384 32 1
***********************************************************************
NumBlocksPerGrid = 16384
NumThreadsPerBlock = 32
NumIterations = 1
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = STD::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Random number generation = CURAND (C++ code)
-----------------------------------------------------------------------
NumberOfEntries = 1
TotalTime[Rnd+Rmb+ME] (123)= ( 1.467684e+00 ) sec
TotalTime[Rambo+ME] (23)= ( 1.439913e+00 ) sec
TotalTime[RndNumGen] (1)= ( 2.777057e-02 ) sec
TotalTime[Rambo] (2)= ( 9.937939e-02 ) sec
TotalTime[MatrixElems] (3)= ( 1.340534e+00 ) sec
MeanTimeInMatrixElems = ( 1.340534e+00 ) sec
[Min,Max]TimeInMatrixElems = [ 1.340534e+00 , 1.340534e+00 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 524288
EvtsPerSec[Rnd+Rmb+ME](123)= ( 3.572214e+05 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 3.641109e+05 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 3.911039e+05 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 524288
MeanMatrixElemValue = ( 1.371958e-02 +- 1.132119e-05 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374915e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.197419e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
0a ProcInit : 0.000319 sec
0b MemAlloc : 0.027103 sec
0c GenCreat : 0.000906 sec
0d SGoodHel : 0.000186 sec
1a GenSeed : 0.000009 sec
1b GenRnGen : 0.027762 sec
2a RamboIni : 0.006784 sec
2b RamboFin : 0.092595 sec
3a SigmaKin : 1.340534 sec
4a DumpLoop : 0.004592 sec
8a CompStat : 0.003529 sec
9a GenDestr : 0.000083 sec
9b DumpScrn : 0.000231 sec
9c DumpJson : 0.000011 sec
TOTAL : 1.504643 sec
TOTAL (123) : 1.467684 sec
TOTAL (23) : 1.439913 sec
TOTAL (1) : 0.027771 sec
TOTAL (2) : 0.099379 sec
TOTAL (3) : 1.340534 sec
***********************************************************************
There is more vectorization, but it segfaults...
./check.exe -p 16384 32 1
Segmentation fault (core dumped)
objdump -d -C check.exe | egrep 'vaddpd|vmul|vfmadd132pd|ymm' | wc -l
216
Using avx2 in -march, valgrind at least tells me it is a General Protection Fault
==481028== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==481028== General Protection Fault
==481028==    at 0x40876E: MG5_sm::oxzxxxM0(double const*, int, int, std::complex<double> (*) [4], int, int) (in /afs/cern.ch/user/a/avalassi/GPU2020/madgraph4gpu/epoch1/cuda/ee_mumu/SubProcesses/P1_Sigma_sm_epem_mupmum/check.exe)
==481028==    by 0x40A48C: Proc::calculate_wavefunctions(int, double const*, double*, int) (in /afs/cern.ch/user/a/avalassi/GPU2020/madgraph4gpu/epoch1/cuda/ee_mumu/SubProcesses/P1_Sigma_sm_epem_mupmum/check.exe)
==481028==    by 0x40AD16: Proc::sigmaKin_getGoodHel(double const*, double*, bool*, int) (in /afs/cern.ch/user/a/avalassi/GPU2020/madgraph4gpu/epoch1/cuda/ee_mumu/SubProcesses/P1_Sigma_sm_epem_mupmum/check.exe)
==481028==    by 0x405207: main (in /afs/cern.ch/user/a/avalassi/GPU2020/madgraph4gpu/epoch1/cuda/ee_mumu/SubProcesses/P1_Sigma_sm_epem_mupmum/check.exe)
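The commit message does not say what causes the fault. One frequent cause of a General Protection Fault in AVX code, offered here purely as a hypothesis to check, is an aligned 256-bit load or store on a buffer that is not 32-byte aligned. A minimal sketch of how one could test and enforce alignment follows; `kAvx2Align`, `isAligned` and `allocAligned` are hypothetical helpers, not functions from this code base.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

constexpr std::size_t kAvx2Align = 32; // bytes in one 256-bit AVX2 register

// True if the pointer sits on an `alignment`-byte boundary.
inline bool isAligned(const void* p, std::size_t alignment)
{
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Allocate room for n doubles on a 32-byte boundary. std::aligned_alloc
// (C++17) expects the size to be a multiple of the alignment, so the byte
// count is rounded up. Release the buffer with std::free.
inline double* allocAligned(std::size_t n)
{
  std::size_t bytes = n * sizeof(double);
  bytes = (bytes + kAvx2Align - 1) / kAvx2Align * kAvx2Align;
  return static_cast<double*>(std::aligned_alloc(kAvx2Align, bytes));
}
```

An `assert(isAligned(ptr, kAvx2Align))` at the top of a vectorized function is a cheap way to confirm or rule out this hypothesis before digging further. Note that `std::aligned_alloc` is not provided by every standard library (MSVC, for instance, does not implement it).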
Performance is 1.25E6, slightly better than gcc9 1.15E6 but lower than Fortran 1.50E6
time ./build.none/check.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid = 2048
NumThreadsPerBlock = 256
NumIterations = 12
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = STD::COMPLEX
RanNumb memory layout = AOSOA[8] [HARDCODED FOR REPRODUCIBILITY]
Momenta memory layout = AOSOA[1] == AOS
Internal loops fptype_sv = VECTOR[1] == SCALAR (no SIMD)
Random number generation = CURAND (C++ code)
OMP threads / `nproc --all` = 1 / 4
MatrixElements compiler = clang 10.0.0
-----------------------------------------------------------------------
NumberOfEntries = 12
TotalTime[Rnd+Rmb+ME] (123) = ( 7.234199e+00 ) sec
TotalTime[Rambo+ME] (23) = ( 6.911213e+00 ) sec
TotalTime[RndNumGen] (1) = ( 3.229851e-01 ) sec
TotalTime[Rambo] (2) = ( 1.849719e+00 ) sec
TotalTime[MatrixElems] (3) = ( 5.061495e+00 ) sec
MeanTimeInMatrixElems = ( 4.217912e-01 ) sec
[Min,Max]TimeInMatrixElems = [ 4.214358e-01 , 4.223094e-01 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 6291456
EvtsPerSec[Rnd+Rmb+ME](123) = ( 8.696825e+05 ) sec^-1
EvtsPerSec[Rmb+ME] (23) = ( 9.103258e+05 ) sec^-1
EvtsPerSec[MatrixElems] (3) = ( 1.243004e+06 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374925e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.202858e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
0a ProcInit : 0.000383 sec
0b MemAlloc : 0.070821 sec
0c GenCreat : 0.000904 sec
0d SGoodHel : 0.000438 sec
1a GenSeed : 0.000030 sec
1b GenRnGen : 0.322956 sec
2a RamboIni : 0.081141 sec
2b RamboFin : 1.768578 sec
3a SigmaKin : 5.061495 sec
4a DumpLoop : 0.074358 sec
8a CompStat : 0.084354 sec
9a GenDestr : 0.000020 sec
9b DumpScrn : 0.009514 sec
9c DumpJson : 0.000002 sec
TOTAL : 7.474991 sec
TOTAL (123) : 7.234199 sec
TOTAL (23) : 6.911214 sec
TOTAL (1) : 0.322985 sec
TOTAL (2) : 1.849719 sec
TOTAL (3) : 5.061495 sec
***********************************************************************
real 0m7.499s
user 0m7.376s
sys 0m0.121s
time ./build.none/gcheck.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid = 2048
NumThreadsPerBlock = 256
NumIterations = 12
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = THRUST::COMPLEX
RanNumb memory layout = AOSOA[8] [HARDCODED FOR REPRODUCIBILITY]
Momenta memory layout = AOSOA[4]
Random number generation = CURAND DEVICE (CUDA code)
MatrixElements compiler = nvcc 11.0.221
-----------------------------------------------------------------------
NumberOfEntries = 12
TotalTime[Rnd+Rmb+ME] (123) = ( 9.123791e-02 ) sec
TotalTime[Rambo+ME] (23) = ( 8.373227e-02 ) sec
TotalTime[RndNumGen] (1) = ( 7.505641e-03 ) sec
TotalTime[Rambo] (2) = ( 7.402575e-02 ) sec
TotalTime[MatrixElems] (3) = ( 9.706521e-03 ) sec
MeanTimeInMatrixElems = ( 8.088767e-04 ) sec
[Min,Max]TimeInMatrixElems = [ 8.009510e-04 , 8.176020e-04 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 6291456
EvtsPerSec[Rnd+Rmb+ME](123) = ( 6.895660e+07 ) sec^-1
EvtsPerSec[Rmb+ME] (23) = ( 7.513777e+07 ) sec^-1
EvtsPerSec[MatrixElems] (3) = ( 6.481680e+08 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374925e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.202858e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
00 CudaFree : 0.802752 sec
0a ProcInit : 0.000472 sec
0b MemAlloc : 0.032316 sec
0c GenCreat : 0.009958 sec
0d SGoodHel : 0.002051 sec
1a GenSeed : 0.000017 sec
1b GenRnGen : 0.007489 sec
2a RamboIni : 0.000106 sec
2b RamboFin : 0.000051 sec
2c CpDTHwgt : 0.006522 sec
2d CpDTHmom : 0.067347 sec
3a SigmaKin : 0.000081 sec
3b CpDTHmes : 0.009625 sec
4a DumpLoop : 0.079669 sec
8a CompStat : 0.046016 sec
9a GenDestr : 0.000063 sec
9b DumpScrn : 0.000268 sec
9c DumpJson : 0.000002 sec
TOTAL : 1.064805 sec
TOTAL (123) : 0.091238 sec
TOTAL (23) : 0.083732 sec
TOTAL (1) : 0.007506 sec
TOTAL (2) : 0.074026 sec
TOTAL (3) : 0.009707 sec
***********************************************************************
real 0m1.365s
user 0m0.447s
sys 0m0.478s
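The throughput figures quoted in these logs can be cross-checked from the other fields. The sketch below assumes that TotalEventsComputed is the product of blocks, threads per block and iterations, and that EvtsPerSec[MatrixElems] is that count divided by TotalTime[MatrixElems]; both relations are consistent with the numbers printed above, but the helper names `totalEvents` and `eventsPerSecond` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Events processed in one run: blocks x threads per block x iterations.
inline long totalEvents(long blocks, long threadsPerBlock, long iterations)
{
  return blocks * threadsPerBlock * iterations;
}

// Throughput in events per second for a given elapsed time in seconds.
inline double eventsPerSecond(long nevt, double seconds)
{
  assert(seconds > 0);
  return static_cast<double>(nevt) / seconds;
}
```

For `-p 2048 256 12`, `totalEvents(2048, 256, 12)` gives 6291456, matching TotalEventsComputed, and dividing by the 5.061495 s spent in MatrixElems by the clang build gives about 1.243E6 events per second, in line with the log.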
…hout SIMD!
Now AVX=none with gcc9 gives 1.28E6; it was 1.15E6 (remember that Fortran gives 1.50E6). This means that with AVX=none, gcc9 and clang10 are completely comparable. Note however that the speedup from AVX=none to AVX=avx2 is lower than 4: 4.40E6 / 1.28E6 is only 3.4, so we can do better...
time ./build.none/check.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid = 2048
NumThreadsPerBlock = 256
NumIterations = 12
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = STD::COMPLEX
RanNumb memory layout = AOSOA[8] [HARDCODED FOR REPRODUCIBILITY]
Momenta memory layout = AOSOA[1] == AOS
Internal loops fptype_sv = VECTOR[1] == SCALAR (no SIMD)
Random number generation = CURAND (C++ code)
OMP threads / `nproc --all` = 1 / 4
MatrixElements compiler = gcc (GCC) 9.2.0
-----------------------------------------------------------------------
NumberOfEntries = 12
TotalTime[Rnd+Rmb+ME] (123) = ( 7.160223e+00 ) sec
TotalTime[Rambo+ME] (23) = ( 6.836318e+00 ) sec
TotalTime[RndNumGen] (1) = ( 3.239050e-01 ) sec
TotalTime[Rambo] (2) = ( 1.939587e+00 ) sec
TotalTime[MatrixElems] (3) = ( 4.896731e+00 ) sec
MeanTimeInMatrixElems = ( 4.080609e-01 ) sec
[Min,Max]TimeInMatrixElems = [ 4.074413e-01 , 4.092229e-01 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 6291456
EvtsPerSec[Rnd+Rmb+ME](123) = ( 8.786676e+05 ) sec^-1
EvtsPerSec[Rmb+ME] (23) = ( 9.202989e+05 ) sec^-1
EvtsPerSec[MatrixElems] (3) = ( 1.284828e+06 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374925e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.202858e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
0a ProcInit : 0.000369 sec
0b MemAlloc : 0.070329 sec
0c GenCreat : 0.000909 sec
0d SGoodHel : 0.000105 sec
1a GenSeed : 0.000026 sec
1b GenRnGen : 0.323879 sec
2a RamboIni : 0.077785 sec
2b RamboFin : 1.861802 sec
3a SigmaKin : 4.896730 sec
4a DumpLoop : 0.073871 sec
8a CompStat : 0.025105 sec
9a GenDestr : 0.000082 sec
9b DumpScrn : 0.008952 sec
9c DumpJson : 0.000006 sec
TOTAL : 7.339950 sec
TOTAL (123) : 7.160223 sec
TOTAL (23) : 6.836318 sec
TOTAL (1) : 0.323905 sec
TOTAL (2) : 1.939587 sec
TOTAL (3) : 4.896730 sec
***********************************************************************
real 0m7.362s
user 0m7.236s
sys 0m0.123s
time ./build.avx2/check.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid = 2048
NumThreadsPerBlock = 256
NumIterations = 12
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = STD::COMPLEX
RanNumb memory layout = AOSOA[8] [HARDCODED FOR REPRODUCIBILITY]
Momenta memory layout = AOSOA[4]
Internal loops fptype_sv = VECTOR[4] (AVX2)
Random number generation = CURAND (C++ code)
OMP threads / `nproc --all` = 1 / 4
MatrixElements compiler = gcc (GCC) 9.2.0
-----------------------------------------------------------------------
NumberOfEntries = 12
TotalTime[Rnd+Rmb+ME] (123) = ( 3.598255e+00 ) sec
TotalTime[Rambo+ME] (23) = ( 3.275359e+00 ) sec
TotalTime[RndNumGen] (1) = ( 3.228953e-01 ) sec
TotalTime[Rambo] (2) = ( 1.845746e+00 ) sec
TotalTime[MatrixElems] (3) = ( 1.429614e+00 ) sec
MeanTimeInMatrixElems = ( 1.191345e-01 ) sec
[Min,Max]TimeInMatrixElems = [ 1.187156e-01 , 1.201074e-01 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 6291456
EvtsPerSec[Rnd+Rmb+ME](123) = ( 1.748474e+06 ) sec^-1
EvtsPerSec[Rmb+ME] (23) = ( 1.920844e+06 ) sec^-1
EvtsPerSec[MatrixElems] (3) = ( 4.400809e+06 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374925e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.202858e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
0a ProcInit : 0.000379 sec
0b MemAlloc : 0.070129 sec
0c GenCreat : 0.000908 sec
0d SGoodHel : 0.000100 sec
1a GenSeed : 0.000025 sec
1b GenRnGen : 0.322871 sec
2a RamboIni : 0.110108 sec
2b RamboFin : 1.735638 sec
3a SigmaKin : 1.429614 sec
4a DumpLoop : 0.075421 sec
8a CompStat : 0.024105 sec
9a GenDestr : 0.000091 sec
9b DumpScrn : 0.008895 sec
9c DumpJson : 0.000002 sec
TOTAL : 3.778286 sec
TOTAL (123) : 3.598255 sec
TOTAL (23) : 3.275360 sec
TOTAL (1) : 0.322895 sec
TOTAL (2) : 1.845746 sec
TOTAL (3) : 1.429614 sec
***********************************************************************
real 0m3.799s
user 0m3.677s
sys 0m0.120s
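To make the `VECTOR[4] (AVX2)` line and the 3.4 speedup figure concrete, here is a small sketch of what processing four events per instruction can look like with the GCC/Clang vector extension, together with the speedup arithmetic from the two check.exe runs above. This is an illustration under stated assumptions, not the code in this PR: the type name `double4` and the helpers `mulAdd4`, `simdSpeedup` and `laneEfficiency` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Four doubles packed into one 256-bit vector (GCC/Clang vector extension).
// The name double4 is hypothetical, chosen for this sketch only.
typedef double double4 __attribute__((vector_size(32)));

// Element-wise a*b + c on four values (e.g. four events) at once.
// Arguments are passed by reference to avoid ABI warnings when the
// translation unit is compiled without AVX enabled.
inline void mulAdd4(const double4& a, const double4& b, const double4& c, double4& out)
{
  out = a * b + c;
}

// Ratio of SIMD throughput to scalar throughput.
inline double simdSpeedup(double scalarEvtsPerSec, double simdEvtsPerSec)
{
  return simdEvtsPerSec / scalarEvtsPerSec;
}

// Fraction of the ideal speedup (one per vector lane) actually achieved.
inline double laneEfficiency(double speedup, int nLanes)
{
  return speedup / nLanes;
}
```

With the logged values, `simdSpeedup(1.284828e+06, 4.400809e+06)` is about 3.43, which is roughly 86% of the ideal factor of 4 for four double-precision lanes.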
Fix conflicts in epoch1/cuda/ee_mumu/src/Makefile
…omp.sp
Fix conflicts in epoch1/cuda/ee_mumu/SubProcesses/Makefile
NB: this commit uses -lgomp for both gcc and clang (removing an if). However, in this branch -lgomp was already used for both (the if was a no-op) because of a mistake in a previous merge commit.
Fix conflicts in epoch1/cuda/ee_mumu/SubProcesses/Makefile
This merge essentially merges the clang PR madgraph5#134
/cvmfs/sft.cern.ch/lcg/releases/gcc/10.1.0-6f386/x86_64-centos7/bin/g++ -O3 -std=c++17 -I. -I../../src -I../../../../../tools -I../../../../../test/googletest/googletest/include -I../../../../../test/include -Wall -Wshadow -Wextra -fopenmp -DMGONGPU_COMMONRAND_ONHOST -ffast-math -c runTest.cc -o build.none/runTest.o
runTest.cc: In member function ‘virtual double CPUTest::getMomentum(std::size_t, unsigned int, unsigned int) const’:
runTest.cc:86:98: error: invalid types ‘double[const long unsigned int]’ for array subscript
86 | ::npar * mgOnGpu::np4 + particle * mgOnGpu::np4 + component][ieppM];
| ^
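The error `invalid types 'double[const long unsigned int]' for array subscript` means that the expression before `[ieppM]` already has type `double`, so a second subscript cannot be applied to it: the buffer is being read as an array of scalars, while the code indexes it as if each element were itself an array of `ieppM` entries. One way to read an AOSOA buffer through a plain `double` pointer is to fold the within-page index into a single flat offset, as sketched below. This is an assumption-laden illustration, not the fix applied in this PR: the names `npar`, `np4` and `ieppM` appear in the error message, while `neppM`, `ipagM`, the constant values, the `[page][particle][component][event-in-page]` ordering and the `sketch` namespace are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch
{
  // Assumed values for this illustration only.
  constexpr std::size_t npar = 4;  // particles per event (2 -> 2 process)
  constexpr std::size_t np4 = 4;   // components per particle: E, px, py, pz
  constexpr std::size_t neppM = 4; // events per page, as in AOSOA[4]

  // Flat offset into an AOSOA buffer laid out as
  // momenta[ipagM][particle][component][ieppM].
  inline std::size_t aosoaIndex(std::size_t ievt, std::size_t particle, std::size_t component)
  {
    const std::size_t ipagM = ievt / neppM; // page containing this event
    const std::size_t ieppM = ievt % neppM; // position of the event in its page
    return ipagM * npar * np4 * neppM + particle * np4 * neppM + component * neppM + ieppM;
  }

  // Scalar accessor using a single subscript on the flat buffer.
  inline double getMomentum(const std::vector<double>& momenta,
                            std::size_t ievt, std::size_t particle, std::size_t component)
  {
    return momenta[aosoaIndex(ievt, particle, component)];
  }
}
```

Because the accessor uses one subscript on a `double` buffer, it compiles regardless of whether the build uses scalar or vector types internally.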
Note however that the test segfaults on gcc10, but not on gcc9.
This happens in both the none and avx2 builds.
While waiting to merge the PR, I also included a few patches that were in the further branch klas2. Without them,
Fix conflicts: epoch1/cuda/ee_mumu/SubProcesses/P1_Sigma_sm_epem_mupmum/runTest.cc
See #71
This is work in progress... comments welcome (preferably on the PR, or also directly on the code)