Merge epoch2 and epoch1 - second part (still without CPPProcess) #149
…value in implementation)
… - copy it to ep2

Epoch2 before fastmath:
time ./check.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid = 2048
NumThreadsPerBlock = 256
NumIterations = 12
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = STD::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Random number generation = CURAND (C++ code)
-----------------------------------------------------------------------
NumberOfEntries = 12
TotalTime[Rnd+Rmb+ME] (123)= ( 9.806006e+00 ) sec
TotalTime[Rambo+ME] (23)= ( 9.456839e+00 ) sec
TotalTime[RndNumGen] (1)= ( 3.491671e-01 ) sec
TotalTime[Rambo] (2)= ( 2.018251e+00 ) sec
TotalTime[MatrixElems] (3)= ( 7.438588e+00 ) sec
MeanTimeInMatrixElems = ( 6.198823e-01 ) sec
[Min,Max]TimeInMatrixElems = [ 6.183559e-01 , 6.259246e-01 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 6291456
EvtsPerSec[Rnd+Rmb+ME](123)= ( 6.415921e+05 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 6.652811e+05 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 8.457864e+05 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071581e-03 , 3.374925e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.200854e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
0a ProcInit : 0.000397 sec
0b MemAlloc : 0.000043 sec
0c GenCreat : 0.000955 sec
1a GenSeed : 0.000031 sec
1b GenRnGen : 0.349136 sec
2a RamboIni : 0.138318 sec
2b RamboFin : 1.879934 sec
3a SigmaKin : 7.438588 sec
4a DumpLoop : 0.087978 sec
8a CompStat : 0.045155 sec
9a GenDestr : 0.000113 sec
9b DumpScrn : 0.000223 sec
9c DumpJson : 0.000001 sec
TOTAL : 9.940873 sec
TOTAL (123) : 9.806006 sec
TOTAL (23) : 9.456840 sec
TOTAL (1) : 0.349167 sec
TOTAL (2) : 2.018251 sec
TOTAL (3) : 7.438588 sec
***********************************************************************
real 0m9.971s
user 0m9.812s
sys 0m0.157s

Epoch2 after fastmath: NOT FASTER (?!)
time ./check.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid = 2048
NumThreadsPerBlock = 256
NumIterations = 12
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = STD::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Random number generation = CURAND (C++ code)
-----------------------------------------------------------------------
NumberOfEntries = 12
TotalTime[Rnd+Rmb+ME] (123)= ( 9.747692e+00 ) sec
TotalTime[Rambo+ME] (23)= ( 9.397507e+00 ) sec
TotalTime[RndNumGen] (1)= ( 3.501850e-01 ) sec
TotalTime[Rambo] (2)= ( 1.976519e+00 ) sec
TotalTime[MatrixElems] (3)= ( 7.420988e+00 ) sec
MeanTimeInMatrixElems = ( 6.184157e-01 ) sec
[Min,Max]TimeInMatrixElems = [ 6.178201e-01 , 6.216142e-01 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 6291456
EvtsPerSec[Rnd+Rmb+ME](123)= ( 6.454303e+05 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 6.694814e+05 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 8.477922e+05 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071581e-03 , 3.374925e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.200854e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
0a ProcInit : 0.000400 sec
0b MemAlloc : 0.000043 sec
0c GenCreat : 0.001004 sec
1a GenSeed : 0.000032 sec
1b GenRnGen : 0.350153 sec
2a RamboIni : 0.140705 sec
2b RamboFin : 1.835814 sec
3a SigmaKin : 7.420989 sec
4a DumpLoop : 0.083478 sec
8a CompStat : 0.045091 sec
9a GenDestr : 0.000119 sec
9b DumpScrn : 0.000269 sec
9c DumpJson : 0.000001 sec
TOTAL : 9.878097 sec
TOTAL (123) : 9.747692 sec
TOTAL (23) : 9.397507 sec
TOTAL (1) : 0.350185 sec
TOTAL (2) : 1.976519 sec
TOTAL (3) : 7.420989 sec
***********************************************************************
real 0m9.908s
user 0m9.769s
sys 0m0.138s
…osmetics and copy ep1 to ep2

What ep1 had which is now added also to ep2: OMP, fastmath, Wextra, clang patch, host info

Using fastmath also here, the speed does increase in epoch2 (note that HelAmps is compiled here via an include, so it makes sense)

Epoch2:
time ./check.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid = 2048
NumThreadsPerBlock = 256
NumIterations = 12
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = STD::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Random number generation = CURAND (C++ code)
-----------------------------------------------------------------------
NumberOfEntries = 12
TotalTime[Rnd+Rmb+ME] (123)= ( 8.066252e+00 ) sec
TotalTime[Rambo+ME] (23)= ( 7.716077e+00 ) sec
TotalTime[RndNumGen] (1)= ( 3.501755e-01 ) sec
TotalTime[Rambo] (2)= ( 1.981157e+00 ) sec
TotalTime[MatrixElems] (3)= ( 5.734920e+00 ) sec
MeanTimeInMatrixElems = ( 4.779100e-01 ) sec
[Min,Max]TimeInMatrixElems = [ 4.771928e-01 , 4.813840e-01 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 6291456
EvtsPerSec[Rnd+Rmb+ME](123)= ( 7.799726e+05 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 8.153698e+05 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 1.097043e+06 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071581e-03 , 3.374925e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.200854e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
0a ProcInit : 0.000383 sec
0b MemAlloc : 0.000041 sec
0c GenCreat : 0.001009 sec
1a GenSeed : 0.000049 sec
1b GenRnGen : 0.350127 sec
2a RamboIni : 0.137961 sec
2b RamboFin : 1.843195 sec
3a SigmaKin : 5.734920 sec
4a DumpLoop : 0.085327 sec
8a CompStat : 0.027027 sec
9a GenDestr : 0.000147 sec
9b DumpScrn : 0.000251 sec
9c DumpJson : 0.000001 sec
TOTAL : 8.180439 sec
TOTAL (123) : 8.066252 sec
TOTAL (23) : 7.716077 sec
TOTAL (1) : 0.350176 sec
TOTAL (2) : 1.981157 sec
TOTAL (3) : 5.734920 sec
***********************************************************************
real 0m8.211s
user 0m8.072s
sys 0m0.137s

Note that epoch1 is always a bit faster...

Epoch1:
time ./check.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid = 2048
NumThreadsPerBlock = 256
NumIterations = 12
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = STD::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Random number generation = CURAND (C++ code)
OMP threads / `nproc --all` = 1 / 4
MatrixElements compiler = gcc (GCC) 9.2.0
-----------------------------------------------------------------------
NumberOfEntries = 12
TotalTime[Rnd+Rmb+ME] (123) = ( 7.710680e+00 ) sec
TotalTime[Rambo+ME] (23) = ( 7.382994e+00 ) sec
TotalTime[RndNumGen] (1) = ( 3.276863e-01 ) sec
TotalTime[Rambo] (2) = ( 1.939835e+00 ) sec
TotalTime[MatrixElems] (3) = ( 5.443159e+00 ) sec
MeanTimeInMatrixElems = ( 4.535966e-01 ) sec
[Min,Max]TimeInMatrixElems = [ 4.533969e-01 , 4.538179e-01 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 6291456
EvtsPerSec[Rnd+Rmb+ME](123) = ( 8.159405e+05 ) sec^-1
EvtsPerSec[Rmb+ME] (23) = ( 8.521551e+05 ) sec^-1
EvtsPerSec[MatrixElems] (3) = ( 1.155846e+06 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374925e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.200854e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
0a ProcInit : 0.000411 sec
0b MemAlloc : 0.074275 sec
0c GenCreat : 0.000958 sec
1a GenSeed : 0.000023 sec
1b GenRnGen : 0.327663 sec
2a RamboIni : 0.100796 sec
2b RamboFin : 1.839039 sec
3a SigmaKin : 5.443159 sec
4a DumpLoop : 0.082644 sec
8a CompStat : 0.027072 sec
9a GenDestr : 0.000104 sec
9b DumpScrn : 0.013933 sec
9c DumpJson : 0.000006 sec
TOTAL : 7.910083 sec
TOTAL (123) : 7.710680 sec
TOTAL (23) : 7.382994 sec
TOTAL (1) : 0.327686 sec
TOTAL (2) : 1.939835 sec
TOTAL (3) : 5.443159 sec
***********************************************************************
real 0m7.939s
user 0m7.790s
sys 0m0.147s

Conversely, epoch2 is 10% faster than epoch1 in CUDA???

Epoch2:
time ./gcheck.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid = 2048
NumThreadsPerBlock = 256
NumIterations = 12
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = THRUST::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Wavefunction GPU memory = LOCAL
Random number generation = CURAND DEVICE (CUDA code)
-----------------------------------------------------------------------
NumberOfEntries = 12
TotalTime[Rnd+Rmb+ME] (123)= ( 1.042367e-01 ) sec
TotalTime[Rambo+ME] (23)= ( 9.679775e-02 ) sec
TotalTime[RndNumGen] (1)= ( 7.438907e-03 ) sec
TotalTime[Rambo] (2)= ( 8.743204e-02 ) sec
TotalTime[MatrixElems] (3)= ( 9.365707e-03 ) sec
MeanTimeInMatrixElems = ( 7.804756e-04 ) sec
[Min,Max]TimeInMatrixElems = [ 7.767680e-04 , 7.837020e-04 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 6291456
EvtsPerSec[Rnd+Rmb+ME](123)= ( 6.035742e+07 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 6.499589e+07 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 6.717545e+08 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071581e-03 , 3.374925e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.200854e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
00 CudaFree : 2.037707 sec
0a ProcInit : 0.000523 sec
0b MemAlloc : 0.035856 sec
0c GenCreat : 0.009784 sec
0d SGoodHel : 0.001597 sec
1a GenSeed : 0.000021 sec
1b GenRnGen : 0.007418 sec
2a RamboIni : 0.000088 sec
2b RamboFin : 0.000045 sec
2c CpDTHwgt : 0.007396 sec
2d CpDTHmom : 0.079903 sec
3a SigmaKin : 0.000087 sec
3b CpDTHmes : 0.009279 sec
4a DumpLoop : 0.087360 sec
8a CompStat : 0.044967 sec
9a GenDestr : 0.000068 sec
9b DumpScrn : 0.000254 sec
9c DumpJson : 0.000002 sec
TOTAL : 2.322353 sec
TOTAL (123) : 0.104237 sec
TOTAL (23) : 0.096798 sec
TOTAL (1) : 0.007439 sec
TOTAL (2) : 0.087432 sec
TOTAL (3) : 0.009366 sec
***********************************************************************
real 0m2.630s
user 0m0.426s
sys 0m0.781s

Epoch1:
time ./gcheck.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid = 2048
NumThreadsPerBlock = 256
NumIterations = 12
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = THRUST::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Wavefunction GPU memory = LOCAL
Random number generation = CURAND DEVICE (CUDA code)
MatrixElements compiler = nvcc 11.0.221
-----------------------------------------------------------------------
NumberOfEntries = 12
TotalTime[Rnd+Rmb+ME] (123) = ( 1.056586e-01 ) sec
TotalTime[Rambo+ME] (23) = ( 9.805914e-02 ) sec
TotalTime[RndNumGen] (1) = ( 7.599440e-03 ) sec
TotalTime[Rambo] (2) = ( 8.761816e-02 ) sec
TotalTime[MatrixElems] (3) = ( 1.044098e-02 ) sec
MeanTimeInMatrixElems = ( 8.700821e-04 ) sec
[Min,Max]TimeInMatrixElems = [ 8.588060e-04 , 8.841980e-04 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 6291456
EvtsPerSec[Rnd+Rmb+ME](123) = ( 5.954515e+07 ) sec^-1
EvtsPerSec[Rmb+ME] (23) = ( 6.415981e+07 ) sec^-1
EvtsPerSec[MatrixElems] (3) = ( 6.025730e+08 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374925e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.200854e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
00 CudaFree : 1.039487 sec
0a ProcInit : 0.000524 sec
0b MemAlloc : 0.035999 sec
0c GenCreat : 0.011516 sec
0d SGoodHel : 0.001738 sec
1a GenSeed : 0.000021 sec
1b GenRnGen : 0.007579 sec
2a RamboIni : 0.000098 sec
2b RamboFin : 0.000061 sec
2c CpDTHwgt : 0.007369 sec
2d CpDTHmom : 0.080091 sec
3a SigmaKin : 0.000084 sec
3b CpDTHmes : 0.010357 sec
4a DumpLoop : 0.087430 sec
8a CompStat : 0.045176 sec
9a GenDestr : 0.000067 sec
9b DumpScrn : 0.000222 sec
9c DumpJson : 0.000002 sec
TOTAL : 1.327819 sec
TOTAL (123) : 0.105659 sec
TOTAL (23) : 0.098059 sec
TOTAL (1) : 0.007599 sec
TOTAL (2) : 0.087618 sec
TOTAL (3) : 0.010441 sec
***********************************************************************
real 0m1.636s
user 0m0.523s
sys 0m0.867s
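
The epoch1 vs epoch2 comparisons in this commit message can be reproduced from the dumps with a little arithmetic. The sketch below is only an illustration: the helper names `eventsPerSecond` and `relativeGain` are hypothetical and are not part of check.cc. It assumes that EvtsPerSec[MatrixElems] is TotalEventsComputed divided by TotalTime[MatrixElems], which is consistent with the numbers printed above.

```cpp
#include <cstddef>

// Hypothetical helper (not from check.cc): throughput in events per second,
// i.e. number of events processed divided by the time spent processing them.
double eventsPerSecond(std::size_t nEvents, double seconds)
{
  return static_cast<double>(nEvents) / seconds;
}

// Hypothetical helper: relative gain of throughput 'a' over throughput 'b'.
// A return value of 0.10 means that 'a' is 10% faster than 'b'.
double relativeGain(double a, double b)
{
  return a / b - 1.0;
}
```

With the figures above, `relativeGain(6.717545e8, 6.025730e8)` gives about 0.11 in favour of epoch2 for CUDA, while `relativeGain(1.155846e6, 1.097043e6)` gives about 0.05 in favour of epoch1 for C++.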
…smetics) to ep2

Minimal changes in epoch1:
- remove unused headers in epoch1
- remove two empty lines in the code doing the performance dump

Port to epoch2 many changes from epoch1:
- add omp.h in epoch2
- use the ep1 printout about '-d' also in epoch2
- use the ep1 printout about OMP_NUM_THREADS also in epoch2
- export OMP_NUM_THREADS=1 if not set also in epoch2
- initialize T() in hstMakeUnique also in epoch2
- comment out unused stdwtim also in epoch2
- add one space per line in the performance dump also in epoch2
- add OMP info in the performance dump also in epoch2
- add gcc compiler info in the performance dump also in epoch2
- return 0 at the end of main also in epoch2
…smetics) to ep2

Minimal changes in epoch1:
- remove unused headers in epoch1
- remove two empty lines in the code doing the performance dump

Port to epoch2 many changes from epoch1:
- add omp.h in epoch2
- use the ep1 printout about '-d' also in epoch2
- use the ep1 printout about OMP_NUM_THREADS also in epoch2
- export OMP_NUM_THREADS=1 if not set also in epoch2
- initialize T() in hstMakeUnique also in epoch2
- comment out unused stdwtim also in epoch2
- add one space per line in the performance dump also in epoch2
- add OMP info in the performance dump also in epoch2
- [commented out] add gcc compiler info in the performance dump also in epoch2
- return 0 at the end of main also in epoch2

No change in performance in epoch2: c++ 1.09E6, cuda 6.71E8
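
Two items in the list above ("export OMP_NUM_THREADS=1 if not set" and "initialize T() in hstMakeUnique") can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not the code in check.cc: the function names `defaultOmpNumThreads` and `hstMakeUniqueSketch` are hypothetical, the default is applied here with POSIX `setenv` (the actual code may do this differently), and the host buffer is allocated with plain `new[]`.

```cpp
#include <cstddef>
#include <memory>
#include <stdlib.h> // setenv (POSIX)

// Hypothetical helper: default OMP_NUM_THREADS to 1 only if it is not set.
// The third argument (overwrite=0) leaves any existing value untouched.
void defaultOmpNumThreads()
{
  setenv("OMP_NUM_THREADS", "1", 0);
}

// Hypothetical stand-in for hstMakeUnique: allocate a host array of n
// elements. The trailing "()" value-initialises every element, so arithmetic
// types start at zero instead of holding indeterminate values.
template <typename T>
std::unique_ptr<T[]> hstMakeUniqueSketch(std::size_t n)
{
  return std::unique_ptr<T[]>(new T[n]());
}
```

The environment default has to be applied before the first OpenMP parallel region for the runtime to pick it up.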
…rs as in epoch1
Indeed, check.cc was not compiling in SINGLE mode otherwise:
Makefile:44: CUDA_HOME is not set or is invalid. Export CUDA_HOME to compile with cuda
/cvmfs/sft.cern.ch/lcg/releases/gcc/9.2.0-afc57/x86_64-centos7/bin/g++ -O3 -std=c++11 -I. -I../../src -I../../../../../tools -Wall -Wshadow -Wextra -fopenmp -DMGONGPU_COMMONRAND_ONHOST -ffast-math -c check.cc -o check.o
check.cc: In function ‘int main(int, char**)’:
check.cc:312:81: error: conversion from ‘vector<float>’ to non-scalar type ‘vector<double>’ requested
312 | std::vector<double> commonRnd = commonRandomPromises[iiter].get_future().get();
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
make: *** [check.o] Error 1
Note (issue madgraph5#143) that neither epoch2 nor epoch1 builds in single precision, anyway...
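
The error above comes from receiving a `vector<float>` into a `vector<double>`. A minimal sketch of the idea behind the fix (use fptype for the random numbers on both sides) is shown below. It is an illustration only: the macro `SKETCH_FPTYPE_FLOAT` and the functions `generateCommonRandom` and `sumCommonRandom` are hypothetical, and the promise/future plumbing of check.cc is left out.

```cpp
#include <cstddef>
#include <vector>

// Assumption: the precision is selected at build time. SKETCH_FPTYPE_FLOAT is
// a hypothetical macro name used only in this sketch.
#ifdef SKETCH_FPTYPE_FLOAT
typedef float fptype;
#else
typedef double fptype;
#endif

// Hypothetical producer: returns n numbers in (0,1) stored as fptype.
std::vector<fptype> generateCommonRandom(std::size_t n)
{
  std::vector<fptype> rnd(n);
  for (std::size_t i = 0; i < n; ++i)
    rnd[i] = static_cast<fptype>(i + 1) / static_cast<fptype>(n + 1);
  return rnd;
}

// Hypothetical consumer: the receiving vector is declared with fptype too.
// Declaring it as std::vector<double> would fail to compile in SINGLE mode,
// because there is no conversion between vector<float> and vector<double>.
fptype sumCommonRandom(std::size_t n)
{
  const std::vector<fptype> commonRnd = generateCommonRandom(n);
  fptype sum = 0;
  for (std::size_t i = 0; i < n; ++i) sum += commonRnd[i];
  return sum;
}
```

Because both the producer and the consumer refer to fptype, the same source compiles in DOUBLE and in SINGLE mode.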
…piler

This also requires adding Process::getCompiler to ep2 CPPProcess.cc/h.
Now check.cc is identical in both epoch2 and epoch1 (and runTest.cc is almost identical, except for the test name).
Will now include PR madgraph5#144 for single precision in epoch1, and will copy check.cc again (and runTest.cc with some changes).

Epoch2 baseline remains epoch2 C++ 1.10e6, cuda 6.6e8

time ./check.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid = 2048
NumThreadsPerBlock = 256
NumIterations = 12
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = STD::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Random number generation = CURAND (C++ code)
OMP threads / `nproc --all` = 1 / 4
MatrixElements compiler = gcc (GCC) 9.2.0
-----------------------------------------------------------------------
NumberOfEntries = 12
TotalTime[Rnd+Rmb+ME] (123) = ( 7.968041e+00 ) sec
TotalTime[Rambo+ME] (23) = ( 7.643061e+00 ) sec
TotalTime[RndNumGen] (1) = ( 3.249804e-01 ) sec
TotalTime[Rambo] (2) = ( 1.928639e+00 ) sec
TotalTime[MatrixElems] (3) = ( 5.714422e+00 ) sec
MeanTimeInMatrixElems = ( 4.762018e-01 ) sec
[Min,Max]TimeInMatrixElems = [ 4.760149e-01 , 4.765775e-01 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 6291456
EvtsPerSec[Rnd+Rmb+ME](123) = ( 7.895863e+05 ) sec^-1
EvtsPerSec[Rmb+ME] (23) = ( 8.231592e+05 ) sec^-1
EvtsPerSec[MatrixElems] (3) = ( 1.100979e+06 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071581e-03 , 3.374925e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.200854e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************

time ./gcheck.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid = 2048
NumThreadsPerBlock = 256
NumIterations = 12
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = THRUST::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Wavefunction GPU memory = LOCAL
Random number generation = CURAND DEVICE (CUDA code)
MatrixElements compiler = nvcc 11.0.221
-----------------------------------------------------------------------
NumberOfEntries = 12
TotalTime[Rnd+Rmb+ME] (123) = ( 1.044441e-01 ) sec
TotalTime[Rambo+ME] (23) = ( 9.709213e-02 ) sec
TotalTime[RndNumGen] (1) = ( 7.351930e-03 ) sec
TotalTime[Rambo] (2) = ( 8.758798e-02 ) sec
TotalTime[MatrixElems] (3) = ( 9.504147e-03 ) sec
MeanTimeInMatrixElems = ( 7.920122e-04 ) sec
[Min,Max]TimeInMatrixElems = [ 7.825940e-04 , 8.001750e-04 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 6291456
EvtsPerSec[Rnd+Rmb+ME](123) = ( 6.023757e+07 ) sec^-1
EvtsPerSec[Rmb+ME] (23) = ( 6.479882e+07 ) sec^-1
EvtsPerSec[MatrixElems] (3) = ( 6.619696e+08 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071581e-03 , 3.374925e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.200854e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
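
For readers wondering where a line such as "MatrixElements compiler = gcc (GCC) 9.2.0" can come from, one common approach is to build the string from the compiler's predefined version macros. The sketch below is an assumption about how such a function could look, not the actual Process::getCompiler; the name `getCompilerSketch` is hypothetical and the nvcc case is left out.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for Process::getCompiler: build a description of the
// compiler from its predefined version macros. clang is tested first because
// it also defines __GNUC__ for compatibility.
std::string getCompilerSketch()
{
  std::stringstream out;
#if defined __clang__
  out << "clang " << __clang_major__ << "." << __clang_minor__ << "." << __clang_patchlevel__;
#elif defined __GNUC__
  out << "gcc (GCC) " << __GNUC__ << "." << __GNUC_MINOR__ << "." << __GNUC_PATCHLEVEL__;
#else
  out << "UNKNOWN";
#endif
  return out.str();
}
```

Since the macros are expanded at compile time, the string describes the compiler that built the translation unit containing this function, which is why the source places it next to the matrix element code in CPPProcess.cc/h.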
…adgraph5#144 Note that now ep2 and ep1 runTest.cc are identical except for the test name EP2/EP1
…ferent names in epoch_process_id.h
I have decided to split this further into two PRs. I have done everything except CPPProcess, which is the most complex part (and I actually even see minor performance differences). I will split that out into a third PR. Recap about issue #139
More details about this PR #149 below. In src:
In SubProcesses and below:
Note: at this stage, epoch1 is slightly faster than epoch2 in C++, but the reverse is true in CUDA.
First batch of changes Minimal changes in epoch1:
Port to epoch2 many changes from epoch1:
7bis) runTest.cc
A large batch of additional changes (mainly in PR #144) came from fixing epoch2 check.cc to use fptype for random numbers as in epoch1. This triggered many additional checks about single precision, included in PR #144, which also includes a better treatment of NaNs. That is all as of this PR (after some previous ones). The rest will be about CPPProcess.
Self-merging.
This is the PR to complete issue #139.
I am keeping it as WIP for now: it is 80% done but still needs a few tweaks (quite important ones, as they are performance-relevant).