het + epoch1/epoch2 #153
Closed
valassi wants to merge 72 commits into
./hcheck.exe -p 16384 32 1
***********************************************************************
NumBlocksPerGrid = 16384
NumThreadsPerBlock = 32
NumIterations = 1
-----------------------------------------------------------------------
FP precision = DOUBLE
Complex type = THRUST::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Random number generation = CURAND DEVICE (CUDA code)
Wavefunction GPU memory = LOCAL
-----------------------------------------------------------------------
NumIterations = 1
TotalTime[Rnd+Rmb+ME] (123)= ( 7.312938e-03 ) sec
TotalTime[Rambo+ME] (23)= ( 6.714818e-03 ) sec
TotalTime[RndNumGen] (1)= ( 5.981200e-04 ) sec
TotalTime[Rambo] (2)= ( 5.945168e-03 ) sec
TotalTime[MatrixElems] (3)= ( 7.696500e-04 ) sec
MeanTimeInMatrixElems = ( 7.696500e-04 ) sec
[Min,Max]TimeInMatrixElems = [ 7.696500e-04 , 7.696500e-04 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 524288 (nan=0)
EvtsPerSec[Rnd+Rmb+ME](123)= ( 7.169321e+07 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 7.807926e+07 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 6.812032e+08 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 524288
MeanMatrixElemValue = ( 1.371958e-02 +- 1.132119e-05 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374915e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.197419e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
(GPU) 00 CudaFree : 0.872603 sec
(GPU) 0a ProcInit : 0.000233 sec
(GPU) 0b MemAlloc : 0.035793 sec
(GPU) 0c GenCreat : 0.009814 sec
(GPU) 0d SGoodHel : 0.001759 sec
(GPU) 1a GenSeed : 0.000009 sec
(GPU) 1b GenRnGen : 0.000589 sec
(GPU) 2a RamboIni : 0.000022 sec
(GPU) 2b RamboFin : 0.000014 sec
(GPU) 2c CpDTHwgt : 0.000502 sec
(GPU) 2d CpDTHmom : 0.005407 sec
(GPU) 3a SigmaKin : 0.000014 sec
(GPU) 3b CpDTHmes : 0.000756 sec
(GPU) 4a DumpLoop : 0.004765 sec
(GPU) 8a CompStat : 0.003540 sec
(GPU) 9a GenDestr : 0.000048 sec
(GPU) 9b DumpScrn : 0.000044 sec
(GPU) 9c DumpJson : 0.000007 sec
(GPU) TOTAL : 0.935918 sec
(GPU) TOTAL (123) : 0.007313 sec
(GPU) TOTAL (23) : 0.006715 sec
(GPU) TOTAL (1) : 0.000598 sec
(GPU) TOTAL (2) : 0.005945 sec
(GPU) TOTAL (3) : 0.000770 sec
***********************************************************************
***********************************************************************
NumBlocksPerGrid = 16384
NumThreadsPerBlock = 32
NumIterations = 1
-----------------------------------------------------------------------
FP precision = DOUBLE
Complex type = STD::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Random number generation = CURAND (C++ code)
OMP threads / maxthreads = 4 / 4
-----------------------------------------------------------------------
NumIterations = 1
TotalTime[Rnd+Rmb+ME] (123)= ( 5.725351e-01 ) sec
TotalTime[Rambo+ME] (23)= ( 5.449318e-01 ) sec
TotalTime[RndNumGen] (1)= ( 2.760323e-02 ) sec
TotalTime[Rambo] (2)= ( 9.914417e-02 ) sec
TotalTime[MatrixElems] (3)= ( 4.457877e-01 ) sec
MeanTimeInMatrixElems = ( 4.457877e-01 ) sec
[Min,Max]TimeInMatrixElems = [ 4.457877e-01 , 4.457877e-01 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 524288 (nan=0)
EvtsPerSec[Rnd+Rmb+ME](123)= ( 9.157308e+05 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 9.621167e+05 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 1.176094e+06 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 524288
MeanMatrixElemValue = ( 1.371958e-02 +- 1.132119e-05 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374915e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.197419e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
(CPU) 0a ProcInit : 0.000331 sec
(CPU) 0b MemAlloc : 0.025358 sec
(CPU) 0c GenCreat : 0.000915 sec
(CPU) 1a GenSeed : 0.000009 sec
(CPU) 1b GenRnGen : 0.027595 sec
(CPU) 2a RamboIni : 0.006872 sec
(CPU) 2b RamboFin : 0.092273 sec
(CPU) 3a SigmaKin : 0.445788 sec
(CPU) 4a DumpLoop : 0.004605 sec
(CPU) 8a CompStat : 0.003633 sec
(CPU) 9a GenDestr : 0.000094 sec
(CPU) 9b DumpScrn : 0.004946 sec
(CPU) 9c DumpJson : 0.000008 sec
(CPU) TOTAL : 0.612425 sec
(CPU) TOTAL (123) : 0.572535 sec
(CPU) TOTAL (23) : 0.544932 sec
(CPU) TOTAL (1) : 0.027603 sec
(CPU) TOTAL (2) : 0.099144 sec
(CPU) TOTAL (3) : 0.445788 sec
***********************************************************************
-----------------------------------------------------------------------
TotalTime[Rnd+Rmb+ME] (123)= ( 5.798480e-01 ) sec
TotalTime[Rambo+ME] (23)= ( 5.516467e-01 ) sec
TotalTime[RndNumGen] (1)= ( 2.820135e-02 ) sec
TotalTime[Rambo] (2)= ( 1.050893e-01 ) sec
TotalTime[MatrixElems] (3)= ( 4.465573e-01 ) sec
-----------------------------------------------------------------------
TotalEventsComputed = 1048576
EvtsPerSec[Rnd+Rmb+ME](123)= ( 1.808364e+06 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 1.900811e+06 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 2.348133e+06 ) sec^-1
-----------------------------------------------------------------------
This makes me realise the calculation is clearly wrong: one should add throughputs, not times (the wall time is the same on CPU and GPU!)
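The point about adding throughputs can be illustrated with a minimal sketch. It assumes that the two devices run concurrently, so that each one's event rate contributes independently to the total. The names `DeviceRun`, `combinedThroughput` and `summedTimesEstimate` are hypothetical and are not taken from check.cc.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record of one device's run (not a type from check.cc).
struct DeviceRun {
  double events;   // number of events computed on this device
  double seconds;  // time spent by this device
};

// Events per second for a single device.
inline double throughput(const DeviceRun& run) {
  assert(run.seconds > 0);
  return run.events / run.seconds;
}

// Devices working concurrently: the combined rate is the sum of the rates.
inline double combinedThroughput(const std::vector<DeviceRun>& runs) {
  double total = 0;
  for (const DeviceRun& run : runs) total += throughput(run);
  return total;
}

// The estimate described as wrong: total events divided by the SUM of the
// per-device times, as if the devices had run one after the other.
inline double summedTimesEstimate(const std::vector<DeviceRun>& runs) {
  double events = 0;
  double seconds = 0;
  for (const DeviceRun& run : runs) {
    events += run.events;
    seconds += run.seconds;
  }
  assert(seconds > 0);
  return events / seconds;
}
```

Fed with the single-iteration (123) numbers from the log (524288 events in 7.312938e-03 sec on the GPU and in 5.725351e-01 sec on the CPU), `summedTimesEstimate` gives about 1.81e+06 events per second, which is the logged 1.808364e+06, whereas `combinedThroughput` gives about 7.26e+07.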
Decrease the GPU multiplier from 100 to 70 (itscrd03, with 4 OMP threads).
****************************************************************************
(GPU) NumBlocksPerGrid = 16384
(GPU) NumThreadsPerBlock = 32
(GPU) NumIterations = 700
----------------------------------------------------------------------------
(GPU) FP precision = DOUBLE
(GPU) Complex type = THRUST::COMPLEX
(GPU) RanNumb memory layout = AOSOA[4]
(GPU) Momenta memory layout = AOSOA[4]
(GPU) Random number generation = CURAND DEVICE (CUDA code)
(GPU) Wavefunction GPU memory = LOCAL
----------------------------------------------------------------------------
(GPU) NumIterations = 700
(GPU) TotalTime[Rnd+Rmb+ME] (123)= ( 5.835859e+00 ) sec
(GPU) TotalTime[Rambo+ME] (23)= ( 5.378664e+00 ) sec
(GPU) TotalTime[RndNumGen] (1)= ( 4.571957e-01 ) sec
(GPU) TotalTime[Rambo] (2)= ( 4.823339e+00 ) sec
(GPU) TotalTime[MatrixElems] (3)= ( 5.553249e-01 ) sec
(GPU) MeanTimeInMatrixElems = ( 7.933214e-04 ) sec
(GPU) [Min,Max]TimeInMatrixElems = [ 7.126600e-04 , 1.121232e-02 ] sec
----------------------------------------------------------------------------
(GPU) TotalEventsComputed = 367001600 (nan=0)
(GPU) EvtsPerSec[Rnd+Rmb+ME](123)= ( 6.288733e+07 ) sec^-1
(GPU) EvtsPerSec[Rmb+ME] (23)= ( 6.823286e+07 ) sec^-1
(GPU) EvtsPerSec[MatrixElems] (3)= ( 6.608772e+08 ) sec^-1
****************************************************************************
(GPU) NumMatrixElements(notNan) = 367001600
(GPU) MeanMatrixElemValue = ( 1.371705e-02 +- 4.280686e-07 ) GeV^0
(GPU) [Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374926e-02 ] GeV^0
(GPU) StdDevMatrixElemValue = ( 8.200632e-03 ) GeV^0
(GPU) MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
(GPU) [Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
(GPU) StdDevWeight = ( 0.000000e+00 )
****************************************************************************
(GPU) 00 CudaFree : 1.040999 sec
(GPU) 0a ProcInit : 0.000255 sec
(GPU) 0b MemAlloc : 0.043748 sec
(GPU) 0c GenCreat : 0.035068 sec
(GPU) 0d SGoodHel : 0.001759 sec
(GPU) 1a GenSeed : 0.005864 sec
(GPU) 1b GenRnGen : 0.451332 sec
(GPU) 2a RamboIni : 0.010404 sec
(GPU) 2b RamboFin : 0.008416 sec
(GPU) 2c CpDTHwgt : 0.369433 sec
(GPU) 2d CpDTHmom : 4.435089 sec
(GPU) 3a SigmaKin : 0.010518 sec
(GPU) 3b CpDTHmes : 0.544807 sec
(GPU) 4a DumpLoop : 2.404587 sec
(GPU) 8a CompStat : 2.632091 sec
(GPU) 9a GenDestr : 0.000196 sec
(GPU) 9b DumpScrn : 0.000071 sec
(GPU) 9c DumpJson : 0.000007 sec
(GPU) TOTAL : 11.994643 sec
(GPU) TOTAL (123) : 5.835862 sec
(GPU) TOTAL (23) : 5.378666 sec
(GPU) TOTAL (1) : 0.457196 sec
(GPU) TOTAL (2) : 4.823342 sec
(GPU) TOTAL (3) : 0.555325 sec
****************************************************************************
****************************************************************************
(CPU) NumBlocksPerGrid = 16384
(CPU) NumThreadsPerBlock = 32
(CPU) NumIterations = 10
----------------------------------------------------------------------------
(CPU) FP precision = DOUBLE
(CPU) Complex type = STD::COMPLEX
(CPU) RanNumb memory layout = AOSOA[4]
(CPU) Momenta memory layout = AOSOA[4]
(CPU) Random number generation = CURAND (C++ code)
(CPU) OMP threads / maxthreads = 4 / 4
----------------------------------------------------------------------------
(CPU) NumIterations = 10
(CPU) TotalTime[Rnd+Rmb+ME] (123)= ( 5.245397e+00 ) sec
(CPU) TotalTime[Rambo+ME] (23)= ( 4.964864e+00 ) sec
(CPU) TotalTime[RndNumGen] (1)= ( 2.805332e-01 ) sec
(CPU) TotalTime[Rambo] (2)= ( 1.005407e+00 ) sec
(CPU) TotalTime[MatrixElems] (3)= ( 3.959456e+00 ) sec
(CPU) MeanTimeInMatrixElems = ( 3.959456e-01 ) sec
(CPU) [Min,Max]TimeInMatrixElems = [ 2.837697e-01 , 4.298637e-01 ] sec
----------------------------------------------------------------------------
(CPU) TotalEventsComputed = 5242880 (nan=0)
(CPU) EvtsPerSec[Rnd+Rmb+ME](123)= ( 9.995201e+05 ) sec^-1
(CPU) EvtsPerSec[Rmb+ME] (23)= ( 1.055997e+06 ) sec^-1
(CPU) EvtsPerSec[MatrixElems] (3)= ( 1.324141e+06 ) sec^-1
****************************************************************************
(CPU) NumMatrixElements(notNan) = 5242880
(CPU) MeanMatrixElemValue = ( 1.372304e-02 +- 3.581814e-06 ) GeV^0
(CPU) [Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374925e-02 ] GeV^0
(CPU) StdDevMatrixElemValue = ( 8.201400e-03 ) GeV^0
(CPU) MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
(CPU) [Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
(CPU) StdDevWeight = ( 0.000000e+00 )
****************************************************************************
(CPU) 0a ProcInit : 0.015581 sec
(CPU) 0b MemAlloc : 0.028460 sec
(CPU) 0c GenCreat : 0.000978 sec
(CPU) 1a GenSeed : 0.000084 sec
(CPU) 1b GenRnGen : 0.280449 sec
(CPU) 2a RamboIni : 0.072770 sec
(CPU) 2b RamboFin : 0.932637 sec
(CPU) 3a SigmaKin : 3.959456 sec
(CPU) 4a DumpLoop : 0.034727 sec
(CPU) 8a CompStat : 0.038073 sec
(CPU) 9a GenDestr : 0.000141 sec
(CPU) 9b DumpScrn : 0.061204 sec
(CPU) 9c DumpJson : 0.000011 sec
(CPU) TOTAL : 5.424570 sec
(CPU) TOTAL (123) : 5.245397 sec
(CPU) TOTAL (23) : 4.964864 sec
(CPU) TOTAL (1) : 0.280533 sec
(CPU) TOTAL (2) : 1.005407 sec
(CPU) TOTAL (3) : 3.959456 sec
****************************************************************************
----------------------------------------------------------------------------
(GPU) TotalEventsComputed = 367001600
(GPU) TotalTime[Rnd+Rmb+ME] (123)= ( 5.835859e+00 ) sec
(GPU) TotalTime[Rambo+ME] (23)= ( 5.378664e+00 ) sec
(GPU) TotalTime[RndNumGen] (1)= ( 4.571957e-01 ) sec
(GPU) TotalTime[Rambo] (2)= ( 4.823339e+00 ) sec
(GPU) TotalTime[MatrixElems] (3)= ( 5.553249e-01 ) sec
----------------------------------------------------------------------------
(GPU) TotalEventsComputed = 367001600
(GPU) EvtsPerSec[Rnd+Rmb+ME](123)= ( 6.288733e+07 ) sec^-1
(GPU) EvtsPerSec[Rmb+ME] (23)= ( 6.823286e+07 ) sec^-1
(GPU) EvtsPerSec[MatrixElems] (3)= ( 6.608772e+08 ) sec^-1
****************************************************************************
----------------------------------------------------------------------------
(CPU) TotalEventsComputed = 5242880
(CPU) TotalTime[Rnd+Rmb+ME] (123)= ( 5.245397e+00 ) sec
(CPU) TotalTime[Rambo+ME] (23)= ( 4.964864e+00 ) sec
(CPU) TotalTime[RndNumGen] (1)= ( 2.805332e-01 ) sec
(CPU) TotalTime[Rambo] (2)= ( 1.005407e+00 ) sec
(CPU) TotalTime[MatrixElems] (3)= ( 3.959456e+00 ) sec
----------------------------------------------------------------------------
(CPU) TotalEventsComputed = 5242880
(CPU) EvtsPerSec[Rnd+Rmb+ME](123)= ( 9.995201e+05 ) sec^-1
(CPU) EvtsPerSec[Rmb+ME] (23)= ( 1.055997e+06 ) sec^-1
(CPU) EvtsPerSec[MatrixElems] (3)= ( 1.324141e+06 ) sec^-1
****************************************************************************
(HET) TotalEventsComputed = 372244480
(HET) EvtsPerSec[Rnd+Rmb+ME](123)= ( 6.388685e+07 ) sec^-1
(HET) EvtsPerSec[Rmb+ME] (23)= ( 6.928886e+07 ) sec^-1
(HET) EvtsPerSec[MatrixElems] (3)= ( 6.622013e+08 ) sec^-1
****************************************************************************
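In this log the GPU runs 700 iterations and the CPU runs 10, with the same 16384 x 32 events per iteration on both, and the two (123) times come out close (5.84 sec and 5.25 sec). The commit message does not say how the value 70 was chosen. The sketch below is only one possible way to estimate a multiplier that balances the two wall times, assuming the multiplier is the ratio of GPU to CPU iterations and that per-device throughput does not depend on the iteration count. `balancedGpuMultiplier` is a hypothetical helper, not a function from this PR.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: number of GPU iterations per CPU iteration for which
// both devices would finish at about the same time, assuming both process
// the same number of events per iteration at a constant rate.
inline long balancedGpuMultiplier(double gpuEvtsPerSec, double cpuEvtsPerSec) {
  assert(gpuEvtsPerSec > 0 && cpuEvtsPerSec > 0);
  return std::lround(gpuEvtsPerSec / cpuEvtsPerSec);
}
```

With the (123) throughputs logged above, `balancedGpuMultiplier(6.288733e+07, 9.995201e+05)` returns 63, which is of the same order as the 70 used in the commit.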
Fix conflicts in epoch1/cuda/ee_mumu/SubProcesses/P1_Sigma_sm_epem_mupmum/check.cc
Fix conflicts in epoch1/cuda/ee_mumu/SubProcesses/Makefile
Fix conflicts in epoch1/cuda/ee_mumu/SubProcesses/P1_Sigma_sm_epem_mupmum/check.cc
-------------------------------------------------------------------------
(CPU) Process = EPOCH1_EEMUMU_CPP
(CPU) OMP threads / `nproc --all` = 1 / 4
(CPU) EvtsPerSec[MatrixElems] (3) = ( 1.116806e+06 ) sec^-1
(CPU) MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
(CPU) TOTAL : 8.151754 sec
real 0m8.182s
-------------------------------------------------------------------------
(GPU) Process = EPOCH1_EEMUMU_CUDA
(GPU) EvtsPerSec[MatrixElems] (3) = ( 6.019693e+08 ) sec^-1
(GPU) MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
(GPU) TOTAL : 1.052835 sec
real 0m1.363s
==PROF== Profiling "_ZN5gProc8sigmaKinEPKdPd": launch__registers_per_thread 166
-------------------------------------------------------------------------
(GPU) Process = EPOCH1_EEMUMU_CUDA
(GPU) EvtsPerSec[MatrixElems] (3) = ( 6.260430e+08 ) sec^-1
(GPU) MeanMatrixElemValue = ( 1.371704e-02 +- 3.852995e-07 ) GeV^0
(GPU) TOTAL : 22.669111 sec
(CPU) Process = EPOCH1_EEMUMU_CPP
(CPU) OMP threads / `nproc --all` = 4 / 4
(CPU) EvtsPerSec[MatrixElems] (3) = ( 2.969506e+06 ) sec^-1
(CPU) MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
(CPU) TOTAL : 4.769517 sec
(GPU) EvtsPerSec[MatrixElems] (3) = ( 6.260430e+08 ) sec^-1
(CPU) OMP threads / `nproc --all` = 1 / 4
(CPU) EvtsPerSec[MatrixElems] (3) = ( 2.969506e+06 ) sec^-1
(HET) EvtsPerSec[MatrixElems] (3) = ( 6.290125e+08 ) sec^-1
real 0m23.555s
-------------------------------------------------------------------------
Process = EPOCH2_EEMUMU_CPP
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MatrixElems] (3) = ( 1.104355e+06 ) sec^-1
MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
TOTAL : 8.228827 sec
real 0m8.260s
-------------------------------------------------------------------------
Process = EPOCH2_EEMUMU_CUDA
EvtsPerSec[MatrixElems] (3) = ( 6.103946e+08 ) sec^-1
MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
TOTAL : 1.022151 sec
real 0m1.334s
==PROF== Profiling "_ZN5gProc8sigmaKinEPKdPd": launch__registers_per_thread 164
-------------------------------------------------------------------------
-------------------------------------------------------------------------
(CPU) Process = EPOCH1_EEMUMU_CPP
(CPU) OMP threads / `nproc --all` = 1 / 4
(CPU) EvtsPerSec[MatrixElems] (3) = ( 1.130270e+06 ) sec^-1
(CPU) MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
(CPU) TOTAL : 8.087486 sec
real 0m8.117s
-------------------------------------------------------------------------
(GPU) Process = EPOCH1_EEMUMU_CUDA
(GPU) EvtsPerSec[MatrixElems] (3) = ( 6.156074e+08 ) sec^-1
(GPU) MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
(GPU) TOTAL : 1.050187 sec
real 0m1.360s
==PROF== Profiling "_ZN5gProc8sigmaKinEPKdPd": launch__registers_per_thread 164
-------------------------------------------------------------------------
(GPU) Process = EPOCH1_EEMUMU_CUDA
(GPU) EvtsPerSec[MatrixElems] (3) = ( 6.743147e+08 ) sec^-1
(GPU) MeanMatrixElemValue = ( 1.371704e-02 +- 3.852995e-07 ) GeV^0
(GPU) TOTAL : 21.294849 sec
(CPU) Process = EPOCH1_EEMUMU_CPP
(CPU) OMP threads / `nproc --all` = 4 / 4
(CPU) EvtsPerSec[MatrixElems] (3) = ( 2.983748e+06 ) sec^-1
(CPU) MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
(CPU) TOTAL : 4.765584 sec
(GPU) EvtsPerSec[MatrixElems] (3) = ( 6.743147e+08 ) sec^-1
(CPU) OMP threads / `nproc --all` = 1 / 4
(CPU) EvtsPerSec[MatrixElems] (3) = ( 2.983748e+06 ) sec^-1
(HET) EvtsPerSec[MatrixElems] (3) = ( 6.772985e+08 ) sec^-1
real 0m22.174s
-------------------------------------------------------------------------
Process = EPOCH2_EEMUMU_CPP
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MatrixElems] (3) = ( 1.106797e+06 ) sec^-1
MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
TOTAL : 8.228164 sec
real 0m8.258s
-------------------------------------------------------------------------
Process = EPOCH2_EEMUMU_CUDA
EvtsPerSec[MatrixElems] (3) = ( 6.071362e+08 ) sec^-1
MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
TOTAL : 0.831403 sec
real 0m1.147s
==PROF== Profiling "_ZN5gProc8sigmaKinEPKdPd": launch__registers_per_thread 164
-------------------------------------------------------------------------
(Fix conflict in ep1 timermap.h: move het version to SubProcesses and link it)
BASELINE PERFORMANCE for hetep12 before merging ep2to2ep1 to master:
-------------------------------------------------------------------------
(CPU) Process = EPOCH1_EEMUMU_CPP
(CPU) OMP threads / `nproc --all` = 1 / 4
(CPU) EvtsPerSec[MatrixElems] (3) = ( 1.133133e+06 ) sec^-1
(CPU) MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
(CPU) TOTAL : 8.049342 sec
real 0m8.076s
-------------------------------------------------------------------------
(CPU) Process = EPOCH1_EEMUMU_CPP
(CPU) OMP threads / `nproc --all` = 4 / 4
(CPU) EvtsPerSec[MatrixElems] (3) = ( 4.470283e+06 ) sec^-1
(CPU) MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
(CPU) TOTAL : 3.925186 sec
real 0m3.952s
-------------------------------------------------------------------------
(GPU) Process = EPOCH1_EEMUMU_CUDA
(GPU) EvtsPerSec[MatrixElems] (3) = ( 6.625980e+08 ) sec^-1
(GPU) MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
(GPU) TOTAL : 0.787367 sec
real 0m1.095s
==PROF== Profiling "_ZN5gProc8sigmaKinEPKdPd": launch__registers_per_thread 164
-------------------------------------------------------------------------
(GPU) Process = EPOCH1_EEMUMU_CUDA
(GPU) EvtsPerSec[MatrixElems] (3) = ( 7.323155e+08 ) sec^-1
(GPU) MeanMatrixElemValue = ( 1.371704e-02 +- 3.852995e-07 ) GeV^0
(GPU) TOTAL : 20.614412 sec
(CPU) Process = EPOCH1_EEMUMU_CPP
(CPU) OMP threads / `nproc --all` = 4 / 4
(CPU) EvtsPerSec[MatrixElems] (3) = ( 2.955639e+06 ) sec^-1
(CPU) MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
(CPU) TOTAL : 4.757736 sec
(GPU) EvtsPerSec[MatrixElems] (3) = ( 7.323155e+08 ) sec^-1
(CPU) OMP threads / `nproc --all` = 1 / 4
(CPU) EvtsPerSec[MatrixElems] (3) = ( 2.955639e+06 ) sec^-1
(HET) EvtsPerSec[MatrixElems] (3) = ( 7.352711e+08 ) sec^-1
real 0m21.461s
-------------------------------------------------------------------------
Process = EPOCH2_EEMUMU_CPP
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MatrixElems] (3) = ( 1.131501e+06 ) sec^-1
MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
TOTAL : 8.065051 sec
real 0m8.092s
-------------------------------------------------------------------------
Process = EPOCH2_EEMUMU_CPP
OMP threads / `nproc --all` = 4 / 4
EvtsPerSec[MatrixElems] (3) = ( 4.493152e+06 ) sec^-1
MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
TOTAL : 3.885839 sec
real 0m3.912s
-------------------------------------------------------------------------
Process = EPOCH2_EEMUMU_CUDA
EvtsPerSec[MatrixElems] (3) = ( 6.856494e+08 ) sec^-1
MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
TOTAL : 1.172856 sec
real 0m1.480s
==PROF== Profiling "_ZN5gProc8sigmaKinEPKdPd": launch__registers_per_thread 164
-------------------------------------------------------------------------
This PR was about het + epoch12 (hetep12). I am closing it because I will first merge vectorization, in the version based on the epoch12 merge (klas2ep12). It is replaced by draft PR #159, which includes het + klas2ep12 + epoch12 (hetklas).
This merges together het #87 and epoch12 #151.
It will replace #87. Open this as WIP.