
het + epoch1/epoch2 #153

Closed
valassi wants to merge 72 commits into madgraph5:master from valassi:hetep12

valassi commented Apr 2, 2021

This merges het #87 and epoch12 #151 together.

It will replace #87. Opening this as WIP.

./hcheck.exe -p 16384 32 1
***********************************************************************
NumBlocksPerGrid           = 16384
NumThreadsPerBlock         = 32
NumIterations              = 1
-----------------------------------------------------------------------
FP precision               = DOUBLE
Complex type               = THRUST::COMPLEX
RanNumb memory layout      = AOSOA[4]
Momenta memory layout      = AOSOA[4]
Random number generation   = CURAND DEVICE (CUDA code)
Wavefunction GPU memory    = LOCAL
-----------------------------------------------------------------------
NumIterations              = 1
TotalTime[Rnd+Rmb+ME] (123)= ( 7.312938e-03                 )  sec
TotalTime[Rambo+ME]    (23)= ( 6.714818e-03                 )  sec
TotalTime[RndNumGen]    (1)= ( 5.981200e-04                 )  sec
TotalTime[Rambo]        (2)= ( 5.945168e-03                 )  sec
TotalTime[MatrixElems]  (3)= ( 7.696500e-04                 )  sec
MeanTimeInMatrixElems      = ( 7.696500e-04                 )  sec
[Min,Max]TimeInMatrixElems = [ 7.696500e-04 ,  7.696500e-04 ]  sec
-----------------------------------------------------------------------
TotalEventsComputed        = 524288 (nan=0)
EvtsPerSec[Rnd+Rmb+ME](123)= ( 7.169321e+07                 )  sec^-1
EvtsPerSec[Rmb+ME]     (23)= ( 7.807926e+07                 )  sec^-1
EvtsPerSec[MatrixElems] (3)= ( 6.812032e+08                 )  sec^-1
***********************************************************************
NumMatrixElements(notNan)  = 524288
MeanMatrixElemValue        = ( 1.371958e-02 +- 1.132119e-05 )  GeV^0
[Min,Max]MatrixElemValue   = [ 6.071582e-03 ,  3.374915e-02 ]  GeV^0
StdDevMatrixElemValue      = ( 8.197419e-03                 )  GeV^0
MeanWeight                 = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight            = [ 4.515827e-01 ,  4.515827e-01 ]
StdDevWeight               = ( 0.000000e+00                 )
***********************************************************************
(GPU) 00 CudaFree :     0.872603 sec
(GPU) 0a ProcInit :     0.000233 sec
(GPU) 0b MemAlloc :     0.035793 sec
(GPU) 0c GenCreat :     0.009814 sec
(GPU) 0d SGoodHel :     0.001759 sec
(GPU) 1a GenSeed  :     0.000009 sec
(GPU) 1b GenRnGen :     0.000589 sec
(GPU) 2a RamboIni :     0.000022 sec
(GPU) 2b RamboFin :     0.000014 sec
(GPU) 2c CpDTHwgt :     0.000502 sec
(GPU) 2d CpDTHmom :     0.005407 sec
(GPU) 3a SigmaKin :     0.000014 sec
(GPU) 3b CpDTHmes :     0.000756 sec
(GPU) 4a DumpLoop :     0.004765 sec
(GPU) 8a CompStat :     0.003540 sec
(GPU) 9a GenDestr :     0.000048 sec
(GPU) 9b DumpScrn :     0.000044 sec
(GPU) 9c DumpJson :     0.000007 sec
(GPU) TOTAL       :     0.935918 sec
(GPU) TOTAL (123) :     0.007313 sec
(GPU) TOTAL  (23) :     0.006715 sec
(GPU) TOTAL   (1) :     0.000598 sec
(GPU) TOTAL   (2) :     0.005945 sec
(GPU) TOTAL   (3) :     0.000770 sec
***********************************************************************
***********************************************************************
NumBlocksPerGrid           = 16384
NumThreadsPerBlock         = 32
NumIterations              = 1
-----------------------------------------------------------------------
FP precision               = DOUBLE
Complex type               = STD::COMPLEX
RanNumb memory layout      = AOSOA[4]
Momenta memory layout      = AOSOA[4]
Random number generation   = CURAND (C++ code)
OMP threads / maxthreads   = 4 / 4
-----------------------------------------------------------------------
NumIterations              = 1
TotalTime[Rnd+Rmb+ME] (123)= ( 5.725351e-01                 )  sec
TotalTime[Rambo+ME]    (23)= ( 5.449318e-01                 )  sec
TotalTime[RndNumGen]    (1)= ( 2.760323e-02                 )  sec
TotalTime[Rambo]        (2)= ( 9.914417e-02                 )  sec
TotalTime[MatrixElems]  (3)= ( 4.457877e-01                 )  sec
MeanTimeInMatrixElems      = ( 4.457877e-01                 )  sec
[Min,Max]TimeInMatrixElems = [ 4.457877e-01 ,  4.457877e-01 ]  sec
-----------------------------------------------------------------------
TotalEventsComputed        = 524288 (nan=0)
EvtsPerSec[Rnd+Rmb+ME](123)= ( 9.157308e+05                 )  sec^-1
EvtsPerSec[Rmb+ME]     (23)= ( 9.621167e+05                 )  sec^-1
EvtsPerSec[MatrixElems] (3)= ( 1.176094e+06                 )  sec^-1
***********************************************************************
NumMatrixElements(notNan)  = 524288
MeanMatrixElemValue        = ( 1.371958e-02 +- 1.132119e-05 )  GeV^0
[Min,Max]MatrixElemValue   = [ 6.071582e-03 ,  3.374915e-02 ]  GeV^0
StdDevMatrixElemValue      = ( 8.197419e-03                 )  GeV^0
MeanWeight                 = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight            = [ 4.515827e-01 ,  4.515827e-01 ]
StdDevWeight               = ( 0.000000e+00                 )
***********************************************************************
(CPU) 0a ProcInit :     0.000331 sec
(CPU) 0b MemAlloc :     0.025358 sec
(CPU) 0c GenCreat :     0.000915 sec
(CPU) 1a GenSeed  :     0.000009 sec
(CPU) 1b GenRnGen :     0.027595 sec
(CPU) 2a RamboIni :     0.006872 sec
(CPU) 2b RamboFin :     0.092273 sec
(CPU) 3a SigmaKin :     0.445788 sec
(CPU) 4a DumpLoop :     0.004605 sec
(CPU) 8a CompStat :     0.003633 sec
(CPU) 9a GenDestr :     0.000094 sec
(CPU) 9b DumpScrn :     0.004946 sec
(CPU) 9c DumpJson :     0.000008 sec
(CPU) TOTAL       :     0.612425 sec
(CPU) TOTAL (123) :     0.572535 sec
(CPU) TOTAL  (23) :     0.544932 sec
(CPU) TOTAL   (1) :     0.027603 sec
(CPU) TOTAL   (2) :     0.099144 sec
(CPU) TOTAL   (3) :     0.445788 sec
***********************************************************************
-----------------------------------------------------------------------
TotalTime[Rnd+Rmb+ME] (123)= ( 5.798480e-01                 )  sec
TotalTime[Rambo+ME]    (23)= ( 5.516467e-01                 )  sec
TotalTime[RndNumGen]    (1)= ( 2.820135e-02                 )  sec
TotalTime[Rambo]        (2)= ( 1.050893e-01                 )  sec
TotalTime[MatrixElems]  (3)= ( 4.465573e-01                 )  sec
-----------------------------------------------------------------------
TotalEventsComputed        = 1048576
EvtsPerSec[Rnd+Rmb+ME](123)= ( 1.808364e+06                 )  sec^-1
EvtsPerSec[Rmb+ME]     (23)= ( 1.900811e+06                 )  sec^-1
EvtsPerSec[MatrixElems] (3)= ( 2.348133e+06                 )  sec^-1
-----------------------------------------------------------------------
This makes me realise that the combined calculation above is clearly wrong: the CPU and GPU jobs run concurrently
(the wall time is the same on CPU and GPU!), so one should add throughputs, not times.
Decrease the GPU multiplier from 100 to 70 (itscrd03, with 4 OMP threads).
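The point in the commit message above can be illustrated with a minimal C++ sketch. `RunSummary`, `combinedSerial` and `combinedConcurrent` are hypothetical names, not part of check.cc; the inputs are the (123) totals printed in the logs. Summing events and summing times reproduces the 1.808364e+06 figure of the first combined summary, which would only be meaningful if the two jobs ran one after the other. Summing the per-device throughputs is what the later (HET) lines report.

```cpp
#include <cassert>
#include <vector>

// Hypothetical summary of one run (not the actual check.cc data structure).
struct RunSummary {
  double nEvents;  // TotalEventsComputed
  double seconds;  // TotalTime for the chosen step, e.g. (123) Rnd+Rmb+ME
};

// Throughput of a single run, in events per second.
double throughput(const RunSummary& r) {
  assert(r.seconds > 0);
  return r.nEvents / r.seconds;
}

// WRONG for concurrent jobs: adds up events and adds up times,
// as if the CPU and GPU jobs had run sequentially.
double combinedSerial(const std::vector<RunSummary>& runs) {
  double events = 0, secs = 0;
  for (const auto& r : runs) { events += r.nEvents; secs += r.seconds; }
  assert(secs > 0);
  return events / secs;
}

// RIGHT for concurrent jobs: each device processes its own events during
// the same wall time, so the per-device throughputs add up.
double combinedConcurrent(const std::vector<RunSummary>& runs) {
  double total = 0;
  for (const auto& r : runs) total += throughput(r);
  return total;
}
```

For example, the 700-iteration GPU run (367001600 events in 5.835859 s) plus the 10-iteration CPU run (5242880 events in 5.245397 s) gives about 6.3887e+07 events/s with `combinedConcurrent`, consistent with the (HET) EvtsPerSec[Rnd+Rmb+ME] line.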

****************************************************************************
(GPU) NumBlocksPerGrid           = 16384
(GPU) NumThreadsPerBlock         = 32
(GPU) NumIterations              = 700
----------------------------------------------------------------------------
(GPU) FP precision               = DOUBLE
(GPU) Complex type               = THRUST::COMPLEX
(GPU) RanNumb memory layout      = AOSOA[4]
(GPU) Momenta memory layout      = AOSOA[4]
(GPU) Random number generation   = CURAND DEVICE (CUDA code)
(GPU) Wavefunction GPU memory    = LOCAL
----------------------------------------------------------------------------
(GPU) NumIterations              = 700
(GPU) TotalTime[Rnd+Rmb+ME] (123)= ( 5.835859e+00                 )  sec
(GPU) TotalTime[Rambo+ME]    (23)= ( 5.378664e+00                 )  sec
(GPU) TotalTime[RndNumGen]    (1)= ( 4.571957e-01                 )  sec
(GPU) TotalTime[Rambo]        (2)= ( 4.823339e+00                 )  sec
(GPU) TotalTime[MatrixElems]  (3)= ( 5.553249e-01                 )  sec
(GPU) MeanTimeInMatrixElems      = ( 7.933214e-04                 )  sec
(GPU) [Min,Max]TimeInMatrixElems = [ 7.126600e-04 ,  1.121232e-02 ]  sec
----------------------------------------------------------------------------
(GPU) TotalEventsComputed        = 367001600 (nan=0)
(GPU) EvtsPerSec[Rnd+Rmb+ME](123)= ( 6.288733e+07                 )  sec^-1
(GPU) EvtsPerSec[Rmb+ME]     (23)= ( 6.823286e+07                 )  sec^-1
(GPU) EvtsPerSec[MatrixElems] (3)= ( 6.608772e+08                 )  sec^-1
****************************************************************************
(GPU) NumMatrixElements(notNan)  = 367001600
(GPU) MeanMatrixElemValue        = ( 1.371705e-02 +- 4.280686e-07 )  GeV^0
(GPU) [Min,Max]MatrixElemValue   = [ 6.071582e-03 ,  3.374926e-02 ]  GeV^0
(GPU) StdDevMatrixElemValue      = ( 8.200632e-03                 )  GeV^0
(GPU) MeanWeight                 = ( 4.515827e-01 +- 0.000000e+00 )
(GPU) [Min,Max]Weight            = [ 4.515827e-01 ,  4.515827e-01 ]
(GPU) StdDevWeight               = ( 0.000000e+00                 )
****************************************************************************
(GPU) 00 CudaFree :     1.040999 sec
(GPU) 0a ProcInit :     0.000255 sec
(GPU) 0b MemAlloc :     0.043748 sec
(GPU) 0c GenCreat :     0.035068 sec
(GPU) 0d SGoodHel :     0.001759 sec
(GPU) 1a GenSeed  :     0.005864 sec
(GPU) 1b GenRnGen :     0.451332 sec
(GPU) 2a RamboIni :     0.010404 sec
(GPU) 2b RamboFin :     0.008416 sec
(GPU) 2c CpDTHwgt :     0.369433 sec
(GPU) 2d CpDTHmom :     4.435089 sec
(GPU) 3a SigmaKin :     0.010518 sec
(GPU) 3b CpDTHmes :     0.544807 sec
(GPU) 4a DumpLoop :     2.404587 sec
(GPU) 8a CompStat :     2.632091 sec
(GPU) 9a GenDestr :     0.000196 sec
(GPU) 9b DumpScrn :     0.000071 sec
(GPU) 9c DumpJson :     0.000007 sec
(GPU) TOTAL       :    11.994643 sec
(GPU) TOTAL (123) :     5.835862 sec
(GPU) TOTAL  (23) :     5.378666 sec
(GPU) TOTAL   (1) :     0.457196 sec
(GPU) TOTAL   (2) :     4.823342 sec
(GPU) TOTAL   (3) :     0.555325 sec
****************************************************************************
****************************************************************************
(CPU) NumBlocksPerGrid           = 16384
(CPU) NumThreadsPerBlock         = 32
(CPU) NumIterations              = 10
----------------------------------------------------------------------------
(CPU) FP precision               = DOUBLE
(CPU) Complex type               = STD::COMPLEX
(CPU) RanNumb memory layout      = AOSOA[4]
(CPU) Momenta memory layout      = AOSOA[4]
(CPU) Random number generation   = CURAND (C++ code)
(CPU) OMP threads / maxthreads   = 4 / 4
----------------------------------------------------------------------------
(CPU) NumIterations              = 10
(CPU) TotalTime[Rnd+Rmb+ME] (123)= ( 5.245397e+00                 )  sec
(CPU) TotalTime[Rambo+ME]    (23)= ( 4.964864e+00                 )  sec
(CPU) TotalTime[RndNumGen]    (1)= ( 2.805332e-01                 )  sec
(CPU) TotalTime[Rambo]        (2)= ( 1.005407e+00                 )  sec
(CPU) TotalTime[MatrixElems]  (3)= ( 3.959456e+00                 )  sec
(CPU) MeanTimeInMatrixElems      = ( 3.959456e-01                 )  sec
(CPU) [Min,Max]TimeInMatrixElems = [ 2.837697e-01 ,  4.298637e-01 ]  sec
----------------------------------------------------------------------------
(CPU) TotalEventsComputed        = 5242880 (nan=0)
(CPU) EvtsPerSec[Rnd+Rmb+ME](123)= ( 9.995201e+05                 )  sec^-1
(CPU) EvtsPerSec[Rmb+ME]     (23)= ( 1.055997e+06                 )  sec^-1
(CPU) EvtsPerSec[MatrixElems] (3)= ( 1.324141e+06                 )  sec^-1
****************************************************************************
(CPU) NumMatrixElements(notNan)  = 5242880
(CPU) MeanMatrixElemValue        = ( 1.372304e-02 +- 3.581814e-06 )  GeV^0
(CPU) [Min,Max]MatrixElemValue   = [ 6.071582e-03 ,  3.374925e-02 ]  GeV^0
(CPU) StdDevMatrixElemValue      = ( 8.201400e-03                 )  GeV^0
(CPU) MeanWeight                 = ( 4.515827e-01 +- 0.000000e+00 )
(CPU) [Min,Max]Weight            = [ 4.515827e-01 ,  4.515827e-01 ]
(CPU) StdDevWeight               = ( 0.000000e+00                 )
****************************************************************************
(CPU) 0a ProcInit :     0.015581 sec
(CPU) 0b MemAlloc :     0.028460 sec
(CPU) 0c GenCreat :     0.000978 sec
(CPU) 1a GenSeed  :     0.000084 sec
(CPU) 1b GenRnGen :     0.280449 sec
(CPU) 2a RamboIni :     0.072770 sec
(CPU) 2b RamboFin :     0.932637 sec
(CPU) 3a SigmaKin :     3.959456 sec
(CPU) 4a DumpLoop :     0.034727 sec
(CPU) 8a CompStat :     0.038073 sec
(CPU) 9a GenDestr :     0.000141 sec
(CPU) 9b DumpScrn :     0.061204 sec
(CPU) 9c DumpJson :     0.000011 sec
(CPU) TOTAL       :     5.424570 sec
(CPU) TOTAL (123) :     5.245397 sec
(CPU) TOTAL  (23) :     4.964864 sec
(CPU) TOTAL   (1) :     0.280533 sec
(CPU) TOTAL   (2) :     1.005407 sec
(CPU) TOTAL   (3) :     3.959456 sec
****************************************************************************
----------------------------------------------------------------------------
(GPU) TotalEventsComputed        = 367001600
(GPU) TotalTime[Rnd+Rmb+ME] (123)= ( 5.835859e+00                 )  sec
(GPU) TotalTime[Rambo+ME]    (23)= ( 5.378664e+00                 )  sec
(GPU) TotalTime[RndNumGen]    (1)= ( 4.571957e-01                 )  sec
(GPU) TotalTime[Rambo]        (2)= ( 4.823339e+00                 )  sec
(GPU) TotalTime[MatrixElems]  (3)= ( 5.553249e-01                 )  sec
----------------------------------------------------------------------------
(GPU) TotalEventsComputed        = 367001600
(GPU) EvtsPerSec[Rnd+Rmb+ME](123)= ( 6.288733e+07                 )  sec^-1
(GPU) EvtsPerSec[Rmb+ME]     (23)= ( 6.823286e+07                 )  sec^-1
(GPU) EvtsPerSec[MatrixElems] (3)= ( 6.608772e+08                 )  sec^-1
****************************************************************************
----------------------------------------------------------------------------
(CPU) TotalEventsComputed        = 5242880
(CPU) TotalTime[Rnd+Rmb+ME] (123)= ( 5.245397e+00                 )  sec
(CPU) TotalTime[Rambo+ME]    (23)= ( 4.964864e+00                 )  sec
(CPU) TotalTime[RndNumGen]    (1)= ( 2.805332e-01                 )  sec
(CPU) TotalTime[Rambo]        (2)= ( 1.005407e+00                 )  sec
(CPU) TotalTime[MatrixElems]  (3)= ( 3.959456e+00                 )  sec
----------------------------------------------------------------------------
(CPU) TotalEventsComputed        = 5242880
(CPU) EvtsPerSec[Rnd+Rmb+ME](123)= ( 9.995201e+05                 )  sec^-1
(CPU) EvtsPerSec[Rmb+ME]     (23)= ( 1.055997e+06                 )  sec^-1
(CPU) EvtsPerSec[MatrixElems] (3)= ( 1.324141e+06                 )  sec^-1
****************************************************************************
(HET) TotalEventsComputed        = 372244480
(HET) EvtsPerSec[Rnd+Rmb+ME](123)= ( 6.388685e+07                 )  sec^-1
(HET) EvtsPerSec[Rmb+ME]     (23)= ( 6.928886e+07                 )  sec^-1
(HET) EvtsPerSec[MatrixElems] (3)= ( 6.622013e+08                 )  sec^-1
****************************************************************************
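The event counts in these summaries follow from the grid configuration: TotalEventsComputed equals NumBlocksPerGrid x NumThreadsPerBlock x NumIterations. The sketch below checks this against the logged numbers; `totalEvents` is a hypothetical helper, not a function from the repository. The 700:10 iteration ratio between the GPU and CPU runs appears to correspond to the GPU multiplier of 70 mentioned in the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of events processed in one run, assuming one
// event per (block, thread) slot per iteration.
std::int64_t totalEvents(std::int64_t blocksPerGrid,
                         std::int64_t threadsPerBlock,
                         std::int64_t iterations) {
  assert(blocksPerGrid > 0 && threadsPerBlock > 0 && iterations > 0);
  return blocksPerGrid * threadsPerBlock * iterations;
}
```

With `-p 16384 32 1` this gives 524288 events; with 700 (GPU) and 10 (CPU) iterations it gives 367001600 and 5242880, whose sum is the (HET) total of 372244480.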
Fix conflicts in epoch1/cuda/ee_mumu/SubProcesses/P1_Sigma_sm_epem_mupmum/check.cc
Fix conflicts in epoch1/cuda/ee_mumu/SubProcesses/Makefile
Fix conflicts in epoch1/cuda/ee_mumu/SubProcesses/P1_Sigma_sm_epem_mupmum/check.cc
valassi added 25 commits April 5, 2021 11:04
-------------------------------------------------------------------------
(CPU) Process                     = EPOCH1_EEMUMU_CPP
(CPU) OMP threads / `nproc --all` = 1 / 4
(CPU) EvtsPerSec[MatrixElems] (3) = ( 1.116806e+06                 )  sec^-1
(CPU) MeanMatrixElemValue         = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
(CPU) TOTAL       :     8.151754 sec
real    0m8.182s
-------------------------------------------------------------------------
(GPU) Process                     = EPOCH1_EEMUMU_CUDA
(GPU) EvtsPerSec[MatrixElems] (3) = ( 6.019693e+08                 )  sec^-1
(GPU) MeanMatrixElemValue         = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
(GPU) TOTAL       :     1.052835 sec
real    0m1.363s
==PROF== Profiling "_ZN5gProc8sigmaKinEPKdPd": launch__registers_per_thread 166
-------------------------------------------------------------------------
(GPU) Process                     = EPOCH1_EEMUMU_CUDA
(GPU) EvtsPerSec[MatrixElems] (3) = ( 6.260430e+08                 )  sec^-1
(GPU) MeanMatrixElemValue         = ( 1.371704e-02 +- 3.852995e-07 )  GeV^0
(GPU) TOTAL       :    22.669111 sec
(CPU) Process                     = EPOCH1_EEMUMU_CPP
(CPU) OMP threads / `nproc --all` = 4 / 4
(CPU) EvtsPerSec[MatrixElems] (3) = ( 2.969506e+06                 )  sec^-1
(CPU) MeanMatrixElemValue         = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
(CPU) TOTAL       :     4.769517 sec
(GPU) EvtsPerSec[MatrixElems] (3) = ( 6.260430e+08                 )  sec^-1
(CPU) OMP threads / `nproc --all` = 1 / 4
(CPU) EvtsPerSec[MatrixElems] (3) = ( 2.969506e+06                 )  sec^-1
(HET) EvtsPerSec[MatrixElems] (3) = ( 6.290125e+08                 )  sec^-1
real    0m23.555s
-------------------------------------------------------------------------
Process                     = EPOCH2_EEMUMU_CPP
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MatrixElems] (3) = ( 1.104355e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
TOTAL       :     8.228827 sec
real    0m8.260s
-------------------------------------------------------------------------
Process                     = EPOCH2_EEMUMU_CUDA
EvtsPerSec[MatrixElems] (3) = ( 6.103946e+08                 )  sec^-1
MeanMatrixElemValue         = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
TOTAL       :     1.022151 sec
real    0m1.334s
==PROF== Profiling "_ZN5gProc8sigmaKinEPKdPd": launch__registers_per_thread 164
-------------------------------------------------------------------------
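The `==PROF==` lines appear to come from Nsight Compute, reporting the `launch__registers_per_thread` metric for the kernel identified by its mangled symbol. A small sketch using the GCC/Clang `abi::__cxa_demangle` function (the `demangle` wrapper is a hypothetical name) shows that `_ZN5gProc8sigmaKinEPKdPd` is `gProc::sigmaKin(double const*, double*)`; running `c++filt` on the symbol gives the same result.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <memory>
#include <string>

// Deleter for the malloc'ed buffer returned by __cxa_demangle.
struct FreeDeleter {
  void operator()(char* p) const { std::free(p); }
};

// Demangle an Itanium-ABI symbol (GCC/Clang); return the input unchanged on failure.
std::string demangle(const char* mangled) {
  int status = 0;
  std::unique_ptr<char, FreeDeleter> out(
      abi::__cxa_demangle(mangled, nullptr, nullptr, &status));
  return (status == 0 && out) ? std::string(out.get()) : std::string(mangled);
}
```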
-------------------------------------------------------------------------
(CPU) Process                     = EPOCH1_EEMUMU_CPP
(CPU) OMP threads / `nproc --all` = 1 / 4
(CPU) EvtsPerSec[MatrixElems] (3) = ( 1.130270e+06                 )  sec^-1
(CPU) MeanMatrixElemValue         = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
(CPU) TOTAL       :     8.087486 sec
real    0m8.117s
-------------------------------------------------------------------------
(GPU) Process                     = EPOCH1_EEMUMU_CUDA
(GPU) EvtsPerSec[MatrixElems] (3) = ( 6.156074e+08                 )  sec^-1
(GPU) MeanMatrixElemValue         = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
(GPU) TOTAL       :     1.050187 sec
real    0m1.360s
==PROF== Profiling "_ZN5gProc8sigmaKinEPKdPd": launch__registers_per_thread 164
-------------------------------------------------------------------------
(GPU) Process                     = EPOCH1_EEMUMU_CUDA
(GPU) EvtsPerSec[MatrixElems] (3) = ( 6.743147e+08                 )  sec^-1
(GPU) MeanMatrixElemValue         = ( 1.371704e-02 +- 3.852995e-07 )  GeV^0
(GPU) TOTAL       :    21.294849 sec
(CPU) Process                     = EPOCH1_EEMUMU_CPP
(CPU) OMP threads / `nproc --all` = 4 / 4
(CPU) EvtsPerSec[MatrixElems] (3) = ( 2.983748e+06                 )  sec^-1
(CPU) MeanMatrixElemValue         = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
(CPU) TOTAL       :     4.765584 sec
(GPU) EvtsPerSec[MatrixElems] (3) = ( 6.743147e+08                 )  sec^-1
(CPU) OMP threads / `nproc --all` = 1 / 4
(CPU) EvtsPerSec[MatrixElems] (3) = ( 2.983748e+06                 )  sec^-1
(HET) EvtsPerSec[MatrixElems] (3) = ( 6.772985e+08                 )  sec^-1
real    0m22.174s
-------------------------------------------------------------------------
Process                     = EPOCH2_EEMUMU_CPP
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MatrixElems] (3) = ( 1.106797e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
TOTAL       :     8.228164 sec
real    0m8.258s
-------------------------------------------------------------------------
Process                     = EPOCH2_EEMUMU_CUDA
EvtsPerSec[MatrixElems] (3) = ( 6.071362e+08                 )  sec^-1
MeanMatrixElemValue         = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
TOTAL       :     0.831403 sec
real    0m1.147s
==PROF== Profiling "_ZN5gProc8sigmaKinEPKdPd": launch__registers_per_thread 164
-------------------------------------------------------------------------
(Fix conflict in ep1 timermap.h: move het version to SubProcesses and link it)

BASELINE PERFORMANCE for hetep12 before merging ep2to2ep1 to master:

-------------------------------------------------------------------------
(CPU) Process                     = EPOCH1_EEMUMU_CPP
(CPU) OMP threads / `nproc --all` = 1 / 4
(CPU) EvtsPerSec[MatrixElems] (3) = ( 1.133133e+06                 )  sec^-1
(CPU) MeanMatrixElemValue         = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
(CPU) TOTAL       :     8.049342 sec
real    0m8.076s
-------------------------------------------------------------------------
(CPU) Process                     = EPOCH1_EEMUMU_CPP
(CPU) OMP threads / `nproc --all` = 4 / 4
(CPU) EvtsPerSec[MatrixElems] (3) = ( 4.470283e+06                 )  sec^-1
(CPU) MeanMatrixElemValue         = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
(CPU) TOTAL       :     3.925186 sec
real    0m3.952s
-------------------------------------------------------------------------
(GPU) Process                     = EPOCH1_EEMUMU_CUDA
(GPU) EvtsPerSec[MatrixElems] (3) = ( 6.625980e+08                 )  sec^-1
(GPU) MeanMatrixElemValue         = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
(GPU) TOTAL       :     0.787367 sec
real    0m1.095s
==PROF== Profiling "_ZN5gProc8sigmaKinEPKdPd": launch__registers_per_thread 164
-------------------------------------------------------------------------
(GPU) Process                     = EPOCH1_EEMUMU_CUDA
(GPU) EvtsPerSec[MatrixElems] (3) = ( 7.323155e+08                 )  sec^-1
(GPU) MeanMatrixElemValue         = ( 1.371704e-02 +- 3.852995e-07 )  GeV^0
(GPU) TOTAL       :    20.614412 sec
(CPU) Process                     = EPOCH1_EEMUMU_CPP
(CPU) OMP threads / `nproc --all` = 4 / 4
(CPU) EvtsPerSec[MatrixElems] (3) = ( 2.955639e+06                 )  sec^-1
(CPU) MeanMatrixElemValue         = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
(CPU) TOTAL       :     4.757736 sec
(GPU) EvtsPerSec[MatrixElems] (3) = ( 7.323155e+08                 )  sec^-1
(CPU) OMP threads / `nproc --all` = 1 / 4
(CPU) EvtsPerSec[MatrixElems] (3) = ( 2.955639e+06                 )  sec^-1
(HET) EvtsPerSec[MatrixElems] (3) = ( 7.352711e+08                 )  sec^-1
real    0m21.461s
-------------------------------------------------------------------------
Process                     = EPOCH2_EEMUMU_CPP
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MatrixElems] (3) = ( 1.131501e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
TOTAL       :     8.065051 sec
real    0m8.092s
-------------------------------------------------------------------------
Process                     = EPOCH2_EEMUMU_CPP
OMP threads / `nproc --all` = 4 / 4
EvtsPerSec[MatrixElems] (3) = ( 4.493152e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
TOTAL       :     3.885839 sec
real    0m3.912s
-------------------------------------------------------------------------
Process                     = EPOCH2_EEMUMU_CUDA
EvtsPerSec[MatrixElems] (3) = ( 6.856494e+08                 )  sec^-1
MeanMatrixElemValue         = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
TOTAL       :     1.172856 sec
real    0m1.480s
==PROF== Profiling "_ZN5gProc8sigmaKinEPKdPd": launch__registers_per_thread 164
-------------------------------------------------------------------------
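The baseline numbers can be summarised as throughput ratios. This is a minimal sketch with hypothetical helper names, using the EvtsPerSec[MatrixElems] (3) values above: epoch1 with 4 OMP threads is about 3.9x faster than with 1 thread, epoch2 about 4.0x, and the epoch1 CUDA run is roughly 585x the single-thread epoch1 C++ throughput.

```cpp
#include <cassert>

// Ratio of two throughputs (events per second).
double speedup(double evtsPerSecNew, double evtsPerSecRef) {
  assert(evtsPerSecRef > 0);
  return evtsPerSecNew / evtsPerSecRef;
}

// Parallel efficiency of an N-thread OpenMP run relative to a 1-thread run.
double ompEfficiency(double evtsPerSecN, double evtsPerSec1, int nThreads) {
  assert(nThreads > 0);
  return speedup(evtsPerSecN, evtsPerSec1) / nThreads;
}
```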

valassi commented Apr 9, 2021

This PR was about het + epoch12 (hetep12).

I will close it because I will first merge the vectorization work, in the version based on the epoch12 merge (klas2ep12).

I am replacing it with draft PR #159, which includes het + klas2ep12 + epoch12 (hetklas).
