Fix throughput metrics and improve float printouts. Merge (fixing conflicts) with @shageboe's common random numbers on host. #46

Merged
valassi merged 18 commits into madgraph5:master from valassi:master on Nov 10, 2020

Conversation

@valassi
Member

@valassi valassi commented Nov 10, 2020

No description provided.

This shows that the rambo weights are constant (MinWeight equals MaxWeight) but the stats are imprecise (MeanWeight differs from them and StdDevWeight is nonzero)
./gcheck.exe -p 16384 32 12
***************************************
NumIterations             = 12
NumThreadsPerBlock        = 32
NumBlocksPerGrid          = 16384
---------------------------------------
FP precision              = DOUBLE (nan=0)
Complex type              = THRUST::COMPLEX
RanNumb memory layout     = AOSOA[4]
Momenta memory layout     = AOSOA[4]
Wavefunction GPU memory   = LOCAL
Curand generation         = DEVICE (CUDA code)
---------------------------------------
NumberOfEntries           = 12
TotalTimeInWaveFuncs      = 9.130210e-03 sec
MeanTimeInWaveFuncs       = 7.608508e-04 sec
StdDevTimeInWaveFuncs     = 4.212643e-06 sec
MinTimeInWaveFuncs        = 7.542860e-04 sec
MaxTimeInWaveFuncs        = 7.693710e-04 sec
---------------------------------------
TotalEventsComputed       = 6291456
RamboEventsPerSec         = 9.074427e+07 sec^-1
MatrixElemEventsPerSec    = 6.890812e+08 sec^-1
***************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue       = 1.372152e-02 GeV^0
StdErrMatrixElemValue     = 3.269516e-06 GeV^0
StdDevMatrixElemValue     = 8.200854e-03 GeV^0
StdDevMatrixElemValue2    = 8.200854e-03 GeV^0
MinMatrixElemValue        = 6.071582e-03 GeV^0
MaxMatrixElemValue        = 3.374925e-02 GeV^0
MeanWeight                = 4.5158270527384564152e-01
StdErrWeight              = 2.1661651276422132691e-09
StdDevWeight              = 5.4333432436062354412e-06
StdDevWeight2             = 1.5609180615679635308e-11
MinWeight                 = 4.5158270528945482214e-01
MaxWeight                 = 4.5158270528945482214e-01
***************************************
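The nonzero StdDevWeight above, for weights whose minimum and maximum coincide, is the kind of result that rounding error in accumulated sums can produce. The PR does not show how the statistics are actually computed, so the following is only a minimal sketch of one standard remedy, under the assumption that the values are available in a host-side array: subtract a reference value (here the first element) before accumulating. The names `Stats`, `naiveStats` and `shiftedStats` are hypothetical and are not taken from check.cc. The relation StdErr = StdDev / sqrt(N) used here is consistent with the numbers printed in these logs.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the three printed quantities.
struct Stats {
  double mean;
  double stdDev;  // population standard deviation
  double stdErr;  // stdDev / sqrt(N)
};

// Naive one-pass formula sqrt(<x^2> - <x>^2): when the spread is tiny
// compared to the mean, the subtraction cancels almost all significant bits.
Stats naiveStats(const std::vector<double>& x) {
  assert(!x.empty());
  double sum = 0, sumSq = 0;
  for (double v : x) { sum += v; sumSq += v * v; }
  const double n = static_cast<double>(x.size());
  const double mean = sum / n;
  const double var = std::max(0.0, sumSq / n - mean * mean);
  const double sd = std::sqrt(var);
  return {mean, sd, sd / std::sqrt(n)};
}

// Shifted-data variant: accumulate differences from a reference value.
// For constant input every difference is exactly zero, so the mean is
// exact and the standard deviation is exactly zero.
Stats shiftedStats(const std::vector<double>& x) {
  assert(!x.empty());
  const double shift = x[0];  // any value close to the mean works
  double sum = 0, sumSq = 0;
  for (double v : x) {
    const double d = v - shift;
    sum += d;
    sumSq += d * d;
  }
  const double n = static_cast<double>(x.size());
  const double meanD = sum / n;
  const double var = std::max(0.0, sumSq / n - meanD * meanD);
  const double sd = std::sqrt(var);
  return {shift + meanD, sd, sd / std::sqrt(n)};
}
```

Calling `shiftedStats` on a vector filled with one repeated weight returns that weight as the mean and zero for both StdDev and StdErr, whereas `naiveStats` on the same input is not guaranteed to do so.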
Show more digits also for MEs

./gcheck.exe -p 16384 32 12
***************************************
NumIterations             = 12
NumThreadsPerBlock        = 32
NumBlocksPerGrid          = 16384
---------------------------------------
FP precision              = DOUBLE (nan=0)
Complex type              = THRUST::COMPLEX
RanNumb memory layout     = AOSOA[4]
Momenta memory layout     = AOSOA[4]
Wavefunction GPU memory   = LOCAL
Curand generation         = DEVICE (CUDA code)
---------------------------------------
NumberOfEntries           = 12
TotalTimeInWaveFuncs      = 9.079925e-03 sec
MeanTimeInWaveFuncs       = 7.566604e-04 sec
StdDevTimeInWaveFuncs     = 4.186479e-06 sec
MinTimeInWaveFuncs        = 7.509330e-04 sec
MaxTimeInWaveFuncs        = 7.662980e-04 sec
---------------------------------------
TotalEventsComputed       = 6291456
RamboEventsPerSec         = 9.087096e+07 sec^-1
MatrixElemEventsPerSec    = 6.928974e+08 sec^-1
***************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue       = 1.3721520473510778401e-02 GeV^0
StdErrMatrixElemValue     = 3.2695162866737335974e-06 GeV^0
StdDevMatrixElemValue     = 8.2008541266635308353e-03 GeV^0
MinMatrixElemValue        = 6.0715820412916253479e-03 GeV^0
MaxMatrixElemValue        = 3.3749252230265099073e-02 GeV^0
MeanWeight                = 4.5158270527384564152e-01
StdErrWeight              = 6.2230676776297664265e-15
StdDevWeight              = 1.5609180615679635308e-11
MinWeight                 = 4.5158270528945482214e-01
MaxWeight                 = 4.5158270528945482214e-01
***************************************
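The printout above moves from 6 to 19 digits after the decimal point so that differences in the last bits of a double become visible. As a rough sketch of how such a fixed-precision scientific format can be produced with the standard C++ stream manipulators (the helper name `formatSci` is hypothetical, and this is not necessarily how check.cc formats its output):

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Format a value in scientific notation with a chosen number of digits
// after the decimal point. A double carries at most 17 significant
// decimal digits, so 19 digits is more than enough to expose rounding
// differences between two nearly equal values.
std::string formatSci(double value, int digits) {
  assert(digits >= 0);
  std::ostringstream out;
  out << std::scientific << std::setprecision(digits) << value;
  return out.str();
}
```

For example, `formatSci(x, 6)` gives the short form used in the compact summaries, while `formatSci(x, 19)` gives the long form used while debugging the statistics.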
./gcheck.exe -p 16384 32 12
***************************************
NumIterations             = 12
NumThreadsPerBlock        = 32
NumBlocksPerGrid          = 16384
---------------------------------------
FP precision              = DOUBLE (nan=0)
Complex type              = THRUST::COMPLEX
RanNumb memory layout     = AOSOA[4]
Momenta memory layout     = AOSOA[4]
Wavefunction GPU memory   = LOCAL
Curand generation         = DEVICE (CUDA code)
---------------------------------------
NumberOfEntries           = 12
TotalTimeInWaveFuncs      = 9.093331e-03 sec
MeanTimeInWaveFuncs       = 7.577776e-04 sec
StdDevTimeInWaveFuncs     = 6.006318e-06 sec
MinTimeInWaveFuncs        = 7.503750e-04 sec
MaxTimeInWaveFuncs        = 7.744000e-04 sec
---------------------------------------
TotalEventsComputed       = 6291456
RamboEventsPerSec         = 9.071686e+07 sec^-1
MatrixElemEventsPerSec    = 6.918758e+08 sec^-1
***************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue       = 1.3721520473510778401e-02 GeV^0
MeanMatrixElemValue2      = 1.3721520473511723825e-02 GeV^0
StdErrMatrixElemValue     = 3.2695162866737335974e-06 GeV^0
StdDevMatrixElemValue     = 8.2008541266635308353e-03 GeV^0
StdDevMatrixElemValue2    = 8.2008541266635221617e-03 GeV^0
MinMatrixElemValue        = 6.0715820412916253479e-03 GeV^0
MaxMatrixElemValue        = 3.3749252230265099073e-02 GeV^0
MeanWeight                = 4.5158270527384564152e-01
MeanWeight2               = 4.5158270528945482214e-01
StdErrWeight              = 6.2230676776297664265e-15
StdDevWeight              = 1.5609180615679635308e-11
StdDevWeight2             = 0.0000000000000000000e+00
MinWeight                 = 4.5158270528945482214e-01
MaxWeight                 = 4.5158270528945482214e-01
***************************************
./gcheck.exe -p 16384 32 12
***************************************
NumIterations             = 12
NumThreadsPerBlock        = 32
NumBlocksPerGrid          = 16384
---------------------------------------
FP precision              = DOUBLE (nan=0)
Complex type              = THRUST::COMPLEX
RanNumb memory layout     = AOSOA[4]
Momenta memory layout     = AOSOA[4]
Wavefunction GPU memory   = LOCAL
Curand generation         = DEVICE (CUDA code)
---------------------------------------
NumberOfEntries           = 12
TotalTimeInWaveFuncs      = 9.172111e-03 sec
MeanTimeInWaveFuncs       = 7.643426e-04 sec
StdDevTimeInWaveFuncs     = 5.285693e-06 sec
MinTimeInWaveFuncs        = 7.581960e-04 sec
MaxTimeInWaveFuncs        = 7.783100e-04 sec
---------------------------------------
TotalEventsComputed       = 6291456
RamboEventsPerSec         = 9.074976e+07 sec^-1
MatrixElemEventsPerSec    = 6.859333e+08 sec^-1
***************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue       = 1.3721520473511723825e-02 GeV^0
StdErrMatrixElemValue     = 3.2695162866737302093e-06 GeV^0
StdDevMatrixElemValue     = 8.2008541266635221617e-03 GeV^0
MinMatrixElemValue        = 6.0715820412916253479e-03 GeV^0
MaxMatrixElemValue        = 3.3749252230265099073e-02 GeV^0
MeanWeight                = 4.5158270528945482214e-01
StdErrWeight              = 0.0000000000000000000e+00
StdDevWeight              = 0.0000000000000000000e+00
MinWeight                 = 4.5158270528945482214e-01
MaxWeight                 = 4.5158270528945482214e-01
***************************************
./gcheck.exe -p 16384 32 12
***************************************
NumIterations             = 12
NumThreadsPerBlock        = 32
NumBlocksPerGrid          = 16384
---------------------------------------
FP precision              = DOUBLE (nan=0)
Complex type              = THRUST::COMPLEX
RanNumb memory layout     = AOSOA[4]
Momenta memory layout     = AOSOA[4]
Wavefunction GPU memory   = LOCAL
Curand generation         = DEVICE (CUDA code)
---------------------------------------
NumberOfEntries           = 12
TotalTimeInWaveFuncs      = 9.176586e-03 sec
MeanTimeInWaveFuncs       = 7.647155e-04 sec
StdDevTimeInWaveFuncs     = 1.150358e-05 sec
MinTimeInWaveFuncs        = 7.537280e-04 sec
MaxTimeInWaveFuncs        = 8.012200e-04 sec
---------------------------------------
TotalEventsComputed       = 6291456
RamboEventsPerSec         = 9.072343e+07 sec^-1
MatrixElemEventsPerSec    = 6.855988e+08 sec^-1
***************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue       = 1.372152e-02 GeV^0
StdErrMatrixElemValue     = 3.269516e-06 GeV^0
StdDevMatrixElemValue     = 8.200854e-03 GeV^0
MinMatrixElemValue        = 6.071582e-03 GeV^0
MaxMatrixElemValue        = 3.374925e-02 GeV^0
MeanWeight                = 4.515827e-01
StdErrWeight              = 0.000000e+00
StdDevWeight              = 0.000000e+00
MinWeight                 = 4.515827e-01
MaxWeight                 = 4.515827e-01
***************************************
./gcheck.exe -p 16384 32 12
***************************************
NumIterations             = 12
NumThreadsPerBlock        = 32
NumBlocksPerGrid          = 16384
---------------------------------------
FP precision              = DOUBLE (nan=0)
Complex type              = THRUST::COMPLEX
RanNumb memory layout     = AOSOA[4]
Momenta memory layout     = AOSOA[4]
Wavefunction GPU memory   = LOCAL
Curand generation         = DEVICE (CUDA code)
---------------------------------------
NumberOfEntries           = 12
TotalTimeInWaveFuncs      = 9.178543e-03 sec
MeanTimeInWaveFuncs       = 7.648786e-04 sec
StdDevTimeInWaveFuncs     = 6.406818e-06 sec
MinTimeInWaveFuncs        = 7.579180e-04 sec
MaxTimeInWaveFuncs        = 7.836200e-04 sec
---------------------------------------
TotalEventsComputed       = 6291456
RamboEventsPerSec         = 9.022583e+07 sec^-1
MatrixElemEventsPerSec    = 6.854526e+08 sec^-1
***************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue       = 1.372152e-02 GeV^0
StdErrMatrixElemValue     = 3.269516e-06 GeV^0
StdDevMatrixElemValue     = 8.200854e-03 GeV^0
MinMatrixElemValue        = 6.071582e-03 GeV^0
MaxMatrixElemValue        = 3.374925e-02 GeV^0
MeanWeight                = 4.515827e-01
StdErrWeight              = 0.000000e+00
StdDevWeight              = 0.000000e+00
MinWeight                 = 4.515827e-01
MaxWeight                 = 4.515827e-01
***************************************
00 CudaFree :     0.867191 sec
0a ProcInit :     0.000441 sec
0b MemAlloc :     0.060771 sec
0c GenCreat :     0.009921 sec
0d SGoodHel :     0.001757 sec
1a GenSeed  :     0.000105 sec
1b GenRnGen :     0.007690 sec
2a RamboIni :     0.000168 sec
2b RamboFin :     0.000134 sec
2c CpDTHwgt :     0.006067 sec
2d CpDTHmom :     0.063361 sec
3a SigmaKin :     0.000163 sec
3b CpDTHmes :     0.009016 sec
4a DumpLoop :     0.023936 sec
8a CompStat :     0.045722 sec
9a GenDestr :     0.000055 sec
9b MemFree  :     0.012854 sec
9c CudReset :     0.049048 sec
9d DumpScrn :     0.000213 sec
9e DumpJson :     0.000009 sec
TOTAL       :     1.158621 sec
TOTAL(123)  :     0.086704 sec
TOTAL(23)   :     0.078909 sec
TOTAL(3)    :     0.009179 sec
***************************************
./gcheck.exe -p 16384 32 12
***********************************************************************
NumBlocksPerGrid           = 16384
NumThreadsPerBlock         = 32
NumIterations              = 12
-----------------------------------------------------------------------
FP precision               = DOUBLE (nan=0)
Complex type               = THRUST::COMPLEX
RanNumb memory layout      = AOSOA[4]
Momenta memory layout      = AOSOA[4]
Wavefunction GPU memory    = LOCAL
Curand generation          = DEVICE (CUDA code)
-----------------------------------------------------------------------
NumberOfEntries            = 12
TotalTimeIn[Rambo+ME]  (23)= ( 7.899636e-02                 )  sec
TotalTimeInRambo        (2)= ( 6.981027e-02                 )  sec
TotalTimeInMatrixElems  (3)= ( 9.186086e-03                 )  sec
MeanTimeInMatrixElems      = ( 7.655072e-04                 )  sec
[Min,Max]TimeInMatrixElems = [ 7.562420e-04 ,  8.084830e-04 ]  sec
-----------------------------------------------------------------------
TotalEventsComputed        = 6291456
[Rambo+ME]EventsPerSec (23)= ( 7.964236e+07                 )  sec^-1
RamboEventsPerSec       (2)= ( 9.012221e+07                 )  sec^-1
MatrixElemEventsPerSec  (3)= ( 6.848897e+08                 )  sec^-1
***********************************************************************
NumMatrixElements(notNan)  = 6291456
MeanMatrixElemValue        = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
[Min,Max]MatrixElemValue   = [ 6.071582e-03 ,  3.374925e-02 ]  GeV^0
StdDevMatrixElemValue      = ( 8.200854e-03                 )  GeV^0
MeanWeight                 = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight            = [ 4.515827e-01 ,  4.515827e-01 ]
StdDevWeight               = ( 0.000000e+00                 )
***********************************************************************
00 CudaFree :     0.839338 sec
0a ProcInit :     0.000514 sec
0b MemAlloc :     0.061507 sec
0c GenCreat :     0.009911 sec
0d SGoodHel :     0.001748 sec
1a GenSeed  :     0.000092 sec
1b GenRnGen :     0.007562 sec
2a RamboIni :     0.000170 sec
2b RamboFin :     0.000141 sec
2c CpDTHwgt :     0.006046 sec
2d CpDTHmom :     0.063454 sec
3a SigmaKin :     0.000163 sec
3b CpDTHmes :     0.009023 sec
4a DumpLoop :     0.023754 sec
8a CompStat :     0.045607 sec
9a GenDestr :     0.000053 sec
9b MemFree  :     0.018579 sec
9c CudReset :     0.049453 sec
9d DumpScrn :     0.000190 sec
9e DumpJson :     0.000037 sec
TOTAL       :     1.137343 sec
TOTAL (123) :     0.086651 sec
TOTAL  (23) :     0.078996 sec
TOTAL   (2) :     0.069810 sec
TOTAL   (3) :     0.009186 sec
***********************************************************************
./gcheck.exe -p 16384 32 12
***********************************************************************
NumBlocksPerGrid           = 16384
NumThreadsPerBlock         = 32
NumIterations              = 12
-----------------------------------------------------------------------
FP precision               = DOUBLE (nan=0)
Complex type               = THRUST::COMPLEX
RanNumb memory layout      = AOSOA[4]
Momenta memory layout      = AOSOA[4]
Wavefunction GPU memory    = LOCAL
Curand generation          = DEVICE (CUDA code)
-----------------------------------------------------------------------
NumberOfEntries            = 12
TotTimeIn[Rnd+Rmb+ME] (123)= ( 8.672165e-02                 )  sec
TotTimeIn[Rambo+ME]    (23)= ( 7.902262e-02                 )  sec
TotalTimeInRndNumGen    (1)= ( 7.699024e-03                 )  sec
TotalTimeInRambo        (2)= ( 6.987648e-02                 )  sec
TotalTimeInMatrixElems  (3)= ( 9.146139e-03                 )  sec
MeanTimeInMatrixElems      = ( 7.621783e-04                 )  sec
[Min,Max]TimeInMatrixElems = [ 7.216010e-04 ,  7.903250e-04 ]  sec
-----------------------------------------------------------------------
TotalEventsComputed        = 6291456
[Rnd+Rmb+ME]EvtPerSec (123)= ( 7.254770e+07                 )  sec^-1
[Rmb+ME]EvtPerSec      (23)= ( 7.961589e+07                 )  sec^-1
RndNumbGenEventsPerSec  (1)= ( 8.171758e+08                 )  sec^-1
RamboEventsPerSec       (2)= ( 9.003682e+07                 )  sec^-1
MatrixElemEventsPerSec  (3)= ( 6.878811e+08                 )  sec^-1
***********************************************************************
NumMatrixElements(notNan)  = 6291456
MeanMatrixElemValue        = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
[Min,Max]MatrixElemValue   = [ 6.071582e-03 ,  3.374925e-02 ]  GeV^0
StdDevMatrixElemValue      = ( 8.200854e-03                 )  GeV^0
MeanWeight                 = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight            = [ 4.515827e-01 ,  4.515827e-01 ]
StdDevWeight               = ( 0.000000e+00                 )
***********************************************************************
00 CudaFree :     0.908872 sec
0a ProcInit :     0.000491 sec
0b MemAlloc :     0.060200 sec
0c GenCreat :     0.009608 sec
0d SGoodHel :     0.001755 sec
1a GenSeed  :     0.000094 sec
1b GenRnGen :     0.007605 sec
2a RamboIni :     0.000222 sec
2b RamboFin :     0.000142 sec
2c CpDTHwgt :     0.006007 sec
2d CpDTHmom :     0.063506 sec
3a SigmaKin :     0.000172 sec
3b CpDTHmes :     0.008975 sec
4a DumpLoop :     0.023672 sec
8a CompStat :     0.045530 sec
9a GenDestr :     0.000056 sec
9b MemFree  :     0.012875 sec
9c CudReset :     0.049310 sec
9d DumpScrn :     0.000211 sec
9e DumpJson :     0.000007 sec
TOTAL       :     1.199308 sec
TOTAL (123) :     0.086722 sec
TOTAL  (23) :     0.079023 sec
TOTAL   (1) :     0.007699 sec
TOTAL   (2) :     0.069876 sec
TOTAL   (3) :     0.009146 sec
***********************************************************************
./gcheck.exe -p 16384 32 12
***********************************************************************
NumBlocksPerGrid           = 16384
NumThreadsPerBlock         = 32
NumIterations              = 12
-----------------------------------------------------------------------
FP precision               = DOUBLE (nan=0)
Complex type               = THRUST::COMPLEX
RanNumb memory layout      = AOSOA[4]
Momenta memory layout      = AOSOA[4]
Wavefunction GPU memory    = LOCAL
Curand generation          = DEVICE (CUDA code)
-----------------------------------------------------------------------
NumberOfEntries            = 12
TotalTime[Rnd+Rmb+ME] (123)= ( 8.635483e-02                 )  sec
TotalTime[Rambo+ME]    (23)= ( 7.875219e-02                 )  sec
TotalTime[RndNumGen]    (1)= ( 7.602644e-03                 )  sec
TotalTime[Rambo]        (2)= ( 6.959572e-02                 )  sec
TotalTime[MatrixElems]  (3)= ( 9.156469e-03                 )  sec
MeanTimeInMatrixElems      = ( 7.630391e-04                 )  sec
[Min,Max]TimeInMatrixElems = [ 7.492570e-04 ,  7.967490e-04 ]  sec
-----------------------------------------------------------------------
TotalEventsComputed        = 6291456
EvtsPerSec[Rnd+Rmb+ME](123)= ( 7.285587e+07                 )  sec^-1
EvtsPerSec[Rmb+ME]     (23)= ( 7.988929e+07                 )  sec^-1
EvtsPerSec[MatrixElems] (3)= ( 6.871050e+08                 )  sec^-1
***********************************************************************
NumMatrixElements(notNan)  = 6291456
MeanMatrixElemValue        = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
[Min,Max]MatrixElemValue   = [ 6.071582e-03 ,  3.374925e-02 ]  GeV^0
StdDevMatrixElemValue      = ( 8.200854e-03                 )  GeV^0
MeanWeight                 = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight            = [ 4.515827e-01 ,  4.515827e-01 ]
StdDevWeight               = ( 0.000000e+00                 )
***********************************************************************
00 CudaFree :     0.846373 sec
0a ProcInit :     0.000491 sec
0b MemAlloc :     0.060459 sec
0c GenCreat :     0.009691 sec
0d SGoodHel :     0.001752 sec
1a GenSeed  :     0.000096 sec
1b GenRnGen :     0.007507 sec
2a RamboIni :     0.000230 sec
2b RamboFin :     0.000138 sec
2c CpDTHwgt :     0.005930 sec
2d CpDTHmom :     0.063298 sec
3a SigmaKin :     0.000194 sec
3b CpDTHmes :     0.008962 sec
4a DumpLoop :     0.023790 sec
8a CompStat :     0.045537 sec
9a GenDestr :     0.000056 sec
9b MemFree  :     0.014834 sec
9c CudReset :     0.049248 sec
9d DumpScrn :     0.000268 sec
9e DumpJson :     0.000009 sec
TOTAL       :     1.138863 sec
TOTAL (123) :     0.086355 sec
TOTAL  (23) :     0.078752 sec
TOTAL   (1) :     0.007603 sec
TOTAL   (2) :     0.069596 sec
TOTAL   (3) :     0.009156 sec
***********************************************************************
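The numbers in this printout are consistent with three simple relations: TotalEventsComputed is blocks times threads times iterations (16384 x 32 x 12 = 6291456), each TOTAL (n) line is the sum of the timers whose key starts with one of the listed digits, and each EvtsPerSec value is the event count divided by the matching total time. The following sketch restates that arithmetic. The function names and the list-of-pairs timer representation are assumptions made for illustration, not the actual check.cc code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Total number of events processed over all iterations.
std::size_t totalEvents(std::size_t blocks, std::size_t threads,
                        std::size_t iterations) {
  return blocks * threads * iterations;
}

// Sum the timers whose key starts with one of the characters in `groups`,
// e.g. groups = "123" adds up the "1a", "1b", "2a", ..., "3b" entries.
double sumTimers(const std::vector<std::pair<std::string, double>>& timers,
                 const std::string& groups) {
  double total = 0;
  for (const auto& t : timers) {
    if (!t.first.empty() && groups.find(t.first[0]) != std::string::npos)
      total += t.second;
  }
  return total;
}

// Throughput in events per second for a given total time.
double eventsPerSecond(std::size_t nEvents, double seconds) {
  assert(seconds > 0);
  return static_cast<double>(nEvents) / seconds;
}
```

With the timer values listed above, `sumTimers(timers, "123")` reproduces the 0.086355 sec of the TOTAL (123) line, and dividing 6291456 events by the printed 8.635483e-02 sec gives about 7.29e+07 events per second, in agreement with the EvtsPerSec[Rnd+Rmb+ME] line.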
Fix conflict in examples/gpu/eemumu_AV/SubProcesses/P1_Sigma_sm_epem_mupmum/check.cc
Missing a printout about the type of generation.

./gcheck.exe -p 16384 32 12
***********************************************************************
NumBlocksPerGrid           = 16384
NumThreadsPerBlock         = 32
NumIterations              = 12
-----------------------------------------------------------------------
FP precision               = DOUBLE (nan=0)
Complex type               = THRUST::COMPLEX
RanNumb memory layout      = AOSOA[4]
Momenta memory layout      = AOSOA[4]
Wavefunction GPU memory    = LOCAL
Curand generation          = DEVICE (CUDA code)
-----------------------------------------------------------------------
NumberOfEntries            = 12
TotalTime[Rnd+Rmb+ME] (123)= ( 1.774030e-01                 )  sec
TotalTime[Rambo+ME]    (23)= ( 7.814709e-02                 )  sec
TotalTime[RndNumGen]    (1)= ( 9.925591e-02                 )  sec
TotalTime[Rambo]        (2)= ( 6.879869e-02                 )  sec
TotalTime[MatrixElems]  (3)= ( 9.348396e-03                 )  sec
MeanTimeInMatrixElems      = ( 7.790330e-04                 )  sec
[Min,Max]TimeInMatrixElems = [ 7.526090e-04 ,  9.154800e-04 ]  sec
-----------------------------------------------------------------------
TotalEventsComputed        = 6291456
EvtsPerSec[Rnd+Rmb+ME](123)= ( 3.546420e+07                 )  sec^-1
EvtsPerSec[Rmb+ME]     (23)= ( 8.050787e+07                 )  sec^-1
EvtsPerSec[MatrixElems] (3)= ( 6.729985e+08                 )  sec^-1
***********************************************************************
NumMatrixElements(notNan)  = 6291456
MeanMatrixElemValue        = ( 1.371988e-02 +- 3.269530e-06 )  GeV^0
[Min,Max]MatrixElemValue   = [ 6.071582e-03 ,  3.374925e-02 ]  GeV^0
StdDevMatrixElemValue      = ( 8.200888e-03                 )  GeV^0
MeanWeight                 = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight            = [ 4.515827e-01 ,  4.515827e-01 ]
StdDevWeight               = ( 0.000000e+00                 )
***********************************************************************
00 CudaFree :     1.051540 sec
0a ProcInit :     0.000448 sec
0b MemAlloc :     0.094235 sec
0c GenCreat :     0.020022 sec
0d SGoodHel :     0.001745 sec
1a GenSeed  :     0.000104 sec
1b GenRnGen :     0.000096 sec
1c CpHTDrnd :     0.099056 sec
2a RamboIni :     0.000406 sec
2b RamboFin :     0.000156 sec
2c CpDTHwgt :     0.006041 sec
2d CpDTHmom :     0.062195 sec
3a SigmaKin :     0.000228 sec
3b CpDTHmes :     0.009120 sec
4a DumpLoop :     0.036934 sec
8a CompStat :     0.045969 sec
9a GenDestr :     0.000070 sec
9b MemFree  :     0.014242 sec
9c CudReset :     0.049193 sec
9d DumpScrn :     0.000225 sec
9e DumpJson :     0.000009 sec
TOTAL       :     1.492035 sec
TOTAL (123) :     0.177403 sec
TOTAL  (23) :     0.078147 sec
TOTAL   (1) :     0.099256 sec
TOTAL   (2) :     0.068799 sec
TOTAL   (3) :     0.009348 sec
***********************************************************************
./gcheck.exe -p 16384 32 12
***********************************************************************
NumBlocksPerGrid           = 16384
NumThreadsPerBlock         = 32
NumIterations              = 12
-----------------------------------------------------------------------
FP precision               = DOUBLE (nan=0)
Complex type               = THRUST::COMPLEX
RanNumb memory layout      = AOSOA[4]
Momenta memory layout      = AOSOA[4]
Wavefunction GPU memory    = LOCAL
Curand generation          = DEVICE (CUDA code)
-----------------------------------------------------------------------
NumberOfEntries            = 12
TotalTime[Rnd+Rmb+ME] (123)= ( 8.562429e-02                 )  sec
TotalTime[Rambo+ME]    (23)= ( 7.777580e-02                 )  sec
TotalTime[RndNumGen]    (1)= ( 7.848486e-03                 )  sec
TotalTime[Rambo]        (2)= ( 6.864643e-02                 )  sec
TotalTime[MatrixElems]  (3)= ( 9.129369e-03                 )  sec
MeanTimeInMatrixElems      = ( 7.607807e-04                 )  sec
[Min,Max]TimeInMatrixElems = [ 7.526080e-04 ,  7.780310e-04 ]  sec
-----------------------------------------------------------------------
TotalEventsComputed        = 6291456
EvtsPerSec[Rnd+Rmb+ME](123)= ( 7.347747e+07                 )  sec^-1
EvtsPerSec[Rmb+ME]     (23)= ( 8.089220e+07                 )  sec^-1
EvtsPerSec[MatrixElems] (3)= ( 6.891447e+08                 )  sec^-1
***********************************************************************
NumMatrixElements(notNan)  = 6291456
MeanMatrixElemValue        = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
[Min,Max]MatrixElemValue   = [ 6.071582e-03 ,  3.374925e-02 ]  GeV^0
StdDevMatrixElemValue      = ( 8.200854e-03                 )  GeV^0
MeanWeight                 = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight            = [ 4.515827e-01 ,  4.515827e-01 ]
StdDevWeight               = ( 0.000000e+00                 )
***********************************************************************
00 CudaFree :     1.057556 sec
0a ProcInit :     0.000503 sec
0b MemAlloc :     0.069165 sec
0c GenCreat :     0.010020 sec
0d SGoodHel :     0.001751 sec
1a GenSeed  :     0.000138 sec
1b GenRnGen :     0.007627 sec
1c CpHTDrnd :     0.000083 sec
2a RamboIni :     0.000177 sec
2b RamboFin :     0.000181 sec
2c CpDTHwgt :     0.005827 sec
2d CpDTHmom :     0.062461 sec
3a SigmaKin :     0.000158 sec
3b CpDTHmes :     0.008972 sec
4a DumpLoop :     0.023841 sec
8a CompStat :     0.047754 sec
9a GenDestr :     0.000068 sec
9b MemFree  :     0.012738 sec
9c CudReset :     0.049833 sec
9d DumpScrn :     0.000242 sec
9e DumpJson :     0.000029 sec
TOTAL       :     1.359123 sec
TOTAL (123) :     0.085624 sec
TOTAL  (23) :     0.077776 sec
TOTAL   (1) :     0.007848 sec
TOTAL   (2) :     0.068646 sec
TOTAL   (3) :     0.009129 sec
***********************************************************************
@valassi valassi merged commit 14d8626 into madgraph5:master Nov 10, 2020