[epoch1] Enable C++-only builds on a system which does not have nvcc #133

Merged: valassi merged 2 commits into madgraph5:master from valassi:nocuda on Mar 20, 2021

valassi commented Mar 20, 2021

Enable C++-only builds on a system which does have nvcc

It is enough to run "export CUDA_HOME=invalid" before building: only the C++ code is then built, even on a machine where nvcc is installed.
This makes it possible to emulate the CI tests.
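
The Makefile changes themselves are not shown in this conversation. As a minimal sketch of the idea, assuming the build decides what to compile by checking for an executable nvcc under CUDA_HOME (the helper name build_mode and the labels it prints are hypothetical, not taken from the repository):

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not the repository's Makefile logic):
# report which backends a build would target for a given CUDA_HOME.
build_mode() {
  # nvcc counts as available only if it is an executable file under <dir>/bin
  if [ -x "${1}/bin/nvcc" ]; then
    echo "cuda+cpp"
  else
    echo "cpp-only"
  fi
}

# Point CUDA_HOME at a directory that does not exist, as in the description
export CUDA_HOME=invalid
build_mode "$CUDA_HOME"
```

With CUDA_HOME set to a path that contains no bin/nvcc, the check fails and only the C++ branch is selected, which is the behaviour the description relies on.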

valassi added 2 commits March 20, 2021 19:56
…ilds)

Fix conflicts in epoch1/cuda/ee_mumu/src/Makefile

This commit was originally in the klas PR (#72), but I am bringing it forward so that the code builds with gcc10.
Note that the gcc10 matrix-element throughput is 1.15E6 events per second at this point, the same as with gcc9.

time ./check.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid            = 2048
NumThreadsPerBlock          = 256
NumIterations               = 12
-----------------------------------------------------------------------
FP precision                = DOUBLE (nan=0)
Complex type                = STD::COMPLEX
RanNumb memory layout       = AOSOA[4]
Momenta memory layout       = AOSOA[4]
Random number generation    = COMMON RANDOM (C++ code)
OMP threads / `nproc --all` = 1 / 4
MatrixElements compiler     = gcc (GCC) 10.1.0
-----------------------------------------------------------------------
NumberOfEntries             = 12
TotalTime[Rnd+Rmb+ME] (123) = ( 7.579283e+00                 )  sec
TotalTime[Rambo+ME]    (23) = ( 7.421322e+00                 )  sec
TotalTime[RndNumGen]    (1) = ( 1.579617e-01                 )  sec
TotalTime[Rambo]        (2) = ( 1.944444e+00                 )  sec
TotalTime[MatrixElems]  (3) = ( 5.476877e+00                 )  sec
MeanTimeInMatrixElems       = ( 4.564064e-01                 )  sec
[Min,Max]TimeInMatrixElems  = [ 4.562069e-01 ,  4.566677e-01 ]  sec
-----------------------------------------------------------------------
TotalEventsComputed         = 6291456
EvtsPerSec[Rnd+Rmb+ME](123) = ( 8.300859e+05                 )  sec^-1
EvtsPerSec[Rmb+ME]     (23) = ( 8.477541e+05                 )  sec^-1
EvtsPerSec[MatrixElems] (3) = ( 1.148730e+06                 )  sec^-1
***********************************************************************
NumMatrixElements(notNan)   = 6291456
MeanMatrixElemValue         = ( 1.371988e-02 +- 3.269530e-06 )  GeV^0
[Min,Max]MatrixElemValue    = [ 6.071582e-03 ,  3.374925e-02 ]  GeV^0
StdDevMatrixElemValue       = ( 8.200888e-03                 )  GeV^0
MeanWeight                  = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight             = [ 4.515827e-01 ,  4.515827e-01 ]
StdDevWeight                = ( 0.000000e+00                 )
***********************************************************************
0a ProcInit :     0.000309 sec
0b MemAlloc :     0.072731 sec
0c GenCreat :     0.000390 sec
1b GenRnGen :     0.157962 sec
2a RamboIni :     0.097856 sec
2b RamboFin :     1.846588 sec
3a SigmaKin :     5.476878 sec
4a DumpLoop :     0.101170 sec
8a CompStat :     0.026309 sec
9a GenDestr :     0.000003 sec
9b DumpScrn :     0.012090 sec
9c DumpJson :     0.000002 sec
TOTAL       :     7.792288 sec
TOTAL (123) :     7.579284 sec
TOTAL  (23) :     7.421322 sec
TOTAL   (1) :     0.157962 sec
TOTAL   (2) :     1.944444 sec
TOTAL   (3) :     5.476878 sec
***********************************************************************
real    0m7.816s
user    0m8.194s
sys     0m0.351s

time ./check.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid            = 2048
NumThreadsPerBlock          = 256
NumIterations               = 12
-----------------------------------------------------------------------
FP precision                = DOUBLE (nan=0)
Complex type                = STD::COMPLEX
RanNumb memory layout       = AOSOA[4]
Momenta memory layout       = AOSOA[4]
Random number generation    = COMMON RANDOM (C++ code)
OMP threads / `nproc --all` = 1 / 4
MatrixElements compiler     = gcc (GCC) 9.2.0
-----------------------------------------------------------------------
NumberOfEntries             = 12
TotalTime[Rnd+Rmb+ME] (123) = ( 7.569729e+00                 )  sec
TotalTime[Rambo+ME]    (23) = ( 7.399933e+00                 )  sec
TotalTime[RndNumGen]    (1) = ( 1.697960e-01                 )  sec
TotalTime[Rambo]        (2) = ( 1.943793e+00                 )  sec
TotalTime[MatrixElems]  (3) = ( 5.456140e+00                 )  sec
MeanTimeInMatrixElems       = ( 4.546783e-01                 )  sec
[Min,Max]TimeInMatrixElems  = [ 4.542839e-01 ,  4.554264e-01 ]  sec
-----------------------------------------------------------------------
TotalEventsComputed         = 6291456
EvtsPerSec[Rnd+Rmb+ME](123) = ( 8.311336e+05                 )  sec^-1
EvtsPerSec[Rmb+ME]     (23) = ( 8.502044e+05                 )  sec^-1
EvtsPerSec[MatrixElems] (3) = ( 1.153096e+06                 )  sec^-1
***********************************************************************
NumMatrixElements(notNan)   = 6291456
MeanMatrixElemValue         = ( 1.371988e-02 +- 3.269530e-06 )  GeV^0
[Min,Max]MatrixElemValue    = [ 6.071582e-03 ,  3.374925e-02 ]  GeV^0
StdDevMatrixElemValue       = ( 8.200888e-03                 )  GeV^0
MeanWeight                  = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight             = [ 4.515827e-01 ,  4.515827e-01 ]
StdDevWeight                = ( 0.000000e+00                 )
***********************************************************************
0a ProcInit :     0.000370 sec
0b MemAlloc :     0.072272 sec
0c GenCreat :     0.000338 sec
1b GenRnGen :     0.169796 sec
2a RamboIni :     0.095956 sec
2b RamboFin :     1.847837 sec
3a SigmaKin :     5.456140 sec
4a DumpLoop :     0.106385 sec
8a CompStat :     0.026501 sec
9a GenDestr :     0.000006 sec
9b DumpScrn :     0.011202 sec
9c DumpJson :     0.000006 sec
TOTAL       :     7.786809 sec
TOTAL (123) :     7.569729 sec
TOTAL  (23) :     7.399933 sec
TOTAL   (1) :     0.169796 sec
TOTAL   (2) :     1.943793 sec
TOTAL   (3) :     5.456140 sec
***********************************************************************
real    0m7.810s
user    0m8.158s
sys     0m0.374s
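
As a cross-check of the two logs above, the matrix-element throughput can be recomputed from the check.exe arguments and the TotalTime[MatrixElems] values. This is a small sketch, assuming awk is available; the throughput function is a hypothetical helper and not part of check.exe:

```shell
#!/bin/sh
# Events per run: blocks * threads * iterations, as passed to "check.exe -p"
events=$((2048 * 256 * 12))

# throughput EVENTS SECONDS: print events per second in scientific notation
throughput() {
  awk -v n="$1" -v t="$2" 'BEGIN { printf "%.3e\n", n / t }'
}

echo "events = $events"
throughput "$events" 5.476877   # TotalTime[MatrixElems] from the gcc 10.1.0 log
throughput "$events" 5.456140   # TotalTime[MatrixElems] from the gcc 9.2.0 log
```

Both results round to about 1.15E6 events per second and differ by less than half a percent, consistent with the EvtsPerSec[MatrixElems] lines in the logs.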

Note also that runTest.exe runs successfully with gcc10.
This is how it is built now (in the C++-only build):
/cvmfs/sft.cern.ch/lcg/releases/gcc/10.1.0-6f386/x86_64-centos7/bin/g++  -O3  -std=c++11 -I. -I../../src -I../../../../../tools -I../../../../../test/googletest/googletest/include/ -I../../../../../test/include/  -Wall -Wshadow -Wextra -fopenmp -DMGONGPU_COMMONRAND_ONHOST -ffast-math   -c runTest.cc -o runTest.o
/cvmfs/sft.cern.ch/lcg/releases/gcc/10.1.0-6f386/x86_64-centos7/bin/g++ -o runTest.exe CPPProcess.o runTest.o ../../../../../test/src/MadgraphTest.o  -O3  -std=c++11 -I. -I../../src -I../../../../../tools -I../../../../../test/googletest/googletest/include/ -I../../../../../test/include/  -Wall -Wshadow -Wextra -fopenmp -DMGONGPU_COMMONRAND_ONHOST -ffast-math  -ldl -pthread -L../../lib -lmodel_sm -L../../../../../test/googletest/build/lib// -lgtest -lgtest_main
Note that -lgomp is added only for the CUDA build of the test.

valassi commented Mar 20, 2021

I will self-merge. This is also part of the klas PR #72, but I want to test gcc10 before merging vectorization: I have an issue in gcc10 to debug.

valassi changed the title from "Enable C++-only builds on a systema which does have nvcc" to "Enable C++-only builds on a system which does have nvcc" on Mar 20, 2021
valassi merged commit 9855f87 into madgraph5:master on Mar 20, 2021
valassi changed the title from "Enable C++-only builds on a system which does have nvcc" to "[epoch1] Enable C++-only builds on a system which does not have nvcc" on Mar 28, 2021
valassi added a commit to valassi/madgraph4gpu that referenced this pull request on Mar 28, 2021