Closed
38 commits
f0210b5
Bug fix - main should return 0
valassi Dec 4, 2020
b71930a
Prepare for heterogeneous executable, part on CPU (c++) part on GPU (…
valassi Dec 4, 2020
2fe543a
Fix comments: build gcheck.exe with nvcc, check.exe with g++ (as alwa…
valassi Dec 4, 2020
71c4e67
First proof of concept of heterogeneous Madgraph: run GPU first and C…
valassi Dec 4, 2020
695327c
Streamline parameter outputs: move architecture-specific parameters t…
valassi Dec 4, 2020
019409e
Improve printout of number of NANs
valassi Dec 4, 2020
e855ab2
Dump to a string first and then show that string
valassi Dec 4, 2020
3448b80
Fix het main (was executing twice CPU!), also put gpu first again
valassi Dec 4, 2020
cf2dc11
Display (GPU) or (CPU) in timer dumps
valassi Dec 4, 2020
dfaa49a
Extract performance stats for both CPU and GPU
valassi Dec 4, 2020
0f5bc4b
Launch in parallel on cpu and gpu on two different threads
valassi Dec 4, 2020
7ceb8d5
Dump timing statistics at the end for CPU+GPU combined throughput
valassi Dec 4, 2020
88c916d
Reenable tests. Fix link by adding -lgomp. Addresses issue #83.
valassi Dec 4, 2020
2939cbe
Minor improvement: remove trailing "/" from TOOLSDIR to avoid double …
valassi Dec 4, 2020
c496d04
Improve Makefile: remove undefined/unneeded CPPFLAGS
valassi Dec 4, 2020
a7d3b6f
(Empty commit) Resync with OMP branch for issue83
valassi Dec 4, 2020
b655e2e
Merge branch 'master' into het
valassi Dec 4, 2020
0733df2
Move check/gcheck definition to check.h header
valassi Dec 4, 2020
2c55e79
Move to multi-line (there may be more arguments coming...)
valassi Dec 4, 2020
7cbff09
Add (CPU) or (GPU) or (HET) tags in front of all statistics dumped
valassi Dec 4, 2020
41ce9b5
Implement a quick hack to give more events to the GPU than the CPU.
valassi Dec 4, 2020
025c89c
Improve stat calculation: sum CPU and GPU throughputs.
valassi Dec 4, 2020
8f6ca66
Merge remote-tracking branch 'upstream/master' into het
valassi Dec 6, 2020
21cedcf
Merge branch 'issue83' into het
valassi Dec 6, 2020
4df7bac
Merge remote-tracking branch 'upstream/master' into het
valassi Dec 9, 2020
61925d8
Merge branch 'issue83' into het
valassi Dec 9, 2020
ca58d31
Improve the printout
valassi Dec 9, 2020
369e0b0
Try to take into account OMP in heterogeneous scheduling and printout
valassi Dec 9, 2020
d2c0f1d
Merge branch 'issue83' into het
valassi Dec 9, 2020
f2b867f
Merge remote-tracking branch 'upstream/master' into het
valassi Dec 9, 2020
d9261e7
Merge remote-tracking branch 'upstream/master' into het
valassi Dec 10, 2020
4a92baa
Merge remote-tracking branch 'upstream/master' into het
valassi Dec 11, 2020
6eca62b
Merge remote-tracking branch 'upstream/master' into het
valassi Mar 30, 2021
a81f80c
Merge remote-tracking branch 'upstream/master' into het
valassi Apr 1, 2021
68dea59
Fix unresolved conflicts from previous merge
valassi Apr 1, 2021
52edc92
Further fix for build issues in previous merge
valassi Apr 1, 2021
636c300
Merge remote-tracking branch 'upstream/master' into het
valassi Apr 1, 2021
23c9560
[het] Fix bug in a previous merge: outStream, not std::cout
valassi Apr 2, 2021
17 changes: 12 additions & 5 deletions epoch1/cuda/ee_mumu/SubProcesses/Makefile
@@ -39,6 +39,7 @@ ifneq ($(wildcard $(CUDA_HOME)/bin/nvcc),)
###CUFLAGS+= --maxrregcount 64 # degrades throughput: 1.7E8 (16384 32 12) flat at 1.7E8 (65536 128 12)
cu_main = gcheck.exe
cu_objects = gCPPProcess.o
+het_main = hcheck.exe
else
# No cuda. Switch cuda compilation off and go to common random numbers in C++
NVCC := $(warning CUDA_HOME is not set or is invalid. Export CUDA_HOME to compile with cuda)
@@ -62,7 +63,7 @@ ifeq ($(UNAME_P),ppc64le)
CUFLAGS+= -Xcompiler -mno-float128
endif

-all: ../../src $(cu_main) $(cxx_main) runTest.exe
+all: ../../src $(cu_main) $(cxx_main) $(het_main) runTest.exe

debug: OPTFLAGS = -g -O0 -DDEBUG2
debug: CUOPTFLAGS = -G
@@ -81,11 +82,17 @@ gcheck.o: gcheck.cu *.h ../../src/*.h ../../src/*.cu
%.o : %.cc *.h ../../src/*.h
$(CXX) $(CPPFLAGS) $(CXXFLAGS) $(CUINC) -c $< -o $@

-$(cu_main): gcheck.o $(LIBDIR)/lib$(MODELLIB).a $(cu_objects)
-$(NVCC) $< -o $@ $(cu_objects) $(CUARCHFLAGS) $(LIBFLAGS) $(CULIBFLAGS)
+# This is built with nvcc and linked with objects compiled with nvcc or g++
+$(cu_main): gcheck.o $(LIBDIR)/lib$(MODELLIB).a $(cu_objects) gmain.cc
+$(NVCC) $< -o $@ $(cu_objects) $(CUARCHFLAGS) $(LIBFLAGS) $(CULIBFLAGS) gmain.cc

-$(cxx_main): check.o $(LIBDIR)/lib$(MODELLIB).a $(cxx_objects)
-$(CXX) $< -o $@ $(cxx_objects) $(CPPFLAGS) $(CXXFLAGS) -ldl -pthread $(LIBFLAGS) $(CULIBFLAGS)
+# This is built with g++ and linked with objects compiled with g++
+$(cxx_main): check.o $(LIBDIR)/lib$(MODELLIB).a $(cxx_objects) cmain.cc
+$(CXX) $< -o $@ $(cxx_objects) $(CPPFLAGS) $(CXXFLAGS) -ldl -pthread $(LIBFLAGS) $(CULIBFLAGS) cmain.cc

+# This is built with nvcc and linked with objects compiled with nvcc or g++
+$(het_main): gcheck.o check.o $(LIBDIR)/lib$(MODELLIB).a $(cu_objects) $(cxx_objects) hmain.cc
+$(NVCC) gcheck.o check.o -o $@ $(cu_objects) $(cxx_objects) $(CUARCHFLAGS) -lgomp $(LIBFLAGS) $(CULIBFLAGS) hmain.cc


runTest.o: $(GTESTLIBS)
228 changes: 145 additions & 83 deletions epoch1/cuda/ee_mumu/SubProcesses/P1_Sigma_sm_epem_mupmum/check.cc

Large diffs are not rendered by default.

29 changes: 29 additions & 0 deletions epoch1/cuda/ee_mumu/SubProcesses/P1_Sigma_sm_epem_mupmum/check.h
@@ -0,0 +1,29 @@
#ifndef CHECK_H
#define CHECK_H 1

#include <iostream>
#include <string>
#include <vector>

// from gcheck.cu (compiled with nvcc)
int gcheck( int argc,
char **argv,
std::string& out,
std::vector<double>& stats,
const std::string& tag = "",
const int niter_multiplier = 1 ); // only for the GPU

// from check.cc (compiled with g++)
int check( int argc,
char **argv,
std::string& out,
std::vector<double>& stats,
const std::string& tag = "" );

// from check.cc (compiled with g++)
int check_omp_threads( bool debug = false ); // returns the number of OMP threads

// from check.cc (compiled with g++)
const std::string check_nprocall(); // returns the output of `nproc --all`

#endif // CHECK_H
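The `out` and `stats` parameters declared above let each backend hand its printout and timings back to the caller instead of writing to `std::cout` directly. Since the `check.cc` diff is not rendered here, the following is only a minimal sketch of that calling convention. `fakeCheck` and its hard-coded numbers are hypothetical; the `stats` layout (`[0]` events, `[1]` random numbers, `[2]` Rambo, `[3]` matrix elements) is taken from how `hmain.cc` reads the vector back.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for check()/gcheck(): it takes the same out/stats/tag
// parameters, but fabricates its timings instead of running any computation.
int fakeCheck( std::string& out,
               std::vector<double>& stats,
               const std::string& tag = "" )
{
  const int nevtALL = 1000;    // illustrative value only
  const double sumgtim = 0.1;  // illustrative value only (random numbers)
  const double sumrtim = 0.2;  // illustrative value only (Rambo)
  const double sumwtim = 0.7;  // illustrative value only (matrix elements)

  // Buffer the printout in a string stream rather than writing to std::cout,
  // so the caller decides when (and in which order) to show it.
  std::ostringstream outStream;
  outStream << tag << "TotalEventsComputed = " << nevtALL << "\n";
  out = outStream.str();

  // Layout read back by hmain.cc: [0]=nevtALL, [1]=sumgtim, [2]=sumrtim, [3]=sumwtim
  stats = { static_cast<double>( nevtALL ), sumgtim, sumrtim, sumwtim };
  assert( stats.size() == 4 ); // postcondition the caller relies on
  return 0;
}
```

Buffering per backend is what allows two workers running in parallel to produce output that does not interleave: each string is printed only after its thread has been joined.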
12 changes: 12 additions & 0 deletions epoch1/cuda/ee_mumu/SubProcesses/P1_Sigma_sm_epem_mupmum/cmain.cc
@@ -0,0 +1,12 @@
#include "check.h"

// This is built with g++ and linked with objects compiled with g++
int main( int argc, char **argv )
{
std::string cpuOut;
std::vector<double> cpuStats;
int cpuStatus = check( argc, argv, cpuOut, cpuStats, "(CPU) " );
std::cout << cpuOut;
//for ( auto& stat : cpuStats ) std::cout << stat << std::endl;
return cpuStatus;
}
12 changes: 12 additions & 0 deletions epoch1/cuda/ee_mumu/SubProcesses/P1_Sigma_sm_epem_mupmum/gmain.cc
@@ -0,0 +1,12 @@
#include "check.h"

// This is built with nvcc and linked with objects compiled with nvcc or g++
int main( int argc, char **argv )
{
std::string gpuOut;
std::vector<double> gpuStats;
int gpuStatus = gcheck( argc, argv, gpuOut, gpuStats, "(GPU) " );
std::cout << gpuOut;
//for ( auto& stat : gpuStats ) std::cout << stat << std::endl;
return gpuStatus;
}
107 changes: 107 additions & 0 deletions epoch1/cuda/ee_mumu/SubProcesses/P1_Sigma_sm_epem_mupmum/hmain.cc
@@ -0,0 +1,107 @@
#include "check.h"

#include <iomanip>
#include <thread>

void dumptime( const std::string& tag,
const int nevtALL,
const double sumgtim,
const double sumrtim,
const double sumwtim )
{
std::cout << "-----------------------------------------------------------------------------" << std::endl
<< tag << "TotalEventsComputed = " << nevtALL << std::endl
<< std::scientific // scientific format: affects all floats (default precision: 6)
<< tag << "TotalTime[Rnd+Rmb+ME] (123) = ( "
<< sumgtim+sumrtim+sumwtim << std::string(16, ' ') << " ) sec" << std::endl
<< tag << "TotalTime[Rambo+ME] (23) = ( "
<< sumrtim+sumwtim << std::string(16, ' ') << " ) sec" << std::endl
<< tag << "TotalTime[RndNumGen] (1) = ( " << sumgtim << std::string(16, ' ') << " ) sec" << std::endl
<< tag << "TotalTime[Rambo] (2) = ( " << sumrtim << std::string(16, ' ') << " ) sec" << std::endl
<< tag << "TotalTime[MatrixElems] (3) = ( " << sumwtim << std::string(16, ' ') << " ) sec" << std::endl
<< std::defaultfloat // default format: affects all floats
<< "-----------------------------------------------------------------------------" << std::endl;
}

void dumptput( const std::string& tag,
const int nevtALL,
const double tputgrw,
const double tputrw,
const double tputw,
const int nthreadsomp ) // >0 for CPU, 0 for GPU and HET
{
if ( nthreadsomp > 0 )
{
std::string nprocall = check_nprocall();
std::cout << tag << "OMP threads / `nproc --all` = " << nthreadsomp << " / " << nprocall; // includes a newline
}
std::cout << tag << "TotalEventsComputed = " << nevtALL << std::endl
<< std::scientific // scientific format: affects all floats (default precision: 6)
<< tag << "EvtsPerSec[Rnd+Rmb+ME](123) = ( " << tputgrw
<< std::string(16, ' ') << " ) sec^-1" << std::endl
<< tag << "EvtsPerSec[Rmb+ME] (23) = ( " << tputrw
<< std::string(16, ' ') << " ) sec^-1" << std::endl
<< tag << "EvtsPerSec[MatrixElems] (3) = ( " << tputw
<< std::string(16, ' ') << " ) sec^-1" << std::endl
<< std::defaultfloat // default format: affects all floats
<< "*****************************************************************************" << std::endl;
}

// This is built with nvcc (see the het_main rule in the Makefile) and linked with objects compiled with nvcc or g++
int main( int argc, char **argv )
{
std::string gpuOut;
std::vector<double> gpuStats;
int gpuStatus;
//int gpuMult = 1; // GPU processes same #events as CPU, with same random seeds
int gpuMult = 72; // GPU processes 72x #events as CPU, with different random seeds
int nthreadsomp = check_omp_threads(); // this is always > 0
gpuMult /= nthreadsomp; // reduce the number of GPU iterations if several OMP threads are used on the CPU
std::string gpuTag = "(GPU) ";
std::thread gpuThread( [&]{ gpuStatus = gcheck( argc, argv, gpuOut, gpuStats, gpuTag, gpuMult ); });

std::string cpuOut;
std::vector<double> cpuStats;
int cpuStatus;
std::string cpuTag = "(CPU) ";
std::thread cpuThread( [&]{ cpuStatus = check( argc, argv, cpuOut, cpuStats, cpuTag ); });

gpuThread.join();
std::cout << gpuOut;

cpuThread.join();
std::cout << cpuOut;

int gpuNevtALL = (int)(gpuStats[0]);
double gpuSumgtim = gpuStats[1];
double gpuSumrtim = gpuStats[2];
double gpuSumwtim = gpuStats[3];

dumptime( gpuTag, gpuNevtALL, gpuSumgtim, gpuSumrtim, gpuSumwtim );
double gpuTputgrw = gpuNevtALL/(gpuSumgtim+gpuSumrtim+gpuSumwtim);
double gpuTputrw = gpuNevtALL/(gpuSumrtim+gpuSumwtim);
double gpuTputw = gpuNevtALL/(gpuSumwtim);
dumptput( gpuTag, gpuNevtALL, gpuTputgrw, gpuTputrw, gpuTputw, 0 );

int cpuNevtALL = (int)(cpuStats[0]);
double cpuSumgtim = cpuStats[1];
double cpuSumrtim = cpuStats[2];
double cpuSumwtim = cpuStats[3];

dumptime( cpuTag, cpuNevtALL, cpuSumgtim, cpuSumrtim, cpuSumwtim );
double cpuTputgrw = cpuNevtALL/(cpuSumgtim+cpuSumrtim+cpuSumwtim);
double cpuTputrw = cpuNevtALL/(cpuSumrtim+cpuSumwtim);
double cpuTputw = cpuNevtALL/(cpuSumwtim);
dumptput( cpuTag, cpuNevtALL, cpuTputgrw, cpuTputrw, cpuTputw, nthreadsomp );

std::string hetTag = "(HET) ";
int hetNevtALL = gpuNevtALL+cpuNevtALL;
double hetTputgrw = gpuTputgrw+cpuTputgrw;
double hetTputrw = gpuTputrw+cpuTputrw;
double hetTputw = gpuTputw+cpuTputw;
dumptput( hetTag, hetNevtALL, hetTputgrw, hetTputrw, hetTputw, 0 );

if ( gpuStatus != 0 ) return 1;
if ( cpuStatus != 0 ) return 2;
return 0;
}
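Two details of `main` above are worth spelling out: `gpuMult /= nthreadsomp` is an integer division, so it reaches 0 once more than 72 OMP threads are in use, and the `stats` vectors are indexed without checking that the workers actually filled them. The sketch below shows one possible way to guard both and to reproduce the `(HET)` sum. The helper names `throughput`, `scaledGpuMult` and `hetThroughput`, the clamp to 1, and the `-1` error value are assumptions made for illustration, not part of this PR.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: events per second, or 0 if no time was recorded.
double throughput( double nevt, double seconds )
{
  return seconds > 0. ? nevt / seconds : 0.;
}

// Hypothetical helper: divide the GPU multiplier by the OMP thread count,
// but never let the integer division drop it below 1 (assumed lower bound).
int scaledGpuMult( int gpuMult, int nthreadsomp )
{
  assert( nthreadsomp > 0 ); // hmain.cc notes that check_omp_threads() is always > 0
  const int scaled = gpuMult / nthreadsomp;
  return scaled > 0 ? scaled : 1;
}

// Combined (Rnd+Rmb+ME) throughput, summed as hetTputgrw is in hmain.cc.
// Each vector is expected as [nevtALL, sumgtim, sumrtim, sumwtim].
// Returns -1 (assumed error value) if either vector is incomplete.
double hetThroughput( const std::vector<double>& gpuStats,
                      const std::vector<double>& cpuStats )
{
  if ( gpuStats.size() < 4 || cpuStats.size() < 4 ) return -1.;
  const double gpuTput = throughput( gpuStats[0], gpuStats[1] + gpuStats[2] + gpuStats[3] );
  const double cpuTput = throughput( cpuStats[0], cpuStats[1] + cpuStats[2] + cpuStats[3] );
  return gpuTput + cpuTput;
}
```

For example, `scaledGpuMult( 72, 4 )` gives 18, while 128 threads would give 1 rather than 0. Note that adding the two throughputs only approximates a combined rate when both workers run concurrently for a similar wall-clock time, which is presumably what the 72x multiplier is meant to arrange.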
@@ -21,7 +21,7 @@ namespace mgOnGpu

public:

-TimerMap() : m_timer(), m_active(""), m_partitionTimers(), m_partitionIds() {}
+TimerMap( const std::string& tag="" ) : m_timer(), m_tag(tag), m_active(""), m_partitionTimers(), m_partitionIds() {}
virtual ~TimerMap() {}

// Start the timer for a specific partition (key must be a non-empty string)
@@ -115,19 +115,19 @@ namespace mgOnGpu
ostr << std::setprecision(6); // set precision (default=6): affects all floats
ostr << std::fixed; // fixed format: affects all floats
for ( auto ip : m_partitionTimers )
-ostr << std::setw(maxsize) << ip.first << " : "
+ostr << m_tag << std::setw(maxsize) << ip.first << " : "
<< std::setw(12) << ip.second << " sec" << std::endl;
-ostr << std::setw(maxsize) << totalKey << " : "
+ostr << m_tag << std::setw(maxsize) << totalKey << " : "
<< std::setw(12) << total << " sec" << std::endl
-<< std::setw(maxsize) << total123Key << " : "
+<< m_tag << std::setw(maxsize) << total123Key << " : "
<< std::setw(12) << total123 << " sec" << std::endl
-<< std::setw(maxsize) << total23Key << " : "
+<< m_tag << std::setw(maxsize) << total23Key << " : "
<< std::setw(12) << total23 << " sec" << std::endl
-<< std::setw(maxsize) << total1Key << " : "
+<< m_tag << std::setw(maxsize) << total1Key << " : "
<< std::setw(12) << total1 << " sec" << std::endl
-<< std::setw(maxsize) << total2Key << " : "
+<< m_tag << std::setw(maxsize) << total2Key << " : "
<< std::setw(12) << total2 << " sec" << std::endl
-<< std::setw(maxsize) << total3Key << " : "
+<< m_tag << std::setw(maxsize) << total3Key << " : "
<< std::setw(12) << total3 << " sec" << std::endl;
ostr << std::defaultfloat; // default format: affects all floats
}
@@ -136,6 +136,7 @@ namespace mgOnGpu
private:

Timer<TIMERTYPE> m_timer;
+std::string m_tag;
std::string m_active;
std::map< std::string, float > m_partitionTimers;
std::map< std::string, uint32_t > m_partitionIds;
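The effect of the new `m_tag` member is that every line of the timer dump carries the same prefix, so `(GPU)` and `(CPU)` lines stay distinguishable when both dumps end up in one log. As a rough illustration only, the reduced class below mimics that formatting. `TaggedTimes`, `add` and this `dump` signature are hypothetical and store ready-made timings; they are not the `mgOnGpu::TimerMap` interface.

```cpp
#include <algorithm>
#include <cassert>
#include <iomanip>
#include <map>
#include <sstream>
#include <string>

// Hypothetical, much-reduced stand-in for mgOnGpu::TimerMap: it stores
// ready-made timings rather than measuring anything.
class TaggedTimes
{
public:
  explicit TaggedTimes( const std::string& tag = "" ) : m_tag( tag ), m_times() {}

  // Accumulate a timing for a partition (key must be a non-empty string)
  void add( const std::string& key, float seconds )
  {
    assert( !key.empty() );
    m_times[key] += seconds;
  }

  // Dump one line per partition, each prefixed by the tag
  void dump( std::ostream& ostr ) const
  {
    std::size_t maxsize = 0;
    for ( const auto& ip : m_times ) maxsize = std::max( maxsize, ip.first.size() );
    ostr << std::setprecision( 6 ); // set precision: affects all floats
    ostr << std::fixed;             // fixed format: affects all floats
    for ( const auto& ip : m_times )
      ostr << m_tag << std::setw( static_cast<int>( maxsize ) ) << ip.first << " : "
           << std::setw( 12 ) << ip.second << " sec" << std::endl;
    ostr << std::defaultfloat;      // default format: affects all floats
  }

private:
  std::string m_tag;
  std::map<std::string, float> m_times;
};
```

Placing the tag before `std::setw(maxsize)` keeps the keys right-aligned among themselves, so the columns still line up as long as all tags have the same length, as `"(GPU) "` and `"(CPU) "` do.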