Heterogeneous MadGraph: parallel CPU+GPU executions by valassi · Pull Request #87 · madgraph5/madgraph4gpu

valassi · 2020-12-04T14:04:37Z

See a more complete description in #85

I created a simple prototype. The point here was mainly a proof of concept, also trying to sort out the build (which may be useful for addressing #83).

The current prototype runs exactly the same number of events with exactly the same random numbers in parallel on CPU (with OMP threads) and on GPU. Both computations give the same sets of events, which are not yet combined. As the GPU is much faster, the net effect is essentially a computation that lasts as long as the CPU version but processes twice as many events (because the same events are also computed on the GPU), so the throughput doubles.
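
A minimal sketch of the idea, assuming one host thread per backend. The names `runBackend`, `HetResult` and `runHeterogeneous` are hypothetical, and a toy generator stands in for the real random numbers and matrix-element code; this is not the code in this PR.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for one backend (CPU or GPU): fills 'out' with one
// value per event. A real GPU backend would launch CUDA kernels instead.
void runBackend(std::size_t nEvents, unsigned seed, std::vector<double>& out)
{
  out.resize(nEvents);
  unsigned state = seed;
  for (std::size_t i = 0; i < nEvents; ++i)
  {
    // Toy linear congruential generator (illustration only)
    state = state * 1664525u + 1013904223u;
    out[i] = static_cast<double>(state) / 4294967296.0;
  }
}

// Results of the two backends, kept separate (not combined)
struct HetResult
{
  std::vector<double> cpu;
  std::vector<double> gpu;
};

// Launch both backends concurrently with the same event count and seed,
// then wait for both to finish.
HetResult runHeterogeneous(std::size_t nEvents, unsigned seed)
{
  HetResult r;
  std::thread gpuThread(runBackend, nEvents, seed, std::ref(r.gpu));
  std::thread cpuThread(runBackend, nEvents, seed, std::ref(r.cpu));
  gpuThread.join();
  cpuThread.join();
  return r;
}
```

Because both threads receive the same seed and event count, the two result vectors are identical, which mirrors the "same sets of events" behaviour described above. The elapsed time is set by the slower of the two backends.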

This clearly needs a lot more work (especially the optimization is tricky), but it's a useful proof of concept.

PS I forgot to mention: this includes and supersedes #82. As in that one, the build of runTest (#83) is disabled here. I would suggest, however, fixing it after including these changes, which give a possible direction for how to combine cuda and c++ modules. One option for #83 is to keep a single runTest.exe as it is now, but make it much clearer that real modules are either c++ with gcc or cuda with nvcc, and that only the thin layers integrating both are compiled with nvcc. Essentially, I only had to add -lgomp to build the combined module; the same is probably ok for runTest.

./hcheck.exe -p 16384 32 1
***********************************************************************
NumBlocksPerGrid = 16384
NumThreadsPerBlock = 32
NumIterations = 1
-----------------------------------------------------------------------
FP precision = DOUBLE
Complex type = THRUST::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Random number generation = CURAND DEVICE (CUDA code)
Wavefunction GPU memory = LOCAL
-----------------------------------------------------------------------
NumIterations = 1
TotalTime[Rnd+Rmb+ME] (123)= ( 7.312938e-03 ) sec
TotalTime[Rambo+ME] (23)= ( 6.714818e-03 ) sec
TotalTime[RndNumGen] (1)= ( 5.981200e-04 ) sec
TotalTime[Rambo] (2)= ( 5.945168e-03 ) sec
TotalTime[MatrixElems] (3)= ( 7.696500e-04 ) sec
MeanTimeInMatrixElems = ( 7.696500e-04 ) sec
[Min,Max]TimeInMatrixElems = [ 7.696500e-04 , 7.696500e-04 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 524288 (nan=0)
EvtsPerSec[Rnd+Rmb+ME](123)= ( 7.169321e+07 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 7.807926e+07 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 6.812032e+08 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 524288
MeanMatrixElemValue = ( 1.371958e-02 +- 1.132119e-05 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374915e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.197419e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
(GPU) 00 CudaFree : 0.872603 sec
(GPU) 0a ProcInit : 0.000233 sec
(GPU) 0b MemAlloc : 0.035793 sec
(GPU) 0c GenCreat : 0.009814 sec
(GPU) 0d SGoodHel : 0.001759 sec
(GPU) 1a GenSeed : 0.000009 sec
(GPU) 1b GenRnGen : 0.000589 sec
(GPU) 2a RamboIni : 0.000022 sec
(GPU) 2b RamboFin : 0.000014 sec
(GPU) 2c CpDTHwgt : 0.000502 sec
(GPU) 2d CpDTHmom : 0.005407 sec
(GPU) 3a SigmaKin : 0.000014 sec
(GPU) 3b CpDTHmes : 0.000756 sec
(GPU) 4a DumpLoop : 0.004765 sec
(GPU) 8a CompStat : 0.003540 sec
(GPU) 9a GenDestr : 0.000048 sec
(GPU) 9b DumpScrn : 0.000044 sec
(GPU) 9c DumpJson : 0.000007 sec
(GPU) TOTAL : 0.935918 sec
(GPU) TOTAL (123) : 0.007313 sec
(GPU) TOTAL (23) : 0.006715 sec
(GPU) TOTAL (1) : 0.000598 sec
(GPU) TOTAL (2) : 0.005945 sec
(GPU) TOTAL (3) : 0.000770 sec
***********************************************************************
***********************************************************************
NumBlocksPerGrid = 16384
NumThreadsPerBlock = 32
NumIterations = 1
-----------------------------------------------------------------------
FP precision = DOUBLE
Complex type = STD::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Random number generation = CURAND (C++ code)
OMP threads / maxthreads = 4 / 4
-----------------------------------------------------------------------
NumIterations = 1
TotalTime[Rnd+Rmb+ME] (123)= ( 5.725351e-01 ) sec
TotalTime[Rambo+ME] (23)= ( 5.449318e-01 ) sec
TotalTime[RndNumGen] (1)= ( 2.760323e-02 ) sec
TotalTime[Rambo] (2)= ( 9.914417e-02 ) sec
TotalTime[MatrixElems] (3)= ( 4.457877e-01 ) sec
MeanTimeInMatrixElems = ( 4.457877e-01 ) sec
[Min,Max]TimeInMatrixElems = [ 4.457877e-01 , 4.457877e-01 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 524288 (nan=0)
EvtsPerSec[Rnd+Rmb+ME](123)= ( 9.157308e+05 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 9.621167e+05 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 1.176094e+06 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 524288
MeanMatrixElemValue = ( 1.371958e-02 +- 1.132119e-05 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374915e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.197419e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
(CPU) 0a ProcInit : 0.000331 sec
(CPU) 0b MemAlloc : 0.025358 sec
(CPU) 0c GenCreat : 0.000915 sec
(CPU) 1a GenSeed : 0.000009 sec
(CPU) 1b GenRnGen : 0.027595 sec
(CPU) 2a RamboIni : 0.006872 sec
(CPU) 2b RamboFin : 0.092273 sec
(CPU) 3a SigmaKin : 0.445788 sec
(CPU) 4a DumpLoop : 0.004605 sec
(CPU) 8a CompStat : 0.003633 sec
(CPU) 9a GenDestr : 0.000094 sec
(CPU) 9b DumpScrn : 0.004946 sec
(CPU) 9c DumpJson : 0.000008 sec
(CPU) TOTAL : 0.612425 sec
(CPU) TOTAL (123) : 0.572535 sec
(CPU) TOTAL (23) : 0.544932 sec
(CPU) TOTAL (1) : 0.027603 sec
(CPU) TOTAL (2) : 0.099144 sec
(CPU) TOTAL (3) : 0.445788 sec
***********************************************************************
-----------------------------------------------------------------------
TotalTime[Rnd+Rmb+ME] (123)= ( 5.798480e-01 ) sec
TotalTime[Rambo+ME] (23)= ( 5.516467e-01 ) sec
TotalTime[RndNumGen] (1)= ( 2.820135e-02 ) sec
TotalTime[Rambo] (2)= ( 1.050893e-01 ) sec
TotalTime[MatrixElems] (3)= ( 4.465573e-01 ) sec
-----------------------------------------------------------------------
TotalEventsComputed = 1048576
EvtsPerSec[Rnd+Rmb+ME](123)= ( 1.808364e+06 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 1.900811e+06 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 2.348133e+06 ) sec^-1
-----------------------------------------------------------------------

This makes me realise the calculation is clearly wrong: one should add throughputs, not times (the wall time is the same on CPU and GPU!)

Decrease the GPU multiplier from 100 to 70 (itscrd03, with 4 OMP threads).
****************************************************************************
(GPU) NumBlocksPerGrid = 16384
(GPU) NumThreadsPerBlock = 32
(GPU) NumIterations = 700
----------------------------------------------------------------------------
(GPU) FP precision = DOUBLE
(GPU) Complex type = THRUST::COMPLEX
(GPU) RanNumb memory layout = AOSOA[4]
(GPU) Momenta memory layout = AOSOA[4]
(GPU) Random number generation = CURAND DEVICE (CUDA code)
(GPU) Wavefunction GPU memory = LOCAL
----------------------------------------------------------------------------
(GPU) NumIterations = 700
(GPU) TotalTime[Rnd+Rmb+ME] (123)= ( 5.835859e+00 ) sec
(GPU) TotalTime[Rambo+ME] (23)= ( 5.378664e+00 ) sec
(GPU) TotalTime[RndNumGen] (1)= ( 4.571957e-01 ) sec
(GPU) TotalTime[Rambo] (2)= ( 4.823339e+00 ) sec
(GPU) TotalTime[MatrixElems] (3)= ( 5.553249e-01 ) sec
(GPU) MeanTimeInMatrixElems = ( 7.933214e-04 ) sec
(GPU) [Min,Max]TimeInMatrixElems = [ 7.126600e-04 , 1.121232e-02 ] sec
----------------------------------------------------------------------------
(GPU) TotalEventsComputed = 367001600 (nan=0)
(GPU) EvtsPerSec[Rnd+Rmb+ME](123)= ( 6.288733e+07 ) sec^-1
(GPU) EvtsPerSec[Rmb+ME] (23)= ( 6.823286e+07 ) sec^-1
(GPU) EvtsPerSec[MatrixElems] (3)= ( 6.608772e+08 ) sec^-1
****************************************************************************
(GPU) NumMatrixElements(notNan) = 367001600
(GPU) MeanMatrixElemValue = ( 1.371705e-02 +- 4.280686e-07 ) GeV^0
(GPU) [Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374926e-02 ] GeV^0
(GPU) StdDevMatrixElemValue = ( 8.200632e-03 ) GeV^0
(GPU) MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
(GPU) [Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
(GPU) StdDevWeight = ( 0.000000e+00 )
****************************************************************************
(GPU) 00 CudaFree : 1.040999 sec
(GPU) 0a ProcInit : 0.000255 sec
(GPU) 0b MemAlloc : 0.043748 sec
(GPU) 0c GenCreat : 0.035068 sec
(GPU) 0d SGoodHel : 0.001759 sec
(GPU) 1a GenSeed : 0.005864 sec
(GPU) 1b GenRnGen : 0.451332 sec
(GPU) 2a RamboIni : 0.010404 sec
(GPU) 2b RamboFin : 0.008416 sec
(GPU) 2c CpDTHwgt : 0.369433 sec
(GPU) 2d CpDTHmom : 4.435089 sec
(GPU) 3a SigmaKin : 0.010518 sec
(GPU) 3b CpDTHmes : 0.544807 sec
(GPU) 4a DumpLoop : 2.404587 sec
(GPU) 8a CompStat : 2.632091 sec
(GPU) 9a GenDestr : 0.000196 sec
(GPU) 9b DumpScrn : 0.000071 sec
(GPU) 9c DumpJson : 0.000007 sec
(GPU) TOTAL : 11.994643 sec
(GPU) TOTAL (123) : 5.835862 sec
(GPU) TOTAL (23) : 5.378666 sec
(GPU) TOTAL (1) : 0.457196 sec
(GPU) TOTAL (2) : 4.823342 sec
(GPU) TOTAL (3) : 0.555325 sec
****************************************************************************
****************************************************************************
(CPU) NumBlocksPerGrid = 16384
(CPU) NumThreadsPerBlock = 32
(CPU) NumIterations = 10
----------------------------------------------------------------------------
(CPU) FP precision = DOUBLE
(CPU) Complex type = STD::COMPLEX
(CPU) RanNumb memory layout = AOSOA[4]
(CPU) Momenta memory layout = AOSOA[4]
(CPU) Random number generation = CURAND (C++ code)
(CPU) OMP threads / maxthreads = 4 / 4
----------------------------------------------------------------------------
(CPU) NumIterations = 10
(CPU) TotalTime[Rnd+Rmb+ME] (123)= ( 5.245397e+00 ) sec
(CPU) TotalTime[Rambo+ME] (23)= ( 4.964864e+00 ) sec
(CPU) TotalTime[RndNumGen] (1)= ( 2.805332e-01 ) sec
(CPU) TotalTime[Rambo] (2)= ( 1.005407e+00 ) sec
(CPU) TotalTime[MatrixElems] (3)= ( 3.959456e+00 ) sec
(CPU) MeanTimeInMatrixElems = ( 3.959456e-01 ) sec
(CPU) [Min,Max]TimeInMatrixElems = [ 2.837697e-01 , 4.298637e-01 ] sec
----------------------------------------------------------------------------
(CPU) TotalEventsComputed = 5242880 (nan=0)
(CPU) EvtsPerSec[Rnd+Rmb+ME](123)= ( 9.995201e+05 ) sec^-1
(CPU) EvtsPerSec[Rmb+ME] (23)= ( 1.055997e+06 ) sec^-1
(CPU) EvtsPerSec[MatrixElems] (3)= ( 1.324141e+06 ) sec^-1
****************************************************************************
(CPU) NumMatrixElements(notNan) = 5242880
(CPU) MeanMatrixElemValue = ( 1.372304e-02 +- 3.581814e-06 ) GeV^0
(CPU) [Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374925e-02 ] GeV^0
(CPU) StdDevMatrixElemValue = ( 8.201400e-03 ) GeV^0
(CPU) MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
(CPU) [Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
(CPU) StdDevWeight = ( 0.000000e+00 )
****************************************************************************
(CPU) 0a ProcInit : 0.015581 sec
(CPU) 0b MemAlloc : 0.028460 sec
(CPU) 0c GenCreat : 0.000978 sec
(CPU) 1a GenSeed : 0.000084 sec
(CPU) 1b GenRnGen : 0.280449 sec
(CPU) 2a RamboIni : 0.072770 sec
(CPU) 2b RamboFin : 0.932637 sec
(CPU) 3a SigmaKin : 3.959456 sec
(CPU) 4a DumpLoop : 0.034727 sec
(CPU) 8a CompStat : 0.038073 sec
(CPU) 9a GenDestr : 0.000141 sec
(CPU) 9b DumpScrn : 0.061204 sec
(CPU) 9c DumpJson : 0.000011 sec
(CPU) TOTAL : 5.424570 sec
(CPU) TOTAL (123) : 5.245397 sec
(CPU) TOTAL (23) : 4.964864 sec
(CPU) TOTAL (1) : 0.280533 sec
(CPU) TOTAL (2) : 1.005407 sec
(CPU) TOTAL (3) : 3.959456 sec
****************************************************************************
----------------------------------------------------------------------------
(GPU) TotalEventsComputed = 367001600
(GPU) TotalTime[Rnd+Rmb+ME] (123)= ( 5.835859e+00 ) sec
(GPU) TotalTime[Rambo+ME] (23)= ( 5.378664e+00 ) sec
(GPU) TotalTime[RndNumGen] (1)= ( 4.571957e-01 ) sec
(GPU) TotalTime[Rambo] (2)= ( 4.823339e+00 ) sec
(GPU) TotalTime[MatrixElems] (3)= ( 5.553249e-01 ) sec
----------------------------------------------------------------------------
(GPU) TotalEventsComputed = 367001600
(GPU) EvtsPerSec[Rnd+Rmb+ME](123)= ( 6.288733e+07 ) sec^-1
(GPU) EvtsPerSec[Rmb+ME] (23)= ( 6.823286e+07 ) sec^-1
(GPU) EvtsPerSec[MatrixElems] (3)= ( 6.608772e+08 ) sec^-1
****************************************************************************
----------------------------------------------------------------------------
(CPU) TotalEventsComputed = 5242880
(CPU) TotalTime[Rnd+Rmb+ME] (123)= ( 5.245397e+00 ) sec
(CPU) TotalTime[Rambo+ME] (23)= ( 4.964864e+00 ) sec
(CPU) TotalTime[RndNumGen] (1)= ( 2.805332e-01 ) sec
(CPU) TotalTime[Rambo] (2)= ( 1.005407e+00 ) sec
(CPU) TotalTime[MatrixElems] (3)= ( 3.959456e+00 ) sec
----------------------------------------------------------------------------
(CPU) TotalEventsComputed = 5242880
(CPU) EvtsPerSec[Rnd+Rmb+ME](123)= ( 9.995201e+05 ) sec^-1
(CPU) EvtsPerSec[Rmb+ME] (23)= ( 1.055997e+06 ) sec^-1
(CPU) EvtsPerSec[MatrixElems] (3)= ( 1.324141e+06 ) sec^-1
****************************************************************************
(HET) TotalEventsComputed = 372244480
(HET) EvtsPerSec[Rnd+Rmb+ME](123)= ( 6.388685e+07 ) sec^-1
(HET) EvtsPerSec[Rmb+ME] (23)= ( 6.928886e+07 ) sec^-1
(HET) EvtsPerSec[MatrixElems] (3)= ( 6.622013e+08 ) sec^-1
****************************************************************************
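
The event counts in this log are consistent with events = blocks x threads per block x iterations, with the GPU given 70 times as many iterations as the CPU. A small sketch of that arithmetic follows; `GridConfig`, `totalEvents` and `scaledForGpu` are hypothetical names chosen for illustration, not identifiers from the PR.

```cpp
#include <cstdint>

// Hypothetical description of one run: grid size and number of iterations
struct GridConfig
{
  std::uint64_t blocks;          // NumBlocksPerGrid
  std::uint64_t threadsPerBlock; // NumThreadsPerBlock
  std::uint64_t iterations;      // NumIterations
};

// One event per thread per iteration
std::uint64_t totalEvents(const GridConfig& g)
{
  return g.blocks * g.threadsPerBlock * g.iterations;
}

// Same grid as the CPU run, but with 'multiplier' times more iterations
GridConfig scaledForGpu(const GridConfig& cpu, std::uint64_t multiplier)
{
  return GridConfig{cpu.blocks, cpu.threadsPerBlock, cpu.iterations * multiplier};
}
```

With the CPU settings from the log (16384 blocks, 32 threads, 10 iterations) and a multiplier of 70, this gives 5242880 CPU events and 367001600 GPU events, which sum to the 372244480 reported on the (HET) line.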

valassi · 2020-12-04T17:13:13Z

I have just pushed more changes.

I added a quick hack to process 70 times more events on the GPU and to use different random seeds
I do not compute combined "physics" (average ME)
I fixed the calculation of the combined throughput: one must add throughputs, not times and events (the wall time is the same)
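
The throughput fix in the last point can be sketched as follows. This is an illustration with hypothetical names (`BackendStats`, `combinedThroughput`, `summedThroughput`), not the code in this PR: when both backends run concurrently over the same wall time, their event rates add, whereas dividing the summed events by the summed times underestimates the rate.

```cpp
#include <cstdint>

// Hypothetical per-backend statistics
struct BackendStats
{
  std::uint64_t events; // events processed by this backend
  double seconds;       // time this backend spent on the measured steps

  double throughput() const { return static_cast<double>(events) / seconds; }
};

// Concurrent execution: the combined rate is the sum of the two rates
double combinedThroughput(const BackendStats& gpu, const BackendStats& cpu)
{
  return gpu.throughput() + cpu.throughput();
}

// Superseded calculation: summed events over summed times, which treats
// the two backends as if they had run one after the other
double summedThroughput(const BackendStats& gpu, const BackendStats& cpu)
{
  return static_cast<double>(gpu.events + cpu.events) / (gpu.seconds + cpu.seconds);
}
```

For example, with the (123) figures quoted in this thread, `combinedThroughput({367001600, 5.835859}, {5242880, 5.245397})` is about 6.39e+07 events per second, in line with the (HET) value, while `summedThroughput` on the same inputs is roughly half of that.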

More is needed eventually, but I think this can be considered for merging. I remove the draft status.

This is an example

****************************************************************************
----------------------------------------------------------------------------
(GPU) TotalEventsComputed        = 367001600
(GPU) TotalTime[Rnd+Rmb+ME] (123)= ( 5.835859e+00                 )  sec
(GPU) TotalTime[Rambo+ME]    (23)= ( 5.378664e+00                 )  sec
(GPU) TotalTime[RndNumGen]    (1)= ( 4.571957e-01                 )  sec
(GPU) TotalTime[Rambo]        (2)= ( 4.823339e+00                 )  sec
(GPU) TotalTime[MatrixElems]  (3)= ( 5.553249e-01                 )  sec
----------------------------------------------------------------------------
(GPU) TotalEventsComputed        = 367001600
(GPU) EvtsPerSec[Rnd+Rmb+ME](123)= ( 6.288733e+07                 )  sec^-1
(GPU) EvtsPerSec[Rmb+ME]     (23)= ( 6.823286e+07                 )  sec^-1
(GPU) EvtsPerSec[MatrixElems] (3)= ( 6.608772e+08                 )  sec^-1
****************************************************************************
----------------------------------------------------------------------------
(CPU) TotalEventsComputed        = 5242880
(CPU) TotalTime[Rnd+Rmb+ME] (123)= ( 5.245397e+00                 )  sec
(CPU) TotalTime[Rambo+ME]    (23)= ( 4.964864e+00                 )  sec
(CPU) TotalTime[RndNumGen]    (1)= ( 2.805332e-01                 )  sec
(CPU) TotalTime[Rambo]        (2)= ( 1.005407e+00                 )  sec
(CPU) TotalTime[MatrixElems]  (3)= ( 3.959456e+00                 )  sec
----------------------------------------------------------------------------
(CPU) TotalEventsComputed        = 5242880
(CPU) EvtsPerSec[Rnd+Rmb+ME](123)= ( 9.995201e+05                 )  sec^-1
(CPU) EvtsPerSec[Rmb+ME]     (23)= ( 1.055997e+06                 )  sec^-1
(CPU) EvtsPerSec[MatrixElems] (3)= ( 1.324141e+06                 )  sec^-1
****************************************************************************
(HET) TotalEventsComputed        = 372244480
(HET) EvtsPerSec[Rnd+Rmb+ME](123)= ( 6.388685e+07                 )  sec^-1
(HET) EvtsPerSec[Rmb+ME]     (23)= ( 6.928886e+07                 )  sec^-1
(HET) EvtsPerSec[MatrixElems] (3)= ( 6.622013e+08                 )  sec^-1
****************************************************************************

valassi · 2020-12-04T17:18:55Z

PS Consider approving/merging #84 first (OMP multi threading), as that is included here too.

valassi · 2021-04-01T10:40:48Z

Hm, I deleted and recreated the branch; I thought this would be picked up automatically. I am reopening this PR as follows:

git push -f origin 9026441:het
[reopen PR]
git push -f origin 52edc92:het

valassi · 2021-04-09T15:35:48Z

This PR is now obsolete and I will close it.

It is replaced by PR #159.

I copied a few relevant comments to the general issue #85.

valassi added 12 commits December 4, 2020 12:22

Bug fix - main should return 0

f0210b5

Prepare for heterogeneous executable, part on CPU (c++) part on GPU (…

b71930a

…cuda)

Fix comments: build gcheck.exe with nvcc, check.exe with g++ (as alwa…

2fe543a

…ys in the past)

First proof of concept of heterogeneous Madgraph: run GPU first and C…

71c4e67

…PU later

Streamline parameter outputs: move architecture-specific parameters t…

695327c

…o the end

Improve printout of number of NANs

019409e

Dump to a string first and then show that string

e855ab2

Fix het main (was executing twice CPU!), also put gpu first again

3448b80

Dispaly (GPU) or (CPU) in timer dumps

cf2dc11

Extract performance stats for both CPU and GPU

dfaa49a

Launch in parallel on cpu and gpu on two different threads

0f5bc4b

valassi requested review from hageboeck, oliviermattelaer and roiser December 4, 2020 14:04

valassi marked this pull request as draft December 4, 2020 14:14

valassi added 5 commits December 4, 2020 16:04

Reenable tests. Fix link by adding -lgomp. Addresses issue madgraph5#83.

88c916d

Minor improvement: remove trailing "/" from TOOLSDIR to avoid double …

2939cbe

…"//" in paths

Improve Makefile: remove undefined/unneeded CPPFLAGS

c496d04

(Empty commit) Resync with OMP branch for issue83

a7d3b6f

Merge branch 'master' into het

b655e2e

This was referenced Dec 4, 2020

Improve #52: separate test executable for c++ and cuda #83

Closed

Implement multi-threading (#82). Uses OpenMP as suggested by @hageboeck. #84

Merged

valassi added 5 commits December 4, 2020 16:49

Move check/gcheck definition to check.h header

0733df2

Move to multi-line (there may be more arguments coming...)

2c55e79

Add (CPU) or (GPU) or (HET) tags in front of all statistics dumped

7cbff09

Implement a quick hack to give more events to the GPU than the CPU.

41ce9b5

This makes me realise the calculation is clearly wrong: one should add throughputs, not times (the wall time is the same on CPU and GPU!)

valassi marked this pull request as ready for review December 4, 2020 17:16

valassi added 14 commits December 6, 2020 16:36

Merge remote-tracking branch 'upstream/master' into het

8f6ca66

Merge branch 'issue83' into het

21cedcf

Merge remote-tracking branch 'upstream/master' into het

4df7bac

Merge branch 'issue83' into het

61925d8

Fix conflicts in epoch1/cuda/ee_mumu/SubProcesses/P1_Sigma_sm_epem_mupmum/check.cc

Improve the printout

ca58d31

Try to take into account OMP in heterogenous scheduling and printout

369e0b0

Merge branch 'issue83' into het

d2c0f1d

Fix conflicts in epoch1/cuda/ee_mumu/SubProcesses/Makefile

Merge remote-tracking branch 'upstream/master' into het

f2b867f

Fix conflicts in epoch1/cuda/ee_mumu/SubProcesses/P1_Sigma_sm_epem_mupmum/check.cc

Merge remote-tracking branch 'upstream/master' into het

d9261e7

Merge remote-tracking branch 'upstream/master' into het

4a92baa

Merge remote-tracking branch 'upstream/master' into het

6eca62b

Fix conflicts: epoch1/cuda/ee_mumu/SubProcesses/P1_Sigma_sm_epem_mupmum/check.cc

Merge remote-tracking branch 'upstream/master' into het

a81f80c

Fix conflicts: epoch1/cuda/ee_mumu/SubProcesses/P1_Sigma_sm_epem_mupmum/check.cc

Fix unresolved conflicts from previous merge

68dea59

Further fix for build issues in previous merge

52edc92

valassi closed this Apr 1, 2021

valassi deleted the het branch April 1, 2021 10:30

valassi reopened this Apr 1, 2021

valassi force-pushed the het branch from 9026441 to 52edc92 Compare April 1, 2021 10:42

valassi added 2 commits April 1, 2021 17:34

Merge remote-tracking branch 'upstream/master' into het

636c300

Fix conflicts: epoch1/cuda/ee_mumu/SubProcesses/P1_Sigma_sm_epem_mupmum/check.cc

[het] Fix bug in a previous merge: outStream, not std::cout

23c9560

This was referenced Apr 2, 2021

het + epoch1/epoch2 #153

Closed

WIP - het + klas2 + epoch1/epoch2 (Heterogeneous standalone application: GPU + SIMD CPU) #159

Closed

Heterogeneous MadGraph: parallel CPU+GPU executions #85

Open

valassi closed this Apr 9, 2021

valassi wants to merge 38 commits into madgraph5:master from valassi:het

2 participants
