[eemumu_AV] Use unique_ptr and RAII to automate memory management. by hageboeck · Pull Request #47 · madgraph5/madgraph4gpu

hageboeck · 2020-11-10T15:21:13Z

I wanted to test if memory / resource management can be automated. With unique_ptr and a few helper functions, a lot of ifdef and delete[] / frees can be removed. Is this helpful for us?

roiser · 2020-11-10T16:47:50Z

In general I like the approach as it reduces the numbers of ifdefs. I have two questions on the CudaSetUpTearDown struct.

Where is it initialised?

If its initialised it will go out of scope probably only at the very end of the main. But the idea is to also dump the information how long the reset took in the timermap.dump(); (very bottom of the main). Also the numbering would change 9c <-> 9d but that's a minor one.

Not even sure we need to time the cuda reset? @valassi ?

hageboeck · 2020-11-10T17:35:33Z

In general I like the approach as it reduces the numbers of ifdefs. I have two questions on the CudaSetUpTearDown struct.

Where is it initialised?

Directly where it is declared:
https://github.com/hageboeck/madgraph4gpu/blob/7a7b0f27285cb08a0b946cfe149d8523d7e02ae6/examples/gpu/eemumu_AV/SubProcesses/P1_Sigma_sm_epem_mupmum/check.cc#L182

If its initialised it will go out of scope probably only at the very end of the main. But the idea is to also dump the information how long the reset took in the timermap.dump(); (very bottom of the main). Also the numbering would change 9c <-> 9d but that's a minor one.

Very correct. I thought that once you saw it clean up once, you have seen it all.

It can obviously be reverted by

Reverting to explicit manual cleanup.
Putting the cuda part into a block.
By resetting a resource manager, e.g. cudaDeviceManager = nullptr, cudaDeviceManager.reset() or similar.

I did it because it's exactly the same logic as the memory management, and cuda frees after a device reset just lead to a crash. I've grown to love these RAIIs, because you never have to worry about "forgot to clean up" or "what was the cleanup order again?".
Please choose one of the above. Warning, though, that if you want revert to explicit manual we need to kill all device memory before we kill the device. If the "block + indent everything" is OK, that's the easiest way, but it will create ugly diffs.

roiser · 2020-11-11T06:42:18Z

In general I like the approach as it reduces the numbers of ifdefs. I have two questions on the CudaSetUpTearDown struct.
Where is it initialised?

Directly where it is declared:
https://github.com/hageboeck/madgraph4gpu/blob/7a7b0f27285cb08a0b946cfe149d8523d7e02ae6/examples/gpu/eemumu_AV/SubProcesses/P1_Sigma_sm_epem_mupmum/check.cc#L182

ah sorry didn't see that, ok

If its initialised it will go out of scope probably only at the very end of the main. But the idea is to also dump the information how long the reset took in the timermap.dump(); (very bottom of the main). Also the numbering would change 9c <-> 9d but that's a minor one.

Very correct. I thought that once you saw it clean up once, you have seen it all.

It can obviously be reverted by
* [ ]  Reverting to explicit manual cleanup.

* [ ]  Putting the cuda part into a block.

* [ ]  By resetting a resource manager, e.g. `cudaDeviceManager = nullptr`, `cudaDeviceManager.reset()` or similar.
I did it because it's exactly the same logic as the memory management, and cuda frees after a device reset just lead to a crash. I've grown to love these RAIIs, because you never have to worry about "forgot to clean up" or "what was the cleanup order again?".
Please choose one of the above. Warning, though, that if you want revert to explicit manual we need to kill all device memory before we kill the device. If the "block + indent everything" is OK, that's the easiest way, but it will create ugly diffs.

I guess another way of solving it would be to put the dump into the destructor of the struct, then it's guaranteed that it will always go last (as long as there is only one such struct).

valassi · 2020-11-11T07:19:19Z

Thanks @hageboeck for requesting a review.

My personal taste: some of these things could be useful, some look like an overhead. And there are a few subtleties to keep in mind.

I would avoid CudaSetUpTearDown, simply because cudaFree(0) is not RAII where cudaFree is initialization and devicereset is termination. You could actually have only the first call, or only the second, or neither. It is not a new/delete. So I would not use that. (PS Now I read your comment that calling cudafree AFTER cudadevicereset causes a crash. I do not know, I think we should understand better what each of these two functions does. I think we initially added cudaFree just as a way to ensure that timing plots have a clearly identifiabl start, nothing more, so this call is definitely optional. About cudaDeviceReset, I added that because I had read that otherwise memory leaks are not correctly reported, but there is no relation to cudaFree. So I am still inclined not to introduce a relation between the two, they are not new/delete).
About the hstMakeUnique, I do not like the fact that CUDA only has cudaMallocHost and C++ only has new. You can ALSO have new in CUDA. The point is that you have a choice whether you want pinned memory or not. In case, I think that hstMakeUnique should use new in both, while if you want cudaMallocHost you could have a second additional hstMakeUniquePinned, for instance.
The destroy generator, I am really not sure. It's cool to also have a deleter, but in the end you are just adding the deletion in the line immediately after create generator, rather than at the end of the file. If you really want to have a 'unique pointer' type of design, somehoe you should encapsulate the create generator call within a unique pointer type of object for a generator instead. As it is now, I prefer to keep create generator and an explicit destroy generator, I find that easier to read and understand.
In general instead I agree with replacing all those explicit free at the end by unique pointers, for the simple array type of objects (whether they are dev or host, and new or cudamalloc or cudamallochost).

So all in all, nice suggestion, but I would keep only the unique pointers for simple arrays, and would make sure there are two separate functions for cudamallochost and new on the host in CUDA.

valassi · 2020-11-11T07:20:10Z

PS Sorry forgot to mention, this has now some conflicts after I merged PR #48

valassi · 2020-11-11T09:33:06Z

Hi both, thanks for the discussion. Just for the record and so I do not forget, @hageboeck pointed out that the problem with adding unique pointers and not adding a struct with a destructor calling cudaDeviceReset is that cudaDeviceReset must be called AFTER all cudaMalloc. A code like

  cudaDeviceReset();
}

would instead first call cudaDeviceReset() and later call the destructors of unique pointers that go out of scope. I am ok to keep Stephan's suggested struct then (with or without cudaFree).

As discussed, this is any case temporary playground code, not production code. Eventually the memory allocations will be in MadGraph methods called by main, rather than by main. Another alternative I could suggest is to restructure check.cc so that it actually contains something like

int MG_main( argc, argv, timermap )
{
  // do everything, including memory allocations
}
int main( argc, argv )
{
    mgOnGpu::TimerMap timermap;
    int status = MG_main( argc, argv, timermap );
    cudaDeviceReset();
    return status;
}

but it is quite ugly. I do not mind, lets keep CudaSetUpTearDown as it is and move on with adding the missing functionality to the rest of the code.

hageboeck · 2020-11-11T13:28:59Z

@roiser @valassi
New version is up. I think I addressed everything we talked about.

valassi

Thanks @hageboeck

The main thing I suggest to change is to remove all appearance of devMakeUnique on c++ rathe than define a no-op one.

This morning we had discussed having hstMakeUnique and hstMakeUniquePinned, which are still not there. Initially I commented that we should have them, but then I took it back. So it's ok to leave these as they are, sorry for some hesitations...

Then the other are minor suggestions, but have a look, eg alphabetically ordered headers, why not memcpy, and add a printout on cudadevicereset.

Thanks! Andrea

…y management.

…ctors

BUG FIX in PR #47: arrays were uninitialised in RAII constructors

…h5#47

hageboeck requested a review from roiser November 10, 2020 15:21

hageboeck self-assigned this Nov 10, 2020

hageboeck requested a review from valassi November 10, 2020 15:21

valassi mentioned this pull request Nov 10, 2020

Add feature to produce common random numbers. #45

Merged

hageboeck force-pushed the unique_ptr branch 2 times, most recently from 3234ebb to c0ee68c Compare November 11, 2020 13:26

valassi requested changes Nov 11, 2020

View reviewed changes

[eemumu_AV] Use unique_ptr and RAII to automate host and device memor…

ae58f23

…y management.

hageboeck force-pushed the unique_ptr branch from c0ee68c to ae58f23 Compare November 12, 2020 14:16

valassi approved these changes Nov 12, 2020

View reviewed changes

hageboeck merged commit 9e2fb6f into madgraph5:master Nov 13, 2020

hageboeck deleted the unique_ptr branch November 13, 2020 13:17

valassi added a commit to valassi/madgraph4gpu that referenced this pull request Nov 29, 2020

BUG FIX in PR madgraph5#47: arrays were uninitialised in RAII constru…

e1c7b7c

…ctors

valassi mentioned this pull request Nov 29, 2020

Bug in unique_ptr: arrays not initialized #73

Closed

valassi added a commit to valassi/madgraph4gpu that referenced this pull request Nov 29, 2020

BUG FIX in PR madgraph5#47: arrays were uninitialised in RAII constru…

4788650

…ctors

valassi added a commit that referenced this pull request Nov 30, 2020

Merge pull request #74 from valassi/issue73

f5f62b0

BUG FIX in PR #47: arrays were uninitialised in RAII constructors

valassi added a commit to valassi/madgraph4gpu that referenced this pull request Feb 24, 2022

[clf] reorder includes as suggested by @hageboeck long ago in madgrap…

f94d0d0

…h5#47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[eemumu_AV] Use unique_ptr and RAII to automate memory management.#47

[eemumu_AV] Use unique_ptr and RAII to automate memory management.#47
hageboeck merged 1 commit into
madgraph5:masterfrom
hageboeck:unique_ptr

hageboeck commented Nov 10, 2020 •

edited

Loading

Uh oh!

roiser commented Nov 10, 2020

Uh oh!

hageboeck commented Nov 10, 2020 •

edited

Loading

Uh oh!

roiser commented Nov 11, 2020

Uh oh!

valassi commented Nov 11, 2020 •

edited

Loading

Uh oh!

valassi commented Nov 11, 2020

Uh oh!

valassi commented Nov 11, 2020

Uh oh!

hageboeck commented Nov 11, 2020

Uh oh!

valassi left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

hageboeck commented Nov 10, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

roiser commented Nov 10, 2020

Uh oh!

hageboeck commented Nov 10, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

roiser commented Nov 11, 2020

Uh oh!

valassi commented Nov 11, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

valassi commented Nov 11, 2020

Uh oh!

valassi commented Nov 11, 2020

Uh oh!

hageboeck commented Nov 11, 2020

Uh oh!

valassi left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

hageboeck commented Nov 10, 2020 •

edited

Loading

hageboeck commented Nov 10, 2020 •

edited

Loading

valassi commented Nov 11, 2020 •

edited

Loading