Skip to content

[eemumu_AV] Use unique_ptr and RAII to automate memory management.#47

Merged
hageboeck merged 1 commit into
madgraph5:masterfrom
hageboeck:unique_ptr
Nov 13, 2020
Merged

[eemumu_AV] Use unique_ptr and RAII to automate memory management.#47
hageboeck merged 1 commit into
madgraph5:masterfrom
hageboeck:unique_ptr

Conversation

@hageboeck
Copy link
Copy Markdown
Member

@hageboeck hageboeck commented Nov 10, 2020

I wanted to test if memory / resource management can be automated. With unique_ptr and a few helper functions, a lot of ifdef and delete[] / frees can be removed. Is this helpful for us?

@hageboeck hageboeck requested a review from roiser November 10, 2020 15:21
@hageboeck hageboeck self-assigned this Nov 10, 2020
@hageboeck hageboeck requested a review from valassi November 10, 2020 15:21
@roiser
Copy link
Copy Markdown
Member

roiser commented Nov 10, 2020

In general I like the approach as it reduces the numbers of ifdefs. I have two questions on the CudaSetUpTearDown struct.

Where is it initialised?

If its initialised it will go out of scope probably only at the very end of the main. But the idea is to also dump the information how long the reset took in the timermap.dump(); (very bottom of the main). Also the numbering would change 9c <-> 9d but that's a minor one.

Not even sure we need to time the cuda reset? @valassi ?

@hageboeck
Copy link
Copy Markdown
Member Author

hageboeck commented Nov 10, 2020

In general I like the approach as it reduces the numbers of ifdefs. I have two questions on the CudaSetUpTearDown struct.

Where is it initialised?

Directly where it is declared:
https://github.com/hageboeck/madgraph4gpu/blob/7a7b0f27285cb08a0b946cfe149d8523d7e02ae6/examples/gpu/eemumu_AV/SubProcesses/P1_Sigma_sm_epem_mupmum/check.cc#L182

If its initialised it will go out of scope probably only at the very end of the main. But the idea is to also dump the information how long the reset took in the timermap.dump(); (very bottom of the main). Also the numbering would change 9c <-> 9d but that's a minor one.

Very correct. I thought that once you saw it clean up once, you have seen it all.

It can obviously be reverted by

  • Reverting to explicit manual cleanup.
  • Putting the cuda part into a block.
  • By resetting a resource manager, e.g. cudaDeviceManager = nullptr, cudaDeviceManager.reset() or similar.

I did it because it's exactly the same logic as the memory management, and cuda frees after a device reset just lead to a crash. I've grown to love these RAIIs, because you never have to worry about "forgot to clean up" or "what was the cleanup order again?".
Please choose one of the above. Warning, though, that if you want revert to explicit manual we need to kill all device memory before we kill the device. If the "block + indent everything" is OK, that's the easiest way, but it will create ugly diffs.

@roiser
Copy link
Copy Markdown
Member

roiser commented Nov 11, 2020

In general I like the approach as it reduces the numbers of ifdefs. I have two questions on the CudaSetUpTearDown struct.
Where is it initialised?

Directly where it is declared:
https://github.com/hageboeck/madgraph4gpu/blob/7a7b0f27285cb08a0b946cfe149d8523d7e02ae6/examples/gpu/eemumu_AV/SubProcesses/P1_Sigma_sm_epem_mupmum/check.cc#L182

ah sorry didn't see that, ok

If its initialised it will go out of scope probably only at the very end of the main. But the idea is to also dump the information how long the reset took in the timermap.dump(); (very bottom of the main). Also the numbering would change 9c <-> 9d but that's a minor one.

Very correct. I thought that once you saw it clean up once, you have seen it all.

It can obviously be reverted by

* [ ]  Reverting to explicit manual cleanup.

* [ ]  Putting the cuda part into a block.

* [ ]  By resetting a resource manager, e.g. `cudaDeviceManager = nullptr`, `cudaDeviceManager.reset()` or similar.

I did it because it's exactly the same logic as the memory management, and cuda frees after a device reset just lead to a crash. I've grown to love these RAIIs, because you never have to worry about "forgot to clean up" or "what was the cleanup order again?".
Please choose one of the above. Warning, though, that if you want revert to explicit manual we need to kill all device memory before we kill the device. If the "block + indent everything" is OK, that's the easiest way, but it will create ugly diffs.

I guess another way of solving it would be to put the dump into the destructor of the struct, then it's guaranteed that it will always go last (as long as there is only one such struct).

@valassi
Copy link
Copy Markdown
Member

valassi commented Nov 11, 2020

Thanks @hageboeck for requesting a review.

My personal taste: some of these things could be useful, some look like an overhead. And there are a few subtleties to keep in mind.

  • I would avoid CudaSetUpTearDown, simply because cudaFree(0) is not RAII where cudaFree is initialization and devicereset is termination. You could actually have only the first call, or only the second, or neither. It is not a new/delete. So I would not use that. (PS Now I read your comment that calling cudafree AFTER cudadevicereset causes a crash. I do not know, I think we should understand better what each of these two functions does. I think we initially added cudaFree just as a way to ensure that timing plots have a clearly identifiabl start, nothing more, so this call is definitely optional. About cudaDeviceReset, I added that because I had read that otherwise memory leaks are not correctly reported, but there is no relation to cudaFree. So I am still inclined not to introduce a relation between the two, they are not new/delete).
  • About the hstMakeUnique, I do not like the fact that CUDA only has cudaMallocHost and C++ only has new. You can ALSO have new in CUDA. The point is that you have a choice whether you want pinned memory or not. In case, I think that hstMakeUnique should use new in both, while if you want cudaMallocHost you could have a second additional hstMakeUniquePinned, for instance.
  • The destroy generator, I am really not sure. It's cool to also have a deleter, but in the end you are just adding the deletion in the line immediately after create generator, rather than at the end of the file. If you really want to have a 'unique pointer' type of design, somehoe you should encapsulate the create generator call within a unique pointer type of object for a generator instead. As it is now, I prefer to keep create generator and an explicit destroy generator, I find that easier to read and understand.
  • In general instead I agree with replacing all those explicit free at the end by unique pointers, for the simple array type of objects (whether they are dev or host, and new or cudamalloc or cudamallochost).

So all in all, nice suggestion, but I would keep only the unique pointers for simple arrays, and would make sure there are two separate functions for cudamallochost and new on the host in CUDA.

@valassi
Copy link
Copy Markdown
Member

valassi commented Nov 11, 2020

PS Sorry forgot to mention, this has now some conflicts after I merged PR #48

@valassi
Copy link
Copy Markdown
Member

valassi commented Nov 11, 2020

Hi both, thanks for the discussion. Just for the record and so I do not forget, @hageboeck pointed out that the problem with adding unique pointers and not adding a struct with a destructor calling cudaDeviceReset is that cudaDeviceReset must be called AFTER all cudaMalloc. A code like

  cudaDeviceReset();
}

would instead first call cudaDeviceReset() and later call the destructors of unique pointers that go out of scope. I am ok to keep Stephan's suggested struct then (with or without cudaFree).

As discussed, this is any case temporary playground code, not production code. Eventually the memory allocations will be in MadGraph methods called by main, rather than by main. Another alternative I could suggest is to restructure check.cc so that it actually contains something like

int MG_main( argc, argv, timermap )
{
  // do everything, including memory allocations
}
int main( argc, argv )
{
    mgOnGpu::TimerMap timermap;
    int status = MG_main( argc, argv, timermap );
    cudaDeviceReset();
    return status;
}

but it is quite ugly. I do not mind, lets keep CudaSetUpTearDown as it is and move on with adding the missing functionality to the rest of the code.

@hageboeck hageboeck force-pushed the unique_ptr branch 2 times, most recently from 3234ebb to c0ee68c Compare November 11, 2020 13:26
@hageboeck
Copy link
Copy Markdown
Member Author

@roiser @valassi
New version is up. I think I addressed everything we talked about.

Copy link
Copy Markdown
Member

@valassi valassi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hageboeck

The main thing I suggest to change is to remove all appearance of devMakeUnique on c++ rathe than define a no-op one.

This morning we had discussed having hstMakeUnique and hstMakeUniquePinned, which are still not there. Initially I commented that we should have them, but then I took it back. So it's ok to leave these as they are, sorry for some hesitations...

Then the other are minor suggestions, but have a look, eg alphabetically ordered headers, why not memcpy, and add a printout on cudadevicereset.

Thanks! Andrea

Comment thread examples/gpu/eemumu_AV/SubProcesses/P1_Sigma_sm_epem_mupmum/check.cc Outdated
Comment thread examples/gpu/eemumu_AV/SubProcesses/P1_Sigma_sm_epem_mupmum/check.cc Outdated
Comment thread examples/gpu/eemumu_AV/SubProcesses/P1_Sigma_sm_epem_mupmum/check.cc Outdated
Comment thread examples/gpu/eemumu_AV/SubProcesses/P1_Sigma_sm_epem_mupmum/check.cc Outdated
Comment thread examples/gpu/eemumu_AV/SubProcesses/P1_Sigma_sm_epem_mupmum/check.cc Outdated
Comment thread examples/gpu/eemumu_AV/SubProcesses/P1_Sigma_sm_epem_mupmum/check.cc Outdated
@hageboeck hageboeck merged commit 9e2fb6f into madgraph5:master Nov 13, 2020
@hageboeck hageboeck deleted the unique_ptr branch November 13, 2020 13:17
valassi added a commit to valassi/madgraph4gpu that referenced this pull request Nov 29, 2020
valassi added a commit to valassi/madgraph4gpu that referenced this pull request Nov 29, 2020
valassi added a commit that referenced this pull request Nov 30, 2020
BUG FIX in PR #47: arrays were uninitialised in RAII constructors
valassi added a commit to valassi/madgraph4gpu that referenced this pull request Feb 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants