Protect access to TROOT::GetListOfGlobalFunctions in TFormula #109
Protected all calls to TROOT::GetListOfGlobalFunctions from TFormula with the proper mutex.
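The fix follows a standard pattern: every access to a shared list that several threads may read and modify goes through a single mutex. The sketch below is a standalone analogy, not the actual TFormula patch. `g_global_functions`, `g_global_functions_mutex`, `find_or_register` and `run_concurrent_lookups` are hypothetical names, and a `std::mutex` stands in for whichever ROOT mutex the real change uses as "the proper mutex".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the shared list returned by
// TROOT::GetListOfGlobalFunctions(); this is not ROOT's real type.
static std::vector<std::string> g_global_functions;

// Hypothetical stand-in for "the proper mutex" guarding that list.
static std::mutex g_global_functions_mutex;

// Looks a name up and registers it if missing. The search and the insertion
// happen under one lock, so no other thread can see the vector mid-update.
// Returns true if the name was already present.
bool find_or_register(const std::string& name) {
    std::lock_guard<std::mutex> lock(g_global_functions_mutex);
    auto it = std::find(g_global_functions.begin(), g_global_functions.end(), name);
    if (it != g_global_functions.end()) {
        return true;   // already known
    }
    g_global_functions.push_back(name);
    return false;      // newly registered
}

// Spawns n_threads threads that all use the same names concurrently and
// returns the number of distinct names in the list afterwards.
std::size_t run_concurrent_lookups(int n_threads, int names_per_thread) {
    assert(n_threads > 0 && names_per_thread >= 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([names_per_thread] {
            for (int i = 0; i < names_per_thread; ++i) {
                // Threads race to insert the same names; the lock serializes them.
                find_or_register("func" + std::to_string(i));
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    std::lock_guard<std::mutex> lock(g_global_functions_mutex);
    return g_global_functions.size();
}
```

With the lock in place, `run_concurrent_lookups(8, 100)` always ends with exactly 100 distinct names. Without it, concurrent `push_back` calls on the same `std::vector` are a data race, which is undefined behavior and can crash.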
A new Pull Request was created by @Dr15Jones (Chris Jones) for branch cms/1f31574. @cmsbuild, @smuzaffar, @Degano, @iahmad-khan, @davidlange6 can you please review it and eventually sign? Thanks. External issue: cms-sw/cmsdist#2039
@davidlange6 @smuzaffar @davidlt This should fix most of the crashes seen in CMSSW_8_0_THREADED_X. I'm still investigating the exception problem.
This was merged into ROOT.
Maybe we should bump ROOT instead of taking a single commit? We can first bump it in the DEVEL IBs.
@davidlt, yes, please bump it in the DEVEL IBs (it looks like we are far behind in catching up with the 6.06 branch).
My personal feeling is that the THREADED IBs should exactly match the regular IBs, since they are needed to test exactly what we will be using. Instead, it would be better to have the DEVEL IBs always run their RelVal tests using multiple threads.
For now I am adding these PRs to cmssw and will let @davidlt take care of DEVEL.
DEVEL now runs on the tip of the 6.06 branch. Both of Chris's commits are already upstreamed. If running multi-threaded is now a priority, why don't we run the normal RelVals in that mode? For example, do we run the full RelVals in single-threaded or multi-threaded mode? I do agree that we should switch DEVEL to multi-threaded mode, as that could be a good way to battle-test ROOT changes for thread safety. The current tip of 6.06 should be safe; it doesn't look like there are significant changes (mostly spell checking, update notes, CMake, JIRA, etc.).
Thanks, @davidlt.
Do we have a recent measurement of how many more resources it takes? (Maybe we don't care too much.) David
|
|
Threaded RelVals take 6 to 8 hours longer to run in total; see the times for the latest IB below (all ran on an 8-core/16 GB VM). But we do have the resources to run threaded RelVals for the DEVEL IB.
|
Why is this? Is it due to inefficiency in multi-threading, or are we overcommitting the machine (too large a -j value) and starting to swap? We have steps in RelVals that can eat up 6 GB or more of RSS. Have you looked at the metrics (memory, IO, CPU, etc.) for these machines?
|
You can see it yourself here and then query for the host name. For example, for cmsbuild04 the last 6-7 hours should show you the state of this machine while it was running threaded RelVals. You will notice that there is some IO wait and CPU utilization is not good; swap was not used too badly. By contrast, on cmsbuild15 (which ran the normal, single-threaded RelVals), CPU utilization is very good (90-95%). Note that in multi-threaded mode we run MemoryInGB/4 workflows in parallel (each of them uses 4 threads).
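The scheduling rule quoted above can be checked with a little arithmetic. This is a hypothetical helper (`plan_threaded_relvals` and `ThreadedRelValPlan` are illustrative names, not part of any CMS tool) that applies "MemoryInGB/4 workflows, 4 threads each" to a machine with a given memory size and core count.

```cpp
#include <cassert>

// Hypothetical summary of how a threaded RelVal run loads one build machine.
struct ThreadedRelValPlan {
    int parallel_workflows;         // workflows running at the same time
    int total_threads;              // parallel_workflows * threads per workflow
    double threads_per_core;        // > 1.0 means the CPUs are oversubscribed
    double memory_per_workflow_gb;  // average memory share per workflow
};

// Applies the rule described in the thread: MemoryInGB/4 workflows in
// parallel, each using threads_per_workflow threads (4 in the thread).
ThreadedRelValPlan plan_threaded_relvals(int memory_gb, int cores,
                                         int threads_per_workflow = 4) {
    assert(memory_gb > 0 && cores > 0 && threads_per_workflow > 0);
    ThreadedRelValPlan p{};
    p.parallel_workflows = memory_gb / 4;  // integer division, as in MemoryInGB/4
    p.total_threads = p.parallel_workflows * threads_per_workflow;
    p.threads_per_core = static_cast<double>(p.total_threads) / cores;
    p.memory_per_workflow_gb =
        p.parallel_workflows > 0
            ? static_cast<double>(memory_gb) / p.parallel_workflows
            : 0.0;  // under 4 GB the rule schedules no workflows at all
    return p;
}
```

For the 8-core/16 GB VM this gives 4 workflows and 16 threads, i.e. 2 threads per core, with about 4 GB of memory per workflow. If the rule is applied as quoted, the CPUs are oversubscribed, and a step using 6 GB or more of RSS would exceed its per-workflow share; that is consistent with, though not proof of, the IO wait and poor CPU utilization seen on cmsbuild04.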