Skip to content

Protect access to TROOT::GetListOfGlobalFunctions in TFormula#109

Merged
smuzaffar merged 1 commit into
cms-sw:cms/1f31574from
Dr15Jones:threadSafetyFixTFormula
Dec 16, 2015
Merged

Protect access to TROOT::GetListOfGlobalFunctions in TFormula#109
smuzaffar merged 1 commit into
cms-sw:cms/1f31574from
Dr15Jones:threadSafetyFixTFormula

Conversation

@Dr15Jones
Copy link
Copy Markdown

Protected all calls to TROOT::GetListOfGlobalFunctions from TFormula
with the proper mutex.

Protected all calls to TROOT::GetListOfGlobalFunctions from TFormula
with the proper mutex.
@cmsbuild
Copy link
Copy Markdown

A new Pull Request was created by @Dr15Jones (Chris Jones) for branch cms/1f31574.

@cmsbuild, @smuzaffar, @Degano, @iahmad-khan, @davidlange6 can you please review it and eventually sign? Thanks.
You can sign-off by replying to this message having '+1' in the first line of your reply.
You can reject by replying to this message having '-1' in the first line of your reply.

external issue cms-sw/cmsdist#2039

@Dr15Jones
Copy link
Copy Markdown
Author

@davidlange6 @smuzaffar @davidlt This should fix most of the crashes seen in CMSSW_8_0_THREADED_X. I'm still investigating the exception problem.

@Dr15Jones
Copy link
Copy Markdown
Author

This was merged into ROOT
root-project#117 (comment)

@davidlt
Copy link
Copy Markdown

davidlt commented Dec 14, 2015

Maybe we should update bump ROOT instead of taking a single commit? We can first bump it in the DEVEL IBs.

@smuzaffar
Copy link
Copy Markdown

@davidlt, yes please bump it in devel IBs (looks like we are far behind in catching up with 6.06 branch).
May be we should move the THREADED IBs to then use the next branch (same as devel IBs)?

@Dr15Jones
Copy link
Copy Markdown
Author

My personal feeling is THREADED should exactly match the regular IBs since they are needed to test exactly what we will be using. Instead it would be better to have the DEVEL IBs always run their RelVal tests using multiple threads.

@smuzaffar
Copy link
Copy Markdown

For now I am adding these PRs in cmssw and let @davidlt take care of DEVEL

smuzaffar added a commit that referenced this pull request Dec 16, 2015
Protect access to TROOT::GetListOfGlobalFunctions in TFormula
@smuzaffar smuzaffar merged commit 0de26c2 into cms-sw:cms/1f31574 Dec 16, 2015
@davidlt
Copy link
Copy Markdown

davidlt commented Dec 17, 2015

DEVEL runs now on the tip of 6.06 branch. Those two Chris commits are already upstreamed.

If running multi-threaded is now a priority why we don't run the normal RelVals in that mode? E.g. do we run full RelVals in a single threaded or multi-threaded mode?

I do agree that we should switch DEVEL to multi-threaded mode as that could be a good way to battle test ROOT changes for thread safety. The current tip of 6.06 should be safe, don't look like there are significant changes (mostly: spell checking, update notes, cmake, JIRA, etc..)

@davidlange6

@smuzaffar
Copy link
Copy Markdown

thanks @davidlt.
Full RelVals are still using non-threaded mode (single thread) so as normal IBs RelVals.
I agree that running DEVEL in threaded mode is good idea. If no objections then I will update the build script to use threads for THREADED.

@davidlange6
Copy link
Copy Markdown

On Dec 17, 2015, at 11:17 AM, davidlt notifications@github.com wrote:

DEVEL runs now on the tip of 6.06 branch. Those two Chris commits are already upstreamed.

If running multi-threaded is now a priority why we don't run the normal RelVals in that mode? E.g. do we run full RelVals in a single threaded or multi-threaded mode?

do we have a recent measurement of how many more resources it takes? (maybe we don’t care too much..)

David

I do agree that we should switch DEVEL to multi-threaded mode as that could be a good way to battle test ROOT changes for thread safety. The current tip of 6.06 should be safe, don't look like there are significant changes (mostly: spell checking, update notes, cmake, JIRA, etc..)

@davidlange6


Reply to this email directly or view it on GitHub.

@smuzaffar
Copy link
Copy Markdown

threaded relvals takes 6 to 8hours more to run in total, see the times for latest IB below (all ran on 8 core/16GB VM). but we have resources to run threaded for DEVEL IB.

CMSSW_8_0_X_2015-12-16-2300 4of4           3h48
CMSSW_8_0_X_2015-12-16-2300 3of4           3h45
CMSSW_8_0_X_2015-12-16-2300 2of4           3h31
CMSSW_8_0_X_2015-12-16-2300 1of4           3h20

CMSSW_8_0_THREADED_X_2015-12-16-2300 4of4  4h41
CMSSW_8_0_THREADED_X_2015-12-16-2300 3of4  6h45
CMSSW_8_0_THREADED_X_2015-12-16-2300 2of4  6h44
CMSSW_8_0_THREADED_X_2015-12-16-2300 1of4  4h42

@davidlt
Copy link
Copy Markdown

davidlt commented Dec 17, 2015

Why is this? Is this due to inefficiency in multi-threading or we are overcommitting (too big -j value) the machine and start swapping? We have steps in RelVals which can eat up to 6GB or more RSS.

Have you looked at metrics (memory, IO, cpu, etc) for these machines?

@smuzaffar
Copy link
Copy Markdown

you can see it yourself here

https://meter.cern.ch/public/_plugin/kibana/#/dashboard/elasticsearch/Metrics:%20Host
https://cmssdt.cern.ch/jenkins/job/ib-run-relvals/buildTimeTrend

and then query for host name. For example for cmsbuild04 last 6/7 hours should show you the state of this machine when it was running threaded relvals. You will notice that there is some IO wait and CPU utilization is not good. Swap was not used too badly.

while on cmsbuild15 (which ran for normal relvals, single threaded), cpu utilization is very good 90-95%).

note that for multi-threaded mode we run MemoryInGB/4 workflows in parallel (each of them uses 4 threads).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants