Skip to content

set DIRACSYSCONFIG env variable to pilot.cfg#122

Merged
fstagni merged 2 commits into
DIRACGrid:masterfrom
fstagni:useDIRACSYSCONFIG
Jan 5, 2021
Merged

set DIRACSYSCONFIG env variable to pilot.cfg#122
fstagni merged 2 commits into
DIRACGrid:masterfrom
fstagni:useDIRACSYSCONFIG

Conversation

@fstagni

@fstagni fstagni commented Dec 3, 2020

Copy link
Copy Markdown
Contributor

No description provided.

@fstagni fstagni marked this pull request as ready for review December 4, 2020 09:44
Comment thread Pilot/dirac-install.py Outdated
lines.extend(['export %s=%s' % (envName, envValue)])
lines.extend(['( echo $%s | grep -q $%s ) || export %s=$%s:$%s' % (
envName, envValue,
envName, envName, envValue)])

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit dubious about this line: I think it could do with a comment explaining what it does.

From what I can infer, I think it has an extra $ before the last %s... It may also fail if the variable already contains a superset of the value (I'm thinking of, for example, adding /bin if the variable already contains /usr/bin although that's probably a rarer case). Will it also add an extra ":" in if the variable isn't set in the first place?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is not different from others in the same function, e.g.

( echo $PATH | grep -q $DIRACBIN ) || export PATH=$DIRACBIN:$PATH

This would end up adding

( echo $envName | grep -q $envValue ) || export envName=$envName:$envValue

Is it wrong?
Or maybe it should be

( echo $envName | grep -q $envValue ) || export envName=$envValue:$envName

?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, isn't envValue an actual value in this instance rather than another env variable? So it should be:
export envName=$envName:envValue (without the $ before the value).

Although this only works properly in cases where these things are lists of paths and already have a value set (so it's fine for PATH and probably LD_LIBRARY_PATH, but not general to other variables.

I think the original behaviour where it just set the value may be closer to what an end user expects when they ask for a variable to be set.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am removing the changes from this file until we sort out for DIRACGrid/DIRAC#4860 (comment)

Note that right now the changes highlighted here are the same already merged in DIRAC repo's dirac-install.py

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sfayer please review now

@sfayer sfayer left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@fstagni fstagni merged commit 9390fa2 into DIRACGrid:master Jan 5, 2021
@fstagni

fstagni commented Jan 5, 2021

Copy link
Copy Markdown
Contributor Author

Reverted in #123 because of issues within LHCb jobs.

@fstagni

fstagni commented Jan 5, 2021

Copy link
Copy Markdown
Contributor Author

I will look into this ASAP.

@fstagni

fstagni commented Jan 5, 2021

Copy link
Copy Markdown
Contributor Author

FYI:

The issue is that the DIRAC SiteName is incorrect in the job, but correct in the pilot. E.g. for a job running at LCG.UKI-LT2-IC-HEP.uk

in std.out of the job I get:

2021-01-05 15:27:03 UTC dirac-jobexec INFO: DIRAC JobID 435604113 is running at site DIRAC.Client.uk

...

2021-01-05 15:27:04 UTC dirac-jobexec/TimeLeft WARN: /LocalSite/CPUScalingFactor not defined for site DIRAC.Client.uk
2021-01-05 15:27:04 UTC dirac-jobexec/TimeLeft WARN: /LocalSite/CPUNormalizationFactor not defined for site DIRAC.Client.uk
2021-01-05 15:27:04 UTC dirac-jobexec/TimeLeft WARN: Batch system type for site DIRAC.Client.uk is not currently supported
2021-01-05 15:27:04 UTC dirac-jobexec WARN: No CEQueue in local configuration, looking to find one in CS
2021-01-05 15:27:04 UTC dirac-jobexec/GaudiApplication ERROR: Failure in GaudiApplication execute module
Traceback (most recent call last):
  File "/cvmfs/lhcb.cern.ch/lhcbdirac/v10r0p23/LHCbDIRAC/Workflow/Modules/GaudiApplication.py", line 151, in execute
    prodConfFileName = self.createProdConfFile(stepOutputTypes, histogram, runNumberGauss, firstEventNumberGauss)
  File "/cvmfs/lhcb.cern.ch/lhcbdirac/v10r0p23/LHCbDIRAC/Workflow/Modules/ModuleBase.py", line 1106, in createProdConfFile
    jobMaxCPUTime=86400)
  File "/cvmfs/lhcb.cern.ch/lhcbdirac/v10r0p23/LHCbDIRAC/Workflow/Modules/ModulesUtilities.py", line 132, in getEventsToProduce
    CPUTime = getCPUTime(CPUNormalizationFactor)
  File "/cvmfs/lhcb.cern.ch/lhcbdirac/v10r0p23/DIRAC/WorkloadManagementSystem/Client/CPUNormalization.py", line 219, in getCPUTime
    queueSection = '/Resources/Sites/%s/%s/CEs/%s/Queues' % (siteName.split('.')[0], siteName, gridCE)
AttributeError: 'NoneType' object has no attribute 'split'

Which points to the fact that right now this line is needed:
https://github.com/DIRACGrid/Pilot/blob/master/Pilot/pilotCommands.py#L932

So the issue is probably in the JobAgent.

@sfayer can you remind me what you have done in the GridPP pilot to sort out the initial issue?

@sfayer

sfayer commented Jan 5, 2021

Copy link
Copy Markdown
Member

Hmm, our patch for that issue was just to remove that line (otherwise we run a completely standard vanilla DIRAC pilot):
sfayer@2fd6609

The vanilla DIRAC pilot does have the other patch from a while ago that makes the dirac.cfg file as a symlink to pilot.cfg for the job itself (which I think isn't done in the LHCb case due to the CVMFS based installation), maybe that's why it works for us?

@fstagni

fstagni commented Jan 6, 2021

Copy link
Copy Markdown
Contributor Author

I studied the situation, and I can conclude that:

  • yes, it works for you (at least you don't see dramatic errors) because you don't use a CVMFS installation. This is anyway something (the CVMFS installation) that I guess will come for everybody, one day.
  • Anyway, by removing that famous line above, there are issues that you didn't notice because either you don't use filling mode or (which is basically the same) you don't use the PoolCE because both are hitting on these lines:

https://github.com/DIRACGrid/DIRAC/blob/6eb7c931625330e4aa4207c25a7c42c03c9c1bc3/WorkloadManagementSystem/Agent/JobAgent.py#L154-164

That basically requires the pilot.cfg.

Anyway, I don't see any better option than re-opening DIRACGrid/DIRAC#4836 (I will do it straight away). If anyone wants to give it another try, you are welcome, but for me that's enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants