The Site Director is responsible for generating and submitting pilot wrappers to various Computing Element communication interfaces, and deleting them afterwards.
CE interfaces may modify pilot wrappers (e.g. to bundle a proxy) before submitting them to a Computing Element or an LRMS. In that case, they are responsible for removing the modified pilot wrappers afterwards.
Nevertheless, some of the CE interfaces that we use need to keep the pilot wrappers even after the submission step: it might take time before a wrapper is actually uploaded into the LRMS (HTCondor), or the wrapper might be used as an input of another program and need to be present until the end of the execution (LocalCE + ParallelLibrary).
Currently, in these cases, the Site Director gives the responsibility of cleaning the executables to the CE interfaces, as you can see here. These interfaces are simple and do not store any information about the submissions done, so they have to perform find calls to delete old files based on their date (e.g. HTCondorCE).
To avoid that, I propose a new approach:
- The Site Director always keeps the responsibility of removing the executable files that it generated.
- The CE interfaces that need to keep executable files longer return this information in a dictionary after submitting the job, such as:
      def submitJob(...):
          ...
          # Existing code
          result = S_OK(jobIDs)
          result['PilotStampDict'] = stampDict
          # What we could add
          result['ExecutableToKeep'] = executablePath
          return result
They can also return an executable that they have modified, so they basically say to the caller: "I cannot manage the executable I just created, sorry, here it is if you want to do something with it".
- The Site Director, after submitting a pilot wrapper, checks whether the executable can be removed immediately:
      submitResult = ce.submitJob(executable, '', pilotSubmissionChunk)
      # remove the wrapper now unless the CE interface asked to keep this exact file
      if 'ExecutableToKeep' not in submitResult or submitResult['ExecutableToKeep'] != executable:
          os.unlink(executable)
      ...
      # maybe after registering the pilots in the PilotAgentsDB
      # if there is an executable to keep, we store it
      if 'ExecutableToKeep' in submitResult:
          self._storeExecutable(submitResult)
- The SiteDirector._storeExecutable() method would add a new entry in a new table named PilotExecutable or PilotWrapper, defined in PilotAgents.sql, such as:
      CREATE TABLE `PilotWrapper` (
        `PilotID` INT(11) UNSIGNED NOT NULL,
        `ExecutablePath` VARCHAR(255) NOT NULL DEFAULT 'Unknown',
        `Deleted` ENUM('True','False') NOT NULL DEFAULT 'False',
        PRIMARY KEY (`PilotID`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
- When monitoring the pilots, the Site Director could then check this table, find all the executable paths that do not belong to pilots that are still running, and delete them, using a method in PilotAgentsDB.py that would perform:
      SELECT ExecutablePath
      FROM PilotWrapper
      WHERE Deleted = 'False'
      AND CE = <CE>
      AND Queue = <Queue>
      AND ExecutablePath NOT IN (SELECT ExecutablePath
                                 FROM PilotWrapper, PilotAgents
                                 WHERE PilotWrapper.PilotID = PilotAgents.PilotID
                                 AND PilotAgents.Status = <unfinished_status>)
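To make the flow above easier to discuss, here is a minimal, standalone sketch of the two decisions involved: whether the wrapper can be removed right after submission, and which stored wrappers can be removed later. It is only an illustration under several assumptions: `S_OK` is a local stand-in for the DIRAC helper, the function names (`submit_job_keeping_executable`, `cleanup_after_submission`, `removable_executables`) are hypothetical, the tables are simplified and queried through SQLite instead of MySQL, the CE and Queue filters are left out, and the list of unfinished statuses is passed in by the caller.

```python
import os
import sqlite3


def S_OK(value=None):
    """Local stand-in for DIRAC's S_OK, so the sketch runs without DIRAC."""
    return {"OK": True, "Value": value}


def submit_job_keeping_executable(executable):
    """Hypothetical CE submitJob(): asks the caller to keep the executable."""
    result = S_OK(["pilotRef1"])
    result["ExecutableToKeep"] = executable
    return result


def cleanup_after_submission(executable, submitResult):
    """Remove the generated wrapper unless the CE asked to keep that exact file.

    Returns the path that has to be tracked for later removal, or None.
    If the CE returns a different (modified) file, the original wrapper is
    removed and the modified one is returned for tracking.
    """
    toKeep = submitResult.get("ExecutableToKeep")
    if toKeep != executable:
        os.unlink(executable)
    return toKeep


def removable_executables(conn, unfinishedStatuses):
    """Paths not yet deleted and not used by any pilot in an unfinished status."""
    marks = ",".join("?" * len(unfinishedStatuses))
    query = (
        "SELECT DISTINCT ExecutablePath FROM PilotWrapper "
        "WHERE Deleted = 'False' AND ExecutablePath NOT IN ("
        "SELECT w.ExecutablePath FROM PilotWrapper w "
        "JOIN PilotAgents a ON w.PilotID = a.PilotID "
        "WHERE a.Status IN (%s))" % marks
    )
    return sorted(row[0] for row in conn.execute(query, list(unfinishedStatuses)))
```

Since one wrapper is submitted for a whole chunk of pilots, several PilotWrapper rows can point to the same path. The `NOT IN` subquery keeps such a path out of the result as long as at least one of those pilots is still in an unfinished status.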
What do you think of this solution? @andresailer