Remove the pilot executables in a deterministic way #5136

Description

@aldbr

The Site Director is responsible for generating pilot wrappers, submitting them to the various Computing Element (CE) communication interfaces, and deleting them afterwards.
CE interfaces may modify a pilot wrapper (e.g. to bundle a proxy) before submitting it to a Computing Element or an LRMS; in that case they are responsible for removing the modified wrapper afterwards.

Nevertheless, some of the CE interfaces that we use need to keep the pilot wrappers after the submission step: the wrapper may take some time to actually be uploaded to the LRMS (HTCondor), or it may be used as input to another program and have to stay in place until the end of the execution (LocalCE + ParallelLibrary).

Currently, in these cases, the Site Director delegates the cleanup of the executables to the CE interfaces, as you can see here. These interfaces are simple and do not store any information about past submissions, so they have to run find calls that delete old files based on their date, e.g. HTCondorCE.

To avoid that, I propose a new approach:

  • The Site Director always keeps the responsibility for removing the executable files that it generated.
  • The CE interfaces that need to keep executable files longer return this information in the result dictionary after submitting the job, such as:
def submitJob(...):
    ...
    # Existing code
    result = S_OK(jobIDs)
    result['PilotStampDict'] = stampDict

    # What we could add
    result['ExecutableToKeep'] = executablePath

    return result

They can also return an executable that they have modified. In effect, they tell the caller: "I cannot manage the executable I just created, sorry; here it is if you want to do something with it".
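As a rough illustration of this return contract, here is a minimal sketch of the CE side. It does not use DIRAC itself: `S_OK` is a local stand-in, and `submit_job`, its `bundle_proxy` flag, the `.bundled` suffix, and the job ID and stamp formats are all hypothetical names chosen for the example.

```python
import os
import tempfile


def S_OK(value=None):
    """Local stand-in for DIRAC's S_OK so the sketch runs without DIRAC."""
    return {"OK": True, "Value": value}


def submit_job(executable, number_of_jobs, bundle_proxy=False):
    """Hypothetical CE-side submission that reports which file must outlive it."""
    job_ids = ["job-%d" % i for i in range(number_of_jobs)]
    result = S_OK(job_ids)
    result["PilotStampDict"] = {
        job_id: "stamp-%d" % i for i, job_id in enumerate(job_ids)
    }

    executable_to_keep = executable
    if bundle_proxy:
        # The CE wrote a modified copy: hand that path back to the caller,
        # which then becomes responsible for removing it later.
        executable_to_keep = executable + ".bundled"
        with open(executable, "rb") as src, open(executable_to_keep, "wb") as dst:
            dst.write(b"# proxy would be bundled here\n" + src.read())

    result["ExecutableToKeep"] = executable_to_keep
    return result
```

When the returned path equals the one passed in, the CE is asking the caller to keep the original file; when it differs, the original can go and the modified copy is the one to track.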

  • The Site Director, after submitting a pilot wrapper, checks whether the executable can be removed immediately:
submitResult = ce.submitJob(executable, '', pilotSubmissionChunk)

if 'ExecutableToKeep' not in submitResult or submitResult['ExecutableToKeep'] != executable:
    os.unlink(executable)
...
# maybe after registering the pilots in the PilotAgentsDB
# if there is an executable to keep and that has not been changed by the CE interface, we store it
if 'ExecutableToKeep' in submitResult:
  self._storeExecutable(submitResult)
  • SiteDirector._storeExecutable() would add a new entry to a new table, named PilotExecutable or PilotWrapper and defined in PilotAgents.sql, such as:
CREATE TABLE `PilotWrapper` (
  `PilotID` INT(11) UNSIGNED NOT NULL,
  `ExecutablePath` VARCHAR(255) NOT NULL DEFAULT 'Unknown',
  `Deleted` ENUM('True','False') NOT NULL DEFAULT 'False',
  PRIMARY KEY (`PilotID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
  • When monitoring the pilots, the Site Director could then look in this table, find all the executable paths that no longer belong to any running pilot, and delete them, using a method in PilotAgentsDB.p that would perform:
SELECT ExecutablePath
FROM PilotWrapper
WHERE PilotWrapper.Deleted = 'False'
  AND CE = <CE>
  AND Queue = <Queue>
  AND ExecutablePath NOT IN (SELECT ExecutablePath
                             FROM PilotWrapper, PilotAgents
                             WHERE PilotWrapper.PilotID = PilotAgents.PilotID
                             AND PilotAgents.Status = <unfinished_status>)

What do you think of this solution? @andresailer
