Update ndasynapse to be 'aware' of location for replicated data #26

@obenshaindw

Data submitted to NDA through the standard data submission endpoint (NOT BSMN-S3) are distributed across 5 buckets: gpop, NDAR_Central_1, NDAR_Central_2, NDAR_Central_3, and NDAR_Central_4. Requests to the submission API (https://nda.nih.gov/api/submission/docs/swagger-ui.html) return these locations for any files related to a submission.

The following Python functions parse the URL returned by the submission service and return a dictionary with the bucket and key of the object, either in its NDAR_Central_* location or in its nda-bsmn location. The dictionary can be passed as arguments to boto functions for working with the S3 API.

    def ndar_central_location(self, file):
        # '<scheme>://bucket/path/to/object' -> bucket, 'path/to/object'
        bucket, key = (file['file_remote_path']
                       .split('//')[1]
                       .split('/', 1))
        return {'Bucket': bucket, 'Key': key}

    def nda_bsmn_location(self, file):
        # Drop the bucket name, then prefix the data path with the submission ID.
        original_key = (file['file_remote_path']
                        .split('//')[1]
                        .split('/', 1)[1]
                        .replace('ndar_data/DataSubmissions',
                                 'submission_{}/ndar_data/DataSubmissions'.format(self.submission_id)))
        # Keys in nda-bsmn sit under a per-collection prefix.
        nda_bsmn_key = 'collection_{}/{}'.format(self.collection_id, original_key)
        return {'Bucket': 'nda-bsmn', 'Key': nda_bsmn_key}
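
To make the key rewriting concrete, here is a minimal standalone sketch of the same two transformations, written as plain functions rather than methods. The helper `split_remote_path`, the example path, and the collection and submission IDs are all hypothetical and chosen only for illustration; real `file_remote_path` values may look different.

```python
def split_remote_path(url):
    """Split '<scheme>://bucket/key/parts' into (bucket, key). Hypothetical helper."""
    bucket, key = url.split('//')[1].split('/', 1)
    return bucket, key


def ndar_central_location(file):
    # The bucket and key are used exactly as the submission API reports them.
    bucket, key = split_remote_path(file['file_remote_path'])
    return {'Bucket': bucket, 'Key': key}


def nda_bsmn_location(file, collection_id, submission_id):
    # Discard the original bucket; only the key is reused.
    _, key = split_remote_path(file['file_remote_path'])
    # Insert the submission prefix in front of the data path.
    key = key.replace(
        'ndar_data/DataSubmissions',
        'submission_{}/ndar_data/DataSubmissions'.format(submission_id))
    # Prepend the per-collection prefix used in the nda-bsmn bucket.
    return {'Bucket': 'nda-bsmn',
            'Key': 'collection_{}/{}'.format(collection_id, key)}


# Hypothetical file record, shaped like one item of the files response.
example = {'file_remote_path':
           's3://NDAR_Central_1/ndar_data/DataSubmissions/example.csv'}

central = ndar_central_location(example)
bsmn = nda_bsmn_location(example, collection_id=1234, submission_id=5678)
print(central)
print(bsmn)
```

Because the dictionaries use the `Bucket` and `Key` names, they could be unpacked into a boto3 S3 client call such as `s3.head_object(**central)`, assuming credentials with access to the bucket.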

These functions are included in an update to the NDASubmissionFiles class. The file argument each accepts is an item from the list returned by /api/submission/{submission_id}/files, and that response is passed as an initialization argument to the NDASubmissionFiles class.

            # Excerpt: s is the ID of the submission currently being processed.
            files = []
            request = requests.get(
                self.submission_api + '/{}/files'.format(s),
                headers=self.headers,
                auth=self.auth
            )
            try:
                # The response body is expected to be a JSON list of file records.
                files = json.loads(request.text)
                submission_files.append({'files': NDASubmissionFiles(files, collection_id, s),
                                         'collection_id': collection_id,
                                         'submission_id': s})
            except json.decoder.JSONDecodeError:
                # A non-JSON body (for example an error page) is reported, not raised.
                print('Error occurred retrieving files from submission {}'.format(s))
                print('Request returned {}'.format(request.text))
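
The error handling in this excerpt can be exercised without network access by separating the JSON parsing from the HTTP request. The sketch below does only that; `parse_files_response` is a hypothetical helper, not part of the existing code, and the response bodies are made up for illustration.

```python
import json


def parse_files_response(text, submission_id):
    """Return the parsed list of file records, or None if text is not valid JSON.

    Hypothetical helper mirroring the try/except in the excerpt above.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        print('Error occurred retrieving files from submission {}'.format(submission_id))
        print('Request returned {}'.format(text))
        return None


# A well-formed body yields the list of file records.
good = parse_files_response(
    '[{"file_remote_path": "s3://NDAR_Central_1/ndar_data/DataSubmissions/a.csv"}]',
    submission_id=42)

# A non-JSON body (here a made-up HTML error page) is reported and yields None.
bad = parse_files_response('<html>Unauthorized</html>', submission_id=42)
```

In the excerpt, the equivalent of a `None` result is that nothing is appended to `submission_files` for that submission.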
