Skip to content

Support CLI uploading & downloading via new API#330

Merged
yarikoptic merged 21 commits into
masterfrom
gh-320
Jan 25, 2021
Merged

Support CLI uploading & downloading via new API#330
yarikoptic merged 21 commits into
masterfrom
gh-320

Conversation

@jwodder
Copy link
Copy Markdown
Contributor

@jwodder jwodder commented Jan 18, 2021

Closes #320.

@yarikoptic This is still in progress, but I'd appreciate feedback on what I have so far.

TODOs:

  • testing: at least a single integration test which would test upload/download cycle. For that we might need to interface another endpoint (if not there yet) to create a new dandiset (within fixture instance), upload sample nwb file(s), download them. Otherwise -- we have no clue if new api higher level interface code works since no lines are covered
  • make linters happy
  • add support for downloading a specific folder (that was disabled)
  • I hardcoded to consider "draft" to be the "latest version" . We should remove that and allow for version to be left unspecified, and then DandiAPIClient code handle it properly (get the latest version, download for it)

Moved two last items into a dedicated issue to follow up not here: #340

@codecov
Copy link
Copy Markdown

codecov Bot commented Jan 18, 2021

Codecov Report

Merging #330 (cb33a88) into master (3749836) will decrease coverage by 0.22%.
The diff coverage is 74.54%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #330      +/-   ##
==========================================
- Coverage   82.57%   82.35%   -0.23%     
==========================================
  Files          55       55              
  Lines        5136     5598     +462     
==========================================
+ Hits         4241     4610     +369     
- Misses        895      988      +93     
Flag Coverage Δ
unittests 82.35% <74.54%> (-0.23%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
dandi/tests/test_utils.py 100.00% <ø> (ø)
dandi/dandiset.py 85.07% <40.00%> (-3.82%) ⬇️
dandi/upload.py 69.48% <66.43%> (-2.12%) ⬇️
dandi/dandiarchive.py 78.57% <76.66%> (-7.92%) ⬇️
dandi/dandiapi.py 84.84% <78.43%> (+2.13%) ⬆️
dandi/tests/fixtures.py 95.62% <94.11%> (+0.34%) ⬆️
dandi/cli/cmd_download.py 96.77% <100.00%> (-0.11%) ⬇️
dandi/consts.py 100.00% <100.00%> (ø)
dandi/girder.py 87.81% <100.00%> (+0.43%) ⬆️
dandi/tests/test_dandiapi.py 100.00% <100.00%> (ø)
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3749836...cb33a88. Read the comment docs.

Copy link
Copy Markdown
Member

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initial feedback. main point - is to use new AssetMeta metadata record, and add (back) decision on either file to be reuploaded or already in a correct state in the archive.

Comment thread dandi/upload.py Outdated
Comment thread dandi/upload.py Outdated
Comment thread dandi/upload.py Outdated
Comment thread dandi/upload.py Outdated
Comment thread dandi/upload.py Outdated
Comment thread dandi/upload.py
@jwodder jwodder added the minor Increment the minor version when merged label Jan 19, 2021
jwodder and others added 4 commits January 19, 2021 11:47
…or girder URLs

- added gui.dandiarchive.org as the one also serving "dandi-api"
  instance (which it does)
- added ad-hoc schema for download from URLs powering API-based web
  UI: prepend "api+" to the URL seeing in the browser.  This will go
  away when we flip the switch to use that new one by
  default. Meanwhile, while we do not have a dedicated instance running
  (like gui-api.dandiarchive.org) which would allow for easy mapping
  use this "api+"
- web UI also has changed its URLs to not show girder folder IDs, so
  I removed that URL schema handling as well
- removed mentioning of publish.dandiarchive.org which is no longer in service.
- removed handling/testing of being able to download multiple files
  selected in web UI, since it no longer supports that, and if
  re-introduced would likely to be in the new API powered interface
- "server_type" variable before was referring to either URL is girder
  or api based.  Now it refers to actual server to use -- girder or api.
@jwodder
Copy link
Copy Markdown
Contributor Author

jwodder commented Jan 20, 2021

@yarikoptic Other than testing (which I'd honestly rather not be the one to do), is there anything that still needs to be done for this PR?

@yarikoptic
Copy link
Copy Markdown
Member

@yarikoptic Other than testing (which I'd honestly rather not be the one to do), is there anything that still needs to be done for this PR?

Updated PR description with TODO items. And testing (as a unit/integration test; I can then test with our live instance) is exactly what we need most of all I think.

@yarikoptic
Copy link
Copy Markdown
Member

Woohoo, now there is https://gui-beta-dandiarchive-org.netlify.app. Thanks to @dchiquit . Ssi we can replace that adhoc api+ url with this one

@jwodder
Copy link
Copy Markdown
Contributor Author

jwodder commented Jan 21, 2021

@yarikoptic What URL do I use for downloading from the Docker Compose API instance? I tried http://localhost:8000/api/dandisets/000002/versions/draft with & without leading api+, but neither one could be parsed.

@yarikoptic
Copy link
Copy Markdown
Member

I did not add mapping for direct API urls, we have only for "web UI" urls, such as https://gui.dandiarchive.org/#/dandiset/000002/draft (so would become api+https://gui.dandiarchive.org/#/dandiset/000002/draft to "force" saying that it is for the new API backed frontend). In the docker compose API setup, is there a web UI component and is it running on some "unique to that setup" port? I could then add /push support for such urls (replacing overall that api+ with also explicit https://gui-beta-dandiarchive-org.netlify.app/ url). If it is the same port, and we cannot tell between the two possible choices (old web ui or new) then I guess I would need to add a schema to parse our API urls (might not be a bad idea anyways).

@jwodder
Copy link
Copy Markdown
Contributor Author

jwodder commented Jan 21, 2021

@yarikoptic The Docker Compose setup does not appear to have a GUI for the new API.

@dchiquito
Copy link
Copy Markdown
Contributor

It does not, you would have to clone and serve it from https://github.com/dandi/dandiarchive#web-app if you really need a GUI to use the CLI.

Is there a reason the CLI always requires a URL? It doesn't sound difficult to just specify a dandiset ID+version, i.e. cli download 000006 draft or cli download 123456 0.1234.123456.

@yarikoptic
Copy link
Copy Markdown
Member

@dchiquit in my humble attempt to simplify it for the users (just copy/paste a URL or some URI, instead of parsing/splitting to give to cli), and more formalized URIs (to include path) in addition to just DANDI:<id> is being discussed in #183 .

@yarikoptic
Copy link
Copy Markdown
Member

@jwodder - pushed 8ebaea6 which should support direct api calls as well now and react to https://gui-beta-dandiarchive-org.netlify.app/ instead of api+ prefix

Comment thread dandi/upload.py
#
# Upload file
#
yield {"status": "uploading"}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no coverage here suggests that we actually not uploaded any file... so not yet sure how test passes on CI (besides if glob is just wrong) and also for me locally test fails with

___________________________________________________ test_new_upload_download ____________________________________________________
/home/yoh/proj/dandi/dandi-cli/dandi/tests/test_upload.py:261: in test_new_upload_download
    upload(
/home/yoh/proj/dandi/dandi-cli/dandi/upload.py:50: in upload
    return _new_upload(
/home/yoh/proj/dandi/dandi-cli/dandi/upload.py:609: in _new_upload
    ds_identifier = dandiset.identifier
/home/yoh/proj/dandi/dandi-cli/dandi/dandiset.py:107: in identifier
    raise ValueError(
E   ValueError: Found no dandiset.identifier in metadata record: {'sex': ['M'], 'number_of_subjects': 1, 'organism': [{'species': 'mouse'}], 'description': 'experiment_description1', 'publications': ['related_publications1']}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the course of fixing this, I realized that the code doesn't set the auth token for the new API anywhere. How should that be passed in/obtained?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dchiquit how to login/get token within the test docker compose instance of api server?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yarikoptic I know how to do that; the problem is how to pass the token to the upload function.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we get 401 response if we try to upload without authentication? if so - guard in send_request for receiving 401 and adopt GirderCli's dandi_authenticate for RESTFullAPIClient to do authentication in such cases (if was not authenticated before).
to avoid confusion etc, rename prior uses of DANDI_API_KEY to DANDI_GIRDER_API_KEY and use DANDI_API_KEY for authentication against our API server here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests & coverage should be working properly now.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The auth token can be obtained from the web GUI by clicking on the user menu drop down and copying the API Key, same as in girder.

Specifying a Authorization: Token 123567890abcdefg header should be all that is required to use that token when making requests.

* origin/master:
  Add JSON Schema validation to tests
  Functions for validating against JSON Schemas
  Move publish_model_schemata() to dandi/metadata.py
  Fix `AttributeError: 'Resource' object has no attribute 'values'`
  [DATALAD RUNCMD] Swap order of str and AnyUrl to be from specific to generic
  ENH: allow for DeprecationWarning to come from requests_toolbelt, not our problem
  Fetch Django test API token more robustly
@yarikoptic
Copy link
Copy Markdown
Member

FTR: I have pushed a merge of master (so we have readily a version accounting for recent changes to the model etc)

I did leave handling of old style simple "identifier: STR" for now, but later
we need to RF anyways to first check the schemaVersion  and act accordingly.

Related issues filed at dandi-api
- dandi/dandi-archive#63 -- identifier should be populated
- dandi/dandi-archive#64 -- schemaVersion should be populated
Causes dandi download to fail if ran in DANDI_DEVEL=1 mode
@yarikoptic
Copy link
Copy Markdown
Member

pushed some minor tune ups to get going, unfortunately upload seems to be not functioning properly with our "beta" instance. Even having provided API token, on first attempt I got 401, and on any subsequent I am getting 400. Here is a sequence of commands I have used (having created a new dandiset on https://gui-beta-dandiarchive-org.netlify.app/#/dandiset/000010)

dandi download https://gui-beta-dandiarchive-org.netlify.app/#/dandiset/000010
cd 000010
cat dandiset.yaml
echo -e "identifier:\n  propertyID: DANDI\n  value: '000009'\n" >> dandiset.yaml
cp -rpL /home/yoh/proj/dandi/dandisets/000027/sub-RAT123 .
DANDI_DEVEL=1 dandi upload -i dandi-api --devel-debug --validation=ignore

to obtain at the end

requests.exceptions.HTTPError: Error 400 while sending PUT request to https://dandi-api-dandisets-testing.s3.amazonaws.com/000009/draft/sub-RAT123/sub-RAT123.nwb?uploadId=Q_BIUKIBu9k....

so something is not yet working with the real S3 target bucket. attn: @jwodder @dchiquit

@dchiquito
Copy link
Copy Markdown
Contributor

@yarikoptic That's an error between the CLI and S3. Amazon is pretty good about including error messages, can you inspect the payload of that response?

@yarikoptic
Copy link
Copy Markdown
Member

@dchiquit here is what we get in response text (sensorred with XXX)

'<?xml version="1.0" encoding="UTF-8"?>\n<Error><Code>InvalidArgument</Code><Message>Only one auth mechanism allowed; only the X-Amz-Algorithm query parameter, Signature query string parameter or the Authorization header should be specified</Message><ArgumentName>Authorization</ArgumentName><ArgumentValue>token XXX</ArgumentValue><RequestId>XXXX</RequestId><HostId>XXXXXXXXXX=</HostId></Error>

and the culprit here is token XXX -- DandiAPIClient uses the same single requests' Session construct to talk to both API server (where we need to provide that token in the headers) and S3 (where no such token should be provided).
@jwodder -- please decouple communication to S3 from the session used for communication with API server. Such a minimal diff

diff --git a/dandi/dandiapi.py b/dandi/dandiapi.py
index 71482fe..5a68e2d 100644
--- a/dandi/dandiapi.py
+++ b/dandi/dandiapi.py
@@ -413,7 +413,7 @@ class DandiAPIClient(RESTFullAPIClient):
                         part["part_number"],
                         part["size"],
                     )
-                    r = self.put(part["upload_url"], data=chunk, json_resp=False)
+                    r = requests.put(part["upload_url"], data=chunk)
                     bytes_uploaded += len(chunk)
                     yield {
                         "status": "uploading",
@@ -436,7 +436,7 @@ class DandiAPIClient(RESTFullAPIClient):
                     "parts": parts_out,
                 },
             )
-            self.post(resp["complete_url"], data=resp["body"], json_resp=False)
+            requests.post(resp["complete_url"], data=resp["body"])
             self.post(
                 "/uploads/validate/",
                 json={"sha256": filehash, "object_key": object_key},

seems to work but I guess would lack any additional error handling etc.

@jwodder
Copy link
Copy Markdown
Contributor Author

jwodder commented Jan 25, 2021

@yarikoptic S3 requests now use a separate client instance.

@yarikoptic
Copy link
Copy Markdown
Member

that would do it for now... fails seems to be unrelated, so let's proceed!

@yarikoptic yarikoptic merged commit cb33a88 into master Jan 25, 2021
@yarikoptic yarikoptic deleted the gh-320 branch January 25, 2021 22:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

minor Increment the minor version when merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Interface new implementation (API server) in dandi-cli top-level interfaces

3 participants