Skip to content

Update for new API at https://api.dandiarchive.org/api#283

Merged
yarikoptic merged 12 commits into
masterfrom
gh-277
Dec 16, 2020
Merged

Update for new API at https://api.dandiarchive.org/api#283
yarikoptic merged 12 commits into
masterfrom
gh-277

Conversation

@jwodder
Copy link
Copy Markdown
Contributor

@jwodder jwodder commented Nov 24, 2020

Closes #277.

@jwodder jwodder added the patch Increment the patch version when merged label Nov 24, 2020
@codecov
Copy link
Copy Markdown

codecov Bot commented Nov 24, 2020

Codecov Report

Merging #283 (2ef1d00) into master (178e806) will increase coverage by 0.80%.
The diff coverage is 86.60%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #283      +/-   ##
==========================================
+ Coverage   81.72%   82.52%   +0.80%     
==========================================
  Files          54       55       +1     
  Lines        4979     5054      +75     
==========================================
+ Hits         4069     4171     +102     
+ Misses        910      883      -27     
Flag Coverage Δ
unittests 82.52% <86.60%> (+0.80%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
dandi/dandiapi.py 82.71% <84.09%> (+34.09%) ⬆️
dandi/tests/fixtures.py 95.90% <88.88%> (-0.59%) ⬇️
dandi/consts.py 100.00% <100.00%> (ø)
dandi/tests/test_dandiapi.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 178e806...2ef1d00. Read the comment docs.

@yarikoptic
Copy link
Copy Markdown
Member

Let's add download support so we could have a full of upload/download via dandi-cli.
I am still wondering on either we should aim to have this new support in master (and have just a new "instance" of dandiarchive to switch to use in DANDI_DEVEL mode) or have a dedicated branch. What do you think @jwodder -- a branch or straight in master?

@jwodder
Copy link
Copy Markdown
Contributor Author

jwodder commented Dec 14, 2020

@yarikoptic I think it'd be easier to work on if it were merged into master sooner rather than later.

@jwodder
Copy link
Copy Markdown
Contributor Author

jwodder commented Dec 15, 2020

@yarikoptic Download of what, exactly? Just one asset per method call? Any specific API?

@yarikoptic
Copy link
Copy Markdown
Member

yarikoptic commented Dec 15, 2020

@yarikoptic Download of what, exactly? Just one asset per method call? Any specific API?

download of an asset, e.g. the one you would upload in the test. Yes, just one asset at a time but support multiple:

given a dandiset id, and path (and possibly a version, default to draft) use

  • ​/dandisets​/{version__dandiset__pk}​/versions​/{version__version}​/assets​/paths​/ with path parameter - get a list of assets (UUIDs)
  • and then loop /dandisets/{version__dandiset__pk}/versions/{version__version}/assets/{uuid}/download/

after we have that interface, it should be "integrated" into our current code/logic for upload and download. I will outlined that in a dedicated issue #320

@yarikoptic I think it'd be easier to work on if it were merged into master sooner rather than later.

Ok, let's aim then for master/released version of dandi-cli to be able to support both "old" dandiarchive.org and new api.dandiarchive.org

@jwodder
Copy link
Copy Markdown
Contributor Author

jwodder commented Dec 15, 2020

@dchiquit

  • After uploading an asset with the path testing/simple1.nwb, it's listed in /dandisets/{dandiset_id}/versions/{version}/assets/ but not /dandisets/{dandiset_id}/versions/{version}/assets/paths/, regardless of whether the path query parameter is set to / or testing/; the endpoint is just returning an empty list. I don't think that's right.

  • Is it intentional for /dandisets/{dandiset_id}/versions/{version}/assets/paths/ to return a list as the top-most structure? I would have expected a dict with "results" and "next" keys, like /dandisets/{dandiset_id}/versions/{version}/assets/.

@yarikoptic
Copy link
Copy Markdown
Member

may be you could figure out a fix for https://github.com/dandi/dandi-api @jwodder for the /paths issue before @dchiquit can get back to us? familiarizing with that codebase would be important in the long run.

@dchiquito
Copy link
Copy Markdown
Contributor

That endpoint was patched together to provide data for the Dandiset browser in the web UI, so it wasn't designed very thoughtfully and has some weird inconsistencies.
For one thing, the correct argument to use is path_prefix, not path. I assume it's not paginated because whoever implemented the "file browser" widget didn't want to bother with pages. The swagger page presumes pagination for some reason, I will see if I can disable that.
assets/paths/ assumes that path_prefix is a directory, and will automatically append a trailing slash, even if you pass it an empty string. I set up some assets on the draft of dandiset 000001 to demonstrate:

We never made a design decision as to whether or not paths should start with a root slash or not. We can adjust this if one implementation is better for the CLI.

What exactly are you using the assets/paths/ endpoint for? The assets/ endpoint should work fine for fetching all assets. assets/paths/ only returns strings because that's all the UI needs to render.

@jwodder
Copy link
Copy Markdown
Contributor Author

jwodder commented Dec 15, 2020

@yarikoptic

given a dandiset id, and path (and possibly a version, default to draft)

By "path", do you mean that the download function should take an exact path to a single asset file at a time, or should it also accept paths to folders?

@yarikoptic
Copy link
Copy Markdown
Member

@yarikoptic

given a dandiset id, and path (and possibly a version, default to draft)

By "path", do you mean that the download function should take an exact path to a single asset file at a time, or should it also accept paths to folders?

folders as well. logic I guess should be added to deduce if it was to an asset or a folder: if a single asset returned by API and basename matches the basename of the path in url -- it was a single asset. If multiple assets or not matching -- take basename of the url's path and that would be the folder, unless it was an empty path - so it is the top of the dandiset, use dandiset ID as the folder name

@jwodder
Copy link
Copy Markdown
Contributor Author

jwodder commented Dec 15, 2020

@dchiquit What endpoint would you recommend for looking up the IDs and paths of all assets whose paths have a given prefix? Do I just have to filter /dandisets/{dandiset_id}/versions/{version}/assets/ myself?

@dchiquito
Copy link
Copy Markdown
Contributor

@jwodder Yes, that's probably safer than using /assets/paths/. I will see what I can today to add filtering directly to /assets/.

@dchiquito
Copy link
Copy Markdown
Contributor

@jwodder Looks like we did add some rudimentary filtering, that is the path parameter you were using earlier. It is also present on /assets/, it will filter the results to only include assets whose path begins with the parameter. The filter uses a simple string prefix match, so it does not maintain the illusion of a directory structure based on the path: Filtering on /foo would return both /foo/bar.txt and foo/fiz/buz.txt.

https://api.dandiarchive.org/api/dandisets/000001/versions/draft/assets/?path=/ should return two assets, /root-slash/file.txt and /root-slash/foo/bar/test.txt.

@jwodder
Copy link
Copy Markdown
Contributor Author

jwodder commented Dec 16, 2020

@yarikoptic

given a dandiset id, and path (and possibly a version, default to draft)

By "path", do you mean that the download function should take an exact path to a single asset file at a time, or should it also accept paths to folders?

folders as well. logic I guess should be added to deduce if it was to an asset or a folder: if a single asset returned by API and basename matches the basename of the path in url -- it was a single asset. If multiple assets or not matching -- take basename of the url's path and that would be the folder, unless it was an empty path - so it is the top of the dandiset, use dandiset ID as the folder name

Forcing the user to use folder names they didn't ask for seems too high-level, like something that should be implemented on top of the basic download methods. Counterproposal: There are two download methods:

  • download_asset(asset_filename, filepath, ...) downloads a single asset and saves it at filepath. If asset_filepath is not the name of a single asset, it errors.
  • download_asset_directory(asset_dirname, dirpath, ...) downloads a directory of assets and saves them in dirpath. Each asset is saved to a subpath of dirpath calculated by stripping off asset_dirname.

@satra
Copy link
Copy Markdown
Member

satra commented Dec 16, 2020

since asset paths are always in the namespace of the dandiset, would the following be useful:

<root>/dandi-<id>/<version>/<path>

and the function would construct something like this

basedir = basedir or os.getcwd()
namespace = "dandi"
Path(basedir) / f"{namespace}-{dandiset} / f"{version}" / assetpath

assetpath should always be a relative path so no leading path separator

and when we transition to personal dandi infrastructure, the namespace could change. we could make version optional, but that could result in conflicts on a file system.

@jwodder
Copy link
Copy Markdown
Contributor Author

jwodder commented Dec 16, 2020

@satra Are you suggesting this for the path at which to download an asset? I believe that the basic download method should be low-level, allowing the user to specify whatever complete target filepath they want.

@satra
Copy link
Copy Markdown
Member

satra commented Dec 16, 2020

Are you suggesting this for the path at which to download an asset?

yes

my suggestion is primarily about a reasonable default. we could be returning a list of assets as the result of a search. in that scenario this could be a default local layout. for a single asset, indeed a user should be able to override the location to store the asset, but the default location could be something like the one i listed.

@yarikoptic
Copy link
Copy Markdown
Member

Forcing the user to use folder names they didn't ask for seems too high-level, like something that should be implemented on top of the basic download methods. Counterproposal: There are two download methods:

  • download_asset(asset_filename, filepath, ...) downloads a single asset and saves it at filepath. If asset_filepath is not the name of a single asset, it errors.

I think we would want download_asset(asset_uuid, filepath, ...) as the basic download mechanism, and then download_asset_bypath(asset_path, filepath, ...) (asset_path not just asset_filename, but path as in the archive)

  • download_asset_directory(asset_dirname, dirpath, ...) downloads a directory of assets and saves them in dirpath. Each asset is saved to a subpath of dirpath calculated by stripping off asset_dirname.

ok, just make it plural, and again "path" not "name" since IMHO "name" is more of a "basename"? i.e. download_assets_directory(assets_dirpath, dirpath, ...)

@yarikoptic
Copy link
Copy Markdown
Member

basedir = basedir or os.getcwd()
namespace = "dandi"
Path(basedir) / f"{namespace}-{dandiset} / f"{version}" / assetpath

assetpath should always be a relative path so no leading path separator

and when we transition to personal dandi infrastructure, the namespace could change. we could make version optional, but that could result in conflicts on a file system.

I would prefer to avoid custom prefixes, like dandi- since we do not have them in the archive, and the same dandiset could be (eventually) present in multiple archives. I would just keep it an id. Anyways -- ATM we better just aim for parity with current download behavior, while just also supporting new API: what directory etc it would create/layout into should be a separate issue(s), such as #153 and #140 .

@satra
Copy link
Copy Markdown
Member

satra commented Dec 16, 2020

I would prefer to avoid custom prefixes, like dandi- since we do not have them in the archive, and the same dandiset could be (eventually) present in multiple archives.

the archive does not have a root location for a dandiset only an identifier <namespace_prefix>:<number> (e.g., DANDI:000003) so this would actually be reflecting that. the only change was to replace : with - to make it OS neutral.

@yarikoptic
Copy link
Copy Markdown
Member

Filed a dedicated issue #321 for that to not derail discussion here (where goal is "feature parity" ATM)

@jwodder
Copy link
Copy Markdown
Contributor Author

jwodder commented Dec 16, 2020

@yarikoptic This PR should be done now.

@yarikoptic
Copy link
Copy Markdown
Member

Looks good, and I did not spot any change which could impact operation of the client while talking with current girder-based instance, so should be good to go. Thank you @jwodder !

@yarikoptic yarikoptic merged commit 4a2b5d9 into master Dec 16, 2020
@yarikoptic yarikoptic deleted the gh-277 branch December 16, 2020 17:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

patch Increment the patch version when merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RF DandiAPIClient et al to use new WiP API

4 participants