Conversation
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
|
View / edit / reply to this conversation on ReviewNB wildintellect commented on 2025-02-03T18:15:02Z remove
Where are we using reticulate in this example? remove? smk0033 commented on 2025-02-10T19:23:22Z I think reticulate is used to pull our username for the private bucket wildintellect commented on 2025-02-11T18:49:30Z ok, that's fine. wildintellect commented on 2025-02-11T18:49:55Z you still need to remove rgdal though |
|
View / edit / reply to this conversation on ReviewNB wildintellect commented on 2025-02-03T18:15:03Z show filtering example to say select just the ones ending in tif |
|
View / edit / reply to this conversation on ReviewNB wildintellect commented on 2025-02-03T18:15:04Z You need to drop rgdal, that package is retired, I'm suprised it's still installed. Use sf and terra like it says.
https://r-spatial.github.io/sf/reference/gdal_utils.html sf::gdal_utils("info", ...) |
|
View / edit / reply to this conversation on ReviewNB wildintellect commented on 2025-02-03T18:15:04Z Hmm, this code will actually work for everyone to read from your bucket, maybe we should be pointing to shared data, and mention that it works the same way for private?
also add |
|
View / edit / reply to this conversation on ReviewNB wildintellect commented on 2025-02-03T18:15:05Z limit the print out with |
|
View / edit / reply to this conversation on ReviewNB wildintellect commented on 2025-02-03T18:15:06Z link to gdal documentation about this. |
|
View / edit / reply to this conversation on ReviewNB wildintellect commented on 2025-02-03T18:15:07Z Do we mention that streaming is possible, but we're not doing that for simplicity? |
|
View / edit / reply to this conversation on ReviewNB wildintellect commented on 2025-02-03T18:15:07Z this is bad because it hard codes the number of parts of the path
chuckwondo commented on 2025-02-03T21:04:45Z
strsplit(csv_key, "/", fixed = TRUE)[[1]] |> tail(n = 1) |
|
View / edit / reply to this conversation on ReviewNB wildintellect commented on 2025-02-03T18:15:08Z once again use head or tail to sample some rows but not print the whole thing. |
|
strsplit(csv_key, "/", fixed = TRUE)[[1]] |> tail(n=1) View entire conversation on ReviewNB |
|
View / edit / reply to this conversation on ReviewNB chuckwondo commented on 2025-02-03T21:06:12Z Feel free to ignore this, as it's a very minor suggestion in the context of a notebook. As a general practice, I recommend avoiding setting env vars in code. Code should generally only read env vars.
Therefore, I'd recommend this instead (including deleting the
s3 <- paws::s3(region = 'us-west-2')
I'm just guessing at the syntax here though. It's not clear to me from the docs if that's the way to specify only the region, but the docs suggest specifying only the region is allowable: https://paws-r.github.io/developer_guide/credentials/#service-settings |
|
View / edit / reply to this conversation on ReviewNB chuckwondo commented on 2025-02-03T21:06:13Z Instead of |
|
View / edit / reply to this conversation on ReviewNB chuckwondo commented on 2025-02-03T21:06:14Z I suggest not using
While I realize that we cannot use Windows for this notebook because it is MAAP-specific, I suggest using good habits anyway, so in this case, the vsis3 path must always use forward slashes, thus you should use a function that does not produce OS-specific values, such as
tiff_path <- paste("/vsis3", bucket, objects[13], sep="/")
|
|
I think reticulate is used to pull our username for the private bucket View entire conversation on ReviewNB |
|
@smk0033 isn't the username an ENV variable? Reticulate wouldn't be needed for that. If it specifically uses maap-py that's a different story. |
|
ok, that's fine. View entire conversation on ReviewNB |
|
you still need to remove rgdal though View entire conversation on ReviewNB |
|
@wildintellect @chuckwondo thanks for all the comments! It is ready for another review |
| @@ -0,0 +1,1577 @@ | |||
| { | |||
There was a problem hiding this comment.
| @@ -0,0 +1,1577 @@ | |||
| { | |||
There was a problem hiding this comment.
I don't necessarily recommend changing this, as it might be too unfamiliar to most readers (for now), but in case you're not familiar with it, you can change function(x) {x$Key} to \(x) x$Key, which is a relatively new anonymous function shorthand.
Reply via ReviewNB
There was a problem hiding this comment.
I was not aware of this - thanks!
There was a problem hiding this comment.
Apologies in advance: this was initially a small comment, but the more I thought about things, the more I wrote.
I still see a mix of syntax across cells. Some cells use function(x) {x$Key} (with braces) and some use function(x) x$Key (without braces). I'd like to see consistency. Please either consistently use braces or consistently drop them (of course, only where they can be validly dropped).
Although I would personally drop them around a function consisting solely a single expression, such as here, consistency is more important, so whichever syntax you choose is fine, as long as it's consistently applied.
I'd also suggest naming x more meaningfully, so here's how I suggest reworking the 2 lines in the 2 cells above to make it more readable (notice also that the variable being assigned to is plural since we have a list of values):
s3_response <- s3$list_objects_v2(Bucket = bucket, Prefix = paste(username, "CONUSbiohex2020", sep = "/")) s3_object_keys <- apply(s3_response$Contents, function(s3_object) s3_object$Key)
Also, please consistently use whitespace by always placing a space after a comma and spaces around named args.
For example, this whitespace formatting is off:
paste(username,"CONUSbiohex2020", sep="/")
it should be like so:
paste(username, "CONUSbiohex2020", sep = "/")
Yes, I realize I'm getting very nit-picky and pedantic about syntax consistency and conventions, but when providing examples for others to use, I think it's important to convey recommended coding practices as best as possible in the context, and at a minimum, show consistency.
For reference, see the Tidyverse style guide, particularly the section on spacing: https://style.tidyverse.org/syntax.html#spacing
There was a problem hiding this comment.
No worries at all, I agree consistency is very important and definitely something I need to be more conscious of when working with code. I'll take another pass at the notebook today and get a new PR open - I do appreciate the thorough reviews!
| @@ -0,0 +1,1577 @@ | |||
| { | |||
There was a problem hiding this comment.
What does stac_endpoint[[2]] get us? You may want to split this out and add a comment, or at least a meaningful var name to help the reader understand what you're pulling out of stac_endpoint. Perhaps something along these lines:
# Element 2 of stac_endpoint is blah, blah, blah, ... blah_blah_blah = stac_endpoint[[2]] stac_query <- blah_blah_blah |> stac() |> stac_search(collections = collection) |> get_request()
Reply via ReviewNB
| @@ -0,0 +1,1577 @@ | |||
| { | |||
There was a problem hiding this comment.
For consistency, either put curly braces around x$Key like you did in a previous cell, or remove the braces in the previous cell. I recommend removing the earlier braces, rather than adding them here. Either way, just use consistent syntax.
Reply via ReviewNB
chuckwondo
left a comment
There was a problem hiding this comment.
Looks good. I have only a few very minor suggestions, but nothing blocking, so I'm approving.
|
Thanks, @chuckwondo! I'll take a look at those suggestions and get it merged! |
| @@ -0,0 +1,1588 @@ | |||
| { | |||
There was a problem hiding this comment.
minor: should we say, an, instead of the "R/Python" workspace? Using the makes it sound like there is one and only one specific such workspace, when that's not the case. Rather, the user must create an "R/Python" workspace, no?
Reply via ReviewNB
There was a problem hiding this comment.
Create might be a bit confusing too, MAAP provides an R/Python workspace, Start/Launch might be a better term. For other platforms yes people will need a workspace with R & Python installed.
| @@ -0,0 +1,1588 @@ | |||
| { | |||
There was a problem hiding this comment.
Thanks for the clarifying comment, but now that I see the comment without a corresponding variable, as I also suggested, I think the variable would aid readability (note that I also renamed stac_query to stac_items because it is not a query, but rather the items resulting from making a query -- if you had not called get_request, then you would have a query, not results).
I also recommend removing the comment # print the retrieved items, as it does not convey any information beyond what the line of code indicates: print(stac_items)
stac_url <- stac_endpoint[[2]] stac_items <- stac(stac_url) |> stac_search(collections = collection) |> get_request() print(stac_items)
Reply via ReviewNB
| @@ -0,0 +1,1588 @@ | |||
| { | |||
There was a problem hiding this comment.
nit-pick: add space around =: sep = "/" (per the style guide I linked in another comment)
Reply via ReviewNB
| @@ -0,0 +1,1588 @@ | |||
| { | |||
There was a problem hiding this comment.
| @@ -0,0 +1,1588 @@ | |||
| { | |||
There was a problem hiding this comment.
csv_listing <- s3$list_objects_v2(Bucket = bucket, Prefix = "shared/smk0033/csv_ex/") csv_keys <- sapply(csv_listing$Contents, function(csv_object) csv_object$Key) # Choose an arbitrary key csv_key <- csv_keys[4] csv_key
Reply via ReviewNB
| @@ -0,0 +1,1588 @@ | |||
| { | |||
There was a problem hiding this comment.
| @@ -0,0 +1,1588 @@ | |||
| { | |||
There was a problem hiding this comment.
since using strsplit with tail is rather cryptic, how about this instead:
filename <- sub(".*/", "", csv_key)
Reply via ReviewNB
| @@ -0,0 +1,1588 @@ | |||
| { | |||
There was a problem hiding this comment.
I suggest dropping the assignment since the result is not used anywhere. We only use the side-effect of the file being written to the filesystem, not the response from the function call.
Reply via ReviewNB
MAAP AWS Notebook for https://github.com/NASA-IMPACT/active-maap-sprint/issues/283