Skip to content

MAAP AWS Notebook#471

Merged
smk0033 merged 3 commits intodevelopfrom
aws-r-access
Feb 13, 2025
Merged

MAAP AWS Notebook#471
smk0033 merged 3 commits intodevelopfrom
aws-r-access

Conversation

@smk0033
Copy link
Contributor

@smk0033 smk0033 commented Feb 3, 2025

@smk0033 smk0033 self-assigned this Feb 3, 2025
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@review-notebook-app
Copy link

review-notebook-app bot commented Feb 3, 2025

View / edit / reply to this conversation on ReviewNB

wildintellect commented on 2025-02-03T18:15:02Z
----------------------------------------------------------------

remove rgdal

Where are we using reticulate in this example? remove?


smk0033 commented on 2025-02-10T19:23:22Z
----------------------------------------------------------------

I think reticulate is used to pull our username for the private bucket

wildintellect commented on 2025-02-11T18:49:30Z
----------------------------------------------------------------

ok, that's fine.

wildintellect commented on 2025-02-11T18:49:55Z
----------------------------------------------------------------

you still need to remove rgdal though

@review-notebook-app
Copy link

review-notebook-app bot commented Feb 3, 2025

View / edit / reply to this conversation on ReviewNB

wildintellect commented on 2025-02-03T18:15:03Z
----------------------------------------------------------------

show filtering example to say select just the ones ending in tif


@review-notebook-app
Copy link

review-notebook-app bot commented Feb 3, 2025

View / edit / reply to this conversation on ReviewNB

wildintellect commented on 2025-02-03T18:15:04Z
----------------------------------------------------------------

You need to drop rgdal, that package is retired, I'm suprised it's still installed.

Use sf and terra like it says.

https://r-spatial.github.io/sf/reference/gdal_utils.html

sf::gdal_utils("info", ...)


@review-notebook-app
Copy link

review-notebook-app bot commented Feb 3, 2025

View / edit / reply to this conversation on ReviewNB

wildintellect commented on 2025-02-03T18:15:04Z
----------------------------------------------------------------

Hmm, this code will actually work for everyone to read from your bucket, maybe we should be pointing to shared data, and mention that it works the same way for private?

also add drivers as a best practice for speed. maybe we need to show people how to list available drivers and the names too?


@review-notebook-app
Copy link

review-notebook-app bot commented Feb 3, 2025

View / edit / reply to this conversation on ReviewNB

wildintellect commented on 2025-02-03T18:15:05Z
----------------------------------------------------------------

limit the print out with head(vector)


@review-notebook-app
Copy link

review-notebook-app bot commented Feb 3, 2025

View / edit / reply to this conversation on ReviewNB

wildintellect commented on 2025-02-03T18:15:06Z
----------------------------------------------------------------

link to gdal documentation about this.


@review-notebook-app
Copy link

review-notebook-app bot commented Feb 3, 2025

View / edit / reply to this conversation on ReviewNB

wildintellect commented on 2025-02-03T18:15:07Z
----------------------------------------------------------------

Do we mention that streaming is possible, but we're not doing that for simplicity?


@review-notebook-app
Copy link

review-notebook-app bot commented Feb 3, 2025

View / edit / reply to this conversation on ReviewNB

wildintellect commented on 2025-02-03T18:15:07Z
----------------------------------------------------------------

this is bad because it hard codes the number of parts of the path

basename() would get you the last part of the path reliably


chuckwondo commented on 2025-02-03T21:04:45Z
----------------------------------------------------------------

basename is OS-specific, so since s3 object keys are not OS-specific, I recommend this instead:

strsplit(csv_key, "/", fixed = TRUE)[[1]] |> tail(n = 1) 

@review-notebook-app
Copy link

review-notebook-app bot commented Feb 3, 2025

View / edit / reply to this conversation on ReviewNB

wildintellect commented on 2025-02-03T18:15:08Z
----------------------------------------------------------------

once again use head or tail to sample some rows but not print the whole thing. dim can be used to see how many rows/cols there are


Copy link

basename is OS-specific, so since s3 object keys are not OS-specific, I recommend this instead:

strsplit(csv_key, "/", fixed = TRUE)[[1]] |> tail(n=1) 

View entire conversation on ReviewNB

@review-notebook-app
Copy link

review-notebook-app bot commented Feb 3, 2025

View / edit / reply to this conversation on ReviewNB

chuckwondo commented on 2025-02-03T21:06:12Z
----------------------------------------------------------------

Feel free to ignore this, as it's a very minor suggestion in the context of a notebook. As a general practice, I recommend avoiding setting env vars in code. Code should generally only read env vars.

Therefore, I'd recommend this instead (including deleting the Sys.setenv line):

s3 <- paws::s3(region = 'us-west-2')

I'm just guessing at the syntax here though. It's not clear to me from the docs if that's the way to specify only the region, but the docs suggest specifying only the region is allowable: https://paws-r.github.io/developer_guide/credentials/#service-settings


@review-notebook-app
Copy link

review-notebook-app bot commented Feb 3, 2025

View / edit / reply to this conversation on ReviewNB

chuckwondo commented on 2025-02-03T21:06:13Z
----------------------------------------------------------------

Instead of var, how about s3_response?


@review-notebook-app
Copy link

review-notebook-app bot commented Feb 3, 2025

View / edit / reply to this conversation on ReviewNB

chuckwondo commented on 2025-02-03T21:06:14Z
----------------------------------------------------------------

I suggest not using file.path here because file.path is specifically for constructing OS-specific filesystem paths, and that's not what we're constructing.

While I realize that we cannot use Windows for this notebook because it is MAAP-specific, I suggest using good habits anyway, so in this case, the vsis3 path must always use forward slashes, thus you should use a function that does not produce OS-specific values, such as paste:

tiff_path <- paste("/vsis3", bucket, objects[13], sep="/")


Copy link
Contributor Author

smk0033 commented Feb 10, 2025

I think reticulate is used to pull our username for the private bucket


View entire conversation on ReviewNB

@wildintellect
Copy link
Collaborator

@smk0033 isn't the username an ENV variable? Reticulate wouldn't be needed for that. If it specifically uses maap-py that's a different story.

Copy link
Collaborator

ok, that's fine.


View entire conversation on ReviewNB

Copy link
Collaborator

you still need to remove rgdal though


View entire conversation on ReviewNB

@smk0033
Copy link
Contributor Author

smk0033 commented Feb 12, 2025

@wildintellect @chuckwondo thanks for all the comments! It is ready for another review

@@ -0,0 +1,1577 @@
{
Copy link

@chuckwondo chuckwondo Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

paste(username, "CONUSbiohex2020", sep = "/")

Reply via ReviewNB

@@ -0,0 +1,1577 @@
{
Copy link

@chuckwondo chuckwondo Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't necessarily recommend changing this, as it might be too unfamiliar to most readers (for now), but in case you're not familiar with it, you can change function(x) {x$Key} to \(x) x$Key, which is a relatively new anonymous function shorthand.


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not aware of this - thanks!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies in advance: this was initially a small comment, but the more I thought about things, the more I wrote.

I still see a mix of syntax across cells. Some cells use function(x) {x$Key} (with braces) and some use function(x) x$Key (without braces). I'd like to see consistency. Please either consistently use braces or consistently drop them (of course, only where they can be validly dropped).

Although I would personally drop them around a function consisting solely a single expression, such as here, consistency is more important, so whichever syntax you choose is fine, as long as it's consistently applied.

I'd also suggest naming x more meaningfully, so here's how I suggest reworking the 2 lines in the 2 cells above to make it more readable (notice also that the variable being assigned to is plural since we have a list of values):

s3_response <- s3$list_objects_v2(Bucket = bucket, Prefix = paste(username, "CONUSbiohex2020", sep = "/"))
s3_object_keys <- apply(s3_response$Contents, function(s3_object) s3_object$Key)

Also, please consistently use whitespace by always placing a space after a comma and spaces around named args.

For example, this whitespace formatting is off:

paste(username,"CONUSbiohex2020", sep="/")

it should be like so:

paste(username, "CONUSbiohex2020", sep = "/")

Yes, I realize I'm getting very nit-picky and pedantic about syntax consistency and conventions, but when providing examples for others to use, I think it's important to convey recommended coding practices as best as possible in the context, and at a minimum, show consistency.

For reference, see the Tidyverse style guide, particularly the section on spacing: https://style.tidyverse.org/syntax.html#spacing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries at all, I agree consistency is very important and definitely something I need to be more conscious of when working with code. I'll take another pass at the notebook today and get a new PR open - I do appreciate the thorough reviews!

@@ -0,0 +1,1577 @@
{
Copy link

@chuckwondo chuckwondo Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does stac_endpoint[[2]] get us? You may want to split this out and add a comment, or at least a meaningful var name to help the reader understand what you're pulling out of stac_endpoint. Perhaps something along these lines:

# Element 2 of stac_endpoint is blah, blah, blah, ...
blah_blah_blah = stac_endpoint[[2]]
stac_query <- blah_blah_blah |>
  stac() |>
  stac_search(collections = collection) |>
  get_request()  

Reply via ReviewNB

@@ -0,0 +1,1577 @@
{
Copy link

@chuckwondo chuckwondo Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, either put curly braces around x$Key like you did in a previous cell, or remove the braces in the previous cell. I recommend removing the earlier braces, rather than adding them here. Either way, just use consistent syntax.


Reply via ReviewNB

Copy link

@chuckwondo chuckwondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I have only a few very minor suggestions, but nothing blocking, so I'm approving.

@smk0033
Copy link
Contributor Author

smk0033 commented Feb 13, 2025

Thanks, @chuckwondo! I'll take a look at those suggestions and get it merged!

@smk0033 smk0033 merged commit a287cce into develop Feb 13, 2025
@smk0033 smk0033 deleted the aws-r-access branch February 13, 2025 14:57
@@ -0,0 +1,1588 @@
{
Copy link

@chuckwondo chuckwondo Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: should we say, an, instead of the "R/Python" workspace? Using the makes it sound like there is one and only one specific such workspace, when that's not the case. Rather, the user must create an "R/Python" workspace, no?


Reply via ReviewNB

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create might be a bit confusing too, MAAP provides an R/Python workspace, Start/Launch might be a better term. For other platforms yes people will need a workspace with R & Python installed.

@@ -0,0 +1,1588 @@
{
Copy link

@chuckwondo chuckwondo Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarifying comment, but now that I see the comment without a corresponding variable, as I also suggested, I think the variable would aid readability (note that I also renamed stac_query to stac_items because it is not a query, but rather the items resulting from making a query -- if you had not called get_request, then you would have a query, not results).

I also recommend removing the comment # print the retrieved items, as it does not convey any information beyond what the line of code indicates: print(stac_items)

stac_url <- stac_endpoint[[2]]
stac_items <-
  stac(stac_url) |>
  stac_search(collections = collection) |>
  get_request()

print(stac_items)

Reply via ReviewNB

@@ -0,0 +1,1588 @@
{
Copy link

@chuckwondo chuckwondo Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit-pick: add space around =: sep = "/" (per the style guide I linked in another comment)


Reply via ReviewNB

@@ -0,0 +1,1588 @@
{
Copy link

@chuckwondo chuckwondo Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit-pick: spacing: options = c("X_POSSIBLE_NAMES=lon", "Y_POSSIBLE_NAMES=lat") 


Reply via ReviewNB

@@ -0,0 +1,1588 @@
{
Copy link

@chuckwondo chuckwondo Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

csv_listing <- s3$list_objects_v2(Bucket = bucket, Prefix = "shared/smk0033/csv_ex/")
csv_keys <- sapply(csv_listing$Contents, function(csv_object) csv_object$Key)

# Choose an arbitrary key
csv_key <- csv_keys[4]
csv_key

Reply via ReviewNB

@@ -0,0 +1,1588 @@
{
Copy link

@chuckwondo chuckwondo Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we missing a library("paws") line in the preceding cell?


Reply via ReviewNB

@@ -0,0 +1,1588 @@
{
Copy link

@chuckwondo chuckwondo Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since using strsplit with tail is rather cryptic, how about this instead:

filename <- sub(".*/", "", csv_key)  

Reply via ReviewNB

@@ -0,0 +1,1588 @@
{
Copy link

@chuckwondo chuckwondo Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest dropping the assignment since the result is not used anywhere. We only use the side-effect of the file being written to the filesystem, not the response from the function call.


Reply via ReviewNB

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants