Ngwmn#904
Conversation
Merge branch 'main' of github.com:DOI-USGS/dataRetrieval into ngwmn # Conflicts: # R/readNGWMNdata.R
Merge branch 'develop' of https://code.usgs.gov/water/dataRetrieval into ngwmn # Conflicts: # R/construct_api_requests.R # R/deal_with_empty.R # R/get_ogc_documentation.R # R/readNGWMNdata.R # R/walk_pages.R
Merge branch 'develop' of github.com:DOI-USGS/dataRetrieval into ngwmn # Conflicts: # R/sysdata.rda
Merge branch 'develop' of github.com:DOI-USGS/dataRetrieval into ngwmn # Conflicts: # R/construct_api_requests.R
| no_paging = getOption("dataRetrieval.no_paging"), | ||
| chunk_size = getOption("dataRetrieval.site_chunk_size_data"), | ||
| limit = getOption("dataRetrieval.limit"), | ||
| attach_request = getOption("dataRetrieval.attach_request") |
There was a problem hiding this comment.
These make it so that you don't have to retype the exact same default over and over?
There was a problem hiding this comment.
yeah, you could either put this in a script or in your .Renviorn file:
options(dataRetrieval.attach_request = FALSE)If it's in something like the .Renviorn file, it'll stay that way until you change it (so if you really don't like the request attached as an attribute)
| #' box are selected.The bounding box is provided as four or six numbers, depending | ||
| #' on whether the coordinate reference system includes a vertical axis (height or | ||
| #' depth). Coordinates are assumed to be in crs 4326. The expected format is a numeric | ||
| #' vector structured: c(xmin,ymin,xmax,ymax). Another way to think of it is c(Western-most longitude, |
There was a problem hiding this comment.
If I wanted to include a vertical axis with height and depth, what would that look like in a vector?
| #' @description `r get_description("constructionObs", base = "NGWMN")` | ||
| #' | ||
| #' @export | ||
| #' @param monitoring_location_id |
There was a problem hiding this comment.
With the exceptions of the sites and providers services, the swagger doc says that monitoring location id is required: https://api.waterdata.usgs.gov/ngwmn/ogcapi/openapi?f=html#/waterLevelObs
Is that noted anywhere?
There was a problem hiding this comment.
Added to the docs, and added a check in the code.
Ports the NGWMN functions from the R dataRetrieval PR (DOI-USGS/dataRetrieval#904) and, per review, refactors the Water Data OGC machinery into a shared engine so NGWMN and Water Data are sibling layers on top of it rather than NGWMN depending on Water Data. Architecture ------------ dataretrieval/ogc/ generic OGC engine (no API-specific config): chunking.py (moved from waterdata/) the multi-value chunker filters.py (moved) cql-text filter splitting progress.py (moved from waterdata/_progress.py) engine.py request build, paginate, parse, finalize, the chunked get_ogc_data entry point, arg handling dataretrieval/waterdata/ thin Water Data layer on the engine: utils.py service->id map, stats API path, profile checks, WATERDATA_DIALECT, and a get_ogc_data wrapper that injects the Water Data defaults (re-exports engine symbols so api.py/ratings.py are unchanged) dataretrieval/ngwmn.py sibling module: get_sites, get_water_level, get_lithology, get_well_construction, get_providers — imports the engine from dataretrieval.ogc only The engine is API-agnostic: `get_ogc_data(args, service, output_id, *, base_url, extra_id_cols, dialect)`. An `OgcDialect(cql2_services, date_only_services)` (threaded via a context variable, like the base-url context) carries the per-API quirks — Water Data POSTs CQL2 for monitoring-locations and renders `daily` time args date-only; NGWMN needs neither. `ogc.engine` and `dataretrieval.ngwmn` both import with zero `dataretrieval.waterdata` dependency. NGWMN response-shape fixes in the engine (the NGWMN API differs from the main one): key the empty-result short-circuit off `features` rather than the `numberReturned` NGWMN omits; and tolerate observation features that carry no `geometry` key. PEP naming: the engine now snake_cases any non-snake column in finalize, so the package always returns PEP-8 column names regardless of the upstream API (a no-op today since both APIs are already snake_case, but enforced). Tests: live NGWMN tests for all five getters (tests/ngwmn_test.py); a `_to_snake_case` unit test; mock.patch sites repointed to ogc.engine; a module-level fixture activates WATERDATA_DIALECT for the direct _construct_api_requests unit tests. 285 unit tests pass; mypy --strict and ruff clean. waterdata_test.py shows only the 3 known pre-existing live-API drift failures (fixed by DOI-USGS#323), unrelated to this change. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Ports the NGWMN functions from the R dataRetrieval PR (DOI-USGS/dataRetrieval#904) and, per review, refactors the Water Data OGC machinery into a shared engine so NGWMN and Water Data are sibling layers on top of it rather than NGWMN depending on Water Data. Architecture ------------ dataretrieval/ogc/ generic OGC engine (no API-specific config): chunking.py (moved from waterdata/) the multi-value chunker filters.py (moved) cql-text filter splitting progress.py (moved from waterdata/_progress.py) engine.py request build, paginate, parse, finalize, the chunked get_ogc_data entry point, arg handling dataretrieval/waterdata/ thin Water Data layer on the engine: utils.py service->id map, stats API path, profile checks, WATERDATA_DIALECT, and a get_ogc_data wrapper that injects the Water Data defaults (re-exports engine symbols so api.py/ratings.py are unchanged) dataretrieval/ngwmn.py sibling module: get_sites, get_water_level, get_lithology, get_well_construction, get_providers — imports the engine from dataretrieval.ogc only The engine is API-agnostic: `get_ogc_data(args, service, output_id, *, base_url, extra_id_cols, dialect)`. An `OgcDialect(cql2_services, date_only_services)` (threaded via a context variable, like the base-url context) carries the per-API quirks — Water Data POSTs CQL2 for monitoring-locations and renders `daily` time args date-only; NGWMN needs neither. `ogc.engine` and `dataretrieval.ngwmn` both import with zero `dataretrieval.waterdata` dependency. NGWMN response-shape fixes in the engine (the NGWMN API differs from the main one): key the empty-result short-circuit off `features` rather than the `numberReturned` NGWMN omits; and tolerate observation features that carry no `geometry` key. PEP naming: the engine now snake_cases any non-snake column in finalize, so the package always returns PEP-8 column names regardless of the upstream API (a no-op today since both APIs are already snake_case, but enforced). Tests: live NGWMN tests for all five getters (tests/ngwmn_test.py); a `_to_snake_case` unit test; mock.patch sites repointed to ogc.engine; a module-level fixture activates WATERDATA_DIALECT for the direct _construct_api_requests unit tests. 285 unit tests pass; mypy --strict and ruff clean. waterdata_test.py shows only the 3 known pre-existing live-API drift failures (fixed by DOI-USGS#323), unrelated to this change. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Port the NGWMN functions from the R `dataRetrieval` package (DOI-USGS/dataRetrieval#904) and refactor the Water Data OGC machinery into a generic, API-agnostic engine, so NGWMN and Water Data are sibling layers on top of it -- NGWMN does not depend on Water Data. dataretrieval/ogc/ generic OGC engine (no service-specific config) engine.py request build, pagination, parse/finalize, get_ogc_data chunking.py URL-byte multi-value chunker filters.py CQL `filter` splitting progress.py self-updating status line The engine is parameterized by an `OgcDialect` and a base-url context variable rather than branching on service names: Water Data POSTs CQL2 for `monitoring-locations` and renders `daily` time args date-only; NGWMN needs neither. Adding a sibling API is a new dialect + base URL, not engine edits. dataretrieval/ngwmn.py sibling getters that import only dataretrieval.ogc: get_sites, get_water_level, get_lithology, get_well_construction, get_providers dataretrieval/waterdata/ thin Water Data layer on the engine; the Statistics API lives in its own waterdata/stats.py module. Unified `state` parameter across the modern getters, accepting a full name, a two-letter postal code, or a two-digit ANSI/FIPS code; normalized by codes.states.to_state (50 states + DC, fails fast on a typo) and resolved at the getter layer. The native state_code/state_name parameters remain as an escape hatch. Also: export ChunkInterrupted at the package top level; key the empty-result short-circuit off `features` (NGWMN omits `numberReturned`) and tolerate geometry-less features; always return PEP-8 snake_case columns; and add a pre-commit mypy hook mirroring the CI type-check. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
| #' @export | ||
| #' @param monitoring_location_id | ||
| #' `r get_ogc_params("constructionObs", base = "NGWMN")$monitoring_location_id$description` | ||
| #' @param monitoring_location_obs_number |
There was a problem hiding this comment.
When this line is rendered it says :
"This field is required. Combined site identifier of agency code and site number (format of {agency_code}-{monitoring_location_number}). A list of values can be passed for this field, seperated by commas.\n"
Admittedly I hadn't noticed that so we can put in a check that the argument it's NA
Port the NGWMN functions from the R `dataRetrieval` package (DOI-USGS/dataRetrieval#904) and refactor the Water Data OGC machinery into a generic, API-agnostic engine, so NGWMN and Water Data are sibling layers on top of it -- NGWMN does not depend on Water Data. dataretrieval/ogc/ generic OGC engine (no service-specific config) engine.py request build, pagination, parse/finalize, get_ogc_data chunking.py URL-byte multi-value chunker filters.py CQL `filter` splitting progress.py self-updating status line The engine is parameterized by an `OgcDialect` and a base-url context variable rather than branching on service names: Water Data POSTs CQL2 for `monitoring-locations` and renders `daily` time args date-only; NGWMN needs neither. Adding a sibling API is a new dialect + base URL, not engine edits. dataretrieval/ngwmn.py sibling getters that import only dataretrieval.ogc: get_sites, get_water_level, get_lithology, get_well_construction, get_providers dataretrieval/waterdata/ thin Water Data layer on the engine; the Statistics API lives in its own waterdata/stats.py module. Unified `state` parameter across the modern getters, accepting a full name, a two-letter postal code, or a two-digit ANSI/FIPS code; normalized by codes.states.to_state (50 states + DC, fails fast on a typo) and resolved at the getter layer. The native state_code/state_name parameters remain as an escape hatch. Also: export ChunkInterrupted at the package top level; key the empty-result short-circuit off `features` (NGWMN omits `numberReturned`) and tolerate geometry-less features; always return PEP-8 snake_case columns; and add a pre-commit mypy hook mirroring the CI type-check. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
| pkg.env$status <- "https://www.waterqualitydata.us/wqx3/status/" | ||
|
|
||
| pkg.env$NGWMN <- "https://cida.usgs.gov/ngwmn_cache/sos" | ||
| # pkg.env$NGWMN <- "https://www.usgs.gov/apps/ngwmn/ngwmn_cache/sos" |
| #' Available options are: | ||
| #' `r dataRetrieval:::get_properties_for_docs("lithologyObs", base = "NGWMN")`. | ||
| #' The default (`NA`) will return all columns of the data. | ||
| #' @inheritParams check_arguments_non_api |
There was a problem hiding this comment.
Minor, but it looks like dots is included and the description is "not used". Then why include as an argument?
There was a problem hiding this comment.
Mike convinced me that it's a similar idea as the "tidy design principles":
https://design.tidyverse.org/dots-after-required.html
The idea is that the stuff before and after the dots are kind of on different levels. So in our case, the parameters after the dots are ones that I would expect are very seldom changed, and if the user does change them, they should basically be doing it at their own risk. Also, those are arguments that (basically) are specific to dataRetrieval functionality and not sent to the API. The caveat is "limit" because that is sent to the API...but most dataRetrieval users shouldn't have to worry about adjusting "limit" unless they are dealing with crappy internet or something - so that's why we put it under the dots.
There was a problem hiding this comment.
Someone wrote a blog about this too:
https://www.cynkra.com/blog/2026-06-12-dots/
Admittedly, we aren't using it exactly like the tidyverse folks, but kinda like them...
| #' | ||
| #' ngwml_providers2 <- read_ngwmn_providers(state = c("WI", "MN")) | ||
| #' | ||
| #' org_type <- read_ngwmn_providers(organization_type = "NWIS", state = c("WI", "MN")) |
There was a problem hiding this comment.
The link column shows up as "" if there is no associated link. Seems fine, but I first wondered if all of them were empty and saw a few in CA with actual URLs pointing to provider webpages.
There was a problem hiding this comment.
added some states with links, so now we have an example that is mixed.
| #' `r get_ogc_params("sites", base = "NGWMN")$agency_code$description` | ||
| #' @param monitoring_location_number | ||
| #' `r get_ogc_params("sites", base = "NGWMN")$monitoring_location_number$description` | ||
| #' @param altitude |
There was a problem hiding this comment.
Future enhancement could turn this into a range input, since right now you just give it a number as a character. I tried test <- read_ngwmn_sites(altitude = "100") and test <- read_ngwmn_sites(altitude = "100.25") just for fun and the first returned a few sites. But then looking at something like test <- read_ngwmn_sites(state_name = "Minnesota"), altitudes do go out to the hundredths place and are very specific, so unless you know the exact altitude(s) you want, this input seems pretty useless.
There was a problem hiding this comment.
I added read_ngwmn to take custom CQL:
cql <- '{
"op": "between",
"args": [
{ "property": "water_level_above_navd88_ft" },
[ "100.00", "200.00" ]
]
}'
wl_data <- read_ngwmn(service = "waterLevelObs",
monitoring_location_id = c("USGS-272838082142201",
"USGS-404159100494601",
"USGS-401216080362703"),
CQL = cql)maybe down the road we can think about ways to add numeric ranges like we have times...
| #' @param water_depth_below_land_surface_ft | ||
| #' `r get_ogc_params("waterLevelObs", base = "NGWMN")$water_depth_below_land_surface_ft$description` | ||
| #' @param water_level_above_site_datum_ft | ||
| #' `r get_ogc_params("waterLevelObs", base = "NGWMN")$water_level_above_site_datum_ft$description` |
There was a problem hiding this comment.
Same comment as above about someday making these ranges, rather than singular strings.
| #' | ||
| #' @param datetime | ||
| #' `r get_ogc_params("waterLevelObs", base = "NGWMN")$sample_time$descriptiond` | ||
| #' Multiple time_series_ids can be requested as a character vector. |
There was a problem hiding this comment.
A little confused on this parameter. It's called datetime but it takes timeseries ids? Should it be called sample_time? Even that is confusing with the description...
There was a problem hiding this comment.
Whoops! Nope, that's just an artifact of copy/pasting
Co-authored-by: Elise Hinman <121896266+ehinman@users.noreply.github.com>
ehinman
left a comment
There was a problem hiding this comment.
These are working well for me and are straightforward. Added some comments about how some of the input parameters seem a little bit silly without knowledge of how to use CQL2 and/or some custom code in dataRetrieval that allows users to enter ranges -- maybe in a future PR if people would find it useful.
Initial PR for adding the new NGWMN functions:
https://api.waterdata.usgs.gov/ngwmn/ogcapi/collections