Ngwmn by ldecicco-USGS · Pull Request #904 · DOI-USGS/dataRetrieval

ldecicco-USGS · 2026-06-11T17:16:55Z

Initial PR for adding the new NGWMN functions:

https://api.waterdata.usgs.gov/ngwmn/ogcapi/collections

Merge branch 'main' of github.com:DOI-USGS/dataRetrieval into ngwmn # Conflicts: # R/readNGWMNdata.R

Merge branch 'develop' of https://code.usgs.gov/water/dataRetrieval into ngwmn # Conflicts: # R/construct_api_requests.R # R/deal_with_empty.R # R/get_ogc_documentation.R # R/readNGWMNdata.R # R/walk_pages.R

Merge branch 'develop' of github.com:DOI-USGS/dataRetrieval into ngwmn # Conflicts: # R/sysdata.rda

…nto ngwmn

Merge branch 'develop' of github.com:DOI-USGS/dataRetrieval into ngwmn # Conflicts: # R/construct_api_requests.R

…nto ngwmn

ehinman · 2026-06-12T21:33:05Z

+  no_paging = getOption("dataRetrieval.no_paging"),
+  chunk_size = getOption("dataRetrieval.site_chunk_size_data"),
+  limit = getOption("dataRetrieval.limit"),
+  attach_request = getOption("dataRetrieval.attach_request")


These make it so that you don't have to retype the exact same default over and over?

Yeah, you could put this either in a script or in your .Rprofile file:

options(dataRetrieval.attach_request = FALSE)

If it's in something like the .Rprofile file, it'll stay that way until you change it (handy if you really don't like the request attached as an attribute).
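A minimal sketch of how an option-backed default behaves. Only the option name `dataRetrieval.attach_request` comes from the diff above; `fetch_example()` and the placeholder attribute value are hypothetical and are not dataRetrieval code.

```r
# Hypothetical function: the default is looked up each time the function is
# called, so changing the option changes the behaviour of later calls.
fetch_example <- function(attach_request = getOption("dataRetrieval.attach_request")) {
  result <- data.frame(x = 1:3)
  if (isTRUE(attach_request)) {
    attr(result, "request") <- "placeholder request object"
  }
  result
}

# options() returns the previous values, so they can be restored afterwards.
old <- options(dataRetrieval.attach_request = FALSE)
without_request <- fetch_example()

options(dataRetrieval.attach_request = TRUE)
with_request <- fetch_example()

# An explicit argument still overrides the option for a single call.
overridden <- fetch_example(attach_request = FALSE)

options(old)
```

Because the default is evaluated lazily, nothing needs to be retyped at each call site; the option is read at call time.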

ehinman · 2026-06-12T21:38:07Z

+#' box are selected.The bounding box is provided as four or six numbers, depending
+#' on whether the coordinate reference system includes a vertical axis (height or
+#' depth). Coordinates are assumed to be in crs 4326. The expected format is a numeric
+#' vector structured: c(xmin,ymin,xmax,ymax). Another way to think of it is c(Western-most longitude,


If I wanted to include a vertical axis with height and depth, what would that look like in a vector?
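For reference, the OGC API Features standard orders a six-number bounding box as the lower corner (x, y, z) followed by the upper corner (x, y, z). The sketch below only builds the vector; `make_bbox()` is a hypothetical helper, the coordinates are made up, and this thread does not confirm whether the NGWMN functions accept the six-number form.

```r
# Hypothetical helper: returns a 4- or 6-number bounding box vector.
make_bbox <- function(xmin, ymin, xmax, ymax, zmin = NULL, zmax = NULL) {
  if (xor(is.null(zmin), is.null(zmax))) {
    stop("Supply both zmin and zmax, or neither.")
  }
  if (is.null(zmin)) {
    # Horizontal only: c(xmin, ymin, xmax, ymax)
    c(xmin, ymin, xmax, ymax)
  } else {
    # With a vertical axis: c(xmin, ymin, zmin, xmax, ymax, zmax)
    c(xmin, ymin, zmin, xmax, ymax, zmax)
  }
}

# Illustrative coordinates only.
bbox_2d <- make_bbox(-92.8, 44.2, -88.9, 49.0)
bbox_3d <- make_bbox(-92.8, 44.2, -88.9, 49.0, zmin = 0, zmax = 500)
```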

ehinman · 2026-06-12T21:53:05Z

+#' @description `r get_description("constructionObs", base = "NGWMN")`
+#'
+#' @export
+#' @param monitoring_location_id


With the exceptions of the sites and providers services, the swagger doc says that monitoring location id is required: https://api.waterdata.usgs.gov/ngwmn/ogcapi/openapi?f=html#/waterLevelObs

Is that noted anywhere?

Maybe it's obvious.

Added to the docs, and added a check in the code.
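A required-argument check along these lines might look like the sketch below. `check_required_id()` and its error message are hypothetical; this is not the check that was added in the PR, only an illustration of failing fast when `monitoring_location_id` is left at `NA`.

```r
# Hypothetical helper, not the check added in the PR.
check_required_id <- function(monitoring_location_id = NA_character_) {
  missing_id <- is.null(monitoring_location_id) ||
    length(monitoring_location_id) == 0 ||
    all(is.na(monitoring_location_id))
  if (missing_id) {
    stop("monitoring_location_id is required for this service.", call. = FALSE)
  }
  invisible(monitoring_location_id)
}

# Passes: at least one non-NA identifier is supplied.
ok <- check_required_id(c("USGS-272838082142201", "USGS-404159100494601"))

# Fails: the default NA is treated as "not supplied".
err <- tryCatch(check_required_id(), error = function(e) conditionMessage(e))
```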

Ports the NGWMN functions from the R dataRetrieval PR (DOI-USGS/dataRetrieval#904) and, per review, refactors the Water Data OGC machinery into a shared engine so NGWMN and Water Data are sibling layers on top of it rather than NGWMN depending on Water Data.

Architecture
------------

dataretrieval/ogc/ generic OGC engine (no API-specific config):
  chunking.py (moved from waterdata/) the multi-value chunker
  filters.py (moved) cql-text filter splitting
  progress.py (moved from waterdata/_progress.py)
  engine.py request build, paginate, parse, finalize, the chunked get_ogc_data entry point, arg handling

dataretrieval/waterdata/ thin Water Data layer on the engine:
  utils.py service->id map, stats API path, profile checks, WATERDATA_DIALECT, and a get_ogc_data wrapper that injects the Water Data defaults (re-exports engine symbols so api.py/ratings.py are unchanged)

dataretrieval/ngwmn.py sibling module: get_sites, get_water_level, get_lithology, get_well_construction, get_providers; imports the engine from dataretrieval.ogc only

The engine is API-agnostic: `get_ogc_data(args, service, output_id, *, base_url, extra_id_cols, dialect)`. An `OgcDialect(cql2_services, date_only_services)` (threaded via a context variable, like the base-url context) carries the per-API quirks: Water Data POSTs CQL2 for monitoring-locations and renders `daily` time args date-only; NGWMN needs neither. `ogc.engine` and `dataretrieval.ngwmn` both import with zero `dataretrieval.waterdata` dependency.

NGWMN response-shape fixes in the engine (the NGWMN API differs from the main one): key the empty-result short-circuit off `features` rather than the `numberReturned` NGWMN omits; and tolerate observation features that carry no `geometry` key.

PEP naming: the engine now snake_cases any non-snake column in finalize, so the package always returns PEP-8 column names regardless of the upstream API (a no-op today since both APIs are already snake_case, but enforced).

Tests: live NGWMN tests for all five getters (tests/ngwmn_test.py); a `_to_snake_case` unit test; mock.patch sites repointed to ogc.engine; a module-level fixture activates WATERDATA_DIALECT for the direct _construct_api_requests unit tests. 285 unit tests pass; mypy --strict and ruff clean. waterdata_test.py shows only the 3 known pre-existing live-API drift failures (fixed by DOI-USGS#323), unrelated to this change.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Port the NGWMN functions from the R `dataRetrieval` package (DOI-USGS/dataRetrieval#904) and refactor the Water Data OGC machinery into a generic, API-agnostic engine, so NGWMN and Water Data are sibling layers on top of it -- NGWMN does not depend on Water Data.

dataretrieval/ogc/ generic OGC engine (no service-specific config)
  engine.py request build, pagination, parse/finalize, get_ogc_data
  chunking.py URL-byte multi-value chunker
  filters.py CQL `filter` splitting
  progress.py self-updating status line

The engine is parameterized by an `OgcDialect` and a base-url context variable rather than branching on service names: Water Data POSTs CQL2 for `monitoring-locations` and renders `daily` time args date-only; NGWMN needs neither. Adding a sibling API is a new dialect + base URL, not engine edits.

dataretrieval/ngwmn.py sibling getters that import only dataretrieval.ogc: get_sites, get_water_level, get_lithology, get_well_construction, get_providers

dataretrieval/waterdata/ thin Water Data layer on the engine; the Statistics API lives in its own waterdata/stats.py module.

Unified `state` parameter across the modern getters, accepting a full name, a two-letter postal code, or a two-digit ANSI/FIPS code; normalized by codes.states.to_state (50 states + DC, fails fast on a typo) and resolved at the getter layer. The native state_code/state_name parameters remain as an escape hatch.

Also: export ChunkInterrupted at the package top level; key the empty-result short-circuit off `features` (NGWMN omits `numberReturned`) and tolerate geometry-less features; always return PEP-8 snake_case columns; and add a pre-commit mypy hook mirroring the CI type-check.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

ldecicco-USGS · 2026-06-15T13:13:31Z

+#' @export
+#' @param monitoring_location_id
+#' `r get_ogc_params("constructionObs", base = "NGWMN")$monitoring_location_id$description`
+#' @param monitoring_location_obs_number


When this line is rendered it says :

"This field is required. Combined site identifier of agency code and site number (format of {agency_code}-{monitoring_location_number}). A list of values can be passed for this field, seperated by commas.\n"

Admittedly, I hadn't noticed that, so we can put in a check that the argument isn't NA.

ehinman · 2026-06-16T13:43:40Z

  pkg.env$status <- "https://www.waterqualitydata.us/wqx3/status/"

  pkg.env$NGWMN <- "https://cida.usgs.gov/ngwmn_cache/sos"
+  # pkg.env$NGWMN <- "https://www.usgs.gov/apps/ngwmn/ngwmn_cache/sos"


ehinman · 2026-06-16T14:03:13Z

+#' Available options are:
+#' `r dataRetrieval:::get_properties_for_docs("lithologyObs", base = "NGWMN")`.
+#' The default (`NA`) will return all columns of the data.
+#' @inheritParams check_arguments_non_api


Minor, but it looks like the dots (`...`) are included as an argument and the description is "not used". Then why include them as an argument?

Mike convinced me that it's a similar idea as the "tidy design principles":
https://design.tidyverse.org/dots-after-required.html

The idea is that the stuff before and after the dots are kind of on different levels. So in our case, the parameters after the dots are ones that I would expect are very seldom changed, and if the user does change them, they should basically be doing it at their own risk. Also, those are arguments that (basically) are specific to dataRetrieval functionality and not sent to the API. The caveat is "limit" because that is sent to the API...but most dataRetrieval users shouldn't have to worry about adjusting "limit" unless they are dealing with crappy internet or something - so that's why we put it under the dots.

Someone wrote a blog about this too:
https://www.cynkra.com/blog/2026-06-12-dots/
Admittedly, we aren't using it exactly like the tidyverse folks, but kinda like them...
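The layout described above can be sketched as follows. `read_example()` is hypothetical, the default values are made up, and erroring on non-empty dots is an assumption for illustration rather than confirmed dataRetrieval behaviour.

```r
# Hypothetical function following the "dots after required" layout:
# API-facing arguments come before `...`, seldom-changed settings after it.
read_example <- function(monitoring_location_id,
                         ...,
                         limit = 1000,
                         attach_request = TRUE) {
  extra <- list(...)
  if (length(extra) > 0) {
    stop("Unused arguments passed through `...`; name them explicitly.",
         call. = FALSE)
  }
  list(id = monitoring_location_id, limit = limit,
       attach_request = attach_request)
}

# Arguments after the dots can only be set by name.
by_name <- read_example("USGS-272838082142201", limit = 500)

# A positional value lands in `...` instead of `limit`, so it is rejected.
by_position <- tryCatch(
  read_example("USGS-272838082142201", 500),
  error = function(e) "positional value was captured by the dots"
)
```

Placing the settings after the dots means they cannot be changed by accident through positional matching.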

ehinman · 2026-06-16T14:16:35Z

+#'
+#' ngwml_providers2 <- read_ngwmn_providers(state = c("WI", "MN"))
+#'
+#' org_type <- read_ngwmn_providers(organization_type = "NWIS", state = c("WI", "MN"))


The link column shows up as "" if there is no associated link. Seems fine, but I first wondered if all of them were empty and saw a few in CA with actual URLs pointing to provider webpages.

added some states with links, so now we have an example that is mixed.

ehinman · 2026-06-16T14:25:10Z

+#' `r get_ogc_params("sites", base = "NGWMN")$agency_code$description`
+#' @param monitoring_location_number
+#' `r get_ogc_params("sites", base = "NGWMN")$monitoring_location_number$description`
+#' @param altitude


Future enhancement could turn this into a range input, since right now you just give it a number as a character. I tried test <- read_ngwmn_sites(altitude = "100") and test <- read_ngwmn_sites(altitude = "100.25") just for fun and the first returned a few sites. But then looking at something like test <- read_ngwmn_sites(state_name = "Minnesota"), altitudes do go out to the hundredths place and are very specific, so unless you know the exact altitude(s) you want, this input seems pretty useless.

I added read_ngwmn to take custom CQL:

cql <- '{ "op": "between", "args": [ { "property": "water_level_above_navd88_ft" }, [ "100.00", "200.00" ] ] }'
wl_data <- read_ngwmn(service = "waterLevelObs",
                      monitoring_location_id = c("USGS-272838082142201",
                                                 "USGS-404159100494601",
                                                 "USGS-401216080362703"),
                      CQL = cql)

maybe down the road we can think about ways to add numeric ranges like we have times...
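Until something like that exists, a numeric range could be turned into the CQL2 JSON string with a small helper. The sketch below is an assumption-laden illustration: `cql_between()` is hypothetical, uses only base R, and simply reproduces the hand-written "between" filter from the comment above.

```r
# Hypothetical helper: builds a CQL2 JSON "between" filter from a property
# name and a numeric range, formatting the bounds to two decimal places.
cql_between <- function(property, lower, upper) {
  sprintf(
    '{ "op": "between", "args": [ { "property": "%s" }, [ "%s", "%s" ] ] }',
    property,
    formatC(lower, format = "f", digits = 2),
    formatC(upper, format = "f", digits = 2)
  )
}

cql <- cql_between("water_level_above_navd88_ft", 100, 200)
```

The resulting string could then be passed as `CQL = cql` to `read_ngwmn()`, as in the comment above; that call needs network access and is not run here.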

ehinman · 2026-06-16T14:42:58Z

+#' @param water_depth_below_land_surface_ft
+#' `r get_ogc_params("waterLevelObs", base = "NGWMN")$water_depth_below_land_surface_ft$description`
+#' @param water_level_above_site_datum_ft
+#' `r get_ogc_params("waterLevelObs", base = "NGWMN")$water_level_above_site_datum_ft$description`


Same comment as above about someday making these ranges, rather than singular strings.

ehinman · 2026-06-16T14:53:11Z

+#'
+#' @param datetime
+#' `r get_ogc_params("waterLevelObs", base = "NGWMN")$sample_time$descriptiond`
+#' Multiple time_series_ids can be requested as a character vector.


A little confused on this parameter. It's called datetime but it takes timeseries ids? Should it be called sample_time? Even that is confusing with the description...

Whoops! Nope, that's just an artifact of copy/pasting

Co-authored-by: Elise Hinman <121896266+ehinman@users.noreply.github.com>

…gwmn

ehinman

These are working well for me and are straightforward. Added some comments about how some of the input parameters seem a little bit silly without knowledge of how to use CQL2 and/or some custom code in dataRetrieval that allows users to enter ranges -- maybe in a future PR if people would find it useful.

ldecicco-USGS added 16 commits March 5, 2026 11:55

getting ready for NGWMN

758533b

gah

98a441d

get all functions ready

3379d61

Get upstream data

9ee4770

Merge branch 'main' of github.com:DOI-USGS/dataRetrieval into ngwmn # Conflicts: # R/readNGWMNdata.R

from upstream

7d51660

Merge branch 'develop' of https://code.usgs.gov/water/dataRetrieval into ngwmn # Conflicts: # R/construct_api_requests.R # R/deal_with_empty.R # R/get_ogc_documentation.R # R/readNGWMNdata.R # R/walk_pages.R

Updates from develop

72b2aa2

Merge branch 'develop' of github.com:DOI-USGS/dataRetrieval into ngwmn # Conflicts: # R/sysdata.rda

Merge branch 'develop' of https://code.usgs.gov/water/dataRetrieval i…

e32db79

…nto ngwmn

upstream updates

f45e702

Merge branch 'develop' of github.com:DOI-USGS/dataRetrieval into ngwmn # Conflicts: # R/construct_api_requests.R

Merge branch 'develop' of https://code.usgs.gov/water/dataRetrieval i…

8f3ac58

…nto ngwmn

Merge branch 'main' of github.com:DOI-USGS/dataRetrieval into ngwmn

c881609

some updates from API

c2947ba

updates from APIs

a125abf

rebuilding docs

977d870

remove old code

aba54d7

Merge branch 'develop' of github.com:DOI-USGS/dataRetrieval into ngwmn

424b6f9

remove old examples, add new ones

f06fcf5

ldecicco-USGS temporarily deployed to CI_config June 11, 2026 17:17 — with GitHub Actions Inactive

ldecicco-USGS requested a review from ehinman June 11, 2026 17:38

thodson-usgs mentioned this pull request Jun 12, 2026

feat: add NGWMN getters on a shared, API-agnostic OGC engine DOI-USGS/dataretrieval-python#324

Merged

ehinman reviewed Jun 12, 2026

Comment thread R/get_ogc_documentation.R Outdated

ehinman reviewed Jun 12, 2026

ldecicco-USGS commented Jun 15, 2026

ehinman reviewed Jun 16, 2026

Update R/get_ogc_documentation.R

c8c6dbd

Co-authored-by: Elise Hinman <121896266+ehinman@users.noreply.github.com>

ldecicco-USGS temporarily deployed to CI_config June 16, 2026 15:06 — with GitHub Actions Inactive

ldecicco-USGS added 2 commits June 16, 2026 10:11

Add check for required arguments

4758026

Merge branch 'ngwmn' of github.com:ldecicco-USGS/dataRetrieval into n…

77df23c

…gwmn

ldecicco-USGS temporarily deployed to CI_config June 16, 2026 15:13 — with GitHub Actions Inactive

ehinman approved these changes Jun 16, 2026

fix na bug

cacc4ed

ldecicco-USGS temporarily deployed to CI_config June 23, 2026 13:06 — with GitHub Actions Inactive

Adding read_ngwmn function

7f65eb9

ldecicco-USGS had a problem deploying to CI_config June 23, 2026 16:16 — with GitHub Actions Failure

Add some tests

6c6a2f7

ldecicco-USGS temporarily deployed to CI_config June 23, 2026 16:57 — with GitHub Actions Inactive

add more tests

0e79ee8

ldecicco-USGS temporarily deployed to CI_config June 23, 2026 17:22 — with GitHub Actions Inactive

ldecicco-USGS merged commit aa0ee50 into DOI-USGS:develop Jun 23, 2026
1 check passed

2 participants
