Skip to content

Ngwmn#904

Merged
ldecicco-USGS merged 23 commits into
DOI-USGS:developfrom
ldecicco-USGS:ngwmn
Jun 23, 2026
Merged

Ngwmn#904
ldecicco-USGS merged 23 commits into
DOI-USGS:developfrom
ldecicco-USGS:ngwmn

Conversation

@ldecicco-USGS

Copy link
Copy Markdown
Collaborator

Initial PR for adding the new NGWMN functions:

https://api.waterdata.usgs.gov/ngwmn/ogcapi/collections

Merge branch 'main' of github.com:DOI-USGS/dataRetrieval into ngwmn

# Conflicts:
#	R/readNGWMNdata.R
Merge branch 'develop' of https://code.usgs.gov/water/dataRetrieval into ngwmn

# Conflicts:
#	R/construct_api_requests.R
#	R/deal_with_empty.R
#	R/get_ogc_documentation.R
#	R/readNGWMNdata.R
#	R/walk_pages.R
Merge branch 'develop' of github.com:DOI-USGS/dataRetrieval into ngwmn

# Conflicts:
#	R/sysdata.rda
Merge branch 'develop' of github.com:DOI-USGS/dataRetrieval into ngwmn

# Conflicts:
#	R/construct_api_requests.R
Comment thread R/get_ogc_documentation.R Outdated
Comment thread R/read_ngwmn_lithology.R
no_paging = getOption("dataRetrieval.no_paging"),
chunk_size = getOption("dataRetrieval.site_chunk_size_data"),
limit = getOption("dataRetrieval.limit"),
attach_request = getOption("dataRetrieval.attach_request")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These make it so that you don't have to retype the exact same default over and over?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, you could either put this in a script or in your .Renviorn file:

options(dataRetrieval.attach_request = FALSE)

If it's in something like the .Renviorn file, it'll stay that way until you change it (so if you really don't like the request attached as an attribute)

Comment thread R/read_ngwmn_sites.R
#' box are selected.The bounding box is provided as four or six numbers, depending
#' on whether the coordinate reference system includes a vertical axis (height or
#' depth). Coordinates are assumed to be in crs 4326. The expected format is a numeric
#' vector structured: c(xmin,ymin,xmax,ymax). Another way to think of it is c(Western-most longitude,

@ehinman ehinman Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I wanted to include a vertical axis with height and depth, what would that look like in a vector?

#' @description `r get_description("constructionObs", base = "NGWMN")`
#'
#' @export
#' @param monitoring_location_id

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the exceptions of the sites and providers services, the swagger doc says that monitoring location id is required: https://api.waterdata.usgs.gov/ngwmn/ogcapi/openapi?f=html#/waterLevelObs

Is that noted anywhere?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's obvious.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added to the docs, and added a check in the code.

thodson-usgs added a commit to thodson-usgs/dataretrieval-python that referenced this pull request Jun 12, 2026
Ports the NGWMN functions from the R dataRetrieval PR
(DOI-USGS/dataRetrieval#904) and, per review, refactors the Water Data OGC
machinery into a shared engine so NGWMN and Water Data are sibling layers on
top of it rather than NGWMN depending on Water Data.

Architecture
------------
  dataretrieval/ogc/        generic OGC engine (no API-specific config):
    chunking.py             (moved from waterdata/) the multi-value chunker
    filters.py              (moved) cql-text filter splitting
    progress.py             (moved from waterdata/_progress.py)
    engine.py               request build, paginate, parse, finalize, the
                            chunked get_ogc_data entry point, arg handling
  dataretrieval/waterdata/  thin Water Data layer on the engine:
    utils.py                service->id map, stats API path, profile checks,
                            WATERDATA_DIALECT, and a get_ogc_data wrapper that
                            injects the Water Data defaults (re-exports engine
                            symbols so api.py/ratings.py are unchanged)
  dataretrieval/ngwmn.py    sibling module: get_sites, get_water_level,
                            get_lithology, get_well_construction, get_providers
                            — imports the engine from dataretrieval.ogc only

The engine is API-agnostic: `get_ogc_data(args, service, output_id, *,
base_url, extra_id_cols, dialect)`. An `OgcDialect(cql2_services,
date_only_services)` (threaded via a context variable, like the base-url
context) carries the per-API quirks — Water Data POSTs CQL2 for
monitoring-locations and renders `daily` time args date-only; NGWMN needs
neither. `ogc.engine` and `dataretrieval.ngwmn` both import with zero
`dataretrieval.waterdata` dependency.

NGWMN response-shape fixes in the engine (the NGWMN API differs from the main
one): key the empty-result short-circuit off `features` rather than the
`numberReturned` NGWMN omits; and tolerate observation features that carry no
`geometry` key.

PEP naming: the engine now snake_cases any non-snake column in finalize, so the
package always returns PEP-8 column names regardless of the upstream API
(a no-op today since both APIs are already snake_case, but enforced).

Tests: live NGWMN tests for all five getters (tests/ngwmn_test.py); a
`_to_snake_case` unit test; mock.patch sites repointed to ogc.engine; a
module-level fixture activates WATERDATA_DIALECT for the direct
_construct_api_requests unit tests. 285 unit tests pass; mypy --strict and
ruff clean. waterdata_test.py shows only the 3 known pre-existing live-API
drift failures (fixed by DOI-USGS#323), unrelated to this change.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
thodson-usgs added a commit to thodson-usgs/dataretrieval-python that referenced this pull request Jun 12, 2026
Ports the NGWMN functions from the R dataRetrieval PR
(DOI-USGS/dataRetrieval#904) and, per review, refactors the Water Data OGC
machinery into a shared engine so NGWMN and Water Data are sibling layers on
top of it rather than NGWMN depending on Water Data.

Architecture
------------
  dataretrieval/ogc/        generic OGC engine (no API-specific config):
    chunking.py             (moved from waterdata/) the multi-value chunker
    filters.py              (moved) cql-text filter splitting
    progress.py             (moved from waterdata/_progress.py)
    engine.py               request build, paginate, parse, finalize, the
                            chunked get_ogc_data entry point, arg handling
  dataretrieval/waterdata/  thin Water Data layer on the engine:
    utils.py                service->id map, stats API path, profile checks,
                            WATERDATA_DIALECT, and a get_ogc_data wrapper that
                            injects the Water Data defaults (re-exports engine
                            symbols so api.py/ratings.py are unchanged)
  dataretrieval/ngwmn.py    sibling module: get_sites, get_water_level,
                            get_lithology, get_well_construction, get_providers
                            — imports the engine from dataretrieval.ogc only

The engine is API-agnostic: `get_ogc_data(args, service, output_id, *,
base_url, extra_id_cols, dialect)`. An `OgcDialect(cql2_services,
date_only_services)` (threaded via a context variable, like the base-url
context) carries the per-API quirks — Water Data POSTs CQL2 for
monitoring-locations and renders `daily` time args date-only; NGWMN needs
neither. `ogc.engine` and `dataretrieval.ngwmn` both import with zero
`dataretrieval.waterdata` dependency.

NGWMN response-shape fixes in the engine (the NGWMN API differs from the main
one): key the empty-result short-circuit off `features` rather than the
`numberReturned` NGWMN omits; and tolerate observation features that carry no
`geometry` key.

PEP naming: the engine now snake_cases any non-snake column in finalize, so the
package always returns PEP-8 column names regardless of the upstream API
(a no-op today since both APIs are already snake_case, but enforced).

Tests: live NGWMN tests for all five getters (tests/ngwmn_test.py); a
`_to_snake_case` unit test; mock.patch sites repointed to ogc.engine; a
module-level fixture activates WATERDATA_DIALECT for the direct
_construct_api_requests unit tests. 285 unit tests pass; mypy --strict and
ruff clean. waterdata_test.py shows only the 3 known pre-existing live-API
drift failures (fixed by DOI-USGS#323), unrelated to this change.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
thodson-usgs added a commit to thodson-usgs/dataretrieval-python that referenced this pull request Jun 14, 2026
Port the NGWMN functions from the R `dataRetrieval` package
(DOI-USGS/dataRetrieval#904) and refactor the Water Data OGC machinery into a
generic, API-agnostic engine, so NGWMN and Water Data are sibling layers on
top of it -- NGWMN does not depend on Water Data.

dataretrieval/ogc/  generic OGC engine (no service-specific config)
  engine.py    request build, pagination, parse/finalize, get_ogc_data
  chunking.py  URL-byte multi-value chunker
  filters.py   CQL `filter` splitting
  progress.py  self-updating status line
The engine is parameterized by an `OgcDialect` and a base-url context
variable rather than branching on service names: Water Data POSTs CQL2 for
`monitoring-locations` and renders `daily` time args date-only; NGWMN needs
neither. Adding a sibling API is a new dialect + base URL, not engine edits.

dataretrieval/ngwmn.py  sibling getters that import only dataretrieval.ogc:
  get_sites, get_water_level, get_lithology, get_well_construction,
  get_providers

dataretrieval/waterdata/  thin Water Data layer on the engine; the Statistics
API lives in its own waterdata/stats.py module.

Unified `state` parameter across the modern getters, accepting a full name, a
two-letter postal code, or a two-digit ANSI/FIPS code; normalized by
codes.states.to_state (50 states + DC, fails fast on a typo) and resolved at
the getter layer. The native state_code/state_name parameters remain as an
escape hatch.

Also: export ChunkInterrupted at the package top level; key the empty-result
short-circuit off `features` (NGWMN omits `numberReturned`) and tolerate
geometry-less features; always return PEP-8 snake_case columns; and add a
pre-commit mypy hook mirroring the CI type-check.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
#' @export
#' @param monitoring_location_id
#' `r get_ogc_params("constructionObs", base = "NGWMN")$monitoring_location_id$description`
#' @param monitoring_location_obs_number

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this line is rendered it says :

"This field is required. Combined site identifier of agency code and site number (format of {agency_code}-{monitoring_location_number}). A list of values can be passed for this field, seperated by commas.\n"

Admittedly I hadn't noticed that so we can put in a check that the argument it's NA

thodson-usgs added a commit to thodson-usgs/dataretrieval-python that referenced this pull request Jun 15, 2026
Port the NGWMN functions from the R `dataRetrieval` package
(DOI-USGS/dataRetrieval#904) and refactor the Water Data OGC machinery into a
generic, API-agnostic engine, so NGWMN and Water Data are sibling layers on
top of it -- NGWMN does not depend on Water Data.

dataretrieval/ogc/  generic OGC engine (no service-specific config)
  engine.py    request build, pagination, parse/finalize, get_ogc_data
  chunking.py  URL-byte multi-value chunker
  filters.py   CQL `filter` splitting
  progress.py  self-updating status line
The engine is parameterized by an `OgcDialect` and a base-url context
variable rather than branching on service names: Water Data POSTs CQL2 for
`monitoring-locations` and renders `daily` time args date-only; NGWMN needs
neither. Adding a sibling API is a new dialect + base URL, not engine edits.

dataretrieval/ngwmn.py  sibling getters that import only dataretrieval.ogc:
  get_sites, get_water_level, get_lithology, get_well_construction,
  get_providers

dataretrieval/waterdata/  thin Water Data layer on the engine; the Statistics
API lives in its own waterdata/stats.py module.

Unified `state` parameter across the modern getters, accepting a full name, a
two-letter postal code, or a two-digit ANSI/FIPS code; normalized by
codes.states.to_state (50 states + DC, fails fast on a typo) and resolved at
the getter layer. The native state_code/state_name parameters remain as an
escape hatch.

Also: export ChunkInterrupted at the package top level; key the empty-result
short-circuit off `features` (NGWMN omits `numberReturned`) and tolerate
geometry-less features; always return PEP-8 snake_case columns; and add a
pre-commit mypy hook mirroring the CI type-check.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread R/setAccess.R Outdated
pkg.env$status <- "https://www.waterqualitydata.us/wqx3/status/"

pkg.env$NGWMN <- "https://cida.usgs.gov/ngwmn_cache/sos"
# pkg.env$NGWMN <- "https://www.usgs.gov/apps/ngwmn/ngwmn_cache/sos"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope!

Comment thread R/read_ngwmn_lithology.R
#' Available options are:
#' `r dataRetrieval:::get_properties_for_docs("lithologyObs", base = "NGWMN")`.
#' The default (`NA`) will return all columns of the data.
#' @inheritParams check_arguments_non_api

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor, but it looks like dots is included and the description is "not used". Then why include as an argument?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mike convinced me that it's a similar idea as the "tidy design principles":
https://design.tidyverse.org/dots-after-required.html

The idea is that the stuff before and after the dots are kind of on different levels. So in our case, the parameters after the dots are ones that I would expect are very seldom changed, and if the user does change them, they should basically be doing it at their own risk. Also, those are arguments that (basically) are specific to dataRetrieval functionality and not sent to the API. The caveat is "limit" because that is sent to the API...but most dataRetrieval users shouldn't have to worry about adjusting "limit" unless they are dealing with crappy internet or something - so that's why we put it under the dots.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Someone wrote a blog about this too:
https://www.cynkra.com/blog/2026-06-12-dots/
Admittedly, we aren't using it exactly like the tidyverse folks, but kinda like them...

Comment thread R/read_ngwmn_providers.R
#'
#' ngwml_providers2 <- read_ngwmn_providers(state = c("WI", "MN"))
#'
#' org_type <- read_ngwmn_providers(organization_type = "NWIS", state = c("WI", "MN"))

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The link column shows up as "" if there is no associated link. Seems fine, but I first wondered if all of them were empty and saw a few in CA with actual URLs pointing to provider webpages.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some states with links, so now we have an example that is mixed.

Comment thread R/read_ngwmn_sites.R
#' `r get_ogc_params("sites", base = "NGWMN")$agency_code$description`
#' @param monitoring_location_number
#' `r get_ogc_params("sites", base = "NGWMN")$monitoring_location_number$description`
#' @param altitude

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future enhancement could turn this into a range input, since right now you just give it a number as a character. I tried test <- read_ngwmn_sites(altitude = "100") and test <- read_ngwmn_sites(altitude = "100.25") just for fun and the first returned a few sites. But then looking at something like test <- read_ngwmn_sites(state_name = "Minnesota"), altitudes do go out to the hundredths place and are very specific, so unless you know the exact altitude(s) you want, this input seems pretty useless.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added read_ngwmn to take custom CQL:

 cql <- '{
  "op": "between",
    "args": [
       { "property": "water_level_above_navd88_ft" },
       [ "100.00", "200.00" ]
    ]
 }'

 wl_data <- read_ngwmn(service = "waterLevelObs",
                       monitoring_location_id = c("USGS-272838082142201", 
                                                  "USGS-404159100494601", 
                                                  "USGS-401216080362703"),
                       CQL = cql)

maybe down the road we can think about ways to add numeric ranges like we have times...

#' @param water_depth_below_land_surface_ft
#' `r get_ogc_params("waterLevelObs", base = "NGWMN")$water_depth_below_land_surface_ft$description`
#' @param water_level_above_site_datum_ft
#' `r get_ogc_params("waterLevelObs", base = "NGWMN")$water_level_above_site_datum_ft$description`

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above about someday making these ranges, rather than singular strings.

Comment thread R/read_ngwmn_water_level.R Outdated
#'
#' @param datetime
#' `r get_ogc_params("waterLevelObs", base = "NGWMN")$sample_time$descriptiond`
#' Multiple time_series_ids can be requested as a character vector.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little confused on this parameter. It's called datetime but it takes timeseries ids? Should it be called sample_time? Even that is confusing with the description...

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops! Nope, that's just an artifact of copy/pasting

Co-authored-by: Elise Hinman <121896266+ehinman@users.noreply.github.com>

@ehinman ehinman left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are working well for me and are straightforward. Added some comments about how some of the input parameters seem a little bit silly without knowledge of how to use CQL2 and/or some custom code in dataRetrieval that allows users to enter ranges -- maybe in a future PR if people would find it useful.

@ldecicco-USGS ldecicco-USGS merged commit aa0ee50 into DOI-USGS:develop Jun 23, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants