diff --git a/.github/workflows/python-package.yml b/.github/workflows/python-package.yml index c7d487f2..4563b449 100644 --- a/.github/workflows/python-package.yml +++ b/.github/workflows/python-package.yml @@ -36,7 +36,5 @@ jobs: flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics - name: Test with pytest and report coverage run: | - cd tests - coverage run -m pytest + coverage run -m pytest tests/ coverage report -m - cd .. diff --git a/README.md b/README.md index 74641211..f7a8664d 100644 --- a/README.md +++ b/README.md @@ -6,124 +6,219 @@ ## Latest Announcements -:mega: **10/01/2025:** `dataretrieval` is pleased to offer a new, *in-development* module, `waterdata`, which gives users access USGS's modernized [Water Data APIs](https://api.waterdata.usgs.gov/). The Water Data API endpoints include daily values, instantaneous values, field measurements (modernized groundwater levels service), time series metadata, and discrete water quality data from the Samples database. Though there will be a period of overlap, the functions within `waterdata` will eventually replace the `nwis` module, which currently provides access to the legacy [NWIS Water Services](https://waterservices.usgs.gov/). More example workflows and functions coming soon. Check `help(waterdata)` for more information. - -**Important:** Users of the Water Data APIs are strongly encouraged to obtain an API key, which gives users higher rate limits and thus greater access to USGS data. [Register for an API key](https://api.waterdata.usgs.gov/signup/) and then place that API key in your python environment as an environment variable named "API_USGS_PAT". One option is to set the variable as follows: +:mega: **10/01/2025:** `dataretrieval` now features the new `waterdata` module, +which provides access to USGS's modernized [Water Data +APIs](https://api.waterdata.usgs.gov/). 
The Water Data API endpoints include +daily values, instantaneous values, field measurements, time series metadata, +and discrete water quality data from the Samples database. This new module will +eventually replace the `nwis` module, which provides access to the legacy [NWIS +Water Services](https://waterservices.usgs.gov/). + +**Important:** Users of the Water Data APIs are strongly encouraged to obtain an +API key for higher rate limits and greater access to USGS data. [Register for +an API key](https://api.waterdata.usgs.gov/signup/) and set it as an +environment variable: ```python import os os.environ["API_USGS_PAT"] = "your_api_key_here" ``` -Note that you may need to restart your python session for the environment variable to be recognized. -Check out the [NEWS](NEWS.md) file for all updates and announcements, or track updates to the package via the GitHub releases. +Check out the [NEWS](NEWS.md) file for all updates and announcements. ## What is dataretrieval? -`dataretrieval` was created to simplify the process of loading hydrologic data into the Python environment. -Like the original R version [`dataRetrieval`](https://github.com/DOI-USGS/dataRetrieval), -it is designed to retrieve the major data types of U.S. Geological Survey (USGS) hydrology -data that are available on the Web, as well as data from the Water -Quality Portal (WQP), which currently houses water quality data from the -Environmental Protection Agency (EPA), U.S. Department of Agriculture -(USDA), and USGS. Direct USGS data is obtained from a service called the -National Water Information System (NWIS). -Note that the python version is not a direct port of the original: it attempts to reproduce the functionality of the R package, though its organization and interface often differ. +`dataretrieval` simplifies the process of loading hydrologic data into Python. +Like the original R version +[`dataRetrieval`](https://github.com/DOI-USGS/dataRetrieval), it retrieves major +U.S. 
Geological Survey (USGS) hydrology data types available on the Web, as well +as data from the Water Quality Portal (WQP) and Network Linked Data Index +(NLDI). -If there's a hydrologic or environmental data portal that you'd like dataretrieval to -work with, raise it as an [issue](https://github.com/USGS-python/dataretrieval/issues). +## Usage Examples -Here's an example using `dataretrieval` to retrieve data from the National Water Information System (NWIS). +### Water Data API (Recommended - Modern USGS Data) -```python -# first import the functions for downloading data from NWIS -import dataretrieval.nwis as nwis +The `waterdata` module provides access to modern USGS Water Data APIs: -# specify the USGS site code for which we want data. -site = '03339000' +```python +import dataretrieval.waterdata as waterdata + +# Get daily streamflow data (returns DataFrame and metadata) +df, metadata = waterdata.get_daily( + monitoring_location_id='USGS-01646500', + parameter_code='00060', # Discharge + time='2024-10-01/2024-10-02' +) + +print(f"Retrieved {len(df)} records") +print(f"Site: {df['monitoring_location_id'].iloc[0]}") +print(f"Mean discharge: {df['value'].mean():.2f} {df['unit_of_measure'].iloc[0]}") +``` -# get instantaneous values (iv) -df = nwis.get_record(sites=site, service='iv', start='2017-12-31', end='2018-01-01') +```python +# Get monitoring location information +locations, metadata = waterdata.get_monitoring_locations( + state_name='Maryland', + site_type_code='ST' # Stream sites +) -# get basic info about the site -df2 = nwis.get_record(sites=site, service='site') +print(f"Found {len(locations)} stream monitoring locations in Maryland") ``` -Services available from NWIS include: -- instantaneous values (iv) -- daily values (dv) -- statistics (stat) -- site info (site) -- discharge peaks (peaks) -- discharge measurements (measurements) - -Water quality data are available from: -- [Samples](https://waterdata.usgs.gov/download-samples/#dataProfile=site) - 
Discrete USGS water quality data only -- [Water Quality Portal](https://www.waterqualitydata.us/) - Discrete water quality data from USGS and EPA. Older data are available in the legacy WQX version 2 format; all data are available in the beta WQX3.0 format. - -To access the full functionality available from NWIS web services, `nwis.get_record()` appends any additional kwargs into the REST request. For example, this function call: + +### NWIS Legacy Services (Deprecated but still functional) + +The `nwis` module accesses legacy NWIS Water Services: + ```python -nwis.get_record(sites='03339000', service='dv', start='2017-12-31', parameterCd='00060') +import dataretrieval.nwis as nwis + +# Get site information +info, metadata = nwis.get_info(sites='01646500') + +print(f"Site name: {info['station_nm'].iloc[0]}") + +# Get daily values +dv, metadata = nwis.get_dv( + sites='01646500', + start='2024-10-01', + end='2024-10-02', + parameterCd='00060', +) + +print(f"Retrieved {len(dv)} daily values") ``` -...will download daily data with the parameter code 00060 (discharge). -## Accessing the "Internal" NWIS -If you're connected to the USGS network, dataretrieval call pull from the internal (non-public) NWIS interface. -Most dataretrieval functions pass kwargs directly to NWIS's REST API, which provides simple access to internal data; simply specify "access='3'". 
-For example +### Water Quality Portal (WQP) + +Access water quality data from multiple agencies: + ```python -nwis.get_record(sites='05404147',service='iv', start='2021-01-01', end='2021-3-01', access='3') +import dataretrieval.wqp as wqp + +# Find water quality monitoring sites +sites = wqp.what_sites( + statecode='US:55', # Wisconsin + siteType='Stream' +) + +print(f"Found {len(sites)} stream monitoring sites in Wisconsin") + +# Get water quality results +results = wqp.get_results( + siteid='USGS-05427718', + characteristicName='Temperature, water' +) + +print(f"Retrieved {len(results)} temperature measurements") ``` -## Quick start +### Network Linked Data Index (NLDI) -dataretrieval can be installed using pip: - - $ python3 -m pip install -U dataretrieval +Discover and navigate hydrologic networks: -or conda: +```python +import dataretrieval.nldi as nldi - $ conda install -c conda-forge dataretrieval +# Get watershed basin for a stream reach +basin = nldi.get_basin( + feature_source='comid', + feature_id='13293474' # NHD reach identifier +) -More examples of use are include in [`demos`](https://github.com/USGS-python/dataretrieval/tree/main/demos). 
+print(f"Basin contains {len(basin)} feature(s)") -## Issue tracker +# Find upstream flowlines +flowlines = nldi.get_flowlines( + feature_source='comid', + feature_id='13293474', + navigation_mode='UT', # Upstream tributaries + distance=50 # km +) -Please report any bugs and enhancement ideas using the dataretrieval issue -tracker: +print(f"Found {len(flowlines)} upstream tributaries within 50km") +``` - https://github.com/USGS-python/dataretrieval/issues +## Available Data Services + +### Modern USGS Water Data APIs (Recommended) +- **Daily values**: Daily statistical summaries (mean, min, max) +- **Instantaneous values**: High-frequency continuous data +- **Field measurements**: Discrete measurements from field visits +- **Monitoring locations**: Site information and metadata +- **Time series metadata**: Information about available data parameters + +### Legacy NWIS Services (Deprecated) +- **Daily values (dv)**: Legacy daily statistical data +- **Instantaneous values (iv)**: Legacy continuous data +- **Site info (site)**: Basic site information +- **Statistics (stat)**: Statistical summaries +- **Discharge peaks (peaks)**: Annual peak discharge events +- **Discharge measurements (measurements)**: Direct flow measurements + +### Water Quality Portal +- **Results**: Water quality analytical results from USGS, EPA, and other agencies +- **Sites**: Monitoring location information +- **Organizations**: Data provider information +- **Projects**: Sampling project details + +### Network Linked Data Index (NLDI) +- **Basin delineation**: Watershed boundaries for any point +- **Flow navigation**: Upstream/downstream network traversal +- **Feature discovery**: Find monitoring sites, dams, and other features +- **Hydrologic connectivity**: Link data across the stream network + +## Installation + +Install dataretrieval using pip: + +```bash +pip install dataretrieval +``` -Feel free to also ask questions on the tracker. 
+Or using conda: +```bash +conda install -c conda-forge dataretrieval +``` -## Contributing +## More Examples -Any help in testing, development, documentation and other tasks is welcome. -For more details, see the file [CONTRIBUTING.md](CONTRIBUTING.md). +Explore additional examples in the +[`demos`](https://github.com/USGS-python/dataretrieval/tree/main/demos) +directory, including Jupyter notebooks demonstrating advanced usage patterns. +## Getting Help -## Need help? +- **Issue tracker**: Report bugs and request features at https://github.com/USGS-python/dataretrieval/issues +- **Documentation**: Full API documentation available in the source code docstrings -The Water Mission Area of the USGS supports the development and maintenance of `dataretrieval`. Any questions can be directed to the Computational Tools team at comptools@usgs.gov. +## Contributing -Resources are available primarily for maintenance and responding to user questions. -Priorities on the development of new features are determined by the `dataretrieval` development team. +Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for +development guidelines. ## Acknowledgments -This material is partially based upon work supported by the National Science Foundation (NSF) under award 1931297. -Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF. + +This material is partially based upon work supported by the National Science +Foundation (NSF) under award 1931297. Any opinions, findings, conclusions, or +recommendations expressed in this material are those of the authors and do not +necessarily reflect the views of the NSF. ## Disclaimer -This software is preliminary or provisional and is subject to revision. -It is being provided to meet the need for timely best science. -The software has not received final approval by the U.S. Geological Survey (USGS). 
-No warranty, expressed or implied, is made by the USGS or the U.S. Government as to the functionality of the software and related material nor shall the fact of release constitute any such warranty. -The software is provided on the condition that neither the USGS nor the U.S. Government shall be held liable for any damages resulting from the authorized or unauthorized use of the software. +This software is preliminary or provisional and is subject to revision. It is +being provided to meet the need for timely best science. The software has not +received final approval by the U.S. Geological Survey (USGS). No warranty, +expressed or implied, is made by the USGS or the U.S. Government as to the +functionality of the software and related material nor shall the fact of release +constitute any such warranty. The software is provided on the condition that +neither the USGS nor the U.S. Government shall be held liable for any damages +resulting from the authorized or unauthorized use of the software. ## Citation -Hodson, T.O., Hariharan, J.A., Black, S., and Horsburgh, J.S., 2023, dataretrieval (Python): a Python package for discovering -and retrieving water data available from U.S. federal hydrologic web services: -U.S. Geological Survey software release, -https://doi.org/10.5066/P94I5TX3. +Hodson, T.O., Hariharan, J.A., Black, S., and Horsburgh, J.S., 2023, +dataretrieval (Python): a Python package for discovering and retrieving water +data available from U.S. federal hydrologic web services: U.S. Geological Survey +software release, https://doi.org/10.5066/P94I5TX3. diff --git a/dataretrieval/nwis.py b/dataretrieval/nwis.py index 1189b790..e4615d10 100644 --- a/dataretrieval/nwis.py +++ b/dataretrieval/nwis.py @@ -2,13 +2,6 @@ .. _National Water Information System (NWIS): https://waterdata.usgs.gov/nwis - -.. todo:: - - * Create a test to check whether functions pull multiple sites - * Work on multi-index capabilities. 
- * Check that all timezones are handled properly for each service. - """ import re @@ -19,7 +12,7 @@ import pandas as pd import requests -from dataretrieval.utils import BaseMetadata, format_datetime, to_str +from dataretrieval.utils import BaseMetadata, format_datetime from .utils import query @@ -28,6 +21,14 @@ except ImportError: gpd = None +# Issue deprecation warning upon import +warnings.warn( + "The 'nwis' services are deprecated and being decommissioned. " + "Please use the 'waterdata' module to access the new services.", + DeprecationWarning, + stacklevel=2 +) + WATERDATA_BASE_URL = "https://nwis.waterdata.usgs.gov/" WATERDATA_URL = WATERDATA_BASE_URL + "nwis/" WATERSERVICE_URL = "https://waterservices.usgs.gov/nwis/" diff --git a/dataretrieval/samples.py b/dataretrieval/samples.py index c55c1a84..a6df85b3 100644 --- a/dataretrieval/samples.py +++ b/dataretrieval/samples.py @@ -11,18 +11,17 @@ import pandas as pd import warnings -from dataretrieval.utils import BaseMetadata, to_str -from dataretrieval.waterdata import get_samples +from dataretrieval.utils import BaseMetadata if TYPE_CHECKING: from typing import Optional, Tuple, Union - from dataretrieval.waterdata import _SERVICES, _PROFILES + from dataretrieval.waterdata import SERVICES, PROFILES from pandas import DataFrame def get_usgs_samples( ssl_check: bool = True, - service: _SERVICES = "results", - profile: _PROFILES = "fullphyschem", + service: SERVICES = "results", + profile: PROFILES = "fullphyschem", activityMediaName: Optional[Union[str, list[str]]] = None, activityStartDateLower: Optional[str] = None, activityStartDateUpper: Optional[str] = None, @@ -212,7 +211,8 @@ def get_usgs_samples( DeprecationWarning, stacklevel=2, ) - + + from dataretrieval.waterdata import get_samples result = get_samples( ssl_check=ssl_check, service=service, diff --git a/dataretrieval/waterdata/__init__.py b/dataretrieval/waterdata/__init__.py new file mode 100644 index 00000000..7d87f79c --- /dev/null +++ 
b/dataretrieval/waterdata/__init__.py @@ -0,0 +1,43 @@ +""" +Water Data API module for accessing USGS water data services. + +This module provides functions for downloading data from the Water Data APIs, +including the USGS Aquarius Samples database. + +See https://api.waterdata.usgs.gov/ for API reference. +""" + +from __future__ import annotations + +# Public API exports +from .api import ( + get_codes, + get_daily, + get_field_measurements, + get_latest_continuous, + get_monitoring_locations, + get_samples, + get_time_series_metadata, + _check_profiles, +) +from .types import ( + CODE_SERVICES, + SERVICES, + PROFILES, + PROFILE_LOOKUP, +) + +__all__ = [ + "get_codes", + "get_daily", + "get_field_measurements", + "get_latest_continuous", + "get_monitoring_locations", + "get_samples", + "get_time_series_metadata", + "_check_profiles", + "CODE_SERVICES", + "SERVICES", + "PROFILES", + "PROFILE_LOOKUP", +] diff --git a/dataretrieval/waterdata.py b/dataretrieval/waterdata/api.py similarity index 63% rename from dataretrieval/waterdata.py rename to dataretrieval/waterdata/api.py index 7c503b23..ad40d132 100644 --- a/dataretrieval/waterdata.py +++ b/dataretrieval/waterdata/api.py @@ -1,107 +1,62 @@ -"""Functions for downloading data from the Water Data APIs, including the USGS Aquarius Samples database. +"""Functions for downloading data from the Water Data APIs, including the USGS +Aquarius Samples database. See https://api.waterdata.usgs.gov/ for API reference. 
""" -from __future__ import annotations - import json +import logging from io import StringIO -from typing import TYPE_CHECKING, Literal, List, get_args +from typing import Optional, List, Tuple, Union, get_args import pandas as pd import requests from requests.models import PreparedRequest from dataretrieval.utils import BaseMetadata, to_str -from dataretrieval import waterdata_helpers - -if TYPE_CHECKING: - from typing import Optional, Tuple, Union - - from pandas import DataFrame - - -_BASE_URL = "https://api.waterdata.usgs.gov/" - -_SAMPLES_URL = _BASE_URL + "samples-data" - -_CODE_SERVICES = Literal[ - "characteristicgroup", - "characteristics", - "counties", - "countries", - "observedproperty", - "samplemedia", - "sitetype", - "states", -] - -_SERVICES = Literal["activities", "locations", "organizations", "projects", "results"] - -_PROFILES = Literal[ - "actgroup", - "actmetric", - "basicbio", - "basicphyschem", - "count", - "fullbio", - "fullphyschem", - "labsampleprep", - "narrow", - "organization", - "project", - "projectmonitoringlocationweight", - "resultdetectionquantitationlimit", - "sampact", - "site", -] - -_PROFILE_LOOKUP = { - "activities": ["sampact", "actmetric", "actgroup", "count"], - "locations": ["site", "count"], - "organizations": ["organization", "count"], - "projects": ["project", "projectmonitoringlocationweight"], - "results": [ - "fullphyschem", - "basicphyschem", - "fullbio", - "basicbio", - "narrow", - "resultdetectionquantitationlimit", - "labsampleprep", - "count", - ], -} +from dataretrieval.waterdata.types import ( + CODE_SERVICES, + PROFILE_LOOKUP, + PROFILES, + SERVICES, +) +from dataretrieval.waterdata.utils import SAMPLES_URL, get_ogc_data + +# Set up logger for this module +logger = logging.getLogger(__name__) + def get_daily( - monitoring_location_id: Optional[Union[str, List[str]]] = None, - parameter_code: Optional[Union[str, List[str]]] = None, - statistic_id: Optional[Union[str, List[str]]] = None, - properties: 
Optional[List[str]] = None, - time_series_id: Optional[Union[str, List[str]]] = None, - daily_id: Optional[Union[str, List[str]]] = None, - approval_status: Optional[Union[str, List[str]]] = None, - unit_of_measure: Optional[Union[str, List[str]]] = None, - qualifier: Optional[Union[str, List[str]]] = None, - value: Optional[Union[str, List[str]]] = None, - last_modified: Optional[str] = None, - skipGeometry: Optional[bool] = None, - time: Optional[Union[str, List[str]]] = None, - bbox: Optional[List[float]] = None, - limit: Optional[int] = None, - max_results: Optional[int] = None, - convertType: bool = True - ) -> pd.DataFrame: - """Daily data provide one data value to represent water conditions for the day. - Throughout much of the history of the USGS, the primary water data available was - daily data collected manually at the monitoring location once each day. With - improved availability of computer storage and automated transmission of data, the - daily data published today are generally a statistical summary or metric of the - continuous data collected each day, such as the daily mean, minimum, or maximum - value. Daily data are automatically calculated from the continuous data of the same - parameter code and are described by parameter code and a statistic code. These data - have also been referred to as “daily values” or “DV”. 
+ monitoring_location_id: Optional[Union[str, List[str]]] = None, + parameter_code: Optional[Union[str, List[str]]] = None, + statistic_id: Optional[Union[str, List[str]]] = None, + properties: Optional[List[str]] = None, + time_series_id: Optional[Union[str, List[str]]] = None, + daily_id: Optional[Union[str, List[str]]] = None, + approval_status: Optional[Union[str, List[str]]] = None, + unit_of_measure: Optional[Union[str, List[str]]] = None, + qualifier: Optional[Union[str, List[str]]] = None, + value: Optional[Union[str, List[str]]] = None, + last_modified: Optional[str] = None, + skip_geometry: Optional[bool] = None, + time: Optional[Union[str, List[str]]] = None, + bbox: Optional[List[float]] = None, + limit: Optional[int] = None, + max_results: Optional[int] = None, + convert_type: bool = True, +) -> Tuple[pd.DataFrame, BaseMetadata]: + """Daily data provide one data value to represent water conditions for the + day. + + Throughout much of the history of the USGS, the primary water data available + was daily data collected manually at the monitoring location once each day. + With improved availability of computer storage and automated transmission of + data, the daily data published today are generally a statistical summary or + metric of the continuous data collected each day, such as the daily mean, + minimum, or maximum value. Daily data are automatically calculated from the + continuous data of the same parameter code and are described by parameter + code and a statistic code. These data have also been referred to as “daily + values” or “DV”. Parameters ---------- @@ -131,25 +86,17 @@ def get_daily( A unique identifier representing a single time series. This corresponds to the id field in the time-series-metadata endpoint. daily_id : string or list of strings, optional - A universally unique identifier (UUID) representing a single - version of a record. It is not stable over time. 
Every time the - record is refreshed in our database (which may happen as part of - normal operations and does not imply any change to the data itself) - a new ID will be generated. To uniquely identify a single observation - over time, compare the time and time_series_id fields; each time series - will only have a single observation at a given time. + A universally unique identifier (UUID) representing a single version of + a record. It is not stable over time. Every time the record is refreshed + in our database (which may happen as part of normal operations and does + not imply any change to the data itself) a new ID will be generated. To + uniquely identify a single observation over time, compare the time and + time_series_id fields; each time series will only have a single + observation at a given time. approval_status : string or list of strings, optional - Some of the data that you have obtained from this U.S. Geological - Survey database may not have received Director's approval. Any such - data values are qualified as provisional and are subject to revision. - Provisional data are released on the condition that neither the USGS - nor the United States Government may be held liable for any damages - resulting from its use. This field reflects the approval status of - each record, and is either "Approved", meaining processing review has - been completed and the data is approved for publication, or - "Provisional" and subject to revision. For more information about - provisional data, go to - https://waterdata.usgs.gov/provisional-data-statement/. + Some of the data that you have obtained from this U.S. Geological Survey + database may not have received Director's approval. Any such data values + are qualified as provisional and are subject to revision. unit_of_measure : string or list of strings, optional A human-readable description of the units of measurement associated with an observation. 
@@ -166,44 +113,55 @@ def get_daily( anything about the measurement has changed. You can query this field using date-times or intervals, adhering to RFC 3339, or using ISO 8601 duration objects. Intervals may be bounded or half-bounded (double-dots - at start or end). Examples: + at start or end). + Examples: - A date-time: "2018-02-12T23:20:50Z" - A bounded interval: "2018-02-12T00:00:00Z/2018-03-18T12:31:12Z" - - Half-bounded intervals: "2018-02-12T00:00:00Z/.." or "../2018-03-18T12:31:12Z" - - Duration objects: "P1M" for data from the past month or "PT36H" for the last 36 hours - Only features that have a last_modified that intersects the value of datetime are selected. - skipGeometry : boolean, optional - This option can be used to skip response geometries for each feature. The returning - object will be a data frame with no spatial information. + - Half-bounded intervals: "2018-02-12T00:00:00Z/.." or + "../2018-03-18T12:31:12Z" + - Duration objects: "P1M" for data from the past month or "PT36H" + for the last 36 hours + Only features that have a last_modified that intersects the value of + datetime are selected. + skip_geometry : boolean, optional + This option can be used to skip response geometries for each feature. + The returning object will be a data frame with no spatial information. time : string, optional - The date an observation represents. You can query this field using date-times - or intervals, adhering to RFC 3339, or using ISO 8601 duration objects. - Intervals may be bounded or half-bounded (double-dots at start or end). + The date an observation represents. You can query this field using + date-times or intervals, adhering to RFC 3339, or using ISO 8601 + duration objects. Intervals may be bounded or half-bounded (double-dots + at start or end). Only features that have a time that intersects the + value of datetime are selected. 
If a feature has multiple temporal + properties, it is the decision of the server whether only a single + temporal property is used to determine the extent or all relevant + temporal properties. Examples: - A date-time: "2018-02-12T23:20:50Z" - A bounded interval: "2018-02-12T00:00:00Z/2018-03-18T12:31:12Z" - - Half-bounded intervals: "2018-02-12T00:00:00Z/.." or "../2018-03-18T12:31:12Z" - - Duration objects: "P1M" for data from the past month or "PT36H" for the last 36 hours - Only features that have a time that intersects the value of datetime are selected. If - a feature has multiple temporal properties, it is the decision of the server whether - only a single temporal property is used to determine the extent or all relevant temporal properties. + - Half-bounded intervals: "2018-02-12T00:00:00Z/.." or + "../2018-03-18T12:31:12Z" + - Duration objects: "P1M" for data from the past month or "PT36H" + for the last 36 hours bbox : list of numbers, optional - Only features that have a geometry that intersects the bounding box are selected. - The bounding box is provided as four or six numbers, depending on whether the - coordinate reference system includes a vertical axis (height or depth). Coordinates - are assumed to be in crs 4326. The expected format is a numeric vector structured: - c(xmin,ymin,xmax,ymax). Another way to think of it is c(Western-most longitude, - Southern-most latitude, Eastern-most longitude, Northern-most longitude). + Only features that have a geometry that intersects the bounding box are + selected. The bounding box is provided as four or six numbers, + depending on whether the coordinate reference system includes a vertical + axis (height or depth). Coordinates are assumed to be in crs 4326. The + expected format is a numeric vector structured: c(xmin,ymin,xmax,ymax). + Another way to think of it is c(Western-most longitude, Southern-most + latitude, Eastern-most longitude, Northern-most latitude). 
limit : numeric, optional - The optional limit parameter is used to control the subset of the selected features - that should be returned in each page. The maximum allowable limit is 10000. It may - be beneficial to set this number lower if your internet connection is spotty. The - default (NA) will set the limit to the maximum allowable limit for the service. + The optional limit parameter is used to control the subset of the + selected features that should be returned in each page. The maximum + allowable limit is 10000. It may be beneficial to set this number lower + if your internet connection is spotty. The default (NA) will set the + limit to the maximum allowable limit for the service. max_results : numeric, optional - The optional maximum number of rows to return. This value must be less than the - requested limit. - convertType : boolean, optional - If True, the function will convert the data to dates and qualifier to string vector + The optional maximum number of rows to return. This value must be less + than the requested limit. + convert_type : boolean, optional + If True, the function will convert the data to dates and qualifier to + string vector Returns ------- @@ -216,10 +174,10 @@ def get_daily( >>> # Get daily flow data from a single site >>> # over a yearlong period - >>> df = dataretrieval.waterdata.get_daily( - ... monitoring_location_id = "USGS-02238500", - ... parameter_code = "00060", - ... time = "2021-01-01T00:00:00Z/2022-01-01T00:00:00Z" + >>> df, metadata = dataretrieval.waterdata.get_daily( + ... monitoring_location_id="USGS-02238500", + ... parameter_code="00060", + ... time="2021-01-01T00:00:00Z/2022-01-01T00:00:00Z", ... ) >>> # Get monitoring location info for specific sites @@ -228,73 +186,75 @@ def get_daily( ... monitoring_location_id = ["USGS-05114000", "USGS-09423350"], ... approval_status = "Approved", ... time = "2024-01-01/.." 
- """ + """ service = "daily" output_id = "daily_id" # Build argument dictionary, omitting None values - args = { - k: v for k, v in locals().items() + args = { + k: v + for k, v in locals().items() if k not in {"service", "output_id"} and v is not None } - return waterdata_helpers.get_ogc_data(args, output_id, service) + return get_ogc_data(args, output_id, service) + def get_monitoring_locations( - monitoring_location_id: Optional[List[str]] = None, - agency_code: Optional[List[str]] = None, - agency_name: Optional[List[str]] = None, - monitoring_location_number: Optional[List[str]] = None, - monitoring_location_name: Optional[List[str]] = None, - district_code: Optional[List[str]] = None, - country_code: Optional[List[str]] = None, - country_name: Optional[List[str]] = None, - state_code: Optional[List[str]] = None, - state_name: Optional[List[str]] = None, - county_code: Optional[List[str]] = None, - county_name: Optional[List[str]] = None, - minor_civil_division_code: Optional[List[str]] = None, - site_type_code: Optional[List[str]] = None, - site_type: Optional[List[str]] = None, - hydrologic_unit_code: Optional[List[str]] = None, - basin_code: Optional[List[str]] = None, - altitude: Optional[List[str]] = None, - altitude_accuracy: Optional[List[str]] = None, - altitude_method_code: Optional[List[str]] = None, - altitude_method_name: Optional[List[str]] = None, - vertical_datum: Optional[List[str]] = None, - vertical_datum_name: Optional[List[str]] = None, - horizontal_positional_accuracy_code: Optional[List[str]] = None, - horizontal_positional_accuracy: Optional[List[str]] = None, - horizontal_position_method_code: Optional[List[str]] = None, - horizontal_position_method_name: Optional[List[str]] = None, - original_horizontal_datum: Optional[List[str]] = None, - original_horizontal_datum_name: Optional[List[str]] = None, - drainage_area: Optional[List[str]] = None, - contributing_drainage_area: Optional[List[str]] = None, - time_zone_abbreviation: 
Optional[List[str]] = None, - uses_daylight_savings: Optional[List[str]] = None, - construction_date: Optional[List[str]] = None, - aquifer_code: Optional[List[str]] = None, - national_aquifer_code: Optional[List[str]] = None, - aquifer_type_code: Optional[List[str]] = None, - well_constructed_depth: Optional[List[str]] = None, - hole_constructed_depth: Optional[List[str]] = None, - depth_source_code: Optional[List[str]] = None, - properties: Optional[List[str]] = None, - skipGeometry: Optional[bool] = None, - time: Optional[Union[str, List[str]]] = None, - bbox: Optional[List[float]] = None, - limit: Optional[int] = None, - max_results: Optional[int] = None, - convertType: bool = True - ) -> pd.DataFrame: + monitoring_location_id: Optional[List[str]] = None, + agency_code: Optional[List[str]] = None, + agency_name: Optional[List[str]] = None, + monitoring_location_number: Optional[List[str]] = None, + monitoring_location_name: Optional[List[str]] = None, + district_code: Optional[List[str]] = None, + country_code: Optional[List[str]] = None, + country_name: Optional[List[str]] = None, + state_code: Optional[List[str]] = None, + state_name: Optional[List[str]] = None, + county_code: Optional[List[str]] = None, + county_name: Optional[List[str]] = None, + minor_civil_division_code: Optional[List[str]] = None, + site_type_code: Optional[List[str]] = None, + site_type: Optional[List[str]] = None, + hydrologic_unit_code: Optional[List[str]] = None, + basin_code: Optional[List[str]] = None, + altitude: Optional[List[str]] = None, + altitude_accuracy: Optional[List[str]] = None, + altitude_method_code: Optional[List[str]] = None, + altitude_method_name: Optional[List[str]] = None, + vertical_datum: Optional[List[str]] = None, + vertical_datum_name: Optional[List[str]] = None, + horizontal_positional_accuracy_code: Optional[List[str]] = None, + horizontal_positional_accuracy: Optional[List[str]] = None, + horizontal_position_method_code: Optional[List[str]] = None, + 
horizontal_position_method_name: Optional[List[str]] = None, + original_horizontal_datum: Optional[List[str]] = None, + original_horizontal_datum_name: Optional[List[str]] = None, + drainage_area: Optional[List[str]] = None, + contributing_drainage_area: Optional[List[str]] = None, + time_zone_abbreviation: Optional[List[str]] = None, + uses_daylight_savings: Optional[List[str]] = None, + construction_date: Optional[List[str]] = None, + aquifer_code: Optional[List[str]] = None, + national_aquifer_code: Optional[List[str]] = None, + aquifer_type_code: Optional[List[str]] = None, + well_constructed_depth: Optional[List[str]] = None, + hole_constructed_depth: Optional[List[str]] = None, + depth_source_code: Optional[List[str]] = None, + properties: Optional[List[str]] = None, + skip_geometry: Optional[bool] = None, + time: Optional[Union[str, List[str]]] = None, + bbox: Optional[List[float]] = None, + limit: Optional[int] = None, + max_results: Optional[int] = None, + convert_type: bool = True, +) -> Tuple[pd.DataFrame, BaseMetadata]: """Location information is basic information about the monitoring location including the name, identifier, agency responsible for data collection, and the date the location was established. It also includes information about the type of location, such as stream, lake, or groundwater, and geographic - information about the location, such as state, county, latitude and longitude, - and hydrologic unit code (HUC). + information about the location, such as state, county, latitude and + longitude, and hydrologic unit code (HUC). Parameters ---------- @@ -364,23 +324,25 @@ def get_monitoring_locations( hydrologic_unit_code : string or list of strings, optional The United States is divided and sub-divided into successively smaller hydrologic units which are classified into four levels: regions, - sub-regions, accounting units, and cataloging units. 
The hydrologic units - are arranged within each other, from the smallest (cataloging units) to the - largest (regions). Each hydrologic unit is identified by a unique hydrologic - unit code (HUC) consisting of two to eight digits based on the four levels - of classification in the hydrologic unit system. + sub-regions, accounting units, and cataloging units. The hydrologic + units are arranged within each other, from the smallest (cataloging + units) to the largest (regions). Each hydrologic unit is identified by a + unique hydrologic unit code (HUC) consisting of two to eight digits + based on the four levels of classification in the hydrologic unit + system. basin_code : string or list of strings, optional The Basin Code or "drainage basin code" is a two-digit code that further subdivides the 8-digit hydrologic-unit code. The drainage basin code is - defined by the USGS State Office where the monitoring location is located. + defined by the USGS State Office where the monitoring location is + located. altitude : string or list of strings, optional Altitude of the monitoring location referenced to the specified Vertical Datum. altitude_accuracy : string or list of strings, optional Accuracy of the altitude, in feet. An accuracy of +/- 0.1 foot would be entered as “.1”. Many altitudes are interpolated from the contours on - topographic maps; accuracies determined in this way are generally entered - as one-half of the contour interval. + topographic maps; accuracies determined in this way are generally + entered as one-half of the contour interval. altitude_method_code : string or list of strings, optional Codes representing the method used to measure altitude. A [list of codes](https://help.waterdata.usgs.gov/code/alt_meth_cd_query?fmt=html) is available. @@ -426,12 +388,13 @@ def get_monitoring_locations( point. 
contributing_drainage_area : string or list of strings, optional The contributing drainage area of a lake, stream, wetland, or estuary - monitoring location, in square miles. This item should be present only if - the contributing area is different from the total drainage area. This - situation can occur when part of the drainage area consists of very porous - soil or depressions that either allow all runoff to enter the groundwater - or traps the water in ponds so that rainfall does not contribute to runoff. - A transbasin diversion can also affect the total drainage area. + monitoring location, in square miles. This item should be present only + if the contributing area is different from the total drainage area. This + situation can occur when part of the drainage area consists of very + porous soil or depressions that either allow all runoff to enter the + groundwater or traps the water in ponds so that rainfall does not + contribute to runoff. A transbasin diversion can also affect the total + drainage area. time_zone_abbreviation : string or list of strings, optional A short code describing the time zone used by a monitoring location. uses_daylight_savings : string or list of strings, optional @@ -441,8 +404,9 @@ def get_monitoring_locations( aquifer_code : string or list of strings, optional Local aquifers in the USGS water resources data base are identified by a geohydrologic unit code (a three-digit number related to the age of the - formation, followed by a 4 or 5 character abbreviation for the geologic unit - or aquifer name). Additional information is available [at this link](https://help.waterdata.usgs.gov/faq/groundwater/local-aquifer-description). + formation, followed by a 4 or 5 character abbreviation for the geologic + unit or aquifer name). Additional information is available + [at this link](https://help.waterdata.usgs.gov/faq/groundwater/local-aquifer-description). 
national_aquifer_code : string or list of strings, optional National aquifers are the principal aquifers or aquifer systems in the United States, defined as regionally extensive aquifers or aquifer systems that have @@ -472,36 +436,41 @@ def get_monitoring_locations( A code indicating the source of water-level data. A [list of codes](https://help.waterdata.usgs.gov/code/water_level_src_cd_query?fmt=html) is available. properties : string or list of strings, optional - A vector of requested columns to be returned from the query. Available options - are: geometry, id, agency_code, agency_name, monitoring_location_number, - monitoring_location_name, district_code, country_code, country_name, state_code, - state_name, county_code, county_name, minor_civil_division_code, site_type_code, - site_type, hydrologic_unit_code, basin_code, altitude, altitude_accuracy, - altitude_method_code, altitude_method_name, vertical_datum, vertical_datum_name, - horizontal_positional_accuracy_code, horizontal_positional_accuracy, - horizontal_position_method_code, horizontal_position_method_name, - original_horizontal_datum, original_horizontal_datum_name, drainage_area, - contributing_drainage_area, time_zone_abbreviation, uses_daylight_savings, - construction_date, aquifer_code, national_aquifer_code, aquifer_type_code, - well_constructed_depth, hole_constructed_depth, depth_source_code. + A vector of requested columns to be returned from the query. 
Available + options are: geometry, id, agency_code, agency_name, + monitoring_location_number, monitoring_location_name, district_code, + country_code, country_name, state_code, state_name, county_code, + county_name, minor_civil_division_code, site_type_code, site_type, + hydrologic_unit_code, basin_code, altitude, altitude_accuracy, + altitude_method_code, altitude_method_name, vertical_datum, + vertical_datum_name, horizontal_positional_accuracy_code, + horizontal_positional_accuracy, horizontal_position_method_code, + horizontal_position_method_name, original_horizontal_datum, + original_horizontal_datum_name, drainage_area, + contributing_drainage_area, time_zone_abbreviation, + uses_daylight_savings, construction_date, aquifer_code, + national_aquifer_code, aquifer_type_code, well_constructed_depth, + hole_constructed_depth, depth_source_code. bbox : list of numbers, optional - Only features that have a geometry that intersects the bounding box are selected. - The bounding box is provided as four or six numbers, depending on whether the - coordinate reference system includes a vertical axis (height or depth). Coordinates - are assumed to be in crs 4326. The expected format is a numeric vector structured: - c(xmin,ymin,xmax,ymax). Another way to think of it is c(Western-most longitude, - Southern-most latitude, Eastern-most longitude, Northern-most longitude). + Only features that have a geometry that intersects the bounding box are + selected. The bounding box is provided as four or six numbers, + depending on whether the coordinate reference system includes a vertical + axis (height or depth). Coordinates are assumed to be in crs 4326. The + expected format is a numeric vector structured: c(xmin,ymin,xmax,ymax). + Another way to think of it is c(Western-most longitude, Southern-most + latitude, Eastern-most longitude, Northern-most longitude). 
limit : numeric, optional - The optional limit parameter is used to control the subset of the selected features - that should be returned in each page. The maximum allowable limit is 10000. It may - be beneficial to set this number lower if your internet connection is spotty. The - default (NA) will set the limit to the maximum allowable limit for the service. + The optional limit parameter is used to control the subset of the + selected features that should be returned in each page. The maximum + allowable limit is 10000. It may be beneficial to set this number lower + if your internet connection is spotty. The default (NA) will set the + limit to the maximum allowable limit for the service. max_results : numeric, optional - The optional maximum number of rows to return. This value must be less than the - requested limit. - skipGeometry : boolean, optional - This option can be used to skip response geometries for each feature. The returning - object will be a data frame with no spatial information. + The optional maximum number of rows to return. This value must be less + than the requested limit. + skip_geometry : boolean, optional + This option can be used to skip response geometries for each feature. + The returning object will be a data frame with no spatial information. Returns ------- @@ -515,54 +484,54 @@ def get_monitoring_locations( >>> # Get monitoring locations within a bounding box >>> # and leave out geometry >>> df = dataretrieval.waterdata.get_monitoring_locations( - ... bbox=[-90.2,42.6,-88.7,43.2], - ... skipGeometry=True + ... bbox=[-90.2, 42.6, -88.7, 43.2], skip_geometry=True ... ) >>> # Get monitoring location info for specific sites >>> # and only specific properties >>> df = dataretrieval.waterdata.get_monitoring_locations( - ... monitoring_location_id = ["USGS-05114000", "USGS-09423350"], - ... properties = ["monitoring_location_id", - ... "state_name", - ... "country_name"]) - """ + ... 
monitoring_location_id=["USGS-05114000", "USGS-09423350"], + ... properties=["monitoring_location_id", "state_name", "country_name"], + ... ) + """ service = "monitoring-locations" output_id = "monitoring_location_id" # Build argument dictionary, omitting None values - args = { - k: v for k, v in locals().items() + args = { + k: v + for k, v in locals().items() if k not in {"service", "output_id"} and v is not None } - return waterdata_helpers.get_ogc_data(args, output_id, service) + return get_ogc_data(args, output_id, service) + def get_time_series_metadata( - monitoring_location_id: Optional[Union[str, List[str]]] = None, - parameter_code: Optional[Union[str, List[str]]] = None, - parameter_name: Optional[Union[str, List[str]]] = None, - properties: Optional[Union[str, List[str]]] = None, - statistic_id: Optional[Union[str, List[str]]] = None, - last_modified: Optional[Union[str, List[str]]] = None, - begin: Optional[Union[str, List[str]]] = None, - end: Optional[Union[str, List[str]]] = None, - unit_of_measure: Optional[Union[str, List[str]]] = None, - computation_period_identifier: Optional[Union[str, List[str]]] = None, - computation_identifier: Optional[Union[str, List[str]]] = None, - thresholds: Optional[int] = None, - sublocation_identifier: Optional[Union[str, List[str]]] = None, - primary: Optional[Union[str, List[str]]] = None, - parent_time_series_id: Optional[Union[str, List[str]]] = None, - time_series_id: Optional[Union[str, List[str]]] = None, - web_description: Optional[Union[str, List[str]]] = None, - skipGeometry: Optional[bool] = None, - time: Optional[Union[str, List[str]]] = None, - bbox: Optional[List[float]] = None, - limit: Optional[int] = None, - max_results: Optional[int] = None, - convertType: bool = True -) -> pd.DataFrame: + monitoring_location_id: Optional[Union[str, List[str]]] = None, + parameter_code: Optional[Union[str, List[str]]] = None, + parameter_name: Optional[Union[str, List[str]]] = None, + properties: 
Optional[Union[str, List[str]]] = None, + statistic_id: Optional[Union[str, List[str]]] = None, + last_modified: Optional[Union[str, List[str]]] = None, + begin: Optional[Union[str, List[str]]] = None, + end: Optional[Union[str, List[str]]] = None, + unit_of_measure: Optional[Union[str, List[str]]] = None, + computation_period_identifier: Optional[Union[str, List[str]]] = None, + computation_identifier: Optional[Union[str, List[str]]] = None, + thresholds: Optional[int] = None, + sublocation_identifier: Optional[Union[str, List[str]]] = None, + primary: Optional[Union[str, List[str]]] = None, + parent_time_series_id: Optional[Union[str, List[str]]] = None, + time_series_id: Optional[Union[str, List[str]]] = None, + web_description: Optional[Union[str, List[str]]] = None, + skip_geometry: Optional[bool] = None, + time: Optional[Union[str, List[str]]] = None, + bbox: Optional[List[float]] = None, + limit: Optional[int] = None, + max_results: Optional[int] = None, + convert_type: bool = True, +) -> Tuple[pd.DataFrame, BaseMetadata]: """Daily data and continuous measurements are grouped into time series, which represent a collection of observations of a single parameter, potentially aggregated using a standard statistic, at a single monitoring @@ -602,30 +571,30 @@ def get_time_series_metadata( anything about the measurement has changed. You can query this field using date-times or intervals, adhering to RFC 3339, or using ISO 8601 duration objects. Intervals may be bounded or half-bounded (double-dots - at start or end). Examples: + at start or end). Only features that have a last_modified that + intersects the value of datetime are selected. + Examples: - A date-time: "2018-02-12T23:20:50Z" - A bounded interval: "2018-02-12T00:00:00Z/2018-03-18T12:31:12Z" - - Half-bounded intervals: "2018-02-12T00:00:00Z/.." 
or "../2018-03-18T12:31:12Z" - - Duration objects: "P1M" for data from the past month or "PT36H" for the last 36 hours - Only features that have a last_modified that intersects the value of datetime are selected. + - Half-bounded intervals: "2018-02-12T00:00:00Z/.." or + "../2018-03-18T12:31:12Z" + - Duration objects: "P1M" for data from the past month or "PT36H" + for the last 36 hours begin : string or list of strings, optional - The datetime of the earliest observation in the time series. Together with end, - this field represents the period of record of a time series. Note that some time - series may have large gaps in their collection record. This field is currently - in the local time of the monitoring location. We intend to update this in version - v0 to use UTC with a time zone. You can query this field using date-times or - intervals, adhering to RFC 3339, or using ISO 8601 duration objects. Intervals - may be bounded or half-bounded (double-dots at start or end). Examples: - + The datetime of the earliest observation in the time series. Together + with end, this field represents the period of record of a time series. + Note that some time series may have large gaps in their collection + record. This field is currently in the local time of the monitoring + location. We intend to update this in version v0 to use UTC with a time + zone. You can query this field using date-times or intervals, adhering + to RFC 3339, or using ISO 8601 duration objects. Intervals may be + bounded or half-bounded (double-dots at start or end). Only features + that have a begin that intersects the value of datetime are selected. + Examples: - A date-time: "2018-02-12T23:20:50Z" - - A bounded interval: "2018-02-12T00:00:00Z/2018-03-18T12:31:12Z" - - Half-bounded intervals: "2018-02-12T00:00:00Z/.." 
or "../2018-03-18T12:31:12Z" - - Duration objects: "P1M" for data from the past month or "PT36H" for the last 36 hours - - Only features that have a begin that intersects the value of datetime are selected. end : string or list of strings, optional The datetime of the most recent observation in the time series. Data returned by this endpoint updates at most once per day, and potentially less frequently than @@ -635,31 +604,30 @@ def get_time_series_metadata( determine whether a time series is "active". We intend to update this in version v0 to use UTC with a time zone. You can query this field using date-times or intervals, adhering to RFC 3339, or using ISO 8601 duration objects. Intervals - may be bounded or half-bounded (double-dots at start or end). Examples: - + may be bounded or half-bounded (double-dots at start or end). Only + features that have a end that intersects the value of datetime are + selected. + Examples: - A date-time: "2018-02-12T23:20:50Z" - - A bounded interval: "2018-02-12T00:00:00Z/2018-03-18T12:31:12Z" - - - Half-bounded intervals: "2018-02-12T00:00:00Z/.." or "../2018-03-18T12:31:12Z" - - - Duration objects: "P1M" for data from the past month or "PT36H" for the last 36 hours - - Only features that have a end that intersects the value of datetime are selected. + - Half-bounded intervals: "2018-02-12T00:00:00Z/.." or + "../2018-03-18T12:31:12Z" + - Duration objects: "P1M" for data from the past month or "PT36H" for + the last 36 hours unit_of_measure : string or list of strings, optional A human-readable description of the units of measurement associated with an observation. computation_period_identifier : string or list of strings, optional Indicates the period of data used for any statistical computations. computation_identifier : string or list of strings, optional - Indicates whether the data from this time series represent a specific statistical - computation. 
+ Indicates whether the data from this time series represent a specific + statistical computation. thresholds : numeric or list of numbers, optional - Thresholds represent known numeric limits for a time series, for example the - historic maximum value for a parameter or a level below which a sensor is - non-operative. These thresholds are sometimes used to automatically determine if - an observation is erroneous due to sensor error, and therefore shouldn't be included - in the time series. + Thresholds represent known numeric limits for a time series, for example + the historic maximum value for a parameter or a level below which a + sensor is non-operative. These thresholds are sometimes used to + automatically determine if an observation is erroneous due to sensor + error, and therefore shouldn't be included in the time series. sublocation_identifier : string or list of strings, optional primary : string or list of strings, optional parent_time_series_id : string or list of strings, optional @@ -667,28 +635,31 @@ def get_time_series_metadata( A unique identifier representing a single time series. This corresponds to the id field in the time-series-metadata endpoint. web_description : string or list of strings, optional - A description of what this time series represents, as used by WDFN and other USGS - data dissemination products. - skipGeometry : boolean, optional - This option can be used to skip response geometries for each feature. The returning - object will be a data frame with no spatial information. + A description of what this time series represents, as used by WDFN and + other USGS data dissemination products. + skip_geometry : boolean, optional + This option can be used to skip response geometries for each feature. + The returning object will be a data frame with no spatial information. bbox : list of numbers, optional - Only features that have a geometry that intersects the bounding box are selected. 
- The bounding box is provided as four or six numbers, depending on whether the - coordinate reference system includes a vertical axis (height or depth). Coordinates - are assumed to be in crs 4326. The expected format is a numeric vector structured: - c(xmin,ymin,xmax,ymax). Another way to think of it is c(Western-most longitude, - Southern-most latitude, Eastern-most longitude, Northern-most longitude). + Only features that have a geometry that intersects the bounding box are + selected. The bounding box is provided as four or six numbers, + depending on whether the coordinate reference system includes a vertical + axis (height or depth). Coordinates are assumed to be in crs 4326. The + expected format is a numeric vector structured: c(xmin,ymin,xmax,ymax). + Another way to think of it is c(Western-most longitude, Southern-most + latitude, Eastern-most longitude, Northern-most longitude). limit : numeric, optional - The optional limit parameter is used to control the subset of the selected features - that should be returned in each page. The maximum allowable limit is 10000. It may - be beneficial to set this number lower if your internet connection is spotty. The - default (None) will set the limit to the maximum allowable limit for the service. + The optional limit parameter is used to control the subset of the + selected features that should be returned in each page. The maximum + allowable limit is 10000. It may be beneficial to set this number lower + if your internet connection is spotty. The default (None) will set the + limit to the maximum allowable limit for the service. max_results : numeric, optional - The optional maximum number of rows to return. This value must be less than the - requested limit. - convertType : boolean, optional - If True, the function will convert the data to dates and qualifier to string vector + The optional maximum number of rows to return. This value must be less + than the requested limit. 
+ convert_type : boolean, optional + If True, the function will convert the data to dates and qualifier to + string vector Returns ------- @@ -701,48 +672,50 @@ def get_time_series_metadata( >>> # Get daily flow data from a single site >>> # over a yearlong period - >>> df = dataretrieval.waterdata.get_time_series_metadata( - ... monitoring_location_id = "USGS-02238500", - ... parameter_code = "00060", - ... time = "2021-01-01T00:00:00Z/2022-01-01T00:00:00Z" + >>> df, metadata = dataretrieval.waterdata.get_time_series_metadata( + ... monitoring_location_id="USGS-02238500", + ... parameter_code="00060", + ... time="2021-01-01T00:00:00Z/2022-01-01T00:00:00Z", ... ) >>> # Get monitoring location info for specific sites >>> # and only specific properties - >>> df = dataretrieval.waterdata.get_time_series_metadata( + >>> df, metadata = dataretrieval.waterdata.get_time_series_metadata( ... monitoring_location_id = ["USGS-05114000", "USGS-09423350"], ... time = "2024-01-01/.." - """ + """ service = "time-series-metadata" output_id = "time_series_id" # Build argument dictionary, omitting None values - args = { - k: v for k, v in locals().items() + args = { + k: v + for k, v in locals().items() if k not in {"service", "output_id"} and v is not None } - return waterdata_helpers.get_ogc_data(args, output_id, service) + return get_ogc_data(args, output_id, service) + def get_latest_continuous( - monitoring_location_id: Optional[Union[str, List[str]]] = None, - parameter_code: Optional[Union[str, List[str]]] = None, - statistic_id: Optional[Union[str, List[str]]] = None, - properties: Optional[Union[str, List[str]]] = None, - time_series_id: Optional[Union[str, List[str]]] = None, - latest_continuous_id: Optional[Union[str, List[str]]] = None, - approval_status: Optional[Union[str, List[str]]] = None, - unit_of_measure: Optional[Union[str, List[str]]] = None, - qualifier: Optional[Union[str, List[str]]] = None, - value: Optional[int] = None, - last_modified: Optional[Union[str, 
List[str]]] = None, - skipGeometry: Optional[bool] = None, - time: Optional[Union[str, List[str]]] = None, - bbox: Optional[List[float]] = None, - limit: Optional[int] = None, - max_results: Optional[int] = None, - convertType: bool = True - ) -> pd.DataFrame: + monitoring_location_id: Optional[Union[str, List[str]]] = None, + parameter_code: Optional[Union[str, List[str]]] = None, + statistic_id: Optional[Union[str, List[str]]] = None, + properties: Optional[Union[str, List[str]]] = None, + time_series_id: Optional[Union[str, List[str]]] = None, + latest_continuous_id: Optional[Union[str, List[str]]] = None, + approval_status: Optional[Union[str, List[str]]] = None, + unit_of_measure: Optional[Union[str, List[str]]] = None, + qualifier: Optional[Union[str, List[str]]] = None, + value: Optional[int] = None, + last_modified: Optional[Union[str, List[str]]] = None, + skip_geometry: Optional[bool] = None, + time: Optional[Union[str, List[str]]] = None, + bbox: Optional[List[float]] = None, + limit: Optional[int] = None, + max_results: Optional[int] = None, + convert_type: bool = True, +) -> Tuple[pd.DataFrame, BaseMetadata]: """This endpoint provides the most recent observation for each time series of continuous data. Continuous data are collected via automated sensors installed at a monitoring location. They are collected at a high frequency @@ -759,14 +732,14 @@ def get_latest_continuous( monitoring_location_id : string or list of strings, optional A unique identifier representing a single monitoring location. This corresponds to the id field in the monitoring-locations endpoint. - Monitoring location IDs are created by combining the agency code of - the agency responsible for the monitoring location (e.g. USGS) with - the ID number of the monitoring location (e.g. 02238500), separated - by a hyphen (e.g. USGS-02238500). + Monitoring location IDs are created by combining the agency code of the + agency responsible for the monitoring location (e.g. 
USGS) with the ID + number of the monitoring location (e.g. 02238500), separated by a hyphen + (e.g. USGS-02238500). parameter_code : string or list of strings, optional Parameter codes are 5-digit codes used to identify the constituent - measured and the units of measure. A complete list of parameter - codes and associated groupings can be found at + measured and the units of measure. A complete list of parameter codes + and associated groupings can be found at https://help.waterdata.usgs.gov/codes-and-parameters/parameters. statistic_id : string or list of strings, optional A code corresponding to the statistic an observation represents. @@ -774,33 +747,25 @@ def get_latest_continuous( A complete list of codes and their descriptions can be found at https://help.waterdata.usgs.gov/code/stat_cd_nm_query?stat_nm_cd=%25&fmt=html. properties : string or list of strings, optional - A vector of requested columns to be returned from the query. - Available options are: geometry, id, time_series_id, - monitoring_location_id, parameter_code, statistic_id, time, value, - unit_of_measure, approval_status, qualifier, last_modified + A vector of requested columns to be returned from the query. Available + options are: geometry, id, time_series_id, monitoring_location_id, + parameter_code, statistic_id, time, value, unit_of_measure, + approval_status, qualifier, last_modified time_series_id : string or list of strings, optional A unique identifier representing a single time series. This corresponds to the id field in the time-series-metadata endpoint. latest_continuous_id : string or list of strings, optional - A universally unique identifier (UUID) representing a single - version of a record. It is not stable over time. Every time the - record is refreshed in our database (which may happen as part of - normal operations and does not imply any change to the data itself) - a new ID will be generated. 
To uniquely identify a single observation - over time, compare the time and time_series_id fields; each time series - will only have a single observation at a given time. + A universally unique identifier (UUID) representing a single version of + a record. It is not stable over time. Every time the record is refreshed + in our database (which may happen as part of normal operations and does + not imply any change to the data itself) a new ID will be generated. To + uniquely identify a single observation over time, compare the time and + time_series_id fields; each time series will only have a single + observation at a given time. approval_status : string or list of strings, optional - Some of the data that you have obtained from this U.S. Geological - Survey database may not have received Director's approval. Any such - data values are qualified as provisional and are subject to revision. - Provisional data are released on the condition that neither the USGS - nor the United States Government may be held liable for any damages - resulting from its use. This field reflects the approval status of - each record, and is either "Approved", meaining processing review has - been completed and the data is approved for publication, or - "Provisional" and subject to revision. For more information about - provisional data, go to - https://waterdata.usgs.gov/provisional-data-statement/. + Some of the data that you have obtained from this U.S. Geological Survey + database may not have received Director's approval. Any such data values + are qualified as provisional and are subject to revision. unit_of_measure : string or list of strings, optional A human-readable description of the units of measurement associated with an observation. @@ -817,44 +782,54 @@ def get_latest_continuous( anything about the measurement has changed. You can query this field using date-times or intervals, adhering to RFC 3339, or using ISO 8601 duration objects. 
Intervals may be bounded or half-bounded (double-dots - at start or end). Examples: + at start or end). Only features that have a last_modified that + intersects the value of datetime are selected. + Examples: - A date-time: "2018-02-12T23:20:50Z" - A bounded interval: "2018-02-12T00:00:00Z/2018-03-18T12:31:12Z" - - Half-bounded intervals: "2018-02-12T00:00:00Z/.." or "../2018-03-18T12:31:12Z" - - Duration objects: "P1M" for data from the past month or "PT36H" for the last 36 hours - Only features that have a last_modified that intersects the value of datetime are selected. - skipGeometry : boolean, optional - This option can be used to skip response geometries for each feature. The returning - object will be a data frame with no spatial information. + - Half-bounded intervals: "2018-02-12T00:00:00Z/.." or + "../2018-03-18T12:31:12Z" + - Duration objects: "P1M" for data from the past month or "PT36H" + for the last 36 hours + skip_geometry : boolean, optional + This option can be used to skip response geometries for each feature. + The returning object will be a data frame with no spatial information. time : string, optional - The date an observation represents. You can query this field using date-times - or intervals, adhering to RFC 3339, or using ISO 8601 duration objects. - Intervals may be bounded or half-bounded (double-dots at start or end). + The date an observation represents. You can query this field using + date-times or intervals, adhering to RFC 3339, or using ISO 8601 + duration objects. Intervals may be bounded or half-bounded (double-dots + at start or end). Only features that have a time that intersects the + value of datetime are selected. If a feature has multiple temporal + properties, it is the decision of the server whether only a single + temporal property is used to determine the extent or all relevant + temporal properties. 
Examples: - A date-time: "2018-02-12T23:20:50Z" - A bounded interval: "2018-02-12T00:00:00Z/2018-03-18T12:31:12Z" - - Half-bounded intervals: "2018-02-12T00:00:00Z/.." or "../2018-03-18T12:31:12Z" - - Duration objects: "P1M" for data from the past month or "PT36H" for the last 36 hours - Only features that have a time that intersects the value of datetime are selected. If - a feature has multiple temporal properties, it is the decision of the server whether - only a single temporal property is used to determine the extent or all relevant temporal properties. + - Half-bounded intervals: "2018-02-12T00:00:00Z/.." or + "../2018-03-18T12:31:12Z" + - Duration objects: "P1M" for data from the past month or "PT36H" + for the last 36 hours bbox : list of numbers, optional - Only features that have a geometry that intersects the bounding box are selected. - The bounding box is provided as four or six numbers, depending on whether the - coordinate reference system includes a vertical axis (height or depth). Coordinates - are assumed to be in crs 4326. The expected format is a numeric vector structured: - c(xmin,ymin,xmax,ymax). Another way to think of it is c(Western-most longitude, - Southern-most latitude, Eastern-most longitude, Northern-most longitude). + Only features that have a geometry that intersects the bounding box are + selected. The bounding box is provided as four or six numbers, + depending on whether the coordinate reference system includes a vertical + axis (height or depth). Coordinates are assumed to be in crs 4326. The + expected format is a list of numbers structured: [xmin, ymin, xmax, ymax]. + Another way to think of it is [Western-most longitude, Southern-most + latitude, Eastern-most longitude, Northern-most latitude]. limit : numeric, optional - The optional limit parameter is used to control the subset of the selected features - that should be returned in each page. The maximum allowable limit is 10000.
It may - be beneficial to set this number lower if your internet connection is spotty. The - default (None) will set the limit to the maximum allowable limit for the service. + The optional limit parameter is used to control the subset of the + selected features that should be returned in each page. The maximum + allowable limit is 10000. It may be beneficial to set this number lower + if your internet connection is spotty. The default (None) will set the + limit to the maximum allowable limit for the service. max_results : numeric, optional - The optional maximum number of rows to return. This value must be less than the - requested limit. - convertType : boolean, optional - If True, the function will convert the data to dates and qualifier to string vector + The optional maximum number of rows to return. This value must be less + than the requested limit. + convert_type : boolean, optional + If True, the function will convert the data to dates and qualifier to + string vector Returns ------- @@ -868,93 +843,85 @@ def get_latest_continuous( >>> # Get daily flow data from a single site >>> # over a yearlong period >>> df = dataretrieval.waterdata.get_latest_continuous( - ... monitoring_location_id = "USGS-02238500", - ... parameter_code = "00060" + ... monitoring_location_id="USGS-02238500", parameter_code="00060" ... ) >>> # Get monitoring location info for specific sites >>> # and only specific properties >>> df = dataretrieval.waterdata.get_daily( - ... monitoring_location_id = ["USGS-05114000", "USGS-09423350"] + ... monitoring_location_id=["USGS-05114000", "USGS-09423350"] ... 
) """ service = "latest-continuous" output_id = "latest_continuous_id" # Build argument dictionary, omitting None values - args = { - k: v for k, v in locals().items() + args = { + k: v + for k, v in locals().items() if k not in {"service", "output_id"} and v is not None } - return waterdata_helpers.get_ogc_data(args, output_id, service) + return get_ogc_data(args, output_id, service) + def get_field_measurements( - monitoring_location_id: Optional[Union[str, List[str]]] = None, - parameter_code: Optional[Union[str, List[str]]] = None, - observing_procedure_code: Optional[Union[str, List[str]]] = None, - properties: Optional[List[str]] = None, - field_visit_id: Optional[Union[str, List[str]]] = None, - approval_status: Optional[Union[str, List[str]]] = None, - unit_of_measure: Optional[Union[str, List[str]]] = None, - qualifier: Optional[Union[str, List[str]]] = None, - value: Optional[Union[str, List[str]]] = None, - last_modified: Optional[Union[str, List[str]]] = None, - observing_procedure: Optional[Union[str, List[str]]] = None, - vertical_datum: Optional[Union[str, List[str]]] = None, - measuring_agency: Optional[Union[str, List[str]]] = None, - skipGeometry: Optional[bool] = None, - time: Optional[Union[str, List[str]]] = None, - bbox: Optional[List[float]] = None, - limit: Optional[int] = None, - max_results: Optional[int] = None, - convertType: bool = True - ) -> pd.DataFrame: - """Field measurements are physically measured values collected during - a visit to the monitoring location. Field measurements consist of - measurements of gage height and discharge, and readings of groundwater - levels, and are primarily used as calibration readings for the automated - sensors collecting continuous data. They are collected at a low frequency, - and delivery of the data in WDFN may be delayed due to data processing - time. 
+ monitoring_location_id: Optional[Union[str, List[str]]] = None, + parameter_code: Optional[Union[str, List[str]]] = None, + observing_procedure_code: Optional[Union[str, List[str]]] = None, + properties: Optional[List[str]] = None, + field_visit_id: Optional[Union[str, List[str]]] = None, + approval_status: Optional[Union[str, List[str]]] = None, + unit_of_measure: Optional[Union[str, List[str]]] = None, + qualifier: Optional[Union[str, List[str]]] = None, + value: Optional[Union[str, List[str]]] = None, + last_modified: Optional[Union[str, List[str]]] = None, + observing_procedure: Optional[Union[str, List[str]]] = None, + vertical_datum: Optional[Union[str, List[str]]] = None, + measuring_agency: Optional[Union[str, List[str]]] = None, + skip_geometry: Optional[bool] = None, + time: Optional[Union[str, List[str]]] = None, + bbox: Optional[List[float]] = None, + limit: Optional[int] = None, + max_results: Optional[int] = None, + convert_type: bool = True, +) -> Tuple[pd.DataFrame, BaseMetadata]: + """Field measurements are physically measured values collected during a + visit to the monitoring location. Field measurements consist of measurements + of gage height and discharge, and readings of groundwater levels, and are + primarily used as calibration readings for the automated sensors collecting + continuous data. They are collected at a low frequency, and delivery of the + data in WDFN may be delayed due to data processing time. Parameters ---------- monitoring_location_id : string or list of strings, optional A unique identifier representing a single monitoring location. This corresponds to the id field in the monitoring-locations endpoint. - Monitoring location IDs are created by combining the agency code of - the agency responsible for the monitoring location (e.g. USGS) with - the ID number of the monitoring location (e.g. 02238500), separated - by a hyphen (e.g. USGS-02238500). 
+ Monitoring location IDs are created by combining the agency code of the + agency responsible for the monitoring location (e.g. USGS) with the ID + number of the monitoring location (e.g. 02238500), separated by a hyphen + (e.g. USGS-02238500). parameter_code : string or list of strings, optional Parameter codes are 5-digit codes used to identify the constituent - measured and the units of measure. A complete list of parameter - codes and associated groupings can be found at + measured and the units of measure. A complete list of parameter codes + and associated groupings can be found at https://help.waterdata.usgs.gov/codes-and-parameters/parameters. observing_procedure_code : string or list of strings, optional A short code corresponding to the observing procedure for the field measurement. properties : string or list of strings, optional - A vector of requested columns to be returned from the query. - Available options are: geometry, id, time_series_id, - monitoring_location_id, parameter_code, statistic_id, time, value, - unit_of_measure, approval_status, qualifier, last_modified + A vector of requested columns to be returned from the query. Available + options are: geometry, id, time_series_id, monitoring_location_id, + parameter_code, statistic_id, time, value, unit_of_measure, + approval_status, qualifier, last_modified field_visit_id : string or list of strings, optional A universally unique identifier (UUID) for the field visit. Multiple measurements may be made during a single field visit. approval_status : string or list of strings, optional - Some of the data that you have obtained from this U.S. Geological - Survey database may not have received Director's approval. Any such - data values are qualified as provisional and are subject to revision. - Provisional data are released on the condition that neither the USGS - nor the United States Government may be held liable for any damages - resulting from its use. 
This field reflects the approval status of - each record, and is either "Approved", meaining processing review has - been completed and the data is approved for publication, or - "Provisional" and subject to revision. For more information about - provisional data, go to - https://waterdata.usgs.gov/provisional-data-statement/. + Some of the data that you have obtained from this U.S. Geological Survey + database may not have received Director's approval. Any such data values + are qualified as provisional and are subject to revision. unit_of_measure : string or list of strings, optional A human-readable description of the units of measurement associated with an observation. @@ -971,12 +938,13 @@ def get_field_measurements( anything about the measurement has changed. You can query this field using date-times or intervals, adhering to RFC 3339, or using ISO 8601 duration objects. Intervals may be bounded or half-bounded (double-dots - at start or end). Examples: + at start or end). Only features that have a last_modified that + intersects the value of datetime are selected. + Examples: - A date-time: "2018-02-12T23:20:50Z" - A bounded interval: "2018-02-12T00:00:00Z/2018-03-18T12:31:12Z" - Half-bounded intervals: "2018-02-12T00:00:00Z/.." or "../2018-03-18T12:31:12Z" - Duration objects: "P1M" for data from the past month or "PT36H" for the last 36 hours - Only features that have a last_modified that intersects the value of datetime are selected. observing_procedure : string or list of strings, optional Water measurement or water-quality observing procedure descriptions. vertical_datum : string or list of strings, optional @@ -984,38 +952,44 @@ def get_field_measurements( A list of codes is available. measuring_agency : string or list of strings, optional The agency performing the measurement. - skipGeometry : boolean, optional + skip_geometry : boolean, optional This option can be used to skip response geometries for each feature. 
The returning object will be a data frame with no spatial information. time : string, optional The date an observation represents. You can query this field using date-times or intervals, adhering to RFC 3339, or using ISO 8601 duration objects. Intervals may be bounded or half-bounded (double-dots at start or end). + Only features that have a time that intersects the value of datetime are + selected. If a feature has multiple temporal properties, it is the + decision of the server whether only a single temporal property is used + to determine the extent or all relevant temporal properties. Examples: - A date-time: "2018-02-12T23:20:50Z" - A bounded interval: "2018-02-12T00:00:00Z/2018-03-18T12:31:12Z" - - Half-bounded intervals: "2018-02-12T00:00:00Z/.." or "../2018-03-18T12:31:12Z" - - Duration objects: "P1M" for data from the past month or "PT36H" for the last 36 hours - Only features that have a time that intersects the value of datetime are selected. If - a feature has multiple temporal properties, it is the decision of the server whether - only a single temporal property is used to determine the extent or all relevant temporal properties. + - Half-bounded intervals: "2018-02-12T00:00:00Z/.." or + "../2018-03-18T12:31:12Z" + - Duration objects: "P1M" for data from the past month or "PT36H" + for the last 36 hours bbox : list of numbers, optional - Only features that have a geometry that intersects the bounding box are selected. - The bounding box is provided as four or six numbers, depending on whether the - coordinate reference system includes a vertical axis (height or depth). Coordinates - are assumed to be in crs 4326. The expected format is a numeric vector structured: - c(xmin,ymin,xmax,ymax). Another way to think of it is c(Western-most longitude, - Southern-most latitude, Eastern-most longitude, Northern-most longitude). + Only features that have a geometry that intersects the bounding box are + selected. 
The bounding box is provided as four or six numbers, + depending on whether the coordinate reference system includes a vertical + axis (height or depth). Coordinates are assumed to be in crs 4326. The + expected format is a list of numbers structured: [xmin, ymin, xmax, ymax]. + Another way to think of it is [Western-most longitude, Southern-most + latitude, Eastern-most longitude, Northern-most latitude]. limit : numeric, optional - The optional limit parameter is used to control the subset of the selected features - that should be returned in each page. The maximum allowable limit is 10000. It may - be beneficial to set this number lower if your internet connection is spotty. The - default (None) will set the limit to the maximum allowable limit for the service. + The optional limit parameter is used to control the subset of the + selected features that should be returned in each page. The maximum + allowable limit is 10000. It may be beneficial to set this number lower + if your internet connection is spotty. The default (None) will set the + limit to the maximum allowable limit for the service. max_results : numeric, optional - The optional maximum number of rows to return. This value must be less than the - requested limit. - convertType : boolean, optional - If True, the function will convert the data to dates and qualifier to string vector + The optional maximum number of rows to return. This value must be less + than the requested limit. + convert_type : boolean, optional + If True, the function will convert the data to dates and qualifier to + string vector Returns ------- @@ -1029,9 +1003,9 @@ def get_field_measurements( >>> # Get daily flow data from a single site >>> # over a yearlong period >>> df = dataretrieval.waterdata.get_field_measurements( - ... monitoring_location_id = "USGS-375907091432201", - ... parameter_code = "72019", - ... skipGeometry = True + ... monitoring_location_id="USGS-375907091432201", + ... parameter_code="72019", + ...
skip_geometry=True, ... ) >>> # Get monitoring location info for specific sites @@ -1047,16 +1021,18 @@ def get_field_measurements( output_id = "field_measurement_id" # Build argument dictionary, omitting None values - args = { - k: v for k, v in locals().items() + args = { + k: v + for k, v in locals().items() if k not in {"service", "output_id"} and v is not None } - return waterdata_helpers.get_ogc_data(args, output_id, service) - -def get_codes(code_service: _CODE_SERVICES) -> DataFrame: + return get_ogc_data(args, output_id, service) + + +def get_codes(code_service: CODE_SERVICES) -> pd.DataFrame: """Return codes from a Samples code service. - + Parameters ---------- code_service : string @@ -1064,30 +1040,31 @@ def get_codes(code_service: _CODE_SERVICES) -> DataFrame: "sitetype", "samplemedia", "characteristicgroup", "characteristics", or "observedproperty" """ - valid_code_services = get_args(_CODE_SERVICES) + valid_code_services = get_args(CODE_SERVICES) if code_service not in valid_code_services: raise ValueError( f"Invalid code service: '{code_service}'. " f"Valid options are: {valid_code_services}." 
) - url = f"{_SAMPLES_URL}/codeservice/{code_service}?mimeType=application%2Fjson" - + url = f"{SAMPLES_URL}/codeservice/{code_service}?mimeType=application%2Fjson" + response = requests.get(url) - + response.raise_for_status() data_dict = json.loads(response.text) - data_list = data_dict['data'] + data_list = data_dict["data"] df = pd.DataFrame(data_list) return df + def get_samples( ssl_check: bool = True, - service: _SERVICES = "results", - profile: _PROFILES = "fullphyschem", + service: SERVICES = "results", + profile: PROFILES = "fullphyschem", activityMediaName: Optional[Union[str, list[str]]] = None, activityStartDateLower: Optional[str] = None, activityStartDateUpper: Optional[str] = None, @@ -1110,7 +1087,7 @@ def get_samples( pointLocationWithinMiles: Optional[float] = None, projectIdentifier: Optional[Union[str, list[str]]] = None, recordIdentifierUserSupplied: Optional[Union[str, list[str]]] = None, -) -> Tuple[DataFrame, BaseMetadata]: +) -> Tuple[pd.DataFrame, BaseMetadata]: """Search Samples database for USGS water quality data. This is a wrapper function for the Samples database API. All potential filters are provided as arguments to the function, but please do not @@ -1177,7 +1154,7 @@ def get_samples( A user supplied characteristic name describing one or more results. boundingBox: list of four floats, optional Filters on the associated monitoring location's point location - by checking if it is located within the specified geographic area. + by checking if it is located within the specified geographic area. The logic is inclusive, i.e. it will include locations that overlap with the edge of the bounding box.
Values are separated by commas, expressed in decimal degrees, NAD83, and longitudes west of Greenwich @@ -1186,7 +1163,7 @@ def get_samples( - Western-most longitude - Southern-most latitude - Eastern-most longitude - - Northern-most longitude + - Northern-most latitude Example: [-92.8,44.2,-88.9,46.0] countryFips : string or list of strings, optional Example: "US" (United States) @@ -1209,7 +1186,7 @@ def get_samples( usgsPCode : string or list of strings, optional 5-digit number used in the US Geological Survey computerized data system, National Water Information System (NWIS), to - uniquely identify a specific constituent. Check the + uniquely identify a specific constituent. Check the `characteristic_lookup()` function in this module for all possible inputs. Example: "00060" (Discharge, cubic feet per second) @@ -1239,7 +1216,7 @@ def get_samples( recordIdentifierUserSupplied : string or list of strings, optional Internal AQS record identifier that returns 1 entry. Only available for the "results" service. - + Returns ------- df : ``pandas.DataFrame`` @@ -1253,8 +1230,8 @@ def get_samples( >>> # Get PFAS results within a bounding box >>> df, md = dataretrieval.waterdata.get_samples( - ... boundingBox=[-90.2,42.6,-88.7,43.2], - ... characteristicGroup="Organics, PFAS" + ... boundingBox=[-90.2, 42.6, -88.7, 43.2], + ... characteristicGroup="Organics, PFAS", ... ) >>> # Get all activities for the Commonwealth of Virginia over a date range @@ -1263,34 +1240,38 @@ def get_samples( ... profile="sampact", ... activityStartDateLower="2023-10-01", ... activityStartDateUpper="2024-01-01", - ... stateFips="US:51") + ... stateFips="US:51", + ... ) >>> # Get all pH samples for two sites in Utah >>> df, md = dataretrieval.waterdata.get_samples( - ... monitoringLocationIdentifier=['USGS-393147111462301', 'USGS-393343111454101'], - ... usgsPCode='00400') + ... monitoringLocationIdentifier=[ + ... "USGS-393147111462301", + ... "USGS-393343111454101", + ... ], + ...
usgsPCode="00400", + ... ) """ _check_profiles(service, profile) params = { - k: v for k, v in locals().items() - if k not in ["ssl_check", "service", "profile"] - and v is not None - } - + k: v + for k, v in locals().items() + if k not in ["ssl_check", "service", "profile"] and v is not None + } params.update({"mimeType": "text/csv"}) if "boundingBox" in params: params["boundingBox"] = to_str(params["boundingBox"]) - url = f"{_SAMPLES_URL}/{service}/{profile}" + url = f"{SAMPLES_URL}/{service}/{profile}" req = PreparedRequest() req.prepare_url(url, params=params) - print(f"Request: {req.url}") + logger.info("Request: %s", req.url) response = requests.get(url, params=params, verify=ssl_check) @@ -1300,9 +1281,10 @@ def get_samples( return df, BaseMetadata(response) + def _check_profiles( - service: _SERVICES, - profile: _PROFILES, + service: SERVICES, + profile: PROFILES, ) -> None: """Check whether a service profile is valid. @@ -1313,19 +1295,17 @@ def _check_profiles( profile : string One of the profile names from "results_profiles", "locations_profiles", "activities_profiles", - "projects_profiles" or "organizations_profiles". + "projects_profiles" or "organizations_profiles". """ - valid_services = get_args(_SERVICES) + valid_services = get_args(SERVICES) if service not in valid_services: raise ValueError( - f"Invalid service: '{service}'. " - f"Valid options are: {valid_services}." + f"Invalid service: '{service}'. Valid options are: {valid_services}." ) - valid_profiles = _PROFILE_LOOKUP[service] + valid_profiles = PROFILE_LOOKUP[service] if profile not in valid_profiles: raise ValueError( f"Invalid profile: '{profile}' for service '{service}'. " f"Valid options are: {valid_profiles}." 
) - diff --git a/dataretrieval/waterdata/types.py b/dataretrieval/waterdata/types.py new file mode 100644 index 00000000..07e000c0 --- /dev/null +++ b/dataretrieval/waterdata/types.py @@ -0,0 +1,56 @@ +from typing import Literal + + +CODE_SERVICES = Literal[ + "characteristicgroup", + "characteristics", + "counties", + "countries", + "observedproperty", + "samplemedia", + "sitetype", + "states", +] + +SERVICES = Literal[ + "activities", + "locations", + "organizations", + "projects", + "results", +] + +PROFILES = Literal[ + "actgroup", + "actmetric", + "basicbio", + "basicphyschem", + "count", + "fullbio", + "fullphyschem", + "labsampleprep", + "narrow", + "organization", + "project", + "projectmonitoringlocationweight", + "resultdetectionquantitationlimit", + "sampact", + "site", +] + +PROFILE_LOOKUP = { + "activities": ["sampact", "actmetric", "actgroup", "count"], + "locations": ["site", "count"], + "organizations": ["organization", "count"], + "projects": ["project", "projectmonitoringlocationweight"], + "results": [ + "fullphyschem", + "basicphyschem", + "fullbio", + "basicbio", + "narrow", + "resultdetectionquantitationlimit", + "labsampleprep", + "count", + ], +} diff --git a/dataretrieval/waterdata_helpers.py b/dataretrieval/waterdata/utils.py similarity index 50% rename from dataretrieval/waterdata_helpers.py rename to dataretrieval/waterdata/utils.py index 70e9530c..10857503 100644 --- a/dataretrieval/waterdata_helpers.py +++ b/dataretrieval/waterdata/utils.py @@ -1,54 +1,30 @@ -import httpx +import requests import os -from typing import List, Dict, Any, Optional, Union +import logging +from typing import List, Dict, Any, Optional, Union, Tuple from datetime import datetime import pandas as pd import json from zoneinfo import ZoneInfo import re -try: - import geopandas as gpd - geopd = True -except ImportError: - geopd = False - - - -BASE_API = "https://api.waterdata.usgs.gov/ogcapi/" -API_VERSION = "v0" - -# --- Caching for repeated calls --- 
-_cached_base_url = None -def _base_url(): - """ - Returns the base URL for the USGS Water Data APIs. - Uses a cached value to avoid repeated string formatting. If the cached value - is not set, it constructs the base URL using the BASE_API and API_VERSION constants. +from dataretrieval.utils import BaseMetadata - Returns: - str: The base URL for the API (e.g., "https://api.waterdata.usgs.gov/ogcapi/v0/"). - """ - global _cached_base_url - if _cached_base_url is None: - _cached_base_url = f"{BASE_API}{API_VERSION}/" - return _cached_base_url +try: + import geopandas as gpd -def _setup_api(service: str): - """ - Constructs and returns the API endpoint URL for a specified service. + GEOPANDAS = True +except ImportError: + GEOPANDAS = False - Args: - service (str): The name of the service to be used in the API endpoint. +# Set up logger for this module +logger = logging.getLogger(__name__) - Returns: - str: The full URL for the API endpoint corresponding to the given service. +BASE_URL = "https://api.waterdata.usgs.gov" +OGC_API_VERSION = "v0" +OGC_API_URL = f"{BASE_URL}/ogcapi/{OGC_API_VERSION}" +SAMPLES_URL = f"{BASE_URL}/samples-data" - Example: - >>> _setup_api("daily") - 'https://api.waterdata.usgs.gov/ogcapi/v0/collections/daily/items' - """ - return f"{_base_url()}collections/{service}/items" def _switch_arg_id(ls: Dict[str, Any], id_name: str, service: str): """ @@ -59,16 +35,25 @@ def _switch_arg_id(ls: Dict[str, Any], id_name: str, service: str): with the value from either the service name or the expected id column name. If neither key exists, "id" will be set to None. - Example: for service "time-series-metadata", the function will look for either "time_series_metadata_id" - or "time_series_id" and change the key to simply "id". + Parameters + ---------- + ls : Dict[str, Any] + The dictionary containing identifier keys to be standardized. + id_name : str + The name of the specific identifier key to look for. + service : str + The service name. 
- Args: - ls (Dict[str, Any]): The dictionary containing identifier keys to be standardized. - id_name (str): The name of the specific identifier key to look for. - service (str): The service name. + Returns + ------- + Dict[str, Any] + The modified dictionary with the "id" key set appropriately. - Returns: - Dict[str, Any]: The modified dictionary with the "id" key set appropriately. + Examples + -------- + For service "time-series-metadata", the function will look for either + "time_series_metadata_id" or "time_series_id" and change the key to simply + "id". """ service_id = service.replace("-", "_") + "_id" @@ -88,22 +73,33 @@ def _switch_arg_id(ls: Dict[str, Any], id_name: str, service: str): def _switch_properties_id(properties: Optional[List[str]], id_name: str, service: str): """ - Switch properties id from its package-specific identifier to the standardized "id" key - that the API recognizes. + Switch properties id from its package-specific identifier to the + standardized "id" key that the API recognizes. - Sets the "id" key in the provided dictionary `ls` with the value from either the service name - or the expected id column name. If neither key exists, "id" will be set to None. - - Example: for service "monitoring-locations", it will look for "monitoring_location_id" and change - it to "id". + Sets the "id" key in the provided dictionary `ls` with the value from either + the service name or the expected id column name. If neither key exists, "id" + will be set to None. - Args: - properties (List[str]): A list containing the properties or column names to be pulled from the service. - id_name (str): The name of the specific identifier key to look for. - service (str): The service name. + Parameters + ---------- + properties : Optional[List[str]] + A list containing the properties or column names to be pulled from the + service, or None. + id_name : str + The name of the specific identifier key to look for. + service : str + The service name. 
- Returns: - List[str]: The modified list with the "id" key set appropriately. + Returns + ------- + List[str] + The modified list with the "id" key set appropriately. + + Examples + -------- + For service "monitoring-locations", it will look for + "monitoring_location_id" and change + it to "id". """ if not properties: return [] @@ -119,197 +115,276 @@ def _switch_properties_id(properties: Optional[List[str]], id_name: str, service # Remove unwanted fields return [p for p in properties if p not in ["geometry", service_id]] -def _format_api_dates(datetime_input: Union[str, List[str]], date: bool = False) -> Union[str, None]: + +def _format_api_dates( + datetime_input: Union[str, List[str]], date: bool = False +) -> Union[str, None]: """ - Formats date or datetime input(s) for use with an API, handling single values or ranges, and converting to ISO 8601 or date-only formats as needed. + Formats date or datetime input(s) for use with an API. + + Handles single values or ranges, and converting to ISO 8601 or date-only + formats as needed. + Parameters ---------- datetime_input : Union[str, List[str]] - A single date/datetime string or a list of one or two date/datetime strings. Accepts formats like "%Y-%m-%d %H:%M:%S", ISO 8601, or relative periods (e.g., "P7D"). + A single date/datetime string or a list of one or two date/datetime + strings. Accepts formats like "%Y-%m-%d %H:%M:%S", ISO 8601, or relative + periods (e.g., "P7D"). date : bool, optional - If True, uses only the date portion ("YYYY-MM-DD"). If False (default), returns full datetime in UTC ISO 8601 format ("YYYY-MM-DDTHH:MM:SSZ"). + If True, uses only the date portion ("YYYY-MM-DD"). If False (default), + returns full datetime in UTC ISO 8601 format ("YYYY-MM-DDTHH:MM:SSZ"). + Returns ------- Union[str, None] - - If input is a single value, returns the formatted date/datetime string or None if parsing fails. 
- - If input is a list of two values, returns a date/datetime range string separated by "/" (e.g., "YYYY-MM-DD/YYYY-MM-DD" or "YYYY-MM-DDTHH:MM:SSZ/YYYY-MM-DDTHH:MM:SSZ"). + - If input is a single value, returns the formatted date/datetime string + or None if parsing fails. + - If input is a list of two values, returns a date/datetime range string + separated by "/" (e.g., "YYYY-MM-DD/YYYY-MM-DD" or + "YYYY-MM-DDTHH:MM:SSZ/YYYY-MM-DDTHH:MM:SSZ"). - Returns None if input is empty, all NA, or cannot be parsed. + Raises ------ ValueError If `datetime_input` contains more than two values. + Notes ----- - Handles blank or NA values by returning None. - - Supports relative period strings (e.g., "P7D") and passes them through unchanged. - - Converts datetimes to UTC and formats as ISO 8601 with 'Z' suffix when `date` is False. + - Supports relative period strings (e.g., "P7D") and passes them through + unchanged. + - Converts datetimes to UTC and formats as ISO 8601 with 'Z' suffix when + `date` is False. - For date ranges, replaces "nan" with ".." in the output. """ # Get timezone local_timezone = datetime.now().astimezone().tzinfo - + # Convert single string to list for uniform processing if isinstance(datetime_input, str): datetime_input = [datetime_input] - + # Check for null or all NA and return None if all(pd.isna(dt) or dt == "" or dt is None for dt in datetime_input): return None - if len(datetime_input) <=2: + if len(datetime_input) <= 2: # If the list is of length 1, first look for things like "P7D" or dates # already formatted in ISO 8601.
Otherwise, try to coerce to datetime - if len(datetime_input) == 1 and re.search(r"P", datetime_input[0], re.IGNORECASE) or "/" in datetime_input[0]: + if ( + len(datetime_input) == 1 + and re.search(r"P", datetime_input[0], re.IGNORECASE) + or "/" in datetime_input[0] + ): return datetime_input[0] # Otherwise, use list comprehension to parse dates else: try: # Parse to naive datetime - parsed_dates = [datetime.strptime(dt, "%Y-%m-%d %H:%M:%S") for dt in datetime_input] + parsed_dates = [ + datetime.strptime(dt, "%Y-%m-%d %H:%M:%S") for dt in datetime_input + ] except Exception: # Parse to date only try: - parsed_dates = [datetime.strptime(dt, "%Y-%m-%d") for dt in datetime_input] + parsed_dates = [ + datetime.strptime(dt, "%Y-%m-%d") for dt in datetime_input + ] except Exception: return None - # If the service only accepts dates for this input, not datetimes (e.g. "daily"), - # return just the dates separated by a "/", otherwise, return the datetime in UTC - # format. + # If the service only accepts dates for this input, not + # datetimes (e.g. "daily"), return just the dates separated by a + # "/", otherwise, return the datetime in UTC format. if date: return "/".join(dt.strftime("%Y-%m-%d") for dt in parsed_dates) else: - parsed_locals = [dt.replace(tzinfo=local_timezone) for dt in parsed_dates] - formatted = "/".join(dt.astimezone(ZoneInfo("UTC")).strftime("%Y-%m-%dT%H:%M:%SZ") for dt in parsed_locals) + parsed_locals = [ + dt.replace(tzinfo=local_timezone) for dt in parsed_dates + ] + formatted = "/".join( + dt.astimezone(ZoneInfo("UTC")).strftime("%Y-%m-%dT%H:%M:%SZ") + for dt in parsed_locals + ) return formatted else: raise ValueError("datetime_input should only include 1-2 values") -def _cql2_param(args): + +def _cql2_param(args: Dict[str, Any]) -> str: + """ + Convert query parameters to CQL2 JSON format for POST requests. + + Parameters + ---------- + args : Dict[str, Any] + Dictionary of query parameters to convert to CQL2 format. 
+ + Returns + ------- + str + JSON string representation of the CQL2 query. + """ filters = [] for key, values in args.items(): - filters.append({ - "op": "in", - "args": [ - {"property": key}, - values - ] - }) - - query = { - "op": "and", - "args": filters - } + filters.append({"op": "in", "args": [{"property": key}, values]}) + + query = {"op": "and", "args": filters} return json.dumps(query, indent=4) + def _default_headers(): """ Generate default HTTP headers for API requests. - Returns: - dict: A dictionary containing default headers including 'Accept-Encoding', - 'Accept', 'User-Agent', and 'lang'. If the environment variable 'API_USGS_PAT' - is set, its value is included as the 'X-Api-Key' header. + Returns + ------- + dict + A dictionary containing default headers including 'Accept-Encoding', + 'Accept', 'User-Agent', and 'lang'. If the environment variable + 'API_USGS_PAT' is set, its value is included as the 'X-Api-Key' header. """ headers = { "Accept-Encoding": "compress, gzip", "Accept": "application/json", "User-Agent": "python-dataretrieval/1.0", - "lang": "en-US" + "lang": "en-US", } token = os.getenv("API_USGS_PAT") if token: headers["X-Api-Key"] = token return headers + def _check_ogc_requests(endpoint: str = "daily", req_type: str = "queryables"): """ - Sends an HTTP GET request to the specified OGC endpoint and request type, returning the JSON response. + Sends an HTTP GET request to the specified OGC endpoint and request type, + returning the JSON response. - Args: - endpoint (str): The OGC collection endpoint to query. Defaults to "daily". - req_type (str): The type of request to make. Must be either "queryables" or "schema". Defaults to "queryables". + Parameters + ---------- + endpoint : str, optional + The OGC collection endpoint to query (default is "daily"). + req_type : str, optional + The type of request to make. Must be either "queryables" or "schema" + (default is "queryables"). 
- Returns: - dict: The JSON response from the OGC endpoint. + Returns + ------- + dict + The JSON response from the OGC endpoint. - Raises: - AssertionError: If req_type is not "queryables" or "schema". - httpx.HTTPStatusError: If the HTTP request returns an unsuccessful status code. + Raises + ------ + AssertionError + If req_type is not "queryables" or "schema". + requests.HTTPError + If the HTTP request returns an unsuccessful status code. """ assert req_type in ["queryables", "schema"] - url = f"{_base_url()}collections/{endpoint}/{req_type}" - resp = httpx.get(url, headers=_default_headers()) + url = f"{OGC_API_URL}/collections/{endpoint}/{req_type}" + resp = requests.get(url, headers=_default_headers()) resp.raise_for_status() return resp.json() -def _error_body(resp: httpx.Response): + +def _error_body(resp: requests.Response): """ Provide more informative error messages based on the response status. - Args: - resp (httpx.Response): The HTTP response object to extract the error message from. + Parameters + ---------- + resp : requests.Response + The HTTP response object to extract the error message from. - Returns: - str: The extracted error message. For status code 429, returns the 'message' field from the JSON error object. - For status code 403, returns a predefined message indicating possible reasons for denial. - For other status codes, returns the raw response text. + Returns + ------- + str + The extracted error message. For status code 429, returns the 'message' + field from the JSON error object. For status code 403, returns a + predefined message indicating possible reasons for denial. For other + status codes, returns the raw response text. """ if resp.status_code == 429: - return resp.json().get('error', {}).get('message') + return resp.json().get("error", {}).get("message") elif resp.status_code == 403: return "Query request denied. Possible reasons include query exceeding server limits." 
return resp.text + def _construct_api_requests( service: str, properties: Optional[List[str]] = None, bbox: Optional[List[float]] = None, limit: Optional[int] = None, max_results: Optional[int] = None, - skipGeometry: bool = False, - **kwargs + skip_geometry: bool = False, + **kwargs, ): """ Constructs an HTTP request object for the specified water data API service. - Depending on the input parameters (whether there's lists of multiple argument values), - the function determines whether to use a GET or POST request, formats parameters - appropriately, and sets required headers. - - Args: - service (str): The name of the API service to query (e.g., "daily"). - properties (Optional[List[str]], optional): List of property names to include in the request. - bbox (Optional[List[float]], optional): Bounding box coordinates as a list of floats. - limit (Optional[int], optional): Maximum number of results to return per request. - max_results (Optional[int], optional): Maximum number of rows to return. - skipGeometry (bool, optional): Whether to exclude geometry from the response. - **kwargs: Additional query parameters, including date/time filters and other API-specific options. - Returns: - httpx.Request: The constructed HTTP request object ready to be sent. - Raises: - ValueError: If `limit` is greater than `max_results`. - Notes: - - Date/time parameters are automatically formatted to ISO8601. - - If multiple values are provided for non-single parameters, a POST request is constructed. - - The function sets appropriate headers for GET and POST requests. + + Depending on the input parameters (whether there's lists of multiple + argument values), the function determines whether to use a GET or POST + request, formats parameters appropriately, and sets required headers. + + Parameters + ---------- + service : str + The name of the API service to query (e.g., "daily"). + properties : Optional[List[str]], optional + List of property names to include in the request. 
+ bbox : Optional[List[float]], optional + Bounding box coordinates as a list of floats. + limit : Optional[int], optional + Maximum number of results to return per request. + max_results : Optional[int], optional + Maximum number of rows to return. + skip_geometry : bool, optional + Whether to exclude geometry from the response (default is False). + **kwargs + Additional query parameters, including date/time filters and other + API-specific options. + + Returns + ------- + requests.PreparedRequest + The constructed HTTP request object ready to be sent. + + Raises + ------ + ValueError + If `limit` is greater than `max_results`. + + Notes + ----- + - Date/time parameters are automatically formatted to ISO8601. + - If multiple values are provided for non-single parameters, a POST request + is constructed. + - The function sets appropriate headers for GET and POST requests. """ - baseURL = _setup_api(service) + service_url = f"{OGC_API_URL}/collections/{service}/items" # Single parameters can only have one value single_params = {"datetime", "last_modified", "begin", "end", "time"} - + # Identify which parameters should be included in the POST content body post_params = { - k: v for k, v in kwargs.items() - if k not in single_params and isinstance(v, (list, tuple)) and len(v) > 1 - } - + k: v + for k, v in kwargs.items() + if k not in single_params and isinstance(v, (list, tuple)) and len(v) > 1 + } + # Everything else goes into the params dictionary for the URL params = {k: v for k, v in kwargs.items() if k not in post_params} - # Set skipGeometry parameter - params["skipGeometry"] = skipGeometry + # Set skipGeometry parameter (API expects camelCase) + params["skipGeometry"] = skip_geometry # If limit is none and max_results is not none, then set limit to max results. Otherwise, # if max_results is none, set it to 10000 (the API max). 
- params["limit"] = max_results if limit is None and max_results is not None else limit or 10000 + params["limit"] = ( + max_results if limit is None and max_results is not None else limit or 10000 + ) # Add max results as a parameter if it is not None if max_results is not None: params["max_results"] = max_results @@ -338,66 +413,93 @@ def _construct_api_requests( if POST: headers["Content-Type"] = "application/query-cql-json" - req = httpx.Request(method="POST", url=baseURL, headers=headers, content=_cql2_param(post_params), params=params) + request = requests.Request( + method="POST", + url=service_url, + headers=headers, + data=_cql2_param(post_params), + params=params, + ) else: - req = httpx.Request(method="GET", url=baseURL, headers=headers, params=params) - return req + request = requests.Request( + method="GET", + url=service_url, + headers=headers, + params=params, + ) + return request.prepare() -def _next_req_url(resp: httpx.Response) -> Optional[str]: - """ - Extracts the URL for the next page of results from an HTTP response from a water data endpoint. - Parameters: - resp (httpx.Response): The HTTP response object containing JSON data and headers. +def _next_req_url(resp: requests.Response) -> Optional[str]: + """ + Extracts the URL for the next page of results from an HTTP response from a + water data endpoint. - Returns: - Optional[str]: The URL for the next page of results if available, otherwise None. + Parameters + ---------- + resp : requests.Response + The HTTP response object containing JSON data and headers. - Side Effects: - If the environment variable "API_USGS_PAT" is set, prints the remaining requests for the current hour. - Prints the next URL if found. + Returns + ------- + Optional[str] + The URL for the next page of results if available, otherwise None. - Notes: - - Expects the response JSON to contain a "links" list with objects having "rel" and "href" keys. - - Checks for the "next" relation in the "links" to determine the next URL. 
+ Notes + ----- + - If the environment variable "API_USGS_PAT" is set, logs the remaining + requests for the current hour. + - Logs the next URL if found at debug level. + - Expects the response JSON to contain a "links" list with objects having + "rel" and "href" keys. + - Checks for the "next" relation in the "links" to determine the next URL. """ body = resp.json() if not body.get("numberReturned"): return None header_info = resp.headers if os.getenv("API_USGS_PAT", ""): - print("Remaining requests this hour:", header_info.get("x-ratelimit-remaining", "")) + logger.info( + "Remaining requests this hour: %s", + header_info.get("x-ratelimit-remaining", ""), + ) for link in body.get("links", []): if link.get("rel") == "next": next_url = link.get("href") - print(f"Next URL: {next_url}") + logger.debug("Next URL: %s", next_url) return next_url return None -def _get_resp_data(resp: httpx.Response, geopd: bool) -> pd.DataFrame: + +def _get_resp_data(resp: requests.Response, geopd: bool) -> pd.DataFrame: """ - Extracts and normalizes data from an httpx.Response object containing GeoJSON features. + Extracts and normalizes data from an HTTP response containing GeoJSON features. - Parameters: - resp (httpx.Response): The HTTP response object expected to contain a JSON body with a "features" key. - geopd (bool): Indicates whether geopandas is installed and should be used to handle geometries. + Parameters + ---------- + resp : requests.Response + The HTTP response object expected to contain a JSON body with a "features" key. + geopd : bool + Indicates whether geopandas is installed and should be used to handle geometries. - Returns: - gpd.GeoDataFrame or pd.DataFrame: A geopandas GeoDataFrame if geometry is included, or a - pandas DataFrame containing the feature properties and each row's service-specific id. 
+ Returns + ------- + gpd.GeoDataFrame or pd.DataFrame + A geopandas GeoDataFrame if geometry is included, or a pandas DataFrame + containing the feature properties and each row's service-specific id. Returns an empty pandas DataFrame if no features are returned. """ # Check if it's an empty response body = resp.json() if not body.get("numberReturned"): return pd.DataFrame() - + # If geopandas not installed, return a pandas dataframe if not geopd: - df = pd.json_normalize( - body["features"], - sep="_") - df = df.drop(columns=["type", "geometry", "AsGeoJSON(geometry)"], errors="ignore") + df = pd.json_normalize(body["features"], sep="_") + df = df.drop( + columns=["type", "geometry", "AsGeoJSON(geometry)"], errors="ignore" + ) df.columns = [col.replace("properties_", "") for col in df.columns] df.rename(columns={"geometry_coordinates": "geometry"}, inplace=True) return df @@ -414,25 +516,36 @@ def _get_resp_data(resp: httpx.Response, geopd: bool) -> pd.DataFrame: return df -def _walk_pages(geopd: bool, req: httpx.Request, max_results: Optional[int], client: Optional[httpx.Client] = None) -> pd.DataFrame: + +def _walk_pages( + geopd: bool, + req: requests.PreparedRequest, + max_results: Optional[int], + client: Optional[requests.Session] = None, +) -> Tuple[pd.DataFrame, requests.Response]: """ Iterates through paginated API responses and aggregates the results into a single DataFrame. Parameters ---------- geopd : bool - Indicates whether geopandas is installed and should be used for handling geometries. - req : httpx.Request + Indicates whether geopandas is installed and should be used for handling + geometries. + req : requests.PreparedRequest The initial HTTP request to send. max_results : Optional[int] - Maximum number of rows to return. If None or NaN, retrieves all available pages. - client : Optional[httpx.Client], default None - An optional HTTP client to use for requests. If not provided, a new client is created. + Maximum number of rows to return. 
If None or NaN, retrieves all + available pages. + client : Optional[requests.Session], default None + An optional HTTP client to use for requests. If not provided, a new + client is created. Returns ------- pd.DataFrame A DataFrame containing the aggregated results from all pages. + requests.Response + The initial response object containing metadata about the first request. Raises ------ @@ -441,48 +554,71 @@ def _walk_pages(geopd: bool, req: httpx.Request, max_results: Optional[int], cli Notes ----- - - If `max_results` is None or NaN, the function will continue to request subsequent pages until no more pages are available. - - Failed requests are tracked and reported, but do not halt the entire process unless the initial request fails. + - If `max_results` is None or NaN, the function will continue to request + subsequent pages until no more pages are available. + - Failed requests are tracked and reported, but do not halt the entire + process unless the initial request fails. """ - print(f"Requesting:\n{req.url}") + logger.info("Requesting: %s", req.url) if not geopd: - print("Geopandas is not installed. Data frames containing geometry will be returned as pandas DataFrames.") + logger.warning( + "Geopandas is not installed. "
+ "Geometries will be flattened into pandas DataFrames.", + ) # Get first response from client # using GET or POST call - client = client or httpx.Client() - resp = client.send(req) - if resp.status_code != 200: raise Exception(_error_body(resp)) - - # Grab some aspects of the original request: headers and the - # request type (GET or POST) - method = req.method.upper() - headers = req.headers - content = req.content if method == "POST" else None - - if max_results is None or pd.isna(max_results): - dfs = _get_resp_data(resp, geopd=geopd) - curr_url = _next_req_url(resp) - failures = [] - while curr_url: - try: - resp = client.request(method, curr_url, headers=headers, content=content if method == "POST" else None) - if resp.status_code != 200: raise Exception(_error_body(resp)) - df1 = _get_resp_data(resp, geopd=geopd) - dfs = pd.concat([dfs, df1], ignore_index=True) - curr_url = _next_req_url(resp) - except Exception: - failures.append(curr_url) - curr_url = None - if failures: - print(f"There were {len(failures)} failed requests.") - return dfs - else: - resp.raise_for_status() - return _get_resp_data(resp, geopd=geopd) + close_client = client is None + client = client or requests.Session() + try: + resp = client.send(req) + if resp.status_code != 200: + raise Exception(_error_body(resp)) + + # Store the initial response for metadata + initial_response = resp + + # Grab some aspects of the original request: headers and the + # request type (GET or POST) + method = req.method.upper() + headers = dict(req.headers) + content = req.body if method == "POST" else None + + if max_results is None or pd.isna(max_results): + dfs = _get_resp_data(resp, geopd=geopd) + curr_url = _next_req_url(resp) + failures = [] + while curr_url: + try: + resp = client.request( + method, + curr_url, + headers=headers, + data=content if method == "POST" else None, + ) + if resp.status_code != 200: + raise Exception(_error_body(resp)) + df1 = _get_resp_data(resp, geopd=geopd) + dfs = 
pd.concat([dfs, df1], ignore_index=True) + curr_url = _next_req_url(resp) + except Exception: + failures.append(curr_url) + curr_url = None + if failures: + logger.warning("There were %d failed requests.", len(failures)) + return dfs, initial_response + else: + resp.raise_for_status() + return _get_resp_data(resp, geopd=geopd), initial_response + finally: + if close_client: + client.close() -def _deal_with_empty(return_list: pd.DataFrame, properties: Optional[List[str]], service: str) -> pd.DataFrame: + +def _deal_with_empty( + return_list: pd.DataFrame, properties: Optional[List[str]], service: str +) -> pd.DataFrame: """ Handles empty DataFrame results by returning a DataFrame with appropriate columns. @@ -490,13 +626,19 @@ def _deal_with_empty(return_list: pd.DataFrame, properties: Optional[List[str]], - If `properties` is not provided or contains only NaN values, retrieves the schema properties from the specified service. - Otherwise, uses the provided `properties` list as column names. - Args: - return_list (pd.DataFrame): The DataFrame to check for emptiness. - properties (Optional[List[str]]): List of property names to use as columns, or None. - service (str): The service endpoint to query for schema properties if needed. + Parameters + ---------- + return_list : pd.DataFrame + The DataFrame to check for emptiness. + properties : Optional[List[str]] + List of property names to use as columns, or None. + service : str + The service endpoint to query for schema properties if needed. - Returns: - pd.DataFrame: The original DataFrame if not empty, otherwise an empty DataFrame with the appropriate columns. + Returns + ------- + pd.DataFrame + The original DataFrame if not empty, otherwise an empty DataFrame with the appropriate columns. 
""" if return_list.empty: if not properties or all(pd.isna(properties)): @@ -505,7 +647,10 @@ def _deal_with_empty(return_list: pd.DataFrame, properties: Optional[List[str]], return pd.DataFrame(columns=properties) return return_list -def _arrange_cols(df: pd.DataFrame, properties: Optional[List[str]], output_id: str) -> pd.DataFrame: + +def _arrange_cols( + df: pd.DataFrame, properties: Optional[List[str]], output_id: str +) -> pd.DataFrame: """ Rearranges and renames columns in a DataFrame based on provided properties and service's output id. @@ -526,7 +671,7 @@ def _arrange_cols(df: pd.DataFrame, properties: Optional[List[str]], output_id: if properties and not all(pd.isna(properties)): if "id" not in properties: # If user refers to service-specific output id in properties, - # then rename the "id" column to the output_id (id column is + # then rename the "id" column to the output_id (id column is # automatically included). if output_id in properties: df = df.rename(columns={"id": output_id}) @@ -541,6 +686,7 @@ def _arrange_cols(df: pd.DataFrame, properties: Optional[List[str]], output_id: else: return df.rename(columns={"id": output_id}) + def _cleanup_cols(df: pd.DataFrame, service: str = "daily") -> pd.DataFrame: """ Cleans and standardizes columns in a pandas DataFrame for water data endpoints. @@ -569,27 +715,37 @@ def _cleanup_cols(df: pd.DataFrame, service: str = "daily") -> pd.DataFrame: df[col] = pd.to_numeric(df[col], errors="coerce") return df -def get_ogc_data(args: Dict[str, Any], output_id: str, service: str) -> pd.DataFrame: + +def get_ogc_data( + args: Dict[str, Any], output_id: str, service: str +) -> Tuple[pd.DataFrame, BaseMetadata]: """ - Retrieves OGC (Open Geospatial Consortium) data from a specified water data endpoint and returns it as a pandas DataFrame. + Retrieves OGC (Open Geospatial Consortium) data from a specified water data endpoint and returns it as a pandas DataFrame with metadata. 
This function prepares request arguments, constructs API requests, handles pagination, processes the results, and formats the output DataFrame according to the specified parameters. - Args: - args (Dict[str, Any]): Dictionary of request arguments for the OGC service. - output_id (str): The name of the output identifier to use in the request. - service (str): The OGC service type (e.g., "wfs", "wms"). + Parameters + ---------- + args : Dict[str, Any] + Dictionary of request arguments for the OGC service. + output_id : str + The name of the output identifier to use in the request. + service : str + The OGC collection to query (e.g., "daily", "monitoring-locations"). - Returns: - pd.DataFrame or gpd.GeoDataFrame: A DataFrame containing the retrieved and processed OGC data, - with metadata attributes including the request URL and query timestamp. + Returns + ------- + pd.DataFrame or gpd.GeoDataFrame + A DataFrame containing the retrieved and processed OGC data. + BaseMetadata + A metadata object containing request information including URL and query time. - Notes: - - The function does not mutate the input `args` dictionary. - - Handles optional arguments such as `max_results` and `convertType`. - - Applies column cleanup and reordering based on service and properties. - - Metadata is attached to the DataFrame via the `.attrs` attribute. + Notes + ----- + - The function does not mutate the input `args` dictionary. + - Handles optional arguments such as `max_results` and `convert_type`. + - Applies column cleanup and reordering based on service and properties. 
""" args = args.copy() # Add service as an argument @@ -600,22 +756,26 @@ def get_ogc_data(args: Dict[str, Any], output_id: str, service: str) -> pd.DataF args = _switch_arg_id(args, id_name=output_id, service=service) properties = args.get("properties") # Switch properties id to "id" if needed - args["properties"] = _switch_properties_id(properties, id_name=output_id, service=service) - convertType = args.pop("convertType", False) + args["properties"] = _switch_properties_id( + properties, id_name=output_id, service=service + ) + convert_type = args.pop("convert_type", False) # Create fresh dictionary of args without any None values args = {k: v for k, v in args.items() if v is not None} # Build API request req = _construct_api_requests(**args) # Run API request and iterate through pages if needed - return_list = _walk_pages(geopd=geopd, req=req, max_results=max_results) + return_list, response = _walk_pages( + geopd=GEOPANDAS, req=req, max_results=max_results + ) # Manage some aspects of the returned dataset return_list = _deal_with_empty(return_list, properties, service) - if convertType: + if convert_type: return_list = _cleanup_cols(return_list, service=service) return_list = _arrange_cols(return_list, properties, output_id) - # Add metadata - return_list.attrs.update(request=req.url, queryTime=pd.Timestamp.now()) - return return_list + # Create metadata object from response + metadata = BaseMetadata(response) + return return_list, metadata # def _get_description(service: str): @@ -627,14 +787,13 @@ def get_ogc_data(args: Dict[str, Any], output_id: str, service: str) -> pd.DataF # def _get_params(service: str): # url = f"{_base_url()}collections/{service}/schema" -# resp = httpx.get(url, headers=_default_headers()) +# resp = requests.get(url, headers=_default_headers()) # resp.raise_for_status() # properties = resp.json().get("properties", {}) # return {k: v.get("description") for k, v in properties.items()} # def _get_collection(): # url = 
f"{_base_url()}openapi?f=json" -# resp = httpx.get(url, headers=_default_headers()) +# resp = requests.get(url, headers=_default_headers()) # resp.raise_for_status() # return resp.json() - diff --git a/tests/nldi_test.py b/tests/nldi_test.py index c4d6675f..9993a899 100644 --- a/tests/nldi_test.py +++ b/tests/nldi_test.py @@ -47,7 +47,7 @@ def test_get_basin(requests_mock): f"{NLDI_API_BASE_URL}/WQP/USGS-054279485/basin" f"?simplified=true&splitCatchment=false" ) - response_file_path = "data/nldi_get_basin.json" + response_file_path = "tests/data/nldi_get_basin.json" mock_request_data_sources(requests_mock) mock_request(requests_mock, request_url, response_file_path) @@ -62,7 +62,7 @@ def test_get_flowlines(requests_mock): f"{NLDI_API_BASE_URL}/WQP/USGS-054279485/navigation/UM/flowlines" f"?distance=5&trimStart=false" ) - response_file_path = "data/nldi_get_flowlines.json" + response_file_path = "tests/data/nldi_get_flowlines.json" mock_request_data_sources(requests_mock) mock_request(requests_mock, request_url, response_file_path) @@ -78,7 +78,7 @@ def test_get_flowlines_by_comid(requests_mock): request_url = ( f"{NLDI_API_BASE_URL}/comid/13294314/navigation/UM/flowlines?distance=50" ) - response_file_path = "data/nldi_get_flowlines_by_comid.json" + response_file_path = "tests/data/nldi_get_flowlines_by_comid.json" mock_request_data_sources(requests_mock) mock_request(requests_mock, request_url, response_file_path) @@ -94,7 +94,7 @@ def test_features_by_feature_source_with_navigation(requests_mock): request_url = ( f"{NLDI_API_BASE_URL}/WQP/USGS-054279485/navigation/UM/nwissite?distance=50" ) - response_file_path = "data/nldi_get_features_by_feature_source_with_nav_mode.json" + response_file_path = "tests/data/nldi_get_features_by_feature_source_with_nav_mode.json" mock_request_data_sources(requests_mock) mock_request(requests_mock, request_url, response_file_path) @@ -115,7 +115,7 @@ def test_features_by_feature_source_without_navigation(requests_mock): """ 
request_url = f"{NLDI_API_BASE_URL}/WQP/USGS-054279485" response_file_path = ( - "data/nldi_get_features_by_feature_source_without_nav_mode.json" + "tests/data/nldi_get_features_by_feature_source_without_nav_mode.json" ) mock_request_data_sources(requests_mock) mock_request(requests_mock, request_url, response_file_path) @@ -128,7 +128,7 @@ def test_features_by_feature_source_without_navigation(requests_mock): def test_get_features_by_comid(requests_mock): """Tests NLDI get features query using comid as the origin""" request_url = f"{NLDI_API_BASE_URL}/comid/13294314/navigation/UM/WQP?distance=5" - response_file_path = "data/nldi_get_features_by_comid.json" + response_file_path = "tests/data/nldi_get_features_by_comid.json" mock_request_data_sources(requests_mock) mock_request(requests_mock, request_url, response_file_path) @@ -144,7 +144,7 @@ def test_get_features_by_lat_long(requests_mock): request_url = ( f"{NLDI_API_BASE_URL}/comid/position?coords=POINT%28-89.509%2043.087%29" ) - response_file_path = "data/nldi_get_features_by_lat_long.json" + response_file_path = "tests/data/nldi_get_features_by_lat_long.json" mock_request_data_sources(requests_mock) mock_request(requests_mock, request_url, response_file_path) @@ -156,7 +156,7 @@ def test_get_features_by_lat_long(requests_mock): def test_search_for_basin(requests_mock): """Tests NLDI search query for basin""" request_url = f"{NLDI_API_BASE_URL}/WQP/USGS-054279485/basin" - response_file_path = "data/nldi_get_basin.json" + response_file_path = "tests/data/nldi_get_basin.json" mock_request_data_sources(requests_mock) mock_request(requests_mock, request_url, response_file_path) @@ -172,7 +172,7 @@ def test_search_for_basin(requests_mock): def test_search_for_flowlines(requests_mock): """Tests NLDI search query for flowlines""" request_url = f"{NLDI_API_BASE_URL}/WQP/USGS-054279485/navigation/UM/flowlines" - response_file_path = "data/nldi_get_flowlines.json" + response_file_path = 
"tests/data/nldi_get_flowlines.json" mock_request_data_sources(requests_mock) mock_request(requests_mock, request_url, response_file_path) @@ -191,7 +191,7 @@ def test_search_for_flowlines(requests_mock): def test_search_for_flowlines_by_comid(requests_mock): """Tests NLDI search query for flowlines by comid""" request_url = f"{NLDI_API_BASE_URL}/comid/13294314/navigation/UM/flowlines" - response_file_path = "data/nldi_get_flowlines_by_comid.json" + response_file_path = "tests/data/nldi_get_flowlines_by_comid.json" mock_request_data_sources(requests_mock) mock_request(requests_mock, request_url, response_file_path) @@ -207,7 +207,7 @@ def test_search_for_features_by_feature_source_with_navigation(requests_mock): request_url = ( f"{NLDI_API_BASE_URL}/WQP/USGS-054279485/navigation/UM/nwissite?distance=50" ) - response_file_path = "data/nldi_get_features_by_feature_source_with_nav_mode.json" + response_file_path = "tests/data/nldi_get_features_by_feature_source_with_nav_mode.json" mock_request_data_sources(requests_mock) mock_request(requests_mock, request_url, response_file_path) @@ -228,7 +228,7 @@ def test_search_for_features_by_feature_source_without_navigation(requests_mock) """Tests NLDI search query for features by feature source""" request_url = f"{NLDI_API_BASE_URL}/WQP/USGS-054279485" response_file_path = ( - "data/nldi_get_features_by_feature_source_without_nav_mode.json" + "tests/data/nldi_get_features_by_feature_source_without_nav_mode.json" ) mock_request_data_sources(requests_mock) mock_request(requests_mock, request_url, response_file_path) @@ -245,7 +245,7 @@ def test_search_for_features_by_feature_source_without_navigation(requests_mock) def test_search_for_features_by_comid(requests_mock): """Tests NLDI search query for features by comid""" request_url = f"{NLDI_API_BASE_URL}/comid/13294314/navigation/UM/WQP?distance=5" - response_file_path = "data/nldi_get_features_by_comid.json" + response_file_path = "tests/data/nldi_get_features_by_comid.json" 
mock_request_data_sources(requests_mock) mock_request(requests_mock, request_url, response_file_path) @@ -267,7 +267,7 @@ def test_search_for_features_by_lat_long(requests_mock): request_url = ( f"{NLDI_API_BASE_URL}/comid/position?coords=POINT%28-89.509%2043.087%29" ) - response_file_path = "data/nldi_get_features_by_lat_long.json" + response_file_path = "tests/data/nldi_get_features_by_lat_long.json" mock_request_data_sources(requests_mock) mock_request(requests_mock, request_url, response_file_path) diff --git a/tests/waterdata_test.py b/tests/waterdata_test.py index d0e7a49e..0f46e231 100755 --- a/tests/waterdata_test.py +++ b/tests/waterdata_test.py @@ -11,8 +11,8 @@ get_latest_continuous, get_field_measurements, get_time_series_metadata, - _SERVICES, - _PROFILES + SERVICES, + PROFILES, ) def mock_request(requests_mock, request_url, file_path): @@ -29,7 +29,7 @@ def test_mock_get_samples(requests_mock): "activityMediaName=Water&activityStartDateLower=2020-01-01" "&activityStartDateUpper=2024-12-31&monitoringLocationIdentifier=USGS-05406500&mimeType=text%2Fcsv" ) - response_file_path = "data/samples_results.txt" + response_file_path = "tests/data/samples_results.txt" mock_request(requests_mock, request_url, response_file_path) df, md = get_samples( service="results", @@ -112,7 +112,7 @@ def test_samples_organizations(): assert df.size == 3 def test_get_daily(): - df = get_daily( + df, metadata = get_daily( monitoring_location_id="USGS-05427718", parameter_code="00060", time="2025-01-01/.." 
@@ -123,10 +123,12 @@ def test_get_daily(): assert df.parameter_code.unique().tolist() == ["00060"] assert df.monitoring_location_id.unique().tolist() == ["USGS-05427718"] assert df["time"].apply(lambda x: isinstance(x, datetime.date)).all() + assert hasattr(metadata, 'url') + assert hasattr(metadata, 'query_time') assert df["value"].dtype == "float64" def test_get_daily_properties(): - df = get_daily( + df, metadata = get_daily( monitoring_location_id="USGS-05427718", parameter_code="00060", time="2025-01-01/..", @@ -135,39 +137,49 @@ def test_get_daily_properties(): assert "daily_id" in df.columns assert "geometry" in df.columns assert df.shape[1] == 6 - assert (df["time"] >= datetime.date(2025, 1, 1)).all() + assert df.parameter_code.unique().tolist() == ["00060"] + assert hasattr(metadata, 'url') + assert hasattr(metadata, 'query_time') def test_get_daily_no_geometry(): - df = get_daily( + df, metadata = get_daily( monitoring_location_id="USGS-05427718", parameter_code="00060", time="2025-01-01/..", - skipGeometry=True + skip_geometry=True ) assert "geometry" not in df.columns assert df.shape[1] == 11 assert isinstance(df, DataFrame) + assert hasattr(metadata, 'url') + assert hasattr(metadata, 'query_time') def test_get_monitoring_locations(): - df = get_monitoring_locations( + df, metadata = get_monitoring_locations( state_name="Connecticut", site_type_code="GW" ) assert df.site_type_code.unique().tolist() == ["GW"] + assert hasattr(metadata, 'url') + assert hasattr(metadata, 'query_time') def test_get_monitoring_locations_hucs(): - df = get_monitoring_locations( + df, metadata = get_monitoring_locations( hydrologic_unit_code=["010802050102", "010802050103"] ) assert set(df.hydrologic_unit_code.unique().tolist()) == {"010802050102", "010802050103"} + assert hasattr(metadata, 'url') + assert hasattr(metadata, 'query_time') def test_get_latest_continuous(): - df = get_latest_continuous( + df, metadata = get_latest_continuous( 
monitoring_location_id=["USGS-05427718", "USGS-05427719"], parameter_code=["00060", "00065"] ) assert df.shape[0] <= 4 assert df.statistic_id.unique().tolist() == ["00011"] + assert hasattr(metadata, 'url') + assert hasattr(metadata, 'query_time') try: datetime.datetime.strptime(df['time'].iloc[0], "%Y-%m-%dT%H:%M:%S+00:00") out=True @@ -176,22 +188,26 @@ def test_get_latest_continuous(): assert out def test_get_field_measurements(): - df = get_field_measurements( + df, metadata = get_field_measurements( monitoring_location_id="USGS-05427718", unit_of_measure="ft^3/s", time="2025-01-01/2025-10-01", - skipGeometry=True + skip_geometry=True ) assert "field_measurement_id" in df.columns assert "geometry" not in df.columns assert df.unit_of_measure.unique().tolist() == ["ft^3/s"] + assert hasattr(metadata, 'url') + assert hasattr(metadata, 'query_time') def test_get_time_series_metadata(): - df = get_time_series_metadata( + df, metadata = get_time_series_metadata( bbox=[-89.840355,42.853411,-88.818626,43.422598], parameter_code=["00060", "00065", "72019"], - skipGeometry=True + skip_geometry=True ) assert set(df['parameter_name'].unique().tolist()) == {"Gage height", "Water level, depth LSD", "Discharge"} + assert hasattr(metadata, 'url') + assert hasattr(metadata, 'query_time') diff --git a/tests/waterservices_test.py b/tests/waterservices_test.py index 19cc30fb..449650aa 100755 --- a/tests/waterservices_test.py +++ b/tests/waterservices_test.py @@ -93,7 +93,7 @@ def test_get_dv(requests_mock): "https://waterservices.usgs.gov/nwis/dv?format={}" "&startDT=2020-02-14&endDT=2020-02-15&sites={}".format(format, site) ) - response_file_path = "data/waterservices_dv.txt" + response_file_path = "tests/data/waterservices_dv.txt" mock_request(requests_mock, request_url, response_file_path) df, md = get_dv( sites=["01491000", "01645000"], start="2020-02-14", end="2020-02-15" @@ -115,7 +115,7 @@ def test_get_dv_site_value_types(requests_mock, site_input_type_list): 
"https://waterservices.usgs.gov/nwis/dv?format={}" "&startDT=2020-02-14&endDT=2020-02-15&sites={}".format(_format, site) ) - response_file_path = "data/waterservices_dv.txt" + response_file_path = "tests/data/waterservices_dv.txt" mock_request(requests_mock, request_url, response_file_path) if site_input_type_list: sites = [site] @@ -136,7 +136,7 @@ def test_get_iv(requests_mock): "https://waterservices.usgs.gov/nwis/iv?format={}" "&startDT=2019-02-14&endDT=2020-02-15&sites={}".format(format, site) ) - response_file_path = "data/waterservices_iv.txt" + response_file_path = "tests/data/waterservices_iv.txt" mock_request(requests_mock, request_url, response_file_path) df, md = get_iv( sites=["01491000", "01645000"], start="2019-02-14", end="2020-02-15" @@ -158,7 +158,7 @@ def test_get_iv_site_value_types(requests_mock, site_input_type_list): "https://waterservices.usgs.gov/nwis/iv?format={}" "&startDT=2019-02-14&endDT=2020-02-15&sites={}".format(_format, site) ) - response_file_path = "data/waterservices_iv.txt" + response_file_path = "tests/data/waterservices_iv.txt" mock_request(requests_mock, request_url, response_file_path) if site_input_type_list: sites = [site] @@ -183,7 +183,7 @@ def test_get_info(requests_mock): request_url = "https://waterservices.usgs.gov/nwis/site?sites={}¶meterCd={}&siteOutput=Expanded&format={}".format( site, parameter_cd, format ) - response_file_path = "data/waterservices_site.txt" + response_file_path = "tests/data/waterservices_site.txt" mock_request(requests_mock, request_url, response_file_path) df, md = get_info(sites=["01491000", "01645000"], parameterCd="00618") if not isinstance(df, DataFrame): @@ -210,7 +210,7 @@ def test_get_gwlevels(requests_mock): "https://nwis.waterdata.usgs.gov/nwis/gwlevels?format={}&begin_date=1851-01-01" "&site_no={}".format(format, site) ) - response_file_path = "data/waterdata_gwlevels.txt" + response_file_path = "tests/data/waterdata_gwlevels.txt" mock_request(requests_mock, request_url, 
response_file_path) df, md = get_gwlevels(sites=site) if not isinstance(df, DataFrame): @@ -229,7 +229,7 @@ def test_get_gwlevels_site_value_types(requests_mock, site_input_type_list): "https://nwis.waterdata.usgs.gov/nwis/gwlevels?format={}&begin_date=1851-01-01" "&site_no={}".format(_format, site) ) - response_file_path = "data/waterdata_gwlevels.txt" + response_file_path = "tests/data/waterdata_gwlevels.txt" mock_request(requests_mock, request_url, response_file_path) if site_input_type_list: sites = [site] @@ -249,7 +249,7 @@ def test_get_discharge_peaks(requests_mock): "https://nwis.waterdata.usgs.gov/nwis/peaks?format={}&site_no={}" "&begin_date=2000-02-14&end_date=2020-02-15".format(format, site) ) - response_file_path = "data/waterservices_peaks.txt" + response_file_path = "tests/data/waterservices_peaks.txt" mock_request(requests_mock, request_url, response_file_path) df, md = get_discharge_peaks(sites=[site], start="2000-02-14", end="2020-02-15") if not isinstance(df, DataFrame): @@ -269,7 +269,7 @@ def test_get_discharge_peaks_sites_value_types(requests_mock, site_input_type_li "https://nwis.waterdata.usgs.gov/nwis/peaks?format={}&site_no={}" "&begin_date=2000-02-14&end_date=2020-02-15".format(_format, site) ) - response_file_path = "data/waterservices_peaks.txt" + response_file_path = "tests/data/waterservices_peaks.txt" mock_request(requests_mock, request_url, response_file_path) if site_input_type_list: sites = [site] @@ -292,7 +292,7 @@ def test_get_discharge_measurements(requests_mock): "https://nwis.waterdata.usgs.gov/nwis/measurements?site_no={}" "&begin_date=2000-02-14&end_date=2020-02-15&format={}".format(site, format) ) - response_file_path = "data/waterdata_measurements.txt" + response_file_path = "tests/data/waterdata_measurements.txt" mock_request(requests_mock, request_url, response_file_path) df, md = get_discharge_measurements( sites=[site], start="2000-02-14", end="2020-02-15" @@ -315,7 +315,7 @@ def 
test_get_discharge_measurements_sites_value_types( "https://nwis.waterdata.usgs.gov/nwis/measurements?site_no={}" "&begin_date=2000-02-14&end_date=2020-02-15&format={}".format(site, format) ) - response_file_path = "data/waterdata_measurements.txt" + response_file_path = "tests/data/waterdata_measurements.txt" mock_request(requests_mock, request_url, response_file_path) if site_input_type_list: sites = [site] @@ -334,7 +334,7 @@ def test_get_pmcodes(requests_mock): DataFrame""" format = "rdb" request_url = "https://help.waterdata.usgs.gov/code/parameter_cd_nm_query?fmt=rdb&parm_nm_cd=%2500618%25" - response_file_path = "data/waterdata_pmcodes.txt" + response_file_path = "tests/data/waterdata_pmcodes.txt" mock_request(requests_mock, request_url, response_file_path) df, md = get_pmcodes(parameterCd="00618") if not isinstance(df, DataFrame): @@ -352,7 +352,7 @@ def test_get_pmcodes_parameterCd_value_types( parameterCd = "00618" request_url = "https://help.waterdata.usgs.gov/code/parameter_cd_nm_query?fmt={}&parm_nm_cd=%25{}%25" request_url = request_url.format(_format, parameterCd) - response_file_path = "data/waterdata_pmcodes.txt" + response_file_path = "tests/data/waterdata_pmcodes.txt" mock_request(requests_mock, request_url, response_file_path) if parameterCd_input_type_list: parameterCd = [parameterCd] @@ -372,7 +372,7 @@ def test_get_water_use_national(requests_mock): "https://nwis.waterdata.usgs.gov/nwis/water_use?rdb_compression=value&format={}&wu_year=ALL" "&wu_category=ALL&wu_county=ALL".format(format) ) - response_file_path = "data/water_use_national.txt" + response_file_path = "tests/data/water_use_national.txt" mock_request(requests_mock, request_url, response_file_path) df, md = get_water_use() if not isinstance(df, DataFrame): @@ -390,7 +390,7 @@ def test_get_water_use_national_year_value_types(requests_mock, year_input_type_ "https://nwis.waterdata.usgs.gov/nwis/water_use?rdb_compression=value&format={}&wu_year=ALL" 
"&wu_category=ALL&wu_county=ALL".format(_format) ) - response_file_path = "data/water_use_national.txt" + response_file_path = "tests/data/water_use_national.txt" mock_request(requests_mock, request_url, response_file_path) if year_input_type_list: years = [year] @@ -412,7 +412,7 @@ def test_get_water_use_national_county_value_types( "https://nwis.waterdata.usgs.gov/nwis/water_use?rdb_compression=value&format={}&wu_year=ALL" "&wu_category=ALL&wu_county=ALL".format(_format) ) - response_file_path = "data/water_use_national.txt" + response_file_path = "tests/data/water_use_national.txt" mock_request(requests_mock, request_url, response_file_path) if county_input_type_list: counties = [county] @@ -435,7 +435,7 @@ def test_get_water_use_national_county_value_types( "https://nwis.waterdata.usgs.gov/nwis/water_use?rdb_compression=value&format={}&wu_year=ALL" "&wu_category=ALL&wu_county=ALL".format(_format) ) - response_file_path = "data/water_use_national.txt" + response_file_path = "tests/data/water_use_national.txt" mock_request(requests_mock, request_url, response_file_path) if category_input_type_list: categories = [category] @@ -455,7 +455,7 @@ def test_get_water_use_allegheny(requests_mock): "https://nwis.waterdata.usgs.gov/PA/nwis/water_use?rdb_compression=value&format=rdb&wu_year=ALL" "&wu_category=ALL&wu_county=003&wu_area=county" ) - response_file_path = "data/water_use_allegheny.txt" + response_file_path = "tests/data/water_use_allegheny.txt" mock_request(requests_mock, request_url, response_file_path) df, md = get_water_use(state="PA", counties="003") if not isinstance(df, DataFrame): @@ -481,7 +481,7 @@ def test_get_ratings(requests_mock): request_url = "https://nwis.waterdata.usgs.gov/nwisweb/get_ratings/?site_no={}&file_type=base".format( site ) - response_file_path = "data/waterservices_ratings.txt" + response_file_path = "tests/data/waterservices_ratings.txt" mock_request(requests_mock, request_url, response_file_path) df, md = get_ratings(site_no=site) 
if not isinstance(df, DataFrame): @@ -501,7 +501,7 @@ def test_what_sites(requests_mock): "https://waterservices.usgs.gov/nwis/site?bBox=-83.0%2C36.5%2C-81.0%2C38.5" "&parameterCd={}&hasDataTypeCd=dv&format={}".format(parameter_cd, format) ) - response_file_path = "data/nwis_sites.txt" + response_file_path = "tests/data/nwis_sites.txt" mock_request(requests_mock, request_url, response_file_path) df, md = what_sites( @@ -534,7 +534,7 @@ def test_get_stats(requests_mock): request_url = "https://waterservices.usgs.gov/nwis/stat?sites=01491000%2C01645000&format={}".format( format ) - response_file_path = "data/waterservices_stats.txt" + response_file_path = "tests/data/waterservices_stats.txt" mock_request(requests_mock, request_url, response_file_path) df, md = get_stats(sites=["01491000", "01645000"]) @@ -552,7 +552,7 @@ def test_get_stats_site_value_types(requests_mock, site_input_type_list): request_url = "https://waterservices.usgs.gov/nwis/stat?sites={}&format={}".format( site, _format ) - response_file_path = "data/waterservices_stats.txt" + response_file_path = "tests/data/waterservices_stats.txt" mock_request(requests_mock, request_url, response_file_path) if site_input_type_list: sites = [site] @@ -579,7 +579,7 @@ def assert_metadata(requests_mock, request_url, md, site, parameter_cd, format): site_request_url = ( "https://waterservices.usgs.gov/nwis/site?sites={}&format=rdb".format(site) ) - with open("data/waterservices_site.txt") as text: + with open("tests/data/waterservices_site.txt") as text: requests_mock.get(site_request_url, text=text.read()) site_info, _ = md.site_info if not isinstance(site_info, DataFrame): @@ -591,7 +591,7 @@ def assert_metadata(requests_mock, request_url, md, site, parameter_cd, format): pcode_request_url = "https://help.waterdata.usgs.gov/code/parameter_cd_nm_query?fmt=rdb&parm_nm_cd=%25{}%25".format( param ) - with open("data/waterdata_pmcodes.txt") as text: + with open("tests/data/waterdata_pmcodes.txt") as text: 
requests_mock.get(pcode_request_url, text=text.read()) variable_info, _ = md.variable_info assert type(variable_info) is DataFrame diff --git a/tests/wqp_test.py b/tests/wqp_test.py index acf48c36..f36558bc 100755 --- a/tests/wqp_test.py +++ b/tests/wqp_test.py @@ -24,7 +24,7 @@ def test_get_results(requests_mock): "&characteristicName=Specific+conductance&startDateLo=05-01-2011&startDateHi=09-30-2011" "&mimeType=csv" ) - response_file_path = "data/wqp_results.txt" + response_file_path = "tests/data/wqp_results.txt" mock_request(requests_mock, request_url, response_file_path) df, md = get_results( siteid="WIDNR_WQX-10032762", @@ -48,7 +48,7 @@ def test_get_results_WQX3(requests_mock): "&mimeType=csv" "&dataProfile=fullPhysChem" ) - response_file_path = "data/wqp3_results.txt" + response_file_path = "tests/data/wqp3_results.txt" mock_request(requests_mock, request_url, response_file_path) df, md = get_results( legacy=False, @@ -71,7 +71,7 @@ def test_what_sites(requests_mock): "https://www.waterqualitydata.us/data/Station/Search?statecode=US%3A34&characteristicName=Chloride" "&mimeType=csv" ) - response_file_path = "data/wqp_sites.txt" + response_file_path = "tests/data/wqp_sites.txt" mock_request(requests_mock, request_url, response_file_path) df, md = what_sites(statecode="US:34", characteristicName="Chloride") assert type(df) is DataFrame @@ -88,7 +88,7 @@ def test_what_organizations(requests_mock): "https://www.waterqualitydata.us/data/Organization/Search?statecode=US%3A34&characteristicName=Chloride" "&mimeType=csv" ) - response_file_path = "data/wqp_organizations.txt" + response_file_path = "tests/data/wqp_organizations.txt" mock_request(requests_mock, request_url, response_file_path) df, md = what_organizations(statecode="US:34", characteristicName="Chloride") assert type(df) is DataFrame @@ -105,7 +105,7 @@ def test_what_projects(requests_mock): "https://www.waterqualitydata.us/data/Project/Search?statecode=US%3A34&characteristicName=Chloride" 
"&mimeType=csv" ) - response_file_path = "data/wqp_projects.txt" + response_file_path = "tests/data/wqp_projects.txt" mock_request(requests_mock, request_url, response_file_path) df, md = what_projects(statecode="US:34", characteristicName="Chloride") assert type(df) is DataFrame @@ -122,7 +122,7 @@ def test_what_activities(requests_mock): "https://www.waterqualitydata.us/data/Activity/Search?statecode=US%3A34&characteristicName=Chloride" "&mimeType=csv" ) - response_file_path = "data/wqp_activities.txt" + response_file_path = "tests/data/wqp_activities.txt" mock_request(requests_mock, request_url, response_file_path) df, md = what_activities(statecode="US:34", characteristicName="Chloride") assert type(df) is DataFrame @@ -139,7 +139,7 @@ def test_what_detection_limits(requests_mock): "https://www.waterqualitydata.us/data/ResultDetectionQuantitationLimit/Search?statecode=US%3A34&characteristicName=Chloride" "&mimeType=csv" ) - response_file_path = "data/wqp_detection_limits.txt" + response_file_path = "tests/data/wqp_detection_limits.txt" mock_request(requests_mock, request_url, response_file_path) df, md = what_detection_limits(statecode="US:34", characteristicName="Chloride") assert type(df) is DataFrame @@ -156,7 +156,7 @@ def test_what_habitat_metrics(requests_mock): "https://www.waterqualitydata.us/data/BiologicalMetric/Search?statecode=US%3A34&characteristicName=Chloride" "&mimeType=csv" ) - response_file_path = "data/wqp_habitat_metrics.txt" + response_file_path = "tests/data/wqp_habitat_metrics.txt" mock_request(requests_mock, request_url, response_file_path) df, md = what_habitat_metrics(statecode="US:34", characteristicName="Chloride") assert type(df) is DataFrame @@ -173,7 +173,7 @@ def test_what_project_weights(requests_mock): "https://www.waterqualitydata.us/data/ProjectMonitoringLocationWeighting/Search?statecode=US%3A34&characteristicName=Chloride" "&mimeType=csv" ) - response_file_path = "data/wqp_project_weights.txt" + response_file_path = 
"tests/data/wqp_project_weights.txt" mock_request(requests_mock, request_url, response_file_path) df, md = what_project_weights(statecode="US:34", characteristicName="Chloride") assert type(df) is DataFrame @@ -190,7 +190,7 @@ def test_what_activity_metrics(requests_mock): "https://www.waterqualitydata.us/data/ActivityMetric/Search?statecode=US%3A34&characteristicName=Chloride" "&mimeType=csv" ) - response_file_path = "data/wqp_activity_metrics.txt" + response_file_path = "tests/data/wqp_activity_metrics.txt" mock_request(requests_mock, request_url, response_file_path) df, md = what_activity_metrics(statecode="US:34", characteristicName="Chloride") assert type(df) is DataFrame