Commit 97717da

Merge branch 'main' into joss_paper
2 parents e9d0fef + fad314b

9 files changed: +2719 −90 lines

.github/workflows/sonarcloud.yml  (2 additions, 2 deletions)

@@ -17,7 +17,7 @@ jobs:
       with:
         fetch-depth: 0  # Shallow clones should be disabled for a better relevancy of analysis
     - name: SonarQube Scan
-      uses: SonarSource/sonarqube-scan-action@v4
+      uses: SonarSource/sonarqube-scan-action@v5
       env:
         GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}  # Needed to get PR information, if any
-        SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
+        SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}

docs/api_reference.md  (25 additions, 0 deletions)

@@ -0,0 +1,25 @@
+---
+hide:
+  - navigation
+---
+#
+
+Here is the API reference for the `stmtools` package.
+
+## **Space Time Matrix module**
+
+::: stmtools.stm.SpaceTimeMatrix
+
+## **I/O module**
+
+::: stmtools._io.from_csv
+
+## **Metadata schema module**
+
+::: stmtools.metadata.STMMetaData
+
+## **Utility**
+
+::: stmtools.utils.crop
+
+::: stmtools.utils.monotonic_coords

docs/notebooks/data/example2.csv  (2501 additions, 0 deletions; large diff not rendered)

docs/operations.md  (37 additions, 3 deletions)

@@ -4,9 +4,15 @@ STMTools supports various operations on an STM.
 
 ## Enrich an STM
 
-Contextual data can be added to an STM by enrichment. At present, STMTools supports enriching an STM by static polygons.
+Contextual data can be added to an STM by enrichment. STMTools supports enriching an STM by static polygons or a dataset.
 
-For example, if soil type data (`soil_map.gpkg`) is available together with an STM, one can first read `soil_map.gpkg` using the `GeoPandas` library as a `GeoDataFrame`, then add the soil type and corresponding type ID to the STM, using the `enrich_from_polygon` function.
+### Enrich from a polygon
+
+STMTools supports enriching an STM by static polygons. For example, if soil type
+data (`soil_map.gpkg`) is available together with an STM, one can first read
+`soil_map.gpkg` using the `GeoPandas` library as a `GeoDataFrame`, then add the
+soil type and the corresponding type ID to the STM using the
+`enrich_from_polygon` function.
 
 ```python
 import geopandas as gpd
@@ -24,6 +30,34 @@ fields_to_query = ['soil_type', 'type_id']
 stmat_enriched = stmat.stm.enrich_from_polygon(path_polygon, fields_to_query)
 ```
 
+### Enrich from a dataset
+
+STMTools supports enriching an STM by a dataset or a data array. For example, if
+a dataset (`meteo_data.nc`) is available together with an STM, one can first
+read `meteo_data.nc` using the `Xarray` library, then add the dataset to the
+STM using the `enrich_from_dataset` function.
+
+```python
+import xarray as xr
+dataset = xr.open_dataset('meteo_data.nc')
+
+# one field
+stmat_enriched = stmat.stm.enrich_from_dataset(dataset, 'temperature')
+
+# multiple fields
+stmat_enriched = stmat.stm.enrich_from_dataset(dataset, ['temperature', 'precipitation'])
+```
+
+By default, `"nearest"` interpolation is used, but you can choose [any
+method provided by
+Xarray](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.interp.html).
+For example, if you want to use `"linear"` interpolation, you can do it
+this way:
+
+```python
+stmat_enriched = stmat.stm.enrich_from_dataset(dataset, 'temperature', method='linear')
+```
+
@@ -42,7 +76,7 @@ This is equivalent to Xarray filtering:
 mask = stmat['pnt_enscoh'] > 0.7
 mask = mask.compute()
 stmat_subset = stmat.where(mask, drop=True)
-```
+```
 
 ### Subset by a polygon
 
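The docs added above let the user pick between `"nearest"` and `"linear"` interpolation when enriching from a dataset. The conceptual difference can be sketched in pure Python (this is an illustration only, not stmtools or Xarray code; the coordinate grid and temperature values are made up):

```python
# Sketch of the "nearest" vs "linear" interpolation methods that
# enrich_from_dataset delegates to Xarray's Dataset.interp.

def interp_nearest(xs, ys, x):
    """Pick the sample whose coordinate is closest to x."""
    i = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return ys[i]

def interp_linear(xs, ys, x):
    """Linearly interpolate between the two bracketing samples."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x is outside the sampled range")

xs = [0.0, 10.0, 20.0]    # hypothetical coordinate grid (e.g. longitude)
temps = [5.0, 15.0, 9.0]  # hypothetical temperature samples

print(interp_nearest(xs, temps, 12.0))  # 15.0 (nearest grid point is 10.0)
print(interp_linear(xs, temps, 12.0))   # 13.8 (between 15.0 and 9.0)
```

With `"nearest"` every STM point simply copies the value of the closest grid cell, while `"linear"` blends the two bracketing cells, which is usually smoother for continuous fields such as temperature.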
docs/stm_init.md  (47 additions, 1 deletion)

@@ -67,7 +67,11 @@ STM can also be intiated from a csv file. During this process, the following ass
    "amp_20100119" ..., where "^amp_" is the common RE pattern;
 3. There is no temporal-only (i.e. 1-row attribute) attribute present in the csv file.
 
-Consider the [example csv data](./notebooks/data/example.csv). It can be loaded by `from_csv`:
+Consider the [example csv data](./notebooks/data/example.csv). In this file, the
+rows are points and the columns are time series of the `deformation`, `amplitude`,
+and `h2ph` variables. The column names for these variables are `d_<timestamp>`,
+`a_<timestamp>`, and `h2ph_<timestamp>` respectively. We can read this csv file
+as an STM object in xarray format using the function `from_csv()`:
 
 ```python
 import stmtools
@@ -98,6 +102,48 @@ Data variables: (12/13)
     h2ph            (space, time) float64 dask.array<chunksize=(2500, 11), meta=np.ndarray>
 ```
 
+By default, time values are extracted from the column names assuming that the
+names are in the format `a_<YYYYMMDD>`, `d_<YYYYMMDD>`, and
+`h2ph_<YYYYMMDD>`. Note that only the separator "`_`" and a date format of
+`YYYYMMDD` are supported. If the names are different, for example
+`amp_<YYYYMMDD>`, `def_<YYYYMMDD>`, and `h2ph_<YYYYMMDD>` as in the [example2 csv
+data](./notebooks/data/example2.csv), you can specify the `spacetime_pattern`
+argument as a dictionary mapping the RE pattern of each space-time attribute to
+the corresponding variable name:
+
+```python
+import stmtools
+stm = stmtools.from_csv('example2.csv', spacetime_pattern={
+    '^amp_': 'amplitude',
+    '^def_': 'deformation',
+    '^h2ph_': 'h2ph'
+})
+```
+
+```output
+stm
+<xarray.Dataset> Size: 910kB
+Dimensions:         (space: 2500, time: 11)
+Coordinates:
+  * space           (space) int64 20kB 0 1 2 3 4 ... 2496 2497 2498 2499
+  * time            (time) datetime64[ns] 88B 2016-03-27 ... 2016-07-15
+    lat             (space) float64 20kB dask.array<chunksize=(2500,), meta=np.ndarray>
+    lon             (space) float64 20kB dask.array<chunksize=(2500,), meta=np.ndarray>
+Data variables: (12/13)
+    pnt_id          (space) <U1 10kB dask.array<chunksize=(2500,), meta=np.ndarray>
+    pnt_flags       (space) int64 20kB dask.array<chunksize=(2500,), meta=np.ndarray>
+    pnt_line        (space) int64 20kB dask.array<chunksize=(2500,), meta=np.ndarray>
+    pnt_pixel       (space) int64 20kB dask.array<chunksize=(2500,), meta=np.ndarray>
+    pnt_height      (space) float64 20kB dask.array<chunksize=(2500,), meta=np.ndarray>
+    pnt_demheight   (space) float64 20kB dask.array<chunksize=(2500,), meta=np.ndarray>
+    ...              ...
+    pnt_enscoh      (space) float64 20kB dask.array<chunksize=(2500,), meta=np.ndarray>
+    pnt_ampconsist  (space) float64 20kB dask.array<chunksize=(2500,), meta=np.ndarray>
+    pnt_linear      (space) float64 20kB dask.array<chunksize=(2500,), meta=np.ndarray>
+    amplitude       (space, time) float64 220kB dask.array<chunksize=(2500, 11), meta=np.ndarray>
+    deformation     (space, time) float64 220kB dask.array<chunksize=(2500, 11), meta=np.ndarray>
+    h2ph            (space, time) float64 220kB dask.array<chunksize=(2500, 11), meta=np.ndarray>
+```
 
 ## By pixel selection from an image stack
 
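The added paragraph says time values are extracted from column names such as `amp_<YYYYMMDD>` using the matched RE prefix and a fixed `YYYYMMDD` date format. A minimal standard-library sketch of that kind of parsing (an illustration under those stated assumptions, not the actual stmtools implementation):

```python
import re
from datetime import datetime

def extract_times(columns, pattern):
    """Collect timestamps from columns matching an RE prefix like '^amp_'.

    Only the '_' separator and the YYYYMMDD date format are assumed,
    mirroring the restriction stated in the docs.
    """
    times = []
    for name in columns:
        if re.match(pattern, name):
            stamp = re.sub(pattern, "", name)          # strip the prefix
            times.append(datetime.strptime(stamp, "%Y%m%d"))
    return times

cols = ["pnt_lat", "pnt_lon", "amp_20160327", "amp_20160408"]
print(extract_times(cols, "^amp_"))
```

Space-only columns such as `pnt_lat` do not match the space-time pattern and are simply skipped, which is why the same column list can be scanned once per pattern.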
mkdocs.yml  (23 additions, 7 deletions)

@@ -3,10 +3,10 @@ repo_url: https://github.com/tudelftgeodesy/stmtools/
 repo_name: STM Tools
 
 nav:
-  - Getting Started:
+  - Getting Started:
     - About STM Tools: index.md
     - Installation: setup.md
-  - Usage:
+  - Usage:
     - Initiate an STM: stm_init.md
     - Operations on STM: operations.md
     - Ordering an STM: order.md
@@ -17,6 +17,7 @@ nav:
   - Contributing Guidelines: CONTRIBUTING.md
   - Code of Conduct: CODE_OF_CONDUCT.md
   - Change Log: CHANGELOG.md
+  - API Reference: api_reference.md
 
 
 theme:
@@ -44,7 +45,7 @@ theme:
     - navigation.tabs
    - navigation.tabs.sticky
     - content.code.copy
-
+
 plugins:
   - mkdocs-jupyter:
       include_source: True
@@ -53,16 +54,31 @@ plugins:
       handlers:
        python:
          options:
-            docstring_style: google
+            docstring_style: numpy
            docstring_options:
-              ignore_init_summary: no
-              merge_init_into_class: yes
-            show_submodules: no
+              ignore_init_summary: true
+              merge_init_into_class: true
+            docstring_section_style: list
+            show_submodules: true
+            show_root_heading: true
+            show_source: true
+            heading_level: 3
+            relative_crossrefs: true
+            parameter_headings: false
+            separate_signature: true
+            show_bases: true
+            show_signature_annotations: true
+            show_symbol_type_heading: true
+            signature_crossrefs: true
+            summary: true
+            backlinks: tree
+            scoped_crossrefs: true
 
 markdown_extensions:
   - pymdownx.highlight:
       anchor_linenums: true
   - pymdownx.superfences
+  - pymdownx.highlight
 
 extra:
   generator: false

pyproject.toml  (3 additions, 0 deletions)

@@ -152,3 +152,6 @@ line-ending = "auto"
 
 [tool.ruff.per-file-ignores]
 "tests/**" = ["D"]
+
+[tool.ruff.pydocstyle]
+convention = "numpy"

stmtools/_io.py  (38 additions, 38 deletions)

@@ -25,47 +25,47 @@ def from_csv(
     """Initiate an STM instance from a csv file.
 
     The specified csv file will be loaded using `dask.dataframe.read_csv` with a fixed blocksize.
-
     The columns of the csv file will be classified into coordinates, and data variables.
-
     This classification is performed by Regular Expression (RE) pattern matching according to
-    three variables: `space_pattern`, `spacetime_pattern` and `coords_cols`.
-
+    three variables: `space_pattern`, `spacetime_pattern` and `coords_cols`.
     The following assumptions are made to the column names of the csv file:
-    1. All columns with space-only attributes share the same RE pattern in the column names.
-       E.g. Latitude, Longitude and height columns are named as "pnt_lat", "pnt_lon" and
-       "pnt_height", sharing the same RE pattern "^pnt_";
-    2. Per space-time attribute, a common RE pattern is shared by all columns. E.g. for the
-       time-series of amplitude data, the column names are "amp_20100101", "amp_20100110",
-       "amp_20100119" ..., where "^amp_" is the common RE pattern;
-    3. There is no temporal-only (i.e. 1-row attribute) attribute present in the csv file.
-
-    `from_csv` does not retrieve time stamps based on column names. The `time` coordinate of
-    the output STM will be a monotonic integer series starting from 0.
-
-    Args:
-    ----
-        file (str | Path): Path to the csv file.
-        space_pattern (str, optional): RE pattern to match space attribute columns.
-            Defaults to "^pnt_".
-        spacetime_pattern (dict | None, optional): A dictionary mapping RE patterns of each
-            space-time attribute to corresponding variable names. Defaults to None, which means
-            the following map will be applied:
-            {"^d_": "deformation", "^a_": "amplitude", "^h2ph_": "h2ph"}.
-        coords_cols (list | dict, optional): List of columns to be used as space coordinates.
-            When `coords_cols` is a dictionary, a renaming will be performed per coordinate.
-            Defaults to None, then the following renaming will be performed:
-            "{"pnt_lat": "lat", "pnt_lon": "lon"}"
-        output_chunksize (dict | None, optional): Chunksize of the output. Defaults to None,
-            then the size of the first chunk in the DaskDataFrame will be used, up-rounding to
-            the next 5000.
-        blocksize (int | str | None, optional): Blocksize to load the csv.
-            Defaults to 200e6 (in bytes). See the documentation of
-            [dask.dataframe.read_csv](https://docs.dask.org/en/stable/generated/dask.dataframe.read_csv.html)
-
-    Returns:
+
+    1. All columns with space-only attributes share the same RE pattern in the column names.
+       E.g. Latitude, Longitude and height columns are named as "pnt_lat", "pnt_lon" and
+       "pnt_height", sharing the same RE pattern "^pnt_";
+    2. Per space-time attribute, a common RE pattern is shared by all columns. E.g. for the
+       time-series of amplitude data, the column names are "a_20100101", "a_20100110",
+       "a_20100119" ..., where "^a_" is the common RE pattern;
+    3. There is no temporal-only (i.e. 1-row attribute) attribute present in the csv file.
+
+    Parameters
+    ----------
+    file: str | Path
+        Path to the csv file.
+    space_pattern: str, optional
+        RE pattern to match space attribute columns. Defaults to "^pnt_".
+    spacetime_pattern: dict | None, optional
+        A dictionary mapping RE patterns of each space-time attribute to
+        corresponding variable names. Defaults to None, which means the
+        following map will be applied: {"^d_": "deformation", "^a_":
+        "amplitude", "^h2ph_": "h2ph"}.
+    coords_cols: list | dict, optional
+        List of columns to be used as space coordinates. When `coords_cols` is a
+        dictionary, a renaming will be performed per coordinate. Defaults to
+        None, then the following renaming will be performed: "{"pnt_lat": "lat",
+        "pnt_lon": "lon"}"
+    output_chunksize: dict | None, optional
+        Chunksize of the output. Defaults to None, then the size of the first
+        chunk in the DaskDataFrame will be used, up-rounding to the next 5000.
+    blocksize: int | str | None, optional
+        Blocksize to load the csv. Defaults to 200e6 (in bytes). See the
+        documentation of
+        [dask.dataframe.read_csv](https://docs.dask.org/en/stable/generated/dask.dataframe.read_csv.html)
+
+    Returns
     -------
-        xr.Dataset: Output STM instance
+    xr.Dataset
+        Output STM instance
 
     """
     # Load csv as Dask DataFrame
@@ -112,7 +112,7 @@ def from_csv(
         # specify str type for point id
         # otherwise it will be loaded as object type
        # then when saving to zarr, a redundant loading is needed to determine type
-        da_pnt = ddf[column].to_dask_array(lengths=chunks).astype(str)
+        da_pnt = ddf[column].astype(str).to_dask_array(lengths=chunks).astype(str)
     else:
         da_pnt = ddf[column].to_dask_array(lengths=chunks)
     stmat = stmat.assign({column: (("space"), da_pnt)})
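The `from_csv` docstring above describes classifying csv columns into space-only and space-time attributes by RE pattern matching on `space_pattern` and `spacetime_pattern`. A minimal standard-library sketch of that classification (illustrative only, not the actual stmtools implementation):

```python
import re

def classify_columns(columns, space_pattern="^pnt_", spacetime_pattern=None):
    """Split column names into space-only columns and per-variable
    space-time columns, using the default patterns from the docstring."""
    if spacetime_pattern is None:
        spacetime_pattern = {"^d_": "deformation",
                             "^a_": "amplitude",
                             "^h2ph_": "h2ph"}
    # space-only attributes share one RE pattern, e.g. "^pnt_"
    space_cols = [c for c in columns if re.match(space_pattern, c)]
    # each space-time attribute gathers all columns matching its pattern
    spacetime_cols = {var: [c for c in columns if re.match(pat, c)]
                      for pat, var in spacetime_pattern.items()}
    return space_cols, spacetime_cols

cols = ["pnt_lat", "pnt_lon", "d_20100101", "d_20100110", "a_20100101"]
space, spacetime = classify_columns(cols)
print(space)                     # ['pnt_lat', 'pnt_lon']
print(spacetime["deformation"])  # ['d_20100101', 'd_20100110']
```

Each space-only column becomes a `(space,)` variable and each space-time group becomes a `(space, time)` variable in the resulting Dataset, which is why the two kinds of columns must be separable by pattern alone.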
