Commit 97717da

Merge branch 'main' into joss_paper
2 parents e9d0fef + fad314b

9 files changed: +2719 −90 lines

.github/workflows/sonarcloud.yml  (2 additions, 2 deletions)

@@ -17,7 +17,7 @@ jobs:
       with:
         fetch-depth: 0  # Shallow clones should be disabled for a better relevancy of analysis
     - name: SonarQube Scan
-      uses: SonarSource/sonarqube-scan-action@v4
+      uses: SonarSource/sonarqube-scan-action@v5
       env:
         GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}  # Needed to get PR information, if any
-        SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
+        SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}

docs/api_reference.md  (25 additions, 0 deletions)

@@ -0,0 +1,25 @@
+---
+hide:
+  - navigation
+---
+#
+
+Here is the API reference for the `stmtools` package.
+
+## **Space Time Matrix module**
+
+::: stmtools.stm.SpaceTimeMatrix
+
+## **I/O module**
+
+::: stmtools._io.from_csv
+
+## **Metadata schema module**
+
+::: stmtools.metadata.STMMetaData
+
+## **Utility**
+
+::: stmtools.utils.crop
+
+::: stmtools.utils.monotonic_coords

docs/notebooks/data/example2.csv  (2501 additions, 0 deletions; large diff not rendered)

docs/operations.md  (37 additions, 3 deletions)

@@ -4,9 +4,15 @@ STMTools supports various operations on an STM.
 
 ## Enrich an STM
 
-Contextual data can be added to an STM by enrichment. At present, STMTools supports enriching an STM by static polygons.
+Contextual data can be added to an STM by enrichment. STMTools supports enriching an STM by static polygons or a dataset.
 
-For example, if soil type data (`soil_map.gpkg`) is available together with an STM, one can first read `soil_map.gpkg` using the `GeoPandas` library as a `GeoDataFrame`, then add the soil type and corresponding type ID to the STM, using the `enrich_from_polygon` function.
+### Enrich from a polygon
+
+STMTools supports enriching an STM by static polygons. For example, if soil type
+data (`soil_map.gpkg`) is available together with an STM, one can first read
+`soil_map.gpkg` using the `GeoPandas` library as a `GeoDataFrame`, then add the
+soil type and the corresponding type ID to the STM using the
+`enrich_from_polygon` function.
 
 ```python
 import geopandas as gpd
@@ -24,6 +30,34 @@ fields_to_query = ['soil_type', 'type_id']
 stmat_enriched = stmat.stm.enrich_from_polygon(path_polygon, fields_to_query)
 ```
 
+### Enrich from a dataset
+
+STMTools supports enriching an STM by a dataset or a data array. For example, if
+a dataset (`meteo_data.nc`) is available together with an STM, one can first
+read `meteo_data.nc` using the `Xarray` library, then add the dataset to the
+STM using the `enrich_from_dataset` function.
+
+```python
+import xarray as xr
+dataset = xr.open_dataset('meteo_data.nc')
+
+# one field
+stmat_enriched = stmat.stm.enrich_from_dataset(dataset, 'temperature')
+
+# multiple fields
+stmat_enriched = stmat.stm.enrich_from_dataset(dataset, ['temperature', 'precipitation'])
+```
+
+By default, `"nearest"` interpolation is used, but you can choose [any
+method provided by
+Xarray](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.interp.html).
+For example, if you want to use `"linear"` interpolation, you can do it
+this way:
+
+```python
+stmat_enriched = stmat.stm.enrich_from_dataset(dataset, 'temperature', method='linear')
+```
+
@@ -42,7 +76,7 @@ This is equivalent to Xarray filtering:
 mask = stmat['pnt_enscoh'] > 0.7
 mask = mask.compute()
 stmat_subset = stmat.where(mask, drop=True)
-```
+```
 
 ### Subset by a polygon
 
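The docs added above let the user pick between `"nearest"` and `"linear"` interpolation when enriching from a dataset. The conceptual difference can be sketched in pure Python (this is an illustration only, not stmtools or Xarray code; the coordinate grid and temperature values are made up):

```python
# Sketch of the "nearest" vs "linear" interpolation methods that
# enrich_from_dataset delegates to Xarray's Dataset.interp.

def interp_nearest(xs, ys, x):
    """Pick the sample whose coordinate is closest to x."""
    i = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return ys[i]

def interp_linear(xs, ys, x):
    """Linearly interpolate between the two bracketing samples."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x is outside the sampled range")

xs = [0.0, 10.0, 20.0]    # hypothetical coordinate grid (e.g. longitude)
temps = [5.0, 15.0, 9.0]  # hypothetical temperature samples

print(interp_nearest(xs, temps, 12.0))  # 15.0 (nearest grid point is 10.0)
print(interp_linear(xs, temps, 12.0))   # 13.8 (between 15.0 and 9.0)
```

With `"nearest"` every STM point simply copies the value of the closest grid cell, while `"linear"` blends the two bracketing cells, which is usually smoother for continuous fields such as temperature.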
docs/stm_init.md  (47 additions, 1 deletion)

@@ -67,7 +67,11 @@ STM can also be intiated from a csv file. During this process, the following ass
    "amp_20100119" ..., where "^amp_" is the common RE pattern;
 3. There is no temporal-only (i.e. 1-row attribute) attribute present in the csv file.
 
-Consider the [example csv data](./notebooks/data/example.csv). It can be loaded by `from_csv`:
+Consider the [example csv data](./notebooks/data/example.csv). In this file, the
+rows are points and the columns are time series of the `deformation`, `amplitude`,
+and `h2ph` variables. The column names for these variables are `d_<timestamp>`,
+`a_<timestamp>`, and `h2ph_<timestamp>` respectively. We can read this csv file
+as an STM object in xarray format using the function `from_csv()`:
 
 ```python
 import stmtools
@@ -98,6 +102,48 @@ Data variables: (12/13)
     h2ph            (space, time) float64 dask.array<chunksize=(2500, 11), meta=np.ndarray>
 ```
 
+By default, time values are extracted from the column names assuming that the
+names are in the format `a_<YYYYMMDD>`, `d_<YYYYMMDD>`, and
+`h2ph_<YYYYMMDD>`. Note that only the separator "`_`" and a date format of
+`YYYYMMDD` are supported. If the names are different, for example
+`amp_<YYYYMMDD>`, `def_<YYYYMMDD>`, and `h2ph_<YYYYMMDD>` as in the [example2 csv
+data](./notebooks/data/example2.csv), you can specify the `spacetime_pattern`
+argument as a dictionary mapping the RE pattern of each space-time attribute to
+the corresponding variable name:
+
+```python
+import stmtools
+stm = stmtools.from_csv('example2.csv', spacetime_pattern={
+    '^amp_': 'amplitude',
+    '^def_': 'deformation',
+    '^h2ph_': 'h2ph'
+})
+```
+
+```output
+stm
+<xarray.Dataset> Size: 910kB
+Dimensions:         (space: 2500, time: 11)
+Coordinates:
+  * space           (space) int64 20kB 0 1 2 3 4 ... 2496 2497 2498 2499
+  * time            (time) datetime64[ns] 88B 2016-03-27 ... 2016-07-15
+    lat             (space) float64 20kB dask.array<chunksize=(2500,), meta=np.ndarray>
+    lon             (space) float64 20kB dask.array<chunksize=(2500,), meta=np.ndarray>
+Data variables: (12/13)
+    pnt_id          (space) <U1 10kB dask.array<chunksize=(2500,), meta=np.ndarray>
+    pnt_flags       (space) int64 20kB dask.array<chunksize=(2500,), meta=np.ndarray>
+    pnt_line        (space) int64 20kB dask.array<chunksize=(2500,), meta=np.ndarray>
+    pnt_pixel       (space) int64 20kB dask.array<chunksize=(2500,), meta=np.ndarray>
+    pnt_height      (space) float64 20kB dask.array<chunksize=(2500,), meta=np.ndarray>
+    pnt_demheight   (space) float64 20kB dask.array<chunksize=(2500,), meta=np.ndarray>
+    ...              ...
+    pnt_enscoh      (space) float64 20kB dask.array<chunksize=(2500,), meta=np.ndarray>
+    pnt_ampconsist  (space) float64 20kB dask.array<chunksize=(2500,), meta=np.ndarray>
+    pnt_linear      (space) float64 20kB dask.array<chunksize=(2500,), meta=np.ndarray>
+    amplitude       (space, time) float64 220kB dask.array<chunksize=(2500, 11), meta=np.ndarray>
+    deformation     (space, time) float64 220kB dask.array<chunksize=(2500, 11), meta=np.ndarray>
+    h2ph            (space, time) float64 220kB dask.array<chunksize=(2500, 11), meta=np.ndarray>
+```
 
 ## By pixel selection from an image stack
 
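The added paragraph says time values are extracted from column names such as `amp_<YYYYMMDD>` using the matched RE prefix and a fixed `YYYYMMDD` date format. A minimal standard-library sketch of that kind of parsing (an illustration under those stated assumptions, not the actual stmtools implementation):

```python
import re
from datetime import datetime

def extract_times(columns, pattern):
    """Collect timestamps from columns matching an RE prefix like '^amp_'.

    Only the '_' separator and the YYYYMMDD date format are assumed,
    mirroring the restriction stated in the docs.
    """
    times = []
    for name in columns:
        if re.match(pattern, name):
            stamp = re.sub(pattern, "", name)          # strip the prefix
            times.append(datetime.strptime(stamp, "%Y%m%d"))
    return times

cols = ["pnt_lat", "pnt_lon", "amp_20160327", "amp_20160408"]
print(extract_times(cols, "^amp_"))
```

Space-only columns such as `pnt_lat` do not match the space-time pattern and are simply skipped, which is why the same column list can be scanned once per pattern.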
mkdocs.yml  (23 additions, 7 deletions)

@@ -3,10 +3,10 @@ repo_url: https://github.com/tudelftgeodesy/stmtools/
 repo_name: STM Tools
 
 nav:
-  - Getting Started:
+  - Getting Started:
     - About STM Tools: index.md
     - Installation: setup.md
-  - Usage:
+  - Usage:
     - Initiate an STM: stm_init.md
     - Operations on STM: operations.md
     - Ordering an STM: order.md
@@ -17,6 +17,7 @@ nav:
   - Contributing Guidelines: CONTRIBUTING.md
   - Code of Conduct: CODE_OF_CONDUCT.md
   - Change Log: CHANGELOG.md
+  - API Reference: api_reference.md
 
 
 theme:
@@ -44,7 +45,7 @@ theme:
     - navigation.tabs
    - navigation.tabs.sticky
     - content.code.copy
-
+
 plugins:
   - mkdocs-jupyter:
       include_source: True
@@ -53,16 +54,31 @@ plugins:
       handlers:
        python:
          options:
-            docstring_style: google
+            docstring_style: numpy
            docstring_options:
-              ignore_init_summary: no
-              merge_init_into_class: yes
-            show_submodules: no
+              ignore_init_summary: true
+              merge_init_into_class: true
+            docstring_section_style: list
+            show_submodules: true
+            show_root_heading: true
+            show_source: true
+            heading_level: 3
+            relative_crossrefs: true
+            parameter_headings: false
+            separate_signature: true
+            show_bases: true
+            show_signature_annotations: true
+            show_symbol_type_heading: true
+            signature_crossrefs: true
+            summary: true
+            backlinks: tree
+            scoped_crossrefs: true
 
 markdown_extensions:
   - pymdownx.highlight:
       anchor_linenums: true
   - pymdownx.superfences
+  - pymdownx.highlight
 
 extra:
   generator: false

pyproject.toml  (3 additions, 0 deletions)

@@ -152,3 +152,6 @@ line-ending = "auto"
 
 [tool.ruff.per-file-ignores]
 "tests/**" = ["D"]
+
+[tool.ruff.pydocstyle]
+convention = "numpy"

stmtools/_io.py  (38 additions, 38 deletions)

@@ -25,47 +25,47 @@ def from_csv(
     """Initiate an STM instance from a csv file.
 
     The specified csv file will be loaded using `dask.dataframe.read_csv` with a fixed blocksize.
-
     The columns of the csv file will be classified into coordinates, and data variables.
-
     This classification is performed by Regular Expression (RE) pattern matching according to
-    three variables: `space_pattern`, `spacetime_pattern` and `coords_cols`.
-
+    three variables: `space_pattern`, `spacetime_pattern` and `coords_cols`.
     The following assumptions are made to the column names of the csv file:
-    1. All columns with space-only attributes share the same RE pattern in the column names.
-       E.g. Latitude, Longitude and height columns are named as "pnt_lat", "pnt_lon" and
-       "pnt_height", sharing the same RE pattern "^pnt_";
-    2. Per space-time attribute, a common RE pattern is shared by all columns. E.g. for the
-       time-series of amplitude data, the column names are "amp_20100101", "amp_20100110",
-       "amp_20100119" ..., where "^amp_" is the common RE pattern;
-    3. There is no temporal-only (i.e. 1-row attribute) attribute present in the csv file.
-
-    `from_csv` does not retrieve time stamps based on column names. The `time` coordinate of
-    the output STM will be a monotonic integer series starting from 0.
-
-    Args:
-    ----
-        file (str | Path): Path to the csv file.
-        space_pattern (str, optional): RE pattern to match space attribute columns.
-            Defaults to "^pnt_".
-        spacetime_pattern (dict | None, optional): A dictionary mapping RE patterns of each
-            space-time attribute to corresponding variable names. Defaults to None, which means
-            the following map will be applied:
-            {"^d_": "deformation", "^a_": "amplitude", "^h2ph_": "h2ph"}.
-        coords_cols (list | dict, optional): List of columns to be used as space coordinates.
-            When `coords_cols` is a dictionary, a renaming will be performed per coordinate.
-            Defaults to None, then the following renaming will be performed:
-            "{"pnt_lat": "lat", "pnt_lon": "lon"}"
-        output_chunksize (dict | None, optional): Chunksize of the output. Defaults to None,
-            then the size of the first chunk in the DaskDataFrame will be used, up-rounding to
-            the next 5000.
-        blocksize (int | str | None, optional): Blocksize to load the csv.
-            Defaults to 200e6 (in bytes). See the documentation of
-            [dask.dataframe.read_csv](https://docs.dask.org/en/stable/generated/dask.dataframe.read_csv.html)
-
-    Returns:
+
+    1. All columns with space-only attributes share the same RE pattern in the column names.
+       E.g. Latitude, Longitude and height columns are named as "pnt_lat", "pnt_lon" and
+       "pnt_height", sharing the same RE pattern "^pnt_";
+    2. Per space-time attribute, a common RE pattern is shared by all columns. E.g. for the
+       time-series of amplitude data, the column names are "a_20100101", "a_20100110",
+       "a_20100119" ..., where "^a_" is the common RE pattern;
+    3. There is no temporal-only (i.e. 1-row attribute) attribute present in the csv file.
+
+    Parameters
+    ----------
+    file: str | Path
+        Path to the csv file.
+    space_pattern: str, optional
+        RE pattern to match space attribute columns. Defaults to "^pnt_".
+    spacetime_pattern: dict | None, optional
+        A dictionary mapping RE patterns of each space-time attribute to
+        corresponding variable names. Defaults to None, which means the
+        following map will be applied: {"^d_": "deformation", "^a_":
+        "amplitude", "^h2ph_": "h2ph"}.
+    coords_cols: list | dict, optional
+        List of columns to be used as space coordinates. When `coords_cols` is a
+        dictionary, a renaming will be performed per coordinate. Defaults to
+        None, then the following renaming will be performed: "{"pnt_lat": "lat",
+        "pnt_lon": "lon"}"
+    output_chunksize: dict | None, optional
+        Chunksize of the output. Defaults to None, then the size of the first
+        chunk in the DaskDataFrame will be used, up-rounding to the next 5000.
+    blocksize: int | str | None, optional
+        Blocksize to load the csv. Defaults to 200e6 (in bytes). See the
+        documentation of
+        [dask.dataframe.read_csv](https://docs.dask.org/en/stable/generated/dask.dataframe.read_csv.html)
+
+    Returns
     -------
-        xr.Dataset: Output STM instance
+    xr.Dataset
+        Output STM instance
 
     """
     # Load csv as Dask DataFrame
@@ -112,7 +112,7 @@ def from_csv(
         # specify str type for point id
         # otherwise it will be loaded as object type
        # then when saving to zarr, a redundant loading is needed to determine type
-        da_pnt = ddf[column].to_dask_array(lengths=chunks).astype(str)
+        da_pnt = ddf[column].astype(str).to_dask_array(lengths=chunks).astype(str)
     else:
         da_pnt = ddf[column].to_dask_array(lengths=chunks)
     stmat = stmat.assign({column: (("space"), da_pnt)})
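The `from_csv` docstring above describes classifying csv columns into space-only and space-time attributes by RE pattern matching on `space_pattern` and `spacetime_pattern`. A minimal standard-library sketch of that classification (illustrative only, not the actual stmtools implementation):

```python
import re

def classify_columns(columns, space_pattern="^pnt_", spacetime_pattern=None):
    """Split column names into space-only columns and per-variable
    space-time columns, using the default patterns from the docstring."""
    if spacetime_pattern is None:
        spacetime_pattern = {"^d_": "deformation",
                             "^a_": "amplitude",
                             "^h2ph_": "h2ph"}
    # space-only attributes share one RE pattern, e.g. "^pnt_"
    space_cols = [c for c in columns if re.match(space_pattern, c)]
    # each space-time attribute gathers all columns matching its pattern
    spacetime_cols = {var: [c for c in columns if re.match(pat, c)]
                      for pat, var in spacetime_pattern.items()}
    return space_cols, spacetime_cols

cols = ["pnt_lat", "pnt_lon", "d_20100101", "d_20100110", "a_20100101"]
space, spacetime = classify_columns(cols)
print(space)                     # ['pnt_lat', 'pnt_lon']
print(spacetime["deformation"])  # ['d_20100101', 'd_20100110']
```

Each space-only column becomes a `(space,)` variable and each space-time group becomes a `(space, time)` variable in the resulting Dataset, which is why the two kinds of columns must be separable by pattern alone.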
