Add analyte-trend, MCL-exceedance, and monitoring-recency products #91

Merged
jirhiker merged 6 commits into main from feature/data-products-recover on Jun 28, 2026

@jirhiker

Member

Re-targets the data-product work to main. That work landed only on the already-merged #89 st2 branch: #90 was merged into that branch after #89 closed, so its content never reached main. This PR cherry-picks the 6 product commits onto current main (st2 and the test_sources exclusion are already on main via #89).

What

Three new data products + the supporting refactors:

  1. ogc_analyte_trend — per-well analyte concentration trend (Mann-Kendall + Theil-Sen, daily mean). Seeds nm_arsenic_trend, nm_nitrate_trend.
  2. ogc_mcl_exceedance (nm_mcl_exceedance) — per-well drinking-water MCL exceedance flags; thresholds read at run time from gs://<bucket>/config/mcl.json (EPA-sourced, self-documenting; orchestration/config/mcl.json).
  3. ogc_monitoring_recency (nm_monitoring_recency) — per-well last-observation date, days-since, active/stale status (WL, 365d).
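
For readers unfamiliar with the trend method named in item 1, here is a minimal standard-library sketch of a Mann-Kendall test paired with a Theil-Sen slope. It is not the code in backend/trend_stats.py (which, per the commits, uses scipy/pymannkendall); the function names, the normal approximation without tie correction, and the choice of years as the time unit are all assumptions made for illustration.

```python
from itertools import combinations
from math import erf, sqrt
from statistics import median


def mann_kendall_s(values):
    """Mann-Kendall S: count of increasing pairs minus decreasing pairs."""
    s = 0
    for earlier, later in combinations(values, 2):
        s += (later > earlier) - (later < earlier)
    return s


def mann_kendall_p(values):
    """Two-sided p-value from the normal approximation (no tie correction)."""
    n = len(values)
    s = mann_kendall_s(values)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / sqrt(var_s)
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))  # standard normal CDF at |z|
    return 2 * (1 - cdf)


def theil_sen_slope(times, values):
    """Theil-Sen estimator: median of all pairwise slopes."""
    slopes = [
        (v2 - v1) / (t2 - t1)
        for (t1, v1), (t2, v2) in combinations(zip(times, values), 2)
        if t2 != t1
    ]
    return median(slopes)


# Hypothetical daily-mean series, time in years, concentration in mg/L.
years = [0, 1, 2, 3, 4]
conc = [1.0, 2.0, 2.5, 4.0, 5.0]
slope_per_year = theil_sen_slope(years, conc)
p_value = mann_kendall_p(conc)
```

With time expressed in years, the slope reads directly as a per-year change, which matches the slope_per_year + slope_units naming used in this PR.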

Supporting changes

  • Generalized trend engine: dump_trend_collection(slope_units, reducer, method, parameter_name); _daily_series(reducer=min|max|mean).
  • New dumpers dump_mcl_exceedance_collection, dump_monitoring_recency_collection.
  • Extracted backend/trend_stats.py (pure stats: daily aggregation, qualification gate, Mann-Kendall + Theil-Sen) out of the growing ogc_features.py (628 → ~500 lines).
  • GCSResource.read_json; die_config treats MCL as summary mode; definitions registers the 3 output types (each gets a job + schedule).
  • Added dagster-dg-cli dependency so dg check defs works reliably.
  • config/mcl.json generated from EPA NPDWR/Secondary standards, self-documenting; nitrate as-N vs as-NO3 hazard noted in code + data.
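
The reducer parameter on _daily_series can be pictured roughly as below. This is a hedged sketch only: the real signature and input types are not shown in this PR, so daily_series, REDUCERS, and the (timestamp, value) tuple input are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean

# Hypothetical mapping from the reducer name to the aggregation function.
REDUCERS = {"min": min, "max": max, "mean": mean}


def daily_series(observations, reducer="mean"):
    """Collapse (timestamp, value) pairs to one value per calendar day."""
    by_day = defaultdict(list)
    for timestamp, value in observations:
        by_day[timestamp.date()].append(value)
    reduce_fn = REDUCERS[reducer]
    return [(day, reduce_fn(vals)) for day, vals in sorted(by_day.items())]


obs = [
    (datetime(2026, 1, 1, 8), 2.0),
    (datetime(2026, 1, 1, 20), 4.0),
    (datetime(2026, 1, 2, 9), 5.0),
]
daily_mean = daily_series(obs, "mean")
daily_min = daily_series(obs, "min")
```

Passing the reducer by name keeps a single code path for the water-level trend (daily min, per the original _daily_min_series) and the analyte trends (daily mean).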

Verification

dg check defs passes; the full offline suite (277 tests) passes.

Before running nm_mcl_exceedance

Upload orchestration/config/mcl.json to gs://<products_bucket>/config/mcl.json.

🤖 Generated with Claude Code

jirhiker and others added 6 commits June 28, 2026 15:19
Three new data products built on the per-source asset graph:

- ogc_analyte_trend: per-well analyte concentration trend (Mann-Kendall +
  Theil-Sen, daily mean). One product per analyte; seeds nm_arsenic_trend
  and nm_nitrate_trend.
- ogc_mcl_exceedance (nm_mcl_exceedance): one feature per well flagging
  drinking-water MCL exceedances. Thresholds read at run time from
  gs://<bucket>/config/mcl.json (source of truth); see mcl.sample.json.
- ogc_monitoring_recency (nm_monitoring_recency): one feature per well
  with last-observation date, days_since_last, and active/stale status
  (water levels, stale > 365d).
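
A rough sketch of the recency classification described above, assuming a strict "more than 365 days" cutoff as the commit text states. The function name and the returned keys other than days_since_last are hypothetical, not taken from dump_monitoring_recency_collection.

```python
from datetime import date

STALE_AFTER_DAYS = 365  # from the commit message: stale > 365d


def monitoring_recency(last_observation, today):
    """Classify a well as active or stale from its last observation date."""
    days_since_last = (today - last_observation).days
    status = "stale" if days_since_last > STALE_AFTER_DAYS else "active"
    return {
        "last_observation": last_observation.isoformat(),
        "days_since_last": days_since_last,
        "status": status,
    }


on_cutoff = monitoring_recency(date(2025, 6, 28), date(2026, 6, 28))
past_cutoff = monitoring_recency(date(2025, 6, 27), date(2026, 6, 28))
```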

Implementation:
- Generalize the trend dumper: dump_waterlevel_trend_collection ->
  dump_trend_collection(slope_units, reducer, method, parameter_name);
  _daily_min_series -> _daily_series(reducer min|max|mean). slope_ft_per_year
  -> slope_per_year + slope_units.
- New dumpers dump_mcl_exceedance_collection (pivot + threshold compare)
  and dump_monitoring_recency_collection.
- GCSResource.read_json for the MCL file; die_config treats MCL as summary
  mode; definitions registers the three output types (each gets a job +
  schedule).

Offline tests cover all three. Run nm_mcl_exceedance only after uploading
config/mcl.json to the products bucket.
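
The "pivot + threshold compare" step might look something like the sketch below. The record layout, the choice of the per-well maximum as the compared value, and the strict greater-than test are assumptions for illustration; the actual dumper may choose a different representative value.

```python
from collections import defaultdict


def pivot_max(records):
    """Pivot long (well_id, analyte, value) rows to {well_id: {analyte: max}}."""
    wide = defaultdict(dict)
    for well_id, analyte, value in records:
        previous = wide[well_id].get(analyte)
        wide[well_id][analyte] = value if previous is None else max(previous, value)
    return dict(wide)


def flag_exceedances(wide, thresholds):
    """Return one flag record per well; analytes without a threshold are skipped."""
    flags = {}
    for well_id, values in wide.items():
        exceeded = sorted(
            analyte
            for analyte, value in values.items()
            if analyte in thresholds and value > thresholds[analyte]
        )
        flags[well_id] = {"exceeds_any": bool(exceeded), "exceeded": exceeded}
    return flags


# Hypothetical observations in mg/L; thresholds use the values quoted in this PR.
records = [
    ("W1", "arsenic", 0.004),
    ("W1", "arsenic", 0.012),
    ("W1", "nitrate", 3.0),
    ("W2", "nitrate", 10.0),
]
thresholds = {"arsenic": 0.01, "nitrate": 10.0}
flags = flag_exceedances(pivot_max(records), thresholds)
```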

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Replace the sample with the real EPA-sourced MCL file. Values in mg/L:
- arsenic 0.01, nitrate(as N) 10, fluoride 4.0, uranium 0.03 (primary)
- chloride 250, sulfate 250, tds 500 (secondary)
pH (6.5-8.5) omitted (a range, not a single MCL). Provenance + EPA
source URLs recorded in the file. Add uranium to the nm_mcl_exceedance
analyte list. Upload this file to gs://<products_bucket>/config/mcl.json.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add a _schema block (explains every field), structured _source with EPA
URLs + retrieved date, an _omitted note (pH is a range), and per-analyte
units/basis/label/note. The product reads only mcl/type per analyte;
_-prefixed keys and extra fields are ignored, and the whole dict travels
into the output collection's mcl_thresholds as provenance.
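
To make the "reads only mcl/type, ignores _-prefixed keys" rule concrete, here is a small sketch. The JSON layout (analytes as top-level keys alongside the _-prefixed metadata blocks) is an assumption inferred from this message, and extract_thresholds is a hypothetical helper, not a function in the repository.

```python
import json

# Assumed layout: analytes as top-level keys, metadata under _-prefixed keys.
SAMPLE = """
{
  "_schema": {"mcl": "threshold value", "type": "primary or secondary"},
  "_omitted": {"ph": "a range, not a single MCL"},
  "arsenic": {"mcl": 0.01, "type": "primary", "units": "mg/L", "label": "Arsenic"},
  "tds": {"mcl": 500, "type": "secondary", "units": "mg/L"}
}
"""


def extract_thresholds(config):
    """Keep only mcl/type per analyte; skip _-prefixed keys and extra fields."""
    return {
        analyte: {"mcl": float(entry["mcl"]), "type": entry["type"]}
        for analyte, entry in config.items()
        if not analyte.startswith("_")
    }


config = json.loads(SAMPLE)
parsed = extract_thresholds(config)
```

The full config dict stays available alongside the parsed thresholds, so it can still be attached to the output as provenance.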

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The exceedance test is a direct magnitude comparison, so MCL and value
must share units and basis. Document the nitrate pitfall (EPA MCL is as
N; data may be as NO3, ~4.43x) at the comparison site.
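
The ~4.43x factor is the ratio of the molar mass of nitrate (NO3) to that of nitrogen (N). A short worked example, using standard atomic masses; nitrate_as_n is a hypothetical helper shown only to illustrate the basis conversion:

```python
N_MOLAR_MASS = 14.007  # g/mol, nitrogen
O_MOLAR_MASS = 15.999  # g/mol, oxygen

# Mass of NO3 per unit mass of N: (14.007 + 3 * 15.999) / 14.007, about 4.43
NO3_PER_N = (N_MOLAR_MASS + 3 * O_MOLAR_MASS) / N_MOLAR_MASS

NITRATE_MCL_AS_N = 10.0  # mg/L as N, the value quoted in this PR


def nitrate_as_n(value_as_no3_mg_l):
    """Convert a nitrate concentration reported as NO3 to the as-N basis."""
    return value_as_no3_mg_l / NO3_PER_N


reported_as_no3 = 30.0  # hypothetical measurement, mg/L as NO3
naive_exceeds = reported_as_no3 > NITRATE_MCL_AS_N
correct_exceeds = nitrate_as_n(reported_as_no3) > NITRATE_MCL_AS_N
```

Here the naive comparison flags an exceedance, while the converted value (about 6.8 mg/L as N) is below the MCL.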

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

ogc_features.py had grown to ~628 lines mixing three concerns; the
statistics cluster is the one that isn't serialization. Move the daily
aggregation, qualification gate, Mann-Kendall + Theil-Sen test, the
thresholds, and the method-description text into backend/trend_stats.py
(pure analysis, lazily importing scipy/pymannkendall). ogc_features
re-exports them so importers and dump_trend_collection's default arg keep
working.

ogc_features 628 -> 506 lines (serialization only); trend_stats 143.
Add direct unit tests for the extracted module. 27 tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

dg kept disappearing from the project venv on env re-resolves because it
was never declared, breaking the AGENTS.md-recommended `dg check defs`.
Declare it so `uv run dg ...` always works. Dev/CLI only — not in the
serverless requirements.txt, so the deploy PEX is unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@github-actions

github-actions Bot commented Jun 28, 2026


Your pull request is automatically being deployed to Dagster Cloud.

| Location | Status | Link | Updated |
| --- | --- | --- | --- |
| die-orchestration | | View in Cloud | Jun 28, 2026 at 09:24 PM (UTC) |

@jirhiker jirhiker merged commit 53e2edf into main Jun 28, 2026
2 checks passed