Add analyte-trend, MCL-exceedance, and monitoring-recency products#91
Merged
Three new data products built on the per-source asset graph:

- ogc_analyte_trend: per-well analyte concentration trend (Mann-Kendall + Theil-Sen, daily mean). One product per analyte; seeds nm_arsenic_trend and nm_nitrate_trend.
- ogc_mcl_exceedance (nm_mcl_exceedance): one feature per well flagging drinking-water MCL exceedances. Thresholds read at run time from gs://<bucket>/config/mcl.json (source of truth); see mcl.sample.json.
- ogc_monitoring_recency (nm_monitoring_recency): one feature per well with last-observation date, days_since_last, and active/stale status (water levels, stale > 365d).

Implementation:

- Generalize the trend dumper: dump_waterlevel_trend_collection -> dump_trend_collection(slope_units, reducer, method, parameter_name); _daily_min_series -> _daily_series(reducer min|max|mean). slope_ft_per_year -> slope_per_year + slope_units.
- New dumpers dump_mcl_exceedance_collection (pivot + threshold compare) and dump_monitoring_recency_collection.
- GCSResource.read_json for the MCL file; die_config treats MCL as summary mode; definitions registers the three output types (each gets a job + schedule).

Offline tests cover all three. Run nm_mcl_exceedance only after uploading config/mcl.json to the products bucket.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
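The recency rule in this commit (days_since_last, stale > 365d) can be illustrated with a minimal sketch. The function name `recency_properties`, the `last_observation` key, and the `status` key are hypothetical; only `days_since_last`, the active/stale labels, and the 365-day threshold come from the commit message.

```python
from datetime import date

# Threshold from the commit message: a well is stale when its last
# observation is more than 365 days old.
STALE_AFTER_DAYS = 365


def recency_properties(last_observation: date, today: date) -> dict:
    """Build per-well recency properties (hypothetical helper, not the PR's code)."""
    days_since_last = (today - last_observation).days
    status = "stale" if days_since_last > STALE_AFTER_DAYS else "active"
    return {
        "last_observation": last_observation.isoformat(),
        "days_since_last": days_since_last,
        "status": status,
    }


if __name__ == "__main__":
    print(recency_properties(date(2023, 1, 1), date(2024, 1, 2)))
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the rule deterministic in offline tests.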
Replace the sample with the real EPA-sourced MCL file. Values in mg/L:

- arsenic 0.01, nitrate (as N) 10, fluoride 4.0, uranium 0.03 (primary)
- chloride 250, sulfate 250, tds 500 (secondary)

pH (6.5-8.5) omitted (a range, not a single MCL). Provenance + EPA source URLs recorded in the file. Add uranium to the nm_mcl_exceedance analyte list. Upload this file to gs://<products_bucket>/config/mcl.json.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a _schema block (explains every field), structured _source with EPA URLs + retrieved date, an _omitted note (pH is a range), and per-analyte units/basis/label/note. The product reads only mcl/type per analyte; _-prefixed keys and extra fields are ignored, and the whole dict travels into the output collection's mcl_thresholds as provenance.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
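As a rough illustration of the read rule described above, the sketch below keeps only `mcl` and `type` per analyte and skips `_`-prefixed keys. The file layout (analytes as top-level keys), the sample contents, and the helper name `extract_thresholds` are assumptions made for the example; the arsenic and tds values are the ones listed in the earlier commit.

```python
import json

# Hypothetical, abbreviated stand-in for config/mcl.json. The real file's
# layout may differ; top-level analyte keys are an assumption.
SAMPLE_TEXT = """
{
  "_schema": {"mcl": "threshold value", "type": "primary or secondary"},
  "_omitted": {"ph": "a range, not a single MCL"},
  "arsenic": {"mcl": 0.01, "type": "primary", "units": "mg/L", "label": "Arsenic"},
  "tds": {"mcl": 500, "type": "secondary", "units": "mg/L"}
}
"""


def extract_thresholds(config: dict) -> dict:
    """Keep only mcl/type per analyte; ignore _-prefixed keys and extra fields."""
    return {
        name: {"mcl": float(entry["mcl"]), "type": entry["type"]}
        for name, entry in config.items()
        if not name.startswith("_")
    }


config = json.loads(SAMPLE_TEXT)
thresholds = extract_thresholds(config)

# The untouched dict rides along as provenance under mcl_thresholds.
collection = {"type": "FeatureCollection", "features": [], "mcl_thresholds": config}
```

Because the comparison logic only sees `thresholds`, documentation fields can be added to the file without touching the product code.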
The exceedance test is a direct magnitude comparison, so MCL and value must share units and basis. Document the nitrate pitfall (EPA MCL is as N; data may be as NO3, ~4.43x) at the comparison site.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
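The ~4.43x factor is the ratio of the molar mass of NO3 to that of N. The sketch below shows how a value reported as NO3 could be normalized before the comparison. The helper names, the basis labels `as_N` and `as_NO3`, and the strict `>` comparison are assumptions for illustration, not the PR's code.

```python
# Molar masses in g/mol: N = 14.0067, O = 15.9994, so NO3 = 62.0049.
NO3_PER_N = 62.0049 / 14.0067  # about 4.43

NITRATE_MCL_AS_N = 10.0  # mg/L as N, from the MCL file values listed in this PR


def nitrate_as_n(value_mg_l: float, basis: str) -> float:
    """Convert a nitrate concentration to the as-N basis (hypothetical helper)."""
    if basis == "as_N":
        return value_mg_l
    if basis == "as_NO3":
        return value_mg_l / NO3_PER_N
    raise ValueError(f"unknown nitrate basis: {basis!r}")


def exceeds(value: float, mcl: float) -> bool:
    """Direct magnitude comparison; strict '>' is an assumption here."""
    return value > mcl


# 20 mg/L as NO3 is roughly 4.5 mg/L as N: below the MCL once converted,
# but a naive comparison of the raw number would flag it.
naive_flag = exceeds(20.0, NITRATE_MCL_AS_N)
converted_flag = exceeds(nitrate_as_n(20.0, "as_NO3"), NITRATE_MCL_AS_N)
```

Raising on an unknown basis, instead of silently comparing, makes a units mismatch visible at the comparison site.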
ogc_features.py had grown to ~628 lines mixing three concerns; the statistics cluster is the one that isn't serialization. Move the daily aggregation, qualification gate, Mann-Kendall + Theil-Sen test, the thresholds, and the method-description text into backend/trend_stats.py (pure analysis, lazily importing scipy/pymannkendall). ogc_features re-exports them so importers and dump_trend_collection's default arg keep working.

ogc_features 628 -> 506 lines (serialization only); trend_stats 143. Add direct unit tests for the extracted module. 27 tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
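For readers unfamiliar with the statistics named above, here is a stdlib-only sketch of the three steps: daily aggregation with a selectable reducer, the Mann-Kendall S statistic, and a Theil-Sen slope per year. It is not the code in backend/trend_stats.py, which per the commit relies on scipy/pymannkendall; the function names are hypothetical, and the qualification gate and significance test are omitted.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations
from statistics import mean, median

REDUCERS = {"min": min, "max": max, "mean": mean}


def daily_series(observations, reducer="mean"):
    """Collapse (date, value) observations into one value per day, sorted by date."""
    by_day = defaultdict(list)
    for day, value in observations:
        by_day[day].append(value)
    reduce = REDUCERS[reducer]
    return sorted((day, reduce(values)) for day, values in by_day.items())


def mann_kendall_s(values):
    """S statistic: sum of sign(later - earlier) over all pairs in time order."""
    return sum((b > a) - (b < a) for a, b in combinations(values, 2))


def theil_sen_slope_per_year(series):
    """Median of pairwise slopes, with time measured in years of 365.25 days."""
    slopes = [
        (v2 - v1) / ((d2 - d1).days / 365.25)
        for (d1, v1), (d2, v2) in combinations(series, 2)
        if d2 != d1
    ]
    return median(slopes)


observations = [
    (date(2020, 1, 1), 1.0),
    (date(2020, 1, 1), 3.0),  # same day: averaged to 2.0 by the mean reducer
    (date(2021, 1, 1), 4.0),
    (date(2022, 1, 1), 6.0),
]
series = daily_series(observations, reducer="mean")
s_stat = mann_kendall_s([value for _, value in series])
slope = theil_sen_slope_per_year(series)
```

A positive S with a positive median slope indicates a rising series; a full test would also convert S to a p-value, including a correction for ties.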
dg kept disappearing from the project venv on env re-resolves because it was never declared, breaking the AGENTS.md-recommended `dg check defs`. Declare it so `uv run dg ...` always works. Dev/CLI only: it is not in the serverless requirements.txt, so the deploy PEX is unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Your pull request is automatically being deployed to Dagster Cloud.
Retargets to main the data-product work that landed only on the (already-merged) #89 st2 branch: #90 was merged into that branch after #89 closed, so its content never reached main. The 6 product commits are cherry-picked onto current main (st2 + test_sources exclusion are already on main via #89).
What

Three new data products + the supporting refactors:

- ogc_analyte_trend: per-well analyte concentration trend (Mann-Kendall + Theil-Sen, daily mean). Seeds nm_arsenic_trend, nm_nitrate_trend.
- ogc_mcl_exceedance (nm_mcl_exceedance): per-well drinking-water MCL exceedance flags; thresholds read at run time from gs://<bucket>/config/mcl.json (EPA-sourced, self-documenting; orchestration/config/mcl.json).
- ogc_monitoring_recency (nm_monitoring_recency): per-well last-observation date, days-since, active/stale status (water levels, stale > 365d).

Supporting changes

- Generalized trend dumper: dump_trend_collection(slope_units, reducer, method, parameter_name); _daily_series(reducer=min|max|mean).
- New dumpers: dump_mcl_exceedance_collection, dump_monitoring_recency_collection.
- Extracted backend/trend_stats.py (pure stats: daily aggregation, qualification gate, Mann-Kendall + Theil-Sen) out of the growing ogc_features.py (628 → ~500 lines).
- GCSResource.read_json; die_config treats MCL as summary mode; definitions registers the 3 output types (each gets a job + schedule).
- Declared the dagster-dg-cli dependency so dg check defs works reliably.
- config/mcl.json generated from EPA NPDWR/Secondary standards, self-documenting; nitrate as-N vs as-NO3 hazard noted in code + data.

Verification

dg check defs passes; full offline suite 277 tests pass.

Before running nm_mcl_exceedance

Upload orchestration/config/mcl.json to gs://<products_bucket>/config/mcl.json.

🤖 Generated with Claude Code