Your model might be producing different outputs depending on which features you feed it, which seed you use, or how you bin the data, and your current validation doesn't catch this. mrv-lib tests whether your model outputs are stable across admissible specification choices or silently depend on arbitrary modelling decisions.
mrv is a pure validation library: you supply labels from your own models, and mrv measures how stable they are. Bank MRM (SR 11-7 / Basel IV) is the anchor application. The same framework applies equally to quant-fund risk discipline (a sizing gate at matched Sharpe ratio with reduced maximum drawdown) and to production ML monitoring (route to a fallback or human-in-the-loop on RED, whether the model serves finance or any other domain).
The framework is identification-locked, not alpha: see Paper 3e §6 (Orthogonality to the Realized-Data Class) for the empirical chain establishing that the underlying primitive is orthogonal to realized-vol metrics (joint R^2 < 0.05 on the daily axis), does not predict forward returns, and is not autocorrelated beyond its construction window. mrv-lib is therefore a governance signal, not a forecast.
| Test | Question | Status |
|---|---|---|
| Representation Invariance | Do labels change when you use different feature representations? | v0.1.0 |
| Resolution Invariance | Do labels agree across 5m / 15m / 1h / 1d frequencies? | v0.2.1 |
| Model Risk Index (MRI) | A single score combining rep + res into an actionable governance signal | v0.3.0 |
Also includes: business impact function (impact_fn), continuous monitoring with alerts, disagreement attribution (LOO / frequency-pair / temporal), SR 11-7 compliant report with auto-generated findings, and a findings engine with severity classification.
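To make the representation-invariance question concrete, the sketch below computes the Adjusted Rand Index (ARI) between label sequences produced under different feature sets, then averages it over all pairs. It is a standalone illustration in plain Python, not mrv's implementation: the function names `adjusted_rand_index` and `mean_pairwise_ari` are hypothetical, and treating the summary as a simple mean over pairs is an assumption.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(a, b):
    """ARI between two labelings of the same observations (1.0 = identical partitions)."""
    if len(a) != len(b):
        raise ValueError("label sequences must have the same length")
    n = len(a)
    if n < 2:
        return 1.0  # fewer than two points: no pairs to disagree on
    # Count pairs of observations that fall in the same cluster.
    same_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    same_a = sum(comb(c, 2) for c in Counter(a).values())
    same_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = same_a * same_b / comb(n, 2)  # agreement expected by chance
    max_index = 0.5 * (same_a + same_b)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (same_both - expected) / (max_index - expected)


def mean_pairwise_ari(label_sets):
    """Average ARI over every pair of representations (assumed summary statistic)."""
    pairs = list(combinations(label_sets, 2))
    if not pairs:
        raise ValueError("need at least two label sets")
    scores = [adjusted_rand_index(label_sets[x], label_sets[y]) for x, y in pairs]
    return sum(scores) / len(scores)


# Illustrative labels only: three representations of the same six observations.
example = {
    "vol+dd+var": [0, 0, 1, 1, 2, 2],
    "vol+var+cvar": [1, 1, 0, 0, 2, 2],  # same partition, different label names
    "vol+dd+cvar": [0, 1, 1, 2, 2, 0],   # a genuinely different partition
}
print(mean_pairwise_ari(example))
```

ARI ignores label names and compares only the partitions, so two representations that swap regime "0" and "1" still score 1.0. Scores near 0 indicate agreement no better than chance.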
Three production-grade deployment patterns share one Gate / Envelope / Tier mechanic:
| Domain | What you gate | RED action |
|---|---|---|
| Bank MRM (SR 11-7 / Basel) | Internal model output (VaR, ECL, IRB risk-weights) | Suspend primary; activate fallback; committee review within 5 days |
| Quant fund risk discipline | Strategy position sizing (deployment of an alpha you trust) | Reduce to 30% of target; flatten after 3 consecutive RED days |
| Production ML monitoring | Live ML system serving predictions (any domain: finance, recsys, fraud, healthcare, autonomous systems) | Route to fallback (rule-based, prior-version, human-in-the-loop); page on-call |
The cross-domain fit is not aspirational: the same quick_mri(close) API drives the Tier classification in all three; the difference is which downstream action the GREEN/YELLOW/RED tier triggers.
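As an illustration of how a tier sequence can drive a downstream action, the sketch below implements the quant-fund row of the table above: scale to 30% of target on RED and flatten after three consecutive RED days. The function `gated_positions` and its parameters are hypothetical and not part of mrv. Two choices are assumptions: the position is flattened on the third consecutive RED day itself, and YELLOW is left at full size because the table does not specify a YELLOW action.

```python
VALID_TIERS = {"GREEN", "YELLOW", "RED"}


def gated_positions(tiers, target=1.0, red_scale=0.30, flatten_after=3):
    """Map a sequence of daily tiers to position sizes.

    RED          -> red_scale * target
    RED streak   -> 0.0 once the streak reaches flatten_after days (assumed timing)
    GREEN/YELLOW -> target (YELLOW handling is a placeholder assumption)
    """
    positions = []
    red_streak = 0
    for tier in tiers:
        if tier not in VALID_TIERS:
            raise ValueError(f"unknown tier: {tier!r}")
        if tier == "RED":
            red_streak += 1
            if red_streak >= flatten_after:
                positions.append(0.0)
            else:
                positions.append(target * red_scale)
        else:
            red_streak = 0  # any non-RED day resets the streak
            positions.append(target)
    return positions


daily_tiers = ["GREEN", "RED", "RED", "RED", "YELLOW"]
print(gated_positions(daily_tiers))
```

The other two rows follow the same pattern with a different action attached to RED (suspending the primary model, or routing to a fallback), so only the body of the RED branch changes.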
pip install mrv-lib

One-liner from daily close prices (no extra data required):
import numpy as np
import pandas as pd
from mrv.mri import quick_mri
# Generate a sample close-price series (replace with your own pd.Series)
rng = np.random.default_rng(0)
dates = pd.bdate_range("2022-01-03", periods=500)
close = pd.Series(100 * np.exp(rng.normal(0, 0.01, 500).cumsum()), index=dates)
mri = quick_mri(close)
mri.report() # sub-metric breakdown
df = mri.to_dataframe()

Labels-first API (supply labels from your own model):
from mrv.pipeline import validate_rep
result = validate_rep(labels={
    "SPY": {
        "vol+dd+var": labels_a,  # 1-D integer ndarray of regime labels
        "vol+var+cvar": labels_b,
    }
})
print(result["assets"]["SPY"]["mean_ari"])

mrv-lib/
├── config.yaml  # Configuration (for convenience pipeline)
├── templates/
│   ├── template.tex  # Academic report template
│   └── sr11_7_template.tex  # SR 11-7 regulatory report template
├── examples/
│   ├── quickstart.ipynb
│   ├── paper1_representation_invariance.ipynb
│   ├── paper2_resolution_invariance.ipynb
│   ├── paper3e_model_risk_index.ipynb
│   └── example_california_housing.ipynb
├── src/mrv/
│   ├── pipeline.py  # validate_rep() / validate_res() + convenience wrappers
│   ├── data/  # Data loading, factors, normalization (optional)
│   ├── models/  # GMM/HMM fitting
│   ├── validator/
│   │   ├── base.py  # BaseValidator (subclass for custom tests)
│   │   ├── rep.py  # Representation Invariance (Paper 1)
│   │   ├── res.py  # Resolution Invariance (Paper 2)
│   │   ├── metrics.py  # ARI, AMI, NMI, Spearman, VI
│   │   ├── attribution.py  # LOO, frequency-pair, temporal hotspots
│   │   ├── findings.py  # SR 11-7 findings engine
│   │   ├── monitor.py  # Continuous monitoring + alerts
│   │   └── report.py  # JSON -> LaTeX -> PDF
│   ├── mri/  # Model Risk Index (Paper 3)
│   │   ├── index.py  # compute_mri(), compute_rolling_mri(), quick_mri()
│   │   ├── bounds.py  # Ordinal bound G, SOE, zone classification
│   │   ├── spectral.py  # Markov spectral gap, stress diagnostics
│   │   └── wasserstein.py  # Sliced Wasserstein, MRI_cross
│   └── utils/
│       ├── config.py  # YAML config loading
│       ├── download.py  # IB data download
│       └── log.py  # Logging setup
├── reports/  # Output (gitignored)
└── tests/  # 276 tests
Each run creates a timestamped directory under reports/:
- result.json -- Complete data (reusable for report regeneration)
- report.pdf -- Professional report with cover page, dashboard, heatmaps, and remediation plan
- summary.txt -- Plain text quick view
- {asset}_ari_heatmap.png -- ARI heatmap per asset
- {asset}_timeline.png -- Regime timeline (res validator)
- pipeline_summary.csv -- Summary metrics per asset
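Because result.json holds the complete data for a run, it can be reloaded later without rerunning the pipeline. The sketch below finds the most recent run directory and parses its result.json using only the standard library. The helper `load_latest_result` is hypothetical and not part of mrv, and it assumes the timestamped directory names sort chronologically when compared as strings.

```python
import json
from pathlib import Path


def load_latest_result(reports_dir="reports"):
    """Return the parsed result.json from the newest run under reports_dir.

    Assumes run directories are named so that lexicographic order matches
    chronological order (e.g. a YYYYMMDD-style timestamp prefix).
    """
    root = Path(reports_dir)
    # Keep only subdirectories that actually contain a result.json.
    runs = sorted(p for p in root.iterdir() if (p / "result.json").is_file())
    if not runs:
        raise FileNotFoundError(f"no result.json found under {root}")
    with (runs[-1] / "result.json").open(encoding="utf-8") as fh:
        return json.load(fh)
```

For example, `load_latest_result("reports")` returns a plain dictionary whose keys depend on which validator produced the run.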
Based on the following PhD research:
- Zheng, Low & Wang (2026). Regime Labels Are Not Representation-Invariant (Paper 1). Submitted.
- Zheng, Low & Wang (2026). Regime Labels Are Not Resolution-Invariant (Paper 2). Submitted to Finance Research Letters.
- Zheng (2026). Inference Collapse Theory (Paper 3a). Working paper.
- Zheng (2026). Model Risk Index: Quantifying Regime Inference Collapse and Ordinal Invariance (Paper 3e). Working paper.
MIT. See LICENSE.
ModelGuard Lab -- Author: Kai Zheng.