Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/daily-security-observability.lock.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

39 changes: 39 additions & 0 deletions docs/adr/40126-formal-compliance-test-suite-for-forecast-engine.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
# ADR-40126: Formal Compliance Test Suite for the Monte Carlo Forecast Engine

**Date**: 2026-06-19
**Status**: Draft

## Context

The Monte Carlo forecast engine (`pkg/cli/forecast.go`, `forecast_montecarlo.go`) is governed by a formal specification under `specs/forecast-compliance-fixtures/README.md`, with numbered requirements (R-MC-*, R-FC-*, R-CLI-*) and acceptance tests T-FC-031–T-FC-040. The canonical fixture `run_summary_minimal.json` set `total_effective_tokens` but omitted `total_aic`; because `forecast.go:593` gates on `runAIC ≤ 0 → continue`, every fixture-backed run was silently skipped and produced no Monte Carlo input. This field-name gap between what the fixture published and what the engine reads (`TokenUsage.TotalAIC`) went undetected because no test bound the fixture schema to the engine's read path. The engine's numeric invariants (Bernoulli success collapse, observed-rate and yield formulas, the λ=15 Poisson crossover, the n≥10 reliability threshold, nil-on-degenerate-λ behavior) were likewise unguarded by executable tests mapped to the spec.

## Decision

We will add a formal compliance test suite, `pkg/cli/forecast_compliance_fixtures_formal_test.go`, that maps thirteen predicates (P1–P13) one-to-one onto the formal model in `specs/forecast-compliance-fixtures/README.md`. Each predicate is expressed as a table-driven Go test that exercises the real engine functions (`runMonteCarlo`, `useNormalApproximationForPoisson`, `RunForecast`) and the real fixture files, with the formal cross-reference (TLA+ invariant, F* pre/post condition, or Z3-SMT schema gap) recorded in the test doc comment. We will also fix the fixture by adding `"total_aic": 0.0054`, and P12/P13 explicitly guard against regression of that schema gap. This follows the existing repository convention for formal conformance suites (see [ADR-38166](38166-formal-conformance-test-suite-for-compiler-threat-detection.md)).

## Alternatives Considered

### Alternative 1: Fix the fixture only, no test suite
Add `total_aic` to the fixture and stop. This restores correct behavior but leaves the schema-mapping gap unguarded — the same field could be dropped or renamed again with no failing test. Rejected because the original defect was caused precisely by the absence of a test binding fixture keys to the engine's read path.

### Alternative 2: Generic unit tests without spec mapping
Write conventional unit tests covering the engine's numeric behavior without the explicit P1–P13 predicate-to-specification mapping. This would catch regressions but would not make conformance to the formal model auditable, and would diverge from the established formal-conformance pattern already used elsewhere in the codebase. Rejected for weaker traceability between requirement and test.

## Consequences

### Positive
- The fixture-to-engine field mapping (`total_aic`) is now enforced by P1/P12/P13; the silent-skip defect cannot regress unnoticed.
- Each formal requirement has a named, executable test, giving auditable traceability between `specs/forecast-compliance-fixtures/README.md` and `pkg/cli`.
- Numeric invariants (Poisson crossover at λ=15, n≥10 reliability, degenerate-λ nil handling) are now pinned to exact boundary values.

### Negative
- The test file replicates a small amount of engine logic (e.g. the duration derivation in P10 mirrors `forecast.go:573-574`); the replica must be kept in sync if the engine changes.
- Thirteen predicates add maintenance surface that must track future edits to the formal specification.

### Neutral
- Tests are gated behind `//go:build !integration` and run against in-repo fixtures, so they require no GitHub authentication.
- P11 tolerates non-validation errors from `RunForecast` (e.g. missing auth) to remain runnable in CI sandboxes, asserting only on the days-validation message.

---

*This is a DRAFT ADR generated by the [Design Decision Gate](https://github.com/github/gh-aw/actions/runs/27797333197) workflow. The PR author must review, complete, and finalize this document before the PR can merge.*
Loading
Loading