feat(v0.7.3): brain.prove() holdout validation + gradata demo + one-line installer#176
Conversation
THE BET-THE-COMPANY FIX (council unanimous: do this first):
brain.prove() now uses Welch's t-test on a chronological train/test
split (70/30) when ≥5 sessions exist. Old in-sample Mann-Kendall is
the cold-start fallback. Caller can tell which ran via 'method' field:
- holdout_welch_ttest ← preferred (statistically valid)
- in_sample_mann_kendall_legacy ← cold start fallback
NEW: brain_prove_holdout() in _core.py
- Splits sessions chronologically (last 30% = test)
- Welch's t-test (allows unequal variances)
- Computes mean, std, n, lift_pct, p_value
- confidence_level: strong/moderate/weak/insufficient
- No scipy required — t-distribution computed via regularized
incomplete beta function in pure stdlib
- 7 regression tests (improvement, no-trend, regression, cold-start,
backward-compat, determinism)
THE INSTALL NO-BRAINER (council install-ICP wedge):
NEW: gradata demo command
- Pre-seeded SDR brain (200 corrections, 12 graduated rules)
- Shows side-by-side 'without brain' vs 'with brain' email draft
- Token reduction: 412 → 31 (92% smaller)
- Lists top 5 rules learned
- Wow-moment: zero correct() calls needed for demo
- 4 tests; all pass
THE ONE-LINE INSTALLER:
NEW: gradata-install/install.sh
- curl -sSL gradata.ai/install | bash compatible
- Detects platform (macOS/Linux/WSL)
- Detects AI tool (Claude Code, Cursor, Codex, Gemini, OpenCode, Hermes)
- Picks best Python installer (uv > pipx > pip3)
- Initializes brain in .git project root or ~/gradata/default
- Installs hook adapter for each detected tool
- Idempotent + safe (refuses sudo, validates platform)
- Prints next-steps including 'gradata demo'
NOT IN THIS COMMIT:
- /gradata Claude Code slash command — codex sandbox couldn't reach
gradata-plugin/ from Gradata/ workdir. Will do separately.
Council 3-vendor verdict drove all of this:
Cloud pivot → Convergence-as-a-Service (9/mo)
Install ICP → YC W26/S26 founder, Claude Code daily, 30-80 corr/wk
Killer objection → 'Why not CLAUDE.md?' (defuse in README first 100 words)
Co-authored-by: Codex <noreply@openai.com>
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
|
Caution Review failedFailed to post review comments 📝 Walkthrough
WalkthroughAdds a user-space installer script, a deterministic CLI demo (assets, seeding, and a run_demo entrypoint with CLI integration), and holdout-based statistical proof logic with supporting helpers and tests; packaging was updated to include demo assets. ChangesInstaller
Proof System (holdout-based validation)
Deterministic Demo System
Sequence Diagram(s)sequenceDiagram
autonumber
rect rgba(200,220,255,0.5)
Participant CLI
end
rect rgba(200,255,200,0.5)
Participant DemoModule
end
rect rgba(255,220,200,0.5)
Participant Brain
end
rect rgba(240,240,240,0.5)
Participant Filesystem
end
CLI->>DemoModule: run_demo(scenario)
DemoModule->>Filesystem: ensure seed assets (lessons.md, events.jsonl, manifest)
DemoModule->>Filesystem: create temp copy of seeded brain
DemoModule->>Brain: open Brain from temp copy
Brain-->>DemoModule: provide rules/state
DemoModule->>Brain: apply rules to demo task
Brain-->>DemoModule: returned transformed output and metrics
DemoModule->>CLI: print before/after, token counts, top rules
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 OpenGrep (1.20.0)OpenGrep fatal error (exit code 2): �[32m✔�[39m �[1mOpengrep OSS�[0m �[1m Loading rules from local config...�[0m Comment |
The pyproject.toml force-includes assets/demo_brains/sdr/system.db but .gitignore had `*.db` excluding it, breaking the editable install on CI. Override with explicit `!src/gradata/assets/demo_brains/sdr/*.db`. (cherry picked from commit aff162e)
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
…essions - brain_prove() now always populates legacy evidence keys (convergence_trend, effort_ratio, rule_count, correction_count, sessions, categories_converged, strongest_category, edit_distance_trend) even when using holdout path. - use_holdout decision now reads real DB session count via _correction_counts_by_session, not mocked _get_convergence return. Makes legacy mock-based tests deterministic and unblocks v0.7.3 PR. Tests: 4196 passed locally.
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
… compat Windows-only failure: \u2713 (✓) was mojibake'd by cp1252 default codec when captured by subprocess. Set PYTHONIOENCODING=utf-8 + encoding='utf-8' on subprocess.run to make Windows behave like macOS/Linux.
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
v0.7.3 — brain.prove() holdout validation + gradata demo + curl|bash installer
Three council-driven shipments in one PR.
Why this PR exists
Council 3-vendor
--full(7 lenses, ~22 min) on cloud pivot called holdout validation the bet-the-company technical risk for YC demo day:Audit confirmed the council was right.
brain_prove()was running Mann-Kendall on the same session series the brain was trained on. Statistical malpractice for a SaaS pitch.brain.prove() — now does holdout
New
brain_prove_holdout()in_core.py:lift_pct,p_value,train_window,test_window,confidence_level,methodmethodfield exposes which logic ran:holdout_welch_ttest(preferred)in_sample_mann_kendall_legacy(cold-start fallback for <5 sessions)brain.prove()upgraded to call holdout when ≥5 sessions; falls back to legacy below that. Backward-compat for all callers.7 regression tests. All pass.
gradata demo command — the install no-brainer
Council install-ICP verdict (6/7 lenses unanimous): solo Cursor/Claude Code power user wants a 60-second wow-moment that doesn't require writing any code first.
New
gradata demo:correct()calls needed — just see the compounding4 tests. All pass.
curl | bash installer
Council install-ICP P5/P6 unanimous: gstack-style 30-second install is non-negotiable.
New
gradata-install/install.sh:curl -sSL gradata.ai/install | bashcompatible~/gradata/defaultTests
Builds on
Co-authored-by: Codex noreply@openai.com