feat(v0.7.3): brain.prove() holdout validation + gradata demo + one-line installer by Gradata · Pull Request #176 · Gradata/gradata

Gradata · 2026-05-06T16:55:23Z

v0.7.3 — brain.prove() holdout validation + gradata demo + curl|bash installer

Three council-driven shipments in one PR.

Why this PR exists

Council 3-vendor --full (7 lenses, ~22 min) on cloud pivot called holdout validation the bet-the-company technical risk for YC demo day:

"brain.prove() must do holdout validation (train on weeks 1-2, measure on fresh week-3 tasks), not paired t-tests on the same corpus you synthesized from. That is a fatal demo-day objection if not fixed first."

Audit confirmed the council was right. brain_prove() was running Mann-Kendall on the same session series the brain was trained on. Statistical malpractice for a SaaS pitch.

brain.prove() — now does holdout

New brain_prove_holdout() in _core.py:

Splits sessions chronologically (last 30% = test by default)
Welch's t-test with unequal variances allowed
No scipy dep — t-distribution computed via regularized incomplete beta in stdlib
Returns lift_pct, p_value, train_window, test_window, confidence_level, method
method field exposes which logic ran:
- holdout_welch_ttest (preferred)
- in_sample_mann_kendall_legacy (cold-start fallback for <5 sessions)
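
For reviewers who want to sanity-check the statistics, here is a minimal sketch of a Welch's t-test whose p-value comes from the regularized incomplete beta function using only the standard library. It is not the code in _core.py: the function names are hypothetical, and the continued-fraction evaluation (modified Lentz method) is one common choice that the PR may or may not use.

```python
import math
from statistics import mean, variance


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even step of the recurrence.
        num = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        d = 1.0 + num * d
        c = 1.0 + num / c
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        num = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        d = 1.0 + num * d
        c = 1.0 + num / c
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for a, b > 0 and 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def student_t_two_tailed_pvalue(t, df):
    """Two-tailed p-value: P(|T| >= |t|) = I_{df / (df + t^2)}(df / 2, 1 / 2)."""
    return regularized_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))


def welch_ttest(train, test):
    """Return (t, df, p) for test vs. train without assuming equal variances."""
    n1, n2 = len(train), len(test)
    if n1 < 2 or n2 < 2:
        raise ValueError("each window needs at least two sessions")
    v1, v2 = variance(train) / n1, variance(test) / n2
    diff = mean(test) - mean(train)
    if v1 + v2 == 0.0:
        # Both windows are constant: no spread to test against.
        return (0.0 if diff == 0 else math.copysign(math.inf, diff),
                float(n1 + n2 - 2), 1.0 if diff == 0 else 0.0)
    t = diff / math.sqrt(v1 + v2)
    # Welch-Satterthwaite degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, student_t_two_tailed_pvalue(t, df)
```

Calling `welch_ttest(train_values, test_values)` on two lists of per-session measurements returns the statistic, the Welch-Satterthwaite degrees of freedom, and the two-tailed p-value. Which per-session metric the real implementation feeds in is not shown here.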

brain.prove() upgraded to call holdout when ≥5 sessions; falls back to legacy below that. Backward-compat for all callers.
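
The routing rule above can be pictured with a small sketch. The helper name and constant are hypothetical; only the threshold of 5 sessions and the two method strings come from this PR.

```python
# Threshold stated in the PR description: holdout needs at least 5 sessions.
MIN_SESSIONS_FOR_HOLDOUT = 5


def choose_method(session_count: int) -> str:
    """Pick the proof method label for a given number of recorded sessions."""
    if session_count >= MIN_SESSIONS_FOR_HOLDOUT:
        return "holdout_welch_ttest"
    # Cold start: too few sessions for a meaningful train/test split.
    return "in_sample_mann_kendall_legacy"
```

A caller that only trusts out-of-sample evidence could check the returned `method` string and treat the legacy label as "not yet proven".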

7 regression tests. All pass.

gradata demo command — the install no-brainer

Council install-ICP verdict (6/7 lenses unanimous): solo Cursor/Claude Code power user wants a 60-second wow-moment that doesn't require writing any code first.

New gradata demo:

Pre-seeded SDR brain (200 corrections, 12 graduated rules)
Side-by-side "without brain" vs "with brain" email draft
Token reduction: 412 → 31 (92% smaller)
Lists top 5 rules learned
Zero correct() calls needed — just see the compounding
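
As a quick check of the token-reduction figure, the arithmetic can be reproduced as below. The helper name is hypothetical; the inputs 412 and 31 are the numbers quoted in the list above.

```python
def reduction_pct(before: int, after: int) -> float:
    """Percentage by which `after` is smaller than `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# 412 tokens without the brain, 31 with it.
pct = reduction_pct(412, 31)
```

The exact value is 381/412, a little under 92.5%, which is consistent with the "92% smaller" headline.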

4 tests. All pass.

curl | bash installer

Council install-ICP P5/P6 unanimous: gstack-style 30-second install is non-negotiable.

New gradata-install/install.sh:

curl -sSL gradata.ai/install | bash compatible
Detects platform (macOS / Linux / WSL)
Detects AI tool (Claude Code, Cursor, Codex, Gemini, OpenCode, Hermes)
Picks best Python installer (uv > pipx > pip3)
Initializes brain in repo or ~/gradata/default
Installs hook adapter for each detected tool
Idempotent + safe (refuses sudo, validates platform)
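
The installer itself is a bash script, but its "uv > pipx > pip3" preference can be sketched in Python for illustration. The function and constant names are hypothetical, and the real script may detect tools differently; this only shows the first-match-wins ordering.

```python
import shutil
from typing import Callable, Optional

# Preference order stated in the PR description.
INSTALLER_PREFERENCE = ("uv", "pipx", "pip3")


def pick_installer(which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return the first installer found on PATH, or None if none is available."""
    for name in INSTALLER_PREFERENCE:
        if which(name):
            return name
    return None
```

Passing `which` as a parameter keeps the lookup testable without touching the real PATH. The summary table notes that the actual script errors out when no backend is found, which corresponds to the `None` case here.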

Tests

Before:  4174 passing
After:   4185 passing (added 7 holdout + 4 demo tests)
ruff check + ruff format --check + pyright src/   all clean

Builds on

v0.7.0 (feat(v0.7.0): gradata_recall MCP tool + universal hook adapters + audit CLI #171): gradata_recall MCP + universal hook adapters + audit CLI
v0.7.1 (feat(v0.7.1): agent-lightning bridge — gradata tune (auto-improvement) #172): agent-lightning APO bridge
v0.7.2 (feat(v0.7.2): data-flow correctness — fix the 27K→35 dashboard gap #175): data-flow correctness
v0.7.3 (this): bet-the-company stats fix + install no-brainer

Co-authored-by: Codex <noreply@openai.com>

THE BET-THE-COMPANY FIX (council unanimous: do this first):

brain.prove() now uses Welch's t-test on a chronological train/test split (70/30) when ≥5 sessions exist. Old in-sample Mann-Kendall is the cold-start fallback. Caller can tell which ran via 'method' field:
- holdout_welch_ttest ← preferred (statistically valid)
- in_sample_mann_kendall_legacy ← cold start fallback

NEW: brain_prove_holdout() in _core.py
- Splits sessions chronologically (last 30% = test)
- Welch's t-test (allows unequal variances)
- Computes mean, std, n, lift_pct, p_value
- confidence_level: strong/moderate/weak/insufficient
- No scipy required: t-distribution computed via regularized incomplete beta function in pure stdlib
- 7 regression tests (improvement, no-trend, regression, cold-start, backward-compat, determinism)

THE INSTALL NO-BRAINER (council install-ICP wedge):

NEW: gradata demo command
- Pre-seeded SDR brain (200 corrections, 12 graduated rules)
- Shows side-by-side 'without brain' vs 'with brain' email draft
- Token reduction: 412 → 31 (92% smaller)
- Lists top 5 rules learned
- Wow-moment: zero correct() calls needed for demo
- 4 tests; all pass

THE ONE-LINE INSTALLER:

NEW: gradata-install/install.sh
- curl -sSL gradata.ai/install | bash compatible
- Detects platform (macOS/Linux/WSL)
- Detects AI tool (Claude Code, Cursor, Codex, Gemini, OpenCode, Hermes)
- Picks best Python installer (uv > pipx > pip3)
- Initializes brain in .git project root or ~/gradata/default
- Installs hook adapter for each detected tool
- Idempotent + safe (refuses sudo, validates platform)
- Prints next-steps including 'gradata demo'

NOT IN THIS COMMIT:
- /gradata Claude Code slash command: codex sandbox couldn't reach gradata-plugin/ from Gradata/ workdir. Will do separately.

Council 3-vendor verdict drove all of this:
- Cloud pivot → Convergence-as-a-Service (9/mo)
- Install ICP → YC W26/S26 founder, Claude Code daily, 30-80 corr/wk
- Killer objection → 'Why not CLAUDE.md?' (defuse in README first 100 words)

Co-authored-by: Codex <noreply@openai.com>

coderabbitai · 2026-05-06T16:55:36Z

Caution

Review failed

Failed to post review comments

📝 Walkthrough

brain.prove() holdout validation: added brain_prove_holdout() which chronologically splits sessions (default 70/30 train/test), computes mean/std/n, lift_pct, p_value using Welch’s t-test (implemented via stdlib regularized incomplete beta), train_window, test_window, confidence_level, and method; brain.prove() chooses holdout when sufficient real sessions (≥5) and falls back to an in‑sample Mann–Kendall legacy method for cold-starts.
Proof payload shape change: brain.prove() now includes a top-level "method" field and nests statistics under an "evidence" object while still merging/populating legacy evidence keys for backward compatibility.
New public API: brain_prove_holdout(brain: Brain, *, train_ratio: float = 0.7, min_train_sessions: int = 3, min_test_sessions: int = 2).
Demo CLI & module: added gradata._demo (DemoScenario, SDR_SCENARIO, SCENARIOS, run_demo()); CLI demo reworked to run a deterministic SDR demo (side‑by‑side before/after email draft, token reduction 412→31, top‑5 rules) and now accepts --scenario (default "sdr") instead of the prior positional target path.
Demo assets & packaging: included seeded SDR demo brain and 12 lessons under assets/demo_brains/sdr/, updated pyproject to include demo assets, and adjusted .gitignore to force-include the seeded DB files.
One-line installer: added gradata-install/install.sh for curl -sSL gradata.ai/install | bash usage; detects platform (macOS/Linux/WSL), prefers installers uv > pipx > pip3, wires AI tooling via per-tool hooks, initializes a brain in-repo or at ~/gradata/default, is idempotent and refuses sudo.
Tests & quality: +11 tests added (7 holdout, 4 demo); test suite increased from 4,174 → 4,185 passing; ruff and pyright checks clean.
Breaking/behavioral changes: no public function signatures changed (Brain.prove() signature preserved), but CLI behaviour for gradata demo changed (positional target removed in favor of --scenario), and proof result payload shape now includes method/evidence—callers relying on exact legacy payload should account for nested evidence keys.
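
For callers adapting to the new payload shape, a defensive reader might look like the sketch below. It assumes `lift_pct`, `p_value`, and the legacy `convergence_trend` key all live under the nested `evidence` object, which is how the walkthrough reads but is not confirmed against the code. The function name and the sample values are hypothetical.

```python
from typing import Any


def summarize_proof(result: dict[str, Any]) -> str:
    """Build a one-line summary from a prove() result without assuming every key exists."""
    method = result.get("method", "unknown")
    evidence = result.get("evidence") or {}
    if method == "holdout_welch_ttest":
        return f"holdout: lift={evidence.get('lift_pct')}% p={evidence.get('p_value')}"
    # Legacy or unknown path: fall back to a legacy evidence key.
    return f"{method}: trend={evidence.get('convergence_trend')}"


# Illustrative payloads only; the numbers are made up for the example.
holdout_example = {"method": "holdout_welch_ttest",
                   "evidence": {"lift_pct": 12.5, "p_value": 0.03}}
legacy_example = {"method": "in_sample_mann_kendall_legacy",
                  "evidence": {"convergence_trend": "improving"}}
```

Using `.get()` throughout means the same reader keeps working on payloads produced before `method` and `evidence` existed.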

Walkthrough

Adds a user-space installer script, a deterministic CLI demo (assets, seeding, and a run_demo entrypoint with CLI integration), and holdout-based statistical proof logic with supporting helpers and tests; packaging was updated to include demo assets.

Changes

Installer

Layer / File(s)	Summary
Script & Execution Safety `Gradata/gradata-install/install.sh`	New bash installer with shebang, strict options, and `usage()` help text.
Argument & Platform Validation `Gradata/gradata-install/install.sh`	Validates CLI args, enforces non-root installs, maps OS -> platform and detects WSL.
Install Method Selection `Gradata/gradata-install/install.sh`	Selects installer backend (uv, pipx, pip3) and defines `install_sdk()` variants; errors if none available.
Launcher & Runtime Invocation `Gradata/gradata-install/install.sh`	Defines `run_gradata()` launcher to invoke Gradata via binary, module, or python fallback.
Tool Discovery & Wiring `Gradata/gradata-install/install.sh`	Detects AI-coding tools, runs per-tool install hooks (continues on individual failures), initializes brain, and prints final status summary.

Proof System (holdout-based validation)

Layer / File(s)	Summary
Imports & Statistical Helpers `src/gradata/_core.py`	Adds `math`/`statistics` imports and private helpers: `_correction_counts_by_session`, `_regularized_incomplete_beta`, `_student_t_two_tailed_pvalue`, `_welch_ttest`, `_window_stats`.
Holdout Implementation `src/gradata/_core.py`	New public `brain_prove_holdout(...)` computes train/test aggregates, Welch t-test p-value (SciPy or fallback), lift, and returns structured `evidence` with `method`.
Routing & Payload Shape `src/gradata/_core.py`	`brain_prove` performs holdout-viability check, routes to holdout when sufficient sessions, otherwise uses legacy in-sample path; result payloads include top-level `method` and nested `evidence`, merging legacy evidence for compatibility.
Docstring `src/gradata/brain.py`	`Brain.prove()` docstring updated to mention train/test holdout (Welch's t-test) and fallback.
Tests `tests/test_brain_prove_holdout.py`	New pytest module covering holdout behavior, determinism, method selection, and edge cases.
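
The chronological split behind the holdout path can be sketched as follows. The parameter names and defaults mirror the `brain_prove_holdout` signature quoted in the walkthrough; the function name, the rounding choice, and the `None` return for an unviable split are assumptions made for illustration.

```python
from typing import Optional, Sequence


def chronological_split(
    sessions: Sequence[float],
    *,
    train_ratio: float = 0.7,
    min_train_sessions: int = 3,
    min_test_sessions: int = 2,
) -> Optional[tuple[list[float], list[float]]]:
    """Split oldest-first session values into (train, test); None if too few sessions."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    # Truncate the train size so the most recent sessions always land in the test window.
    cut = int(len(sessions) * train_ratio)
    train, test = list(sessions[:cut]), list(sessions[cut:])
    if len(train) < min_train_sessions or len(test) < min_test_sessions:
        return None
    return train, test
```

With the default arguments, five sessions is the smallest input that yields a viable split (three train, two test), which lines up with the ≥5-session threshold described for the routing check. The split never shuffles, so no future session can leak into the training window.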

Deterministic Demo System

Layer / File(s)	Summary
Demo Module `src/gradata/_demo.py`	New `DemoScenario` dataclass, `SDR_SCENARIO`, `SCENARIOS` registry, deterministic seed generation (lessons/events), token-count helpers, seed brain management, and `run_demo()` entry point.
Seed Assets `src/gradata/assets/demo_brains/sdr/lessons.md`	Adds twelve date-anchored SDR rule blocks used as demo seed lessons.
CLI Integration `src/gradata/cli.py`	Refactors `demo` subcommand to call `run_demo`; replaces positional target with `--scenario` (choices: sdr, coding; default: sdr) and updates `cmd_demo`.
Tests `tests/test_demo.py`	New tests asserting demo CLI exit code, before/after output, token-reduction reporting, and seeded-asset regeneration.
Packaging & VCS `pyproject.toml`, `.gitignore`	Adds demo asset globs and force-include mapping for system DB to wheel/sdist; `.gitignore` adjusted to force-include demo DB files.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    rect rgba(200,220,255,0.5)
    Participant CLI
    end
    rect rgba(200,255,200,0.5)
    Participant DemoModule
    end
    rect rgba(255,220,200,0.5)
    Participant Brain
    end
    rect rgba(240,240,240,0.5)
    Participant Filesystem
    end

    CLI->>DemoModule: run_demo(scenario)
    DemoModule->>Filesystem: ensure seed assets (lessons.md, events.jsonl, manifest)
    DemoModule->>Filesystem: create temp copy of seeded brain
    DemoModule->>Brain: open Brain from temp copy
    Brain-->>DemoModule: provide rules/state
    DemoModule->>Brain: apply rules to demo task
    Brain-->>DemoModule: returned transformed output and metrics
    DemoModule->>CLI: print before/after, token counts, top rules

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Gradata/gradata#18: Modifies src/gradata/_core.py and touches brain_prove-related logic; likely related at the code level.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 11.76% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title 'feat(v0.7.3): brain.prove() holdout validation + gradata demo + one-line installer' clearly and concisely summarizes all three major changes in the changeset.
Description check	✅ Passed	The PR description is comprehensive and directly related to all the changes in the changeset, providing clear context for why each change was made and how it functions.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 OpenGrep (1.20.0)

OpenGrep fatal error (exit code 2):
┌──────────────┐
│ Opengrep CLI │
└──────────────┘

✔ Opengrep OSS
✔ Basic security coverage for first-party code vulnerabilities.

Loading rules from local config...
[00.23][ERROR]: Error: exception Glob.Lexer.Syntax_error("malformed glob pattern: missing ']'")
Raised at Glob__Lexer.syntax_error in file "libs/glob/Lexer.mll", line 8, characters 2-26
Called from Glob__Lexer.__ocaml_lex_token_rec in file "libs/glob/Lexer.mll", line 29, characters 26-53
Cal

The pyproject.toml force-includes assets/demo_brains/sdr/system.db but .gitignore had `*.db` excluding it, breaking the editable install on CI. Override with explicit `!src/gradata/assets/demo_brains/sdr/*.db`. (cherry picked from commit aff162e)

…essions

- brain_prove() now always populates legacy evidence keys (convergence_trend, effort_ratio, rule_count, correction_count, sessions, categories_converged, strongest_category, edit_distance_trend) even when using holdout path.
- use_holdout decision now reads real DB session count via _correction_counts_by_session, not mocked _get_convergence return. Makes legacy mock-based tests deterministic and unblocks v0.7.3 PR.

Tests: 4196 passed locally.

… compat

Windows-only failure: \u2713 (✓) was mojibake'd by cp1252 default codec when captured by subprocess. Set PYTHONIOENCODING=utf-8 + encoding='utf-8' on subprocess.run to make Windows behave like macOS/Linux.
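
The fix described in this commit message can be sketched as below. The helper name is hypothetical and the real test code may be structured differently; the two ingredients (the `PYTHONIOENCODING` environment variable for the child and `encoding="utf-8"` for decoding the captured output) are the ones the message names.

```python
import os
import subprocess
import sys


def run_utf8(args: list[str]) -> subprocess.CompletedProcess:
    """Run a child process and capture its output as UTF-8 on every platform."""
    # Ask a Python child to encode stdout/stderr as UTF-8 instead of the locale codec.
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    return subprocess.run(
        args,
        capture_output=True,
        encoding="utf-8",  # decode captured bytes as UTF-8 rather than cp1252
        env=env,
        check=False,
    )


# The child prints a check mark (U+2713); the parent should read it back intact.
result = run_utf8([sys.executable, "-c", "print('\\u2713')"])
```

Both halves matter: without the environment variable the child may fail to encode the character, and without `encoding="utf-8"` the parent may decode the bytes with the wrong codec.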

greptile-apps Bot reviewed May 6, 2026

coderabbitai Bot added feature breaking-change labels May 6, 2026

greptile-apps Bot reviewed May 6, 2026

Gradata merged commit 10af6f0 into main May 6, 2026
9 checks passed

Gradata deleted the v0.7.3-prove-holdout branch May 6, 2026 17:40

Gradata merged 4 commits into main from v0.7.3-prove-holdout
