feat(v0.7.3): brain.prove() holdout validation + gradata demo + one-line installer #176

Merged: Gradata merged 4 commits into main from v0.7.3-prove-holdout on May 6, 2026
Gradata (Owner) commented May 6, 2026

v0.7.3 — brain.prove() holdout validation + gradata demo + curl|bash installer

Three council-driven shipments in one PR.

Why this PR exists

Council 3-vendor --full (7 lenses, ~22 min) on cloud pivot called holdout validation the bet-the-company technical risk for YC demo day:

"brain.prove() must do holdout validation (train on weeks 1-2, measure on fresh week-3 tasks), not paired t-tests on the same corpus you synthesized from. That is a fatal demo-day objection if not fixed first."

Audit confirmed the council was right. brain_prove() was running Mann-Kendall on the same session series the brain was trained on. Statistical malpractice for a SaaS pitch.

brain.prove() — now does holdout

New brain_prove_holdout() in _core.py:

  • Splits sessions chronologically (last 30% = test by default)
  • Welch's t-test with unequal variances allowed
  • No scipy dep — t-distribution computed via regularized incomplete beta in stdlib
  • Returns lift_pct, p_value, train_window, test_window, confidence_level, method
  • method field exposes which logic ran:
    • holdout_welch_ttest (preferred)
    • in_sample_mann_kendall_legacy (cold-start fallback for <5 sessions)

brain.prove() upgraded to call holdout when ≥5 sessions; falls back to legacy below that. Backward-compat for all callers.
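The split-then-test idea described above can be sketched in a few lines of stdlib Python. This is a minimal illustration, not the code in _core.py: the helper names `chronological_split` and `welch_t`, the sample correction counts, and the definition of the percentage change (test mean relative to train mean) are all assumptions made for the example.

```python
import math
from statistics import mean, variance


def chronological_split(values, train_ratio=0.7):
    """Split an already time-ordered series into (train, test)."""
    cut = int(len(values) * train_ratio)
    return values[:cut], values[cut:]


def welch_t(train, test):
    """Return (t, df) for Welch's unequal-variance t-test.

    Each window needs at least two observations so that the
    sample variance is defined.
    """
    n1, n2 = len(train), len(test)
    v1, v2 = variance(train), variance(test)  # sample variances (n - 1)
    se2 = v1 / n1 + v2 / n2
    diff = mean(test) - mean(train)
    if se2 == 0:
        # Degenerate case: both windows are constant.
        t = 0.0 if diff == 0 else math.copysign(math.inf, diff)
        return t, float(n1 + n2 - 2)
    t = diff / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Hypothetical corrections-per-session series, oldest session first.
corrections = [10, 9, 11, 10, 9, 10, 8, 4, 3, 5]
train, test = chronological_split(corrections)
t_stat, dof = welch_t(train, test)
change_pct = 100.0 * (mean(test) - mean(train)) / mean(train)
```

Because the test window is strictly later than the train window, a drop in corrections there cannot be an artifact of fitting to the same sessions. With five sessions and the default ratio, this split gives 3 train and 2 test sessions.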

7 regression tests. All pass.

gradata demo command — the install no-brainer

Council install-ICP verdict (6/7 lenses unanimous): solo Cursor/Claude Code power user wants a 60-second wow-moment that doesn't require writing any code first.

New gradata demo:

  • Pre-seeded SDR brain (200 corrections, 12 graduated rules)
  • Side-by-side "without brain" vs "with brain" email draft
  • Token reduction: 412 → 31 (92% smaller)
  • Lists top 5 rules learned
  • Zero correct() calls needed — just see the compounding

4 tests. All pass.

curl | bash installer

Council install-ICP P5/P6 unanimous: gstack-style 30-second install is non-negotiable.

New gradata-install/install.sh:

  • curl -sSL gradata.ai/install | bash compatible
  • Detects platform (macOS / Linux / WSL)
  • Detects AI tool (Claude Code, Cursor, Codex, Gemini, OpenCode, Hermes)
  • Picks best Python installer (uv > pipx > pip3)
  • Initializes brain in repo or ~/gradata/default
  • Installs hook adapter for each detected tool
  • Idempotent + safe (refuses sudo, validates platform)
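The installer itself is a bash script. Purely to illustrate the stated preference order (uv > pipx > pip3), here is a hedged Python sketch; `pick_installer` and its injectable `which` parameter are hypothetical names that do not appear in install.sh.

```python
import shutil

# Preference order as stated in the PR description.
PREFERENCE = ("uv", "pipx", "pip3")


def pick_installer(which=shutil.which):
    """Return the first preferred installer found on PATH, else None."""
    for tool in PREFERENCE:
        if which(tool):
            return tool
    return None


# Example with a fake lookup where only pipx and pip3 are installed.
available = {"pipx": "/usr/bin/pipx", "pip3": "/usr/bin/pip3"}
chosen = pick_installer(which=available.get)
```

Passing the lookup function in makes the selection logic testable without touching the real PATH.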

Tests

Before:  4174 passing
After:   4185 passing (added 7 holdout + 4 demo tests)
ruff check + ruff format --check + pyright src/   all clean

Builds on

Co-authored-by: Codex <noreply@openai.com>

THE BET-THE-COMPANY FIX (council unanimous: do this first):
brain.prove() now uses Welch's t-test on a chronological train/test
split (70/30) when ≥5 sessions exist. Old in-sample Mann-Kendall is
the cold-start fallback. Caller can tell which ran via 'method' field:
  - holdout_welch_ttest        ← preferred (statistically valid)
  - in_sample_mann_kendall_legacy ← cold start fallback

NEW: brain_prove_holdout() in _core.py
  - Splits sessions chronologically (last 30% = test)
  - Welch's t-test (allows unequal variances)
  - Computes mean, std, n, lift_pct, p_value
  - confidence_level: strong/moderate/weak/insufficient
  - No scipy required — t-distribution computed via regularized
    incomplete beta function in pure stdlib
  - 7 regression tests (improvement, no-trend, regression, cold-start,
    backward-compat, determinism)
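For readers unfamiliar with the "no scipy" route: the two-tailed p-value of a Student t statistic with df degrees of freedom equals the regularized incomplete beta function I_x(df/2, 1/2) at x = df / (df + t^2). The sketch below evaluates it with the standard continued-fraction expansion and `math.lgamma`. It is an illustration under those textbook formulas, not the PR's implementation; the function names and tolerances are assumptions.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=200, eps=1e-14):
    """Continued fraction for the incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for a, b > 0 and 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry relation where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def student_t_two_tailed_pvalue(t, df):
    """Two-tailed p-value; df may be fractional, as Welch's test produces."""
    if math.isinf(t):
        return 0.0
    x = df / (df + t * t)
    return regularized_incomplete_beta(df / 2.0, 0.5, x)


p_critical = student_t_two_tailed_pvalue(2.228, 10)  # close to 0.05
```

The value 2.228 is the familiar two-sided 5% critical value for 10 degrees of freedom, which makes it a convenient sanity check.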

THE INSTALL NO-BRAINER (council install-ICP wedge):
NEW: gradata demo command
  - Pre-seeded SDR brain (200 corrections, 12 graduated rules)
  - Shows side-by-side 'without brain' vs 'with brain' email draft
  - Token reduction: 412 → 31 (92% smaller)
  - Lists top 5 rules learned
  - Wow-moment: zero correct() calls needed for demo
  - 4 tests; all pass

THE ONE-LINE INSTALLER:
NEW: gradata-install/install.sh
  - curl -sSL gradata.ai/install | bash compatible
  - Detects platform (macOS/Linux/WSL)
  - Detects AI tool (Claude Code, Cursor, Codex, Gemini, OpenCode, Hermes)
  - Picks best Python installer (uv > pipx > pip3)
  - Initializes brain in .git project root or ~/gradata/default
  - Installs hook adapter for each detected tool
  - Idempotent + safe (refuses sudo, validates platform)
  - Prints next-steps including 'gradata demo'

NOT IN THIS COMMIT:
- /gradata Claude Code slash command — codex sandbox couldn't reach
  gradata-plugin/ from Gradata/ workdir. Will do separately.

Council 3-vendor verdict drove all of this:
  Cloud pivot      → Convergence-as-a-Service (9/mo)
  Install ICP      → YC W26/S26 founder, Claude Code daily, 30-80 corr/wk
  Killer objection → 'Why not CLAUDE.md?' (defuse in README first 100 words)

Co-authored-by: Codex <noreply@openai.com>

coderabbitai Bot commented May 6, 2026

Caution

Review failed

Failed to post review comments

📝 Walkthrough
  • brain.prove() holdout validation: added brain_prove_holdout() which chronologically splits sessions (default 70/30 train/test), computes mean/std/n, lift_pct, p_value using Welch’s t-test (implemented via stdlib regularized incomplete beta), train_window, test_window, confidence_level, and method; brain.prove() chooses holdout when sufficient real sessions (≥5) and falls back to an in‑sample Mann–Kendall legacy method for cold-starts.
  • Proof payload shape change: brain.prove() now includes a top-level "method" field and nests statistics under an "evidence" object while still merging/populating legacy evidence keys for backward compatibility.
  • New public API: brain_prove_holdout(brain: Brain, *, train_ratio: float = 0.7, min_train_sessions: int = 3, min_test_sessions: int = 2).
  • Demo CLI & module: added gradata._demo (DemoScenario, SDR_SCENARIO, SCENARIOS, run_demo()); CLI demo reworked to run a deterministic SDR demo (side‑by‑side before/after email draft, token reduction 412→31, top‑5 rules) and now accepts --scenario (default "sdr") instead of the prior positional target path.
  • Demo assets & packaging: included seeded SDR demo brain and 12 lessons under assets/demo_brains/sdr/, updated pyproject to include demo assets, and adjusted .gitignore to force-include the seeded DB files.
  • One-line installer: added gradata-install/install.sh for curl -sSL gradata.ai/install | bash usage; detects platform (macOS/Linux/WSL), prefers installers uv > pipx > pip3, wires AI tooling via per-tool hooks, initializes a brain in-repo or at ~/gradata/default, is idempotent and refuses sudo.
  • Tests & quality: +11 tests added (7 holdout, 4 demo); test suite increased from 4,174 → 4,185 passing; ruff and pyright checks clean.
  • Breaking/behavioral changes: no public function signatures changed (Brain.prove() signature preserved), but CLI behaviour for gradata demo changed (positional target removed in favor of --scenario), and proof result payload shape now includes method/evidence—callers relying on exact legacy payload should account for nested evidence keys.
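A caller that wants to tolerate both payload shapes could read statistics from the nested evidence object first and fall back to top-level keys. This is a sketch only: the exact key placement in real results is not shown in this PR page, so the sample dictionaries and the `summarize_proof` helper are hypothetical.

```python
def summarize_proof(result):
    """Extract a few fields from a proof payload, old or new shape."""
    evidence = result.get("evidence", {})

    def pick(key, default=None):
        # Prefer the nested evidence value, then a top-level legacy key.
        return evidence.get(key, result.get(key, default))

    return {
        "method": result.get("method", "unknown"),
        "p_value": pick("p_value"),
        "lift_pct": pick("lift_pct"),
        "sessions": pick("sessions"),
    }


# Hypothetical payloads, for illustration only.
new_style = {
    "method": "holdout_welch_ttest",
    "evidence": {"p_value": 0.01, "lift_pct": 40.0, "sessions": 12},
}
old_style = {"p_value": 0.2, "sessions": 4}

new_summary = summarize_proof(new_style)
old_summary = summarize_proof(old_style)
```

Checking the method field first lets downstream code decide whether a result is holdout-based before trusting its p-value.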

Walkthrough

Adds a user-space installer script, a deterministic CLI demo (assets, seeding, and a run_demo entrypoint with CLI integration), and holdout-based statistical proof logic with supporting helpers and tests; packaging was updated to include demo assets.

Changes

Installer

| Layer | File(s) | Summary |
|---|---|---|
| Script & Execution Safety | Gradata/gradata-install/install.sh | New bash installer with shebang, strict options, and usage() help text. |
| Argument & Platform Validation | Gradata/gradata-install/install.sh | Validates CLI args, enforces non-root installs, maps OS -> platform and detects WSL. |
| Install Method Selection | Gradata/gradata-install/install.sh | Selects installer backend (uv, pipx, pip3) and defines install_sdk() variants; errors if none available. |
| Launcher & Runtime Invocation | Gradata/gradata-install/install.sh | Defines run_gradata() launcher to invoke Gradata via binary, module, or python fallback. |
| Tool Discovery & Wiring | Gradata/gradata-install/install.sh | Detects AI-coding tools, runs per-tool install hooks (continues on individual failures), initializes brain, and prints final status summary. |

Proof System (holdout-based validation)

| Layer | File(s) | Summary |
|---|---|---|
| Imports & Statistical Helpers | src/gradata/_core.py | Adds math/statistics imports and private helpers: _correction_counts_by_session, _regularized_incomplete_beta, _student_t_two_tailed_pvalue, _welch_ttest, _window_stats. |
| Holdout Implementation | src/gradata/_core.py | New public brain_prove_holdout(...) computes train/test aggregates, Welch t-test p-value (SciPy or fallback), lift, and returns structured evidence with method. |
| Routing & Payload Shape | src/gradata/_core.py | brain_prove performs holdout-viability check, routes to holdout when sufficient sessions, otherwise uses legacy in-sample path; result payloads include top-level method and nested evidence, merging legacy evidence for compatibility. |
| Docstring | src/gradata/brain.py | Brain.prove() docstring updated to mention train/test holdout (Welch's t-test) and fallback. |
| Tests | tests/test_brain_prove_holdout.py | New pytest module covering holdout behavior, determinism, method selection, and edge cases. |

Deterministic Demo System

| Layer | File(s) | Summary |
|---|---|---|
| Demo Module | src/gradata/_demo.py | New DemoScenario dataclass, SDR_SCENARIO, SCENARIOS registry, deterministic seed generation (lessons/events), token-count helpers, seed brain management, and run_demo() entry point. |
| Seed Assets | src/gradata/assets/demo_brains/sdr/lessons.md | Adds twelve date-anchored SDR rule blocks used as demo seed lessons. |
| CLI Integration | src/gradata/cli.py | Refactors demo subcommand to call run_demo; replaces positional target with --scenario (choices: sdr, coding; default: sdr) and updates cmd_demo. |
| Tests | tests/test_demo.py | New tests asserting demo CLI exit code, before/after output, token-reduction reporting, and seeded-asset regeneration. |
| Packaging & VCS | pyproject.toml, .gitignore | Adds demo asset globs and force-include mapping for system DB to wheel/sdist; .gitignore adjusted to force-include demo DB files. |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    rect rgba(200,220,255,0.5)
    Participant CLI
    end
    rect rgba(200,255,200,0.5)
    Participant DemoModule
    end
    rect rgba(255,220,200,0.5)
    Participant Brain
    end
    rect rgba(240,240,240,0.5)
    Participant Filesystem
    end

    CLI->>DemoModule: run_demo(scenario)
    DemoModule->>Filesystem: ensure seed assets (lessons.md, events.jsonl, manifest)
    DemoModule->>Filesystem: create temp copy of seeded brain
    DemoModule->>Brain: open Brain from temp copy
    Brain-->>DemoModule: provide rules/state
    DemoModule->>Brain: apply rules to demo task
    Brain-->>DemoModule: returned transformed output and metrics
    DemoModule->>CLI: print before/after, token counts, top rules

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Gradata/gradata#18: Modifies src/gradata/_core.py and touches brain_prove-related logic; likely related at the code level.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 11.76%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title 'feat(v0.7.3): brain.prove() holdout validation + gradata demo + one-line installer' clearly and concisely summarizes all three major changes in the changeset. |
| Description check | ✅ Passed | The PR description is comprehensive and directly related to all the changes in the changeset, providing clear context for why each change was made and how it functions. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 OpenGrep (1.20.0)

OpenGrep fatal error (exit code 2):
┌──────────────┐
│ Opengrep CLI │
└──────────────┘

✔ Opengrep OSS
✔ Basic security coverage for first-party code vulnerabilities.

 Loading rules from local config...
[00.23][ERROR]: Error: exception Glob.Lexer.Syntax_error("malformed glob pattern: missing ']'")
Raised at Glob__Lexer.syntax_error in file "libs/glob/Lexer.mll", line 8, characters 2-26
Called from Glob__Lexer.__ocaml_lex_token_rec in file "libs/glob/Lexer.mll", line 29, characters 26-53
Cal



The pyproject.toml force-includes assets/demo_brains/sdr/system.db
but .gitignore had `*.db` excluding it, breaking the editable install
on CI. Override with explicit `!src/gradata/assets/demo_brains/sdr/*.db`.

(cherry picked from commit aff162e)

…essions

- brain_prove() now always populates legacy evidence keys (convergence_trend,
  effort_ratio, rule_count, correction_count, sessions, categories_converged,
  strongest_category, edit_distance_trend) even when using holdout path.
- use_holdout decision now reads real DB session count via
  _correction_counts_by_session, not mocked _get_convergence return.
  Makes legacy mock-based tests deterministic and unblocks v0.7.3 PR.

Tests: 4196 passed locally.

… compat

Windows-only failure: \u2713 (✓) was mojibake'd by cp1252 default codec when
captured by subprocess. Set PYTHONIOENCODING=utf-8 + encoding='utf-8' on
subprocess.run to make Windows behave like macOS/Linux.
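The fix described in this commit message can be sketched as follows. The helper name `run_utf8` and the child command are assumptions for illustration; the actual test code is not shown on this page. `PYTHONIOENCODING` controls how the child Python process encodes its stdout, and `encoding="utf-8"` controls how the parent decodes the captured bytes.

```python
import os
import subprocess
import sys


def run_utf8(args):
    """Run a child process with UTF-8 on both ends of the pipe."""
    # Child side: force Python's stdio encoding instead of the locale
    # default (cp1252 on many Windows setups).
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    # Parent side: decode captured output as UTF-8.
    return subprocess.run(
        args,
        capture_output=True,
        encoding="utf-8",
        env=env,
        check=False,
    )


# The child prints a check mark (U+2713); it should survive the round trip.
result = run_utf8([sys.executable, "-c", "print('\\u2713 done')"])
```

Setting only one side is not enough: if the child encodes with cp1252 the write can fail or produce different bytes, and if the parent decodes with the locale codec the UTF-8 bytes come back garbled.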

@Gradata Gradata merged commit 10af6f0 into main May 6, 2026
9 checks passed
@Gradata Gradata deleted the v0.7.3-prove-holdout branch May 6, 2026 17:40