bench: PMR-100 procedural memory retention benchmark (rebased) by Gradata · Pull Request #155 · Gradata/gradata

Gradata · 2026-05-01T16:11:43Z

Clean rebase of #148.

This is the single benchmark the council recommended Gradata ship before launch: 100 scripted sessions, 6 correction classes, and recall@1/recall@3 metrics with a per-class breakdown.

First baseline run (3 sessions, BEHAVIORAL class): 0% rules extracted, 0% recall. This is the work. Track it on every PR. Ship at >=70% recall@1 across all classes.

Run: python -m bench.pmr_100 [--quick] [-n N]
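As a rough illustration of the metrics named above, here is a minimal sketch of recall@k with a per-class breakdown. It is not the code in bench/pmr_100.py: the input shape (a list of booleans per session, one per ranked candidate), the function names, and the class name "FORMAT" are assumptions made for the example. The ship gate is read here as "every class reaches the threshold", which is one possible interpretation of "across all classes".

```python
from collections import defaultdict

def recall_at_k(ranked_hits, k):
    """True if any of the top-k ranked candidates matched the expected rule."""
    return any(ranked_hits[:k])

def summarize(sessions, ks=(1, 3)):
    """sessions: iterable of (correction_class, ranked_hits) pairs (assumed shape)."""
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for cls, ranked_hits in sessions:
        totals[cls] += 1
        for k in ks:
            hits[cls][k] += int(recall_at_k(ranked_hits, k))
    per_class = {
        cls: {f"recall@{k}": hits[cls][k] / totals[cls] for k in ks}
        for cls in totals
    }
    n = sum(totals.values())
    overall = {
        f"recall@{k}": (sum(hits[cls][k] for cls in totals) / n if n else 0.0)
        for k in ks
    }
    return overall, per_class

def meets_ship_gate(per_class, threshold=0.70):
    """Assumed reading of the gate: every class must reach the recall@1 threshold."""
    return bool(per_class) and all(
        stats["recall@1"] >= threshold for stats in per_class.values()
    )

# Hypothetical sample data, not real benchmark output.
sample = [
    ("BEHAVIORAL", [True, False, False]),
    ("BEHAVIORAL", [False, False, True]),
    ("FORMAT", [False, False, False]),
]
overall, per_class = summarize(sample)
```

Reporting the per-class numbers next to the overall figure keeps a strong class from hiding a class that never recalls.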

…atch Fix the wrong assumption that apply_brain_rules returns a list of rule objects; it returns a formatted prompt string. Recall scoring now checks whether the expected keywords appear in the rendered text. Smoke run (10 sessions): still 0% recall, which confirms that the kernel does not graduate rules from a single correction; multiple reinforcements are needed before the lessons file populates. This is by design (FSRS scoring) and is the real launch question: how many reinforcements until rules become callable?
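The launch question in this commit message can be framed as a small measurement loop. The sketch below is an assumption-heavy illustration: `keywords_recalled` requires all keywords to appear (the real scorer may differ), `reinforcements_until_recall` takes two hypothetical callables instead of calling the Gradata API, and `ToyBrain` with its threshold of 3 is an invented stand-in that says nothing about how FSRS scoring actually behaves.

```python
from typing import Callable, Iterable, Optional

def keywords_recalled(rendered: str, expected_keywords: Iterable[str]) -> bool:
    """Case-insensitive check that every expected keyword appears in the text."""
    text = rendered.casefold()
    return all(kw.casefold() in text for kw in expected_keywords)

def reinforcements_until_recall(
    correct: Callable[[], None],
    render: Callable[[], str],
    expected_keywords: Iterable[str],
    max_rounds: int = 10,
) -> Optional[int]:
    """Repeat the same correction until the rendered prompt recalls it.

    Returns the number of reinforcements needed, or None if max_rounds is hit.
    """
    keywords = list(expected_keywords)
    for round_no in range(1, max_rounds + 1):
        correct()
        if keywords_recalled(render(), keywords):
            return round_no
    return None

class ToyBrain:
    """Invented stand-in: a rule only shows up after `threshold` corrections."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.count = 0

    def correct(self) -> None:
        self.count += 1

    def render(self) -> str:
        if self.count >= self.threshold:
            return "RULE: always use snake_case"
        return ""

brain = ToyBrain(threshold=3)
needed = reinforcements_until_recall(brain.correct, brain.render, ["snake_case"])
```

Against the real kernel, the two callables would wrap whatever records a correction and whatever renders the prompt string; those bindings are left out here because the source does not show them.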

coderabbitai · 2026-05-01T16:11:54Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7f14fe36-1123-467a-9c53-a97afa363cf7

📥 Commits

Reviewing files that changed from the base of the PR and between 5f5f87f and 7516950.

📒 Files selected for processing (4)

Gradata/.gitignore
Gradata/bench/README.md
Gradata/bench/__init__.py
Gradata/bench/pmr_100.py

📝 Walkthrough

New benchmark: Adds PMR-100 (Procedural Memory Retention) benchmarking suite with 100 scripted sessions across 6 correction classes
Metrics: Implements recall@1 and recall@3 scoring with per-class breakdowns; CLI invocation via python -m bench.pmr_100 [--quick] [-n N]
New public API: Exports Scenario, SessionResult, and BenchResult dataclasses, plus run_benchmark() and main() functions
Benchmark structure: Tests correction injection, distractor turns, probing, and keyword-based recall validation against expected outputs
Baseline data: Initial 3-session run shows 0% rule extraction and 0% recall (expected behavior per FSRS design)
Package setup: Enables bench/ as importable Python package
Result persistence: Outputs benchmark results to JSON with timestamp, config, summary stats, and per-session data
Documentation: Adds comprehensive README with workflow description, CLI usage, and scenario configuration guidance
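The result-persistence item above could look roughly like the following. This is a sketch under stated assumptions: the dataclass names match those listed in the walkthrough, but every field, the `save_result` helper, and the file-naming scheme are hypothetical and are not taken from the PR.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path

@dataclass
class SessionResult:
    # Hypothetical fields; the real dataclass may differ.
    session_id: int
    correction_class: str
    rules_extracted: int
    recall_at_1: bool
    recall_at_3: bool

@dataclass
class BenchResult:
    # Mirrors the four parts named in the walkthrough:
    # timestamp, config, summary stats, per-session data.
    timestamp: str
    config: dict
    summary: dict
    sessions: list = field(default_factory=list)

def save_result(result: BenchResult, out_dir) -> Path:
    """Write one JSON file per run into out_dir and return its path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Colons are replaced so the name is also valid on Windows.
    name = "pmr_100_" + result.timestamp.replace(":", "-") + ".json"
    path = out_dir / name
    # asdict() recurses into the nested SessionResult dataclasses.
    path.write_text(json.dumps(asdict(result), indent=2), encoding="utf-8")
    return path
```

In the PR, the .gitignore change excludes `bench/results/`, so a directory like that would be the natural `out_dir`.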

Walkthrough

Introduces the PMR-100 "Procedural Memory Retention" benchmark suite. Adds a benchmarking script that evaluates a Brain system's ability to extract and recall procedural rules through correction injection, distractor turns, and recall scoring. Includes documentation, package setup, and CLI interface for running benchmarks with configurable parameters.
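The session flow described above (inject a correction, run distractor turns, then probe and score) can be sketched as follows. Everything here is illustrative: the `Scenario` fields, the `correct`/`observe`/`render_rules` method names, the distractor strings, and both toy brains are assumptions, not the interfaces used in bench/pmr_100.py.

```python
import random
from dataclasses import dataclass

@dataclass
class Scenario:
    # Hypothetical fields for illustration only.
    correction_class: str
    correction: str
    expected_keywords: tuple

DISTRACTORS = (
    "Summarize this file.",
    "Rename the variable.",
    "What changed in the last commit?",
)

def run_session(brain, scenario: Scenario, n_distractors: int, rng: random.Random) -> bool:
    """Inject one correction, add unrelated turns, then probe for recall."""
    brain.correct(scenario.correction)
    for _ in range(n_distractors):
        brain.observe(rng.choice(DISTRACTORS))
    rendered = brain.render_rules().casefold()
    return all(kw.casefold() in rendered for kw in scenario.expected_keywords)

class EchoBrain:
    """Toy brain that keeps every correction verbatim (unlike the real kernel)."""

    def __init__(self) -> None:
        self.rules = []

    def correct(self, text: str) -> None:
        self.rules.append(text)

    def observe(self, text: str) -> None:
        pass  # distractors are ignored

    def render_rules(self) -> str:
        return "\n".join(self.rules)

class ForgetfulBrain(EchoBrain):
    """Toy brain that never graduates a rule, like the 0% baseline."""

    def correct(self, text: str) -> None:
        pass

scenario = Scenario("BEHAVIORAL", "Never use emojis in commit messages.", ("emojis", "commit"))
rng = random.Random(0)  # a fixed seed keeps the distractor order reproducible
```

Passing the random generator in explicitly, rather than relying on the global one, makes a seeded run repeatable from session to session.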

Changes

| Cohort / File(s) | Summary |
|---|---|
| Package Setup: `Gradata/.gitignore`, `Gradata/bench/__init__.py` | Adds ignore rule for `bench/results/` directory and marks `bench/` as a Python package to enable module execution. |
| Documentation: `Gradata/bench/README.md` | Documents PMR-100 benchmark workflow, expected outputs, CLI commands, baseline results, and instructions for adding new scenarios. |
| Benchmark Implementation: `Gradata/bench/pmr_100.py` | Implements complete benchmarking script with dataclasses for scenario, session, and result definitions. Includes `run_one_session()` to execute individual benchmark sessions with correction injection and distractor turns, `run_benchmark()` to aggregate results across multiple sessions, and `main()` CLI entrypoint with configurable parameters (session count, distractor count, seed, quick mode). |
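A CLI entrypoint with the parameters listed in the table could be wired up with `argparse` roughly as below. Only `--quick` and `-n N` appear in the PR description; the `--distractors` and `--seed` flag names, all default values, and the rule that quick mode caps the run at 3 sessions are assumptions for this sketch.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="python -m bench.pmr_100")
    # -n and --quick come from the PR description.
    parser.add_argument("-n", dest="sessions", type=int, default=100, metavar="N",
                        help="number of scripted sessions to run")
    parser.add_argument("--quick", action="store_true",
                        help="run a short smoke pass instead of the full suite")
    # The two flags below are hypothetical names for the distractor count and seed.
    parser.add_argument("--distractors", type=int, default=3,
                        help="distractor turns per session (assumed flag)")
    parser.add_argument("--seed", type=int, default=0,
                        help="random seed for reproducible runs (assumed flag)")
    return parser

def resolve_sessions(args: argparse.Namespace, quick_sessions: int = 3) -> int:
    """Assumed behavior: --quick caps the run at a small number of sessions."""
    if args.quick:
        return min(args.sessions, quick_sessions)
    return args.sessions

args = build_parser().parse_args(["--quick"])
```

Calling `parse_args` with an explicit list, as in the last line, makes the parser easy to exercise without touching `sys.argv`.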

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

feature, docs

oliver added 2 commits May 1, 2026 09:08

greptile-apps Bot reviewed May 1, 2026

View reviewed changes

Gradata merged commit b98d16c into main May 1, 2026
7 of 9 checks passed

Gradata deleted the rebase/pmr-100-benchmark branch May 1, 2026 16:12

coderabbitai Bot added docs feature labels May 1, 2026

coderabbitai Bot mentioned this pull request May 6, 2026

feat(v0.7.2): data-flow correctness — fix the 27K→35 dashboard gap #175

Merged
