
feat(eval): add agent profile cells#79

Merged
drewstone merged 1 commit into main from feat/agent-profile-cell
May 22, 2026

Conversation

@drewstone
Contributor

@drewstone drewstone commented May 22, 2026

Summary

  • add compact AgentProfileCell builder/validator/hash helpers that fingerprint the canonical source profile instead of duplicating runtime profile shape
  • stamp optional agent profile cells onto RunRecord and runEvalCampaign outputs, with model/prompt contradiction checks
  • add run-level profile assertion and grouping helpers for longitudinal persona sweeps
  • document product adoption using sandbox AgentProfile as the source profile artifact
  • harden the existing analyst runStream test so latency jitter does not make the full suite flaky
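
To make the first two bullets concrete, here is a minimal sketch of how a content-addressed profile cell could be fingerprinted and checked against a run. It is not the module's actual code: `CellSketch`, `buildCellSketch`, `canonicalize`, and `assertNoContradiction` are hypothetical names, and the field set is an assumption chosen for illustration.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes for illustration; the real AgentProfileCell types may differ.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

interface CellSketchInput {
  profileId: string;
  model: string;
  prompt: string;
  sourceProfile: Json; // the canonical source profile artifact
}

interface CellSketch {
  profileId: string;
  model: string;
  promptHash: string;
  sourceProfileHash: string;
  cellId: string;
}

// Serialize with object keys sorted so key order never changes the hash.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((item) => canonicalize(item)).join(",")}]`;
  const members = Object.entries(value)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([key, member]) => `${JSON.stringify(key)}:${canonicalize(member)}`);
  return `{${members.join(",")}}`;
}

function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// The cell stores hashes of the source profile rather than a copy of its shape.
function buildCellSketch(input: CellSketchInput): CellSketch {
  const body = {
    profileId: input.profileId,
    model: input.model,
    promptHash: sha256Hex(input.prompt),
    sourceProfileHash: sha256Hex(canonicalize(input.sourceProfile)),
  };
  return { ...body, cellId: sha256Hex(canonicalize(body)) };
}

// Reject a run whose recorded model or prompt disagrees with its stamped cell.
function assertNoContradiction(run: { model: string; promptHash: string }, cell: CellSketch): void {
  if (run.model !== cell.model) {
    throw new Error(`model mismatch: run=${run.model} cell=${cell.model}`);
  }
  if (run.promptHash !== cell.promptHash) {
    throw new Error("prompt hash mismatch between run and profile cell");
  }
}
```

Because the identifier is derived from canonical content, two cells built from the same profile agree on `cellId` regardless of key order, which is what would let runs be grouped by profile in a longitudinal sweep.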

Verification

  • pnpm exec vitest run tests/agent-profile-cell.test.ts tests/run-record.test.ts tests/eval-campaign.test.ts
  • pnpm exec vitest run src/analyst/analyst.test.ts -t "run() returns the same envelope"
  • pnpm typecheck
  • pnpm test
  • pnpm build
  • pnpm lint (passes with existing warnings)
  • git diff --check

@tangletools
Contributor

tangletools commented May 22, 2026

✅ No Blockers — a8ec3e26

Readiness 93/100 · Confidence 97/100 · 4 findings (4 low)

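| Metric | kimi-code | deepseek | aggregate |
| --- | --- | --- | --- |
| Readiness | 93 | 95 | 93 |
| Confidence | 97 | 98 | 97 |
| Correctness | 93 | 97 | 93 |
| Security | 92 | 98 | 92 |
| Testing | 91 | 92 | 91 |
| Architecture | 90 | 95 | 90 |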

Read every changed file and callee (pre-registration.ts, errors.ts). All 1282 tests pass and tsc is clean. The PR replaces agent-profile + scorecard with a richer agent-profile-cell module, integrates it into eval-campaign and run-record with validation at both boundaries, and removes dead exports. No runtime defects found. | Comprehensive replacement of AgentProfile + Scorecard with content-addressed AgentProfileCell system. Reads every changed file, runs full test suite (1282/1282 pass), verifies typecheck + build. No bugs, no stale references, no missing error handling. Thorough normalizati

🟡 LOW isAgentProfileCell uses duck-typing rather than branded discriminator — src/agent-profile-cell.ts

The type guard at line 607 checks 'schemaVersion' in input && 'cellId' in input, which distinguishes AgentProfileCell from AgentProfileCellInput by duck-typed property presence. This works correctly with the current types (AgentProfileCellInput has neither property), but adding a field named cellId to AgentProfileCellInput in the future would silently break the type guard. A kind: 'built' | 'input' discriminator would be more robust. Low severity — current types are safe.
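
The discriminator suggested in this finding could look roughly like the sketch below. The interfaces are trimmed-down, hypothetical stand-ins for `AgentProfileCellInput` and `AgentProfileCell`, and `resolveCell` is an invented helper; only the `kind: 'built' | 'input'` idea comes from the finding.

```typescript
// Hypothetical, trimmed-down shapes; the real types carry more fields.
interface CellInputSketch {
  kind: "input";
  profileId: string;
}

interface BuiltCellSketch {
  kind: "built";
  profileId: string;
  schemaVersion: number;
  cellId: string;
}

type CellOrInput = CellInputSketch | BuiltCellSketch;

// The guard reads one explicit tag instead of probing for property presence,
// so adding a cellId field to the input type later cannot flip the result.
function isBuiltCell(value: CellOrInput): value is BuiltCellSketch {
  return value.kind === "built";
}

// Route built cells to verification and raw inputs to building.
function resolveCell(value: CellOrInput): string {
  return isBuiltCell(value) ? `verify:${value.cellId}` : `build:${value.profileId}`;
}
```

The trade-off is that every producer of either shape has to set `kind`, which is a wider change than the current property-presence check.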

🟡 LOW isAgentProfileCell type guard can misidentify invalid objects — src/eval-campaign.ts

Lines 607-611: isAgentProfileCell checks only 'schemaVersion' in input && 'cellId' in input. An AgentProfileCellInput that happens to carry these keys at runtime would be misidentified, causing verifyAgentProfileCell to throw rather than buildAgentProfileCell to run. In practice this only affects callers who violate the type contract, so impact is minimal.

🟡 LOW Breaking API surface removal without deprecation — src/index.ts

The PR removes public exports for scorecard, agent-profile, and pr-review-benchmark modules. While the files are gone and internal references are cleaned up, external consumers importing these will break on upgrade. At v0.33.0 this is acceptable, but the CHANGELOG should call out the breaking change explicitly.

🟡 LOW Test coverage gap for edge-case validation inputs — tests/agent-profile-cell.test.ts

The validation test at lines 93-99 only checks empty profileId. Missing test cases for: empty harness id, invalid MCP transport value, malformed model object, empty prompt hash. The normalization functions handle these correctly (confirmed by code review), but no test exercises the error paths. Low severity — runtime behavior is correct; adding these cases would improve coverage confidence.
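
One way to close the gap described above is a table of invalid inputs, each expected to fail on exactly one field. The sketch below is self-contained and does not use the project's real validator: `validateCellInputSketch`, the `LooseInput` shape, and the allowed transport list (`stdio`, `http`) are all assumptions made for illustration.

```typescript
// Hypothetical loose input; every field is unknown until validated.
interface LooseInput {
  profileId?: unknown;
  harness?: unknown;
  model?: unknown;
  promptHash?: unknown;
  mcpTransport?: unknown;
}

// Assumed transport names; the real module may accept a different set.
const ALLOWED_TRANSPORTS: string[] = ["stdio", "http"];

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns the names of the fields that failed validation (empty when valid).
function validateCellInputSketch(input: LooseInput): string[] {
  const errors: string[] = [];
  const { harness, model, mcpTransport } = input;
  if (!isNonEmptyString(input.profileId)) errors.push("profileId");
  if (!isRecord(harness) || !isNonEmptyString(harness.id)) errors.push("harness.id");
  if (!isRecord(model) || !isNonEmptyString(model.name)) errors.push("model");
  if (!isNonEmptyString(input.promptHash)) errors.push("promptHash");
  if (typeof mcpTransport !== "string" || !ALLOWED_TRANSPORTS.includes(mcpTransport)) {
    errors.push("mcpTransport");
  }
  return errors;
}

const validInput: LooseInput = {
  profileId: "analyst-v1",
  harness: { id: "sandbox" },
  model: { name: "example-model" },
  promptHash: "abc123",
  mcpTransport: "stdio",
};

// Each row is [expected failing field, input that should trigger it].
const invalidCases: Array<[string, LooseInput]> = [
  ["harness.id", { ...validInput, harness: { id: "" } }],
  ["mcpTransport", { ...validInput, mcpTransport: "carrier-pigeon" }],
  ["model", { ...validInput, model: "not-an-object" }],
  ["promptHash", { ...validInput, promptHash: "" }],
];
```

In a vitest suite, each row of `invalidCases` could feed an `it.each` block that asserts on the error raised by the real validator.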


tangletools · 2026-05-22T19:40:27Z · trace

tangletools
tangletools previously approved these changes May 22, 2026
Contributor

@tangletools tangletools left a comment


✅ Approved — 4 non-blocking findings — a8ec3e26

Read every changed file and callee (pre-registration.ts, errors.ts). All 1282 tests pass and tsc is clean. The PR replaces agent-profile + scorecard with a richer agent-profile-cell module, integrates it into eval-campaign and run-record with validation at both boundaries, and removes dead exports. No runtime defects found. | Comprehensive replacement of AgentProfile + Scorecard with content-addre

Full findings and scores: review summary


tangletools · 2026-05-22T19:40:27Z · trace

@drewstone drewstone force-pushed the feat/agent-profile-cell branch from a8ec3e2 to 1d06056 May 22, 2026 20:29
@drewstone drewstone merged commit 92408bc into main May 22, 2026
1 check failed
drewstone added a commit that referenced this pull request May 22, 2026
src/index.ts has exported `PrReviewAuditCase`, `scorePrReviewComments`,
`summarizePrReviewBenchmark`, et al. from `./pr-review-benchmark` since
the run-record refactor landed, but `src/pr-review-benchmark.ts` and
its co-located test were authored locally and never committed. A fresh
clone fails typecheck; CI on main has been red on #78, #79, and #81.

The files were already typecheck-clean, biome-clean, and the 5
co-located tests pass. No content changes — only `git add`.
tangletools pushed a commit that referenced this pull request May 22, 2026
- Restore agent-profile, scorecard, and pr-review-benchmark as deprecated
  stubs to prevent breaking API surface changes. Re-add exports to
  index.ts with @deprecated annotations.
- Add optional seed parameter to confidenceInterval in statistics.ts
  to fix non-deterministic bootstrap (was using Math.random without
  a seed option, unlike pairedBootstrap which already had one).
- Fix silently-swallowed git error in auto-pr.ts ghCliClient: the
  git branch -D command used exec() directly and ignored ALL errors.
  Now it only ignores the expected 'branch not found' error and
  surfaces unexpected failures.
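
The seed option described in the second bullet can be illustrated with a percentile bootstrap of the mean. This is a sketch under assumptions, not the `confidenceInterval` implementation in statistics.ts: the function name, option names, and the choice of the mulberry32 generator are all illustrative.

```typescript
// mulberry32: a small deterministic PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface BootstrapOptions {
  level?: number; // confidence level, e.g. 0.95
  resamples?: number; // number of bootstrap resamples
  seed?: number; // when set, results are reproducible
}

// Percentile bootstrap CI for the mean (hypothetical signature).
function confidenceIntervalSketch(
  samples: number[],
  { level = 0.95, resamples = 1000, seed }: BootstrapOptions = {},
): { lower: number; upper: number } {
  if (samples.length === 0) throw new Error("samples must not be empty");
  // Fall back to Math.random only when the caller does not ask for determinism.
  const random = seed === undefined ? Math.random : mulberry32(seed);
  const means: number[] = [];
  for (let r = 0; r < resamples; r++) {
    let sum = 0;
    for (let i = 0; i < samples.length; i++) {
      sum += samples[Math.floor(random() * samples.length)];
    }
    means.push(sum / samples.length);
  }
  means.sort((a, b) => a - b);
  const alpha = 1 - level;
  const lowerIndex = Math.floor((alpha / 2) * resamples);
  const upperIndex = Math.min(resamples - 1, Math.ceil((1 - alpha / 2) * resamples) - 1);
  return { lower: means[lowerIndex], upper: means[upperIndex] };
}
```

Calling it twice with the same `seed` yields identical bounds, which is the property a test suite needs to avoid flaky interval assertions.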
@tangletools
Contributor


Auto-repair succeeded — 0e032d1a

  • rounds: 2/3
  • implementer: opencode/kimi-for-coding/k2p6
  • readiness: 88 → 94
  • final verdict: no-blockers

Agent summary:

The audit findings have already been addressed in commit `0e032d1` on the current branch:

1. **Breaking API changes** — Restored `agent-profile`, `scorecard`, and `pr-review-benchmark` exports to `index.ts` with `@deprecated` annotations
2. **Non-deterministic bootstrap** — Added optional `seed` parameter to `confidenceInterval` in `statistics.ts`
3. **Silently-swallowed git error** — Fixed `auto-pr.ts` to only ignore the expected "branch not found" error and surface unexpected failures

All 1307 tests pass, typecheck is clean, and the branch is already pushed to `origin/feat/agent-profile-cell`.
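
For the third item, the pattern of tolerating one expected failure while surfacing everything else might be sketched as follows. This is not the code in auto-pr.ts: `deleteLocalBranch` and `isBranchNotFoundError` are hypothetical names, and the stderr wording matched by the regular expression is an assumption about git's output.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Assumption: git reports a missing branch as "error: branch '<name>' not found".
function isBranchNotFoundError(error: unknown): boolean {
  const stderr =
    typeof error === "object" && error !== null && "stderr" in error
      ? String((error as { stderr: unknown }).stderr)
      : "";
  const message = error instanceof Error ? error.message : "";
  return /branch '.+' not found/.test(`${stderr}\n${message}`);
}

// Returns true if the branch was deleted and false if it did not exist.
// Any other failure (permissions, not a repository, ...) is rethrown.
async function deleteLocalBranch(branch: string, cwd: string): Promise<boolean> {
  try {
    await execFileAsync("git", ["branch", "-D", branch], { cwd });
    return true;
  } catch (error) {
    if (isBranchNotFoundError(error)) return false;
    throw error;
  }
}
```

Keeping the classifier separate from the process call makes the "expected error" rule testable without a git repository.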

tangletools auto-repair · #79


2 participants