feat: self-contained HTML dashboard for eval results #584

christso merged 3 commits into main from feat/562-html-dashboard on Mar 14, 2026

@christso
Collaborator

Summary

Implements #562 (MVP scope): adds html-writer.ts, which produces a self-contained .html report file with all CSS/JS inlined. The report has zero external dependencies and works offline.

Changes

  • apps/cli/src/commands/eval/html-writer.ts — New HtmlWriter class implementing OutputWriter
  • apps/cli/src/commands/eval/output-writer.ts — Registered .html/.htm extension and html format
  • apps/cli/src/commands/eval/commands/run.ts — Updated CLI help text
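
To illustrate the extension registration, here is a minimal sketch of how a path could be mapped to the html format. The function name `detectHtmlFormat` and the case-insensitive match are assumptions for illustration; the actual logic lives in output-writer.ts and may differ.

```typescript
// Hypothetical sketch: names here are illustrative, not the PR's actual code.
type HtmlFormat = "html";

// Returns "html" for paths ending in .html or .htm, otherwise undefined
// so that the caller can fall through to its existing detection.
// Assumption: matching is case-insensitive.
function detectHtmlFormat(outputPath: string): HtmlFormat | undefined {
  const lower = outputPath.toLowerCase();
  return lower.endsWith(".html") || lower.endsWith(".htm") ? "html" : undefined;
}
```

With this shape, `-o report.html` and `-o report.htm` would both select the HTML writer without an explicit `--output-format`.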

Dashboard Views

Overview Tab

  • Stat cards: total tests, passed, failed, errors, pass rate, duration, tokens, cost
  • Multi-target comparison table with pass rate bars
  • Score distribution histogram
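
As a rough sketch of the numbers behind the Overview tab, the pass rate and histogram could be computed as below. The helper names, the 0 to 1 score range, and the choice of 10 equal-width buckets are assumptions, not taken from html-writer.ts.

```typescript
// Hypothetical helpers; assumes scores are fractions in [0, 1].

// Pass rate as a fraction; an empty run is reported as 0 rather than NaN.
function passRate(passed: number, total: number): number {
  return total === 0 ? 0 : passed / total;
}

// Count scores into `bins` equal-width buckets over [0, 1].
// A score of exactly 1 is placed in the last bucket.
function scoreHistogram(scores: number[], bins = 10): number[] {
  const counts = new Array<number>(bins).fill(0);
  for (const score of scores) {
    const clamped = Math.min(1, Math.max(0, score));
    const index = Math.min(bins - 1, Math.floor(clamped * bins));
    counts[index] = (counts[index] ?? 0) + 1;
  }
  return counts;
}
```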

Test Cases Tab

  • Filterable by status (pass/fail/error), target, and search by test ID
  • Sortable columns with direction indicators
  • Per-evaluator score columns, color-coded (green ≥90%, yellow ≥50%, red <50%)
  • Expandable detail rows showing:
    • Input/output panels
    • Evaluator results with reasoning
    • Passed/failed expectations
    • Error details
    • Metadata (tokens, duration, model, cost, timestamp)
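
The color thresholds for evaluator score columns can be expressed as a small function. This is a sketch only: `scoreColor` is a hypothetical name, and it assumes scores are stored as fractions (0.9 for 90%).

```typescript
// Hypothetical sketch of the color-coding rule; assumes score is in [0, 1].
type ScoreColor = "green" | "yellow" | "red";

function scoreColor(score: number): ScoreColor {
  if (score >= 0.9) return "green"; // green at 90% and above
  if (score >= 0.5) return "yellow"; // yellow from 50% up to (not including) 90%
  return "red"; // red below 50%
}
```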

Technical Details

  • Meta-refresh (2s) during live eval runs, removed on close()
  • Concurrent writes serialized via Mutex
  • No external network requests from generated HTML
  • Zero new runtime dependencies
  • Follows existing writer patterns (JsonWriter, JsonlWriter)
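
The sketch below shows one way the meta-refresh and Mutex points could fit together. It is not the code in html-writer.ts: `SimpleMutex`, `SketchHtmlWriter`, `renderPage`, and `escapeHtml` are hypothetical names, and the page is kept in memory instead of being written to disk so the example stays self-contained.

```typescript
// Hypothetical sketch; all names are illustrative.

// Minimal promise-chain mutex: each task starts only after the previous one settles.
class SimpleMutex {
  private tail: Promise<void> = Promise.resolve();

  runExclusive<T>(task: () => Promise<T> | T): Promise<T> {
    const result = this.tail.then(task);
    // Keep the chain usable even if a task rejects.
    this.tail = result.then(() => undefined, () => undefined);
    return result;
  }
}

function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render the whole page; the 2s meta-refresh is emitted only while the run is live.
function renderPage(results: string[], live: boolean): string {
  const refresh = live ? '<meta http-equiv="refresh" content="2">' : "";
  const items = results.map((r) => `<li>${escapeHtml(r)}</li>`).join("");
  return `<!DOCTYPE html><html><head><meta charset="utf-8">${refresh}</head><body><ul>${items}</ul></body></html>`;
}

class SketchHtmlWriter {
  private readonly mutex = new SimpleMutex();
  private readonly results: string[] = [];
  html = ""; // stands in for the file on disk

  // Each append re-renders the full page inside the critical section.
  append(result: string): Promise<void> {
    return this.mutex.runExclusive(() => {
      this.results.push(result);
      this.html = renderPage(this.results, true);
    });
  }

  // Final render without the meta-refresh tag.
  close(): Promise<void> {
    return this.mutex.runExclusive(() => {
      this.html = renderPage(this.results, false);
    });
  }
}
```

In a real writer the critical section would presumably contain asynchronous file I/O, which is where serializing the tasks matters; here the tasks are synchronous and only demonstrate the ordering.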

Usage

agentv eval run --output-format html -o report.html
# or auto-detected from extension:
agentv eval run -o report.html

Closes #562 (MVP scope)

Add html-writer.ts implementing OutputWriter interface that produces a
single self-contained .html report file with all CSS/JS inlined.

Views:
- Overview: stat cards (total/passed/failed/errors/pass rate/duration/tokens/cost),
  multi-target comparison table, score distribution histogram
- Test Cases: filterable/sortable table with per-evaluator score columns,
  expandable detail rows showing input/output, evaluator reasoning,
  expectations, and metadata

Features:
- Tab-based navigation between Overview and Test Cases views
- Filter by status (pass/fail/error), target, and search by test ID
- Sortable columns with direction indicators
- Color-coded scores (green ≥90%, yellow ≥50%, red <50%)
- Meta-refresh (2s) during live eval runs, removed on close
- Thread-safe concurrent writes via Mutex
- No external network requests, works fully offline
- Zero new runtime dependencies

Registration:
- .html/.htm extension auto-detection in createWriterFromPath
- 'html' format option in createOutputWriter
- Updated CLI help text for --output and --output-format flags

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Mar 14, 2026

Deploying agentv with Cloudflare Pages

Latest commit: 2ea9afc
Status: ⚡️ Build in progress...

christso and others added 2 commits March 14, 2026 08:08
agentv convert results.jsonl -o report.html

When --out ends in .html or .htm, routes through HtmlWriter instead of
the default YAML conversion. Falls back to a .html-suffixed path when
no --out is given and the output extension is html.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@christso christso marked this pull request as ready for review March 14, 2026 09:11
@christso christso merged commit 4c73d82 into main Mar 14, 2026
1 check was pending
@christso christso deleted the feat/562-html-dashboard branch March 14, 2026 09:11
Linked issue: feat: self-contained HTML dashboard with meta-refresh (runs, benchmarks, traces)