stevibe/BenchLocal

BenchLocal

Test LLMs on real tasks. Compare models side-by-side.

Website · Download · Watch demo · Build a Bench Pack

BenchLocal is a local-first desktop app for running, comparing, and managing installable LLM Bench Packs against local or remote models.

Official Bench Packs today:

BenchLocal owns the shared desktop runtime:

  • provider configuration
  • model registry
  • Bench Pack install and update flow
  • per-tab sampling overrides
  • run execution and result history
  • verifier lifecycle management
  • persisted desktop UI state

Agent access

BenchLocal can expose a local agent surface so AI agents and automation tools can control benchmark workflows while the desktop UI stays live.

Enable it from Settings > Agent Access. The app will show:

  • a bearer token
  • the local Agent Guide URL
  • the OpenAPI URL
  • the MCP Streamable HTTP URL

The HTTP API uses JSON commands for actions such as listing Bench Packs, managing providers and models, creating tabs, selecting models, refreshing availability, starting runs, resuming runs, retrying results, and stopping active runs. Live progress is available through Server-Sent Events at /v1/events.
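As a rough illustration of following live progress, the sketch below parses a Server-Sent Events stream into discrete events and attaches the bearer token to the request. Only the /v1/events path and bearer-token auth come from the description above; the base URL, the `parseSse` and `watchEvents` helper names, and the event payload contents are assumptions for illustration, not BenchLocal's documented client API.

```typescript
// Minimal SSE consumption sketch. Only the /v1/events path and bearer auth
// come from the README; baseUrl and the payload contents are assumptions.

interface SseEvent {
  event: string; // event name, "message" when the stream omits it
  data: string;  // raw payload; multi-line data is joined with "\n"
}

// Parse complete SSE blocks out of a buffer; return events plus leftover text.
function parseSse(buffer: string): { events: SseEvent[]; rest: string } {
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = blocks.pop() ?? ""; // last piece may be an incomplete block
  const events: SseEvent[] = [];
  for (const block of blocks) {
    let event = "message";
    const data: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith(":")) continue; // comment or keep-alive line
      const idx = line.indexOf(":");
      const field = idx === -1 ? line : line.slice(0, idx);
      let value = idx === -1 ? "" : line.slice(idx + 1);
      if (value.startsWith(" ")) value = value.slice(1);
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return { events, rest };
}

// Stream events from the agent surface. baseUrl is a placeholder; use the
// address shown in Settings > Agent Access.
async function watchEvents(
  baseUrl: string,
  token: string,
  onEvent: (e: SseEvent) => void,
): Promise<void> {
  const res = await fetch(`${baseUrl}/v1/events`, {
    headers: { Authorization: `Bearer ${token}`, Accept: "text/event-stream" },
  });
  if (!res.ok || !res.body) throw new Error(`SSE request failed: ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const parsed = parseSse(buffer);
    buffer = parsed.rest; // keep any partial block for the next chunk
    parsed.events.forEach(onEvent);
  }
}
```

The parser is kept separate from the network loop so it can be checked without a running app. The actual event names and JSON payloads should be taken from docs/agent-control-api.md or the OpenAPI URL.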

MCP-capable agents can connect to /mcp with the same bearer token and use standard benchlocal_* tools plus BenchLocal state resources. This is the preferred integration path for agents that support tool calls.
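For orientation, here is a minimal sketch of the first request an MCP client would send over Streamable HTTP. The /mcp path and the bearer token come from the text above; the base URL, the helper names, the `clientInfo` values, and the choice of protocol version string are assumptions. Most agents would rely on an MCP client library rather than hand-rolling this.

```typescript
// Sketch of an MCP "initialize" request over Streamable HTTP. The /mcp path
// and bearer token come from the README; everything else is a placeholder.

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

interface PostInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the JSON-RPC initialize message defined by the MCP specification.
function buildInitialize(id: number): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "initialize",
    params: {
      protocolVersion: "2025-03-26", // assumed; match what the server supports
      capabilities: {},
      clientInfo: { name: "example-agent", version: "0.0.1" }, // hypothetical
    },
  };
}

// Wrap a JSON-RPC message in the HTTP options Streamable HTTP expects:
// a POST whose Accept header allows both JSON and an SSE response.
function buildMcpPost(token: string, message: JsonRpcRequest): PostInit {
  return {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify(message),
  };
}

// baseUrl is a placeholder for the MCP URL's origin shown in Settings.
async function initializeMcp(baseUrl: string, token: string): Promise<Response> {
  return fetch(`${baseUrl}/mcp`, buildMcpPost(token, buildInitialize(1)));
}
```

After a successful initialize, a client would normally send a `notifications/initialized` message and then call `tools/list` to discover the available benchlocal_* tools.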

See docs/agent-control-api.md for endpoint details, MCP tools/resources, safety rules, and the extension checklist for adding future UI features to the agent surface.

Each Bench Pack owns its benchmark behavior:

  • scenario definitions
  • benchmark-specific prompts
  • scoring logic
  • verifier contracts where required
  • benchmark-specific traces and summaries

Repo layout

  • app/ - Electron app shell, desktop UI, main process, preload, renderer
  • packages/benchlocal-core - shared protocol, config, workspace, and theme types
  • packages/benchlocal-sdk - authoring helpers for Bench Pack repos
  • packages/benchpack-host - host-side install, inspection, verifier, and run orchestration logic
  • themes/ - built-in desktop themes
  • scripts/ - local macOS release helpers
  • docs/ - packaging and release docs

Developer references

Build commands

  • npm run build - compile the app and workspace packages for development
  • npm run pack - compile and package the production desktop app, including DMG and ZIP artifacts
  • npm run build:dir - compile and produce an unpacked local app bundle
  • npm run build:win - compile and package unsigned Windows NSIS and ZIP artifacts
  • npm run build:linux - compile and package Linux AppImage and tar.gz artifacts
  • npm run release:all - build the signed macOS release plus Windows and Linux desktop artifacts in one command

License

MIT
