Desktop AI coding assistant for local repositories. Bring your own model — Anthropic, OpenAI, Gemini, and Ollama work out of the box, or point it at any OpenAI-compatible custom endpoint. Chat over your repo, run your tools, edit files in place, and watch git state update inline.
- Download CodeBench-macos.dmg from the latest release.
- Open the DMG and drag Code Bench into Applications.
- Launch the app from Applications.
When you point Code Bench at a project under Documents, Downloads, or Desktop, macOS will ask "Code Bench would like to access files in your … folder." Click Allow. Code Bench reads project files from wherever you store them on disk, so it needs access to those user folders.
The app is chat-centric: a single conversation surface with a project sidebar on the left, an inline changes panel that surfaces the agent's edits as they happen, and a top action bar for the active project's branch and PR state. Settings (⌘,) hosts the configuration sub-areas.
| Surface | Capabilities |
|---|---|
| Chat | Streaming responses · per-session system prompt · model selector · agent loop with tool use · interactive permission prompts · "ask the user a question" cards |
| Project sidebar | Add/relocate local projects · session list per project · session archive · live git state (branch, ahead/behind, dirty-tree) · branch picker |
| Changes panel | Inline diff per edited file · per-change accept/reject · conflict-merge view when on-disk drifts from agent edits · commit dialog · create-PR dialog |
| Coding tools | Built-in tool registry (filesystem read/write, ripgrep, bash, web fetch) · per-tool denylist · ripgrep auto-detect · ready for MCP-server tool sources |
| MCP servers | Configure stdio and HTTP/SSE MCP servers · enable/disable per server · tool inventory surfaces in chat |
| Integrations | GitHub sign-in (Device Flow) · repository browser feeding the project sidebar |
| Providers | Multi-provider key storage (OpenAI · Anthropic · Gemini · Ollama · custom OpenAI-compatible endpoint) · per-provider connectivity test · keys in OS keychain |
| Settings → Reset | "Wipe all data" — clears API keys, GitHub sign-in, chat history, projects, and MCP servers in one step |
| Auto-update | Checks the GitHub Releases endpoint on launch and from Settings · verifies Team-ID match and codesign/spctl on macOS · self-installs and relaunches |
| Platform | Status |
|---|---|
| macOS | ✅ Supported — built, signed, notarized, and released in CI on every tag |
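| Windows | In development, unsupported: CI matrix entry disabled in .github/workflows/build.yml and .github/workflows/build-and-publish.yml |
| Linux | In development, unsupported: CI matrix entry disabled in .github/workflows/build.yml and .github/workflows/build-and-publish.yml |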
iOS, Android, and Web are out of scope.
If you want Windows or Linux to be a supported platform, the path is: re-enable the matrix entries in build.yml and build-and-publish.yml, add platform-appropriate code-signing, fix anything that breaks, and update this section. Until that happens, treat the desktop builds for those targets as a developer-only escape hatch.
| Dependency | Version |
|---|---|
| Flutter SDK | ≥ 3.41.6 stable |
| Dart SDK | ≥ 3.11.4 |
| Xcode (macOS builds) | 15+ Sequoia (Xcode CLI tools required) |
| Windows 10 (Windows builds) | 1903+ (in development) |
| GTK 3 + ninja + cmake (Linux builds) | system packages (in development) |
git clone git@github.com:mkappworks-dev/code-bench-app.git
cd code-bench-app
flutter pub get
Drift (SQLite ORM) and Riverpod require a one-time code-generation step. Run this before the first build and whenever you modify database tables or add @riverpod providers:
dart run build_runner build --delete-conflicting-outputs
Use watch mode during active development:
dart run build_runner watch --delete-conflicting-outputs
For macOS:
flutter run -d macos # primary dev target
For Windows:
flutter run -d windows
For Linux:
flutter run -d linux
On first launch, the onboarding screen gates access until at least one AI provider API key is saved.
GitHub sign-in: Code Bench uses the OAuth 2.0 Device Authorization Grant (RFC 8628) on the Benchlabs Codebench GitHub App. The app's client_id is checked into source at ApiConstants.githubClientId and shipped in the binary; that's intentional. Device Flow treats client_id as a non-secret (the same reason there is no client_secret to ship), so embedding it carries no credential-leak risk. See docs/superpowers/specs/2026-05-03-github-app-device-flow-design.md for the full threat model.
Forks must register their own GitHub App and replace ApiConstants.githubClientId. To register: github.com → Settings → Developer settings → GitHub Apps → New GitHub App, tick Enable Device Flow at the bottom, and copy the resulting Iv23li… Client ID over the embedded value.
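As a rough illustration of the polling half of the Device Authorization Grant, the sketch below maps the error codes defined in RFC 8628 section 3.5 to the next polling step. The type and function names (PollDecision, KeepPolling, StopPolling, decideNextPoll) are hypothetical and are not taken from the Code Bench source; the real app may structure this differently.

```dart
// Hypothetical names throughout; not copied from the Code Bench source.

/// What a polling loop should do after one token-endpoint error response.
sealed class PollDecision {
  const PollDecision();
}

/// Keep polling after waiting [interval].
final class KeepPolling extends PollDecision {
  const KeepPolling(this.interval);
  final Duration interval;
}

/// Stop polling and surface [reason] to the user.
final class StopPolling extends PollDecision {
  const StopPolling(this.reason);
  final String reason;
}

/// Maps an RFC 8628 error code to the next polling step.
PollDecision decideNextPoll(String errorCode, Duration currentInterval) {
  switch (errorCode) {
    case 'authorization_pending':
      // The user has not finished entering the code yet.
      return KeepPolling(currentInterval);
    case 'slow_down':
      // RFC 8628 section 3.5: add 5 seconds to the interval.
      return KeepPolling(currentInterval + const Duration(seconds: 5));
    case 'access_denied':
    case 'expired_token':
      return StopPolling(errorCode);
    default:
      return StopPolling('unexpected error: $errorCode');
  }
}
```

A caller would start from the interval returned by the device-code request and feed each error response through decideNextPoll until it receives a token or a StopPolling.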
lib/
├── main.dart # Entry point — ProviderScope, window_manager init
├── app.dart # MaterialApp.router wired to GoRouter
├── router/
│ └── app_router.dart # GoRouter: onboarding guard + chat ShellRoute + settings route
├── shell/
│ ├── chat_shell.dart # Sidebar + chat column + optional changes panel; ⌘N / ⌘, shortcuts
│ ├── notifiers/ # Top-action-bar and status-bar state
│ └── widgets/ # AppLifecycleObserver, TopActionBar, StatusBar, ActionOutputPanel
├── core/ # Constants, AppException hierarchy, theme/colors, utils, shared widgets
├── data/
│ ├── _core/ # Drift AppDatabase, DioFactory, SecureStorage, preferences
│ ├── shared/ # Cross-cutting models: AIModel, ChatMessage
│ ├── ai/ # AI datasources (Dio), repository, models/
│ ├── session/ # Session datasource (Drift), repository, models/ (ChatSession, ToolEvent, …)
│ ├── project/ # Project datasource (Drift), repository, models/ (Project, WorkspaceProject, …)
│ ├── git/ # Git datasource (Process), live-state datasource, repository, models/, exceptions
│ ├── github/ # GitHub datasources (Dio + OAuth), repository, models/
│ ├── apply/ # Apply datasource (filesystem), repository, security guard
│ ├── filesystem/ # Filesystem datasource (dart:io)
│ ├── bash/ # Bash datasource (Process) — the one documented `runInShell` exception
│ ├── coding_tools/ # Tool inputs/outputs, denylist, registry-facing types
│ ├── mcp/ # MCP config datasource (Drift), transport datasources (stdio + HTTP/SSE), repository, models/
│ ├── web_fetch/ # Web-fetch datasource (Dio)
│ ├── providers/ # Provider catalog + ProvidersService backing
│ ├── settings/ # Settings datasource (Drift + SharedPreferences), repository, models/
│ ├── update/ # Update datasources (Dio for releases, Process for install, IO for sentinel), models/
│ └── integrations/ # Integration metadata (GitHub OAuth)
├── services/
│ ├── ai/ # AIService — stream buffering, model resolution
│ ├── agent/ # Agent loop — tool dispatch, permission prompts, iteration cap
│ ├── coding_tools/ # ToolRegistry, denylist service, ripgrep availability probe, individual tools/
│ ├── mcp/ # MCP service — server lifecycle, tool inventory
│ ├── git/ # GitService — composite git operations
│ ├── github/ # GitHubService — OAuth + REST composition
│ ├── session/ # SessionService — send-and-stream, history, archive
│ ├── project/ # ProjectService — add/relocate, scan
│ ├── apply/ # ApplyService — patch orchestration + security guard
│ ├── providers/ # ProvidersService — keychain-backed key storage
│ ├── api_key_test/ # ApiKeyTestService — provider connectivity checks
│ ├── ide/ # IdeService — editor/terminal launch
│ ├── settings/ # SettingsService — wipe cascade, onboarding
│ └── update/ # UpdateService — version comparison, codesign/spctl gates, swap-and-relaunch
└── features/
    ├── onboarding/ # First-run wizard (API keys, GitHub sign-in)
    ├── chat/ # Chat UI, message streaming, agent permission prompts, code-apply actions
    ├── project_sidebar/ # Project list, session list, archive, branch picker triggers
    ├── branch_picker/ # Branch picker dialog + notifier
    ├── archive/ # Archived sessions screen
    ├── general/ # Settings → General (preferences, update section, reset section)
    ├── providers/ # Settings → Providers (per-provider keys + test)
    ├── integrations/ # Settings → Integrations (GitHub sign-in)
    ├── coding_tools/ # Settings → Coding Tools (denylist, ripgrep status)
    ├── mcp_servers/ # Settings → MCP Servers (configure, enable/disable)
    ├── update/ # Update notifier, state, failure types, "Check now" UI
    └── settings/ # Settings shell + sub-area router
The dependency graph is strictly one-directional. Violating it is a build-review blocker:
Widgets / Screens
↓ (ref.watch / ref.read notifier)
Notifiers ← the only layer widgets may reach
↓ (ref.read service)
Services ← business logic, composition, typed exceptions
↓ (constructor injection)
Repositories ← domain interfaces; no I/O
↓
Datasources ← Dio, DB, Process.run, filesystem live here
↓
External (REST APIs / SQLite / OS)
Widgets communicate with notifiers only via ref.watch / ref.read(…notifier).method(). They never reach into a service or repository provider directly. Process.run, dart:io, and Dio are confined to lib/data/**/datasource/.
Command notifiers (*Actions, e.g. ProjectSidebarActions, CodeApplyActions, GitActions) use void build() with keepAlive: true and expose imperative Future<void> methods. They are the bridge between the UI and the service layer.
Naming conventions:
| Layer | Rule |
|---|---|
| Service class | ends in Service (GitService, SessionService) |
| Service provider | @riverpod function placed before the class it instantiates |
| Repository interface | ends in Repository (GitRepository, AIRepository) |
| Repository impl + provider | class ends in RepositoryImpl; @riverpod before it |
| Datasource file naming | suffix encodes I/O type: *_dio.dart, *_process.dart, *_io.dart, *_drift.dart |
| Command notifier | ends in Actions; void build(), keepAlive: true |
| State notifier | ends in Notifier; owns AsyncValue or value state |
| Notifier file placement | *_notifier.dart, *_actions.dart, and *_failure.dart all live in {feature}/notifiers/ |
The Riverpod generator strips the Notifier suffix from provider names (ActiveSessionIdNotifier → activeSessionIdProvider). The Actions suffix is kept (GitActions → gitActionsProvider). Widgets must never call ref.invalidate directly — route through a notifier method instead.
Widgets are pure state-renderers. They call notifier methods and listen for AsyncError state to show snackbars — they never try/catch business-logic calls or import service/repository exception types.
Notifiers mediate all commands. *Actions notifiers extend AsyncNotifier<void>; failures are emitted as AsyncError carrying a typed sealed class {Notifier}Failure. *Notifier classes own reactive AsyncValue<T> data state.
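To make the typed-failure idea concrete, here is a minimal sketch of a sealed failure class and the exhaustive switch a widget-side helper could use to pick snackbar text. GitActionsFailure follows the {Notifier}Failure naming rule, but its variants and the snackbarMessage helper are assumptions made for illustration, not the project's actual types.

```dart
// Hypothetical failure variants for a GitActions notifier.
sealed class GitActionsFailure {
  const GitActionsFailure();
}

final class GitNotInstalled extends GitActionsFailure {
  const GitNotInstalled();
}

final class GitCommitRejected extends GitActionsFailure {
  const GitCommitRejected(this.reason);
  final String reason;
}

/// Picks user-facing text for a failure. Because the class is sealed,
/// the compiler rejects this switch if a variant is left unhandled.
String snackbarMessage(GitActionsFailure failure) => switch (failure) {
      GitNotInstalled() => 'git was not found on this machine.',
      GitCommitRejected(:final reason) => 'Commit rejected: $reason',
    };
```

The sealed hierarchy lets widgets react to failures without importing service or repository exception types.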
Services own business logic and composition. They receive repositories via constructor injection, convert low-level I/O errors into typed domain exceptions, and expose a clean API to notifiers. Services are instantiated via @riverpod / @Riverpod(keepAlive: true) providers and never constructed directly.
Repositories are domain interfaces (lib/data/**/repository/). Implementations (*RepositoryImpl) are wired up via Riverpod providers and injected into services.
Datasources (lib/data/**/datasource/) are where all I/O lives: Dio HTTP calls, SQLite via Drift, Process.run, and dart:io filesystem access. File suffix encodes the I/O type: *_dio.dart, *_process.dart, *_io.dart, *_drift.dart.
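The layer responsibilities described above can be sketched in plain Dart, without Riverpod or real I/O. GitRepository and GitService are names used in this README; GitDatasource, FakeGitDatasource, DatasourceException, GitBranchUnavailable, and every method shown are assumptions made for illustration. In the app itself these objects are wired through Riverpod providers rather than constructed by hand.

```dart
// Illustrative only; signatures are assumptions, not the project's API.

/// Low-level error a datasource might raise.
class DatasourceException implements Exception {
  DatasourceException(this.message);
  final String message;
}

/// Datasource: the layer where Process.run would live in the real app.
abstract interface class GitDatasource {
  Future<String> currentBranch(String projectPath);
}

/// In-memory stand-in so the sketch runs without a git binary.
class FakeGitDatasource implements GitDatasource {
  FakeGitDatasource(this._branches);
  final Map<String, String> _branches;

  @override
  Future<String> currentBranch(String projectPath) async {
    final branch = _branches[projectPath];
    if (branch == null) throw DatasourceException('not a repository');
    return branch;
  }
}

/// Repository: domain interface plus an implementation that delegates.
abstract interface class GitRepository {
  Future<String> currentBranch(String projectPath);
}

class GitRepositoryImpl implements GitRepository {
  GitRepositoryImpl(this._datasource);
  final GitDatasource _datasource;

  @override
  Future<String> currentBranch(String projectPath) =>
      _datasource.currentBranch(projectPath);
}

/// Typed domain exception exposed to notifiers.
class GitBranchUnavailable implements Exception {
  GitBranchUnavailable(this.projectPath);
  final String projectPath;
}

/// Service: receives the repository by constructor injection and
/// converts low-level errors into a typed domain exception.
class GitService {
  GitService(this._repository);
  final GitRepository _repository;

  Future<String> branchLabel(String projectPath) async {
    try {
      return 'on ${await _repository.currentBranch(projectPath)}';
    } on DatasourceException {
      throw GitBranchUnavailable(projectPath);
    }
  }
}
```

Swapping FakeGitDatasource for a process-backed datasource would leave GitService untouched, which is the point of the one-directional graph.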
The full rules — naming conventions, error-handling patterns, logging matrix, security guards — are in CLAUDE.md.
| Pattern | Used for |
|---|---|
| @Riverpod(keepAlive: true) class Notifier | Long-lived app state: active session ID, active project ID, selected model, system prompts, DB, storage |
| @Riverpod(keepAlive: true) class Actions | Imperative commands: *Actions notifiers expose Future<void> methods that mediate widget → service calls (e.g. CodeApplyActions, ProjectSidebarActions, GitActions) |
| @riverpod class AsyncNotifier | Chat messages (loads history, streams new messages) |
| @riverpod function (StreamProvider) | Session list, live git state, MCP server list (wraps Drift / Process stream sources) |
| @riverpod function (FutureProvider) | One-shot reads: available model list, package version, last update-check timestamp |
All data is stored in a local SQLite database managed by Drift (code_bench.db).
| Table | Stores |
|---|---|
| ChatSessions | Session ID · title · model/provider · created/updated timestamps · pin flag · archive flag |
| ChatMessages | Message ID · session FK · role · content · extracted code blocks (JSON) · tool events (JSON) · timestamp |
| WorkspaceProjects | Project ID · name · local path · linked repo ID · active branch · associated session IDs |
| McpServers | Server ID · name · transport (stdio / HTTP-SSE) · command + args · env (JSON) · URL · enabled flag |
DAOs: SessionDao (sessions + messages CRUD, stream watch) · ProjectDao (projects CRUD) · McpDao (servers CRUD, including deleteAll for the wipe cascade).
SecureStorageSource wraps flutter_secure_storage using a consistent key scheme:
| Key | Holds |
|---|---|
| api_key_{provider} | API key per AI provider (e.g. api_key_openai) |
| github_token | GitHub OAuth access token |
| ollama_base_url | Custom Ollama server URL |
| custom_endpoint_url | OpenAI-compatible custom endpoint |
| custom_endpoint_api_key | Key for the custom endpoint |
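The key scheme in the table can be expressed as a small helper. This is only a sketch: DemoProvider and apiKeyStorageKey are hypothetical names, and the real SecureStorageSource may build its keys differently.

```dart
/// Hypothetical stand-in for the app's provider enum.
enum DemoProvider { openai, anthropic, gemini }

/// Fixed keys from the table above.
const githubTokenKey = 'github_token';
const ollamaBaseUrlKey = 'ollama_base_url';
const customEndpointUrlKey = 'custom_endpoint_url';
const customEndpointApiKeyKey = 'custom_endpoint_api_key';

/// Builds the per-provider key following the api_key_{provider} pattern.
String apiKeyStorageKey(DemoProvider provider) => 'api_key_${provider.name}';
```

Deriving the per-provider key from the enum name keeps the stored keys predictable when a new provider is added.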
| Platform | Backend |
|---|---|
| macOS | Keychain (first_unlock accessibility) |
| Windows | Windows Credential Manager |
| Linux | libsecret |
For macOS:
flutter build macos --release # → build/macos/Build/Products/Release/
macOS App Sandbox is intentionally disabled. Code Bench shells out to git, code, cursor, and user-defined action commands, which cannot work under sandbox. See macos/Runner/README.md for the rationale, contributor rules, and distribution implications (Mac App Store eligibility, hardened runtime, notarization).
For Windows:
flutter build windows --release # → build/windows/x64/runner/Release/ (unsupported)
For Linux:
flutter build linux --release # → build/linux/x64/release/bundle/ (unsupported)
Releases are managed by release-please. Every merge to main updates an open release PR that bumps pubspec.yaml, writes CHANGELOG.md, and proposes the next semver version based on conventional commit types (feat: → minor, fix: → patch, feat!: / BREAKING CHANGE: → major). Merging that PR triggers .github/workflows/release-please.yml, which creates a draft GitHub release and chains .github/workflows/build-and-publish.yml to build the macOS app, sign with a Developer ID, notarize through Apple's notary service, staple the ticket, and upload CodeBench-macos.dmg and CodeBench-macos.zip to the draft. The chained workflow then publishes the draft (PATCH draft=false), which is the moment the v* git tag gets created and the release becomes "latest." The in-app auto-updater consumes those artifacts on next launch of older clients.
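The commit-type-to-version mapping described above can be sketched as follows. This is a simplified illustration, not release-please's implementation: it ignores release-please's configuration options and any special handling it may apply to pre-1.0 versions, and Bump, bumpFor, nextBump, and applyBump are hypothetical names.

```dart
enum Bump { none, patch, minor, major }

// Matches a Conventional Commits header: type, optional (scope), optional "!".
final _header = RegExp(r'^(\w+)(\([^)]*\))?(!)?: ');

/// Bump implied by a single commit message.
Bump bumpFor(String commitMessage) {
  if (commitMessage.contains('BREAKING CHANGE:')) return Bump.major;
  final match = _header.firstMatch(commitMessage);
  if (match == null) return Bump.none;
  if (match.group(3) != null) return Bump.major; // e.g. "feat!: ..."
  return switch (match.group(1)) {
    'feat' => Bump.minor,
    'fix' => Bump.patch,
    _ => Bump.none,
  };
}

/// Largest bump across all pending commits.
Bump nextBump(Iterable<String> commitMessages) => commitMessages
    .map(bumpFor)
    .fold(Bump.none, (a, b) => a.index >= b.index ? a : b);

/// Applies [bump] to a plain "major.minor.patch" version string.
String applyBump(String version, Bump bump) {
  final [major, minor, patch] = version.split('.').map(int.parse).toList();
  return switch (bump) {
    Bump.major => '${major + 1}.0.0',
    Bump.minor => '$major.${minor + 1}.0',
    Bump.patch => '$major.$minor.${patch + 1}',
    Bump.none => version,
  };
}
```

Under these assumptions, a batch containing one fix: commit and one feat: commit yields a minor bump, because the largest bump among the pending commits wins.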
Add these under Settings → Secrets and variables → Actions before the first release:
| Secret | Holds | How to get it |
|---|---|---|
| MACOS_CERTIFICATE | Base64-encoded Developer ID Application certificate (.p12) | Export from Keychain Access (right-click identity → Export → .p12), then base64 -i cert.p12 \| pbcopy and paste |
| MACOS_CERTIFICATE_PASSWORD | Password set when exporting the .p12 | The password you typed at export time |
| MACOS_PROVISIONING_PROFILE | Base64-encoded Developer ID provisioning profile | See macos/Runner/README.md for the full Developer Portal walkthrough |
| APPLE_ID | Apple ID email of the notarizing account | The email tied to your Apple Developer membership |
| APPLE_ID_PASSWORD | App-specific password (not your Apple ID password) | appleid.apple.com → Sign-In and Security → App-Specific Passwords → Generate (label e.g. code-bench-notarize) |
| APPLE_TEAM_ID | 10-character Team ID (also used as Xcode DEVELOPMENT_TEAM) | developer.apple.com/account → Membership Details |
| RELEASE_PLEASE_TOKEN | Personal access token (classic) with repo scope | github.com/settings/tokens → Generate new token (classic) → check repo → no expiry |
Why a PAT for release-please? PRs created by the default GITHUB_TOKEN are blocked from triggering other workflows (GitHub's anti-loop protection). Without a PAT, the release PR's required status checks (Analyze & Test, Build (macos)) get stuck on "Expected — Waiting for status to be reported" and never run. A PAT makes the PR appear as user-created so CI fires normally.
- Merge feature/fix PRs to main as normal, using Conventional Commits (feat:, fix:, etc.).
- release-please keeps a release PR open that accumulates all pending commits.
- When you're ready to ship, merge the release PR.
- CI tags, builds, notarizes, and publishes automatically — no manual steps needed.
Never manually bump pubspec.yaml or push v* tags. release-please owns both. Manual bumps or tags will confuse the manifest and produce duplicate or mis-versioned releases.
The Release (build + publish) workflow has a Run workflow button under Actions tab → Release (build + publish). Use it for the recovery scenarios below — it's the safety valve when the chained automatic flow fails or you need to operate on an existing release out-of-band. Do not use it for normal releases; merging a Release Please PR is what cuts a release.
Inputs:
| Input | When to set it |
|---|---|
| tag | Always: the tag string, e.g. v0.2.0 |
| release_id | Only when publishing a stuck draft; otherwise blank |
Scenario A — stuck draft (no tag yet). release-please.yml ran but the chained run failed mid-way (build crashed, notarization timeout, upload failed). Result: a draft release exists in the Releases tab with a "Draft" badge, but no v* tag was created.
- Open the draft release; copy the release ID from the URL (releases/edit/<id>).
- Actions tab → Release (build + publish) → Run workflow.
- Enter the tag (e.g. v0.2.0) and the release ID → Run.
The workflow re-runs the build, deletes any half-uploaded assets, re-uploads, and flips draft=false to publish — which is what finally creates the git tag.
Scenario B — re-upload assets to a published release. The release shipped, but the DMG was discovered to be corrupted, or you want to attach an additional file. Tag exists, release is non-draft.
- Actions tab → Release (build + publish) → Run workflow.
- Enter the tag, leave release ID blank → Run.
The workflow checks out the tag, builds fresh, and softprops uploads (replacing same-named assets). The changelog is not overwritten.
Scenario C — orphaned tag with no release object. Rare. Someone pushed git tag v0.2.0 && git push origin v0.2.0 outside the Release Please flow. Tag exists, no release object yet.
Same steps as Scenario B (tag only, blank release ID). softprops creates the release for that tag and uploads artifacts.
Scenario D — testing pipeline changes on an existing tag. You modified the build / sign / notarize steps in build-and-publish.yml and want to verify on a real tag without merging a Release Please PR.
Same steps as Scenario B. Be aware: this will replace existing assets on that release — pick a throwaway test tag if you don't want to disturb a real release.
flutter test # run all tests
flutter analyze # static analysis
dart format lib/ test/ # format
dart format --set-exit-if-changed lib/ test/ # CI format check
To add a new AI provider:
- Add a value to the AIProvider enum in lib/data/shared/ai_model.dart.
- Implement the streaming sendMessage path under lib/data/ai/datasource/ (Dio for HTTP, *_dio.dart suffix) and surface it through AIRepository / AIService in lib/services/ai/ai_service.dart.
- Add the per-provider key plumbing in lib/data/_core/secure_storage.dart and the corresponding entry in ProvidersService (lib/services/providers/providers_service.dart).
- Wire the connectivity test in lib/services/api_key_test/.
- Add the row to the Settings → Providers UI under lib/features/providers/.
To add a new Drift table:
- Define the table class in lib/data/_core/app_database.dart.
- Create a @DriftAccessor DAO class in the same file (include a deleteAll method so the table participates in SettingsService.wipeAllData).
- Add both to the @DriftDatabase annotation and the daos list.
- Increment schemaVersion and add a migration step.
- Run dart run build_runner build --delete-conflicting-outputs.
- If the table holds user data, add a wipe step to lib/services/settings/settings_service.dart so "Wipe all data" stays exhaustive.
| Layer | Technology |
|---|---|
| UI | Flutter · Material Design · Google Fonts |
| State | flutter_riverpod · riverpod_annotation |
| Navigation | go_router (ShellRoute) |
| Local DB | Drift (SQLite via sqlite3_flutter_libs) |
| Secret storage | flutter_secure_storage |
| HTTP / streaming | Dio (SSE via ResponseType.stream) |
| AI providers | OpenAI · Anthropic · Gemini · Ollama · Custom (OpenAI-compatible) |
| Tool sources | Built-in registry (filesystem, ripgrep, bash, web fetch) · MCP (stdio + HTTP/SSE) |
| Chat rendering | flutter_markdown_plus · flutter_highlight |
| GitHub auth | OAuth 2.0 Device Flow (RFC 8628) on a GitHub App — public client_id only |
| Self-update | GitHub Releases API · codesign --verify + spctl --assess · swap-and-relaunch helper |
| Preferences | shared_preferences (NSUserDefaults / equivalents) |
| Window management | window_manager |
| Serialization | freezed · json_annotation |
| Code generation | build_runner · riverpod_generator · drift_dev · freezed · json_serializable |
Contributions are welcome. Please read CONTRIBUTING.md before opening a PR.
To report a vulnerability, see SECURITY.md.
MIT — free to use, modify, and distribute.