fix: enable runtime evidence profiles#242

Open
richardfogaca wants to merge 1 commit into compozy:main from richardfogaca:fix/profile-runtime-evidence

Conversation

Contributor

@richardfogaca richardfogaca commented May 30, 2026

What this changes

This PR introduces an explicit runtime evidence mode for task execution profiles. Profiles can now ask workers to run with broader local runtime permissions and prompt guidance when the task needs browser/simulator/app evidence.

Why

AGH already lets agents execute code, but frontend/mobile work needs a deliberate mode for installing dependencies, booting apps, driving browsers or simulators, and collecting evidence. Keeping this behind an execution-profile policy gives us that capability without changing the default conservative task path.

Implementation notes

  • Adds runtime.mode to task execution profiles with default and evidence modes.
  • Persists runtime_mode in GlobalDB with migration and schema coverage.
  • Applies runtime evidence mode during session creation by overriding permissions and adding explicit runtime/browser/simulator prompt guidance.
  • Loads profile-selected worker targets so provider/model/capability routing applies to spawned task sessions.
  • Surfaces runtime mode in CLI/profile output and generated OpenAPI/types/fixtures.
  • Regenerates API/tool metadata and includes the existing MDX formatting cleanup required by repo verification.
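
A minimal Go sketch of the `runtime.mode` normalization and validation described in the notes above. The type and method names (`RuntimeMode`, `RuntimePolicy`, `Normalize`, `Validate`) are illustrative and may not match `internal/task/profile.go`; trimming and lowercasing the input is an assumption, while mapping the empty value to `default` follows the review discussion.

```go
package main

import (
	"fmt"
	"strings"
)

// RuntimeMode is a hypothetical string type for the profile's runtime.mode field.
type RuntimeMode string

const (
	RuntimeModeDefault  RuntimeMode = "default"
	RuntimeModeEvidence RuntimeMode = "evidence"
)

// RuntimePolicy mirrors the `runtime` object on an execution profile (assumed shape).
type RuntimePolicy struct {
	Mode RuntimeMode `json:"mode"`
}

// Normalize trims and lowercases the mode (assumption) and maps "" to "default".
func (p RuntimePolicy) Normalize() RuntimePolicy {
	mode := RuntimeMode(strings.ToLower(strings.TrimSpace(string(p.Mode))))
	if mode == "" {
		mode = RuntimeModeDefault
	}
	return RuntimePolicy{Mode: mode}
}

// Validate rejects anything other than the two known modes.
func (p RuntimePolicy) Validate() error {
	switch p.Mode {
	case RuntimeModeDefault, RuntimeModeEvidence:
		return nil
	default:
		return fmt.Errorf("invalid runtime.mode %q: want %q or %q",
			p.Mode, RuntimeModeDefault, RuntimeModeEvidence)
	}
}

func main() {
	for _, raw := range []string{"", " Evidence ", "turbo"} {
		p := RuntimePolicy{Mode: RuntimeMode(raw)}.Normalize()
		fmt.Printf("%q -> mode=%q err=%v\n", raw, p.Mode, p.Validate())
	}
}
```

Normalizing before validating keeps older profiles, which have no stored mode, on the conservative default path.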

Reviewer focus

  • Runtime evidence is opt-in per execution profile; default mode should preserve existing behavior.
  • Permission override should only apply to runtime evidence sessions.
  • Schema/default handling should match server-side validation.
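
The first two focus points can be pictured with a small Go sketch. `CreateOpts` here is a simplified stand-in for the session options, `applyRuntimeMode` is a hypothetical helper, and the overlay text is a placeholder; only the `approve-all` value and the "evidence only" rule come from this PR's discussion.

```go
package main

import "fmt"

// CreateOpts is a simplified, hypothetical stand-in for session creation options.
type CreateOpts struct {
	Permissions   string   // empty means "keep the session manager's default"
	PromptOverlay []string // extra guidance appended to the worker prompt
}

// applyRuntimeMode returns opts unchanged unless mode is "evidence".
// In evidence mode it overrides permissions and appends one guidance line.
func applyRuntimeMode(opts CreateOpts, mode string) CreateOpts {
	if mode != "evidence" {
		return opts // default path: no permission override, no overlay
	}
	opts.Permissions = "approve-all"
	// Copy the slice first so the caller's overlay is never mutated.
	overlay := append([]string{}, opts.PromptOverlay...)
	opts.PromptOverlay = append(overlay,
		"Runtime evidence mode: placeholder guidance for browser/simulator/app evidence.")
	return opts
}

func main() {
	base := CreateOpts{PromptOverlay: []string{"base worker prompt"}}
	fmt.Printf("default:  %+v\n", applyRuntimeMode(base, "default"))
	fmt.Printf("evidence: %+v\n", applyRuntimeMode(base, "evidence"))
}
```

Returning a modified copy makes it easy to assert in a test that the default mode leaves the options untouched.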

Validation

  • PATH="$HOME/.bun/bin:$PATH" make verify

Copilot AI review requested due to automatic review settings May 30, 2026 01:11

vercel Bot commented May 30, 2026

@richardfogaca is attempting to deploy a commit to the Compozy Team on Vercel.

A member of the Team first needs to authorize it.


coderabbitai Bot commented May 30, 2026

Warning

Review limit reached

@richardfogaca, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 2 minutes and 36 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d00c62e8-3bd5-4de2-8c05-2fbf56bdd0f7

📥 Commits

Reviewing files that changed from the base of the PR and between cd4b155 and b3df09a.

⛔ Files ignored due to path filters (4)
  • internal/tools/builtin/testdata/native-tool-catalog.json is excluded by !**/*.json
  • openapi/agh.json is excluded by !**/*.json
  • packages/site/content/runtime/core/configuration/config-toml.mdx is excluded by !**/*.mdx
  • web/src/generated/agh-openapi.d.ts is excluded by !**/generated/**
📒 Files selected for processing (19)
  • internal/api/spec/spec.go
  • internal/api/spec/spec_test.go
  • internal/cli/task.go
  • internal/daemon/task_role_runtime.go
  • internal/daemon/task_role_runtime_test.go
  • internal/daemon/task_runtime.go
  • internal/session/manager.go
  • internal/session/manager_start.go
  • internal/store/globaldb/global_db.go
  • internal/store/globaldb/global_db_task_orchestration_schema_test.go
  • internal/store/globaldb/global_db_task_profile.go
  • internal/store/globaldb/global_db_task_profile_test.go
  • internal/store/globaldb/global_db_test.go
  • internal/store/globaldb/migrate_task_orchestration_profile.go
  • internal/store/globaldb/schema_task_orchestration_profile.go
  • internal/task/manager_profile.go
  • internal/task/profile.go
  • internal/tools/builtin/tasks.go
  • web/src/systems/tasks/mocks/fixtures.ts

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Introduces a new task execution-profile runtime policy (default vs evidence) that is persisted in global DB, surfaced via OpenAPI/CLI, and used to relax session permissions + add guidance overlays when evidence mode is enabled.

Changes:

  • Add RuntimePolicy / RuntimeMode to task execution profiles, including normalization + validation.
  • Persist runtime_mode in global DB with migration + store read/write support.
  • Propagate runtime evidence mode into session creation (permissions + prompt overlays) and update API/types/fixtures/docs accordingly.

Reviewed changes

Copilot reviewed 20 out of 21 changed files in this pull request and generated 4 comments.

Show a summary per file
| File | Description |
| --- | --- |
| web/src/systems/tasks/mocks/fixtures.ts | Adds runtime to task execution profile fixtures and updates override precedence. |
| web/src/generated/agh-openapi.d.ts | Updates generated TS types to include runtime. |
| packages/site/content/runtime/core/configuration/config-toml.mdx | Adjusts coordinator configuration table formatting. |
| openapi/agh.json | Adds runtime object to execution profile schema and marks it required. |
| internal/tools/builtin/testdata/native-tool-catalog.json | Updates schema digest to reflect profile schema changes. |
| internal/tools/builtin/tasks.go | Extends execution profile JSON schema to include runtime. |
| internal/task/profile.go | Adds runtime mode/policy types, normalization, and validation. |
| internal/task/manager_profile.go | Sets default runtime mode in default execution profile. |
| internal/store/globaldb/schema_task_orchestration_profile.go | Adds runtime_mode column to schema. |
| internal/store/globaldb/migrate_task_orchestration_profile.go | Adds migration to backfill/add runtime_mode column. |
| internal/store/globaldb/global_db_test.go | Registers new migration in expected migration list. |
| internal/store/globaldb/global_db_task_profile_test.go | Extends store tests to cover runtime mode persistence. |
| internal/store/globaldb/global_db_task_profile.go | Reads/writes runtime_mode in execution profile store. |
| internal/store/globaldb/global_db_task_orchestration_schema_test.go | Includes runtime_mode in schema assertions. |
| internal/store/globaldb/global_db.go | Adds migration v41 for runtime_mode. |
| internal/session/manager_start.go | Allows session start to override permissions via CreateOpts. |
| internal/session/manager.go | Adds Permissions to session CreateOpts. |
| internal/daemon/task_runtime.go | Applies runtime evidence mode to session opts (permissions + overlay). |
| internal/daemon/task_role_runtime_test.go | Adds coverage for profile-selected worker + runtime evidence behavior. |
| internal/daemon/task_role_runtime.go | Loads execution profiles to decide worker target; adds capability-aware claim command text. |
| internal/cli/task.go | Displays runtime mode in CLI execution profile output. |
Comments suppressed due to low confidence (1)

openapi/agh.json:1

  • The OpenAPI schema allows any string for runtime.mode, but server-side validation only accepts default or evidence (and treats empty as default after normalization). This mismatch will generate overly-permissive TS types (mode: string) and can lead clients to send values that pass schema validation but are rejected by the API; consider adding an enum (and a default) for runtime.mode matching server behavior. Also consider whether runtime (and/or mode) should truly be required in the OpenAPI contract if the server can safely default it when omitted.
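
One way to express the suggested constraint is sketched below in Go: a JSON Schema fragment for `runtime` whose `mode` carries an `enum` and a `default`. The function name `runtimeSchema` and the map-based construction are illustrative; how `internal/tools/builtin/tasks.go` and the OpenAPI generator actually build schemas is not shown in this conversation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// runtimeSchema returns a hypothetical JSON Schema fragment for the profile's
// `runtime` object, restricting mode to the two values the server accepts.
func runtimeSchema() map[string]any {
	return map[string]any{
		"type": "object",
		"properties": map[string]any{
			"mode": map[string]any{
				"type":    "string",
				"enum":    []string{"default", "evidence"},
				"default": "default",
			},
		},
	}
}

func main() {
	out, err := json.MarshalIndent(runtimeSchema(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

With an enum in the contract, generated TypeScript types would typically narrow `mode` from `string` to a union of the two literals.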

Comment thread internal/tools/builtin/tasks.go
Comment thread internal/store/globaldb/migrate_task_orchestration_profile.go Outdated
Comment thread internal/store/globaldb/migrate_task_orchestration_profile.go
Comment thread internal/daemon/task_role_runtime.go
@richardfogaca richardfogaca force-pushed the fix/profile-runtime-evidence branch from b3d14e1 to b3df09a Compare May 30, 2026 01:38
Member

@pedronauck pedronauck left a comment

Automated review — self + Codex (gpt-5.5, xhigh)

Two reviews were attempted on PR #242 (fix: enable runtime evidence profiles): an AGH self-review and an external Codex peer review (gpt-5.5, reasoning xhigh).

  • Self verdict: FIX_BEFORE_SHIP (1 blocker, 3 risks, 2 nits)
  • Codex verdict: unavailable — the Codex run technically completed but ignored the scoped-write contract and produced no findings artifact, so it returned no usable verdict. The findings below are from the self-review only.

The core design is solid: runtime.mode is opt-in, validated, persisted via an appended numbered migration (v41, prior checksums unchanged), contract regen co-ships (OpenAPI + web agh-openapi.d.ts + native-tool-catalog digest + fixtures), and unit coverage is good. go build ./internal/... is clean.
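
To illustrate why "appended, prior checksums unchanged" matters, here is a generic Go sketch of checksum-verified migrations. It is not AGH's migration runner: the `migration` type, the SHA-256 checksum, and the SQL strings (including the table name) are all assumptions made for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// migration is a hypothetical numbered schema migration.
type migration struct {
	Version int
	SQL     string
}

// checksum fingerprints a migration body (SHA-256 is an assumption).
func checksum(sql string) string {
	sum := sha256.Sum256([]byte(sql))
	return hex.EncodeToString(sum[:])
}

// verifyApplied compares the checksums recorded when migrations were applied
// against the bodies currently in code, and fails on any drift.
func verifyApplied(migrations []migration, applied map[int]string) error {
	for _, m := range migrations {
		want, ok := applied[m.Version]
		if !ok {
			continue // not applied yet; it will run as a new migration
		}
		if got := checksum(m.SQL); got != want {
			return fmt.Errorf("migration v%d body changed after it was applied", m.Version)
		}
	}
	return nil
}

func main() {
	v38 := "CREATE TABLE task_execution_profiles (id TEXT PRIMARY KEY)" // placeholder SQL
	v41 := "ALTER TABLE task_execution_profiles ADD COLUMN runtime_mode TEXT NOT NULL DEFAULT 'default'"
	applied := map[int]string{38: checksum(v38)}

	// Appending v41 leaves the recorded v38 checksum valid.
	fmt.Println(verifyApplied([]migration{{38, v38}, {41, v41}}, applied))

	// Editing the v38 body instead is detected as drift.
	fmt.Println(verifyApplied([]migration{{38, v38 + " -- edited"}}, applied))
}
```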

Blockers

  • Dropping --wait from the worker claim prompt can strand sessions. internal/daemon/task_role_runtime.go taskRoleClaimCommand (~L558) now emits agh task next -o json (was --wait -o json), and taskRolePromptOverlay tells the agent to run it "once". Without --wait, the command does a single non-blocking claim and returns empty if nothing is immediately claimable. A freshly spawned profile-selected worker can reach the prompt before its run is claimable (boot/lease/visibility timing) and never claim its assigned run. Fix: keep --wait as the base arg and append --capability flags after it.

Risks

  • Modifying the body of already-shipped migration v38. internal/store/globaldb/migrate_task_orchestration_profile.go: the v38 migration body now also adds runtime_mode. Harmless in practice (idempotent + v41 covers it) but violates append-only migration discipline. Fix: revert the v38 body change and rely solely on v41.
  • Evidence mode grants approve-all without requiring sandbox isolation. internal/daemon/task_runtime.go applyTaskSessionRuntimeProfile (L222-235) sets Permissions=approve-all whenever mode is evidence, independent of sandbox isolation — full auto-approve on the operator host. Fix: gate evidence-mode approve-all behind an isolated sandbox, or make the trust boundary explicit with a warning.
  • No docs for the new runtime.mode field. The only packages/site change (config-toml.mdx) is whitespace cleanup. Fix: add execution-profile docs for runtime.mode (+ sandbox.mode), or file a tracked follow-up.
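
For the approve-all risk, one possible shape of the gate is sketched here in Go. The function `resolvePermissions`, the `sandboxIsolated` flag, and the error value are hypothetical; the real `applyTaskSessionRuntimeProfile` and the way AGH represents sandbox isolation may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// errEvidenceNeedsSandbox is a hypothetical error for an unsafe combination.
var errEvidenceNeedsSandbox = errors.New("runtime evidence mode requires an isolated sandbox")

// resolvePermissions returns the permission override for a session.
// An empty string means "no override". approve-all is granted only when the
// profile is in evidence mode AND the session runs in an isolated sandbox.
func resolvePermissions(mode string, sandboxIsolated bool) (string, error) {
	if mode != "evidence" {
		return "", nil
	}
	if !sandboxIsolated {
		return "", errEvidenceNeedsSandbox
	}
	return "approve-all", nil
}

func main() {
	for _, tc := range []struct {
		mode     string
		isolated bool
	}{{"default", false}, {"evidence", false}, {"evidence", true}} {
		perm, err := resolvePermissions(tc.mode, tc.isolated)
		fmt.Printf("mode=%s isolated=%v -> permissions=%q err=%v\n", tc.mode, tc.isolated, perm, err)
	}
}
```

Failing closed is the stricter of the two options in the review; the alternative is to allow the host case and surface an explicit warning instead of an error.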

Nits

  • Implicit worker-target precedence in workerTargetForRun (AgentName → PreferredAgentNames[0] → AllowedAgentNames): add a one-line comment documenting the order.
  • Redundant idempotent column add across v38 and v41 — disappears once v38 is reverted to its original body.
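
The comment requested in the first nit might read like the one in this Go sketch. `taskRun` and `workerTarget` are simplified stand-ins for the real run type and `workerTargetForRun`, and falling back to the first entry of `AllowedAgentNames` is an assumption, since the review names the field but not which element is used.

```go
package main

import "fmt"

// taskRun is a simplified, hypothetical view of the fields used for routing.
type taskRun struct {
	AgentName           string
	PreferredAgentNames []string
	AllowedAgentNames   []string
}

// workerTarget resolves the worker agent for a run.
// Precedence: 1) AgentName, 2) PreferredAgentNames[0],
// 3) AllowedAgentNames[0] (first entry is an assumption).
func workerTarget(r taskRun) (string, bool) {
	if r.AgentName != "" {
		return r.AgentName, true
	}
	if len(r.PreferredAgentNames) > 0 && r.PreferredAgentNames[0] != "" {
		return r.PreferredAgentNames[0], true
	}
	if len(r.AllowedAgentNames) > 0 && r.AllowedAgentNames[0] != "" {
		return r.AllowedAgentNames[0], true
	}
	return "", false
}

func main() {
	fmt.Println(workerTarget(taskRun{AgentName: "a", PreferredAgentNames: []string{"b"}}))
	fmt.Println(workerTarget(taskRun{PreferredAgentNames: []string{"b"}, AllowedAgentNames: []string{"c"}}))
	fmt.Println(workerTarget(taskRun{AllowedAgentNames: []string{"c"}}))
}
```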

Note: the external Codex round will be re-run in a healthy environment to add its independent verdict.

3 participants