fix: enable runtime evidence profiles #242
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Introduces a new task execution-profile runtime policy (default vs evidence) that is persisted in global DB, surfaced via OpenAPI/CLI, and used to relax session permissions + add guidance overlays when evidence mode is enabled.
Changes:
- Add `RuntimePolicy`/`RuntimeMode` to task execution profiles, including normalization + validation.
- Persist `runtime_mode` in global DB with migration + store read/write support.
- Propagate runtime evidence mode into session creation (permissions + prompt overlays) and update API/types/fixtures/docs accordingly.
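The normalization + validation step described above could look roughly like the sketch below. It is not the code from `internal/task/profile.go`: the type, constant, and function names are assumptions for illustration, and trimming whitespace during normalization is also an assumption. The only behavior taken from this PR is that the accepted modes are `default` and `evidence`, and that an empty value is treated as `default`.

```go
package main

import (
	"fmt"
	"strings"
)

// RuntimeMode mirrors the two modes named in the PR. The identifiers in this
// sketch are hypothetical and not copied from internal/task/profile.go.
type RuntimeMode string

const (
	RuntimeModeDefault  RuntimeMode = "default"
	RuntimeModeEvidence RuntimeMode = "evidence"
)

// NormalizeRuntimeMode maps an empty value to the default mode.
// Trimming surrounding whitespace is an assumption of this sketch.
func NormalizeRuntimeMode(raw string) RuntimeMode {
	mode := RuntimeMode(strings.TrimSpace(raw))
	if mode == "" {
		return RuntimeModeDefault
	}
	return mode
}

// ValidateRuntimeMode rejects anything other than the two known modes.
func ValidateRuntimeMode(mode RuntimeMode) error {
	switch mode {
	case RuntimeModeDefault, RuntimeModeEvidence:
		return nil
	default:
		return fmt.Errorf("invalid runtime mode %q: want %q or %q",
			mode, RuntimeModeDefault, RuntimeModeEvidence)
	}
}

func main() {
	// "turbo" is a made-up value used only to show the rejection path.
	for _, raw := range []string{"", "evidence", "turbo"} {
		mode := NormalizeRuntimeMode(raw)
		fmt.Printf("%q -> %q (err: %v)\n", raw, mode, ValidateRuntimeMode(mode))
	}
}
```

Normalizing before validating keeps the "empty means default" rule in one place, so the store and API layers only ever see one of the two known values.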
Reviewed changes
Copilot reviewed 20 out of 21 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| web/src/systems/tasks/mocks/fixtures.ts | Adds runtime to task execution profile fixtures and updates override precedence. |
| web/src/generated/agh-openapi.d.ts | Updates generated TS types to include runtime. |
| packages/site/content/runtime/core/configuration/config-toml.mdx | Adjusts coordinator configuration table formatting. |
| openapi/agh.json | Adds runtime object to execution profile schema and marks it required. |
| internal/tools/builtin/testdata/native-tool-catalog.json | Updates schema digest to reflect profile schema changes. |
| internal/tools/builtin/tasks.go | Extends execution profile JSON schema to include runtime. |
| internal/task/profile.go | Adds runtime mode/policy types, normalization, and validation. |
| internal/task/manager_profile.go | Sets default runtime mode in default execution profile. |
| internal/store/globaldb/schema_task_orchestration_profile.go | Adds runtime_mode column to schema. |
| internal/store/globaldb/migrate_task_orchestration_profile.go | Adds migration to backfill/add runtime_mode column. |
| internal/store/globaldb/global_db_test.go | Registers new migration in expected migration list. |
| internal/store/globaldb/global_db_task_profile_test.go | Extends store tests to cover runtime mode persistence. |
| internal/store/globaldb/global_db_task_profile.go | Reads/writes runtime_mode in execution profile store. |
| internal/store/globaldb/global_db_task_orchestration_schema_test.go | Includes runtime_mode in schema assertions. |
| internal/store/globaldb/global_db.go | Adds migration v41 for runtime_mode. |
| internal/session/manager_start.go | Allows session start to override permissions via CreateOpts. |
| internal/session/manager.go | Adds Permissions to session CreateOpts. |
| internal/daemon/task_runtime.go | Applies runtime evidence mode to session opts (permissions + overlay). |
| internal/daemon/task_role_runtime_test.go | Adds coverage for profile-selected worker + runtime evidence behavior. |
| internal/daemon/task_role_runtime.go | Loads execution profiles to decide worker target; adds capability-aware claim command text. |
| internal/cli/task.go | Displays runtime mode in CLI execution profile output. |
Comments suppressed due to low confidence (1)
openapi/agh.json:1
- The OpenAPI schema allows any string for `runtime.mode`, but server-side validation only accepts `default` or `evidence` (and treats empty as default after normalization). This mismatch will generate overly permissive TS types (`mode: string`) and can lead clients to send values that pass schema validation but are rejected by the API; consider adding an `enum` (and a `default`) for `runtime.mode` matching server behavior. Also consider whether `runtime` (and/or `mode`) should truly be required in the OpenAPI contract if the server can safely default it when omitted.
b3d14e1 to b3df09a
pedronauck
left a comment
Automated review — self + Codex (gpt-5.5, xhigh)
Two reviews were attempted on PR #242 (fix: enable runtime evidence profiles): an AGH self-review and an external Codex peer review (gpt-5.5, reasoning xhigh).
- Self verdict: FIX_BEFORE_SHIP (1 blocker, 3 risks, 2 nits)
- Codex verdict: unavailable — the Codex run technically completed but ignored the scoped-write contract and produced no findings artifact, so it returned no usable verdict. The findings below are from the self-review only.
The core design is solid: `runtime.mode` is opt-in, validated, persisted via an appended numbered migration (v41, prior checksums unchanged), contract regen co-ships (OpenAPI + web `agh-openapi.d.ts` + native-tool-catalog digest + fixtures), and unit coverage is good. `go build ./internal/...` is clean.
Blockers
- Dropping `--wait` from the worker claim prompt can strand sessions. In `internal/daemon/task_role_runtime.go`, `taskRoleClaimCommand` (~L558) now emits `agh task next -o json` (was `--wait -o json`), and `taskRolePromptOverlay` tells the agent to run it "once". Without `--wait`, the command does a single non-blocking claim and returns empty if nothing is immediately claimable. A freshly spawned profile-selected worker can reach the prompt before its run is claimable (boot/lease/visibility timing) and never claim its assigned run. Fix: keep `--wait` as the base arg and append `--capability` flags after it.
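A minimal sketch of the suggested fix, assuming the claim command is assembled from an argument slice. `claimCommand` is a hypothetical stand-in for `taskRoleClaimCommand`, the capability name `browser` is invented for the example, and the space-separated `--capability <name>` form is an assumption about the CLI's flag syntax.

```go
package main

import (
	"fmt"
	"strings"
)

// claimCommand is a hypothetical stand-in for taskRoleClaimCommand.
// It keeps --wait in the base arguments and appends one --capability
// flag per non-empty capability, so the claim still blocks until a run
// becomes claimable.
func claimCommand(capabilities []string) string {
	args := []string{"agh", "task", "next", "--wait", "-o", "json"}
	for _, c := range capabilities {
		c = strings.TrimSpace(c)
		if c == "" {
			continue // skip blank entries rather than emit an empty flag value
		}
		// Assumption: the flag takes its value as a separate argument.
		args = append(args, "--capability", c)
	}
	return strings.Join(args, " ")
}

func main() {
	fmt.Println(claimCommand(nil))
	fmt.Println(claimCommand([]string{"browser"}))
}
```

Because `--wait` lives in the base slice, no capability combination can produce a non-blocking claim by accident.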
Risks
- Modifying the body of already-shipped migration v38. `internal/store/globaldb/migrate_task_orchestration_profile.go`: the v38 migration body now also adds `runtime_mode`. Harmless in practice (idempotent + v41 covers it) but violates append-only migration discipline. Fix: revert the v38 body change and rely solely on v41.
- Evidence mode grants `approve-all` without requiring sandbox isolation. `internal/daemon/task_runtime.go` `applyTaskSessionRuntimeProfile` (L222-235) sets `Permissions=approve-all` whenever mode is `evidence`, independent of sandbox isolation, which means full auto-approve on the operator host. Fix: gate evidence-mode approve-all behind an isolated sandbox, or make the trust boundary explicit with a warning.
- No docs for the new `runtime.mode` field. The only `packages/site` change (`config-toml.mdx`) is whitespace cleanup. Fix: add execution-profile docs for `runtime.mode` (+ `sandbox.mode`), or file a tracked follow-up.
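One way the sandbox gate for evidence mode could be expressed is sketched below. This is not the body of `applyTaskSessionRuntimeProfile`: `sessionOpts`, `applyRuntimeProfile`, the `sandboxIsolated` flag, and the error value are all hypothetical names, and returning an error (rather than logging a warning) is just one of the two options the review lists.

```go
package main

import (
	"errors"
	"fmt"
)

// sessionOpts is a hypothetical stand-in for the session CreateOpts;
// only the Permissions field matters for this sketch.
type sessionOpts struct {
	Permissions string
}

// errSandboxRequired is an illustrative error, not one defined by the PR.
var errSandboxRequired = errors.New("evidence mode requires an isolated sandbox")

// applyRuntimeProfile grants approve-all only when the profile asks for
// evidence mode AND the session runs in an isolated sandbox. In every other
// case the caller's permissions are left untouched.
func applyRuntimeProfile(opts *sessionOpts, runtimeMode string, sandboxIsolated bool) error {
	if runtimeMode != "evidence" {
		return nil // default mode: keep the conservative path
	}
	if !sandboxIsolated {
		return errSandboxRequired // refuse auto-approve on the operator host
	}
	opts.Permissions = "approve-all"
	return nil
}

func main() {
	var host sessionOpts
	err := applyRuntimeProfile(&host, "evidence", false)
	fmt.Printf("host: err=%v permissions=%q\n", err, host.Permissions)

	var sandboxed sessionOpts
	err = applyRuntimeProfile(&sandboxed, "evidence", true)
	fmt.Printf("sandboxed: err=%v permissions=%q\n", err, sandboxed.Permissions)
}
```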
Nits
- Implicit worker-target precedence in `workerTargetForRun` (`AgentName` → `PreferredAgentNames[0]` → `AllowedAgentNames`): add a one-line comment documenting the order.
- Redundant idempotent column add across v38 and v41; it disappears once v38 is reverted to its original body.
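For the precedence nit, a commented version of the lookup might read like the sketch below. The struct and function are hypothetical stand-ins for the profile type and `workerTargetForRun`, and using the first entry of `AllowedAgentNames` as the final fallback is an assumption; the review only names the list, not how an entry is picked from it.

```go
package main

import "fmt"

// executionProfile is a hypothetical, trimmed-down stand-in that carries
// only the three fields named in the review.
type executionProfile struct {
	AgentName           string
	PreferredAgentNames []string
	AllowedAgentNames   []string
}

// workerTarget resolves the worker agent for a run.
// Precedence: AgentName, then PreferredAgentNames[0], then AllowedAgentNames.
func workerTarget(p executionProfile) (string, bool) {
	// 1. An explicitly named agent always wins.
	if p.AgentName != "" {
		return p.AgentName, true
	}
	// 2. Otherwise use the first preferred agent.
	if len(p.PreferredAgentNames) > 0 {
		return p.PreferredAgentNames[0], true
	}
	// 3. Finally fall back to the allowed list (first entry: an assumption).
	if len(p.AllowedAgentNames) > 0 {
		return p.AllowedAgentNames[0], true
	}
	return "", false
}

func main() {
	// The agent names are invented for the example.
	name, ok := workerTarget(executionProfile{
		PreferredAgentNames: []string{"reviewer"},
		AllowedAgentNames:   []string{"builder"},
	})
	fmt.Println(name, ok)
}
```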
Note: the external Codex round will be re-run in a healthy environment to add its independent verdict.
What this changes
This PR introduces an explicit runtime evidence mode for task execution profiles. Profiles can now ask workers to run with broader local runtime permissions and prompt guidance when the task needs browser/simulator/app evidence.
Why
AGH already lets agents execute code, but frontend/mobile work needs a deliberate mode for installing dependencies, booting apps, driving browsers or simulators, and collecting evidence. Keeping this behind an execution-profile policy gives us that capability without changing the default conservative task path.
Implementation notes
- Adds `runtime.mode` to task execution profiles with `default` and `evidence` modes.
- Persists `runtime_mode` in GlobalDB with migration and schema coverage.
Reviewer focus
Validation
PATH="$HOME/.bun/bin:$PATH" make verify