fix: enable runtime evidence profiles by richardfogaca · Pull Request #242 · compozy/agh

richardfogaca · 2026-05-30T01:11:24Z

What this changes

This PR introduces an explicit runtime evidence mode for task execution profiles. Profiles can now ask workers to run with broader local runtime permissions and prompt guidance when the task needs browser/simulator/app evidence.

Why

AGH already lets agents execute code, but frontend/mobile work needs a deliberate mode for installing dependencies, booting apps, driving browsers or simulators, and collecting evidence. Keeping this behind an execution-profile policy gives us that capability without changing the default conservative task path.

Implementation notes

Adds runtime.mode to task execution profiles with default and evidence modes.
Persists runtime_mode in GlobalDB with migration and schema coverage.
Applies runtime evidence mode during session creation by overriding permissions and adding explicit runtime/browser/simulator prompt guidance.
Loads profile-selected worker targets so provider/model/capability routing applies to spawned task sessions.
Surfaces runtime mode in CLI/profile output and generated OpenAPI/types/fixtures.
Regenerates API/tool metadata and includes the existing MDX formatting cleanup required by repo verification.
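
As a rough illustration of the first note, the sketch below shows one way a `runtime.mode` value could be normalized and validated. The type and function names (`RuntimeMode`, `NormalizeRuntimeMode`, `Validate`) are hypothetical and are not taken from `internal/task/profile.go`; trimming whitespace is also an assumption. Only the two mode values and the "empty means default" rule come from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// RuntimeMode is a hypothetical name for the value stored in runtime.mode.
type RuntimeMode string

const (
	RuntimeModeDefault  RuntimeMode = "default"
	RuntimeModeEvidence RuntimeMode = "evidence"
)

// NormalizeRuntimeMode trims surrounding whitespace (an assumption) and maps
// an empty value to the default mode, so omitted fields keep existing behavior.
func NormalizeRuntimeMode(raw string) RuntimeMode {
	mode := RuntimeMode(strings.TrimSpace(raw))
	if mode == "" {
		return RuntimeModeDefault
	}
	return mode
}

// Validate accepts only the two known modes and rejects everything else.
func (m RuntimeMode) Validate() error {
	switch m {
	case RuntimeModeDefault, RuntimeModeEvidence:
		return nil
	default:
		return fmt.Errorf("invalid runtime mode %q: want %q or %q",
			string(m), string(RuntimeModeDefault), string(RuntimeModeEvidence))
	}
}

func main() {
	for _, raw := range []string{"", "evidence", "turbo"} {
		mode := NormalizeRuntimeMode(raw)
		fmt.Printf("%q -> %q (err: %v)\n", raw, string(mode), mode.Validate())
	}
}
```

Normalizing before validating keeps the stored value canonical, so the database column and the API response never carry an empty mode.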

Reviewer focus

Runtime evidence is opt-in per execution profile; default mode should preserve existing behavior.
Permission override should only apply to runtime evidence sessions.
Schema/default handling should match server-side validation.
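
The first two focus points can be pictured with a small sketch: the override touches session options only when the mode is `evidence`, and leaves them untouched otherwise. `CreateOpts`, `PromptOverlays`, `applyRuntimeMode`, and the guidance text are hypothetical stand-ins, not the actual types in `internal/session` or `internal/daemon`. The `approve-all` value is the one named in the review of this PR.

```go
package main

import "fmt"

// CreateOpts is a hypothetical stand-in for the session creation options.
type CreateOpts struct {
	Permissions    string   // empty means "keep the session manager's default"
	PromptOverlays []string // extra guidance appended to the session prompt
}

// evidenceGuidance is illustrative wording, not the text used by the PR.
const evidenceGuidance = "Runtime evidence mode: install dependencies, boot the app, " +
	"and drive a browser or simulator to collect evidence."

// applyRuntimeMode returns adjusted options. Any mode other than "evidence"
// returns the input unchanged, which preserves the default task path.
func applyRuntimeMode(opts CreateOpts, mode string) CreateOpts {
	if mode != "evidence" {
		return opts
	}
	out := opts
	out.Permissions = "approve-all"
	// Copy before appending so the caller's slice is never mutated.
	out.PromptOverlays = append(append([]string(nil), opts.PromptOverlays...), evidenceGuidance)
	return out
}

func main() {
	base := CreateOpts{PromptOverlays: []string{"role: worker"}}
	fmt.Printf("default:  %+v\n", applyRuntimeMode(base, "default"))
	fmt.Printf("evidence: %+v\n", applyRuntimeMode(base, "evidence"))
}
```

Returning a copy rather than mutating in place makes it easy to assert in tests that default-mode sessions are byte-for-byte unchanged.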

Validation

PATH="$HOME/.bun/bin:$PATH" make verify

vercel · 2026-05-30T01:11:28Z

@richardfogaca is attempting to deploy a commit to the Compozy Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai · 2026-05-30T01:11:31Z

Warning

Review limit reached

@richardfogaca, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 2 minutes and 36 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d00c62e8-3bd5-4de2-8c05-2fbf56bdd0f7

📥 Commits

Reviewing files that changed from the base of the PR and between cd4b155 and b3df09a.

⛔ Files ignored due to path filters (4)

internal/tools/builtin/testdata/native-tool-catalog.json is excluded by !**/*.json
openapi/agh.json is excluded by !**/*.json
packages/site/content/runtime/core/configuration/config-toml.mdx is excluded by !**/*.mdx
web/src/generated/agh-openapi.d.ts is excluded by !**/generated/**

📒 Files selected for processing (19)

internal/api/spec/spec.go
internal/api/spec/spec_test.go
internal/cli/task.go
internal/daemon/task_role_runtime.go
internal/daemon/task_role_runtime_test.go
internal/daemon/task_runtime.go
internal/session/manager.go
internal/session/manager_start.go
internal/store/globaldb/global_db.go
internal/store/globaldb/global_db_task_orchestration_schema_test.go
internal/store/globaldb/global_db_task_profile.go
internal/store/globaldb/global_db_task_profile_test.go
internal/store/globaldb/global_db_test.go
internal/store/globaldb/migrate_task_orchestration_profile.go
internal/store/globaldb/schema_task_orchestration_profile.go
internal/task/manager_profile.go
internal/task/profile.go
internal/tools/builtin/tasks.go
web/src/systems/tasks/mocks/fixtures.ts


Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Introduces a new task execution-profile runtime policy (default vs evidence) that is persisted in global DB, surfaced via OpenAPI/CLI, and used to relax session permissions + add guidance overlays when evidence mode is enabled.

Changes:

Add RuntimePolicy / RuntimeMode to task execution profiles, including normalization + validation.
Persist runtime_mode in global DB with migration + store read/write support.
Propagate runtime evidence mode into session creation (permissions + prompt overlays) and update API/types/fixtures/docs accordingly.

Reviewed changes

Copilot reviewed 20 out of 21 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| web/src/systems/tasks/mocks/fixtures.ts | Adds `runtime` to task execution profile fixtures and updates override precedence. |
| web/src/generated/agh-openapi.d.ts | Updates generated TS types to include `runtime`. |
| packages/site/content/runtime/core/configuration/config-toml.mdx | Adjusts coordinator configuration table formatting. |
| openapi/agh.json | Adds `runtime` object to execution profile schema and marks it required. |
| internal/tools/builtin/testdata/native-tool-catalog.json | Updates schema digest to reflect profile schema changes. |
| internal/tools/builtin/tasks.go | Extends execution profile JSON schema to include `runtime`. |
| internal/task/profile.go | Adds runtime mode/policy types, normalization, and validation. |
| internal/task/manager_profile.go | Sets default runtime mode in default execution profile. |
| internal/store/globaldb/schema_task_orchestration_profile.go | Adds `runtime_mode` column to schema. |
| internal/store/globaldb/migrate_task_orchestration_profile.go | Adds migration to backfill/add `runtime_mode` column. |
| internal/store/globaldb/global_db_test.go | Registers new migration in expected migration list. |
| internal/store/globaldb/global_db_task_profile_test.go | Extends store tests to cover runtime mode persistence. |
| internal/store/globaldb/global_db_task_profile.go | Reads/writes `runtime_mode` in execution profile store. |
| internal/store/globaldb/global_db_task_orchestration_schema_test.go | Includes `runtime_mode` in schema assertions. |
| internal/store/globaldb/global_db.go | Adds migration v41 for `runtime_mode`. |
| internal/session/manager_start.go | Allows session start to override permissions via CreateOpts. |
| internal/session/manager.go | Adds `Permissions` to session CreateOpts. |
| internal/daemon/task_runtime.go | Applies runtime evidence mode to session opts (permissions + overlay). |
| internal/daemon/task_role_runtime_test.go | Adds coverage for profile-selected worker + runtime evidence behavior. |
| internal/daemon/task_role_runtime.go | Loads execution profiles to decide worker target; adds capability-aware claim command text. |
| internal/cli/task.go | Displays runtime mode in CLI execution profile output. |

Comments suppressed due to low confidence (1)

openapi/agh.json:1

The OpenAPI schema allows any string for runtime.mode, but server-side validation only accepts default or evidence (and treats empty as default after normalization). This mismatch will generate overly-permissive TS types (mode: string) and can lead clients to send values that pass schema validation but are rejected by the API; consider adding an enum (and a default) for runtime.mode matching server behavior. Also consider whether runtime (and/or mode) should truly be required in the OpenAPI contract if the server can safely default it when omitted.


pedronauck

Automated review — self + Codex (gpt-5.5, xhigh)

Two reviews were attempted on PR #242 (fix: enable runtime evidence profiles): an AGH self-review and an external Codex peer review (gpt-5.5, reasoning xhigh).

Self verdict: FIX_BEFORE_SHIP (1 blocker, 3 risks, 2 nits)
Codex verdict: unavailable — the Codex run technically completed but ignored the scoped-write contract and produced no findings artifact, so it returned no usable verdict. The findings below are from the self-review only.

The core design is solid: runtime.mode is opt-in, validated, persisted via an appended numbered migration (v41, prior checksums unchanged), contract regen co-ships (OpenAPI + web agh-openapi.d.ts + native-tool-catalog digest + fixtures), and unit coverage is good. go build ./internal/... is clean.

Blockers

Dropping --wait from the worker claim prompt can strand sessions. internal/daemon/task_role_runtime.go taskRoleClaimCommand (~L558) now emits agh task next -o json (was --wait -o json), and taskRolePromptOverlay tells the agent to run it "once". Without --wait, the command does a single non-blocking claim and returns empty if nothing is immediately claimable. A freshly spawned profile-selected worker can reach the prompt before its run is claimable (boot/lease/visibility timing) and never claim its assigned run. Fix: keep --wait as the base arg and append --capability flags after it.
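
A minimal sketch of the suggested fix, assuming capabilities are passed one per `--capability` flag (the exact flag shape is not shown in this thread). `claimCommand` is a hypothetical helper, not the real `taskRoleClaimCommand`, and it does no shell quoting.

```go
package main

import (
	"fmt"
	"strings"
)

// claimCommand builds the worker claim command text. "--wait" stays in the
// base arguments so the claim blocks until a run is claimable; capability
// flags are appended after it. One flag per capability is an assumption.
func claimCommand(capabilities []string) string {
	args := []string{"agh", "task", "next", "--wait", "-o", "json"}
	for _, c := range capabilities {
		c = strings.TrimSpace(c)
		if c == "" {
			continue // skip blank entries instead of emitting an empty flag value
		}
		args = append(args, "--capability", c)
	}
	return strings.Join(args, " ")
}

func main() {
	fmt.Println(claimCommand(nil))
	fmt.Println(claimCommand([]string{"browser", "simulator"}))
}
```

Keeping `--wait` in the base slice means no caller can build a capability-aware command that silently drops the blocking behavior.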

Risks

Modifying the body of already-shipped migration v38. internal/store/globaldb/migrate_task_orchestration_profile.go: the v38 migration body now also adds runtime_mode. Harmless in practice (idempotent + v41 covers it) but violates append-only migration discipline. Fix: revert the v38 body change and rely solely on v41.
Evidence mode grants approve-all without requiring sandbox isolation. internal/daemon/task_runtime.go applyTaskSessionRuntimeProfile (L222-235) sets Permissions=approve-all whenever mode is evidence, independent of sandbox isolation — full auto-approve on the operator host. Fix: gate evidence-mode approve-all behind an isolated sandbox, or make the trust boundary explicit with a warning.
No docs for the new runtime.mode field. The only packages/site change (config-toml.mdx) is whitespace cleanup. Fix: add execution-profile docs for runtime.mode (+ sandbox.mode), or file a tracked follow-up.
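
The append-only discipline behind the first risk can be illustrated with a generic check: once a migration has shipped, its recorded checksum must keep matching its body, and new behavior arrives only as a new version. This is a sketch under assumptions; it is not how GlobalDB tracks migrations, and the `Migration` type, the SHA-256 checksum, the table name, and the SQL text are all hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Migration is a hypothetical record: a version number and its SQL body.
type Migration struct {
	Version int
	Body    string
}

// checksum returns the hex-encoded SHA-256 digest of a migration body.
func checksum(body string) string {
	sum := sha256.Sum256([]byte(body))
	return hex.EncodeToString(sum[:])
}

// verifyAppendOnly returns an error if any migration that was already applied
// (version -> recorded checksum) no longer matches its current body.
// Versions missing from applied are new and are always allowed.
func verifyAppendOnly(applied map[int]string, current []Migration) error {
	for _, m := range current {
		want, ok := applied[m.Version]
		if !ok {
			continue
		}
		if got := checksum(m.Body); got != want {
			return fmt.Errorf("migration v%d was modified after shipping", m.Version)
		}
	}
	return nil
}

func main() {
	v38 := "CREATE TABLE profiles (id TEXT PRIMARY KEY)"
	applied := map[int]string{38: checksum(v38)}

	appended := []Migration{
		{Version: 38, Body: v38},
		{Version: 41, Body: "ALTER TABLE profiles ADD COLUMN runtime_mode TEXT NOT NULL DEFAULT 'default'"},
	}
	fmt.Println("append v41:", verifyAppendOnly(applied, appended))

	edited := []Migration{{Version: 38, Body: v38 + "; ALTER TABLE profiles ADD COLUMN runtime_mode TEXT"}}
	fmt.Println("edit v38:", verifyAppendOnly(applied, edited))
}
```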

Nits

Implicit worker-target precedence in workerTargetForRun (AgentName → PreferredAgentNames[0] → AllowedAgentNames) — add a one-line comment documenting the order.
Redundant idempotent column add across v38 and v41 — disappears once v38 is reverted to its original body.
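
For the first nit, the precedence could be documented directly in the resolver, as in the sketch below. `WorkerPolicy` and `workerTarget` are hypothetical; the real `workerTargetForRun` may read these fields from a different struct. Picking the first entry of `AllowedAgentNames` is an assumption, since the review only names the field.

```go
package main

import "fmt"

// WorkerPolicy is a hypothetical holder for the three fields named in the review.
type WorkerPolicy struct {
	AgentName           string
	PreferredAgentNames []string
	AllowedAgentNames   []string
}

// workerTarget resolves the agent name in precedence order:
//  1. explicit AgentName
//  2. first entry of PreferredAgentNames
//  3. first entry of AllowedAgentNames (assumption)
//
// It returns false when none of the three is set.
func workerTarget(p WorkerPolicy) (string, bool) {
	if p.AgentName != "" {
		return p.AgentName, true
	}
	if len(p.PreferredAgentNames) > 0 {
		return p.PreferredAgentNames[0], true
	}
	if len(p.AllowedAgentNames) > 0 {
		return p.AllowedAgentNames[0], true
	}
	return "", false
}

func main() {
	name, ok := workerTarget(WorkerPolicy{
		PreferredAgentNames: []string{"frontend-worker"},
		AllowedAgentNames:   []string{"generic-worker"},
	})
	fmt.Println(name, ok)
}
```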

Note: the external Codex round will be re-run in a healthy environment to add its independent verdict.

Copilot AI review requested due to automatic review settings · May 30, 2026 01:11

Copilot AI reviewed · May 30, 2026

Comment thread internal/tools/builtin/tasks.go

Comment thread internal/store/globaldb/migrate_task_orchestration_profile.go (outdated)

Comment thread internal/store/globaldb/migrate_task_orchestration_profile.go

Comment thread internal/daemon/task_role_runtime.go

Commit b3df09a: fix: enable runtime evidence profiles

richardfogaca force-pushed the fix/profile-runtime-evidence branch from b3d14e1 to b3df09a · May 30, 2026 01:38

coderabbitai Bot approved these changes · May 30, 2026

pedronauck requested changes · May 31, 2026

richardfogaca wants to merge 1 commit into compozy:main from richardfogaca:fix/profile-runtime-evidence
