Enforce per-type safe-output max count at MCP invocation time (return actionable error), not only downstream #40311

@dsyme

Summary

The per-type max operation count (e.g. safe-outputs.add-comment.max: 10) is not enforced inside the safe-outputs MCP server during tool invocation. Every add_comment / add_labels / etc. call returns success regardless of how many of that type the agent has already emitted in the run. The cap is only applied after the agent has finished, in two downstream places:

  • actions/setup/js/collect_ndjson_output.cjs: drops over-limit items with a non-fatal core.warning (Too many items of type 'X'. Maximum allowed: N.).
  • actions/setup/js/safe_output_processor.cjs: silently truncates the list with slice(0, maxCount) (Too many items (N), limiting to M).

Because the agent gets no in-loop feedback, it can only self-ration based on the static tool description text (MCE2). When a run legitimately needs more operations than the cap allows, the agent silently rations, often inconsistently, and the excess work is dropped without anyone noticing.

This appears to contradict the project's own spec — MCE4 (Dual Enforcement) in docs/src/content/docs/specs/safe-outputs-specification.md — which requires count constraints to be enforced at both invocation time and processing time.

Real-world impact (observed)

In a downstream factory workflow run, the agent decided 9 PRs passed a quality gate and:

  • emitted 9 add_labels (a "success" label) — all 9 landed, because add_labels has its own separate max: 10 budget; and
  • intended to post 9 explanatory comments plus 6 @copilot CI-fix delegations, 15 add_comment calls in total, but add_comment.max: 10 meant only 10 comments were emitted (the 4 explanatory comments it happened to author first, plus the 6 delegations).

Result: 5 PRs received the success label but never received the corresponding comment. The agent's own final summary even claimed it had commented on all 9 — a bookkeeping/hallucination failure that an in-loop limit error would have prevented. Because labels and comments have independent caps and neither call ever errored, the agent produced inconsistent, silently-truncated state.

This is not specific to that workflow — any workflow whose agent emits more than max items of a single safe-output type in one run hits the same silent truncation.
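
The arithmetic of the observed run can be replayed with a small sketch. It treats each cap as a budget over the operations the agent intended, in the order described above. The function name applyBudget, the item shape, and the assumption that the 5 missing explanatory comments would have come last are all illustrative; this is not code from the safe-outputs server.

```javascript
// Hypothetical replay of the observed run: labels and comments have
// independent budgets, and nothing signals an error while the agent works.
const MAX = { add_labels: 10, add_comment: 10 };

// Intended operations, in the order reported: 9 labels, 4 explanatory
// comments, 6 delegations, then (assumed) the remaining 5 explanatory comments.
const intended = [];
for (let pr = 1; pr <= 9; pr++) intended.push({ type: "add_labels", pr });
for (let pr = 1; pr <= 4; pr++) intended.push({ type: "add_comment", kind: "explanation", pr });
for (let i = 1; i <= 6; i++) intended.push({ type: "add_comment", kind: "delegation" });
for (let pr = 5; pr <= 9; pr++) intended.push({ type: "add_comment", kind: "explanation", pr });

// Keep each operation only while its own type is still under budget.
function applyBudget(ops, maxByType) {
  const used = {};
  return ops.filter(op => {
    used[op.type] = (used[op.type] ?? 0) + 1;
    return used[op.type] <= maxByType[op.type];
  });
}

const landed = applyBudget(intended, MAX);
const labelled = landed.filter(op => op.type === "add_labels").map(op => op.pr);
const explained = new Set(landed.filter(op => op.kind === "explanation").map(op => op.pr));
const labelledWithoutComment = labelled.filter(pr => !explained.has(pr));

console.log(labelledWithoutComment); // [ 5, 6, 7, 8, 9 ]
```

Whether the loss comes from the agent rationing itself or from downstream truncation, the outcome is the same: five PRs carry the label with no companion comment, and no call ever reported a problem.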

Why the MCP layer is the right place to fix it

The add_comment handler in actions/setup/js/safe_outputs_handlers.cjs already enforces several constraints at invocation time and explicitly cites the early-validation requirements:

/**
 * Handler for add_comment tool
 * Per Safe Outputs Specification MCE1: Enforces constraints during tool invocation
 * to provide immediate feedback to the LLM before recording to NDJSON
 */
const addCommentHandler = args => {
  // ...
  enforceCommentLimits(body); // body length / mentions / links → immediate -32602 error
  // ... discussion guard, target-context guard, wildcard-target guard, intent validation
  appendSafeOutput(entry); // <-- appended unconditionally; no per-type count check
};

So content-level entity limits (mentions, links, length) already produce immediate, actionable -32602 errors — but the operation-count limit (the per-type max) is not among them, even though MCE1's "Constraint Categories" explicitly include "Entity Limits: Maximum counts for … other entities" and MCE4 requires dual enforcement:

MCE4: Dual Enforcement — Constraints MUST be enforced at both invocation time (MCP server) and processing time (safe output processor) to provide defense-in-depth.

The conformance section likewise specifies that exceeding max should be rejected with an E002 (LIMIT_EXCEEDED) error, but today that rejection only happens post-hoc.

There's already precedent for a session-scoped counter in the same file (inlineReviewCommentCount for buffered inline review comments), so the mechanism fits cleanly.

Proposed change

In the safe-outputs MCP server (safe_outputs_handlers.cjs / safe_outputs_mcp_server_http.cjs), track a session-scoped per-type counter of appended items and, when a handler is invoked after its configured max is already reached, return an actionable MCP error instead of appending — consistent with MCE3:

{
  "error": {
    "code": -32602,
    "message": "E002: add_comment limit reached — 10 of 10 comments already used this run",
    "data": {
      "constraint": "max",
      "type": "add_comment",
      "limit": 10,
      "guidance": "You have used all 10 add_comment operations for this run. Further add_comment calls will be ignored. Prioritize the most important comments (e.g. consolidate multiple updates into one), or call noop. Note: other safe-output types (e.g. add_labels) have independent budgets, so applying a label without its companion comment can leave inconsistent state."
    }
  }
}

This lets the agent react in-loop: reprioritize, consolidate comments, stop applying labels it can't accompany with a comment, or noop — instead of silently rationing off the static description and discovering (or not) the truncation only afterward.
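
A minimal sketch of the invocation-time half, assuming a hard rejection. The names createCountGuard and guardedAppend, the in-memory appended array, and the use of a thrown Error carrying code and data are assumptions made for illustration; how the real server turns handler failures into MCP error responses is not shown here.

```javascript
// Hypothetical sketch: a session-scoped per-type counter checked before
// appending. None of these names exist in the safe-outputs code.
function createCountGuard(maxByType) {
  const counts = new Map(); // type -> items appended so far in this session

  return function guardedAppend(type, entry, append) {
    const limit = maxByType[type];
    const used = counts.get(type) ?? 0;
    if (limit !== undefined && used >= limit) {
      // Hard rejection: nothing is appended; the caller gets an actionable error.
      const error = new Error(`E002: ${type} limit reached: ${used} of ${limit} already used this run`);
      error.code = -32602;
      error.data = { constraint: "max", type, limit };
      throw error;
    }
    append(entry);
    counts.set(type, used + 1); // count only entries that were actually appended
  };
}

// Usage: the 11th add_comment is rejected, while add_labels keeps its own budget.
const appended = [];
const guardedAppend = createCountGuard({ add_comment: 10, add_labels: 10 });
for (let i = 1; i <= 10; i++) {
  guardedAppend("add_comment", { type: "add_comment", body: `comment ${i}` }, e => appended.push(e));
}
let rejection = null;
try {
  guardedAppend("add_comment", { type: "add_comment", body: "comment 11" }, e => appended.push(e));
} catch (err) {
  rejection = err;
}
guardedAppend("add_labels", { type: "add_labels", labels: ["ok"] }, e => appended.push(e));

console.log(rejection.code, appended.length); // -32602 11
```

The counter is incremented only after a successful append, so a call rejected by one of the existing content-level guards would not consume budget.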

Notes / open questions

  • Keep the existing downstream enforcement (collect_ndjson_output.cjs, safe_output_processor.cjs) as the defense-in-depth processor layer per MCE4; this issue is specifically about adding the invocation-time half.
  • Per MCE5 (Constraint Configuration Consistency), the limit used by the handler must come from the same max config the processor uses, so the two never diverge.
  • Worth deciding whether enforcement is hard (reject the (max+1)th call outright) or soft (warn once, then reject). The spec language ("rejected with E002") points to hard rejection.
  • A secondary, related observation: per-type independent budgets mean an agent can apply a label whose companion comment was dropped. Surfacing the limit in-loop largely mitigates this, but the docs could also note the inter-type inconsistency risk.
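
One way to keep the two layers consistent, per the MCE5 note above, is to resolve the limit in a single place that both layers call. The sketch below is only an illustration: resolveMax, the config object shape, and the underscore-to-hyphen key mapping (add_comment to add-comment) are assumptions, not the project's actual config loader.

```javascript
// Hypothetical helper: one function resolves the per-type max, and both the
// invocation-time check and the downstream truncation use it.
function resolveMax(safeOutputsConfig, toolType) {
  const key = toolType.replace(/_/g, "-"); // assumed mapping: add_comment -> add-comment
  const max = safeOutputsConfig?.[key]?.max;
  // undefined means "no cap configured" in this sketch
  return Number.isInteger(max) && max >= 0 ? max : undefined;
}

const config = { "add-comment": { max: 10 }, "add-labels": { max: 10 } };

// Invocation-time view: may this call still be accepted?
const canAppend = (type, usedSoFar) => {
  const max = resolveMax(config, type);
  return max === undefined || usedSoFar < max;
};

// Processing-time view: defense-in-depth truncation with the same limit.
const truncate = (type, items) => {
  const max = resolveMax(config, type);
  return max === undefined ? items : items.slice(0, max);
};

console.log(canAppend("add_comment", 10), truncate("add_comment", new Array(12).fill("x")).length); // false 10
```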

Environment

  • Observed against the safe-outputs MCP server / handlers in actions/setup/js/ (safe_outputs_handlers.cjs, safe_outputs_append.cjs, safe_outputs_mcp_server_http.cjs).
  • Spec reference: docs/src/content/docs/specs/safe-outputs-specification.md §8.3 (MCE1–MCE5) and the max-limit conformance requirement (E002 LIMIT_EXCEEDED).

Labels

bug, mcp
