Enforce per-type safe-output `max` count at MCP invocation time (return actionable error), not only downstream

## Summary

The per-type **`max` operation count** (e.g. `safe-outputs.add-comment.max: 10`) is **not enforced inside the safe-outputs MCP server during tool invocation**. Every `add_comment` / `add_labels` / etc. call returns `success` regardless of how many of that type the agent has already emitted in the run. The cap is only applied *after the agent has finished*, in two downstream places:

- `actions/setup/js/collect_ndjson_output.cjs`: drops over-limit items with a non-fatal `core.warning` (`Too many items of type 'X'. Maximum allowed: N.`).
- `actions/setup/js/safe_output_processor.cjs`: truncates the list with `slice(0, maxCount)` and reports only `Too many items (N), limiting to M`.

Because the agent gets **no in-loop feedback**, it can only self-ration based on the static tool description text (MCE2). When a run legitimately needs more operations than the cap allows, the agent silently rations, often inconsistently, and the excess work is dropped without anyone noticing.

This appears to contradict the project's own spec — **MCE4 (Dual Enforcement)** in `docs/src/content/docs/specs/safe-outputs-specification.md` — which requires count constraints to be enforced at *both* invocation time and processing time.

## Real-world impact (observed)

In a downstream factory workflow run, the agent decided 9 PRs passed a quality gate and:

- emitted **9 `add_labels`** (a "success" label) — all 9 landed, because `add_labels` has its **own separate** `max: 10` budget; and
- intended to post **9 explanatory comments + 6 `@copilot` CI-fix delegations = 15 `add_comment` calls**, but `add_comment.max: 10` meant only **10** comments were emitted (the 4 explanatory comments it happened to author first, plus the 6 delegations).

Result: **5 PRs received the success label but never received the corresponding comment.** The agent's own final summary even claimed it had commented on all 9 — a bookkeeping/hallucination failure that an in-loop limit error would have prevented. Because labels and comments have independent caps and neither call ever errored, the agent produced inconsistent, silently-truncated state.

This is not specific to that workflow — any workflow whose agent emits more than `max` items of a single safe-output type in one run hits the same silent truncation.
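
The arithmetic behind the observed outcome can be reproduced with a small simulation. This is a minimal sketch, not code from the repository: `truncatePerType`, the item shape, and the emission order (4 explanations, then 6 delegations, then the remaining 5 explanations) are assumptions chosen to match the run described above. The result is the same whether the cut comes from the agent rationing itself or from downstream truncation.

```javascript
// Hypothetical simulation of the run described above; all names are illustrative.
const MAX = { add_labels: 10, add_comment: 10 };

// Keep only the first `max[type]` items of each type (independent budgets).
function truncatePerType(items, max) {
  const seen = {};
  return items.filter(item => {
    seen[item.type] = (seen[item.type] || 0) + 1;
    return seen[item.type] <= max[item.type];
  });
}

const prs = [1, 2, 3, 4, 5, 6, 7, 8, 9];

// What the agent intended to emit: 9 labels and 15 comments.
const intended = [
  ...prs.map(pr => ({ type: "add_labels", pr })),
  ...prs.slice(0, 4).map(pr => ({ type: "add_comment", kind: "explanation", pr })),
  ...Array.from({ length: 6 }, () => ({ type: "add_comment", kind: "delegation" })),
  ...prs.slice(4).map(pr => ({ type: "add_comment", kind: "explanation", pr })),
];

const kept = truncatePerType(intended, MAX);
const labeled = kept.filter(i => i.type === "add_labels").map(i => i.pr);
const explained = new Set(kept.filter(i => i.kind === "explanation").map(i => i.pr));

// PRs that carry the success label but never got their explanatory comment.
const labeledWithoutComment = labeled.filter(pr => !explained.has(pr));
console.log(labeledWithoutComment.length); // 5
```

Because the label budget (9 of 10 used) never runs out while the comment budget does, the two types diverge without any call reporting a failure.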

## Why the MCP layer is the right place to fix it

The `add_comment` handler in `actions/setup/js/safe_outputs_handlers.cjs` already enforces several constraints at invocation time and explicitly cites the early-validation requirements:

```js
/**
 * Handler for add_comment tool
 * Per Safe Outputs Specification MCE1: Enforces constraints during tool invocation
 * to provide immediate feedback to the LLM before recording to NDJSON
 */
const addCommentHandler = args => {
  // ...
  enforceCommentLimits(body); // body length / mentions / links → immediate -32602 error
  // ... discussion guard, target-context guard, wildcard-target guard, intent validation
  appendSafeOutput(entry); // <-- appended unconditionally; no per-type count check
};
```

So content-level **entity limits** (mentions, links, length) already produce immediate, actionable `-32602` errors — but the **operation-count limit** (the per-type `max`) is not among them, even though MCE1's "Constraint Categories" explicitly include *"Entity Limits: Maximum counts for … other entities"* and MCE4 requires dual enforcement:

> **MCE4: Dual Enforcement** — Constraints MUST be enforced at both invocation time (MCP server) and processing time (safe output processor) to provide defense-in-depth.

The conformance section likewise specifies that exceeding `max` should be **rejected with an `E002 (LIMIT_EXCEEDED)` error**, but today that rejection only happens post-hoc.

There's already precedent for a session-scoped counter in the same file (`inlineReviewCommentCount` for buffered inline review comments), so the mechanism fits cleanly.

## Proposed change

In the safe-outputs MCP server (`safe_outputs_handlers.cjs` / `safe_outputs_mcp_server_http.cjs`), track a **session-scoped per-type counter** of appended items and, when a handler is invoked after its configured `max` is already reached, return an **actionable MCP error** instead of appending — consistent with MCE3:

```json
{
  "error": {
    "code": -32602,
    "message": "E002: add_comment limit reached — 10 of 10 comments already used this run",
    "data": {
      "constraint": "max",
      "type": "add_comment",
      "limit": 10,
      "guidance": "You have used all 10 add_comment operations for this run. Further add_comment calls will be rejected. Prioritize the most important comments (e.g. consolidate multiple updates into one), or call noop. Note: other safe-output types (e.g. add_labels) have independent budgets, so applying a label without its companion comment can leave inconsistent state."
    }
  }
}
```

This lets the agent **react in-loop**: reprioritize, consolidate comments, stop applying labels it can't accompany with a comment, or `noop` — instead of silently rationing off the static description and discovering (or not) the truncation only afterward.
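
One possible shape for the session-scoped counter is sketched below. It is an illustration under stated assumptions, not the existing implementation: `createCountLimiter`, the `limits` object, the `used` field in `data`, and the stand-in `addComment` handler are all hypothetical, and the real handlers may surface MCP errors through a different mechanism than a thrown `Error`.

```javascript
// Sketch only: createCountLimiter and the error shape are assumptions for illustration.
function createCountLimiter(limits) {
  const counts = new Map(); // one limiter instance per MCP server session

  return {
    // Reserve one slot for `type`, or throw a JSON-RPC style error if none is left.
    reserve(type) {
      const limit = limits[type];
      const used = counts.get(type) || 0;
      if (typeof limit === "number" && used >= limit) {
        const err = new Error(`E002: ${type} limit reached (${used} of ${limit} already used this run)`);
        err.code = -32602;
        err.data = { constraint: "max", type, limit, used };
        throw err;
      }
      counts.set(type, used + 1);
    },
    // Number of slots consumed so far for `type`.
    used: type => counts.get(type) || 0,
  };
}

// Usage: check the budget before appending, so a rejected call records nothing.
const limiter = createCountLimiter({ add_comment: 2 });
const recorded = []; // stand-in for the NDJSON output file

const addComment = body => {
  limiter.reserve("add_comment"); // throws once the cap is reached
  recorded.push({ type: "add_comment", body });
  return { result: "success" };
};

addComment("first");
addComment("second");

let rejection = null;
try {
  addComment("third"); // (max + 1)th call
} catch (err) {
  rejection = err;
}
console.log(rejection.code, rejection.data.limit, recorded.length);
```

In this sketch the slot is reserved before the append, so a failure during the append would still consume budget; an implementation could instead increment only after a successful append.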

### Notes / open questions

- Keep the existing downstream enforcement (`collect_ndjson_output.cjs`, `safe_output_processor.cjs`) as the defense-in-depth processor layer per MCE4; this issue is specifically about adding the **invocation-time** half.
- Per MCE5 (Constraint Configuration Consistency), the limit used by the handler must come from the same `max` config the processor uses, so the two never diverge.
- Open question: should the limit be a hard rejection (the (max+1)th call fails immediately) or a one-time warning followed by rejection? The spec language ("rejected with E002") points to hard rejection.
- A secondary, related observation: per-type independent budgets mean an agent can apply a label whose companion comment was dropped. Surfacing the limit in-loop largely mitigates this, but the docs could also note the inter-type inconsistency risk.
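
The MCE5 point above (both layers reading the same `max`) could be met by routing both through one resolver. The following is a rough sketch under assumptions: `resolveMax`, `invocationAllows`, and `processorKeep` are hypothetical names, the underscore-to-hyphen key mapping is inferred from `safe-outputs.add-comment.max` versus the `add_comment` tool name, and treating a missing `max` as "no cap" is a placeholder rather than the project's actual default.

```javascript
// Hypothetical helper: a single place that turns the workflow config into a numeric cap.
function resolveMax(safeOutputsConfig, toolName) {
  const key = toolName.replace(/_/g, "-"); // assumed mapping: add_comment -> add-comment
  const entry = safeOutputsConfig[key];
  const max = entry && entry.max;
  // undefined means "no cap configured" in this sketch (an assumption).
  return Number.isInteger(max) && max >= 0 ? max : undefined;
}

// Invocation-time half: hard rejection once `usedSoFar` has reached the cap.
function invocationAllows(config, toolName, usedSoFar) {
  const max = resolveMax(config, toolName);
  return max === undefined || usedSoFar < max;
}

// Processing-time half: defense-in-depth truncation using the same cap.
function processorKeep(config, toolName, items) {
  const max = resolveMax(config, toolName);
  return max === undefined ? items : items.slice(0, max);
}

const config = { "add-comment": { max: 10 }, "add-labels": { max: 10 } };

console.log(resolveMax(config, "add_comment")); // 10
console.log(invocationAllows(config, "add_comment", 9)); // true
console.log(invocationAllows(config, "add_comment", 10)); // false
console.log(processorKeep(config, "add_comment", new Array(15).fill("c")).length); // 10
```

With a shared resolver, the processor-side truncation should only ever trigger if the invocation-time check was bypassed, which is the defense-in-depth role MCE4 describes.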

## Environment

- Observed against the safe-outputs MCP server / handlers in `actions/setup/js/` (`safe_outputs_handlers.cjs`, `safe_outputs_append.cjs`, `safe_outputs_mcp_server_http.cjs`).
- Spec reference: `docs/src/content/docs/specs/safe-outputs-specification.md` §8.3 (MCE1–MCE5) and the max-limit conformance requirement (E002 LIMIT_EXCEEDED).

