Improve max-tool-denials report with recent shell call context#40506
Conversation
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #40506 does not have the 'implementation' label and has 0 new lines of code in business logic directories (≤100 threshold). |
|
✅ Test Quality Sentinel completed test quality analysis. |
|
✅ PR Code Quality Reviewer completed the code quality review. |
There was a problem hiding this comment.
Pull request overview
This PR improves the diagnostic quality of the “max tool denials exceeded” reporting by enriching tool.execution_start events with optional shell command context, then rendering compact command previews for recent bash/shell tool calls in the denials report.
Changes:
- Added a shared helper to best-effort extract shell command text from heterogeneous tool payload shapes.
- Enriched Copilot SDK
tool.execution_startJSONL serialization to include optionalcommanddetails when present. - Updated max-tool-denials recent-call rendering and extended tests to validate command-aware formatting.
Show a summary per file
| File | Description |
|---|---|
| actions/setup/js/tool_call_details.cjs | New shared helper to extract shell command text from tool payloads. |
| actions/setup/js/copilot_sdk_session.cjs | Adds optional command field to serialized tool.execution_start events. |
| actions/setup/js/handle_agent_failure.cjs | Formats recent tool calls with compact command previews for bash/shell. |
| actions/setup/js/handle_agent_failure.test.cjs | Adds coverage for command-aware recent-tool-call rendering in tool-denials events. |
| actions/setup/js/copilot_sdk_driver.test.cjs | Adds coverage for tool.execution_start command serialization from SDK events. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 5/5 changed files
- Comments generated: 2
| function normalizeToolCallPreview(value, maxLen = 120) { | ||
| const singleLine = String(value || "") | ||
| .replace(/\s+/g, " ") | ||
| .trim(); | ||
| if (!singleLine) return ""; | ||
| if (singleLine.length <= maxLen) return singleLine; | ||
| return `${singleLine.slice(0, maxLen - ELLIPSIS_LENGTH)}${ELLIPSIS}`; | ||
| } |
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /tdd, /grill-with-docs, and /improve-codebase-architecture — the changes are solid and the core feature works, but there are test coverage gaps and a cohesion issue worth addressing before the module is reused.
📋 Key Themes & Highlights
Key Themes
- Test coverage gaps (
/tdd):tool_call_details.cjshas no dedicated unit tests. The multi-candidate fallback chain (6 key variants + 2 nested subkeys) is exercised only indirectly. The happy-path fixtures inhandle_agent_failure.test.cjsandcopilot_sdk_driver.test.cjsuse only thedata.commandtop-level path, leavinginput.command(MCP-standard),toolInput.cmd, and the absence-of-command case uncovered. - Module cohesion (
/improve-codebase-architecture):normalizeToolCallPreview,extractShellCommandPreview, andformatRecentToolCallbelong intool_call_details.cjs. Leaving them inhandle_agent_failure.cjssplits the "tool call display" concern across two files and makes reuse harder. - Undocumented extensibility points (
/grill-with-docs):RECENT_TOOL_CALLS_WITH_COMMAND_PREVIEWand themaxLen = 120default are both configuration-like constants with no comment explaining their criteria, making future extension error-prone. - Silent
args-array gap (/tdd): Ifdata.argsis an array (a common CLI pattern), the current loop silently returns""— no preview shown, no warning.
Positive Highlights
- ✅ Great diagnostic improvement: the before/after in the PR description (
bash→terminal.bash(git diff --name-only)) immediately shows the value. - ✅ Clean additive schema design —
command?ontool.execution_startis backward-compatible and the conditional write is correct. - ✅ Good separation with
RECENT_TOOL_CALLS_WITH_COMMAND_PREVIEWallowlist — previews are only shown where they're useful. - ✅ Integration tests cover both the SDK driver pipeline and the denial-report rendering end-to-end.
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 75.1 AIC · ⌖ 11.3 AIC · ⊞ 6.9K
| if ("command" in data) commandFieldCandidates.push(data.command); | ||
| if ("input" in data) commandFieldCandidates.push(data.input); | ||
| if ("arguments" in data) commandFieldCandidates.push(data.arguments); | ||
| if ("args" in data) commandFieldCandidates.push(data.args); |
There was a problem hiding this comment.
[/tdd] args as an array is silently skipped — a common shape for shell-like tools that pass arguments as ["-c", "echo hello"].
If data.args is ["-c", "git status"], the candidate is an array: not a string, so the first check fails; it's an object, so continue is skipped; but .command and .cmd are both undefined on arrays, so the function falls through and returns "". No preview is shown.
💡 Suggested fix
Add an explicit array branch in the loop:
if (Array.isArray(candidate)) {
// Some tools pass args as [flag, command_string]; try the last string element
const last = [...candidate].reverse().find(x => typeof x === "string" && x.trim());
if (last) return last.trim();
continue;
}Or document in the JSDoc which array shapes are intentionally out of scope.
| if ("args" in data) commandFieldCandidates.push(data.args); | ||
| if ("toolInput" in data) commandFieldCandidates.push(data.toolInput); | ||
| if ("parameters" in data) commandFieldCandidates.push(data.parameters); | ||
| for (const candidate of commandFieldCandidates) { |
There was a problem hiding this comment.
[/tdd] The multi-candidate fallback chain has no dedicated unit tests — bugs in specific paths (e.g., toolInput.cmd, parameters.command, or arguments as an object with .command) would be invisible today.
Both call sites rely on this function silently returning "" on a miss; a regression produces empty previews in denial reports without any failing test.
💡 Suggested `tool_call_details.test.cjs` cases
describe("extractShellCommandFromToolData", () => {
it("returns top-level command string", () =>
expect(fn({ command: "ls" })).toBe("ls"));
it("extracts from input.command (MCP schema)", () =>
expect(fn({ input: { command: "git status" } })).toBe("git status"));
it("extracts from toolInput.cmd", () =>
expect(fn({ toolInput: { cmd: "make test" } })).toBe("make test"));
it("prefers command over input when both present", () =>
expect(fn({ command: "first", input: { command: "second" } })).toBe("first"));
it("returns empty string when no match", () =>
expect(fn({ workingDir: "/tmp" })).toBe(""));
it("returns empty string for null input", () =>
expect(fn(null)).toBe(""));
});| const FAILURE_CATEGORIES_PATH = "/tmp/gh-aw/failure_categories.json"; | ||
| const GITHUB_API_VERSION = "2022-11-28"; | ||
| const COPILOT_SESSION_STATE_DIR = path.join(os.tmpdir(), "gh-aw", "sandbox", "agent", "logs", "copilot-session-state"); | ||
| const RECENT_TOOL_CALLS_WITH_COMMAND_PREVIEW = new Set(["bash", "shell"]); |
There was a problem hiding this comment.
[/grill-with-docs] RECENT_TOOL_CALLS_WITH_COMMAND_PREVIEW hard-codes ["bash", "shell"] with no comment explaining the selection criteria. Future shell-like tools (e.g., powershell, run_in_terminal, computer) will silently fall back to bare name display.
A single-line comment stating the intent — "tools that accept a free-form command string" — would help maintainers decide whether to extend the set without guessing.
💡 Suggested comment
// Tools that accept a free-form shell command string and should
// show a command preview in denial reports. Add new shell-like
// MCP server tool names here (lowercase).
const RECENT_TOOL_CALLS_WITH_COMMAND_PREVIEW = new Set(["bash", "shell"]);| * @param {number} [maxLen] | ||
| * @returns {string} | ||
| */ | ||
| function normalizeToolCallPreview(value, maxLen = 120) { |
There was a problem hiding this comment.
[/grill-with-docs] The maxLen = 120 default is a behavior constant that controls issue-body quality, but it's not a named export or top-level constant — it's buried as a default parameter. If this preview is reused (e.g., in a Slack notification or OTEL span attribute with a different length limit), the 120 will be duplicated or silently wrong.
💡 Suggested change
// At the top of the file (or in tool_call_details.cjs)
const TOOL_CALL_PREVIEW_MAX_LEN = 120;
// Then:
function normalizeToolCallPreview(value, maxLen = TOOL_CALL_PREVIEW_MAX_LEN) {This makes the constant visible in grep/audit and easy to tune in one place.
| * @param {Record<string, any>} data | ||
| * @returns {string} | ||
| */ | ||
| function formatRecentToolCall(toolName, mcpServerName, data) { |
There was a problem hiding this comment.
[/improve-codebase-architecture] normalizeToolCallPreview, extractShellCommandPreview, and formatRecentToolCall belong in tool_call_details.cjs, not here. The PR introduces tool_call_details.cjs specifically as the shared extraction module, but these rendering helpers live in the wrong file.
This splits "what is a tool call" logic across two files. When a third call site needs formatted previews (e.g., OTEL trace rendering), the author may re-implement rather than reuse.
💡 Suggested move
Export formatRecentToolCall and normalizeToolCallPreview from tool_call_details.cjs and import them here. The 120-char TOOL_CALL_PREVIEW_MAX_LEN constant and ELLIPSIS constant can also live there, turning the module into a complete "tool call display" utility rather than a narrow extractor.
| }) | ||
| .filter(Boolean); | ||
| const startEvent = parsedEvents.find(event => event.type === "tool.execution_start"); | ||
| expect(startEvent).toMatchObject({ |
There was a problem hiding this comment.
[/tdd] The test verifies the happy path (command present → field added), but not the absence path: when a non-shell tool fires tool.execution_start with no command data, the JSONL event must not get a command field.
The if (command) eventData.command = command conditional in copilot_sdk_session.cjs is untested for the falsy case, so a future refactor that always writes command: "" would go undetected.
💡 Suggested addition
Add a parallel it block with toolName: "read_file" and input: {}, then assert that the captured tool.execution_start event has no command key:
expect(startEvent.data).not.toHaveProperty("command");| JSON.stringify({ | ||
| type: "tool.execution_start", | ||
| timestamp: "2026-06-06T00:00:00Z", | ||
| data: { toolName: "bash", mcpServerName: "terminal", command: "cd /home/runner/work/gh-aw/gh-aw && git diff --name-only" }, |
There was a problem hiding this comment.
[/tdd] The test fixture pre-bakes command directly on the event data, which exercises only the first (data.command) branch of extractShellCommandFromToolData. The MCP-standard schema — where command text lives at data.input.command — is not covered here.
If extractShellCommandFromToolData's input.command path regressed, this test would still pass.
💡 Suggested additional fixture
Add a second it block that writes a JSONL event with data: { toolName: "bash", mcpServerName: "terminal", input: { command: "make test" } } (no top-level command field) and asserts the same terminal.bash(make test) shape in recentToolCalls. This guards the nested extraction path end-to-end.
🧪 Test Quality Sentinel Report
📊 Metrics & Test Classification (2 tests analyzed)
JavaScript: 2 ( Inflation analysis:
💡 Suggested Improvements (edge-case gaps)
Verdict
|
There was a problem hiding this comment.
✅ Test Quality Sentinel: 70/100. Test quality is acceptable — 0% of new tests are implementation tests (threshold: 30%). Both tests verify behavioral contracts with no coding-guideline violations. See the comment above for edge-case improvement suggestions.
There was a problem hiding this comment.
REQUEST_CHANGES — two blocking issues, two low-priority notes
The core idea is sound, but two issues must be fixed before merging.
Blocking issues
-
TS2339 type error (
copilot_sdk_session.cjs:265) —eventData.command = commandmutates an object whose inferred type doesn't declarecommand. Withstrict: true+checkJs: truethis will failnpm run typecheckin CI. -
Backtick injection in denial-report Markdown (
handle_agent_failure.cjs:1219) —commandPreviewis embedded verbatim inside inline backtick code spans (- `${toolCall}`). Bash commands commonly contain backticks (command substitution), which will close the code span early and corrupt the "Last 5 tool calls" section in every GitHub issue generated for such a run.
Low-priority notes
tool_call_details.cjsis missing"use strict"(inconsistent with every other file in the directory).argsarray form is silently unhandled — PR description listsargsas a supported shape, butargs: ["cmd", "arg"]yields nothing because the code probesargs.command/args.cmdon an array.
🔎 Code quality review by PR Code Quality Reviewer · 189.3 AIC · ⌖ 7.82 AIC · ⊞ 5.1K
| } | ||
| writeEvent("tool.execution_start", { toolName, mcpServerName }, event.timestamp); | ||
| const eventData = { toolName, mcpServerName }; | ||
| if (command) eventData.command = command; |
There was a problem hiding this comment.
TypeScript type error (TS2339): eventData.command = command assigns a property that doesn't exist on the inferred type { toolName: any; mcpServerName: any }, and will fail npm run typecheck in this strict: true / checkJs: true project.
💡 Suggested fix
Use a conditional spread so TypeScript sees the full object shape at declaration:
const eventData = command
? { toolName, mcpServerName, command }
: { toolName, mcpServerName };
writeEvent("tool.execution_start", eventData, event.timestamp);Or annotate the variable explicitly before mutation:
/** `@type` {{ toolName: any, mcpServerName: string, command?: string }} */
const eventData = { toolName, mcpServerName };
if (command) eventData.command = command;Without this, tsc emits TS2339: Property 'command' does not exist on type '{ toolName: any; mcpServerName: any; }'. and CI typechecks fail.
| return base; | ||
| } | ||
| const commandPreview = extractShellCommandPreview(data); | ||
| return commandPreview ? `${base}(${commandPreview})` : base; |
There was a problem hiding this comment.
Backtick injection corrupts markdown rendering: commandPreview is embedded verbatim inside an inline code span at the call site (- \${toolCall}`). Shell commands routinely contain backticks (e.g. `` bash -c 'echo id`' ``), which close the code span early and produce malformed GitHub Markdown in the denial report issue.
💡 Suggested fix
Escape backticks in the preview before returning, so the rendered output stays inside the code span:
function formatRecentToolCall(toolName, mcpServerName, data) {
const base = mcpServerName ? `${mcpServerName}.${toolName}` : toolName;
const normalizedToolName = typeof toolName === "string" ? toolName.toLowerCase() : "";
if (!RECENT_TOOL_CALLS_WITH_COMMAND_PREVIEW.has(normalizedToolName)) return base;
const commandPreview = extractShellCommandPreview(data).replace(/`/g, "'");
return commandPreview ? `${base}(${commandPreview})` : base;
}Alternatively, render recentToolCallsList in buildToolDenialsExceededContext using a fenced code block instead of inline backtick spans, which is immune to embedded backtick characters.
A test using a command like echo `hostname` would catch this regression.
| @@ -0,0 +1,37 @@ | |||
| // @ts-check | |||
There was a problem hiding this comment.
Missing "use strict" directive: every other .cjs file in this directory opens with "use strict";. Omitting it means sloppy-mode semantics — silent global variable creation on undeclared assignments, different this binding in functions, and missed V8 optimisations. Add "use strict"; on the line after // @ts-check``.
| if ("command" in data) commandFieldCandidates.push(data.command); | ||
| if ("input" in data) commandFieldCandidates.push(data.input); | ||
| if ("arguments" in data) commandFieldCandidates.push(data.arguments); | ||
| if ("args" in data) commandFieldCandidates.push(data.args); |
There was a problem hiding this comment.
Array-shaped args silently yields nothing: the PR description lists args as a supported extraction shape, but args: ["cmd", "arg"] — the most common CLI array form — is pushed as a candidate then skipped because typeof array === "object" and array.command/array.cmd are both undefined. The function returns "" without any signal.
💡 Suggested fix
Either handle the array case explicitly, or clarify in the JSDoc that only string and {command, cmd} object shapes are supported for args:
if (Array.isArray(candidate)) {
const joined = candidate.filter(s => typeof s === "string").join(" ").trim();
if (joined) return joined;
continue;
}As-is, consumers passing args: ["-c", "git status"] get no preview, making the extraction gap invisible.
| .trim(); | ||
| if (!singleLine) return ""; | ||
| if (singleLine.length <= maxLen) return singleLine; | ||
| return `${singleLine.slice(0, maxLen - ELLIPSIS_LENGTH)}${ELLIPSIS}`; |
There was a problem hiding this comment.
Negative slice index when maxLen < ELLIPSIS_LENGTH: singleLine.slice(0, maxLen - 3) becomes slice(0, -N) for any maxLen < 3, which drops characters from the end rather than truncating to maxLen. The result exceeds the stated length contract.
💡 Suggested fix
Add a guard before the slice:
if (maxLen <= ELLIPSIS_LENGTH) return singleLine.slice(0, maxLen);
return `${singleLine.slice(0, maxLen - ELLIPSIS_LENGTH)}${ELLIPSIS}`;Currently only called with the default maxLen = 120, so this is latent rather than live — but the parameter is part of the public function signature and a future caller could trigger it.
|
`` After that, summarize the remaining blockers and next steps.
|
|
@copilot run pr-finisher skill |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Completed. I ran the pr-finisher pass, addressed the blocking review issues, validated locally, and pushed commit |
|
@copilot please summarize the remaining blockers and rerun checks after the review threads are resolved.
|
|
@copilot review all comments and address unresolved review feedback.
|
The max-tool-denials report showed the last five calls as bare tool names (for example, repeated
bash), which made the denial chain hard to diagnose. This change adds lightweight command context so the report shows what was attempted, not just which tool was invoked.SDK event enrichment
tool.execution_startserialization to include command text when present in event payloads.Denial report signal quality
bash/shell).Shared extraction path
command,input,arguments,args,toolInput,parameters).Coverage updates
tool.execution_startserialization