Python: Shell tool with support for local and Docker #5664
… package Introduces a safe, cross-OS local shell tool as the first tool in a new agent-framework-tools workspace package. Supports persistent (default) and stateless modes across pwsh/powershell.exe/bash/sh, with policy denylist, allowlist, approval gating, process-tree kill on timeout, output truncation, and audit hooks. Integrates with existing provider get_shell_tool(func=...) factories via FunctionTool kind='shell'. See docs/decisions/0026-builtin-tools-local-shell.md for the full design.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Codifies what LocalShellTool does and does not defend against, and delegates the security-relevant lifecycle primitive to a battle-tested library instead of hand-rolled per-OS code.
Changes:
- Adopt psutil for cross-OS process-tree termination (executor + session). Replaces hand-rolled taskkill/killpg with one canonical implementation.
- Resolve taskkill.exe to absolute %SystemRoot%\System32 path so PATH poisoning cannot redirect us to an attacker-supplied binary.
- Reframe ShellPolicy docstring + ADR + README: denylist is a guardrail, not a security boundary.
- Require acknowledge_unsafe=True to set approval_mode='never_require', making the unsafe path explicitly opt-in with a self-documenting name.
- Add tests/test_security.py codifying named CVE-style cases. Defenses we DO claim are asserted; non-defenses (denylist bypasses via backslash insertion, variable expansion, interpreter escape, base64, alternative tools, PowerShell-native verbs) are documented as expected-to-pass tests so residual risk stays visible.
- Add Threat Model + Confidence Strategy sections to ADR 0026.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
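The taskkill.exe hardening amounts to never consulting PATH. A minimal sketch, assuming the environment is passed in as a mapping: `resolve_taskkill` and `taskkill_tree_argv` are hypothetical names, and the `C:\Windows` fallback when SystemRoot is unset is an assumption, not something the PR states. It covers only the path-resolution point; the PR uses psutil for tree termination itself.

```python
import ntpath
from collections.abc import Mapping


def resolve_taskkill(env: Mapping[str, str]) -> str:
    """Hypothetical: absolute path to taskkill.exe under %SystemRoot%\\System32."""
    # Assumption: fall back to C:\Windows when SystemRoot is not set.
    system_root = env.get("SystemRoot") or env.get("SYSTEMROOT") or "C:\\Windows"
    # ntpath builds Windows-style paths even when this code runs on POSIX.
    path = ntpath.join(system_root, "System32", "taskkill.exe")
    if not ntpath.isabs(path):
        raise ValueError(f"SystemRoot must be an absolute path, got {system_root!r}")
    return path


def taskkill_tree_argv(pid: int, env: Mapping[str, str]) -> list[str]:
    # /T also terminates child processes; /F forces termination.
    return [resolve_taskkill(env), "/PID", str(pid), "/T", "/F"]
```

A taskkill.exe planted earlier on PATH is never picked up, although this still trusts whatever SystemRoot value the supplied environment carries.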
Adds a container-backed shell executor as the recommended pattern for untrusted-input shell workflows. The container provides the security boundary (--network none, non-root user, --read-only, --cap-drop ALL, no-new-privileges, memory/pids limits, tmpfs /tmp), so, unlike with LocalShellTool, approval gating is optional. Also introduces a ShellExecutor Protocol so callers can plug in custom backends (Firecracker, SSH, WASI) without forking the framework.
Removes the planned HyperlightShellExecutor follow-up from ADR 0026: Hyperlight is a WASM code sandbox with no kernel, userland, or shell binary, so a Hyperlight-backed shell is not viable. Docker is the realistic sandbox tier for shell.
Tests: 11 unit tests for argv builders + lifecycle (no Docker daemon required); 3 integration tests gated on is_docker_available().
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
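The isolation flags listed in this commit map directly onto a `docker run` argv. The builder below is a hypothetical sketch, not the package's argv builder; the user, memory, and pids defaults are illustrative assumptions.

```python
def build_docker_run_argv(
    image: str,
    command: str,
    *,
    shell: str = "bash",
    user: str = "65534:65534",  # assumption: a numeric non-root UID:GID
    memory: str = "512m",       # assumption: illustrative limit
    pids_limit: int = 256,      # assumption: illustrative limit
) -> list[str]:
    """Hypothetical: stateless `docker run` argv with the commit's isolation flags."""
    return [
        "docker", "run", "--rm",
        "--network", "none",
        "--user", user,
        "--read-only",
        "--cap-drop", "ALL",
        "--security-opt", "no-new-privileges",
        "--memory", memory,
        "--pids-limit", str(pids_limit),
        "--tmpfs", "/tmp",
        image,
        # The command stays a single argv element after -c.
        shell, "-c", command,
    ]
```

Passing the command as one argv element, and launching docker without a host shell (for example via asyncio.create_subprocess_exec), avoids a second round of host-side parsing.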
Applies the applicable subset of bug fixes accumulated during the .NET shell-tool PR review (microsoft#5604) to the Python shell tool.
A1 - Quote workdir safely in _maybe_reanchor
Previously _tool.py used double-quote interpolation when emitting the cd/Set-Location prefix, which expanded $VAR, $(), and backticks in the workdir path. A workdir containing shell metacharacters could trigger arbitrary command execution before the user command ran. Replaced with single-quote escaping helpers _quote_posix and _quote_powershell that emit literal-string forms safe for both hosts.
A5/A6 - Consolidate truncation to a single byte-aware helper
Extracted shared truncate_head_tail / truncate_text_head_tail helpers in _truncate.py. The new implementation distributes odd caps so head receives floor(cap/2) and tail receives ceil(cap/2) bytes, matching the .NET round-9 fix and ensuring no input bytes are silently dropped on the boundary. _session.py previously truncated by Python str length while the caller passed _max_output_bytes; the unit mismatch is now gone: raw byte buffers go through truncate_head_tail and decoded text goes through truncate_text_head_tail.
Unit tests added for the truncate and quote helpers.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
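The A1 single-quote escaping can be sketched as follows. The names mirror _quote_posix / _quote_powershell without the leading underscore, and `reanchor_prefix` is hypothetical. This version also doubles the typographic single quotes (U+2018 to U+201B) that PowerShell's tokenizer treats as quote characters; the commit does not say whether the package's helper does.

```python
# Characters PowerShell accepts as single quotes inside a single-quoted string.
_PS_SINGLE_QUOTES = "'\u2018\u2019\u201a\u201b"


def quote_posix(value: str) -> str:
    # Single quotes suppress $VAR, $(...), and backtick expansion in POSIX
    # shells; an embedded ' becomes '\'' (close, escaped quote, reopen).
    return "'" + value.replace("'", "'\\''") + "'"


def quote_powershell(value: str) -> str:
    # PowerShell single-quoted strings are literal; a quote is escaped by doubling.
    escaped = "".join(ch * 2 if ch in _PS_SINGLE_QUOTES else ch for ch in value)
    return "'" + escaped + "'"


def reanchor_prefix(workdir: str, *, powershell: bool) -> str:
    """Hypothetical: the cd/Set-Location prefix emitted before a user command."""
    if powershell:
        # -LiteralPath keeps characters such as [ ] from being read as wildcards.
        return f"Set-Location -LiteralPath {quote_powershell(workdir)}; "
    # "--" stops a workdir beginning with "-" from being parsed as an option.
    return f"cd -- {quote_posix(workdir)} && "
```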
…tool
The shell tool's docstrings and comments contained two patterns that
the .NET review pushed back on:
- Narrative framing about implementation history ("hard-won",
"we sidestep", "design inspiration: ...", competitor framework
name-drops in module docstrings).
- Overstated security guarantees ("battle-tested",
"reasonable for untrusted input", "recommended executor for any
agent that runs commands from untrusted input",
"destructive commands are blocked", "safe local shell tool",
"blocks shell injection").
Rewrites the affected docstrings and comments to describe what the
code does in neutral terms. Behaviour is unchanged.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Ports the .NET ShellEnvironmentProvider as a Python ContextProvider so agents using LocalShellTool or DockerShellTool can be primed with an accurate description of the shell they're talking to (family, version, OS, working directory, and which CLIs are available). The provider runs probes through any ShellExecutor, caches the resulting snapshot, and on every before_run extends the session instructions with a markdown block describing the shell idiom to use. A failed first probe leaves the cache empty so the next call retries (no permanent poisoning). Probe failures from a narrow set of expected error types (ShellCommandError, ShellExecutionError, ShellTimeoutError, and asyncio.TimeoutError from the per-probe timeout) are recorded as None fields in the snapshot. Other exceptions propagate. Tool names are validated against ^[A-Za-z0-9._-]+$ before being interpolated into a probe command. Includes 12 unit tests covering happy path, stderr fallback, timeout handling, expected/unexpected exception paths, malicious tool name rejection, case-insensitive deduplication, retry after failure, concurrent first-callers sharing one probe, and the default and custom formatter paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
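A minimal sketch of the validation and caching behaviour described in this commit, assuming a probe coroutine that returns a dict snapshot. `normalize_tool_names` and `SnapshotCache` are hypothetical names, not the provider's API, and the per-probe timeout is omitted. Like the provider described here, this caches on the instance, a choice the later design review questions.

```python
from __future__ import annotations

import asyncio
import re
from collections.abc import Awaitable, Callable, Iterable

# fullmatch with this class is equivalent to ^[A-Za-z0-9._-]+$, except that it
# does not accept a trailing newline the way re.match with "$" would.
_TOOL_NAME = re.compile(r"[A-Za-z0-9._-]+")


def normalize_tool_names(names: Iterable[str]) -> list[str]:
    """Reject names outside the allowed character set; dedupe case-insensitively."""
    seen: set[str] = set()
    result: list[str] = []
    for name in names:
        if not _TOOL_NAME.fullmatch(name):
            raise ValueError(f"invalid tool name: {name!r}")
        key = name.casefold()
        if key not in seen:
            seen.add(key)
            result.append(name)
    return result


class SnapshotCache:
    """Share one in-flight probe among concurrent callers; cache only successes."""

    def __init__(self, probe: Callable[[], Awaitable[dict[str, str | None]]]) -> None:
        self._probe = probe
        self._lock = asyncio.Lock()
        self._snapshot: dict[str, str | None] | None = None

    async def get(self) -> dict[str, str | None]:
        if self._snapshot is not None:
            return self._snapshot
        async with self._lock:
            # Another caller may have filled the cache while this one waited.
            if self._snapshot is None:
                # An exception propagates and leaves the cache empty,
                # so the next get() runs the probe again.
                self._snapshot = await self._probe()
            return self._snapshot
```

A name such as `-x` still passes this pattern, so the probe command itself should not let it be parsed as an option.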
…anup Add a README section introducing ShellEnvironmentProvider, soften two remaining overconfident security-boundary comments in _executor_base.py and the DockerShellTool class docstring, and add a sample (shell_with_environment_provider.py) that demonstrates the provider in stateless and persistent modes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The repository convention is to host samples under python/samples/ rather than inside the package directory. Move the two net-new shell samples (allow-list and environment-provider) to python/samples/02-agents/tools/ and drop the in-package samples/ directory; the existing top-level providers/openai/client_with_local_shell.py already covers the basic LocalShellTool walkthrough. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR adds a new first-party Python workspace package, agent-framework-tools, introducing a cross-platform shell execution surface (LocalShellTool) plus a container-sandboxed variant (DockerShellTool) and a context provider (ShellEnvironmentProvider) to help models emit correct shell idioms and discover available CLIs.
Changes:
- Add agent-framework-tools package (shell tools, policy/denylist, truncation, process-tree kill, persistent session protocol, environment context provider).
- Add unit + integration-gated tests and runnable samples for local and Docker-backed shell execution.
- Register the new package in the Python workspace (pyproject + uv lock) and update an existing OpenAI sample to use LocalShellTool.
| File | Description |
|---|---|
| python/uv.lock | Adds agent-framework-tools as a workspace member and locked editable package entry. |
| python/pyproject.toml | Registers agent-framework-tools as a workspace source dependency. |
| python/samples/02-agents/providers/openai/client_with_local_shell.py | Updates sample to use LocalShellTool instead of a hand-rolled subprocess tool. |
| python/packages/tools/README.md | Documents installation, modes, safety model, and tool/provider usage. |
| python/packages/tools/LICENSE | Adds MIT license for the new tools package. |
| python/packages/tools/pyproject.toml | Defines packaging metadata, deps (incl. psutil), and test/lint/tooling config. |
| `python/packages/tools/agent_framework_tools/__init__.py` | Adds package root and version discovery. |
| python/packages/tools/agent_framework_tools/py.typed | Marks the package as typed for type checkers. |
| `python/packages/tools/agent_framework_tools/shell/__init__.py` | Exposes the public shell-tool API surface. |
| python/packages/tools/agent_framework_tools/shell/_types.py | Introduces shared types and core exceptions for shell execution. |
| python/packages/tools/agent_framework_tools/shell/_truncate.py | Implements head/tail UTF-8 byte-budget truncation helpers. |
| python/packages/tools/agent_framework_tools/shell/_policy.py | Adds allow/deny policy model and default denylist patterns. |
| python/packages/tools/agent_framework_tools/shell/_resolve.py | Implements cross-platform shell argv resolution and PowerShell detection. |
| python/packages/tools/agent_framework_tools/shell/_killtree.py | Adds cross-OS process-tree termination (psutil + fallback). |
| python/packages/tools/agent_framework_tools/shell/_executor.py | Implements stateless execution via subprocess with timeout + truncation. |
| python/packages/tools/agent_framework_tools/shell/_executor_base.py | Defines a minimal ShellExecutor protocol for pluggable backends. |
| python/packages/tools/agent_framework_tools/shell/_session.py | Implements persistent shell session using sentinel framing and reader tasks. |
| python/packages/tools/agent_framework_tools/shell/_tool.py | Adds LocalShellTool facade + agent-framework FunctionTool wiring. |
| python/packages/tools/agent_framework_tools/shell/_environment.py | Adds ShellEnvironmentProvider to probe and inject shell environment guidance. |
| python/packages/tools/agent_framework_tools/shell/_docker.py | Adds DockerShellTool and argv builders for container-sandboxed execution. |
| `python/packages/tools/samples/__init__.py` | Adds samples package marker. |
| python/packages/tools/samples/shell_openai_persistent.py | Demonstrates OpenAI usage with an approval loop and persistent local shell. |
| python/packages/tools/samples/shell_allowlist_stateless.py | Demonstrates a strict allowlist + stateless mode configuration. |
| python/packages/tools/samples/shell_with_environment_provider.py | Demonstrates using ShellEnvironmentProvider with stateless vs persistent shells. |
| `python/packages/tools/tests/__init__.py` | Adds tests package marker. |
| python/packages/tools/tests/test_shell_truncate_and_quote.py | Tests truncation helpers and quoting helpers. |
| python/packages/tools/tests/test_shell_environment_provider.py | Tests probing, formatting, caching, and concurrency behavior of environment provider. |
| python/packages/tools/tests/test_security.py | Adds security regression tests documenting denylist behavior and residual risk. |
| python/packages/tools/tests/test_policy.py | Tests default policy behavior, allowlist behavior, and custom overrides. |
| python/packages/tools/tests/test_local_shell_tool.py | Tests local shell tool modes, timeouts, policy, persistence, and concurrency. |
| python/packages/tools/tests/test_docker_shell_tool.py | Tests Docker argv builders, basic tool behavior, and docker-availability-gated integration tests. |
Copilot's findings
- Files reviewed: 27/29 changed files
- Comments generated: 6
…_model Two new tests in test_local_shell_tool.py exercise the default confine_workdir=True behaviour on POSIX and PowerShell, asserting that 'cd' inside one persistent-mode call does not leak into the next. A new test_shell_result.py module provides direct unit coverage for every conditional branch of ShellResult.format_for_model (stdout, truncated, stderr, timed_out, exit_code) so regressions in the LLM-facing format are caught immediately. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- _tool.py: detect PowerShell via is_powershell() helper instead of basename string match
- _environment.py: use public ContextProvider import (no private _ prefix)
- _session.py: trim _stdout_buf/_stderr_buf after copying to avoid unbounded retention across calls
- _docker.py: short-circuit start()/close() in stateless mode; add configurable shell kwarg (default bash, e.g. 'sh' for alpine)
- tests: parenthesized multi-line assert; alpine integration tests now pass shell='sh'
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Python Test Coverage Report
Python Unit Test Overview
- pyupgrade: drop quoted self-class refs in __aenter__/method annotations
- ruff format: reflow long lines per workspace style
- pyright: assert psutil non-None in optional-import branch; lowercase mutable module globals; annotate _approval_mode as Literal so tool() Literal-typed kwarg is accepted; add ... body to ShellExecutor.run protocol; remove unused deprecated _kill_tree wrapper
- tests: skip docker integration tests on win32 (Windows containers don't support --read-only / alpine images)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…t findings Mirrors the .NET PR microsoft#5604 cleanup:
- Remove DEFAULT_DENYLIST from ShellPolicy. ShellPolicy() now ships with an empty deny-list; operators opt into site-specific patterns explicitly. No major agent framework uses regex matching as a primary security control; AutoGen v2 removed theirs. Approval gating + sandbox tier remain the real boundaries.
- Rewrite module / class docstrings to frame ShellPolicy as a UX pre-filter, not a security control.
- Add Single-session ownership paragraphs to ShellExecutor, ShellSession, LocalShellTool, and DockerShellTool: a persistent-mode tool is owned by exactly one conversation / agent session; do not share across users or concurrent conversations.
- Tests now supply explicit deny patterns instead of relying on a default.
- Address Pre-commit Hooks (bandit) CI failures: convert internal-invariant asserts to explicit RuntimeError, annotate intentional subprocess/shell usage with # nosec, document container-internal /tmp paths.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
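An empty-by-default deny-list used as a UX pre-filter can be pictured with a small dataclass. This is a sketch, not ShellPolicy itself: `PrefilterPolicy` and `PolicyDecision` are hypothetical, the allow-list semantics (deny when no allow pattern matches) are an assumption, and the empty-command deny mirrors a later review-round fix. The compiled patterns are declared as dataclass fields, as a later commit does for _denies/_allows.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PolicyDecision:
    allowed: bool
    reason: str


@dataclass
class PrefilterPolicy:
    """Hypothetical pre-filter: regex matching on command text is easy to
    bypass (quoting, variable expansion, interpreters), so it is not a
    security boundary."""

    deny: list[str] = field(default_factory=list)  # empty by default
    allow: list[str] | None = None  # None means no allow-list is configured
    _denies: list[re.Pattern[str]] = field(init=False, repr=False)
    _allows: list[re.Pattern[str]] | None = field(init=False, repr=False)

    def __post_init__(self) -> None:
        self._denies = [re.compile(p) for p in self.deny]
        self._allows = None if self.allow is None else [re.compile(p) for p in self.allow]

    def evaluate(self, command: str) -> PolicyDecision:
        if not command.strip():
            return PolicyDecision(False, "empty command")
        for pattern in self._denies:
            if pattern.search(command):
                return PolicyDecision(False, f"matched deny pattern {pattern.pattern!r}")
        if self._allows is not None and not any(p.search(command) for p in self._allows):
            return PolicyDecision(False, "no allow pattern matched")
        return PolicyDecision(True, "allowed")
```

With an operator-supplied pattern such as `\brm\s+-rf\b`, a command written as `r\m -rf /tmp/x` still passes, which is the kind of residual risk test_security.py is described as recording.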
Automated Code Review
Reviewers: 2 | Confidence: 85%
✓ Test Coverage
The PR adds substantial new shell tool functionality with good high-level integration tests for LocalShellTool and DockerShellTool. However, three production modules totaling ~250 lines have zero test coverage:
_killtree.py (security-critical process-tree termination), _resolve.py (shell discovery with env-var override and fallback chain), and _session.py's _parse_rc() helper (exit-code parsing from sentinel output). The _killtree.py gap is most concerning since timeout enforcement is listed as a security property in the README. Additionally, LocalShellTool's environment-merging logic (env/clean_env parameters) has three distinct branches with no test coverage.
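For reference, a parser covering the edge-case categories that the later test_shell_parse_rc.py commit lists (zero, positive, negative, CRLF, no newline, missing prefix, empty input, non-digits, trailing garbage, partial digits) might look like this. The sentinel prefix `__SHELL_RC__=` and the choice to return None for malformed input are assumptions; the PR does not show _parse_rc's actual format or contract.

```python
from __future__ import annotations

import re

SENTINEL = "__SHELL_RC__="  # hypothetical marker; the real prefix is not given in the PR
# [0-9] rather than \d, so non-ASCII digits are rejected.
_RC = re.compile(r"-?[0-9]+")


def parse_rc(line: str, prefix: str = SENTINEL) -> int | None:
    """Return the exit code from a sentinel line, or None if it is malformed."""
    if not line.startswith(prefix):
        return None
    body = line[len(prefix):]
    # Accept LF, CRLF, or no line terminator at all.
    if body.endswith("\r\n"):
        body = body[:-2]
    elif body.endswith("\n"):
        body = body[:-1]
    if not _RC.fullmatch(body):
        return None  # non-digits, trailing garbage, or a bare "-"
    return int(body)
```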
✓ Design Approach
I found two design-level issues. First, the shell=/AGENT_FRAMEWORK_SHELL override path bypasses the interactive-vs-stateless argv shaping, so common overrides like bash or pwsh silently stop working in stateless mode unless callers already know to include -c/-Command themselves. Second, ShellEnvironmentProvider caches its probe result on the provider instance rather than in provider session state, so a shared provider can leak one session's working-directory/tool snapshot into another session, even though the surrounding framework passes per-session provider state and the README describes probing once per session.
The code paths and tests look internally consistent, but two new samples currently teach a stronger security model than the PR itself documents. The added tests explicitly define ShellPolicy as a best-effort pre-filter rather than a security boundary, yet one sample says LocalShellTool has a default destructive-command deny-list and another says an allow-list is sufficient reason to disable approval. Those sample docs should be corrected so users do not rely on protections the package says it does not provide.
Automated review by alliscode's agents
Deny-list documentation drift:
- README and the OpenAI/local-shell sample no longer claim a built-in deny-list of destructive commands. ShellPolicy is described as an optional, operator-supplied UX pre-filter; the real boundaries remain approval gating and the sandbox tier.
Behavioural fixes called out in review:
- ShellPolicy.evaluate() now denies empty / whitespace-only commands explicitly instead of returning allow with no rationale.
- truncate_head_tail() raises ValueError for cap <= 0 instead of silently returning the full input with truncated=False, which previously could defeat output-capping in callers that mis-configured the budget.
- LocalShellTool.as_function() / DockerShellTool.as_function() return the ShellCommandError text directly so the model sees a single, non-redundant 'Command rejected by policy: …' message instead of the prior duplicated 'Command blocked by policy: Command rejected …' wrapping.
- ShellSession POSIX sentinel trailer now snapshots and restores the prior errexit (set -e) state around the trailer, so a user 'set -e' in the persistent shell is no longer permanently disabled by the next run().
Tests:
- New test_shell_parse_rc.py covers the full _parse_rc() edge-case surface (zero, positive, negative, CRLF, no newline, missing prefix, empty input, non-digits, trailing garbage, partial digits).
- test_policy.py asserts the new empty-command deny.
- test_shell_truncate_and_quote.py asserts ValueError for cap=0 and cap<0.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
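A byte-budget head/tail truncation with the floor/ceil split from the A5/A6 commit and the cap <= 0 check from this round might look like the following. The function names match those in the PR, but the signatures and the (value, truncated) return shape are assumptions, and the real helpers may insert the [output truncated] marker themselves.

```python
from __future__ import annotations


def truncate_head_tail(data: bytes, cap: int) -> tuple[bytes, bool]:
    """Keep the first floor(cap/2) and the last ceil(cap/2) bytes of data."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    if len(data) <= cap:
        return data, False
    head = cap // 2
    tail = cap - head  # ceil(cap / 2), so head + tail == cap for odd caps too
    return data[:head] + data[-tail:], True


def truncate_text_head_tail(text: str, cap: int) -> tuple[str, bool]:
    """Apply the byte budget to the UTF-8 encoding of text."""
    kept, truncated = truncate_head_tail(text.encode("utf-8"), cap)
    if not truncated:
        return text, False
    head_len = cap // 2
    # Decode each side separately; errors="ignore" drops a character split by a cut.
    head = kept[:head_len].decode("utf-8", errors="ignore")
    tail = kept[head_len:].decode("utf-8", errors="ignore")
    return head + tail, True
```

Because decoding only discards bytes, the re-encoded result never exceeds the cap.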
Round-2 review fixes pushed in Deny-list documentation drift (Copilot review on README.md,
Behaviour fixes:
Tests:
bandit / ruff / pytest all clean locally.
- _resolve.py: reject empty/whitespace shell override string
- _tool.py / _docker.py: mode-aware default tool description (persistent vs stateless)
- _tool.py: fix misleading workdir docstring (re-anchor, not blocking)
- _types.py: emit stream-agnostic [output truncated] marker
- _policy.py: declare _denies/_allows as dataclass fields
- _environment.py: use $(pwd) instead of $PWD in POSIX probe
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- _resolve.py: in stateless mode, ensure shell overrides end with -c/-Command so commands aren't misinterpreted as script-file paths.
- ShellExecutor.run / LocalShellTool.run / DockerShellTool.run now accept an optional timeout kwarg; ShellEnvironmentProvider drops the outer asyncio.wait_for and lets the executor enforce the probe timeout internally, so cancellation no longer risks leaving a hung subprocess or corrupted session.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
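The stateless-mode override fix, together with the empty-override rejection from the previous commit, can be sketched as follows. `shape_override_argv` is hypothetical, and this `is_powershell` body is only a guess at the helper named in an earlier commit. shlex.split runs in POSIX mode, so an override containing unquoted Windows backslash paths would need quoting.

```python
from __future__ import annotations

import ntpath
import shlex

_POWERSHELL_NAMES = {"pwsh", "pwsh.exe", "powershell", "powershell.exe"}


def is_powershell(executable: str) -> bool:
    # ntpath.basename splits on both "/" and "\"; compare case-insensitively.
    return ntpath.basename(executable).lower() in _POWERSHELL_NAMES


def shape_override_argv(override: str, *, stateless: bool) -> list[str]:
    """Hypothetical: turn an override such as 'pwsh -NoProfile' into an argv prefix."""
    if not override.strip():
        raise ValueError("shell override must not be empty or whitespace")
    argv = shlex.split(override)
    if stateless:
        last = argv[-1]
        if is_powershell(argv[0]):
            # PowerShell parameter names are case-insensitive; -c abbreviates -Command.
            has_flag, flag = last.lower() in {"-c", "-command"}, "-Command"
        else:
            # Exact match only: bash's -C (noclobber) is a different option.
            has_flag, flag = last == "-c", "-c"
        if not has_flag:
            # Without it, the command would be read as a script-file path.
            argv.append(flag)
    return argv
```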
- pyproject.toml: bump agent-framework-core minimum from 1.2.0 to 1.2.2 to align with the rest of the workspace.
- _docker.py: validate extra_run_args at construction time and reject flags that would dismantle the isolation defaults (--privileged, --cap-add, --security-opt, --network/--net, -v/--volume/--mount, --device, --pid, --ipc, --userns, --user, --read-only, --tmpfs, --add-host, --gpus, --cgroupns, --device-cgroup-rule); also documented the warning on the docstring.
- _docker._stop_container: retry docker rm -f once and log a warning/error when it does not succeed, so operators can audit leaked containers instead of getting a silent success.
- _docker._run_stateless timeout path: fall back to docker rm -f when docker kill fails or times out (--rm only reaps on clean exit), and log instead of silently swallowing communicate() errors.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
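Construction-time validation of extra_run_args could look like this sketch. `validate_extra_run_args` is hypothetical. The blocked set mirrors the flags listed in the commit, with -u (docker run's short alias for --user) added as an assumption. It handles --flag=value forms, attached values such as -v/a:/b, and clustered boolean shorthand such as -itv, but like any deny-list it only covers the flags it knows about.

```python
from __future__ import annotations

from collections.abc import Sequence

_BLOCKED_LONG = frozenset({
    "--privileged", "--cap-add", "--security-opt", "--network", "--net",
    "--volume", "--mount", "--device", "--pid", "--ipc", "--userns", "--user",
    "--read-only", "--tmpfs", "--add-host", "--gpus", "--cgroupns",
    "--device-cgroup-rule",
})
_BLOCKED_SHORT = frozenset("vu")  # -v from the commit; -u added for this sketch
_BOOL_SHORT = frozenset("ditPq")  # docker run boolean shorthands that may be clustered


def validate_extra_run_args(args: Sequence[str]) -> None:
    """Raise ValueError if any token would override the sandbox defaults."""
    for token in args:
        if token.startswith("--"):
            name = token.split("=", 1)[0]  # handles --network=host
            if name in _BLOCKED_LONG:
                raise ValueError(f"extra_run_args may not contain {name}")
        elif token.startswith("-") and len(token) > 1:
            for ch in token[1:]:
                if ch in _BLOCKED_SHORT:
                    raise ValueError(f"extra_run_args may not contain {token}")
                if ch not in _BOOL_SHORT:
                    break  # the rest of the token is an attached value, e.g. -eFOO=1
```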
- ag-ui: 1.0.0rc2 -> 1.0.0rc3
- orchestrations: 1.0.0rc1 -> 1.0.0rc2
- Add shell tool (microsoft#5664) to CHANGELOG
- uv.lock refreshed
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Python: bump package versions for 1.6.0 release
  - Released cohort (agent-framework, core, openai, foundry): 1.5.0 -> 1.6.0
  - Beta packages (21 packages): 1.0.0b260519 -> 1.0.0b260521
  - Alpha packages (azure-contentunderstanding, foundry-hosting, gemini, monty): 1.0.0a260518/19 -> 1.0.0a260521
  - ag-ui stays at 1.0.0rc2, orchestrations at 1.0.0rc1 (dependency bounds updated)
  - Inter-package dependency lower bounds updated (>=1.5.0,<2 -> >=1.6.0,<2)
  - Update CHANGELOG compare links
  - uv.lock refreshed
  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Address review: bump RC packages, add shell tool to changelog
  - ag-ui: 1.0.0rc2 -> 1.0.0rc3
  - orchestrations: 1.0.0rc1 -> 1.0.0rc2
  - Add shell tool (microsoft#5664) to CHANGELOG
  - uv.lock refreshed
  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This pull request introduces a new built-in tools package for the Microsoft Agent Framework, focusing on a cross-platform local shell tool (LocalShellTool) and its supporting infrastructure. It adds comprehensive documentation, licensing, and a Python package structure to support safe and extensible shell command execution, with future growth in mind.