A Python-based code scanning tool that uses the Semgrep Python SDK to detect AI/LLM-specific vulnerabilities. This tool is designed to run in both GitHub Actions (headless CI) and as the scanning engine behind a VS Code extension.
- Scanner (`llm_scan/`): Python package for scanning code
- VS Code Extension (`vscode-extension/`): IDE integration
- Visual Studio Extension (`visual-studio-extension/`): Full IDE integration
- Backend API (`backend/`): Node.js server with MySQL for storing scan results (similar to the Semgrep dashboard)
- Semgrep Python SDK Integration: Uses Semgrep's Python APIs directly (no CLI subprocess calls)
- Multi-language Support: Architecture supports Python, JavaScript, and TypeScript (initial rules are Python-focused)
- Offline-first: All scanning runs without network access
- Multiple Output Formats: SARIF (for GitHub Code Scanning), JSON (for VS Code), and human-readable console output
- Extensible Rule System: Easy to add new rule packs and vulnerability patterns
- MCP (Model Context Protocol): Rules for the Python MCP SDK / FastMCP (`@tool`, `@async_tool`, `@resource`, `@prompt`) – code/command/path injection, SSRF, SQL injection, prompt injection
- Test Case Generation: Automatically generates security test cases by extracting system prompts and tool definitions (MCP, LangChain) and detecting dangerous sinks. See TEST_GENERATION.md for details.
- Evaluation test generation (multi-framework): Extract tool definitions from FastMCP, LangChain, LlamaIndex, or LangGraph code, then use AI to generate natural-language test prompts. The output JSON includes an `eval_type` per case (`tool_selection`, `safety`, `prompt_injection`, `argument_correctness`, `robustness`) and, for LangGraph, a `graph_structure` (nodes, edges, entry point) for path-aware evals. See TEST_GENERATION.md.
- Concrete eval runner: Run evals against a compiled graph/agent to measure tool-selection accuracy, valid path rate (LangGraph), and tool coverage. Use `python -m llm_scan.eval` or `llm-scan-eval` with an eval JSON and a graph spec. See TEST_GENERATION.md.
- Performance Optimized: Incremental scanning, respects .gitignore, configurable include/exclude patterns
```bash
pip install trusys-llm-scan
```

This will install the `trusys-llm-scan` command-line tool and all dependencies.
For development or if you need the latest version:
```bash
# Clone the repository
git clone https://github.com/spydra-tech/truscan.git
cd truscan

# Install dependencies
pip install semgrep requests

# Install in development mode
pip install -e .
```

After installation, you can use the `trusys-llm-scan` command:
```bash
# Show installed version
trusys-llm-scan --version
# or
trusys-llm-scan -V

# Check PyPI for a newer version
trusys-llm-scan --check-updates

# Scan current directory
trusys-llm-scan . --format console

# Or use as a Python module
python -m llm_scan.runner . --format console

# Scan specific paths with SARIF output
python -m llm_scan.runner \
  src/ tests/ \
  --rules llm_scan/rules/python \
  --format sarif \
  --out results.sarif \
  --exclude 'tests/**' \
  --exclude '**/__pycache__/**'

# Filter by severity
python -m llm_scan.runner \
  . \
  --severity critical high \
  --format json \
  --out results.json

# Enable AI-based false positive filtering
python -m llm_scan.runner \
  . \
  --enable-ai-filter \
  --ai-provider openai \
  --ai-model gpt-4 \
  --format console

# AI filtering with specific rules only
python -m llm_scan.runner \
  . \
  --enable-ai-filter \
  --ai-analyze-rules openai-prompt-injection-direct \
  --ai-analyze-rules openai-excessive-agency-file-deletion \
  --format console

# Generate security test cases (extracts tools, system prompts, generates test cases)
python -m llm_scan.runner \
  . \
  --generate-tests \
  --format console

# Generate test cases with AI enhancement
python -m llm_scan.runner \
  . \
  --generate-tests \
  --enable-ai-filter \
  --ai-provider openai \
  --ai-model gpt-4 \
  --test-max-cases 30

# Generate FastMCP evaluation tests (extract tools, AI generates prompts per tool, write JSON)
python -m llm_scan.runner samples/mcp \
  --generate-eval-tests \
  --eval-test-out eval_tests.json \
  --ai-provider openai \
  --ai-model gpt-4

# Generate LangChain evaluation tests (extract @tool definitions, same AI flow)
python -m llm_scan.runner samples/langchain \
  --generate-eval-tests \
  --eval-framework langchain \
  --eval-test-out eval_tests.json \
  --ai-provider openai \
  --ai-model gpt-4

# Generate LangGraph evaluation tests (extract @tool and ToolNode; includes graph_structure for valid-path evals)
python -m llm_scan.runner samples/langgraph \
  --generate-eval-tests \
  --eval-framework langgraph \
  --eval-test-out eval_tests.json \
  --ai-provider openai \
  --ai-model gpt-4

# Generate LlamaIndex evaluation tests (extract FunctionTool.from_defaults)
python -m llm_scan.runner samples/llama-index \
  --generate-eval-tests \
  --eval-framework llamaindex \
  --eval-test-out eval_tests.json \
  --ai-provider openai \
  --ai-model gpt-4
```
```bash
# Run concrete evals (tool-selection accuracy, valid path rate, tool coverage)
python -m llm_scan.eval --eval-json eval_tests.json \
  --graph samples.langgraph.langgraph_multi_agent_app:graph
```

```python
from llm_scan.config import ScanConfig
from llm_scan.runner import run_scan
from llm_scan.models import Severity

config = ScanConfig(
    paths=["src/"],
    rules_dir="llm_scan/rules/python",
    include_patterns=["*.py"],
    exclude_patterns=["tests/**"],
    severity_filter=[Severity.CRITICAL, Severity.HIGH],
    output_format="json",
)

result = run_scan(config)
for finding in result.findings:
    print(f"{finding.severity}: {finding.message}")
```

```python
from llm_scan.runner import run_scan_for_vscode
from llm_scan.models import ScanRequest, Severity

request = ScanRequest(
    paths=["/workspace/src"],
    rules_dir="/workspace/llm_scan/rules/python",
    include_patterns=["*.py"],
    severity_filter=[Severity.CRITICAL, Severity.HIGH],
    output_format="json",
)

response = run_scan_for_vscode(request)
if response.success:
    # Process response.result.findings
    pass
```

See vscode-integration.md for the complete integration contract.
The scanner includes an optional AI-based false positive filter that uses LLM APIs to analyze Semgrep findings and filter out false positives. This feature helps reduce noise and improve the accuracy of security findings.
- Semgrep Scan: First, Semgrep runs and finds potential vulnerabilities
- AI Analysis: Selected findings are analyzed by an AI model (OpenAI GPT-4 or Anthropic Claude)
- Context-Aware Filtering: AI considers code context, sanitization, framework protections, and exploitability
- Confidence-Based Filtering: Only high-confidence false positives are filtered (configurable threshold)
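The confidence gate in the last step can be sketched as follows. This is an illustrative sketch only: `AiVerdict` and its field names are hypothetical stand-ins for whatever the real implementation in `llm_scan/engine/ai_engine.py` returns.

```python
from dataclasses import dataclass


@dataclass
class AiVerdict:
    finding_id: str
    is_false_positive: bool
    confidence: float  # the model's confidence in its own verdict, 0.0-1.0


def apply_ai_filter(findings, verdicts, threshold=0.7):
    # Drop a finding only when the AI calls it a false positive AND its
    # confidence meets the threshold; anything uncertain is kept.
    fp_ids = {
        v.finding_id
        for v in verdicts
        if v.is_false_positive and v.confidence >= threshold
    }
    return [f for f in findings if f["id"] not in fp_ids]
```

The key design point is that filtering is conservative: a low-confidence "false positive" verdict never suppresses a finding.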
```bash
# Enable AI filtering with OpenAI
python -m llm_scan.runner . \
  --enable-ai-filter \
  --ai-provider openai \
  --ai-model gpt-4 \
  --ai-confidence-threshold 0.7

# Use Anthropic Claude
python -m llm_scan.runner . \
  --enable-ai-filter \
  --ai-provider anthropic \
  --ai-model claude-3-opus-20240229

# Analyze only specific rules (cost optimization)
python -m llm_scan.runner . \
  --enable-ai-filter \
  --ai-analyze-rules openai-prompt-injection-direct \
  --ai-analyze-rules openai-excessive-agency-file-deletion
```

- `--enable-ai-filter`: Enable AI filtering
- `--ai-provider`: Choose provider (`openai` or `anthropic`)
- `--ai-model`: Model name (e.g., `gpt-4`, `gpt-3.5-turbo`, `claude-3-opus-20240229`)
- `--ai-api-key`: API key (or use environment variables)
- `--ai-confidence-threshold`: Confidence threshold (0.0-1.0, default: 0.7)
- `--ai-analyze-rules`: Specific rule IDs to analyze (can be used multiple times)
- AI filtering is optional and disabled by default
- Only analyzes findings with `confidence: "medium"` or `"low"` by default
- Uses caching to avoid re-analyzing identical code patterns
- Processes findings in batches for efficiency
- Estimated cost: ~$0.01-0.10 per analyzed finding
- Recommended for: Medium/low confidence rules, complex patterns, reducing false positives
- Not needed for: High confidence rules, simple patterns, cost-sensitive environments
- Best practice: Start with specific rules (`--ai-analyze-rules`) to test effectiveness
The scanner detects vulnerabilities based on the OWASP Top 10 for LLM Applications:
- LLM01: Prompt Injection - Unsanitized user input in prompts
- LLM02: Insecure Output Handling - LLM output used unsafely (code/command injection, XSS)
- LLM03: Training Data Poisoning - Training data from untrusted sources
- LLM04: Model Denial of Service - Resource exhaustion through excessive tokens/requests
- LLM05: Supply Chain Vulnerabilities - Untrusted models, libraries, or plugins
- LLM06: Sensitive Information Disclosure - Secrets/PII in prompts or responses
- LLM07: Insecure Plugin Design - Plugin execution without authorization/validation
- LLM08: Excessive Agency - LLM granted excessive permissions
- LLM09: Overreliance - Blind trust in LLM output without validation
- LLM10: Model Theft - Unauthorized model access or extraction
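As a minimal illustration of LLM01, the sketch below contrasts splicing untrusted input into the instruction text with keeping it in a separate message role. The helper names are hypothetical; the message-list shape is the standard chat-completions format.

```python
def build_prompt_unsafe(user_input: str) -> str:
    # BAD (LLM01): untrusted text is merged into the trusted instruction,
    # so "ignore previous instructions" becomes part of the prompt itself
    return f"You are a helpful assistant. {user_input}"


def build_messages_safe(user_input: str) -> list:
    # BETTER: keep trusted instructions and untrusted input in separate roles
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_input},
    ]
```

Role separation does not prevent injection on its own, but it preserves the boundary that downstream defenses (and the scanner's taint rules) reason about.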
- Code Injection (CWE-94): LLM output passed to `eval()`, `exec()`, or `compile()`
- Command Injection (CWE-78): LLM output passed to `subprocess.run()`, `subprocess.call()`, `subprocess.Popen()`, or `os.system()`
- XSS (CWE-79): LLM output rendered in HTML without escaping
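A minimal sketch of the code-injection sink above, alongside an allowlist-based alternative. All names here are illustrative, not part of the scanner.

```python
ALLOWED_ACTIONS = {
    "refresh": lambda: "refreshed",
    "status": lambda: "ok",
}


def unsafe_dispatch(llm_output: str):
    # BAD (CWE-94): model output executed as Python; this is the pattern
    # the code-injection rules flag
    return eval(llm_output)


def safe_dispatch(llm_output: str) -> str:
    # BETTER: treat model output as data and map it onto known actions
    action = ALLOWED_ACTIONS.get(llm_output.strip())
    if action is None:
        raise ValueError(f"unknown action: {llm_output!r}")
    return action()
```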
- OpenAI:
  - Legacy API (`openai.ChatCompletion.create`, `openai.Completion.create`)
  - v1 client (`OpenAI().chat.completions.create`)
- Anthropic: `Anthropic().messages.create`
- Generic LLM wrappers: `call_llm()`, `.llm()`, `.generate()`, `.chat()`
Rules are organized by provider in `python/{provider}/generic/` directories.
The scanner includes rules for MCP servers built with the Python MCP SDK (FastMCP). Handler parameters for tools, resources, and prompts are treated as untrusted (client/LLM-controlled) and checked for unsafe use.
Decorators covered:

- `@mcp.tool()` – sync tool handlers
- `@mcp.async_tool()` – async tool handlers
- `@mcp.resource(...)` – resource URI handlers (e.g. `@mcp.resource("file:///docs/(unknown)")`)
- `@mcp.prompt()` – prompt template handlers
Vulnerability rules:

- Code injection – handler params → `eval()` / `exec()` / `compile()`
- Command injection – handler params → `subprocess` / `os.system`
- Path traversal – handler params → `open()` / `Path()` / file ops
- Prompt injection – handler output/params → LLM `messages` / `content`
- SSRF – handler params (URLs) → `requests.get` / `urllib.request.urlopen` / `httpx`
- SQL injection – handler params → raw `cursor.execute()`
Rule pack: `llm_scan/rules/python/mcp/generic/`

Sample servers: `samples/mcp/` – vulnerable examples for each pattern (see samples/mcp/README.md).
```bash
# Scan MCP server code
python -m llm_scan.runner . --rules llm_scan/rules/python/mcp --format console

# Run against included MCP samples
python -m llm_scan.runner samples/mcp --rules llm_scan/rules/python/mcp --format console

# Generate evaluation test cases for FastMCP (prompts that should trigger each tool)
python -m llm_scan.runner samples/mcp --generate-eval-tests --eval-test-out eval_tests.json
```

```
llm_scan/
├── __init__.py
├── models.py                      # Data models (Finding, ScanResult, etc.)
├── config.py                      # Configuration management
├── runner.py                      # Main entry point and CLI
├── engine/
│   ├── semgrep_engine.py          # Semgrep Python SDK integration
│   ├── ai_engine.py               # AI-based false positive filtering
│   ├── ai_providers.py            # AI provider implementations (OpenAI, Anthropic)
│   ├── mcp_extractor.py           # AST-based FastMCP tool extraction (for eval test generation)
│   ├── langchain_extractor.py     # LangChain @tool extraction
│   ├── llamaindex_extractor.py    # LlamaIndex FunctionTool.from_defaults extraction
│   ├── langgraph_extractor.py     # LangGraph StateGraph + ToolNode extraction (tools + graph_structure)
│   └── eval_prompt_generator.py   # AI-powered eval prompt generation (manifest + eval_type mix)
├── eval/
│   └── runner.py                  # Concrete eval runner (tool-selection, valid path, tool coverage)
├── utils/
│   └── code_context.py            # Code context extraction for AI analysis
├── output/
│   ├── sarif.py                   # SARIF formatter
│   ├── json.py                    # JSON formatter
│   └── console.py                 # Console formatter
├── enrich/
│   └── uploader.py                # Optional upload interface
└── rules/
    └── python/                    # Semgrep rule packs
        ├── openai/                # OpenAI-specific rules
        │   ├── generic/           # Framework-agnostic OpenAI rules
        │   │   ├── prompt-injection.yaml
        │   │   ├── code-injection.yaml
        │   │   ├── command-injection.yaml
        │   │   ├── sql-injection.yaml
        │   │   ├── sensitive-info-disclosure.yaml
        │   │   ├── model-dos.yaml
        │   │   ├── overreliance.yaml
        │   │   ├── supply-chain.yaml
        │   │   ├── jailbreak.yaml
        │   │   ├── data-exfiltration.yaml
        │   │   ├── inventory.yaml
        │   │   └── taint-sources.yaml
        │   ├── flask/             # Flask-specific patterns (future)
        │   └── django/            # Django-specific patterns (future)
        ├── anthropic/             # Anthropic-specific rules
        │   └── generic/
        │       ├── prompt-injection.yaml
        │       ├── code-injection.yaml
        │       ├── inventory.yaml
        │       └── taint-sources.yaml
        ├── mcp/                   # MCP (Model Context Protocol) Python SDK / FastMCP
        │   └── generic/
        │       ├── code-injection.yaml
        │       ├── command-injection.yaml
        │       ├── path-traversal.yaml
        │       ├── prompt-injection.yaml
        │       ├── ssrf.yaml
        │       └── sql-injection.yaml
        └── [other providers]/     # Additional LLM providers
```
The repository also includes sample code for testing rules, including vulnerable MCP servers under samples/mcp/ (see samples/mcp/README.md).
- Create a new YAML file in the appropriate location, following the structure:
  - `llm_scan/rules/python/{llm_framework}/generic/` for framework-agnostic rules
  - `llm_scan/rules/python/{llm_framework}/{web_framework}/` for web framework-specific rules
  - Example: `llm_scan/rules/python/openai/generic/my-new-rule.yaml`
```yaml
rules:
  - id: my-new-rule
    pattern: |
      $LLM_OUTPUT = ...
      dangerous_function($LLM_OUTPUT)
    message: "LLM output passed to dangerous function"
    severity: ERROR
    languages: [python]
    metadata:
      category: security
      cwe: "CWE-XXX"
      remediation: "Fix guidance here"
    paths:
      include:
        - "**/*.py"
```

- The rule will be automatically loaded when scanning with the rules directory.
Sinks are dangerous functions that should not receive untrusted LLM output. To add a new sink family:
- Add patterns to existing rule files in the appropriate framework directory:
  - For OpenAI: `llm_scan/rules/python/openai/generic/{vulnerability-type}.yaml`
  - For Anthropic: `llm_scan/rules/python/anthropic/generic/{vulnerability-type}.yaml`
```yaml
rules:
  - id: llm-to-new-sink
    patterns:
      - pattern-either:
          - pattern: dangerous_sink1($LLM_OUTPUT)
          - pattern: dangerous_sink2($LLM_OUTPUT)
      - pattern-inside: |
          $LLM_RESPONSE = $LLM_CALL(...)
          ...
    message: "LLM output flows to dangerous sink"
    severity: ERROR
    languages: [python]
```

- Update the category mapping in `llm_scan/engine/semgrep_engine.py` if needed:

```python
CATEGORY_MAP = {
    "code-injection": Category.CODE_INJECTION,
    "command-injection": Category.COMMAND_INJECTION,
    "llm-to-new-sink": Category.OTHER,  # Add your category
}
```

LLM providers are sources of taint. To add support for a new LLM provider:
- Create the provider directory structure:

  ```bash
  mkdir -p llm_scan/rules/python/{new_provider}/generic
  ```

- Add taint source patterns to `llm_scan/rules/python/{new_provider}/generic/taint-sources.yaml`:
```yaml
rules:
  - id: llm-taint-new-provider
    patterns:
      - pattern: $CLIENT.new_provider_api(...)
      - pattern-inside: |
          $RESPONSE = ...
          $CONTENT = $RESPONSE.output
          ...
          $SINK($CONTENT)
    message: "New provider API response content flows to dangerous sink"
    severity: WARNING
    languages: [python]
```

- Add complete taint flow rules in `llm_scan/rules/python/{new_provider}/generic/code-injection.yaml` or similar:
```yaml
rules:
  - id: llm-to-sink-new-provider
    patterns:
      - pattern: |
          $RESPONSE = $CLIENT.new_provider_api(...)
      - pattern: |
          $CONTENT = $RESPONSE.output
      - pattern: |
          dangerous_sink($CONTENT)
    message: "New provider LLM output flows to dangerous sink"
    severity: ERROR
```

```bash
# Basic scan of the current directory
python -m llm_scan.runner --paths . --format console

# Scan with custom rules and JSON output
python -m llm_scan.runner \
  --paths src/ \
  --rules /path/to/custom/rules \
  --format json \
  --out results.json

# Exclude paths
python -m llm_scan.runner \
  --paths . \
  --exclude 'tests/**' \
  --exclude '**/__pycache__/**' \
  --exclude '.venv/**'
```

See .github/workflows/llm-scan.yml for a complete example.
The workflow:
- Checks out code
- Sets up Python
- Installs dependencies (semgrep, pytest)
- Runs the scanner
- Uploads SARIF results to GitHub Code Scanning
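A minimal workflow along these lines might look like the following sketch (the action versions and Python version are assumptions; .github/workflows/llm-scan.yml in the repository is the authoritative file):

```yaml
name: llm-scan
on: [push, pull_request]

jobs:
  scan:
    runs-on: ubuntu-latest
    permissions:
      security-events: write  # required for SARIF upload
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install semgrep pytest && pip install -e .
      - run: python -m llm_scan.runner --paths . --format sarif --out results.sarif
      - uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif
```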
The scanner can be integrated into any CI system that supports Python:
```bash
pip install semgrep
pip install -e .
python -m llm_scan.runner \
  --paths . \
  --format sarif \
  --out results.sarif
```

Run tests with pytest:

```bash
pip install pytest
pytest tests/
```

The tests/fixtures/ directory contains:
- Positive tests (`positive/`): Files that should trigger findings
- Negative tests (`negative/`): Files that should not trigger findings
```bash
# Test engine only
pytest tests/test_engine.py

# Test output formatters
pytest tests/test_output.py

# Test with verbose output
pytest -v tests/
```

For GitHub Code Scanning integration:

```bash
python -m llm_scan.runner --paths . --format sarif --out results.sarif
```

For programmatic consumption:

```bash
python -m llm_scan.runner --paths . --format json --out results.json
```

Human-readable output (default):

```bash
python -m llm_scan.runner --paths . --format console
```

- `paths`: List of paths to scan (files or directories)
- `rules_dir`: Path to the rules directory
- `include_patterns`: Glob patterns for files to include
- `exclude_patterns`: Glob patterns for files to exclude
- `enabled_rules`: List of rule IDs to enable (None = all)
- `disabled_rules`: List of rule IDs to disable
- `severity_filter`: List of severity levels to include
- `output_format`: `"sarif"`, `"json"`, or `"console"`
- `output_file`: Output file path (optional for console)
- `respect_gitignore`: Whether to respect .gitignore (default: True)
- `max_target_bytes`: Maximum file size to scan (default: 1MB)
- `enable_eval_test_generation`: Generate evaluation test cases (default: False)
- `eval_test_output`: Path to write eval test JSON (used with `--generate-eval-tests`)
- `eval_test_max_prompts_per_tool`: Max prompts per tool for eval generation (default: 3)
- `eval_framework`: Framework for tool extraction: `mcp`, `langchain`, `llamaindex`, or `langgraph` (used with `--generate-eval-tests`)
Rule packs are defined by:
- Name
- Languages supported
- Path to rule files
- Version
- Default enabled status
Future versions will support:
- Entrypoint-based rule pack discovery
- Local folder configuration
- Remote rule pack fetching (with offline fallback)
- Incremental Scanning: Only scan changed files when possible
- File Size Limits: Large files (>1MB) are skipped by default
- Gitignore Support: Automatically excludes files in .gitignore
- Parallel Execution: Semgrep handles parallel rule execution internally
- Initial rule set focuses on Python vulnerabilities
- JavaScript/TypeScript rules are planned but not yet implemented
- Dataflow analysis is limited to Semgrep's taint tracking capabilities
- Some complex taint flows may require multiple rule passes
- Add test fixtures for new vulnerability patterns
- Create Semgrep YAML rules following existing patterns
- Update documentation
- Add tests for new functionality
[Specify your license here]
- Built on Semgrep
- Inspired by security scanning tools like CodeQL and Bandit