54 changes: 48 additions & 6 deletions .github/workflows/integrity-filtering-audit.md
@@ -39,6 +39,8 @@ safe-outputs:
timeout-minutes: 20
features:
difc-proxy: true
imports:
- shared/mcp-api-routing.md
Comment on lines +42 to +43

Copilot AI Apr 4, 2026

Importing shared/mcp-api-routing.md makes this workflow explicitly require MCP tools for GitHub API access, but the procedure in this same document instructs using gh run list / gh run download (direct GitHub API calls). This creates conflicting guidance for the agent. Consider updating the procedure to use the GitHub MCP server tools for listing runs/downloading artifacts, or adjust the imported constraint to explicitly allow the needed gh usage for this audit workflow.

Suggested change
imports:
- shared/mcp-api-routing.md

---

# Integrity Filtering Audit
@@ -60,6 +62,9 @@ Common problems to look for:
- **Unscoped integrity tags** (e.g., `approved` instead of `approved:owner/repo`)
- **Empty responses** where data was expected (over-filtering)
- **Search result leaks** where out-of-scope items appear in filtered results
- **Direct API bypass attempts** where an agent contacts `api.github.com`, `github.com`,
or external AI services (e.g., `chatgpt.com`, `openai.com`) without going through
the MCP Gateway — these show up as network firewall blocks in the job logs
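
To look for unscoped integrity tags in captured logs, a rough text search can help. The sketch below uses a hypothetical helper, `count_unscoped_tags`, that is not part of the workflow or the gateway. It assumes tags are serialized as quoted JSON strings (plain, or backslash-escaped inside an embedded JSON payload) and that you pass the bare tag names to look for, such as `approved`. A quoted JSON key that happens to share a tag's name would also be counted, so treat the output as leads to inspect rather than confirmed findings.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the workflow): count integrity tags that
# appear WITHOUT an ":owner/repo" scope suffix.
# Assumption: tags are logged as quoted JSON strings, e.g. "approved" or,
# inside an embedded JSON payload, \"approved\".
# Usage: count_unscoped_tags TAG [TAG...] < logfile
count_unscoped_tags() {
  [ "$#" -gt 0 ] || return 1
  local pattern
  # Join the tag names into an ERE alternation, e.g. approved|other
  pattern=$(IFS='|'; printf '%s' "$*")
  # Match a quote, the bare tag, an optional backslash, then a closing quote.
  # A scoped tag such as "approved:owner/repo" has ':' there and is skipped.
  grep -oE '"('"$pattern"')\\?"' \
    | tr -d '\\"' \
    | sort | uniq -c
}
```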

## Procedure

@@ -110,6 +115,17 @@ For each downloaded artifact set, check:
5. **Scope violations**: Check if any response contains data from repositories
NOT in the workflow's `allowed-repos` policy.

6. **Direct API bypass attempts**: Search job logs and stderr for network firewall
blocks that reveal the agent trying to reach external domains directly instead
of through the MCP Gateway. Key domains to flag:
- `api.github.com` — GitHub API (must go through MCP Gateway, not curl/fetch)
- `github.com` — GitHub web (should not be contacted directly)
- `chatgpt.com`, `openai.com`, `api.openai.com` — external AI services
- Any other non-allowlisted HTTP endpoint

For each block, record: the blocked domain, the number of block events, which
workflow run, and what step appears to have triggered it.
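
Check 5 (scope violations) can be approximated with a text scan of the captured RPC messages. The helper below, `find_out_of_scope_repos`, is a hypothetical illustration rather than an existing tool. It assumes that repository-bearing responses include a `full_name` field of the form `owner/repo`, either as plain JSON or backslash-escaped inside a tool result's text payload, and it takes the workflow's `allowed-repos` entries as arguments. Responses that identify repositories some other way (for example, only through URLs) will not be caught.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list repositories that appear in MCP responses but are
# not in the workflow's allowed-repos policy.
# Assumption: responses carry "full_name":"owner/repo" (plain JSON) or
# \"full_name\":\"owner/repo\" (JSON embedded in a tool result string).
# Usage: find_out_of_scope_repos owner/repo1 owner/repo2 < rpc-messages.jsonl
find_out_of_scope_repos() {
  local allowed repo
  allowed=$(printf '%s\n' "$@")
  grep -oE 'full_name\\?"[[:space:]]*:[[:space:]]*\\?"[^"\\]+' \
    | sed -E 's/.*"//' \
    | sort -u \
    | while IFS= read -r repo; do
        # GitHub repository names are case-insensitive, so compare with -i.
        if ! grep -qxiF -- "$repo" <<<"$allowed"; then
          printf '%s\n' "$repo"
        fi
      done
}
```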

```bash
# Example: Count DIFC events in JSONL
grep -c 'difc_integrity' "$TMPDIR"/*/mcp-logs/rpc-messages.jsonl 2>/dev/null || echo "0"
@@ -119,6 +135,16 @@ grep -iE 'error|failed|blocked|unknown|wasm error:|WASM guard trap' "$TMPDIR"/*/

# Example: Specifically search for WASM guard panics
grep -iE 'wasm error:|WASM guard trap|unreachable' "$TMPDIR"/*/mcp-logs/mcp-gateway.log 2>/dev/null

# Example: Detect direct API bypass attempts in job logs
# The network firewall logs blocked connections; search agent stderr/stdout for clues
grep -iE 'api\.github\.com|chatgpt\.com|openai\.com|curl.*https?://[^ ]*github|fetch.*https?://[^ ]*github' \
"$TMPDIR"/*/mcp-logs/*.log 2>/dev/null | head -30

# Example: Summarize firewall blocks by domain from network-firewall logs (if present)
grep -iE 'BLOCK|DENY|firewall' "$TMPDIR"/*/mcp-logs/*.log 2>/dev/null \
| grep -oE '(api\.github\.com|github\.com|chatgpt\.com|openai\.com|[a-z0-9.-]+\.[a-z]{2,})' \
Comment on lines +145 to +146

Copilot AI Apr 4, 2026

The firewall-block summarization command extracts "domains" using a very broad regex ([a-z0-9.-]+\.[a-z]{2,}) which will also match non-domains commonly present in logs (e.g., file names like mcp-gateway.log, rpc-messages.jsonl, etc.), producing noisy/misleading counts. Suggest tightening extraction to hostnames from URLs (e.g., extract after https?://) or matching the firewall log’s structured destination field if available.

Suggested change
grep -iE 'BLOCK|DENY|firewall' "$TMPDIR"/*/mcp-logs/*.log 2>/dev/null \
| grep -oE '(api\.github\.com|github\.com|chatgpt\.com|openai\.com|[a-z0-9.-]+\.[a-z]{2,})' \
# Extract only URL hostnames to avoid counting filenames or other dotted log tokens as domains
grep -iE 'BLOCK|DENY|firewall' "$TMPDIR"/*/mcp-logs/*.log 2>/dev/null \
| grep -oE 'https?://[^/[:space:]]+' \
| sed -E 's#^https?://##; s#:[0-9]+$##' \

| sort | uniq -c | sort -rn | head -20
```
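
Following the review suggestion on the domain summary, the count can be restricted to URL hostnames so that file names such as `mcp-gateway.log` are not reported as domains. This is a minimal sketch wrapped in a hypothetical function, `summarize_blocked_hosts`, that reads log text on stdin. It assumes blocked entries contain a full `http(s)://host[:port]` URL; if the network firewall logs only `host:port` or a structured destination field, the extraction step would need to change.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count blocked destination hosts in firewall-related log
# lines, taking hostnames only from http(s) URLs (ports stripped, lowercased).
# Usage: cat "$TMPDIR"/*/mcp-logs/*.log | summarize_blocked_hosts | head -20
summarize_blocked_hosts() {
  grep -iE 'BLOCK|DENY|firewall' \
    | grep -oE 'https?://[^/[:space:]"]+' \
    | sed -E 's#^https?://##; s#:[0-9]+$##' \
    | tr '[:upper:]' '[:lower:]' \
    | sort | uniq -c | sort -rn
}
```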

### Step 4: Classify Findings
@@ -127,9 +153,20 @@ Classify each finding by severity:
- 🔴 **Critical**: Data leak (out-of-scope data returned), guard bypass, or
labeling failure that could expose unauthorized data
- 🟡 **Warning**: Over-filtering (legitimate data blocked), unscoped tags,
zero DIFC events in a run that should have filtering, or WASM guard trap
zero DIFC events in a run that should have filtering, WASM guard trap, or
**direct API bypass attempt** (agent contacted `api.github.com`, `github.com`,
or an external AI service such as `chatgpt.com` / `openai.com` directly instead
of routing through the MCP Gateway — visible as network firewall blocks)
- 🟢 **Info**: Normal filtering behavior, expected blocks, or configuration notes

When classifying a **direct API bypass** warning (W-1), record:
- The blocked domain(s) and block count
- The workflow name and run ID
- The likely cause: misconfigured `network.allowed` list, agent prompt not
restricting tool use, or the workflow missing `features.difc-proxy: true`
- Recommended fix: strengthen agent system prompt to use MCP Gateway tools
exclusively; see `shared/mcp-api-routing.md` for reusable constraint language
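
The mechanical parts of a W-1 record (domain, block count, workflow, run) can be assembled from per-host counts. The sketch below is a hypothetical helper, `emit_w1_records`, that reads `COUNT HOST` lines (the shape `uniq -c` produces), takes the workflow name and run ID as arguments, and prints one tab-separated record per host with a coarse category. The likely cause and recommended fix still require judgment, so the sketch leaves them out.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn "COUNT HOST" lines into tab-separated W-1 records:
#   W-1 <workflow> <run-id> <host> <count> <category>
# Usage: ... | uniq -c | emit_w1_records "<workflow name>" "<run id>"
emit_w1_records() {
  local workflow=$1 run_id=$2 count host kind
  while read -r count host; do
    [ -n "$host" ] || continue  # skip blank or malformed lines
    case "$host" in
      api.github.com|github.com) kind="GitHub accessed outside MCP Gateway" ;;
      chatgpt.com|openai.com|*.openai.com) kind="external AI service" ;;
      *) kind="other non-allowlisted endpoint" ;;
    esac
    printf 'W-1\t%s\t%s\t%s\t%s\t%s\n' "$workflow" "$run_id" "$host" "$count" "$kind"
  done
}
```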

### Step 5: Create Summary Issue

Create an issue with the audit results using the following structure:
@@ -159,7 +196,8 @@ Create an issue with the audit results using the following structure:
<details>
<summary><b>Warnings</b></summary>

[Details of each warning]
[Details of each warning — for direct API bypass (W-1) warnings include: blocked
domain(s), block count, workflow name, likely cause, and recommended fix]

</details>

@@ -172,13 +210,17 @@ Create an issue with the audit results using the following structure:

### Runs Analyzed

| Run | Workflow | Branch | DIFC Events | Filtered | Status |
|-----|----------|--------|-------------|----------|--------|
| [§ID](run_url) | name | branch | N | N | ✅/⚠️/❌ |
| Run | Workflow | Branch | Agent Invoked | DIFC Events | Firewall Blocks | Status |
|-----|----------|--------|---------------|-------------|-----------------|--------|
| [§ID](run_url) | name | branch | ✅/❌ early-exit | N | N/total | ✅/⚠️/❌ |

### Recommendations

[Actionable suggestions based on findings]
[Actionable suggestions based on findings. For direct API bypass (W-1) findings,
always include: 1) which workflow to investigate, 2) whether it has
`features.difc-proxy: true`, 3) whether the agent prompt restricts tool use to
MCP Gateway tools, and 4) a pointer to `shared/mcp-api-routing.md` for reusable
constraint language to add to the workflow prompt.]
```
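
To keep rows of the *Runs Analyzed* table consistent, a small formatter can be used. This is a sketch with a hypothetical name, `runs_table_row`, and an assumed status rule: ❌ when a critical finding exists, ⚠️ when the run had any firewall blocks (a W-1 warning) or when the agent ran but produced zero DIFC events, and ✅ otherwise. The `total` argument is whatever denominator the audit uses for the Firewall Blocks column (for example, all firewall events logged for the run); the template itself does not pin that down.

```bash
#!/usr/bin/env bash
# Hypothetical formatter for one row of the "Runs Analyzed" table.
# Args: id url workflow branch invoked(yes|no) difc_events blocks total critical(yes|no)
runs_table_row() {
  local id=$1 url=$2 workflow=$3 branch=$4 invoked=$5
  local difc=$6 blocks=$7 total=$8 critical=$9 agent status
  if [ "$invoked" = yes ]; then agent="✅"; else agent="❌ early-exit"; fi
  # Assumed rule: critical finding > warning (blocks, or agent ran with no DIFC events) > ok
  if [ "$critical" = yes ]; then
    status="❌"
  elif [ "$blocks" -gt 0 ] || { [ "$invoked" = yes ] && [ "$difc" -eq 0 ]; }; then
    status="⚠️"
  else
    status="✅"
  fi
  printf '| [§%s](%s) | %s | %s | %s | %s | %s/%s | %s |\n' \
    "$id" "$url" "$workflow" "$branch" "$agent" "$difc" "$blocks" "$total" "$status"
}
```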

If there are no findings (all runs look healthy), still create the issue with
55 changes: 55 additions & 0 deletions .github/workflows/shared/mcp-api-routing.md
@@ -0,0 +1,55 @@
---
# MCP Gateway API routing constraints — import this in any workflow that makes
# GitHub API calls to ensure the agent is reminded to use MCP tools exclusively.
---

## ⚠️ IMPORTANT: GitHub API Routing Constraint

**All GitHub API calls MUST be made exclusively through the MCP Gateway's GitHub
MCP server tools.** Direct network access to `api.github.com`, `github.com`, or
any external service is not permitted and will be blocked by the network firewall.

Copilot AI Apr 4, 2026

The statement that direct access to api.github.com/github.com "will be blocked by the network firewall" isn’t consistently true in this repo (some workflows explicitly allow these domains, e.g. shared/gh.md allows api.github.com). Suggest rewording to reflect policy/constraint (e.g., "not permitted / will be flagged") rather than guaranteeing a firewall block, so the guidance stays accurate across workflows.

Suggested change
any external service is not permitted and will be blocked by the network firewall.
any external service is not permitted; attempts to bypass MCP routing may be
flagged or blocked depending on workflow policy.

### Correct Usage

Use the provided MCP tools (e.g., `github-mcp-server` toolset) for all GitHub
operations:

```
✅ Use github-mcp-server list_issues with owner=..., repo=...
✅ Use github-mcp-server get_file_contents with owner=..., repo=..., path=...
✅ Use github-mcp-server list_workflow_runs with owner=..., repo=...
```

### Incorrect Usage

Do NOT use `curl`, `wget`, `fetch`, or any other HTTP client to contact GitHub's
APIs directly. Do NOT attempt to contact external AI services:

```
❌ curl https://api.github.com/repos/... (blocked — use MCP tools)
❌ gh api /repos/... (blocked — use MCP tools)
❌ fetch("https://api.github.com/...") (blocked — use MCP tools)
❌ curl https://chatgpt.com/... (blocked — external service)
❌ curl https://api.openai.com/... (blocked — external service)
```

### Why This Matters

- The MCP Gateway applies **DIFC (Decentralized Information Flow Control)**
integrity and secrecy labels to all GitHub API responses, enforcing scope
restrictions and preventing data leaks.
- Direct API calls bypass DIFC enforcement entirely, making it impossible to
audit what data the agent accessed or ensure scope compliance.
- Direct calls to external AI services (e.g., ChatGPT) are out-of-scope and
constitute a security boundary violation; all reasoning must happen inside
the Copilot engine provided by the workflow runtime.
- Network firewall blocks from bypass attempts are **audited** by the Integrity
Filtering Audit workflow and will be flagged as W-1 warnings.

### Checklist

Before making any API call, verify:
1. ✅ Am I using a GitHub MCP server tool (not `curl`, `gh`, or HTTP fetch)?
2. ✅ Is the target repository in the workflow's `allowed-repos` list?
3. ✅ Is `features.difc-proxy: true` enabled in this workflow's configuration?
4. ✅ Am I NOT trying to contact any external AI service API?
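
Checklist item 3 can also be verified from outside the agent, for instance by an auditor following up on a W-1 finding. A rough sketch, assuming workflows are the Markdown files under `.github/workflows/`; `check_workflow_routing` is a hypothetical name, and it does a line-level text match rather than parsing the YAML frontmatter, so it will not notice a `difc-proxy: true` line that sits outside `features:`.

```bash
#!/usr/bin/env bash
# Hypothetical audit helper: for each workflow file, report whether it enables
# difc-proxy and lists shared/mcp-api-routing.md as an import entry.
# Line-level text match only; the YAML frontmatter is not parsed.
# Usage: check_workflow_routing .github/workflows/*.md
check_workflow_routing() {
  local f difc routing
  for f in "$@"; do
    difc=no; routing=no
    grep -qE '^[[:space:]]*difc-proxy:[[:space:]]*true([[:space:]#]|$)' "$f" && difc=yes
    # Require a YAML list entry so prose mentions of the file are not counted.
    grep -qE '^[[:space:]]*-[[:space:]]*shared/mcp-api-routing\.md' "$f" && routing=yes
    printf '%s\tdifc-proxy=%s\tmcp-api-routing=%s\n' "$f" "$difc" "$routing"
  done
}
```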