feat(github-mcp): add 89-op parity case study, harness, and full-toolset adapter#41
feat(github-mcp): add 89-op parity case study, harness, and full-toolset adapter#41mickdarling wants to merge 1 commit into
Conversation
…set adapter This commit lands three deliverables produced during the GitHub MCP parity case study (worked example for MCP-AQL): 1. CASE-STUDY.md — measured comparison of the official GitHub MCP server vs the MCP-AQL adapter wrapping it. Headlines: 98.3% reduction in tool-discovery overhead (271,152 → 4,591 bytes), 89/89 operations behaviorally equivalent under the parity harness, 0 unprovoked adapter failures. Reproducibility commands included. 2. parity-harness/ — the GitHub-specific test Suite for the generic mcpaql-parity runner (PR'd at MCPAQL/tools#20). Defines 89 operation arg builders, throwaway-repo fixture lifecycle (create/delete via gh CLI, plus paired gists, discussion-target lookup, workflow-id resolution, etc.), and the Suite wiring. README explains the pattern so others can write suites for their own adapters. 3. generated/github-mcp-all/ — discovery capture, generated schema, and generated adapter for the full GitHub MCP toolset (89 ops, X-MCP-Toolsets: all). Mirrors the existing generated/github-mcp/ layout. Built from the now-fixed adapter-generator (header propagation patch, MCPAQL/adapter-generator#32) — no manual edits. Also: WEBSITE-POST.md is the marketing-shaped writeup that drives the MCP-AQL website case-studies page (PR'd at MCPAQL/website#31). Two .gitignore additions in subdirs to keep node_modules, dist, and local-run artifacts (fixtures-*.json, parity-*.json) out of the tree without affecting the repo-root .gitignore. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4b5aa40807
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| "danger_level": "safe", | ||
| "requires_confirmation": false, | ||
| "non_idempotent": false |
There was a problem hiding this comment.
Do not mark notification mutations read-only
When mark_all_notifications_read is exposed from the read bucket with these safety flags, server.ts advertises the entire mcp_aql_read tool with readOnlyHint for endpoint === "read". Calling this operation without owner/repo marks all notifications for the authenticated user as read, so clients or approval policies that allow read-only tools can mutate real user state; please move it to a mutating endpoint and mark it non-idempotent/destructive accordingly.
Useful? React with 👍 / 👎.
| { | ||
| name: "get_copilot_job_status", | ||
| category: "COPILOT", | ||
| args: () => ({ owner: "mickdarling", repo: "mcpaql-parity-test-fake", session_id: "fake" }), |
There was a problem hiding this comment.
Pass id to Copilot status probe
The generated schema for get_copilot_job_status requires an id parameter, but this fixture sends session_id. When the COPILOT category is enabled, both sides will exercise the upstream missing-id validation path instead of the intended no-job 404 path described by the note, letting the report claim parity without testing the actual operation behavior.
Useful? React with 👍 / 👎.
…ink check Addresses review feedback on PR #31 plus the Codex P2 finding about broken receipt URLs. Five things in one commit: 1. CONTENT REWRITES (case-study HTML) - "Why The Adapter Footprint Stays Flat": split the wall of text into two side-by-side cards (adapter / official) plus a callout. Each side gets a bullet list of properties. - "What The First Pass Missed": chunked into three short cards (the mistake / what was missing / the fix), with the 15 missing toolset categories laid out in a 3-column bullet grid instead of buried in prose. - "A Generator Bug Surfaced By The Test Harness": split into two cards (the bug / the fix) plus the callout about why parity testing matters. - "The Harness Is Now A Public Tool": restructured into two cards. First lists all five MCPAQL CLIs as bullets with one-line descriptions. Second shows the parity-runner usage pattern with the load-bearing "harness cannot lie about coverage" property called out as bullets. - NEW SECTION: "Per-Case Detail (All 32 Non-Equivalent Outcomes)". Replaces the earlier card summaries with explicit per-case listings of every BOTH_ERROR (12), SKIPPED (17, in four sub-buckets by reason), DIVERGENT (2), and the one paired-fixture timing edge case. Designed to allay any "are we hand-waving" concerns by naming each case and its root cause. 2. RECEIPT URL FIXES (3 broken GitHub links Codex flagged) - MCPAQL/tools — was returning 404 (private repo). Now public. - MCPAQL/examples/blob/main/...CASE-STUDY.md — was 404 (path didn't exist on main). Now points at /tree/develop/ where the case-study work lives via PR #41. - MCPAQL/examples/tree/main/.../parity-harness — same fix. 3. LYCHEE LINK-CHECK HARDENING (.lychee.toml) The existing config had `offline = true`, which silently skipped ALL external URL validation. That's why three CI runs on this PR passed despite linking to dead URLs. Codex caught what our own workflow couldn't. Now: offline = false, with cache + accept = [200, 203, 206, 429] to keep CI fast and tolerate rate-limit responses on flaky hosts. The workflow already passes GITHUB_TOKEN so authenticated GitHub URLs don't 429 under load. 4. SEARCH INDEX REGEN (public/data/search-index.json) Codex P2 finding: I added the case-study entry to source/search-index.base.json but the deployed search.html fetches public/data/search-index.json (which only the doc generator produces). Without regenerating the deployed JSON, the search entry never shipped. Re-ran the generator; the case-study entry now appears in the deployed artifact. 5. NOT INCLUDED (deliberate) - public/spec/research/adversarial-input-security.html: incidental new mirror page from running the doc generator. Unrelated to this PR; should land via the regular spec-sync cycle. - Spec/adapter-reference mirror page diffs: same reason as above. Companion PRs (so the receipts links resolve): - MCPAQL/tools#20 — adds mcpaql-parity to the public toolchain - MCPAQL/examples#41 — pushes CASE-STUDY.md, parity-harness/, and the regenerated github-mcp-all/ adapter Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ilure Two P2 findings from Codex review on #20 plus one adjacent log-guard correction surfaced by the header-merge fix. 1. Schema headers no longer silently drop suite-supplied headers. The previous `schema.headers ?? suite.upstreamHeaders` meant any captured-at-discovery schema header — even one — discarded the suite.upstreamHeaders contract entirely. Merged instead, with schema headers taking precedence (captured-at-discovery remains authoritative when the same key appears on both sides). 2. Client connections moved inside the try block that wraps the ops loop. Previously, if setupFixtures() succeeded but official.connect() or mcpaql.connect() threw, execution never reached the try/finally and fixture state was leaked. Now both connects run inside the try, and connection cleanup tracks officialConnected / mcpaqlConnected booleans so close() only runs against clients that actually connected. 3. (Adjacent) `if (officialExtraHeaders)` log guard updated to `Object.keys(officialExtraHeaders).length > 0` because the merge fix above makes officialExtraHeaders always defined (empty object when neither side supplies headers). Build clean, 23/23 existing tests still pass. No new tests added in this PR — pure-function tests for normalize/maskVariantTokens/classify are reasonable follow-up work, and end-to-end is exercised by the GitHub MCP parity harness in MCPAQL/examples#41. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Lands the GitHub MCP parity case study and supporting artifacts. Three deliverables:
generated/github-mcp/CASE-STUDY.md— measured comparison of the official GitHub MCP server vs the MCP-AQL adapter wrapping it. Headlines: 98.3% reduction in tool-discovery overhead (271,152 → 4,591 bytes), 89/89 operations behaviorally equivalent, 0 unprovoked adapter failures. Reproducibility commands included.generated/github-mcp/parity-harness/— the GitHub-specific test Suite for the genericmcpaql-parityrunner (MCPAQL/tools#20). Defines 89 operation arg builders, throwaway-repo fixture lifecycle, paired gists, public-discussion lookup, workflow-id resolution, and the Suite wiring. README explains the pattern so others can write suites for their own adapters.generated/github-mcp-all/— discovery capture, generated schema, and generated adapter for the full GitHub MCP toolset (89 ops,X-MCP-Toolsets: all). Mirrors the existinggenerated/github-mcp/layout. Built from the now-fixed adapter-generator (header propagation patch, MCPAQL/adapter-generator#32) — no manual edits.WEBSITE-POST.mdis the marketing-shaped writeup that drives the MCP-AQL website case-studies page (MCPAQL/website#31).What's not in this PR
mcpaql-parityrunner itself lives inMCPAQL/tools(PR #20). This PR just consumes it as a library.generated/airpods-mcp/andgenerated/shortcut-remote-mcp/artifacts I see locally are someone else's work-in-progress on a different branch — not part of this PR.Test plan
OperationSpec<F>interface.gitignoreadditions in subdirs keepnode_modules,dist, and local-run artifacts (fixtures-*.json,parity-*.json) out of the treeRelated PRs
mcpaql-parityrunner🤖 Generated with Claude Code