Skip to content

feat(github-mcp): add 89-op parity case study, harness, and full-toolset adapter#41

Open
mickdarling wants to merge 1 commit into
developfrom
feat/github-mcp-case-study
Open

feat(github-mcp): add 89-op parity case study, harness, and full-toolset adapter#41
mickdarling wants to merge 1 commit into
developfrom
feat/github-mcp-case-study

Conversation

@mickdarling
Copy link
Copy Markdown
Contributor

Summary

Lands the GitHub MCP parity case study and supporting artifacts. Three deliverables:

  1. generated/github-mcp/CASE-STUDY.md — measured comparison of the official GitHub MCP server vs the MCP-AQL adapter wrapping it. Headlines: 98.3% reduction in tool-discovery overhead (271,152 → 4,591 bytes), 89/89 operations behaviorally equivalent, 0 unprovoked adapter failures. Reproducibility commands included.

  2. generated/github-mcp/parity-harness/ — the GitHub-specific test Suite for the generic mcpaql-parity runner (MCPAQL/tools#20). Defines 89 operation arg builders, throwaway-repo fixture lifecycle, paired gists, public-discussion lookup, workflow-id resolution, and the Suite wiring. README explains the pattern so others can write suites for their own adapters.

  3. generated/github-mcp-all/ — discovery capture, generated schema, and generated adapter for the full GitHub MCP toolset (89 ops, X-MCP-Toolsets: all). Mirrors the existing generated/github-mcp/ layout. Built from the now-fixed adapter-generator (header propagation patch, MCPAQL/adapter-generator#32) — no manual edits.

WEBSITE-POST.md is the marketing-shaped writeup that drives the MCP-AQL website case-studies page (MCPAQL/website#31).

What's not in this PR

  • The mcpaql-parity runner itself lives in MCPAQL/tools (PR #20). This PR just consumes it as a library.
  • The generated/airpods-mcp/ and generated/shortcut-remote-mcp/ artifacts I see locally are someone else's work-in-progress on a different branch — not part of this PR.

Test plan

  • Parity harness builds clean
  • All 89 operation arg builders compile against the runner's OperationSpec<F> interface
  • End-to-end run produces the documented totals (43 IDENTICAL, 14 STRUCTURAL_PARITY, 12 BOTH_ERROR, 17 SKIPPED, 2 DIVERGENT, 1 paired-fixture timing edge case, 0 unprovoked adapter failures)
  • Two .gitignore additions in subdirs keep node_modules, dist, and local-run artifacts (fixtures-*.json, parity-*.json) out of the tree

Related PRs

🤖 Generated with Claude Code

…set adapter

This commit lands three deliverables produced during the GitHub MCP
parity case study (worked example for MCP-AQL):

1. CASE-STUDY.md — measured comparison of the official GitHub MCP
   server vs the MCP-AQL adapter wrapping it. Headlines: 98.3%
   reduction in tool-discovery overhead (271,152 → 4,591 bytes), 89/89
   operations behaviorally equivalent under the parity harness, 0
   unprovoked adapter failures. Reproducibility commands included.

2. parity-harness/ — the GitHub-specific test Suite for the generic
   mcpaql-parity runner (PR'd at MCPAQL/tools#20). Defines 89 operation
   arg builders, throwaway-repo fixture lifecycle (create/delete via
   gh CLI, plus paired gists, discussion-target lookup, workflow-id
   resolution, etc.), and the Suite wiring. README explains the
   pattern so others can write suites for their own adapters.

3. generated/github-mcp-all/ — discovery capture, generated schema,
   and generated adapter for the full GitHub MCP toolset (89 ops,
   X-MCP-Toolsets: all). Mirrors the existing generated/github-mcp/
   layout. Built from the now-fixed adapter-generator (header
   propagation patch, MCPAQL/adapter-generator#32) — no manual edits.

Also: WEBSITE-POST.md is the marketing-shaped writeup that drives the
MCP-AQL website case-studies page (PR'd at MCPAQL/website#31).

Two .gitignore additions in subdirs to keep node_modules, dist, and
local-run artifacts (fixtures-*.json, parity-*.json) out of the tree
without affecting the repo-root .gitignore.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4b5aa40807

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1577 to +1579
"danger_level": "safe",
"requires_confirmation": false,
"non_idempotent": false
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Do not mark notification mutations read-only

When mark_all_notifications_read is exposed from the read bucket with these safety flags, server.ts advertises the entire mcp_aql_read tool with readOnlyHint for endpoint === "read". Calling this operation without owner/repo marks all notifications for the authenticated user as read, so clients or approval policies that allow read-only tools can mutate real user state; please move it to a mutating endpoint and mark it non-idempotent/destructive accordingly.

Useful? React with 👍 / 👎.

{
name: "get_copilot_job_status",
category: "COPILOT",
args: () => ({ owner: "mickdarling", repo: "mcpaql-parity-test-fake", session_id: "fake" }),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Pass id to Copilot status probe

The generated schema for get_copilot_job_status requires an id parameter, but this fixture sends session_id. When the COPILOT category is enabled, both sides will exercise the upstream missing-id validation path instead of the intended no-job 404 path described by the note, letting the report claim parity without testing the actual operation behavior.

Useful? React with 👍 / 👎.

mickdarling added a commit to MCPAQL/website that referenced this pull request May 14, 2026
…ink check

Addresses review feedback on PR #31 plus the Codex P2 finding about
broken receipt URLs. Five things in one commit:

1. CONTENT REWRITES (case-study HTML)

   - "Why The Adapter Footprint Stays Flat": split the wall of text
     into two side-by-side cards (adapter / official) plus a callout.
     Each side gets a bullet list of properties.

   - "What The First Pass Missed": chunked into three short cards
     (the mistake / what was missing / the fix), with the 15 missing
     toolset categories laid out in a 3-column bullet grid instead of
     buried in prose.

   - "A Generator Bug Surfaced By The Test Harness": split into two
     cards (the bug / the fix) plus the callout about why parity
     testing matters.

   - "The Harness Is Now A Public Tool": restructured into two cards.
     First lists all five MCPAQL CLIs as bullets with one-line
     descriptions. Second shows the parity-runner usage pattern with
     the load-bearing "harness cannot lie about coverage" property
     called out as bullets.

   - NEW SECTION: "Per-Case Detail (All 32 Non-Equivalent Outcomes)".
     Replaces the earlier card summaries with explicit per-case
     listings of every BOTH_ERROR (12), SKIPPED (17, in four
     sub-buckets by reason), DIVERGENT (2), and the one
     paired-fixture timing edge case. Designed to allay any "are we
     hand-waving" concerns by naming each case and its root cause.

2. RECEIPT URL FIXES (3 broken GitHub links Codex flagged)

   - MCPAQL/tools — was returning 404 (private repo). Now public.
   - MCPAQL/examples/blob/main/...CASE-STUDY.md — was 404 (path
     didn't exist on main). Now points at /tree/develop/ where the
     case-study work lives via PR #41.
   - MCPAQL/examples/tree/main/.../parity-harness — same fix.

3. LYCHEE LINK-CHECK HARDENING (.lychee.toml)

   The existing config had `offline = true`, which silently skipped
   ALL external URL validation. That's why three CI runs on this PR
   passed despite linking to dead URLs. Codex caught what our own
   workflow couldn't.

   Now: offline = false, with cache + accept = [200, 203, 206, 429]
   to keep CI fast and tolerate rate-limit responses on flaky hosts.
   The workflow already passes GITHUB_TOKEN so authenticated GitHub
   URLs don't 429 under load.

4. SEARCH INDEX REGEN (public/data/search-index.json)

   Codex P2 finding: I added the case-study entry to
   source/search-index.base.json but the deployed search.html fetches
   public/data/search-index.json (which only the doc generator
   produces). Without regenerating the deployed JSON, the search
   entry never shipped. Re-ran the generator; the case-study entry
   now appears in the deployed artifact.

5. NOT INCLUDED (deliberate)

   - public/spec/research/adversarial-input-security.html: incidental
     new mirror page from running the doc generator. Unrelated to
     this PR; should land via the regular spec-sync cycle.
   - Spec/adapter-reference mirror page diffs: same reason as above.

Companion PRs (so the receipts links resolve):
- MCPAQL/tools#20 — adds mcpaql-parity to the public toolchain
- MCPAQL/examples#41 — pushes CASE-STUDY.md, parity-harness/, and
  the regenerated github-mcp-all/ adapter

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mickdarling added a commit to MCPAQL/tools that referenced this pull request May 26, 2026
…ilure

Two P2 findings from Codex review on #20 plus one adjacent log-guard
correction surfaced by the header-merge fix.

1. Schema headers no longer silently drop suite-supplied headers.
   The previous `schema.headers ?? suite.upstreamHeaders` meant any
   captured-at-discovery schema header — even one — discarded the
   suite.upstreamHeaders contract entirely. Merged instead, with
   schema headers taking precedence (captured-at-discovery remains
   authoritative when the same key appears on both sides).

2. Client connections moved inside the try block that wraps the
   ops loop. Previously, if setupFixtures() succeeded but
   official.connect() or mcpaql.connect() threw, execution never
   reached the try/finally and fixture state was leaked. Now both
   connects run inside the try, and connection cleanup tracks
   officialConnected / mcpaqlConnected booleans so close() only
   runs against clients that actually connected.

3. (Adjacent) `if (officialExtraHeaders)` log guard updated to
   `Object.keys(officialExtraHeaders).length > 0` because the merge
   fix above makes officialExtraHeaders always defined (empty object
   when neither side supplies headers).

Build clean, 23/23 existing tests still pass. No new tests added in
this PR — pure-function tests for normalize/maskVariantTokens/classify
are reasonable follow-up work, and end-to-end is exercised by the
GitHub MCP parity harness in MCPAQL/examples#41.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant