Skip to content

Add Claude Code marketplace configuration#10

Closed
hwittenborn wants to merge 3 commits into
garrytan:mainfrom
hwittenborn:marketplace-install
Closed

Add Claude Code marketplace configuration#10
hwittenborn wants to merge 3 commits into
garrytan:mainfrom
hwittenborn:marketplace-install

Conversation

@hwittenborn

Copy link
Copy Markdown

This allows users to install and update gstack through the native Claude Code plugin system:

/plugin marketplace add garrytan/gstack
/plugin install gstack@gstack-marketplace
  • Adds .claude-plugin/ marketplace and plugin manifests
  • Moves skills into skills/ directory for plugin auto-discovery
  • Uses ${CLAUDE_SKILL_DIR} for portable paths
  • Removes (now unneeded) setup script

This allows users to install and update gstack through the native Claude Code plugin system:

```
/plugin marketplace add garrytan/gstack
/plugin install gstack@gstack-marketplace
```

- Add `.claude-plugin/` marketplace and plugin manifests
- Move skills into `skills/` directory for plugin auto-discovery
- Use `${CLAUDE_SKILL_DIR}` for portable paths
- Removes (now unneeded) setup script

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 12, 2026 16:18

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Claude Code plugin marketplace support for gstack by introducing .claude-plugin manifests, reorganizing skills for auto-discovery under skills/, and updating the browse CLI/build/test setup to match the new layout.

Changes:

  • Add .claude-plugin/plugin.json and .claude-plugin/marketplace.json for marketplace install/update.
  • Move skills under skills/ and update skill docs to use ${CLAUDE_SKILL_DIR} for portable paths.
  • Remove legacy setup/skill entrypoint files and update Bun build scripts + ignore patterns for the new skills/browse location.

Reviewed changes

Copilot reviewed 9 out of 30 changed files in this pull request and generated no comments.

Show a summary per file
File Description
skills/ship/SKILL.md Updates review-checklist path to use ${CLAUDE_SKILL_DIR}-relative location.
skills/review/checklist.md Adds pre-landing review checklist content under the new skills layout.
skills/review/SKILL.md Updates checklist read path to ${CLAUDE_SKILL_DIR}/checklist.md.
skills/retro/SKILL.md Adds new /retro skill for weekly engineering retrospective workflow.
skills/plan-eng-review/SKILL.md Adds new /plan-eng-review skill content.
skills/plan-ceo-review/SKILL.md Adds new /plan-ceo-review skill content.
skills/browse/SKILL.md Updates browse skill setup/build instructions and binary pathing for plugin installs.
skills/browse/src/buffers.ts Introduces shared console/network buffers (ring buffer + counters).
skills/browse/src/browser-manager.ts Adds Playwright lifecycle, tab management, ref-map, and console/network capture.
skills/browse/src/read-commands.ts Implements read-only browse commands (text/html/links/forms/etc).
skills/browse/src/write-commands.ts Implements mutating browse commands (goto/click/fill/etc).
skills/browse/src/meta-commands.ts Implements meta commands (tabs/status/screenshot/chain/diff/snapshot).
skills/browse/src/snapshot.ts Implements snapshot ref-based interaction mapping from ariaSnapshot.
skills/browse/src/server.ts Adds Bun HTTP server routing + auth + buffer flushing + idle shutdown.
skills/browse/src/cli.ts Adds CLI wrapper that starts/health-checks server and dispatches commands.
skills/browse/test/test-server.ts Adds local Bun fixture server for integration tests.
skills/browse/test/commands.test.ts Adds broad integration coverage for browse commands and buffers.
skills/browse/test/snapshot.test.ts Adds snapshot/ref-resolution tests.
skills/browse/test/fixtures/basic.html Adds basic HTML fixture for tests.
skills/browse/test/fixtures/forms.html Adds forms fixture for fill/select/forms discovery.
skills/browse/test/fixtures/responsive.html Adds responsive fixture for viewport/responsive screenshots.
skills/browse/test/fixtures/snapshot.html Adds snapshot fixture for aria snapshot/ref tests.
skills/browse/test/fixtures/spa.html Adds SPA fixture for wait/console/network tests.
package.json Updates bin + scripts to build/run browse from skills/browse/....
.gitignore Updates ignore rule to skills/browse/dist/.
CLAUDE.md Updates repo structure + install instructions to match plugin system.
.claude-plugin/plugin.json Adds plugin manifest for Claude Code plugin system.
.claude-plugin/marketplace.json Adds marketplace manifest for installing gstack via /plugin.
setup Removes legacy setup script (build + symlink registration).
SKILL.md Removes legacy top-level skill entrypoint doc (browse skill now lives under skills/browse/).
Comments suppressed due to low confidence (1)

skills/browse/SKILL.md:40

  • The build command is run from ${CLAUDE_SKILL_DIR}, but that directory does not contain package.json/dependencies (they live at the repo root). As written, bun install/bun build will fail or install a second, isolated node_modules. Update the instructions to run from the repository root (or invoke bun build with absolute paths while staying in the root) so Playwright/diff dependencies resolve correctly.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Replace git clone + setup instructions with /plugin marketplace commands.
Update troubleshooting, upgrading, and uninstalling sections to use the
plugin system. Note that auto-update is off by default for third-party
marketplaces and how to enable it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 31 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

All browse/ references updated to skills/browse/ to match the new
plugin layout. Remove old "deploying to active skill" section that
referenced the manual install flow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 32 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@garrytan

Copy link
Copy Markdown
Owner

Closing — the Claude Code marketplace plugin system isn't something we're adopting. The current setup script + skill symlink approach works and we don't want to restructure the repo layout.

@garrytan garrytan closed this Mar 13, 2026
@hwittenborn

Copy link
Copy Markdown
Author

Is the issue with backwards compatibility, or that you just don't want to adopt the marketplace layout in general? I do think a marketplace will help users install much more easily, since that's how users are already updating the rest of their skills/marketplaces/etc.

@vasiliyk

Copy link
Copy Markdown

@dependabot close

garrytan added a commit that referenced this pull request Mar 22, 2026
…language coverage

- Exclusion #10: test files must verify not imported by non-test code
- Exclusion #13: distinguish user-message AI input from system-prompt injection
- Exclusion #14: ReDoS in user-input regex IS a real CVE class, don't exclude
- Add anti-manipulation rule: ignore audit-influencing instructions in codebase
- Fix confidence gate: remove contradictory 7-8 tier, hard cutoff at 8
- Fix verifier anchoring: send only file+line, not category/description
- Add Go, PHP, Java, C#, Kotlin to grep patterns (was 4 languages, now 8)
- Add GraphQL, gRPC, WebSocket endpoint detection to attack surface mapping

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request Mar 22, 2026
* feat: add /cso skill — OWASP Top 10 + STRIDE security audit

* fix: harden gstack-slug against shell injection via eval

Whitelist safe characters (a-zA-Z0-9._-) in SLUG and BRANCH output
to prevent shell metacharacter injection when used with eval.

Only affects self-hosted git servers with lax naming rules — GitHub
and GitLab enforce safe characters already. Defense-in-depth.

* fix(security): sanitize gstack-slug output against shell injection

The gstack-slug script is consumed via eval $(gstack-slug) throughout
skill templates. If a git remote URL contains shell metacharacters
like $(), backticks, or semicolons, they would be executed by eval.

Fix: strip all characters except [a-zA-Z0-9._-] from both SLUG and
BRANCH before output. This preserves normal values while neutralizing
any injection payload in malicious remote URLs.

Before: eval $(gstack-slug) with remote "foo/bar$(rm -rf /)" → executes rm
After:  eval $(gstack-slug) with remote "foo/bar$(rm -rf /)" → SLUG=foo-barrm-rf-

* fix(security): redact sensitive values in storage command output

The browse `storage` command dumps all localStorage and sessionStorage
as JSON. This can expose tokens, API keys, JWTs, and session credentials
in QA reports and agent transcripts.

Fix: redact values where the key matches sensitive patterns (token,
secret, key, password, auth, jwt, csrf) or the value starts with known
credential prefixes (eyJ for JWT, sk- for Stripe, ghp_ for GitHub, etc.).

Redacted values show length to aid debugging: [REDACTED — 128 chars]

* fix(browse): kill old server before restart to prevent orphaned chromium processes

When the health check fails or the server connection drops, `ensureServer()`
and `sendCommand()` would call `startServer()` without first killing the
previous server process. This left orphaned `chrome-headless-shell` renderer
processes running at ~120% CPU each.

After several reconnect cycles (e.g. pages that crash during hydration or
trigger hard navigations via `window.location.href`), dozens of zombie
chromium processes accumulate and exhaust system resources.

Fix: call `killServer()` on the stale PID before spawning a new server in
both the `ensureServer()` unhealthy path and the `sendCommand()` connection-
lost retry path.

Fixes #294

* Fix YAML linter error: nested mapping in compact sequence entries

Having "Run: bun" inside a plain scalar is not allowed per YAML spec which states: Plain scalars must never contain the “: ” and “ #” character combinations.

This simple fix switches to block scalars (|) to eliminate the ambiguity without changing runtime behavior.

* fix(security): add Azure metadata endpoint to SSRF blocklist

Add metadata.azure.internal to BLOCKED_METADATA_HOSTS alongside the
existing AWS/GCP endpoints. Closes the coverage gap identified in #125.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add coverage for storage redaction

Test key-based redaction (auth_token, api_key), value-based redaction
(JWT prefix, GitHub PAT prefix), pass-through for normal keys, and
length preservation in redacted output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add community PR triage process to CONTRIBUTING.md

Document the wave-based PR triage pattern used for batching community
contributions. References PR #205 (v0.8.3) as the original example.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: adjust test key names to avoid redaction pattern collision

Rename testKey→testData and normalKey→displayName in storage tests
to avoid triggering #238's SENSITIVE_KEY regex (which matches 'key').
Also generate Codex variant of /cso skill.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.9.10.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: zero-noise /cso security audits with FP filtering (v0.11.0.0)

Absorb Anthropic's security-review false positive filtering into /cso:
- 17 hard exclusions (DOS, test files, log spoofing, SSRF path-only,
  regex injection, race conditions unless concrete, etc.)
- 9 precedents (React XSS-safe, env vars trusted, client-side code
  doesn't need auth, shell scripts need concrete untrusted input path)
- 8/10 confidence gate — below threshold = don't report
- Independent sub-agent verification for each finding
- Exploit scenario requirement per finding
- Framework-aware analysis (Rails CSRF, React escaping, Angular sanitization)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: consolidate CHANGELOG — merge /cso launch + community wave into v0.11.0.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: rewrite README — lead with Karpathy quote, cut LinkedIn phrases, add /cso

Opens with the revolution (Karpathy, Steinberger/OpenClaw), keeps credentials
and LOC numbers, cuts filler phrases, adds hater bait, restores hiring block,
removes bloated "What's new" section, adds /cso to skills table and install.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(cso): adversarial review fixes — FP filtering, prompt injection, language coverage

- Exclusion #10: test files must verify not imported by non-test code
- Exclusion #13: distinguish user-message AI input from system-prompt injection
- Exclusion #14: ReDoS in user-input regex IS a real CVE class, don't exclude
- Add anti-manipulation rule: ignore audit-influencing instructions in codebase
- Fix confidence gate: remove contradictory 7-8 tier, hard cutoff at 8
- Fix verifier anchoring: send only file+line, not category/description
- Add Go, PHP, Java, C#, Kotlin to grep patterns (was 4 languages, now 8)
- Add GraphQL, gRPC, WebSocket endpoint detection to attack surface mapping

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): correct skill counts, add /autoplan to README tables

Skill count was wrong in 3 places (said 19+7=26, said 25, actual is 28).
Added /autoplan to specialist table. Fixed troubleshooting skills list
to include all skills added since v0.7.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(browse): DNS rebinding protection for SSRF blocklist

validateNavigationUrl is now async — resolves hostname to IP and checks
against blocked metadata IPs. Prevents DNS rebinding where evil.com
initially resolves to a safe IP, then switches to 169.254.169.254.
All callers updated to await. Tests updated for async assertions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(browse): lockfile prevents concurrent server start races

Adds exclusive lockfile (O_CREAT|O_EXCL) around ensureServer to prevent
TOCTOU race where two CLI invocations could both kill the old server and
start new ones, leaving an orphaned chromium process. Second caller now
waits for the first to finish starting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(browse): improve storage redaction — word-boundary keys + more value prefixes

Key regex: use underscore/dot/hyphen boundaries instead of \b (which treats
_ as word char). Now correctly redacts auth_token, session_token while
skipping keyboardShortcuts, monkeyPatch, primaryKey.

Value regex: add AWS (AKIA), Stripe (sk_live_, pk_live_), Anthropic (sk-ant-),
Google (AIza), Sendgrid (SG.), Supabase (sbp_) prefixes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: migrate all remaining eval callers to source, fix stale CHANGELOG claim

5 templates and 2 bin scripts still used eval $(gstack-slug). All now use
source <(gstack-slug). Updated gstack-slug comment to match. Fixed v0.8.3
CHANGELOG entry that falsely claimed eval was fully eliminated — it was
the output sanitization that made it safe, not a calling convention change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): add /autoplan to install instructions, regen skill docs

The install instruction blocks and troubleshooting section were missing
/autoplan. All three skill list locations now include the complete 28-skill
set. Regenerated codex/agents SKILL.md files to match template changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.11.0.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs(cso): add disclaimer — not a substitute for professional security audits

LLMs can miss subtle vulns and produce false negatives. For production
systems with sensitive data, hire a real firm. /cso is a first pass,
not your only line of defense. Disclaimer appended to every report.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Arun Kumar Thiagarajan <arunkt.bm14@gmail.com>
Co-authored-by: Tyrone Robb <tyrone.robb@icloud.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Orkun Duman <orkun1675@gmail.com>
rafiulnakib pushed a commit to rafiulnakib/gstack that referenced this pull request Mar 26, 2026
…scanner limitations

Module split: scan-imports.ts (1252 lines) → scanner/{core,aliases,routes,dead-code,css,monorepo,non-ts}.ts
Fixes: garrytan#1 non-TS file discovery, garrytan#2 Vite AST alias parsing, garrytan#3 React Router AST route discovery,
garrytan#4 dynamic import tracking, garrytan#5 configurable max depth, garrytan#6 git frequency fallback,
garrytan#7 MEGA depth cap CLI flag, garrytan#8 dead code false positive reduction, garrytan#9 CSS import graphs,
garrytan#10 monorepo auto-detection and multi-root scanning.
rapidstartup pushed a commit to rapidstartup/gstack that referenced this pull request Mar 29, 2026
* feat: add /cso skill — OWASP Top 10 + STRIDE security audit

* fix: harden gstack-slug against shell injection via eval

Whitelist safe characters (a-zA-Z0-9._-) in SLUG and BRANCH output
to prevent shell metacharacter injection when used with eval.

Only affects self-hosted git servers with lax naming rules — GitHub
and GitLab enforce safe characters already. Defense-in-depth.

* fix(security): sanitize gstack-slug output against shell injection

The gstack-slug script is consumed via eval $(gstack-slug) throughout
skill templates. If a git remote URL contains shell metacharacters
like $(), backticks, or semicolons, they would be executed by eval.

Fix: strip all characters except [a-zA-Z0-9._-] from both SLUG and
BRANCH before output. This preserves normal values while neutralizing
any injection payload in malicious remote URLs.

Before: eval $(gstack-slug) with remote "foo/bar$(rm -rf /)" → executes rm
After:  eval $(gstack-slug) with remote "foo/bar$(rm -rf /)" → SLUG=foo-barrm-rf-

* fix(security): redact sensitive values in storage command output

The browse `storage` command dumps all localStorage and sessionStorage
as JSON. This can expose tokens, API keys, JWTs, and session credentials
in QA reports and agent transcripts.

Fix: redact values where the key matches sensitive patterns (token,
secret, key, password, auth, jwt, csrf) or the value starts with known
credential prefixes (eyJ for JWT, sk- for Stripe, ghp_ for GitHub, etc.).

Redacted values show length to aid debugging: [REDACTED — 128 chars]

* fix(browse): kill old server before restart to prevent orphaned chromium processes

When the health check fails or the server connection drops, `ensureServer()`
and `sendCommand()` would call `startServer()` without first killing the
previous server process. This left orphaned `chrome-headless-shell` renderer
processes running at ~120% CPU each.

After several reconnect cycles (e.g. pages that crash during hydration or
trigger hard navigations via `window.location.href`), dozens of zombie
chromium processes accumulate and exhaust system resources.

Fix: call `killServer()` on the stale PID before spawning a new server in
both the `ensureServer()` unhealthy path and the `sendCommand()` connection-
lost retry path.

Fixes garrytan#294

* Fix YAML linter error: nested mapping in compact sequence entries

Having "Run: bun" inside a plain scalar is not allowed per YAML spec which states: Plain scalars must never contain the “: ” and “ #” character combinations.

This simple fix switches to block scalars (|) to eliminate the ambiguity without changing runtime behavior.

* fix(security): add Azure metadata endpoint to SSRF blocklist

Add metadata.azure.internal to BLOCKED_METADATA_HOSTS alongside the
existing AWS/GCP endpoints. Closes the coverage gap identified in garrytan#125.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add coverage for storage redaction

Test key-based redaction (auth_token, api_key), value-based redaction
(JWT prefix, GitHub PAT prefix), pass-through for normal keys, and
length preservation in redacted output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add community PR triage process to CONTRIBUTING.md

Document the wave-based PR triage pattern used for batching community
contributions. References PR garrytan#205 (v0.8.3) as the original example.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: adjust test key names to avoid redaction pattern collision

Rename testKey→testData and normalKey→displayName in storage tests
to avoid triggering garrytan#238's SENSITIVE_KEY regex (which matches 'key').
Also generate Codex variant of /cso skill.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.9.10.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: zero-noise /cso security audits with FP filtering (v0.11.0.0)

Absorb Anthropic's security-review false positive filtering into /cso:
- 17 hard exclusions (DOS, test files, log spoofing, SSRF path-only,
  regex injection, race conditions unless concrete, etc.)
- 9 precedents (React XSS-safe, env vars trusted, client-side code
  doesn't need auth, shell scripts need concrete untrusted input path)
- 8/10 confidence gate — below threshold = don't report
- Independent sub-agent verification for each finding
- Exploit scenario requirement per finding
- Framework-aware analysis (Rails CSRF, React escaping, Angular sanitization)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: consolidate CHANGELOG — merge /cso launch + community wave into v0.11.0.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: rewrite README — lead with Karpathy quote, cut LinkedIn phrases, add /cso

Opens with the revolution (Karpathy, Steinberger/OpenClaw), keeps credentials
and LOC numbers, cuts filler phrases, adds hater bait, restores hiring block,
removes bloated "What's new" section, adds /cso to skills table and install.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(cso): adversarial review fixes — FP filtering, prompt injection, language coverage

- Exclusion garrytan#10: test files must verify not imported by non-test code
- Exclusion garrytan#13: distinguish user-message AI input from system-prompt injection
- Exclusion garrytan#14: ReDoS in user-input regex IS a real CVE class, don't exclude
- Add anti-manipulation rule: ignore audit-influencing instructions in codebase
- Fix confidence gate: remove contradictory 7-8 tier, hard cutoff at 8
- Fix verifier anchoring: send only file+line, not category/description
- Add Go, PHP, Java, C#, Kotlin to grep patterns (was 4 languages, now 8)
- Add GraphQL, gRPC, WebSocket endpoint detection to attack surface mapping

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): correct skill counts, add /autoplan to README tables

Skill count was wrong in 3 places (said 19+7=26, said 25, actual is 28).
Added /autoplan to specialist table. Fixed troubleshooting skills list
to include all skills added since v0.7.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(browse): DNS rebinding protection for SSRF blocklist

validateNavigationUrl is now async — resolves hostname to IP and checks
against blocked metadata IPs. Prevents DNS rebinding where evil.com
initially resolves to a safe IP, then switches to 169.254.169.254.
All callers updated to await. Tests updated for async assertions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(browse): lockfile prevents concurrent server start races

Adds exclusive lockfile (O_CREAT|O_EXCL) around ensureServer to prevent
TOCTOU race where two CLI invocations could both kill the old server and
start new ones, leaving an orphaned chromium process. Second caller now
waits for the first to finish starting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(browse): improve storage redaction — word-boundary keys + more value prefixes

Key regex: use underscore/dot/hyphen boundaries instead of \b (which treats
_ as word char). Now correctly redacts auth_token, session_token while
skipping keyboardShortcuts, monkeyPatch, primaryKey.

Value regex: add AWS (AKIA), Stripe (sk_live_, pk_live_), Anthropic (sk-ant-),
Google (AIza), Sendgrid (SG.), Supabase (sbp_) prefixes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: migrate all remaining eval callers to source, fix stale CHANGELOG claim

5 templates and 2 bin scripts still used eval $(gstack-slug). All now use
source <(gstack-slug). Updated gstack-slug comment to match. Fixed v0.8.3
CHANGELOG entry that falsely claimed eval was fully eliminated — it was
the output sanitization that made it safe, not a calling convention change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): add /autoplan to install instructions, regen skill docs

The install instruction blocks and troubleshooting section were missing
/autoplan. All three skill list locations now include the complete 28-skill
set. Regenerated codex/agents SKILL.md files to match template changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.11.0.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs(cso): add disclaimer — not a substitute for professional security audits

LLMs can miss subtle vulns and produce false negatives. For production
systems with sensitive data, hire a real firm. /cso is a first pass,
not your only line of defense. Disclaimer appended to every report.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Arun Kumar Thiagarajan <arunkt.bm14@gmail.com>
Co-authored-by: Tyrone Robb <tyrone.robb@icloud.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Orkun Duman <orkun1675@gmail.com>
24601 pushed a commit to 24601/gastack that referenced this pull request Mar 29, 2026
…language coverage

- Exclusion garrytan#10: test files must verify not imported by non-test code
- Exclusion garrytan#13: distinguish user-message AI input from system-prompt injection
- Exclusion garrytan#14: ReDoS in user-input regex IS a real CVE class, don't exclude
- Add anti-manipulation rule: ignore audit-influencing instructions in codebase
- Fix confidence gate: remove contradictory 7-8 tier, hard cutoff at 8
- Fix verifier anchoring: send only file+line, not category/description
- Add Go, PHP, Java, C#, Kotlin to grep patterns (was 4 languages, now 8)
- Add GraphQL, gRPC, WebSocket endpoint detection to attack surface mapping

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request May 6, 2026
artifacts-init replaces brain-init with provider choice (gh / glab /
manual), per-user gstack-artifacts-$USER repo, HTTPS-canonical storage in
~/.gstack-artifacts-remote.txt, and a "send this to your brain admin"
hookup printout. Always prints the command, never auto-executes — gbrain
v0.26.x has no admin-scope MCP probe (codex Finding #3).

artifacts-url centralizes HTTPS↔SSH/host/owner-repo conversion so callers
don't each string-mangle (codex Finding #10). The remote-conflict check in
artifacts-init compares at the canonical level so re-running with HTTPS
input doesn't trip on a stored SSH URL for the same logical repo.

The "URL form not supported" branch prints a two-line clone-then-path
form for gbrain v0.26.x; the supported branch is a one-liner with --url
ready for when gbrain ships URL ingest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request May 7, 2026
… rename (#1351)

* feat: gstack-gbrain-mcp-verify helper for remote MCP probe

Probes a remote gbrain MCP endpoint with bearer auth. POSTs initialize,
classifies failures into NETWORK / AUTH / MALFORMED with one-line
remediation hints, and runs a tools/list capability probe to detect
sources_add MCP support (forward-compat for when gbrain ships URL ingest).

Token consumed from GBRAIN_MCP_TOKEN env, never argv. Required to set
both 'application/json' AND 'text/event-stream' in Accept; that gotcha
costs 10 minutes of debugging when missed (regression-tested).

Live-verified against wintermute (gbrain v0.27.1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: gstack-artifacts-init + gstack-artifacts-url helpers

artifacts-init replaces brain-init with provider choice (gh / glab /
manual), per-user gstack-artifacts-$USER repo, HTTPS-canonical storage in
~/.gstack-artifacts-remote.txt, and a "send this to your brain admin"
hookup printout. Always prints the command, never auto-executes — gbrain
v0.26.x has no admin-scope MCP probe (codex Finding #3).

artifacts-url centralizes HTTPS↔SSH/host/owner-repo conversion so callers
don't each string-mangle (codex Finding #10). The remote-conflict check in
artifacts-init compares at the canonical level so re-running with HTTPS
input doesn't trip on a stored SSH URL for the same logical repo.

The "URL form not supported" branch prints a two-line clone-then-path
form for gbrain v0.26.x; the supported branch is a one-liner with --url
ready for when gbrain ships URL ingest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: extend gstack-gbrain-detect with mcp_mode + artifacts_remote

Adds two new fields to detect's JSON output:

- gbrain_mcp_mode: local-stdio | remote-http | none
  Resolved via 3-tier fallback (codex Finding D3): claude mcp get --json
  → claude mcp list text-grep → ~/.claude.json jq read. If Anthropic moves
  the file format, the first two tiers absorb it.

- gstack_artifacts_remote: HTTPS URL from ~/.gstack-artifacts-remote.txt
  Falls back to ~/.gstack-brain-remote.txt during the v1.27.0.0 migration
  window so detect doesn't return empty between upgrade and migration.

Existing detect tests still pass (15/15). New 19 tests cover every fallback
tier independently, plus a schema regression for /sync-gbrain compat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: setup-gbrain Path 4 (remote MCP) + artifacts rename

Path 4 lets users paste an HTTPS MCP URL + bearer token and registers it
as an HTTP-transport MCP without needing a local gbrain CLI install. The
flow:

- Step 2 gains a fourth option (Remote gbrain MCP)
- Step 4 adds Path 4 sub-flow: collect URL, secret-read bearer, verify
  via gstack-gbrain-mcp-verify (NETWORK / AUTH / MALFORMED classifier)
- Step 5 (local doctor), Step 7.5 (transcript ingest), Step 5a's stdio
  branch all skip on Path 4
- Step 5a adds an HTTP+bearer registration form: claude mcp add
  --transport http --header "Authorization: Bearer ..."
- Step 7 renamed "session memory sync" → "artifacts sync" and now calls
  gstack-artifacts-init (which always prints the brain-admin hookup
  command — no auto-execute, codex Finding #3)
- Step 8 CLAUDE.md block branches: remote-http includes URL + server
  version (never the token); local-stdio keeps engine + config-file
- Step 9 smoke test on Path 4 prints the curl-equivalent for
  post-restart verification (MCP tools aren't visible mid-session)
- Step 10 verdict block has separate templates per mode

Idempotency: re-running with gbrain_mcp_mode=remote-http already in
detect output skips Step 2 entirely and goes to verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: rename gbrain_sync_mode → artifacts_sync_mode (v1.27.0.0 prep)

Hard rename, no dual-read alias (codex Finding D4). The on-disk migration
script (Phase C, separate commit) renames the config key in users'
~/.gstack/config.yaml and any CLAUDE.md blocks.

Touched call sites:
- bin/gstack-config defaults + validation + list/defaults output
- bin/gstack-gbrain-detect (gstack_brain_sync_mode field still emitted
  with the same name for downstream-tool compat; reads new key)
- bin/gstack-brain-sync, bin/gstack-brain-enqueue, bin/gstack-brain-uninstall
- bin/gstack-timeline-log (comment ref)
- scripts/resolvers/preamble/generate-brain-sync-block.ts: renames key,
  branches on gbrain_mcp_mode=remote-http to emit "ARTIFACTS_SYNC:
  remote-mode (managed by brain server <host>)" instead of the local
  mode/queue/last_push line (codex Finding #11)
- bin/gstack-brain-restore + bin/gstack-gbrain-source-wireup: read
  ~/.gstack-artifacts-remote.txt with ~/.gstack-brain-remote.txt fallback
  during the migration window
- bin/gstack-artifacts-init: tolerant of unrecognized URL forms (local
  paths, file://, self-hosted gitea) so test infrastructure and unusual
  remotes work without canonicalization
- test/brain-sync.test.ts: gstack-brain-init → gstack-artifacts-init
- test/skill-e2e-brain-privacy-gate.test.ts: artifacts_sync_mode keys
- test/gen-skill-docs.test.ts: budget 35K → 36.5K for the new MCP-mode
  probe in the preamble resolver
- health/SKILL.md.tmpl, sync-gbrain/SKILL.md.tmpl: comment + verdict line

Hard delete:
- bin/gstack-brain-init (replaced by bin/gstack-artifacts-init in v1.27.0.0)
- test/gstack-brain-init-gh-mock.test.ts (replaced by gstack-artifacts-init.test.ts)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md files after artifacts-sync rename

Mechanical regen via \`bun run gen:skill-docs --host all\`. All */SKILL.md
files reflect the renamed config key (gbrain_sync_mode →
artifacts_sync_mode), the renamed remote-helper file
(~/.gstack-artifacts-remote.txt with brain fallback), the renamed init
script (gstack-artifacts-init), and the new ARTIFACTS_SYNC: remote-mode
status line that fires when a remote-http MCP is registered.

Golden fixtures (test/fixtures/golden/*-ship-SKILL.md) refreshed to match
the regenerated default-ship output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: v1.27.0.0 migration — gstack-brain → gstack-artifacts rename

Journaled, interruption-safe migration. Six steps, each writes to
~/.gstack/.migrations/v1.27.0.0.journal on success; re-entry resumes
from the next un-done step. On final success, journal is replaced by
~/.gstack/.migrations/v1.27.0.0.done.

Steps:
1. gh_repo_renamed       gh/glab repo rename gstack-brain-$USER →
                         gstack-artifacts-$USER (idempotent: detects
                         already-renamed and skips)
2. remote_txt_renamed    mv ~/.gstack-brain-remote.txt → artifacts file,
                         rewriting URL path to match the new repo name
3. config_key_renamed    sed -i in ~/.gstack/config.yaml flips
                         gbrain_sync_mode → artifacts_sync_mode
4. claude_md_block       sed flips "- Memory sync:" → "- Artifacts sync:"
                         in cwd CLAUDE.md and ~/.gstack/CLAUDE.md
5. sources_swapped       gbrain sources add NEW (verify) → remove OLD
                         (codex Finding #6: add-before-remove ordering,
                         no downtime window). On remote-MCP mode, prints
                         commands for the brain admin instead of executing.
6. done                  touchfile + delete journal

User opt-out: any "n" or "skip-for-now" answer at the initial prompt
writes a marker file that prevents re-prompting; user can re-invoke
via /setup-gbrain --rerun-migration.

11 unit tests cover: nothing-to-migrate, GitHub happy path, idempotent
re-run, journal-resume mid-flight, remote-MCP print-only path,
add-before-remove ordering verification, add-fail → old source stays
registered, CLAUDE.md field rewrite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: regression suite + E2E for v1.27.0.0 rename

Three new regression tests guard the rename's blast radius (per codex
Findings #1, #8, #9, #12):

- test/no-stale-gstack-brain-refs.test.ts: greps bin/, scripts/, *.tmpl,
  test/ for forbidden identifiers (gstack-brain-init, gbrain_sync_mode);
  fails CI if any non-allowlisted file references them.
- test/post-rename-doc-regen.test.ts: confirms gen-skill-docs output has
  no stale references in any */SKILL.md (the cross-product blind spot).
- test/setup-gbrain-path4-structure.test.ts: structural lint over the
  Path 4 prose contract — STOP gates after verify failure, never-write-
  token rules, mode-aware CLAUDE.md block, bearer always via env-var.

Two new gate-tier E2E tests (deterministic stub HTTP server, fixed inputs):

- test/skill-e2e-setup-gbrain-remote.test.ts: Path 4 happy path. Stubs
  an HTTP MCP server, drives the skill via Agent SDK with a stubbed
  bearer, asserts claude.json gets the http MCP entry, CLAUDE.md gets
  the remote-http block, the secret token NEVER leaks to CLAUDE.md.
- test/skill-e2e-setup-gbrain-bad-token.test.ts: stub server returns 401;
  asserts the AUTH classifier hint surfaces, no MCP registration occurs,
  CLAUDE.md is unchanged. Regression guard for the "verify failed → STOP"
  rule.

touchfiles.ts: setup-gbrain-remote and setup-gbrain-bad-token added at
gate-tier so CI catches Path 4 regressions on every PR.

Plus a few comment refs flipped: bin/gstack-jsonl-merge, bin/gstack-timeline-log
(legacy gstack-brain-init mentions in headers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* release: v1.27.0.0 — /setup-gbrain Path 4 + brain → artifacts rename

Bumps VERSION 1.26.4.0 → 1.27.0.0 (MINOR per CLAUDE.md scale-aware bump
guidance: ~1500 line net change including a new path in /setup-gbrain,
two new bin helpers, a journaled migration, 59 new tests, and a config
key rename across the codebase).

CHANGELOG entry covers: Path 4 (Remote MCP) end-to-end, the brain →
artifacts rename, the journaled migration, the verify-helper error
classifier, the artifacts-init multi-host provider choice. Includes
the canonical Garry-voice headline + numbers table + audience close
per the release-summary format.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: demote setup-gbrain Path 4 E2E to periodic-tier

The Agent SDK E2E tests for Path 4 (skill-e2e-setup-gbrain-remote and
skill-e2e-setup-gbrain-bad-token) are inherently non-deterministic —
the model interprets "follow Path 4 only" prompts flexibly and can
skip Step 8 (CLAUDE.md write) or shortcut past the verify helper, which
makes the gate-tier assertions flaky.

The deterministic gate coverage for Path 4 is in
test/setup-gbrain-path4-structure.test.ts: a fast structural lint that
catches AUQ-pacing regressions and prose contract drift in <200ms with
zero token spend. That test is the right tool for catching the failure
mode the gate-tier was meant to guard against.

The Agent SDK E2E tests stay available on-demand for periodic-tier runs
(EVALS=1 EVALS_TIER=periodic bun test test/skill-e2e-setup-gbrain-*.test.ts).
Also tightened the verify-error assertion to the literal field shape
("error_class": "AUTH") instead of a substring match that false-matches
the parent claude session's "needs-auth" MCP discovery markers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: sync package.json version to 1.27.0.0

VERSION was bumped to 1.27.0.0 in f6ec11e but package.json was not
updated in the same commit. The gen-skill-docs.test.ts assertion
"package.json version matches VERSION file" caught the drift.

This is the DRIFT_STALE_PKG case the /ship Step 12 idempotency check
is designed for; the fix is the documented sync-only repair (no
re-bump, package.json synced to existing VERSION).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gonnabe88 pushed a commit to gonnabe88/gstack that referenced this pull request May 9, 2026
… rename (garrytan#1351)

* feat: gstack-gbrain-mcp-verify helper for remote MCP probe

Probes a remote gbrain MCP endpoint with bearer auth. POSTs initialize,
classifies failures into NETWORK / AUTH / MALFORMED with one-line
remediation hints, and runs a tools/list capability probe to detect
sources_add MCP support (forward-compat for when gbrain ships URL ingest).

Token consumed from GBRAIN_MCP_TOKEN env, never argv. Required to set
both 'application/json' AND 'text/event-stream' in Accept; that gotcha
costs 10 minutes of debugging when missed (regression-tested).

Live-verified against wintermute (gbrain v0.27.1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: gstack-artifacts-init + gstack-artifacts-url helpers

artifacts-init replaces brain-init with provider choice (gh / glab /
manual), per-user gstack-artifacts-$USER repo, HTTPS-canonical storage in
~/.gstack-artifacts-remote.txt, and a "send this to your brain admin"
hookup printout. Always prints the command, never auto-executes — gbrain
v0.26.x has no admin-scope MCP probe (codex Finding garrytan#3).

artifacts-url centralizes HTTPS↔SSH/host/owner-repo conversion so callers
don't each string-mangle (codex Finding garrytan#10). The remote-conflict check in
artifacts-init compares at the canonical level so re-running with HTTPS
input doesn't trip on a stored SSH URL for the same logical repo.

The "URL form not supported" branch prints a two-line clone-then-path
form for gbrain v0.26.x; the supported branch is a one-liner with --url
ready for when gbrain ships URL ingest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: extend gstack-gbrain-detect with mcp_mode + artifacts_remote

Adds two new fields to detect's JSON output:

- gbrain_mcp_mode: local-stdio | remote-http | none
  Resolved via 3-tier fallback (codex Finding D3): claude mcp get --json
  → claude mcp list text-grep → ~/.claude.json jq read. If Anthropic moves
  the file format, the first two tiers absorb it.

- gstack_artifacts_remote: HTTPS URL from ~/.gstack-artifacts-remote.txt
  Falls back to ~/.gstack-brain-remote.txt during the v1.27.0.0 migration
  window so detect doesn't return empty between upgrade and migration.

Existing detect tests still pass (15/15). New 19 tests cover every fallback
tier independently, plus a schema regression for /sync-gbrain compat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: setup-gbrain Path 4 (remote MCP) + artifacts rename

Path 4 lets users paste an HTTPS MCP URL + bearer token and registers it
as an HTTP-transport MCP without needing a local gbrain CLI install. The
flow:

- Step 2 gains a fourth option (Remote gbrain MCP)
- Step 4 adds Path 4 sub-flow: collect URL, secret-read bearer, verify
  via gstack-gbrain-mcp-verify (NETWORK / AUTH / MALFORMED classifier)
- Step 5 (local doctor), Step 7.5 (transcript ingest), Step 5a's stdio
  branch all skip on Path 4
- Step 5a adds an HTTP+bearer registration form: claude mcp add
  --transport http --header "Authorization: Bearer ..."
- Step 7 renamed "session memory sync" → "artifacts sync" and now calls
  gstack-artifacts-init (which always prints the brain-admin hookup
  command — no auto-execute, codex Finding garrytan#3)
- Step 8 CLAUDE.md block branches: remote-http includes URL + server
  version (never the token); local-stdio keeps engine + config-file
- Step 9 smoke test on Path 4 prints the curl-equivalent for
  post-restart verification (MCP tools aren't visible mid-session)
- Step 10 verdict block has separate templates per mode

Idempotency: re-running with gbrain_mcp_mode=remote-http already in
detect output skips Step 2 entirely and goes to verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: rename gbrain_sync_mode → artifacts_sync_mode (v1.27.0.0 prep)

Hard rename, no dual-read alias (codex Finding D4). The on-disk migration
script (Phase C, separate commit) renames the config key in users'
~/.gstack/config.yaml and any CLAUDE.md blocks.

Touched call sites:
- bin/gstack-config defaults + validation + list/defaults output
- bin/gstack-gbrain-detect (gstack_brain_sync_mode field still emitted
  with the same name for downstream-tool compat; reads new key)
- bin/gstack-brain-sync, bin/gstack-brain-enqueue, bin/gstack-brain-uninstall
- bin/gstack-timeline-log (comment ref)
- scripts/resolvers/preamble/generate-brain-sync-block.ts: renames key,
  branches on gbrain_mcp_mode=remote-http to emit "ARTIFACTS_SYNC:
  remote-mode (managed by brain server <host>)" instead of the local
  mode/queue/last_push line (codex Finding garrytan#11)
- bin/gstack-brain-restore + bin/gstack-gbrain-source-wireup: read
  ~/.gstack-artifacts-remote.txt with ~/.gstack-brain-remote.txt fallback
  during the migration window
- bin/gstack-artifacts-init: tolerant of unrecognized URL forms (local
  paths, file://, self-hosted gitea) so test infrastructure and unusual
  remotes work without canonicalization
- test/brain-sync.test.ts: gstack-brain-init → gstack-artifacts-init
- test/skill-e2e-brain-privacy-gate.test.ts: artifacts_sync_mode keys
- test/gen-skill-docs.test.ts: budget 35K → 36.5K for the new MCP-mode
  probe in the preamble resolver
- health/SKILL.md.tmpl, sync-gbrain/SKILL.md.tmpl: comment + verdict line

Hard delete:
- bin/gstack-brain-init (replaced by bin/gstack-artifacts-init in v1.27.0.0)
- test/gstack-brain-init-gh-mock.test.ts (replaced by gstack-artifacts-init.test.ts)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md files after artifacts-sync rename

Mechanical regen via \`bun run gen:skill-docs --host all\`. All */SKILL.md
files reflect the renamed config key (gbrain_sync_mode →
artifacts_sync_mode), the renamed remote-helper file
(~/.gstack-artifacts-remote.txt with brain fallback), the renamed init
script (gstack-artifacts-init), and the new ARTIFACTS_SYNC: remote-mode
status line that fires when a remote-http MCP is registered.

Golden fixtures (test/fixtures/golden/*-ship-SKILL.md) refreshed to match
the regenerated default-ship output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: v1.27.0.0 migration — gstack-brain → gstack-artifacts rename

Journaled, interruption-safe migration. Six steps, each writes to
~/.gstack/.migrations/v1.27.0.0.journal on success; re-entry resumes
from the next un-done step. On final success, journal is replaced by
~/.gstack/.migrations/v1.27.0.0.done.

Steps:
1. gh_repo_renamed       gh/glab repo rename gstack-brain-$USER →
                         gstack-artifacts-$USER (idempotent: detects
                         already-renamed and skips)
2. remote_txt_renamed    mv ~/.gstack-brain-remote.txt → artifacts file,
                         rewriting URL path to match the new repo name
3. config_key_renamed    sed -i in ~/.gstack/config.yaml flips
                         gbrain_sync_mode → artifacts_sync_mode
4. claude_md_block       sed flips "- Memory sync:" → "- Artifacts sync:"
                         in cwd CLAUDE.md and ~/.gstack/CLAUDE.md
5. sources_swapped       gbrain sources add NEW (verify) → remove OLD
                         (codex Finding garrytan#6: add-before-remove ordering,
                         no downtime window). On remote-MCP mode, prints
                         commands for the brain admin instead of executing.
6. done                  touchfile + delete journal

User opt-out: any "n" or "skip-for-now" answer at the initial prompt
writes a marker file that prevents re-prompting; user can re-invoke
via /setup-gbrain --rerun-migration.

11 unit tests cover: nothing-to-migrate, GitHub happy path, idempotent
re-run, journal-resume mid-flight, remote-MCP print-only path,
add-before-remove ordering verification, add-fail → old source stays
registered, CLAUDE.md field rewrite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: regression suite + E2E for v1.27.0.0 rename

Three new regression tests guard the rename's blast radius (per codex
Findings garrytan#1, garrytan#8, garrytan#9, garrytan#12):

- test/no-stale-gstack-brain-refs.test.ts: greps bin/, scripts/, *.tmpl,
  test/ for forbidden identifiers (gstack-brain-init, gbrain_sync_mode);
  fails CI if any non-allowlisted file references them.
- test/post-rename-doc-regen.test.ts: confirms gen-skill-docs output has
  no stale references in any */SKILL.md (the cross-product blind spot).
- test/setup-gbrain-path4-structure.test.ts: structural lint over the
  Path 4 prose contract — STOP gates after verify failure, never-write-
  token rules, mode-aware CLAUDE.md block, bearer always via env-var.

Two new gate-tier E2E tests (deterministic stub HTTP server, fixed inputs):

- test/skill-e2e-setup-gbrain-remote.test.ts: Path 4 happy path. Stubs
  an HTTP MCP server, drives the skill via Agent SDK with a stubbed
  bearer, asserts claude.json gets the http MCP entry, CLAUDE.md gets
  the remote-http block, the secret token NEVER leaks to CLAUDE.md.
- test/skill-e2e-setup-gbrain-bad-token.test.ts: stub server returns 401;
  asserts the AUTH classifier hint surfaces, no MCP registration occurs,
  CLAUDE.md is unchanged. Regression guard for the "verify failed → STOP"
  rule.

touchfiles.ts: setup-gbrain-remote and setup-gbrain-bad-token added at
gate-tier so CI catches Path 4 regressions on every PR.

Plus a few comment refs flipped: bin/gstack-jsonl-merge, bin/gstack-timeline-log
(legacy gstack-brain-init mentions in headers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* release: v1.27.0.0 — /setup-gbrain Path 4 + brain → artifacts rename

Bumps VERSION 1.26.4.0 → 1.27.0.0 (MINOR per CLAUDE.md scale-aware bump
guidance: ~1500 line net change including a new path in /setup-gbrain,
two new bin helpers, a journaled migration, 59 new tests, and a config
key rename across the codebase).

CHANGELOG entry covers: Path 4 (Remote MCP) end-to-end, the brain →
artifacts rename, the journaled migration, the verify-helper error
classifier, the artifacts-init multi-host provider choice. Includes
the canonical Garry-voice headline + numbers table + audience close
per the release-summary format.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: demote setup-gbrain Path 4 E2E to periodic-tier

The Agent SDK E2E tests for Path 4 (skill-e2e-setup-gbrain-remote and
skill-e2e-setup-gbrain-bad-token) are inherently non-deterministic —
the model interprets "follow Path 4 only" prompts flexibly and can
skip Step 8 (CLAUDE.md write) or shortcut past the verify helper, which
makes the gate-tier assertions flaky.

The deterministic gate coverage for Path 4 is in
test/setup-gbrain-path4-structure.test.ts: a fast structural lint that
catches AUQ-pacing regressions and prose contract drift in <200ms with
zero token spend. That test is the right tool for catching the failure
mode the gate-tier was meant to guard against.

The Agent SDK E2E tests stay available on-demand for periodic-tier runs
(EVALS=1 EVALS_TIER=periodic bun test test/skill-e2e-setup-gbrain-*.test.ts).
Also tightened the verify-error assertion to the literal field shape
("error_class": "AUTH") instead of a substring match that false-matches
the parent claude session's "needs-auth" MCP discovery markers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: sync package.json version to 1.27.0.0

VERSION was bumped to 1.27.0.0 in f6ec11e but package.json was not
updated in the same commit. The gen-skill-docs.test.ts assertion
"package.json version matches VERSION file" caught the drift.

This is the DRIFT_STALE_PKG case the /ship Step 12 idempotency check
is designed for; the fix is the documented sync-only repair (no
re-bump, package.json synced to existing VERSION).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request May 13, 2026
…path

Per plan D7 (rollback semantics) and codex #10 (rollback scope). The
/setup-gbrain skill instructs the model to follow a specific shell
sequence when running `gbrain init --pglite` against an existing
config:

  1. mv ~/.gbrain/config.json ~/.gbrain/config.json.gstack-bak-<ts>
  2. gbrain init --pglite --json
  3. on non-zero exit: mv .bak back; surface error

This test verifies that contract using a fake `gbrain` binary that
fails on init. Three cases:

  - FAILURE: gbrain init exits non-zero → broken config restored to
    original path, no leftover .bak.
  - SUCCESS: gbrain init exits 0 → new config in place, .bak survives
    for audit (user reviews + deletes manually).
  - SCOPE: any partial PGLite directory at ~/.gbrain/pglite/ is NOT
    auto-cleaned. We only promise to restore config.json; PGLite
    cleanup is the user's call (codex #10).

If the skill template rewrites this sequence in a future change, this
test should fail until the test's shell is updated too. That's the
point — keep the test and the skill template aligned.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request May 15, 2026
…for code) (#1500)

* feat(gbrain): add lib/gbrain-local-status classifier with 5-state engine status + 60s cache

Foundation for split-engine gbrain: shared classifier used by both
bin/gstack-gbrain-detect (preamble probe) and bin/gstack-gbrain-sync.ts
(orchestrator SKIP-when-not-ok). Single source of truth.

Probes via `gbrain sources list --json` and classifies stderr against the
same patterns lib/gbrain-sources.ts:66-67 already uses ("Cannot connect to
database", "config.json"). Returns one of: ok, no-cli, missing-config,
broken-config, broken-db. Defensive default: unrecognized failures
classify as broken-config so the raw stderr can be surfaced upstream.

Cache at ~/.gstack/.gbrain-local-status-cache.json keyed on
{home, path_hash, gbrain_bin_path, gbrain_version, config_mtime, config_size}
with 60s TTL. Cache invalidates on any invariant change. --no-cache option
busts the cache for callers that just mutated state (/setup-gbrain,
/sync-gbrain after init/migration).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(gbrain): rewrite gstack-gbrain-detect bash→TS + add gbrain_local_status field

Replaces the bash detect helper with a bun shebang script sharing the
gbrain_local_status classifier from lib/gbrain-local-status.ts with the
sync orchestrator. Single source of truth for engine-status classification
between preamble-probe and orchestrator-skip paths.

Filename stays gstack-gbrain-detect (no .ts extension) so existing skill
preamble callers shell out unchanged. Shebang `#!/usr/bin/env -S bun run`
resolves bun at runtime.

Output is key/type backward-compatible with the bash version per plan
codex #5: the 9 pre-existing keys (gbrain_on_path, gbrain_version,
gbrain_config_exists, gbrain_engine, gbrain_doctor_ok, gbrain_mcp_mode,
gstack_brain_sync_mode, gstack_brain_git, gstack_artifacts_remote) stay
identical in name + type + value semantics. One new key added:
gbrain_local_status (5-state string enum).

Updates the existing schema regression at test/gstack-gbrain-detect-mcp-mode.test.ts
to include the new key. Adds test/gbrain-detect-shape.test.ts asserting
the regression contract for future changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(gbrain): orchestrator SKIP when local engine not ok + remote-http transcripts via artifacts pipeline

Two changes in the sync orchestrator, both per plan D11/D12:

1. bin/gstack-gbrain-sync.ts: runCodeImport + runMemoryIngest call
   localEngineStatus() (shared classifier from lib/gbrain-local-status.ts).
   When status is not 'ok', return a SKIP stage result with a clear reason
   instead of crashing with "source registration failed: gbrain not
   configured". Brain-sync stage runs regardless — it doesn't depend on
   local engine. dry-run preview path is gated above the check so it
   continues to show would-do steps even when the engine is broken.

2. bin/gstack-memory-ingest.ts: when gbrain MCP is registered as
   remote-http (Path 4), persist staged transcripts to
   ~/.gstack/transcripts/run-<pid>-<ts>/ instead of the ephemeral
   ~/.gstack/.staging-ingest-<pid>-<ts>/ tmp dir, and SKIP the local
   `gbrain import` call entirely. The artifacts pipeline (gstack-brain-sync
   push to git, brain admin pulls and indexes) handles routing to the
   remote brain. Local PGLite (when present via Step 4.5) stays code-only.

State recording still happens — prepared pages get their mtime+sha256
stamped under remote-http mode so the next /sync-gbrain doesn't
re-stage them. Cleanup is skipped intentionally so the persisted dir
survives until gstack-brain-sync moves it.

Adds test/gbrain-sync-skip.test.ts covering 5 SKIP scenarios (broken-db,
broken-config, no-cli, missing-config, ok pass-through). All 25
sync-related unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(gbrain): v1.34.0.0 migration notice + transcripts allowlist for artifacts pipeline

Per plan D5 + D11. Two pieces of the split-engine rollout:

1. gstack-upgrade/migrations/v1.34.0.0.sh — prints a one-time
   discoverability notice for existing Path 4 (remote-http MCP) users
   whose machine has no local engine yet. Tells them about /setup-gbrain
   Step 4.5 (the new local-PGLite opt-in). Silent for everyone else.
   User can suppress permanently via `gstack-config set
   local_code_index_offered true`. Touchfile at
   ~/.gstack/.migrations/v1.34.0.0.done makes it idempotent.

2. bin/gstack-artifacts-init — adds `transcripts/run-*/*.md` and
   `transcripts/run-*/**/*.md` to the managed allowlist so the
   gstack-memory-ingest persistent staging dir (used in remote-http
   mode per D11) gets pushed to the artifacts repo. Brain admin's
   pull job then indexes transcripts into the remote brain.
   Privacy class: behavioral (matches transcript content).

Adds test/gstack-upgrade-migration-v1_34_0_0.test.ts with 5 cases:
state match, no-MCP, local-config-present, opt-out, and idempotency.
All 5 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(gbrain): /setup-gbrain Step 1.5/4.5 + /sync-gbrain Step 1.5 templates

Per plan D4, D10, D11, D12. Wires the skill prose to the new
split-engine flow + classifier introduced in earlier commits.

setup-gbrain/SKILL.md.tmpl:
  - Step 1: detect output description now includes the v1.34.0.0
    gbrain_local_status field (5 values).
  - Step 1.5 (NEW): broken-db / broken-config remediation. AskUserQuestion
    with 4 options — Retry / Switch to PGLite / Switch brain mode / Quit
    (plan D4). Retry is recommended first since broken-db often = transient
    Postgres outage. PGLite is explicitly one-way + destructive (moves
    existing config to ~/.gbrain/config.json.gstack-bak-<ts>); rollback on
    init failure restores the .bak (plan D7).
  - Step 4d → Step 4.5 (NEW): in Path 4, after the verify step, offer
    local PGLite for code search. AskUserQuestion Yes/No (plan D10/D11).
    Yes path runs gstack-gbrain-install + `gbrain init --pglite --json`
    with the same rollback-safe sequence. No path skips Steps 3/4/5/7.5.
  - Step 10 verdict (Path 4): adds "Code search" row reflecting Step 4.5
    choice. Updates "Transcripts" row to describe the new D11 routing
    (artifacts repo → remote brain).

sync-gbrain/SKILL.md.tmpl:
  - Step 1 split-engine prose: corrects the prior misleading claim that
    "memory routes through whatever setup-gbrain configured, including
    remote-MCP" (codex finding #3). Memory stage shells out to local
    `gbrain import` in local-stdio mode; in remote-http mode it persists
    to ~/.gstack/transcripts/ for the artifacts pipeline.
  - Step 1.5 (NEW): local-engine pre-flight. STOP on no-cli, broken-config,
    broken-db. Soft skip (continue with code+memory SKIP) on
    missing-config + remote-http per plan D12. Surfaces actionable user
    remediation message instead of the orchestrator crashing two stages
    with ERR.

Regenerated SKILL.md for all hosts (claude, kiro, opencode, slate,
cursor, openclaw, hermes, gbrain). All 712 skill-validation + gen-skill-docs
tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(gbrain): .bak-rollback contract for Step 1.5 / 4.5 init failure path

Per plan D7 (rollback semantics) and codex #10 (rollback scope). The
/setup-gbrain skill instructs the model to follow a specific shell
sequence when running `gbrain init --pglite` against an existing
config:

  1. mv ~/.gbrain/config.json ~/.gbrain/config.json.gstack-bak-<ts>
  2. gbrain init --pglite --json
  3. on non-zero exit: mv .bak back; surface error

This test verifies that contract using a fake `gbrain` binary that
fails on init. Three cases:

  - FAILURE: gbrain init exits non-zero → broken config restored to
    original path, no leftover .bak.
  - SUCCESS: gbrain init exits 0 → new config in place, .bak survives
    for audit (user reviews + deletes manually).
  - SCOPE: any partial PGLite directory at ~/.gbrain/pglite/ is NOT
    auto-cleaned. We only promise to restore config.json; PGLite
    cleanup is the user's call (codex #10).

If the skill template rewrites this sequence in a future change, this
test should fail until the test's shell is updated too. That's the
point — keep the test and the skill template aligned.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(gbrain): periodic E2E for /setup-gbrain Path 4 + Step 4.5 Yes flow

End-to-end coverage of the new opt-in question via runAgentSdkTest.
Stubs the MCP endpoint at /tools/list with a 200 response carrying a
fake gbrain v0.32.3.0 serverInfo, and fakes the gbrain + claude CLIs
so init writes a PGLite config and mcp add succeeds. Asserts the model:

  1. invokes gstack-gbrain-install (Step 4.5 Yes branch)
  2. invokes `gbrain init --pglite --json`
  3. writes a working ~/.gbrain/config.json with engine=pglite
  4. registers the remote MCP via `claude mcp add --transport http`
  5. never leaks the bearer token to CLAUDE.md

Classified as periodic-tier per plan D6 (codex #12 flagged AgentSDK
flakiness; gate-tier coverage of the split-engine behavior lives in the
deterministic unit tests at gbrain-local-status.test.ts and
gbrain-sync-skip.test.ts). Touchfile fires the test when the skill
template, install/verify/init helpers, the local-status classifier, or
the agent-sdk-runner harness changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(gbrain): bump migration to v1.35.0.0 after main merge

main shipped v1.34.0.0 (factory-export submodule) + v1.34.1.0 (update-check
hardening) while this branch was in flight. The migration file I named
v1.34.0.0.sh now belongs at v1.35.0.0 — the next minor on top of main,
matching the scale of split-engine work (new lib + orchestrator skip +
template overhaul + transcripts routing).

Renames the migration script and its test file; updates all internal
version references in both files. Behavior unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf(gbrain): memoize gbrain resolution + use --fast doctor in detect

Cuts detect's wall time substantially by sharing fork-exec results
between the helper that walks the JSON output and the localEngineStatus
classifier from lib/gbrain-local-status.ts.

Before: detect made 2x `command -v gbrain` calls (one in detect's
detectGbrain, one in the classifier's resolveGbrainBin) and 2x
`gbrain --version` calls. With memoization keyed on PATH, both
collapse to one fork each (~400ms saved per skill preamble).

Also adds `--fast` to the `gbrain doctor --json` call in detect so a
broken-db config (Garry's repro) doesn't burn a full 5s timeout on the
doctor's DB-connection check. The classifier still probes the DB
directly via `gbrain sources list --json` for engine reachability —
that's `gbrain_local_status`, separate from the coarse
`gbrain_doctor_ok` summary flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(gbrain): relax E2E assertions to smoke-test contract

Per codex #12 (AgentSDK harness is non-deterministic): the E2E now
asserts the model followed the split-engine path WITHOUT requiring a
specific subcommand sequence. Three assertions:

  1. AskUserQuestion was called (model reached interactive branches)
  2. At least one of {gstack-gbrain-install, `gbrain init --pglite`,
     `claude mcp add`} fired (model followed the skill, not a no-op)
  3. The fake bearer token never leaked to CLAUDE.md (security regression)

Deterministic per-step coverage of the same flow lives in the gate-tier
unit tests (gbrain-local-status, gbrain-sync-skip, init-rollback,
upgrade-migration). The E2E exists to catch the "model can't follow
the skill at all" regression class, not to pin the exact tool sequence.

Test passes in 280s against the live Agent SDK.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(version): bump CLI smoke-test timeout to 15s (flaky at 5s under load)

The gstack-next-version integration smoke test spawns a child process
that does git operations + sibling-worktree probing. Wall time hovers
4-5s on M-series Macs; flakes at exactly 5001-5002ms when the test
suite runs under load (bun's parallel scheduling). Bumping per-test
timeout to 15s eliminates the flake without changing test logic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.37.0.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RyotaKun pushed a commit to RyotaKun/gstack that referenced this pull request May 18, 2026
… rename (garrytan#1351)

* feat: gstack-gbrain-mcp-verify helper for remote MCP probe

Probes a remote gbrain MCP endpoint with bearer auth. POSTs initialize,
classifies failures into NETWORK / AUTH / MALFORMED with one-line
remediation hints, and runs a tools/list capability probe to detect
sources_add MCP support (forward-compat for when gbrain ships URL ingest).

Token consumed from GBRAIN_MCP_TOKEN env, never argv. Required to set
both 'application/json' AND 'text/event-stream' in Accept; that gotcha
costs 10 minutes of debugging when missed (regression-tested).

Live-verified against wintermute (gbrain v0.27.1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: gstack-artifacts-init + gstack-artifacts-url helpers

artifacts-init replaces brain-init with provider choice (gh / glab /
manual), per-user gstack-artifacts-$USER repo, HTTPS-canonical storage in
~/.gstack-artifacts-remote.txt, and a "send this to your brain admin"
hookup printout. Always prints the command, never auto-executes — gbrain
v0.26.x has no admin-scope MCP probe (codex Finding garrytan#3).

artifacts-url centralizes HTTPS↔SSH/host/owner-repo conversion so callers
don't each string-mangle (codex Finding garrytan#10). The remote-conflict check in
artifacts-init compares at the canonical level so re-running with HTTPS
input doesn't trip on a stored SSH URL for the same logical repo.

The "URL form not supported" branch prints a two-line clone-then-path
form for gbrain v0.26.x; the supported branch is a one-liner with --url
ready for when gbrain ships URL ingest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: extend gstack-gbrain-detect with mcp_mode + artifacts_remote

Adds two new fields to detect's JSON output:

- gbrain_mcp_mode: local-stdio | remote-http | none
  Resolved via 3-tier fallback (codex Finding D3): claude mcp get --json
  → claude mcp list text-grep → ~/.claude.json jq read. If Anthropic moves
  the file format, the first two tiers absorb it.

- gstack_artifacts_remote: HTTPS URL from ~/.gstack-artifacts-remote.txt
  Falls back to ~/.gstack-brain-remote.txt during the v1.27.0.0 migration
  window so detect doesn't return empty between upgrade and migration.

Existing detect tests still pass (15/15). New 19 tests cover every fallback
tier independently, plus a schema regression for /sync-gbrain compat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: setup-gbrain Path 4 (remote MCP) + artifacts rename

Path 4 lets users paste an HTTPS MCP URL + bearer token and registers it
as an HTTP-transport MCP without needing a local gbrain CLI install. The
flow:

- Step 2 gains a fourth option (Remote gbrain MCP)
- Step 4 adds Path 4 sub-flow: collect URL, secret-read bearer, verify
  via gstack-gbrain-mcp-verify (NETWORK / AUTH / MALFORMED classifier)
- Step 5 (local doctor), Step 7.5 (transcript ingest), Step 5a's stdio
  branch all skip on Path 4
- Step 5a adds an HTTP+bearer registration form: claude mcp add
  --transport http --header "Authorization: Bearer ..."
- Step 7 renamed "session memory sync" → "artifacts sync" and now calls
  gstack-artifacts-init (which always prints the brain-admin hookup
  command — no auto-execute, codex Finding garrytan#3)
- Step 8 CLAUDE.md block branches: remote-http includes URL + server
  version (never the token); local-stdio keeps engine + config-file
- Step 9 smoke test on Path 4 prints the curl-equivalent for
  post-restart verification (MCP tools aren't visible mid-session)
- Step 10 verdict block has separate templates per mode

Idempotency: re-running with gbrain_mcp_mode=remote-http already in
detect output skips Step 2 entirely and goes to verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: rename gbrain_sync_mode → artifacts_sync_mode (v1.27.0.0 prep)

Hard rename, no dual-read alias (codex Finding D4). The on-disk migration
script (Phase C, separate commit) renames the config key in users'
~/.gstack/config.yaml and any CLAUDE.md blocks.

Touched call sites:
- bin/gstack-config defaults + validation + list/defaults output
- bin/gstack-gbrain-detect (gstack_brain_sync_mode field still emitted
  with the same name for downstream-tool compat; reads new key)
- bin/gstack-brain-sync, bin/gstack-brain-enqueue, bin/gstack-brain-uninstall
- bin/gstack-timeline-log (comment ref)
- scripts/resolvers/preamble/generate-brain-sync-block.ts: renames key,
  branches on gbrain_mcp_mode=remote-http to emit "ARTIFACTS_SYNC:
  remote-mode (managed by brain server <host>)" instead of the local
  mode/queue/last_push line (codex Finding garrytan#11)
- bin/gstack-brain-restore + bin/gstack-gbrain-source-wireup: read
  ~/.gstack-artifacts-remote.txt with ~/.gstack-brain-remote.txt fallback
  during the migration window
- bin/gstack-artifacts-init: tolerant of unrecognized URL forms (local
  paths, file://, self-hosted gitea) so test infrastructure and unusual
  remotes work without canonicalization
- test/brain-sync.test.ts: gstack-brain-init → gstack-artifacts-init
- test/skill-e2e-brain-privacy-gate.test.ts: artifacts_sync_mode keys
- test/gen-skill-docs.test.ts: budget 35K → 36.5K for the new MCP-mode
  probe in the preamble resolver
- health/SKILL.md.tmpl, sync-gbrain/SKILL.md.tmpl: comment + verdict line

Hard delete:
- bin/gstack-brain-init (replaced by bin/gstack-artifacts-init in v1.27.0.0)
- test/gstack-brain-init-gh-mock.test.ts (replaced by gstack-artifacts-init.test.ts)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md files after artifacts-sync rename

Mechanical regen via \`bun run gen:skill-docs --host all\`. All */SKILL.md
files reflect the renamed config key (gbrain_sync_mode →
artifacts_sync_mode), the renamed remote-helper file
(~/.gstack-artifacts-remote.txt with brain fallback), the renamed init
script (gstack-artifacts-init), and the new ARTIFACTS_SYNC: remote-mode
status line that fires when a remote-http MCP is registered.

Golden fixtures (test/fixtures/golden/*-ship-SKILL.md) refreshed to match
the regenerated default-ship output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: v1.27.0.0 migration — gstack-brain → gstack-artifacts rename

Journaled, interruption-safe migration. Six steps, each writes to
~/.gstack/.migrations/v1.27.0.0.journal on success; re-entry resumes
from the next un-done step. On final success, journal is replaced by
~/.gstack/.migrations/v1.27.0.0.done.

Steps:
1. gh_repo_renamed       gh/glab repo rename gstack-brain-$USER →
                         gstack-artifacts-$USER (idempotent: detects
                         already-renamed and skips)
2. remote_txt_renamed    mv ~/.gstack-brain-remote.txt → artifacts file,
                         rewriting URL path to match the new repo name
3. config_key_renamed    sed -i in ~/.gstack/config.yaml flips
                         gbrain_sync_mode → artifacts_sync_mode
4. claude_md_block       sed flips "- Memory sync:" → "- Artifacts sync:"
                         in cwd CLAUDE.md and ~/.gstack/CLAUDE.md
5. sources_swapped       gbrain sources add NEW (verify) → remove OLD
                         (codex Finding garrytan#6: add-before-remove ordering,
                         no downtime window). On remote-MCP mode, prints
                         commands for the brain admin instead of executing.
6. done                  touchfile + delete journal

User opt-out: any "n" or "skip-for-now" answer at the initial prompt
writes a marker file that prevents re-prompting; user can re-invoke
via /setup-gbrain --rerun-migration.

11 unit tests cover: nothing-to-migrate, GitHub happy path, idempotent
re-run, journal-resume mid-flight, remote-MCP print-only path,
add-before-remove ordering verification, add-fail → old source stays
registered, CLAUDE.md field rewrite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: regression suite + E2E for v1.27.0.0 rename

Three new regression tests guard the rename's blast radius (per codex
Findings garrytan#1, garrytan#8, garrytan#9, garrytan#12):

- test/no-stale-gstack-brain-refs.test.ts: greps bin/, scripts/, *.tmpl,
  test/ for forbidden identifiers (gstack-brain-init, gbrain_sync_mode);
  fails CI if any non-allowlisted file references them.
- test/post-rename-doc-regen.test.ts: confirms gen-skill-docs output has
  no stale references in any */SKILL.md (the cross-product blind spot).
- test/setup-gbrain-path4-structure.test.ts: structural lint over the
  Path 4 prose contract — STOP gates after verify failure, never-write-
  token rules, mode-aware CLAUDE.md block, bearer always via env-var.

Two new gate-tier E2E tests (deterministic stub HTTP server, fixed inputs):

- test/skill-e2e-setup-gbrain-remote.test.ts: Path 4 happy path. Stubs
  an HTTP MCP server, drives the skill via Agent SDK with a stubbed
  bearer, asserts claude.json gets the http MCP entry, CLAUDE.md gets
  the remote-http block, the secret token NEVER leaks to CLAUDE.md.
- test/skill-e2e-setup-gbrain-bad-token.test.ts: stub server returns 401;
  asserts the AUTH classifier hint surfaces, no MCP registration occurs,
  CLAUDE.md is unchanged. Regression guard for the "verify failed → STOP"
  rule.

touchfiles.ts: setup-gbrain-remote and setup-gbrain-bad-token added at
gate-tier so CI catches Path 4 regressions on every PR.

Plus a few comment refs flipped: bin/gstack-jsonl-merge, bin/gstack-timeline-log
(legacy gstack-brain-init mentions in headers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* release: v1.27.0.0 — /setup-gbrain Path 4 + brain → artifacts rename

Bumps VERSION 1.26.4.0 → 1.27.0.0 (MINOR per CLAUDE.md scale-aware bump
guidance: ~1500 line net change including a new path in /setup-gbrain,
two new bin helpers, a journaled migration, 59 new tests, and a config
key rename across the codebase).

CHANGELOG entry covers: Path 4 (Remote MCP) end-to-end, the brain →
artifacts rename, the journaled migration, the verify-helper error
classifier, the artifacts-init multi-host provider choice. Includes
the canonical Garry-voice headline + numbers table + audience close
per the release-summary format.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: demote setup-gbrain Path 4 E2E to periodic-tier

The Agent SDK E2E tests for Path 4 (skill-e2e-setup-gbrain-remote and
skill-e2e-setup-gbrain-bad-token) are inherently non-deterministic —
the model interprets "follow Path 4 only" prompts flexibly and can
skip Step 8 (CLAUDE.md write) or shortcut past the verify helper, which
makes the gate-tier assertions flaky.

The deterministic gate coverage for Path 4 is in
test/setup-gbrain-path4-structure.test.ts: a fast structural lint that
catches AUQ-pacing regressions and prose contract drift in <200ms with
zero token spend. That test is the right tool for catching the failure
mode the gate-tier was meant to guard against.

The Agent SDK E2E tests stay available on-demand for periodic-tier runs
(EVALS=1 EVALS_TIER=periodic bun test test/skill-e2e-setup-gbrain-*.test.ts).
Also tightened the verify-error assertion to the literal field shape
("error_class": "AUTH") instead of a substring match that false-matches
the parent claude session's "needs-auth" MCP discovery markers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: sync package.json version to 1.27.0.0

VERSION was bumped to 1.27.0.0 in f6ec11e but package.json was not
updated in the same commit. The gen-skill-docs.test.ts assertion
"package.json version matches VERSION file" caught the drift.

This is the DRIFT_STALE_PKG case the /ship Step 12 idempotency check
is designed for; the fix is the documented sync-only repair (no
re-bump, package.json synced to existing VERSION).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RyotaKun pushed a commit to RyotaKun/gstack that referenced this pull request May 18, 2026
…for code) (garrytan#1500)

* feat(gbrain): add lib/gbrain-local-status classifier with 5-state engine status + 60s cache

Foundation for split-engine gbrain: shared classifier used by both
bin/gstack-gbrain-detect (preamble probe) and bin/gstack-gbrain-sync.ts
(orchestrator SKIP-when-not-ok). Single source of truth.

Probes via `gbrain sources list --json` and classifies stderr against the
same patterns lib/gbrain-sources.ts:66-67 already uses ("Cannot connect to
database", "config.json"). Returns one of: ok, no-cli, missing-config,
broken-config, broken-db. Defensive default: unrecognized failures
classify as broken-config so the raw stderr can be surfaced upstream.

Cache at ~/.gstack/.gbrain-local-status-cache.json keyed on
{home, path_hash, gbrain_bin_path, gbrain_version, config_mtime, config_size}
with 60s TTL. Cache invalidates on any invariant change. --no-cache option
busts the cache for callers that just mutated state (/setup-gbrain,
/sync-gbrain after init/migration).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(gbrain): rewrite gstack-gbrain-detect bash→TS + add gbrain_local_status field

Replaces the bash detect helper with a bun shebang script sharing the
gbrain_local_status classifier from lib/gbrain-local-status.ts with the
sync orchestrator. Single source of truth for engine-status classification
between preamble-probe and orchestrator-skip paths.

Filename stays gstack-gbrain-detect (no .ts extension) so existing skill
preamble callers shell out unchanged. Shebang `#!/usr/bin/env -S bun run`
resolves bun at runtime.

Output is key/type backward-compatible with the bash version per plan
codex garrytan#5: the 9 pre-existing keys (gbrain_on_path, gbrain_version,
gbrain_config_exists, gbrain_engine, gbrain_doctor_ok, gbrain_mcp_mode,
gstack_brain_sync_mode, gstack_brain_git, gstack_artifacts_remote) stay
identical in name + type + value semantics. One new key added:
gbrain_local_status (5-state string enum).

Updates the existing schema regression at test/gstack-gbrain-detect-mcp-mode.test.ts
to include the new key. Adds test/gbrain-detect-shape.test.ts asserting
the regression contract for future changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(gbrain): orchestrator SKIP when local engine not ok + remote-http transcripts via artifacts pipeline

Two changes in the sync orchestrator, both per plan D11/D12:

1. bin/gstack-gbrain-sync.ts: runCodeImport + runMemoryIngest call
   localEngineStatus() (shared classifier from lib/gbrain-local-status.ts).
   When status is not 'ok', return a SKIP stage result with a clear reason
   instead of crashing with "source registration failed: gbrain not
   configured". Brain-sync stage runs regardless — it doesn't depend on
   local engine. dry-run preview path is gated above the check so it
   continues to show would-do steps even when the engine is broken.

2. bin/gstack-memory-ingest.ts: when gbrain MCP is registered as
   remote-http (Path 4), persist staged transcripts to
   ~/.gstack/transcripts/run-<pid>-<ts>/ instead of the ephemeral
   ~/.gstack/.staging-ingest-<pid>-<ts>/ tmp dir, and SKIP the local
   `gbrain import` call entirely. The artifacts pipeline (gstack-brain-sync
   push to git, brain admin pulls and indexes) handles routing to the
   remote brain. Local PGLite (when present via Step 4.5) stays code-only.

State recording still happens — prepared pages get their mtime+sha256
stamped under remote-http mode so the next /sync-gbrain doesn't
re-stage them. Cleanup is skipped intentionally so the persisted dir
survives until gstack-brain-sync moves it.

Adds test/gbrain-sync-skip.test.ts covering 5 SKIP scenarios (broken-db,
broken-config, no-cli, missing-config, ok pass-through). All 25
sync-related unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(gbrain): v1.34.0.0 migration notice + transcripts allowlist for artifacts pipeline

Per plan D5 + D11. Two pieces of the split-engine rollout:

1. gstack-upgrade/migrations/v1.34.0.0.sh — prints a one-time
   discoverability notice for existing Path 4 (remote-http MCP) users
   whose machine has no local engine yet. Tells them about /setup-gbrain
   Step 4.5 (the new local-PGLite opt-in). Silent for everyone else.
   User can suppress permanently via `gstack-config set
   local_code_index_offered true`. Touchfile at
   ~/.gstack/.migrations/v1.34.0.0.done makes it idempotent.

2. bin/gstack-artifacts-init — adds `transcripts/run-*/*.md` and
   `transcripts/run-*/**/*.md` to the managed allowlist so the
   gstack-memory-ingest persistent staging dir (used in remote-http
   mode per D11) gets pushed to the artifacts repo. Brain admin's
   pull job then indexes transcripts into the remote brain.
   Privacy class: behavioral (matches transcript content).

Adds test/gstack-upgrade-migration-v1_34_0_0.test.ts with 5 cases:
state match, no-MCP, local-config-present, opt-out, and idempotency.
All 5 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(gbrain): /setup-gbrain Step 1.5/4.5 + /sync-gbrain Step 1.5 templates

Per plan D4, D10, D11, D12. Wires the skill prose to the new
split-engine flow + classifier introduced in earlier commits.

setup-gbrain/SKILL.md.tmpl:
  - Step 1: detect output description now includes the v1.34.0.0
    gbrain_local_status field (5 values).
  - Step 1.5 (NEW): broken-db / broken-config remediation. AskUserQuestion
    with 4 options — Retry / Switch to PGLite / Switch brain mode / Quit
    (plan D4). Retry is recommended first since broken-db often = transient
    Postgres outage. PGLite is explicitly one-way + destructive (moves
    existing config to ~/.gbrain/config.json.gstack-bak-<ts>); rollback on
    init failure restores the .bak (plan D7).
  - Step 4d → Step 4.5 (NEW): in Path 4, after the verify step, offer
    local PGLite for code search. AskUserQuestion Yes/No (plan D10/D11).
    Yes path runs gstack-gbrain-install + `gbrain init --pglite --json`
    with the same rollback-safe sequence. No path skips Steps 3/4/5/7.5.
  - Step 10 verdict (Path 4): adds "Code search" row reflecting Step 4.5
    choice. Updates "Transcripts" row to describe the new D11 routing
    (artifacts repo → remote brain).

sync-gbrain/SKILL.md.tmpl:
  - Step 1 split-engine prose: corrects the prior misleading claim that
    "memory routes through whatever setup-gbrain configured, including
    remote-MCP" (codex finding garrytan#3). Memory stage shells out to local
    `gbrain import` in local-stdio mode; in remote-http mode it persists
    to ~/.gstack/transcripts/ for the artifacts pipeline.
  - Step 1.5 (NEW): local-engine pre-flight. STOP on no-cli, broken-config,
    broken-db. Soft skip (continue with code+memory SKIP) on
    missing-config + remote-http per plan D12. Surfaces actionable user
    remediation message instead of the orchestrator crashing two stages
    with ERR.

Regenerated SKILL.md for all hosts (claude, kiro, opencode, slate,
cursor, openclaw, hermes, gbrain). All 712 skill-validation + gen-skill-docs
tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(gbrain): .bak-rollback contract for Step 1.5 / 4.5 init failure path

Per plan D7 (rollback semantics) and codex garrytan#10 (rollback scope). The
/setup-gbrain skill instructs the model to follow a specific shell
sequence when running `gbrain init --pglite` against an existing
config:

  1. mv ~/.gbrain/config.json ~/.gbrain/config.json.gstack-bak-<ts>
  2. gbrain init --pglite --json
  3. on non-zero exit: mv .bak back; surface error

This test verifies that contract using a fake `gbrain` binary that
fails on init. Three cases:

  - FAILURE: gbrain init exits non-zero → broken config restored to
    original path, no leftover .bak.
  - SUCCESS: gbrain init exits 0 → new config in place, .bak survives
    for audit (user reviews + deletes manually).
  - SCOPE: any partial PGLite directory at ~/.gbrain/pglite/ is NOT
    auto-cleaned. We only promise to restore config.json; PGLite
    cleanup is the user's call (codex garrytan#10).

If the skill template rewrites this sequence in a future change, this
test should fail until the test's shell is updated too. That's the
point — keep the test and the skill template aligned.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(gbrain): periodic E2E for /setup-gbrain Path 4 + Step 4.5 Yes flow

End-to-end coverage of the new opt-in question via runAgentSdkTest.
Stubs the MCP endpoint at /tools/list with a 200 response carrying a
fake gbrain v0.32.3.0 serverInfo, and fakes the gbrain + claude CLIs
so init writes a PGLite config and mcp add succeeds. Asserts the model:

  1. invokes gstack-gbrain-install (Step 4.5 Yes branch)
  2. invokes `gbrain init --pglite --json`
  3. writes a working ~/.gbrain/config.json with engine=pglite
  4. registers the remote MCP via `claude mcp add --transport http`
  5. never leaks the bearer token to CLAUDE.md

Classified as periodic-tier per plan D6 (codex garrytan#12 flagged AgentSDK
flakiness; gate-tier coverage of the split-engine behavior lives in the
deterministic unit tests at gbrain-local-status.test.ts and
gbrain-sync-skip.test.ts). Touchfile fires the test when the skill
template, install/verify/init helpers, the local-status classifier, or
the agent-sdk-runner harness changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(gbrain): bump migration to v1.35.0.0 after main merge

main shipped v1.34.0.0 (factory-export submodule) + v1.34.1.0 (update-check
hardening) while this branch was in flight. The migration file I named
v1.34.0.0.sh now belongs at v1.35.0.0 — the next minor on top of main,
matching the scale of split-engine work (new lib + orchestrator skip +
template overhaul + transcripts routing).

Renames the migration script and its test file; updates all internal
version references in both files. Behavior unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf(gbrain): memoize gbrain resolution + use --fast doctor in detect

Cuts detect's wall time substantially by sharing fork-exec results
between the helper that walks the JSON output and the localEngineStatus
classifier from lib/gbrain-local-status.ts.

Before: detect made 2x `command -v gbrain` calls (one in detect's
detectGbrain, one in the classifier's resolveGbrainBin) and 2x
`gbrain --version` calls. With memoization keyed on PATH, both
collapse to one fork each (~400ms saved per skill preamble).

Also adds `--fast` to the `gbrain doctor --json` call in detect so a
broken-db config (Garry's repro) doesn't burn a full 5s timeout on the
doctor's DB-connection check. The classifier still probes the DB
directly via `gbrain sources list --json` for engine reachability —
that's `gbrain_local_status`, separate from the coarse
`gbrain_doctor_ok` summary flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(gbrain): relax E2E assertions to smoke-test contract

Per codex garrytan#12 (AgentSDK harness is non-deterministic): the E2E now
asserts the model followed the split-engine path WITHOUT requiring a
specific subcommand sequence. Three assertions:

  1. AskUserQuestion was called (model reached interactive branches)
  2. At least one of {gstack-gbrain-install, `gbrain init --pglite`,
     `claude mcp add`} fired (model followed the skill, not a no-op)
  3. The fake bearer token never leaked to CLAUDE.md (security regression)

Deterministic per-step coverage of the same flow lives in the gate-tier
unit tests (gbrain-local-status, gbrain-sync-skip, init-rollback,
upgrade-migration). The E2E exists to catch the "model can't follow
the skill at all" regression class, not to pin the exact tool sequence.

Test passes in 280s against the live Agent SDK.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(version): bump CLI smoke-test timeout to 15s (flaky at 5s under load)

The gstack-next-version integration smoke test spawns a child process
that does git operations + sibling-worktree probing. Wall time hovers
4-5s on M-series Macs; flakes at exactly 5001-5002ms when the test
suite runs under load (bun's parallel scheduling). Bumping per-test
timeout to 15s eliminates the flake without changing test logic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.37.0.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
adamtasteslikegood added a commit to adamtasteslikegood/gstack that referenced this pull request May 19, 2026
* v1.26.5.0 fix wave: gbrain ingest writer (hybrid frontmatter) + gbrain-valid source ids (#1344)

* fix: use correct `gbrain put <slug>` CLI verb in memory ingest

`put_page` is the MCP tool name, not a CLI subcommand. The actual
gbrain verb is `put <slug>` with content via stdin and tags in YAML
frontmatter. Every transcript / memory ingest fails today on clean
installs.

Switch to the right verb and inject title/type/tags into the
frontmatter that buildTranscriptPage / buildArtifactPage already
produce.

Bundled in the same function:

- timeout: 30s → 60s. Auto-link reconciliation hits 30s once the
  brain has a few hundred pages.
- maxBuffer: 1MB → 16MB. Without it Node truncates gbrain's stderr
  and callers see only `Command failed:` with no detail.
- Surface stderr/stdout in the returned error instead of the bare
  exception.

Verified: bun test test/gstack-memory-ingest.test.ts -> 15/15 pass.
bun test on the three test files touching this path -> 362/362.

* fix(sync-gbrain): generate gbrain-valid source ids for repos with dots or long names

`deriveCodeSourceId` previously concatenated the canonicalized remote with only `/`
and whitespace stripped, leaving dots from hostnames (`github.com`) and no length
cap. gbrain rejects any source id containing characters outside [a-z0-9-] or longer
than 32 chars, so `github.com/<org>/<repo>` produced `gstack-code-github.com-<org>-<repo>`
(40 chars, plus dots) and registration failed:

    code  source registration failed: Invalid source id
          "gstack-code-github.com-radubach-platform". Must be 1-32 lowercase alnum
          chars with optional interior hyphens.

Fix:
- Drop the host segment (`github.com` is the same for nearly every user and just
  consumes the 32-char budget). Use only the last two path segments (org-repo).
- Sanitize any remaining non-alnum to hyphens, then collapse and trim.
- For genuinely long org/repo names that still exceed the budget, keep the tail
  (most distinctive end of the slug) and append a 6-char sha1 hash for collision
  resistance.

Adds a regression test that spawns the CLI in temp git repos with controlled
remotes (dot in hostname, SCP-style, multi-dot host, long names forcing
hash-truncation) and asserts every derived id is ≤32 chars and matches the
gbrain validator regex.

* fix(memory-ingest): hybrid frontmatter writer + tightened gbrain availability probe

PR #1328 (merged in the prior commit) correctly injects title/type/tags
into the YAML frontmatter that buildTranscriptPage already prepends. But
buildArtifactPage emits raw markdown without frontmatter, so design-docs,
learnings, and builder-profile-entries were landing in gbrain with empty
title/type/tags. Add the no-frontmatter wrap branch so artifact pages get
the same metadata the inject branch provides for transcripts.

Also bring in gbrainAvailable()'s --help probe (originally proposed in
PR #1341 by Alex Medina), with the regex tightened from /(^|\s)put(\s|$)/m
to /^\s+put\s/m. Anchoring on the indented subcommand format gbrain's
help actually uses keeps the probe from matching "put" appearing as
prose in help text, while still failing fast with one clean error if a
future gbrain renames or removes the put subcommand.

Updates the V1.5 NOTE doc block at the top of the file to describe the
current put-via-stdin shape rather than the legacy put_page flag form.

Co-Authored-By: Alex Medina <oficina@puntoverdemc.com>

* test+fix(memory-ingest): strengthen regression tests, fix inject for malformed-close frontmatter

Imports the shim-based regression tests from PR #1341 (Alex Medina) and
strengthens them to assert title, type, and tags actually arrive in put
stdin — not just `agent: claude-code`. Asserting the metadata fields
matches the regression class that's caused this fix wave: writers can
"succeed" while metadata is silently lost. The original PR #1341 tests
would have passed even with title/type/tags missing.

Strengthening the test surfaced a deeper issue. buildTranscriptPage joins
frontmatter array elements with "\n" and does not append a trailing
newline, so the close fence is "\n---<content>" directly, not "\n---\n".
PR #1328's inject branch searched for "\n---\n" and never matched —
which means even with PR #1328 alone, transcript pages were landing in
gbrain with no title/type/tags. Two-line fix: search for "\n---" only,
since the inject lands before the close fence regardless of what
follows it.

Also imports PR #1341's V1.5 NOTE doc-block update and the section
comment refresh so the prose stays accurate against the new writer
shape.

Co-Authored-By: Alex Medina <oficina@puntoverdemc.com>

* fix+test(gbrain-sync): handle empty-slug edge in constrainSourceId, add no-origin and basename-empty regression tests

PR #1330 (merged in the prior commit) addressed the dot-in-host and
length-overflow cases for source-id derivation, but constrainSourceId
silently returned "${prefix}-" when the input sanitized to an empty
slug — invalid per gbrain's `^[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?$`
validator on the trailing hyphen. Adds an explicit empty-slug branch
that falls back to a sha1-prefixed id ("gstack-code-<6hex>") so the
output stays gbrain-valid for every input shape.

Two new regression tests cover the corners PR #1330's coverage left
exposed:
- no-origin fallback: a cwd repo with no `origin` remote configured
  must still derive a valid id from the basename.
- basename-sanitizes-to-empty: a repo whose path basename is all
  non-alnum (e.g. "___") must produce the hash-only fallback, not
  an invalid trailing-hyphen id.

Both run the CLI inside temp git repos for genuine end-to-end
coverage (matches the pattern PR #1330 established for its own four
remote-shape cases).

Co-Authored-By: Richard Dubach <radubach@gmail.com>

* chore: bump VERSION to 1.26.5.0 + CHANGELOG entry for fix wave

PATCH bump. Three bug fixes (memory-ingest put_page CLI verb mismatch,
hybrid frontmatter writer for transcripts AND artifacts, gbrain-valid
source-id derivation for github-hosted repos), no new user capability.

CHANGELOG release-summary leads with what users can now do (clean-
install transcripts populate the brain, github-hosted repos register
code sources) and tabulates before/after numbers from real gbrain
v0.25.1 smoke output. Itemized changes credit @smithjoshua, @AZ-1224,
and @radubach for the originating PRs plus the additional hybrid
branch + strengthened tests added on top per Codex plan-review.

* docs(todos): file P2 (gbrain install-pin staleness) + P3 (source-id host-collision) follow-ups

Two follow-ups surfaced during the v1.26.5.0 fix-wave plan review.

P2 — Issue #1305 part 2: bin/gstack-gbrain-install pins gbrain to
v0.18.2 (commit 08b3698) but doesn't move when gstack ships features
that depend on newer gbrain ops or schema. Fresh /setup-gbrain on
v1.26.x lands users on schema 24 with v1.26 features expecting 32+.
Captured for a future fix-wave.

P3 — Codex P1.3 from the v1.26.5.0 plan review: deriveCodeSourceId
drops the host segment to fit gbrain's 32-char source-id budget,
which means github.com/acme/foo and gitlab.com/acme/foo collapse to
the same source id. Real but rare; PR #1330 author explicitly
considered this and chose budget over cross-host uniqueness. Captured
as a long-tail concern.

---------

Co-authored-by: Joshua Smith <joshualowellsmith@gmail.com>
Co-authored-by: Richard Dubach <radubach@gmail.com>
Co-authored-by: Alex Medina <oficina@puntoverdemc.com>

* v1.27.0.0 feat: /setup-gbrain Path 4 (remote MCP) + brain → artifacts rename (#1351)

* feat: gstack-gbrain-mcp-verify helper for remote MCP probe

Probes a remote gbrain MCP endpoint with bearer auth. POSTs initialize,
classifies failures into NETWORK / AUTH / MALFORMED with one-line
remediation hints, and runs a tools/list capability probe to detect
sources_add MCP support (forward-compat for when gbrain ships URL ingest).

Token consumed from GBRAIN_MCP_TOKEN env, never argv. Required to set
both 'application/json' AND 'text/event-stream' in Accept; that gotcha
costs 10 minutes of debugging when missed (regression-tested).

Live-verified against wintermute (gbrain v0.27.1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: gstack-artifacts-init + gstack-artifacts-url helpers

artifacts-init replaces brain-init with provider choice (gh / glab /
manual), per-user gstack-artifacts-$USER repo, HTTPS-canonical storage in
~/.gstack-artifacts-remote.txt, and a "send this to your brain admin"
hookup printout. Always prints the command, never auto-executes — gbrain
v0.26.x has no admin-scope MCP probe (codex Finding #3).

artifacts-url centralizes HTTPS↔SSH/host/owner-repo conversion so callers
don't each string-mangle (codex Finding #10). The remote-conflict check in
artifacts-init compares at the canonical level so re-running with HTTPS
input doesn't trip on a stored SSH URL for the same logical repo.

The "URL form not supported" branch prints a two-line clone-then-path
form for gbrain v0.26.x; the supported branch is a one-liner with --url
ready for when gbrain ships URL ingest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: extend gstack-gbrain-detect with mcp_mode + artifacts_remote

Adds two new fields to detect's JSON output:

- gbrain_mcp_mode: local-stdio | remote-http | none
  Resolved via 3-tier fallback (codex Finding D3): claude mcp get --json
  → claude mcp list text-grep → ~/.claude.json jq read. If Anthropic moves
  the file format, the first two tiers absorb it.

- gstack_artifacts_remote: HTTPS URL from ~/.gstack-artifacts-remote.txt
  Falls back to ~/.gstack-brain-remote.txt during the v1.27.0.0 migration
  window so detect doesn't return empty between upgrade and migration.

Existing detect tests still pass (15/15). New 19 tests cover every fallback
tier independently, plus a schema regression for /sync-gbrain compat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: setup-gbrain Path 4 (remote MCP) + artifacts rename

Path 4 lets users paste an HTTPS MCP URL + bearer token and registers it
as an HTTP-transport MCP without needing a local gbrain CLI install. The
flow:

- Step 2 gains a fourth option (Remote gbrain MCP)
- Step 4 adds Path 4 sub-flow: collect URL, secret-read bearer, verify
  via gstack-gbrain-mcp-verify (NETWORK / AUTH / MALFORMED classifier)
- Step 5 (local doctor), Step 7.5 (transcript ingest), Step 5a's stdio
  branch all skip on Path 4
- Step 5a adds an HTTP+bearer registration form: claude mcp add
  --transport http --header "Authorization: Bearer ..."
- Step 7 renamed "session memory sync" → "artifacts sync" and now calls
  gstack-artifacts-init (which always prints the brain-admin hookup
  command — no auto-execute, codex Finding #3)
- Step 8 CLAUDE.md block branches: remote-http includes URL + server
  version (never the token); local-stdio keeps engine + config-file
- Step 9 smoke test on Path 4 prints the curl-equivalent for
  post-restart verification (MCP tools aren't visible mid-session)
- Step 10 verdict block has separate templates per mode

Idempotency: re-running with gbrain_mcp_mode=remote-http already in
detect output skips Step 2 entirely and goes to verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: rename gbrain_sync_mode → artifacts_sync_mode (v1.27.0.0 prep)

Hard rename, no dual-read alias (codex Finding D4). The on-disk migration
script (Phase C, separate commit) renames the config key in users'
~/.gstack/config.yaml and any CLAUDE.md blocks.

Touched call sites:
- bin/gstack-config defaults + validation + list/defaults output
- bin/gstack-gbrain-detect (gstack_brain_sync_mode field still emitted
  with the same name for downstream-tool compat; reads new key)
- bin/gstack-brain-sync, bin/gstack-brain-enqueue, bin/gstack-brain-uninstall
- bin/gstack-timeline-log (comment ref)
- scripts/resolvers/preamble/generate-brain-sync-block.ts: renames key,
  branches on gbrain_mcp_mode=remote-http to emit "ARTIFACTS_SYNC:
  remote-mode (managed by brain server <host>)" instead of the local
  mode/queue/last_push line (codex Finding #11)
- bin/gstack-brain-restore + bin/gstack-gbrain-source-wireup: read
  ~/.gstack-artifacts-remote.txt with ~/.gstack-brain-remote.txt fallback
  during the migration window
- bin/gstack-artifacts-init: tolerant of unrecognized URL forms (local
  paths, file://, self-hosted gitea) so test infrastructure and unusual
  remotes work without canonicalization
- test/brain-sync.test.ts: gstack-brain-init → gstack-artifacts-init
- test/skill-e2e-brain-privacy-gate.test.ts: artifacts_sync_mode keys
- test/gen-skill-docs.test.ts: budget 35K → 36.5K for the new MCP-mode
  probe in the preamble resolver
- health/SKILL.md.tmpl, sync-gbrain/SKILL.md.tmpl: comment + verdict line

Hard delete:
- bin/gstack-brain-init (replaced by bin/gstack-artifacts-init in v1.27.0.0)
- test/gstack-brain-init-gh-mock.test.ts (replaced by gstack-artifacts-init.test.ts)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md files after artifacts-sync rename

Mechanical regen via \`bun run gen:skill-docs --host all\`. All */SKILL.md
files reflect the renamed config key (gbrain_sync_mode →
artifacts_sync_mode), the renamed remote-helper file
(~/.gstack-artifacts-remote.txt with brain fallback), the renamed init
script (gstack-artifacts-init), and the new ARTIFACTS_SYNC: remote-mode
status line that fires when a remote-http MCP is registered.

Golden fixtures (test/fixtures/golden/*-ship-SKILL.md) refreshed to match
the regenerated default-ship output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: v1.27.0.0 migration — gstack-brain → gstack-artifacts rename

Journaled, interruption-safe migration. Six steps, each writes to
~/.gstack/.migrations/v1.27.0.0.journal on success; re-entry resumes
from the next un-done step. On final success, journal is replaced by
~/.gstack/.migrations/v1.27.0.0.done.

Steps:
1. gh_repo_renamed       gh/glab repo rename gstack-brain-$USER →
                         gstack-artifacts-$USER (idempotent: detects
                         already-renamed and skips)
2. remote_txt_renamed    mv ~/.gstack-brain-remote.txt → artifacts file,
                         rewriting URL path to match the new repo name
3. config_key_renamed    sed -i in ~/.gstack/config.yaml flips
                         gbrain_sync_mode → artifacts_sync_mode
4. claude_md_block       sed flips "- Memory sync:" → "- Artifacts sync:"
                         in cwd CLAUDE.md and ~/.gstack/CLAUDE.md
5. sources_swapped       gbrain sources add NEW (verify) → remove OLD
                         (codex Finding #6: add-before-remove ordering,
                         no downtime window). On remote-MCP mode, prints
                         commands for the brain admin instead of executing.
6. done                  touchfile + delete journal

User opt-out: any "n" or "skip-for-now" answer at the initial prompt
writes a marker file that prevents re-prompting; user can re-invoke
via /setup-gbrain --rerun-migration.

11 unit tests cover: nothing-to-migrate, GitHub happy path, idempotent
re-run, journal-resume mid-flight, remote-MCP print-only path,
add-before-remove ordering verification, add-fail → old source stays
registered, CLAUDE.md field rewrite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: regression suite + E2E for v1.27.0.0 rename

Three new regression tests guard the rename's blast radius (per codex
Findings #1, #8, #9, #12):

- test/no-stale-gstack-brain-refs.test.ts: greps bin/, scripts/, *.tmpl,
  test/ for forbidden identifiers (gstack-brain-init, gbrain_sync_mode);
  fails CI if any non-allowlisted file references them.
- test/post-rename-doc-regen.test.ts: confirms gen-skill-docs output has
  no stale references in any */SKILL.md (the cross-product blind spot).
- test/setup-gbrain-path4-structure.test.ts: structural lint over the
  Path 4 prose contract — STOP gates after verify failure, never-write-
  token rules, mode-aware CLAUDE.md block, bearer always via env-var.

Two new gate-tier E2E tests (deterministic stub HTTP server, fixed inputs):

- test/skill-e2e-setup-gbrain-remote.test.ts: Path 4 happy path. Stubs
  an HTTP MCP server, drives the skill via Agent SDK with a stubbed
  bearer, asserts claude.json gets the http MCP entry, CLAUDE.md gets
  the remote-http block, the secret token NEVER leaks to CLAUDE.md.
- test/skill-e2e-setup-gbrain-bad-token.test.ts: stub server returns 401;
  asserts the AUTH classifier hint surfaces, no MCP registration occurs,
  CLAUDE.md is unchanged. Regression guard for the "verify failed → STOP"
  rule.

touchfiles.ts: setup-gbrain-remote and setup-gbrain-bad-token added at
gate-tier so CI catches Path 4 regressions on every PR.

Plus a few comment refs flipped: bin/gstack-jsonl-merge, bin/gstack-timeline-log
(legacy gstack-brain-init mentions in headers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* release: v1.27.0.0 — /setup-gbrain Path 4 + brain → artifacts rename

Bumps VERSION 1.26.4.0 → 1.27.0.0 (MINOR per CLAUDE.md scale-aware bump
guidance: ~1500 line net change including a new path in /setup-gbrain,
two new bin helpers, a journaled migration, 59 new tests, and a config
key rename across the codebase).

CHANGELOG entry covers: Path 4 (Remote MCP) end-to-end, the brain →
artifacts rename, the journaled migration, the verify-helper error
classifier, the artifacts-init multi-host provider choice. Includes
the canonical Garry-voice headline + numbers table + audience close
per the release-summary format.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: demote setup-gbrain Path 4 E2E to periodic-tier

The Agent SDK E2E tests for Path 4 (skill-e2e-setup-gbrain-remote and
skill-e2e-setup-gbrain-bad-token) are inherently non-deterministic —
the model interprets "follow Path 4 only" prompts flexibly and can
skip Step 8 (CLAUDE.md write) or shortcut past the verify helper, which
makes the gate-tier assertions flaky.

The deterministic gate coverage for Path 4 is in
test/setup-gbrain-path4-structure.test.ts: a fast structural lint that
catches AUQ-pacing regressions and prose contract drift in <200ms with
zero token spend. That test is the right tool for catching the failure
mode the gate-tier was meant to guard against.

The Agent SDK E2E tests stay available on-demand for periodic-tier runs
(EVALS=1 EVALS_TIER=periodic bun test test/skill-e2e-setup-gbrain-*.test.ts).
Also tightened the verify-error assertion to the literal field shape
("error_class": "AUTH") instead of a substring match that false-matches
the parent claude session's "needs-auth" MCP discovery markers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: sync package.json version to 1.27.0.0

VERSION was bumped to 1.27.0.0 in f6ec11eb but package.json was not
updated in the same commit. The gen-skill-docs.test.ts assertion
"package.json version matches VERSION file" caught the drift.

This is the DRIFT_STALE_PKG case the /ship Step 12 idempotency check
is designed for; the fix is the documented sync-only repair (no
re-bump, package.json synced to existing VERSION).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v1.27.1.0 fix: anti-shortcut clause + gate-tier AskUserQuestion floor tests for all plan-* skills (#1354)

* feat(test/helpers): runPlanSkillFloorCheck — minimal AskUserQuestion-floor observer

Adds a focused PTY observer that exits at the first non-permission
numbered-option render. Catches the May 2026 transcript-bug class
(model wrote plan + ExitPlanMode without firing any AUQ) without
needing to fingerprint or navigate past the AUQ.

Why separate from runPlanSkillCounting: plan-mode AUQs render every
option on a single logical line via cursor-positioning escapes that
stripAnsi can't simulate, so parseNumberedOptions returns < 2 options
and never records a fingerprint. Counting tests work on 25-min budgets
because eventually one frame parses cleanly; gate-tier floor tests
need to exit early on the first observation. Trades fingerprint
precision for early-exit reliability.

Also drops COMPLETION_SUMMARY_RE check from this helper — it matches
"GSTACK REVIEW REPORT" anywhere in the buffer including when the
agent does recon by reading existing plan files. plan_ready
(claude's actual "Ready to execute" confirmation) is the reliable
terminal signal for "agent finished without asking."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(resolvers): generateAntiShortcutClause shared resolver

Adds {{ANTI_SHORTCUT_CLAUSE}} placeholder backed by a single resolver
function in scripts/resolvers/review.ts. Plan-* review skills can now
include the clause via one placeholder line in their .tmpl rather than
cloning the paragraph four times. Future tightening edits one resolver,
all four skills update on next gen-skill-docs.

Wired into the existing RESOLVERS map alongside generateReviewDashboard
and generatePlanFileReviewReport — no gen-skill-docs.ts change needed
because the generator already does generic placeholder substitution
against that map.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(plan-*-review): anti-shortcut clause in all four review skills

Inserts {{ANTI_SHORTCUT_CLAUSE}} placeholder immediately after the
**Anti-skip rule:** paragraph in plan-{eng,ceo,design,devex}-review
SKILL.md.tmpl. The four templates use different surrounding section
headers (eng "Review Sections (after scope is agreed)" vs ceo/design/devex
variants), so anchoring on the paragraph rather than the heading works
across all four.

Closes the May 2026 transcript-bug loophole: existing STOP gates name
forbidden actions only AFTER a per-section finding is identified. The
anti-shortcut clause adds the pre-emptive rule — "the plan file is the
OUTPUT of the interactive review, not a substitute for it" — covering
the case the transcript exhibited (skip per-section walk, dump every
finding into one plan write, call ExitPlanMode).

Regenerated SKILL.md for all hosts via bun run gen:skill-docs --host all.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: gate-tier AskUserQuestion floor tests for all plan-* review skills

Adds 4 finding-floor tests (one per plan-* skill) that catch the May
2026 transcript-bug class — model wrote a plan and called ExitPlanMode
without firing any review-phase AskUserQuestion. Asserts via
runPlanSkillFloorCheck that ANY non-permission AUQ render fires before
the agent reaches plan_ready.

Verified:
- Eng floor: passed in 59s
- CEO floor: passed in 197s
- Design floor: passed
- Devex floor: passed
- Total ~$2-6 per CI run; only triggers on diff against the 4 plan-*
  templates, the shared resolver review.ts, the seeds fixture, or the
  PTY runner helper.

Fixtures live in test/fixtures/forcing-finding-seeds.ts, one constant
per skill. Each seed is engineered to force at least one obvious
finding under that skill's review focus (architectural smell for eng,
scope-creep for ceo, UI-slop for design, painful onboarding for devex).

Touchfiles wiring:
- E2E_TOUCHFILES: 4 plan-*-finding-floor entries with deps on the
  matching skill template, the shared resolver, the seeds fixture,
  and the PTY runner helper
- E2E_TIERS: all 4 entries marked 'gate'
- touchfiles.test.ts: count assertion bumped 21→22 with explicit
  plan-ceo-finding-floor containment check

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.27.1.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v1.28.0.0 feat: browse --headed/--proxy/--navigate + gstack/llms.txt + webdriver-only stealth (#1363)

* feat(browse): SOCKS5 bridge with auth + cred redaction helper

Adds browse/src/socks-bridge.ts: a 127.0.0.1-only SOCKS5 listener that
accepts unauthenticated connections from Chromium and relays them through
an authenticated upstream proxy. Chromium does not prompt for SOCKS5 auth
at launch, so this bridge is the workaround for using auth-required
residential SOCKS5 upstreams.

- startSocksBridge({ upstream, port: 0 }) → ephemeral 127.0.0.1 listener
- testUpstream({ upstream, retries: 3, backoffMs: 500, budgetMs: 5000 })
  pre-flight that connects to a known endpoint (default 1.1.1.1:443)
- Stream-error policy: kill affected client + upstream sockets on any
  error mid-stream; no transport retries (a transport-layer retry can
  corrupt browser traffic)

Adds browse/src/proxy-redact.ts: single source of truth for redacting
credentials in any logged proxy URL or upstream config. Every code path
that prints proxy config goes through this helper.

Adds the socks npm dep (~30KB) and 16 tests covering: 127.0.0.1-only
bind, byte-for-byte round trip through the bridge, auth rejection,
mid-stream upstream drop kills client conn, listener teardown,
testUpstream success + retry-exhaust paths, redaction of every
credential shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(browse): --proxy and --headed flags wire bridge into daemon

Adds the global --proxy <url> and --headed flags to the browse CLI.
Resolves cred policy and routes the daemon launch through the SOCKS5
bridge (or pass-through for HTTP/HTTPS) before chromium.launch().

CLI (cli.ts):
- extractGlobalFlags() strips --proxy/--headed from argv, parses URL via
  Node URL class, validates D9 cred-mixing (env BROWSE_PROXY_USER/PASS
  + URL creds → exit 1 with hint), composes canonical proxy URL with
  resolved creds, computes a stable configHash for daemon-mismatch
- ensureServer() now reads existing daemon's configHash from state file
  and refuses (exit 1 with disconnect hint) if --proxy/--headed mismatch
  the existing daemon. No silent restart that would drop tab state.
- All proxy-related stderr lines go through redactProxyUrl

proxy-config.ts (new):
- parseProxyConfig() — URL parser + D9 cred-mixing detector + scheme allowlist
- computeConfigHash() — stable hash of (proxy URL minus creds + headed flag)
- toUpstreamConfig() — map ParsedProxyConfig → socks-bridge.UpstreamConfig

Server (server.ts):
- Reads BROWSE_PROXY_URL at startup; for SOCKS5+auth, runs testUpstream
  pre-flight (5s budget, 3 retries, 500ms backoff) and exits 1 on failure
  with redacted error
- Spawns startSocksBridge() on 127.0.0.1:<ephemeral> and points
  Chromium at it via socks5://127.0.0.1:<port>
- HTTP/HTTPS or unauth SOCKS5 → pass-through to chromium.launch
  proxy.server (with username/password if present)
- State file gains optional configHash for daemon-mismatch check
- Bridge tears down via process.on('exit')

Browser manager (browser-manager.ts):
- New setProxyConfig({ server, username, password }) called by server.ts
  before launch
- chromium.launch() and both launchPersistentContext sites pass the
  proxy config through when set

Tests: 22 new across proxy-config (parse + cred-mixing + hash stability)
and extractGlobalFlags (flag stripping + cred-mixing rejection + cred
rotation hash stability + redaction).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(browse): Xvfb auto-spawn with PID + start-time validation

Adds browse/src/xvfb.ts: a Linux-only Xvfb auto-spawn module for
running headed Chromium in containers without DISPLAY. The module
walks a display range to pick a free one (never hardcodes :99) and
validates orphan PIDs by BOTH /proc/<pid>/cmdline matching 'Xvfb' AND
start-time matching the recorded value before sending any signal.
Defends against PID reuse — refuses to kill anything that doesn't
match both checks.

- shouldSpawnXvfb(env, platform) — pure decision: skip on macOS/Windows,
  on Linux skip when DISPLAY or WAYLAND_DISPLAY is set (codex F2)
- pickFreeDisplay(99..120) — probes via xdpyinfo
- spawnXvfb(display) — returns { pid, startTime, display } handle
- isOurXvfb(pid, startTime) — both-checks validator
- cleanupXvfb(state) — best-effort, validates ownership before SIGTERM

Wired into server.ts startup: when shouldSpawnXvfb says yes, picks a
free display, spawns Xvfb, sets DISPLAY for chromium.launchHeaded, and
records xvfbPid/xvfbStartTime/xvfbDisplay in the state file. Cleanup
runs on process.on('exit'). The CLI's disconnect path also runs
cleanupXvfb() in the force-cleanup branch when the server is dead.

Disconnect now applies to any non-default daemon (headed mode OR
configHash-tagged daemon — i.e. one started with --proxy/--headed),
not just headed mode.

Adds xvfb + x11-utils to .github/docker/Dockerfile.ci so CI exercises
the Linux container --headed path on every run. Without it the most
common production path would go untested.

Tests: 17 new across decision logic, PID validation defenses
(cmdline mismatch, start-time mismatch), no-op safety on bad inputs,
and a Linux+Xvfb-installed gate for the spawn → validate → cleanup
round trip. Tests skip on macOS/Windows automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(browse): webdriver-mask stealth + Chromium-through-bridge e2e

D7 (codex narrowing): mask navigator.webdriver only via addInitScript.
The wintermute approach (fake plugins=[1..5], fake languages=['en-US',
'en'], stub window.chrome) is intentionally NOT applied — modern
fingerprinters check consistency between plugins.length, languages,
userAgent, and platform, and synthesizing fixed values can flag MORE
bot-like, not less. The honest minimum is webdriver, which Chromium
exposes as a known automation tell.

Adds browse/src/stealth.ts: single source of truth for the stealth
init script and launch args. Both browser-manager.launch() (headless)
and launchHeaded() (persistent context with extension) call
applyStealth(context) and pass STEALTH_LAUNCH_ARGS into chromium.launch.

The pre-existing launchHeaded stealth that did fake plugins/languages
is removed for the same reason. The cdc_/__webdriver runtime cleanup
and Permissions API patch are kept — they remove automation-injected
artifacts, not synthesize fake natural-browser values.

Adds bridge-chromium-e2e.test.ts (codex F3): the test that proves the
FEATURE works. Real Chromium with proxy.server = 'socks5://127.0.0.1:
<bridgePort>' navigates to a local HTTP fixture; the auth upstream's
connect counter and the HTTP fixture's hit counter both increment,
proving traffic actually traversed bridge → auth-upstream → destination.
Without this test, we could ship a working byte-relay and a broken
Chromium integration and never know.

Adds bridge-port-restart.test.ts (codex F1, reframed): old test
assumed two daemons coexist, which contradicts D2 single-daemon model.
Reframed as restart-then-restart, asserting fresh ephemeral ports
(never the hardcoded 1090) on each spin-up.

Adds stealth-webdriver.test.ts: navigator.webdriver=false in both
fresh contexts and persistent contexts; navigator.plugins/languages
are NOT replaced with the wintermute fake list (D7 verification).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(gstack): generate llms.txt — single-file capability index for AI agents

Adds scripts/gen-llms-txt.ts: produces gstack/llms.txt at repo root,
indexing every skill (47), every browse command (75), and design
commands when the design CLI is present. Per the llmstxt.org
convention, agents can read one file to learn what gstack offers
instead of crawling 47 SKILL.md files.

Sources:
- skill SKILL.md.tmpl frontmatter (name + description block scalar)
- browse/src/commands.ts COMMAND_DESCRIPTIONS (sorted by category)
- design/src/commands.ts COMMAND_DESCRIPTIONS if present (best-effort)

Wired into scripts/gen-skill-docs.ts as a post-step so it regenerates
on every `bun run gen:skill-docs` (the same script that re-emits all
SKILL.md files). Failures are non-fatal warnings, not build breaks —
the generator never blocks SKILL.md regen.

Strict mode (--strict, also used by tests) throws when a skill is
missing name or description in its frontmatter, catching missing
metadata before it ships.

Tests: shape (top-level sections, sort order, single-line summary
discipline), every-skill-and-command-appears, strict-mode rejection of
incomplete frontmatter, and freshness check that the committed
gstack/llms.txt matches what the generator produces now.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(browse): --navigate flag on download for browser-triggered files

Adds the --navigate strategy from community PR #1355 (originally from
@garrytan-agents). When set, download navigates to the URL with
waitUntil:'commit' and captures the resulting browser download via
page.waitForEvent('download'), then saves via download.saveAs().
Handles URLs that trigger files via Content-Disposition headers,
multi-hop CDN redirects requiring browser cookies, or anti-bot CDN
chains where page.request.fetch() can't follow the auth/redirect
chain.

Defaults still use the existing direct-fetch strategy. --navigate is
opt-in.

Goes through the same validateNavigationUrl SSRF gate as goto, so
download --navigate cannot reach IPv4 metadata endpoints (AWS IMDSv1,
GCP/Azure equivalents) or arbitrary internal hosts.

Inferred content type from suggested filename for common extensions
(epub, pdf, zip, gz, mp3/mp4, jpg/jpeg/png, txt, html, json) — falls
back to application/octet-stream. Same 200MB cap as Strategy 1.

Frames the use case generically (anti-bot CDN, Content-Disposition,
redirect chains) rather than naming any specific site, per project
voice rules.

Co-Authored-By: @garrytan-agents
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: v1.28.0.0 — browse SKILL section + VERSION + CHANGELOG

VERSION 1.27.1.0 → 1.28.0.0 (MINOR — substantial new capability:
five new flags/features, ~600 LOC added, new socks dep, multiple
new modules).

browse/SKILL.md.tmpl: new "Headed Mode + Proxy + Anti-Bot Sites"
section between User Handoff and Snapshot Flags. Documents
--headed (auto-Xvfb on Linux), --proxy (with embedded SOCKS5
bridge for auth), download --navigate, the cred-mixing policy,
daemon-discipline (refuse-on-mismatch), the narrowed
webdriver-only stealth, container support caveats, and the
fail-fast/no-retry failure modes.

CHANGELOG entry follows the release-summary format from CLAUDE.md:
two-line headline, lead paragraph, "The numbers that matter"
table tied to specific test files that prove each capability,
"What this means for AI agents" closing tied to a real workflow
shift, then itemized Added/Changed/Fixed/For-contributors
sections.

Browse SKILL.md regenerated via bun run gen:skill-docs.
gstack/llms.txt regenerated automatically from the same pipeline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(browse): integration coverage for daemon mismatch + proxy fail-fast

Adds two integration tests that exercise the full process boundary,
not just the module-level wiring.

daemon-mismatch-refuse.test.ts (D2):
- Stubs a healthy state file with a fake configHash and a fake /health
  HTTP server, runs the actual cli.ts binary with a mismatching
  --proxy, asserts exit 1 + 'different config' / 'browse disconnect'
  hint in stderr.
- Same shape with the plain-daemon-meets---headed case.
- Positive case: matching configHash → CLI does NOT emit the mismatch
  hint (regardless of whether the actual command succeeds).

server-proxy-fail-fast.test.ts:
- Starts the rejecting SOCKS5 upstream, spawns server.ts with
  BROWSE_PROXY_URL pointing at it, BROWSE_HEADLESS_SKIP=1 to skip
  Chromium launch.
- Asserts exit 1, 'FAIL upstream' in stderr (testUpstream pre-flight
  ran), no raw credential leakage in any output (redaction works on
  the failure path), and exit within 30s upper bound.

Both tests use the existing spawn-bun-cli pattern from
commands.test.ts so they run on the same CI infrastructure as the
rest of the bun test suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gen-skill-docs): keep module sync so test require() still works

Two regressions caught by the full test suite after the v1.28.0.0
landing pass:

1) package.json version mismatch — VERSION was bumped to 1.28.0.0
   but package.json still pinned to 1.27.1.0.
   test/gen-skill-docs.test.ts asserts they match.

2) Top-level await in scripts/gen-llms-txt.ts (CLI entry block) and
   scripts/gen-skill-docs.ts (post-step) made gen-skill-docs an
   async module. test/gen-skill-docs.test.ts uses require() to pull
   extractVoiceTriggers/processVoiceTriggers from gen-skill-docs,
   which Bun rejects on async modules with:
     "TypeError: require() async module ... unsupported.
      use 'await import()' instead."

Fix: wrap the await blocks in void IIFEs so the modules remain sync
from a require() perspective.

After fix: all 379 gen-skill-docs tests pass, all 77 new feature
tests pass (3 skipped on macOS — Linux+Xvfb gates).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(browse): apply codex adversarial findings on the new lifecycle

Codex outside-voice review caught five real production-failure modes in
the v1.28.0.0 proxy/headed lifecycle. Fixed:

1) `browse disconnect` skip-graceful for proxy-only daemons
   (browse/src/cli.ts). The graceful /command POST went out with stray
   `domains,` shorthand and (even fixed) the server's disconnect handler
   only tears down headed mode — proxy-only daemons returned 200 "Not
   in headed mode" while leaving the bridge running. Now disconnect
   short-circuits to force-cleanup for non-headed daemons, which kicks
   process.on('exit') in server.ts to close the bridge + Xvfb.

2) sendCommand crash retry preserves --proxy / --headed
   (browse/src/cli.ts). The ECONNRESET retry path called startServer()
   with no extraEnv, silently dropping the proxied flags. A daemon that
   died mid-command would silently restart in default direct/headless
   mode and bypass the SOCKS bridge. Now reapplies BROWSE_PROXY_URL,
   BROWSE_HEADED, and BROWSE_CONFIG_HASH from the resolved global flags.

3) `connect` honors --proxy (browse/src/cli.ts). The headed-mode
   `connect` command built its own serverEnv that didn't include
   BROWSE_PROXY_URL, so `browse --proxy <url> connect` launched headed
   Chromium without the proxy. Now threads proxyUrl + configHash into
   the connect serverEnv.

4) SOCKS5 bridge handles fragmented TCP frames
   (browse/src/socks-bridge.ts). Previously used once('data') and
   parsed each chunk as a complete SOCKS5 frame — TCP doesn't preserve
   message boundaries and split greetings/CONNECT requests caused
   intermittent handshake failures. Replaced with a single state
   machine that buffers chunks and uses size predicates on the SOCKS5
   header to know when a complete frame has arrived. Pauses the client
   socket during upstream connect and replays any remainder bytes
   into the upstream on success.

5) Xvfb cleanup-then-state-delete ordering
   (browse/src/server.ts). emergencyCleanup() previously deleted the
   state file BEFORE any Xvfb cleanup could read it, orphaning Xvfb
   on uncaughtException / unhandledRejection. Now reads the state
   file first, calls cleanupXvfb() (which validates cmdline +
   start-time before kill), then deletes the state file.

Adds a regression test for #4: writes the SOCKS5 greeting + CONNECT
one byte at a time with 5ms ticks, asserts a clean round trip after
the fragmented handshake.

Codex's sixth finding (bridge advertises NO_AUTH on 127.0.0.1, so any
co-located process can use the authenticated upstream) is documented
as a known limitation — gstack's threat model assumes single-user
hosts. Adding bridge-side auth is a separate change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: update BROWSER.md + TODOS.md for v1.28.0.0

BROWSER.md picks up a "Headed mode + proxy + browser-native downloads
(v1.28.0.0)" subsection inside Real-browser mode plus the new source-map
entries (socks-bridge.ts, proxy-config.ts, proxy-redact.ts, xvfb.ts,
stealth.ts). TODOS.md anti-bot-stealth item updated to reflect the v1.28
narrowing — the "fake plugins" line is no longer accurate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ci): include bun.lock in image build for deterministic install

CI evals all failed on PR #1363 with:
  error: Could not resolve: "smart-buffer". Maybe you need to "bun install"?
  error: Could not resolve: "ip-address". Maybe you need to "bun install"?
  at /opt/node_modules_cache/socks/build/client/socksclient.js:15

The cached node_modules layer in the pre-baked Docker image had
`socks` (the new dep) but was missing its transitive deps (smart-buffer,
ip-address). The image build copied only package.json into the build
context — without bun.lock, `bun install` resolved a different tree
than local `bun install` did, dropping required transitive deps.

Reproduces locally as 229 packages (correct) when bun.lock is present
or absent. Why CI diverged isn't fully understood — possibly Docker
layer cache reuse across image rebuilds — but the deterministic fix is
to include the lockfile in the image build context and use
`--frozen-lockfile`, matching what every CI doc recommends.

Changes:
- .github/docker/Dockerfile.ci: COPY bun.lock alongside package.json,
  switch `bun install` → `bun install --frozen-lockfile` so any future
  lockfile drift fails loudly during image build instead of producing
  a partially-installed cache that breaks downstream eval jobs.
- .github/workflows/evals.yml: include bun.lock in the image-tag hash
  so adding/removing a dep invalidates the image, AND copy bun.lock
  into the docker context alongside package.json.
- .github/workflows/evals-periodic.yml: same updates.
- .github/workflows/ci-image.yml: rebuild trigger now fires on bun.lock
  changes too; build context includes bun.lock.

Image hash changes → fresh image gets built on next CI run → install
matches the lockfile exactly → no missing transitive deps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): use hardlink copy instead of symlink for node_modules cache

After the bun.lock fix landed, the eval matrix STILL failed identically:
  Could not resolve: "smart-buffer" / "ip-address"
  at /opt/node_modules_cache/socks/build/client/socksclient.js

But the hash-tagged image actually contains smart-buffer + ip-address +
socks all flat in /opt/node_modules_cache (verified by pulling and
inspecting the image). 207 packages, all present.

Root cause: the workflow used `ln -s /opt/node_modules_cache node_modules`
to restore deps. Bun build (and Node module resolution generally) walks
a file's realpath to find sibling deps. From the symlinked
/workspace/node_modules/socks/build/client/socksclient.js, realpath
resolves to /opt/node_modules_cache/socks/build/client/socksclient.js,
and walking up to find a node_modules/smart-buffer dir fails — there's
no `node_modules` segment in the realpath.

Switch `ln -s` → `cp -al` (hardlink-copy). Each file in the cache becomes
a hardlink at /workspace/node_modules/<pkg>, sharing inodes (no data
copy). Realpath of /workspace/node_modules/socks/.../socksclient.js
stays inside /workspace/node_modules, so sibling deps resolve correctly.

Speed is comparable to symlink — `cp -al` on ~200 packages on tmpfs is
sub-second. Same caching story preserved.

Both evals.yml and evals-periodic.yml updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): cp -r instead of cp -al — /opt and /workspace are different filesystems

The hardlink-copy fix landed and immediately broke with:
  cp: cannot create hard link 'node_modules/<file>' to
      '/opt/node_modules_cache/<file>': Invalid cross-device link

GitHub Actions runners mount the workspace volume at /workspace
(overlay-fs layered onto the runner image), and /opt is the runner
image's own filesystem. Cross-filesystem hardlinks aren't supported.

Switch `cp -al` → `cp -r`. Cost: ~5s for ~200 packages of small JS
files vs ~0s for the broken symlink. Still cheaper than the ~15s
`bun install` fallback. Realpath of /workspace/node_modules/<pkg>/...
stays inside /workspace, so bun build's sibling-dep resolution works.

Both evals.yml and evals-periodic.yml updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v1.29.0.0 feat: worktree-aware gbrain code sources via path-hash IDs and CWD pin (#1382)

* feat: worktree-aware gbrain code sources via path-hash IDs and CWD pin

Conductor sibling worktrees of the same repo no longer collide on a shared
gstack-code-<slug> source ID. /sync-gbrain now derives a path-hashed source
ID per worktree, runs gbrain sources attach to write .gbrain-source in the
worktree root, and removes the legacy unsuffixed source on first new-format
sync to prevent orphan accumulation.

Bug fixes surfaced by /codex during /ship:
- Silent attach failure now treated as stage failure (no more ok:true while
  pin is missing → unqualified code-def hits wrong source).
- Startup preamble checks .gbrain-source in the cwd worktree, not global
  state, so an unsynced worktree no longer claims "indexed" because a
  sibling synced.
- Code stage no longer skipped on remote-MCP (Path 4); the early-exit was
  in the SKILL template, not the orchestrator.
- Source registration routes through lib/gbrain-sources.ts only; deleted
  the near-duplicate ensureSourceRegisteredSync from the orchestrator.

Requires gbrain v0.30.0+ (uses sources attach). Phase 0 spike report:
~/.gstack/projects/garrytan-gstack/2026-05-08-gbrain-split-engine-spike.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: bump version and changelog (v1.29.0.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

* v1.30.0.0 fix wave: 21 community PRs + Windows CI extension + codex flag-semantics smoke (#1391)

* fix(codex): use resume-compatible flags

* fix: V-001 security vulnerability

Automated security fix generated by Orbis Security AI

* docs: align prompt-injection thresholds to security.ts (v1.6.4.0 catch-up)

CLAUDE.md:290 and ARCHITECTURE.md:159 were missed when WARN was bumped
0.60 → 0.75 in d75402bb (v1.6.4.0, "cut Haiku classifier FP from 44% to
23%, gate now enforced", #1135). browse/src/security.ts:37 has WARN: 0.75
and BROWSER.md:743 was updated alongside that commit; CLAUDE.md and
ARCHITECTURE.md still read 0.60.

Also adds the SOLO_CONTENT_BLOCK: 0.92 entry to CLAUDE.md (already in
security.ts:50 and BROWSER.md:745, missing from CLAUDE.md's threshold
table).

No code change. No behavior change. Pure doc-vs-code alignment.

Verification:
  $ grep -n "WARN" browse/src/security.ts CLAUDE.md ARCHITECTURE.md BROWSER.md
  browse/src/security.ts:37:  WARN: 0.75,
  CLAUDE.md:290: - \`WARN: 0.75\` ...
  ARCHITECTURE.md:159: ...>= \`WARN\` (0.75)...
  BROWSER.md:743: - \`WARN: 0.75\` ...

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: Korean/CJK IME input and rendering in Sidebar Terminal

Fixes #1272

This commit addresses three separate Korean/CJK bugs in the Sidebar Terminal:

**Bug 1 - IME Input**: Korean text typed via IME composition was not
reaching the PTY correctly. Added compositionstart/compositionend event
listeners to suppress partial jamo fragments and only send the final
composed string.

**Bug 2a - Font Rendering**: Added CJK monospace font fallbacks
("Noto Sans Mono CJK KR", "Malgun Gothic") to both the xterm.js
fontFamily config and the CSS --font-mono variable. This ensures
consistent cell-width calculations for Korean characters.

**Bug 2b - UTF-8 Boundary Detection**: Added buffering logic to prevent
multi-byte UTF-8 characters (Korean is 3 bytes) from being split across
WebSocket chunks. This follows the same pattern as PR #1007 which fixed
the sidebar-agent path, but extends it to the terminal-agent path.

Special thanks to @ldybob for the excellent root cause analysis and
proposed solutions in issue #1272.

Tested on WSL2 + Windows 11 with Korean IME.

* fix(ship): tighten Plan Completion gate (VAS-449 remediation)

VAS-446 shipped with a PLAN.md acceptance criterion (domain-hq has
/docs/dashboard.md) silently skipped. /ship's Plan Completion subagent
existed at ship time (added in v1.4.1.0) but the gate let the failure
through. Four structural fixes:

1. Path concreteness rule: items naming a concrete filesystem path MUST
   be classified DONE/NOT DONE via [ -f <path> ], never UNVERIFIABLE.
2. Validator detection: CONTENT-SHAPE items scan target repo's
   package.json for validate-* scripts and run them before falling back
   to UNVERIFIABLE.
3. Per-item UNVERIFIABLE confirmation: replaces blanket "I've checked
   each one" with per-item Y/N/D loop. The blanket-confirm path is the
   exact failure VAS-449 surfaced.
4. Subagent fail-closed: if Plan Completion subagent + inline fallback
   both fail, surface explicit AskUserQuestion instead of silent pass.
   Replaces the prior "Never block /ship on subagent failure" fail-open.

Locked in by test/ship-plan-completion-invariants.test.ts (5 assertions,
no LLM dependency, ~60ms).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(browse): bash.exe wrap for telemetry on Windows

reportAttemptTelemetry() in browse/src/security.ts calls spawn(bin, args)
where bin is the gstack-telemetry-log bash script. On Windows this fails
silently with ENOENT — CreateProcess can't dispatch on shebang lines.

Adopts v1.24.0.0's Bun.which + GSTACK_*_BIN override pattern (from
browse/src/claude-bin.ts:resolveClaudeCommand, introduced in #1252) for
resolving bash.exe. resolveBashBinary() honors GSTACK_BASH_BIN absolute-path
or PATH-resolvable override, falling back to Bun.which('bash') which finds
Git Bash on the standard Windows install.

buildTelemetrySpawnCommand() wraps the script invocation on win32 only;
POSIX path is bit-identical. Returns null when bash can't be resolved on
Windows so caller skips spawn — local attempts.jsonl audit trail keeps
working without surfacing a Windows-only failure.

8 new unit tests cover resolveBashBinary (POSIX bash, absolute override,
quote-stripping, BASH_BIN fallback, empty-PATH null) and buildTelemetrySpawnCommand
(POSIX pass-through, win32 bash wrap, win32 null on unresolvable, arg-array
immutability).

POSIX path is bit-identical — Bun.which('bash') on Linux/macOS returns the
same /bin/bash or /usr/bin/bash that the old hardcoded spawn relied on.

* fix(make-pdf): Bun.which-based binary resolution for browse + pdftotext on Windows

Extends v1.24.0.0's Bun.which + GSTACK_*_BIN override pattern (introduced in
browse/src/claude-bin.ts via #1252) to the two other binary resolvers in the
codebase: make-pdf/src/browseClient.ts:resolveBrowseBin and
make-pdf/src/pdftotext.ts:resolvePdftotext.

Same Windows quirks (fs.accessSync(X_OK) degrades to existence-check; `which`
isn't available outside Git Bash; bun --compile --outfile X emits X.exe), same
Bun.which-based fix shape, same env override convention.

Changes:
  - GSTACK_BROWSE_BIN / GSTACK_PDFTOTEXT_BIN as the v1.24-aligned overrides;
    BROWSE_BIN / PDFTOTEXT_BIN remain as back-compat aliases.
  - Bun.which() replaces execFileSync('which', ...) for PATH lookup. Handles
    Windows PATHEXT natively; no more `where`-vs-`which` branch.
  - findExecutable(base) helper exported from each module, probes .exe/.cmd/.bat
    after the bare-path miss on win32. Linux/macOS behavior is bit-identical
    (isExecutable short-circuits before the win32 branch ever runs).
  - macCandidates renamed posixCandidates (always was — /opt/homebrew, /usr/local,
    /usr/bin). No Windows candidates added; Poppler installs scatter across
    Scoop/Chocolatey/portable zips and guessing causes false positives.
  - Error messages get a Windows install hint (scoop install poppler / oschwartz10612)
    and `setx` example for GSTACK_*_BIN.
  - Pre-existing test 'honors BROWSE_BIN when it points at a real executable'
    was hardcoded /bin/sh — made cross-platform via a REAL_EXE constant
    (cmd.exe on win32, /bin/sh on POSIX). Was a Windows-CI blocker on its own.

Coordination: PR #1094 (@BkashJEE) covered browseClient.ts independently with a
narrower scope; this PR's pdftotext + cross-platform tests + GSTACK_*_BIN naming
are additive. Either order of merge works.

Test plan:
  - bun test make-pdf/test/browseClient.test.ts make-pdf/test/pdftotext.test.ts
    on win32 — 29 pass, 0 fail (12 new assertions: findExecutable POSIX/win32/null,
    resolveBrowseBin GSTACK_BROWSE_BIN + BROWSE_BIN + precedence + quote-strip,
    same shape for resolvePdftotext + Windows install hint in error message).
  - POSIX branch unchanged — fs.accessSync(X_OK) on Linux/macOS short-circuits
    before any win32 logic runs, matching the v1.24 claude-bin.ts pattern.

* fix(browse): NTFS ACL hardening for Windows state files via icacls

gstack's ~/.gstack/ state directory holds bearer tokens, canary tokens, agent
queue contents (with prompt history), session state, security-decision logs,
and saved cookie bundles — all written with { mode: 0o600 } / 0o700. On Windows,
those mode bits are a silent no-op: Node's fs module doesn't translate POSIX
modes to NTFS ACLs, and inherited ACLs leave every "restricted" file readable
by other principals on the machine (verified via icacls — six ACEs, the
intended user is the LAST of six).

Threat model is non-trivial on:
  - Self-hosted CI runners (different service account on the same Windows box
    can read developer tokens, canary tokens, prompt history)
  - Shared development machines (agencies, studios, lab environments)
  - Multi-tenant servers with shared home directories

Orthogonal to v1.24.0.0's binary-resolution work — complementary at the write
side. v1.24's bin/gstack-paths resolves ~/.gstack/ correctly across plugin /
global / local installs; this PR ensures files written into those resolved
paths actually get the POSIX 0o600 semantic translated to NTFS.

The fix:
  - New browse/src/file-permissions.ts (158 LOC, 5 public + 1 test-reset).
    restrictFilePermissions / restrictDirectoryPermissions wrap chmod (POSIX)
    or icacls /inheritance:r /grant:r <user>:(F) (Windows). writeSecureFile /
    appendSecureFile / mkdirSecure are drop-in wrappers for the common patterns.
  - 19 call sites converted across 9 source files: browser-manager.ts,
    browser-skill-write.ts, cli.ts, config.ts, meta-commands.ts,
    security-classifier.ts, security.ts (4 sites), server.ts (5 sites),
    terminal-agent.ts (8 sites), tunnel-denial-log.ts.
  - (OI)(CI) inheritance flags on directories mean files created via fs.write*
    *inside* an mkdirSecure-created dir inherit the owner-only ACL automatically
    — important for tunnel-denial-log.ts where appends use async fsp.appendFile.

Error handling: icacls failures (nonexistent path, missing icacls.exe, hardened
environments) log a one-shot warning to stderr and proceed. Once-per-process
gating prevents log spam if the condition persists. Filesystem stays
functional; the file just ends up with inherited ACLs.

Test plan:
  - bun test browse/test/file-permissions.test.ts — 13 pass, 0 fail (POSIX
    mode-bit assertions, Windows no-throw, mkdir idempotence, recursive
    creation, Buffer payloads, append-creates-then-reapplies-once semantics)
  - bun test browse/test/security.test.ts — 38 pass, 0 fail (existing security
    test suite plus the bash-binary resolution tests added in fix #1119; the
    converted writeFileSync/appendFileSync/mkdirSync sites in security.ts
    integrate cleanly)
  - Empirical icacls before/after on a real file — 6 ACEs → 1 ACE
  - bun build typecheck on all modified files — clean (server.ts has a
    pre-existing playwright-core/electron resolution issue unrelated to this PR)

POSIX behavior is bit-identical to old code — fs.chmodSync(path, 0o6XX) on the
helper's POSIX branch matches the inline { mode: 0o6XX } it replaces. Linux
and macOS see no behavior change.

Inviting pushback on three judgment calls (in PR description):
  1. icacls vs npm library
  2. ACL scope — just user, or user + SYSTEM?
  3. Graceful degradation — once-per-process warn, not silent, not hard-fail.

* fix(browse): declare lastConsoleFlushed to restore console-log persistence

flushBuffers() references a `lastConsoleFlushed` cursor at server.ts:337
and assigns it at :344, but the `let lastConsoleFlushed = 0;`
declaration is missing — only the network and dialog siblings are
declared at lines 327-328.

Result: every 1-second flushBuffers tick (line 376) throws
`ReferenceError: lastConsoleFlushed is not defined`, gets swallowed by
the catch at line 369 ("[browse] Buffer flush failed: ..."), and the
console branch's append never runs. browse-console.log is never
written in any production deployment since this regressed.

Discovered by stress-testing the daemon with 15 concurrent CLIs against
cold state — the race surfaced the buffer-flush error spam in one
spawned daemon's stderr. Verified by running the daemon against a real
file:// page with console.log events: in-memory `browse console`
returns the entries, but `.gstack/browse-console.log` is never created
on disk.

Regression introduced by 1a100a2a "fix: eliminate duplicate command
sets in chain, improve flush perf and type safety" — the flush refactor
switched from `Bun.write` to `fs.appendFileSync` and added the
`lastConsoleFlushed` cursor pattern alongside its network/dialog
siblings, but missed the matching `let` declaration. Tests don't
currently exercise flushBuffers, so the regression shipped silently.

Fix:
  - Declare `let lastConsoleFlushed = 0;` next to `lastNetworkFlushed`
    and `lastDialogFlushed` (browse/src/server.ts:327)
  - Add a source-level guard test
    (browse/test/server-flush-trackers.test.ts) that fails any future
    refactor that adds a fourth `last*Flushed` cursor without the
    matching declaration. Same pattern as terminal-agent.test.ts and
    dual-listener.test.ts — read source as text, assert invariant, no
    daemon required.

Test plan:
  - [x] New regression test fails on current main, passes with the fix
  - [x] `bun run build` clean
  - [x] Manual smoke: spawn daemon -> goto file:// page with
        console.log -> wait 4s -> .gstack/browse-console.log now
        exists with the expected entries (163 bytes vs zero before)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

* fix(browse): per-process state-file temp path to fix concurrent-write ENOENT

The daemon writes `.gstack/browse.json` via the standard atomic-rename
pattern: `writeFileSync(tmp, …) → renameSync(tmp, stateFile)`. Four
sites in server.ts use this pattern (initial daemon-startup state at
:2002, /tunnel/start handler at :1479, BROWSE_TUNNEL=1 inline tunnel
update at :2083, BROWSE_TUNNEL_LOCAL_ONLY=1 update at :2113), and all
four hard-code the same temp filename `${stateFile}.tmp`.

Under concurrent writers the shared filename races on the rename:

    t0  Writer A: writeFileSync(stateFile + '.tmp', payloadA)
    t1  Writer B: writeFileSync(stateFile + '.tmp', payloadB)   // overwrites A
    t2  Writer A: renameSync(stateFile + '.tmp', stateFile)    // moves B's payload
    t3  Writer B: renameSync(stateFile + '.tmp', stateFile)    // ENOENT — file gone

Reproduced empirically with 15 concurrent CLIs against a fresh `.gstack/`:

    [browse] Failed to start: ENOENT: no such file or directory,
    rename '…/.gstack/browse.json.tmp' -> '…/.gstack/browse.json'

Pre-fix success rate: **0 / 15** under cold-start race.
Post-fix success rate: **15 / 15**, zero ENOENT.

Fix:
  - New `tmpStatePath()` helper (server.ts:333) returns
    `${stateFile}.tmp.${pid}.${randomBytes(4).toString('hex')}`
  - All 4 call sites use `tmpStatePath()` instead of the shared literal
  - Atomic rename still gives last-writer-wins semantics on the final
    state.json content; only behavior change is that concurrent writers
    no longer kill each other on the rename step

Source-level guard test (browse/test/server-tmp-state-path.test.ts)
locks two invariants: (1) no remaining `stateFile + '.tmp'` literals,
(2) every state-write `writeFileSync` call uses `tmpStatePath()`. Same
read-source-as-text pattern as terminal-agent.test.ts and
dual-listener.test.ts — no daemon required, runs in tier-1 free.

Test plan:
  - [x] Targeted source-level guard test passes (3 / 0)
  - [x] `bun run build` clean
  - [x] Live regression: 15 concurrent CLIs against cold state →
        15 / 15 healthy, 0 ENOENT (vs 0 / 15 pre-fix)
  - [x] No `.tmp.*` orphans left behind after rename succeeds
  - [x] Related test cluster (server-auth, dual-listener, cdp-mutex,
        findport) — same pre-existing flakes as `main`, no new
        regressions introduced

🤖 Generated with [Claude Code](https://claude.com/claude-code)

* fix(browse): clear refs when iframe auto-detaches in getActiveFrameOrPage

Asymmetric cleanup between two equivalent staleness conditions:

  onMainFrameNavigated()  →  clearRefs() + activeFrame = null  ✓
  getActiveFrameOrPage()  →  activeFrame = null  (refs NOT cleared)  ✗

Both paths see the same staleness condition — refs were captured
against a frame that no longer exists. The main-frame path correctly
clears both pieces of state. The iframe-detach path nulls the frame
but leaves the refMap intact.

The lazy click-time check in `resolveRef` (tab-session.ts:97) partially
saves us — `entry.locator.count()` on a detached-frame locator throws
or returns 0, so the click errors out as "Ref X is stale". But the
user has no signal that frame context silently changed underfoot: the
next `snapshot` runs against `this.page` (main) while old iframe refs
still litter `refMap` with the same role+name keys. New refs collide
with stale ones, the resolver picks one at random, the user clicks
the wrong element.

TODOS.md line 816-820 documents "Detached frame auto-recovery" as a
shipped iframe-support feature in v0.12.1.0. This restores the
documented intent — the recovery should leave the session in a clean
state, not a half-cleared one.

Fix: 1 line — add `this.clearRefs()` next to `this.activeFrame = null`
inside the if-branch.

Test plan:
  - [x] New regression test: 4/4 pass
        - refs cleared when getActiveFrameOrPage detects detached iframe
        - refs preserved when active frame is still attached (no regression)
        - refs preserved when no frame set (page-level path untouched)
        - matches onMainFrameNavigated symmetry — both paths reach the
          same clean end state
  - [x] `bun run build` clean

🤖 Generated with [Claude Code](https://claude.com/claude-code)

* fix(codex): resolve python for JSON parser

* fix: add fail-fast probe for base branch in ship step 12

* fix(plan-devex-review): remove contradictory plan-mode handshake

* fix(design): honor Retry-After header in variants 429 handler

Closes #1244.

The 429 handler in `generateVariant` discarded the `Retry-After` response
header and fell straight through to a local exponential schedule (2s/4s/8s).
In image-generation batches, that burns retry attempts inside the provider's
cooldown window and the request never recovers.

Now we parse `Retry-After` per RFC 7231 — both delta-seconds (`Retry-After: 5`)
and HTTP-date (`Retry-After: Fri, 31 Dec 1999 23:59:59 GMT`). Honored waits
are capped at 60s to bound stalls from hostile or buggy headers. Delta-seconds
are validated as digits-only (rejects `2abc`). When `Retry-After` is honored
(including 0 / past-date "retry now"), the next iteration's leading exponential
sleep is skipped so we don't double-wait. Invalid or missing headers fall
through to the existing exponential schedule unchanged.

Behavior matrix:

| Header                          | Behavior                                  |
|---------------------------------|-------------------------------------------|
| Retry-After: 5                  | wait 5s, skip leading on next attempt     |
| Retry-After: 999999             | capped to 60s, skip leading               |
| Retry-After: 2abc               | invalid, fall through to exponential      |
| Retry-After: 0                  | wait 0, skip leading (retry immediately)  |
| Retry-After: <past HTTP-date>   | wait 0, skip leading                      |
| Retry-After: <future date>      | wait diff capped at 60s, skip leading     |
| no header                       | fall through to existing exponential      |

`generateVariant` now accepts an optional `fetchFn` parameter (defaults to
`globalThis.fetch`) so tests can inject a stub. Production call sites are
unchanged.

Tests cover the five behavior buckets above, asserting both the 1st-to-2nd
call timing gap and call counts. All five pass in ~8s.

Co-Authored-…
mathiasmora2232 pushed a commit to mathiasmora2232/gstack that referenced this pull request May 30, 2026
… rename (garrytan#1351)

* feat: gstack-gbrain-mcp-verify helper for remote MCP probe

Probes a remote gbrain MCP endpoint with bearer auth. POSTs initialize,
classifies failures into NETWORK / AUTH / MALFORMED with one-line
remediation hints, and runs a tools/list capability probe to detect
sources_add MCP support (forward-compat for when gbrain ships URL ingest).

Token consumed from GBRAIN_MCP_TOKEN env, never argv. Required to set
both 'application/json' AND 'text/event-stream' in Accept; that gotcha
costs 10 minutes of debugging when missed (regression-tested).

Live-verified against wintermute (gbrain v0.27.1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: gstack-artifacts-init + gstack-artifacts-url helpers

artifacts-init replaces brain-init with provider choice (gh / glab /
manual), per-user gstack-artifacts-$USER repo, HTTPS-canonical storage in
~/.gstack-artifacts-remote.txt, and a "send this to your brain admin"
hookup printout. Always prints the command, never auto-executes — gbrain
v0.26.x has no admin-scope MCP probe (codex Finding garrytan#3).

artifacts-url centralizes HTTPS↔SSH/host/owner-repo conversion so callers
don't each string-mangle (codex Finding garrytan#10). The remote-conflict check in
artifacts-init compares at the canonical level so re-running with HTTPS
input doesn't trip on a stored SSH URL for the same logical repo.

The "URL form not supported" branch prints a two-line clone-then-path
form for gbrain v0.26.x; the supported branch is a one-liner with --url
ready for when gbrain ships URL ingest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: extend gstack-gbrain-detect with mcp_mode + artifacts_remote

Adds two new fields to detect's JSON output:

- gbrain_mcp_mode: local-stdio | remote-http | none
  Resolved via 3-tier fallback (codex Finding D3): claude mcp get --json
  → claude mcp list text-grep → ~/.claude.json jq read. If Anthropic moves
  the file format, the first two tiers absorb it.

- gstack_artifacts_remote: HTTPS URL from ~/.gstack-artifacts-remote.txt
  Falls back to ~/.gstack-brain-remote.txt during the v1.27.0.0 migration
  window so detect doesn't return empty between upgrade and migration.

Existing detect tests still pass (15/15). New 19 tests cover every fallback
tier independently, plus a schema regression for /sync-gbrain compat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: setup-gbrain Path 4 (remote MCP) + artifacts rename

Path 4 lets users paste an HTTPS MCP URL + bearer token and registers it
as an HTTP-transport MCP without needing a local gbrain CLI install. The
flow:

- Step 2 gains a fourth option (Remote gbrain MCP)
- Step 4 adds Path 4 sub-flow: collect URL, secret-read bearer, verify
  via gstack-gbrain-mcp-verify (NETWORK / AUTH / MALFORMED classifier)
- Step 5 (local doctor), Step 7.5 (transcript ingest), Step 5a's stdio
  branch all skip on Path 4
- Step 5a adds an HTTP+bearer registration form: claude mcp add
  --transport http --header "Authorization: Bearer ..."
- Step 7 renamed "session memory sync" → "artifacts sync" and now calls
  gstack-artifacts-init (which always prints the brain-admin hookup
  command — no auto-execute, codex Finding garrytan#3)
- Step 8 CLAUDE.md block branches: remote-http includes URL + server
  version (never the token); local-stdio keeps engine + config-file
- Step 9 smoke test on Path 4 prints the curl-equivalent for
  post-restart verification (MCP tools aren't visible mid-session)
- Step 10 verdict block has separate templates per mode

Idempotency: re-running with gbrain_mcp_mode=remote-http already in
detect output skips Step 2 entirely and goes to verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: rename gbrain_sync_mode → artifacts_sync_mode (v1.27.0.0 prep)

Hard rename, no dual-read alias (codex Finding D4). The on-disk migration
script (Phase C, separate commit) renames the config key in users'
~/.gstack/config.yaml and any CLAUDE.md blocks.

Touched call sites:
- bin/gstack-config defaults + validation + list/defaults output
- bin/gstack-gbrain-detect (gstack_brain_sync_mode field still emitted
  with the same name for downstream-tool compat; reads new key)
- bin/gstack-brain-sync, bin/gstack-brain-enqueue, bin/gstack-brain-uninstall
- bin/gstack-timeline-log (comment ref)
- scripts/resolvers/preamble/generate-brain-sync-block.ts: renames key,
  branches on gbrain_mcp_mode=remote-http to emit "ARTIFACTS_SYNC:
  remote-mode (managed by brain server <host>)" instead of the local
  mode/queue/last_push line (codex Finding garrytan#11)
- bin/gstack-brain-restore + bin/gstack-gbrain-source-wireup: read
  ~/.gstack-artifacts-remote.txt with ~/.gstack-brain-remote.txt fallback
  during the migration window
- bin/gstack-artifacts-init: tolerant of unrecognized URL forms (local
  paths, file://, self-hosted gitea) so test infrastructure and unusual
  remotes work without canonicalization
- test/brain-sync.test.ts: gstack-brain-init → gstack-artifacts-init
- test/skill-e2e-brain-privacy-gate.test.ts: artifacts_sync_mode keys
- test/gen-skill-docs.test.ts: budget 35K → 36.5K for the new MCP-mode
  probe in the preamble resolver
- health/SKILL.md.tmpl, sync-gbrain/SKILL.md.tmpl: comment + verdict line

Hard delete:
- bin/gstack-brain-init (replaced by bin/gstack-artifacts-init in v1.27.0.0)
- test/gstack-brain-init-gh-mock.test.ts (replaced by gstack-artifacts-init.test.ts)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md files after artifacts-sync rename

Mechanical regen via \`bun run gen:skill-docs --host all\`. All */SKILL.md
files reflect the renamed config key (gbrain_sync_mode →
artifacts_sync_mode), the renamed remote-helper file
(~/.gstack-artifacts-remote.txt with brain fallback), the renamed init
script (gstack-artifacts-init), and the new ARTIFACTS_SYNC: remote-mode
status line that fires when a remote-http MCP is registered.

Golden fixtures (test/fixtures/golden/*-ship-SKILL.md) refreshed to match
the regenerated default-ship output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: v1.27.0.0 migration — gstack-brain → gstack-artifacts rename

Journaled, interruption-safe migration. Six steps, each writes to
~/.gstack/.migrations/v1.27.0.0.journal on success; re-entry resumes
from the next un-done step. On final success, journal is replaced by
~/.gstack/.migrations/v1.27.0.0.done.

Steps:
1. gh_repo_renamed       gh/glab repo rename gstack-brain-$USER →
                         gstack-artifacts-$USER (idempotent: detects
                         already-renamed and skips)
2. remote_txt_renamed    mv ~/.gstack-brain-remote.txt → artifacts file,
                         rewriting URL path to match the new repo name
3. config_key_renamed    sed -i in ~/.gstack/config.yaml flips
                         gbrain_sync_mode → artifacts_sync_mode
4. claude_md_block       sed flips "- Memory sync:" → "- Artifacts sync:"
                         in cwd CLAUDE.md and ~/.gstack/CLAUDE.md
5. sources_swapped       gbrain sources add NEW (verify) → remove OLD
                         (codex Finding garrytan#6: add-before-remove ordering,
                         no downtime window). On remote-MCP mode, prints
                         commands for the brain admin instead of executing.
6. done                  touchfile + delete journal

User opt-out: any "n" or "skip-for-now" answer at the initial prompt
writes a marker file that prevents re-prompting; user can re-invoke
via /setup-gbrain --rerun-migration.

11 unit tests cover: nothing-to-migrate, GitHub happy path, idempotent
re-run, journal-resume mid-flight, remote-MCP print-only path,
add-before-remove ordering verification, add-fail → old source stays
registered, CLAUDE.md field rewrite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: regression suite + E2E for v1.27.0.0 rename

Three new regression tests guard the rename's blast radius (per codex
Findings garrytan#1, garrytan#8, garrytan#9, garrytan#12):

- test/no-stale-gstack-brain-refs.test.ts: greps bin/, scripts/, *.tmpl,
  test/ for forbidden identifiers (gstack-brain-init, gbrain_sync_mode);
  fails CI if any non-allowlisted file references them.
- test/post-rename-doc-regen.test.ts: confirms gen-skill-docs output has
  no stale references in any */SKILL.md (the cross-product blind spot).
- test/setup-gbrain-path4-structure.test.ts: structural lint over the
  Path 4 prose contract — STOP gates after verify failure, never-write-
  token rules, mode-aware CLAUDE.md block, bearer always via env-var.

Two new gate-tier E2E tests (deterministic stub HTTP server, fixed inputs):

- test/skill-e2e-setup-gbrain-remote.test.ts: Path 4 happy path. Stubs
  an HTTP MCP server, drives the skill via Agent SDK with a stubbed
  bearer, asserts claude.json gets the http MCP entry, CLAUDE.md gets
  the remote-http block, the secret token NEVER leaks to CLAUDE.md.
- test/skill-e2e-setup-gbrain-bad-token.test.ts: stub server returns 401;
  asserts the AUTH classifier hint surfaces, no MCP registration occurs,
  CLAUDE.md is unchanged. Regression guard for the "verify failed → STOP"
  rule.

touchfiles.ts: setup-gbrain-remote and setup-gbrain-bad-token added at
gate-tier so CI catches Path 4 regressions on every PR.

Plus a few comment refs flipped: bin/gstack-jsonl-merge, bin/gstack-timeline-log
(legacy gstack-brain-init mentions in headers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* release: v1.27.0.0 — /setup-gbrain Path 4 + brain → artifacts rename

Bumps VERSION 1.26.4.0 → 1.27.0.0 (MINOR per CLAUDE.md scale-aware bump
guidance: ~1500 line net change including a new path in /setup-gbrain,
two new bin helpers, a journaled migration, 59 new tests, and a config
key rename across the codebase).

CHANGELOG entry covers: Path 4 (Remote MCP) end-to-end, the brain →
artifacts rename, the journaled migration, the verify-helper error
classifier, the artifacts-init multi-host provider choice. Includes
the canonical Garry-voice headline + numbers table + audience close
per the release-summary format.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: demote setup-gbrain Path 4 E2E to periodic-tier

The Agent SDK E2E tests for Path 4 (skill-e2e-setup-gbrain-remote and
skill-e2e-setup-gbrain-bad-token) are inherently non-deterministic —
the model interprets "follow Path 4 only" prompts flexibly and can
skip Step 8 (CLAUDE.md write) or shortcut past the verify helper, which
makes the gate-tier assertions flaky.

The deterministic gate coverage for Path 4 is in
test/setup-gbrain-path4-structure.test.ts: a fast structural lint that
catches AUQ-pacing regressions and prose contract drift in <200ms with
zero token spend. That test is the right tool for catching the failure
mode the gate-tier was meant to guard against.

The Agent SDK E2E tests stay available on-demand for periodic-tier runs
(EVALS=1 EVALS_TIER=periodic bun test test/skill-e2e-setup-gbrain-*.test.ts).
Also tightened the verify-error assertion to the literal field shape
("error_class": "AUTH") instead of a substring match that false-matches
the parent claude session's "needs-auth" MCP discovery markers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: sync package.json version to 1.27.0.0

VERSION was bumped to 1.27.0.0 in f6ec11e but package.json was not
updated in the same commit. The gen-skill-docs.test.ts assertion
"package.json version matches VERSION file" caught the drift.

This is the DRIFT_STALE_PKG case the /ship Step 12 idempotency check
is designed for; the fix is the documented sync-only repair (no
re-bump, package.json synced to existing VERSION).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request Jun 14, 2026
…2004)

* feat: add shared call-time isConductor() helper

Single source of truth for Conductor host detection in TS consumers
(CONDUCTOR_WORKSPACE_PATH / CONDUCTOR_PORT). Reads the passed env at
call time, not a module-load snapshot, so unit tests can pin the env
inline without Bun --preload (esm-hoist-breaks-env-pin-bootstrap).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test: harden question-preference-hook harness against ambient Conductor env

runHook copied all of process.env into the hook subprocess, so running the
suite inside Conductor (CONDUCTOR_WORKSPACE_PATH/PORT set) would leak those
markers. Strip them so the existing cases deterministically characterize
NON-Conductor behavior before the Conductor branch lands. Baseline: 15 pass.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat: PreToolUse hook denies AskUserQuestion in Conductor, redirects to prose

Conductor disables native AskUserQuestion and routes through a flaky MCP
variant that returns '[Tool result missing due to internal error]'. The
hook now denies any AUQ call in a Conductor session and instructs the model
to render a prose decision brief instead (transport avoidance, not preference
enforcement) — firing for one-way doors too, with a typed-confirmation
requirement for destructive paths.

Precedence: never-ask auto-decide still wins (user already settled those);
Conductor prose is the fallback for everything else; non-Conductor behavior
is byte-for-byte unchanged. Restructured the per-question loop to compute
eligibility without early-returning so the Conductor branch can run as the
fallback while preserving memoryContext on every exit.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat: Conductor renders AskUserQuestion decisions as prose by default

In Conductor, native AskUserQuestion is disabled and the MCP variant is
flaky, so skills now render every decision as a plain-text prose brief the
user answers by typing a letter — proactively, not as a failure reaction.

- Preamble emits CONDUCTOR_SESSION, gated on != headless so eval/CI inside
  Conductor still BLOCKs instead of rendering prose to nobody.
- AskUserQuestion Format gains a Conductor-default-prose rule (auto-decide
  preferences still apply first; prose decisions log via gstack-question-log
  since PostToolUse never fires), a one-way/destructive typed-confirmation
  rule, and a typed-reply continuation protocol for split chains.
- Regenerated all SKILL.md + ship golden fixtures; bumped affected carve
  skeleton caps to absorb the always-loaded additions.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat: deploy the Conductor AskUserQuestion hook (setup + upgrade migration)

The PreToolUse hook only delivers its Conductor-prose guarantee if it's
installed, but setup skips hook registration in non-interactive (conductor/CI)
setups. Two fixes so layer 3 actually deploys:

- setup: treat a Conductor workspace as an implicit opt-in for the PreToolUse
  hook on the silent fall-through (never overriding an explicit opt-out).
- migration v1.58.0.0: re-register the hook for existing Conductor installs on
  /gstack-upgrade, idempotent and respecting plan_tune_hooks=no.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test: E2E for Conductor prose + fix auto-decide-preserved GSTACK_HOME bug

- New skill-e2e-conductor-prose (periodic): Conductor env + plan-eng-review
  surfaces a prose decision brief, not a silent skip. Header documents this is
  end-to-end behavior coverage; the deterministic Conductor guard is the
  question-preference-hook unit test (the PTY harness can't register the MCP
  variant — Codex #10).
- Fix the pre-existing bug in auto-decide-preserved: it seeded the never-ask
  preference under GSTACK_HOME=tmpHome but never passed GSTACK_HOME into the
  PTY run, so the spawned claude read the real ~/.gstack and the preference
  was inert (Codex #9). Now passes GSTACK_HOME + CONDUCTOR_WORKSPACE_PATH to
  prove auto-decide still wins over the Conductor prose redirect.
- Register both in touchfiles (periodic tier).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* v1.58.0.0 feat: Conductor renders AskUserQuestion decisions as prose

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test: strip ambient Conductor env in memory-cache-injection hook harness

Same dev-in-Conductor leak fixed for question-preference-hook: this suite's
runHook copies process.env, so running it inside Conductor flipped the
defer-path memoryContext assertions into the [conductor] prose deny. Strip
CONDUCTOR_* so the cases characterize non-Conductor behavior. (CI is headless,
so this only bit local Conductor runs.)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat: gstack-detach — run agent eval/bench jobs in their own session

Long agent-run jobs (30-60 min evals, benchmarks) die when the harness sends
SIGTERM to a background task's process group on turn boundaries / monitor
stops / interruptions (observed: 'script test:gate terminated by signal
SIGTERM'). gstack-detach runs the command in a fresh session (python3
os.setsid, or setsid on Linux, nohup fallback) so a group SIGTERM can't reach
it, and wraps it in caffeinate -i on macOS so idle-sleep can't kill it either.
Returns immediately; caller polls the logfile. Secrets stay in env, never argv.

The guard test pins the contract: the command runs in a different process
group than the caller and outlives the launching shell.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat: eval:bg* scripts — detached eval runs for agents

Agent-facing convenience scripts that launch the eval suites through
gstack-detach so a harness SIGTERM can't kill a long run. eval:bg (diff-based),
eval:bg:all, eval:bg:gate, eval:bg:periodic — each returns immediately and
streams to /tmp/gstack-evals.log for polling. The plain test:evals / test:e2e
scripts stay foreground for humans.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* docs: CLAUDE.md — agents must run long evals via gstack-detach

Codifies the detached-execution default: agent-launched eval/benchmark runs go
through bin/gstack-detach (or the eval:bg* scripts) so a harness SIGTERM or
macOS idle-sleep can't kill a 30-60 min run, then poll the log with a
death-aware watcher. Humans keep foreground scripts.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat: harden gstack-detach against all four eval-infra killers

The basic bash detach fixed SIGTERM but a real run on a shared dev box hit
three more killers: cross-worktree API saturation (15-way concurrency x a
sibling worktree mass-timed-out the suite), a silent hang (periodic bun died
with no exit marker), and shared-/tmp log contamination (a concurrent
worktree's agent output bled into the log). Rewrite as a portable python3 tool
that bakes in all four fixes:

- fork + setsid: SIGTERM-proof (own session, survives harness polite-quit)
- caffeinate -i on macOS: no idle-sleep death
- --lock NAME (fcntl, machine-wide): concurrent worktrees SERIALIZE instead of
  saturating the shared model API
- run-scoped default log (~/.gstack-dev/eval-runs/<label>-<slug>-<branch>-<ts>-<pid>):
  no cross-worktree collision/contamination
- --timeout watchdog + a guaranteed '### gstack-detach EXIT=<code> ###' sentinel
  on every terminal path: no silent hang, finished-vs-died always detectable

Guard test pins all four: detached pgid differs + outlives launcher, run-scoped
log path, watchdog EXIT=timeout, and lock serialization (second run WAITS).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat: eval:bg* use run-scoped logs + machine lock + watchdog

Drop the shared /tmp/gstack-evals.log path (the cross-worktree collision that
contaminated a live run) for gstack-detach's run-scoped default, and add the
machine-wide gstack-evals lock (concurrent worktrees serialize, no API
saturation) plus per-tier watchdog timeouts (60/90/120 min). Each eval:bg*
prints its run-scoped log path to poll.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* docs: wire detached-eval guidance into /ship + correct CLAUDE.md flags

- /ship eval step (sections/tests.md): long eval suites launch via gstack-detach
  (own session, machine lock, EXIT sentinel) so a turn boundary can't kill a
  30+ min run mid-ship — the exact failure observed during this branch's ship.
- CLAUDE.md: correct the now-stale /tmp reference; document the --lock (serialize
  worktrees, no API saturation), --timeout watchdog, run-scoped log, and the
  guaranteed EXIT sentinel the poller breaks on.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* refactor: extract pure promotedEnv() from conductor-env-shim

Single source of truth for GSTACK_* key promotion semantics. The ambient
promoteConductorEnv() becomes a wrapper; behavior-preserving. Needed by the
hermetic env builder which must not mutate process.env.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat: hermetic child-env builder for E2E runners

Allowlist scrub (basics/network/named-auth kept; CONDUCTOR_*, CLAUDE_*,
GSTACK_*, MCP_*, GBRAIN_*, operator credentials dropped), per-runner
extraAllow, overrides merge last, EVALS_HERMETIC=0 byte-identical escape
hatch read at call time (ESM-hoist safe). Sync memoized singleton temp dirs
(<runRoot>/.claude keeps the extractPlanFilePath contract), seeded
.claude.json for non-interactive first run, pid-aware GC of crashed runs.
19 free unit tests.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat: session-runner spawns hermetic children + isolation canaries

claude -p children now get the allowlist-scrubbed env and a gated
--strict-mcp-config (EVALS_HERMETIC=0 restores operator env AND args).
Two gate-tier canaries make the clean room falsifiable: hermetic-canary
asserts env redirect + scrub + zero MCP servers + nonzero API-key cost
from the Bash tool_result (never model prose); hermetic-sentinel plants a
poisoned operator config (user CLAUDE.md + MCP server) and proves the
child cannot see it. Empirically verified on claude 2.1.175: print mode
needs no seed config (the seed serves the PTY path); the child CLI sets
CLAUDECODE for its own tools, so that scrub is pinned in unit tests, not
E2E. hermetic-env.ts joins GLOBAL_TOUCHFILES.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat: PTY runner spawns hermetic claude sessions

launchClaudePty children get the allowlist-scrubbed env, a gated
--strict-mcp-config, and the session exposes hermeticConfigDir for
forensics (hermetic plan files live under <dir>/plans/ and still match
extractPlanFilePath via the /.claude dir-name contract). Seeded trust
state covers repo-cwd sessions; the 15s trust-watcher stays as fallback.
Verified foreground via the plan-mode-no-op gate test.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat: codex/gemini runners spawn hermetic children

Same allowlist scrub as the claude runners, with each provider's auth
surface re-admitted via extraAllow (codex: OPENAI_API_KEY/CODEX_* plus
its tempHome .codex copy; gemini: GEMINI_*/GOOGLE_* with real HOME for
~/.gemini auth). The gemini spawn previously inherited the full operator
env with no env property at all.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat: agent-sdk-runner spawns hermetic children via complete Options.env

The historical 'env: breaks SDK auth' failure was partial-env replacement:
Options.env replaces the child's entire environment, so objects lacking
ANTHROPIC_API_KEY killed auth. Passing the complete hermetic env (key +
PATH + redirected CLAUDE_CONFIG_DIR/GSTACK_HOME) works — validated live
via query() with a Bash tool call (success, real cost, Conductor vars
scrubbed). Per-test opts.env merges last; ambient key mutation still
works because the builder reads process.env at call time.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test: static tripwire pins hermetic wiring in all five runners

Free-tier invariants: every runner builds child env via hermeticChildEnv,
no raw ...process.env spread at any spawn site, --strict-mcp-config gated
on isHermeticEnabled in both claude runners, and no test callsite passes
the operator env into a runner's override parameter (scoped to runner
calls — unit tests spawning gstack bin scripts directly are exempt).
Mirrors the terminal-agent-pid-identity / server-embedder-terminal-port
tripwire idiom.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test: refresh codex/factory ship goldens with detached-eval block

a38089a added the gstack-detach guidance to the ship template and
updated the claude golden; the codex and factory goldens missed the same
16-line block. Regenerated via bun run gen:skill-docs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* docs: hermetic local E2E is the default; retire stale SDK env warning

CLAUDE.md now documents the hermetic clean room (allowlist scrub, fresh
seeded CLAUDE_CONFIG_DIR, temp GSTACK_HOME, --strict-mcp-config),
EVALS_HERMETIC=0 as the debug escape hatch, and replaces the 'never pass
env: to runAgentSdkTest' rule with the verified mechanism (partial-env
replacement was the failure; complete env is safe).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix: operational-learning fixture copies lib/jsonl-store.ts with the bin

gstack-learnings-log imports $SCRIPT_DIR/../lib/jsonl-store.ts (hasInjection,
v1.57.5.0) — copying only the bin scripts into the temp fixture broke the
script with exit 1 since then. Latent because diff-based selection rarely
runs this test; surfaced when hermetic-env.ts joined GLOBAL_TOUCHFILES and
selected everything. Reproduced outside the hermetic env to confirm blame.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix: ios-qa daemon scenarios use unique pidfiles under --concurrent

All scenarios shared join(workDir, 'daemon.pid') through a module-scope
workDir binding that beforeEach reassigns mid-flight under bun --concurrent.
First daemon claims; siblings get already_running against the test process's
own always-alive pid and fail in milliseconds — the failure mode seen at
15-way gate concurrency. Per-claim unique pidfiles keep the single-instance
semantics under test.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix: workflow judge re-appends body-carved sections after the marker slice

runWorkflowJudge appended sections/*.md before slicing startMarker..endMarker.
That handles skills that moved their MARKERS into sections (plan-eng,
plan-design) but not document-release, which keeps its markers in the
skeleton and carved the workflow BODY (Steps 2-9 -> sections/release-body.md)
AFTER the endMarker — so the slice dropped it and the judge scored
completeness 2 ('Steps 2-9 are in an external file'). Now any carved section
the marker window excluded is re-appended, so the judge sees the full
workflow the agent executes. document-release: completeness 2->5, clarity
3->4. ship/plan-ceo/plan-eng/plan-design judges unchanged (their section
content is already inside the slice, so the head-dedup skips re-append).

Pre-existing since the v1.57.0.0 carve (#1907); surfaced now because
hermetic-env.ts is a global touchfile that selects every llm-judge test.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* harden: hermetic temp-dir GC grace window + half-seed cleanup

Codex adversarial review (ship) flagged two temp-dir lifecycle edges:
- GC deleted any dead-pid dir; PID reuse could delete a freshly-created dir
  whose original pid exited and was recycled to a live process. Now requires
  BOTH a dead pid AND mtime older than a 1h floor.
- A seed-write failure after mkdir left an unseeded dir named with our live
  pid that this process's GC skips, leaking until exit. Now the partial dir
  is torn down before the (still loud) rethrow.

Two findings left as-is by design: HOME stays allowlisted (CLAUDE_CONFIG_DIR
wins for claude; codex/gemini need ~/.codex|~/.gemini auth; FS sandbox is
TODOS.md:454 scope; the hermetic-sentinel canary proves config isolation),
and PTY extraArgs --mcp-config is a deliberate caller opt-in like env overrides.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: document hermetic-by-default E2E + eval:bg detached runs in CONTRIBUTING

The Testing & evals section now tells contributors that local E2E runners
spawn children through a sealed clean room (allowlist-scrubbed env, seeded
CLAUDE_CONFIG_DIR, temp GSTACK_HOME, --strict-mcp-config) so local signal
matches CI, with EVALS_HERMETIC=0 as the escape hatch. The eval-tools list
gains the eval:bg* detached-run scripts (gstack-detach: SIGTERM-proof,
caffeinate-wrapped, machine-locked, run-scoped logs, EXIT= sentinel).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: sync package.json to 1.58.1.0

The merge took main's package.json (1.58.0.0); gstack-version-bump repair
fixed the working tree but the change was left uncommitted. Without this the
committed tree disagrees with VERSION and CI's version-match test fails.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: regenerate diagram SKILL.md with Conductor prose preamble

The diagram skill (new from main) was missing the Conductor-session prose
AskUserQuestion blocks that gen-skill-docs propagates to every SKILL.md.
Pure generated output; reproduced by bun run gen:skill-docs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
garrytan added a commit that referenced this pull request Jun 18, 2026
* feat: Layer C stealth — chrome.*, Notification, per-install hardware, toString Proxy (gbrowser T1+T3+D6)

Three additions stacked into the existing applyStealth() init script
to close the visible automation tells that today push GBrowser users
into Google's /sorry/index captcha and similar:

T1 — Strip Playwright's automation default args:
  --enable-automation                              (kills "Chrome is being
                                                    controlled" infobar)
  --disable-popup-blocking, --disable-component-update,
  --disable-default-apps                           (Patchright's list — each
                                                    is a documented tell)

  Now centralized in STEALTH_IGNORE_DEFAULT_ARGS export, used by BOTH
  launchHeaded() and handoff() (the headless → headed re-launch path).

D6 — Drop "GStackBrowser" UA branding suffix:
  Real Chrome's UA ends `Safari/537.36`, not `Safari/537.36 GStackBrowser`.
  The branded suffix was a high-entropy classifier for any vendor that
  grep'd UA for known automation/test-browser strings. Branding still
  lives in the wrapper .app name + Dock icon + tray — does not need
  to leak via the UA string for the product to be "GBrowser." Resolves
  the "looks like Chrome but identifies as GStackBrowser" contradiction
  codex review #18 flagged.

T3 — Layer C init-script additions in stealth.ts:

  1. Function.prototype.toString Proxy (must run first). Wraps every
     patched getter / function in a WeakSet so they report
     `function NAME() { [native code] }` at every recursion depth,
     defeating the depth-3+ integrity check
     (fn.toString.toString.toString().includes('[native code]')).

  2. window.chrome.runtime / chrome.app / chrome.csi / chrome.loadTimes
     restoration with full enum shape (OnInstalledReason, PlatformArch,
     PlatformOs, etc.) + method bodies. Real Chrome ships these; their
     absence is universally checked. Vendor research (gbrowser plan
     deep-dive on Cloudflare + DataDome) confirmed both vendors probe
     this shape directly.

  3. Notification.permission aligned to 'default'. The existing inline
     addInitScript already spoofs permissions.query({name:'notifications'})
     to return 'prompt' — Notification.permission being 'denied' while
     Permissions returns 'prompt' is a cross-source inconsistency that
     detectors flag specifically.

  4. Per-install hardware values via GSTACK_HW_CONCURRENCY /
     GSTACK_DEVICE_MEMORY env vars (set by gbd's host_profile.go from
     system_profiler + sysctl). Reporting real host values within the
     Chrome shape avoids the cross-user GBrowser fingerprint cluster
     that hardcoded defaults would create. Codex review #10 flagged
     hardcoding as creating contradictions across Apple Silicon / Intel
     / UA-CH architecture.

  5. Selenium 25-global cleanup + PhantomJS + NightmareJS + Watir +
     Playwright (__pwInitScripts, __playwright__binding__) static-name
     deletion. The inline block continues to handle the dynamic
     cdc_/__webdriver/__selenium/__driver prefixes.

D7 (codex correction) kept: still do NOT fake navigator.plugins or
navigator.languages. Synthesizing those triggers MORE consistency
flags from modern fingerprinters than letting Chromium surface them
natively.

Test coverage:
- 15 new tests in stealth-layer-c.test.ts covering: launch-flag
  exports, script structure, toString-Proxy installs first, every
  spoof present, hardware values interpolated from input (not
  hardcoded), Selenium global cleanup spot-check, no GStackBrowser
  leak in stealth payload, backwards-compat exports preserved.
- All 8 existing stealth-webdriver tests still pass.
- All 2 existing browser-manager-unit tests still pass.

For GBrowser specifically: this is the gstack-side half of Phase 1 / T1
+ T3 + D6 in the anti-detection plan. The gbrowser repo's submodule
pointer bump will land alongside this.

* feat: buildGStackLaunchArgs — Pack 1 cmdline-switch construction for gbrowser

New stealth.ts export that turns the GSTACK_* env vars (already populated
by gbrowser's gbd from host_profile.go) into the --gstack-* cmdline
switches the Pack 1 Chromium patches read at WebGL getParameter,
NavigatorUA::userAgentData, NavigatorConcurrentHardware::hardwareConcurrency,
and NavigatorDeviceMemory::deviceMemory time.

Wired into all three launchArgs sites: launch() (headless), launchHeaded()
(real product path), and handoff() (headless → headed re-launch).

Mapping:
  GSTACK_GPU_VENDOR      → --gstack-gpu-vendor
  GSTACK_GPU_RENDERER    → --gstack-gpu-renderer
  GSTACK_PLATFORM        → --gstack-ua-platform (with mapping:
                            MacARM/MacIntel → macOS, Win32 → Windows,
                            Linux x86_64 → Linux)
  GSTACK_GPU_CHIPSET     → --gstack-ua-model
  GSTACK_HW_CONCURRENCY  → --gstack-hw-concurrency
  GSTACK_DEVICE_MEMORY   → --gstack-device-memory

Each switch is emitted only when its env var is non-empty — empty
values fall through to the patch's "no override" path, which returns
the real Chromium native value. Safe to ship on Chromium builds
without the Pack 1 patches applied (zero behavior change).

The patches themselves live in the gbrowser repo at chromium/patches/
{webgl-vendor-spoof,ua-client-hints-stealth,worker-navigator-stealth}.patch.
Both halves (gstack arg construction + gbrowser C++ patches) must
land + Chromium rebuild before the spoof reaches the WebGL/UA-CH/
hardware accessors. Currently dormant until then.

Tests (browse/test/stealth-layer-c.test.ts):
  7 new buildGStackLaunchArgs cases — empty env, all-populated, partial,
  platform mapping (MacARM/MacIntel/Win32/Linux), unrecognized platform
  fallthrough, vendor-with-spaces escape-safety.
  All 32 stealth/browser-manager tests pass.

For GBrowser specifically: gstack-side half of the Pack 1 flag plumbing.
gbrowser repo will bump the submodule pointer to this commit, then re-run
bun run test/anti-bot/evidence-run.ts to verify creepjs's "33% headless"
score drops after Pack 1 + Chromium rebuild.

* feat: buildGStackLaunchArgs adds --gstack-suppress-prepare-stack-trace

Pack 2 / B11 flag plumbing for the new
error-preparestacktrace-stealth.patch in gbrowser/chromium/patches/.

Always emit --gstack-suppress-prepare-stack-trace unless the caller
explicitly sets GSTACK_CDP_STEALTH=off in the environment. Off by
default in patch behavior (no-op without the C++ patch), so this is
safe on stock Playwright Chromium too.

Closes the Cloudflare canary trick where a page sets
Error.prepareStackTrace and watches for it to fire during CDP
serialization of a logged Error object.

Tests:
  All 33 stealth/browser-manager tests pass. New cases:
  - GSTACK_CDP_STEALTH=off disables suppression
  - empty env still emits the always-on flag (count=1)
  - all-populated env now emits 7 flags (was 6)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(browse): enable Chromium sandbox on headed launchPersistentContext

Mirrors v1.40.0.1 from main lineage (PR #1617). Cherry-picked onto
gbrowser-anti-detection so the GBrowser submodule can consume the fix
without waiting for main to merge.

Playwright auto-adds --no-sandbox whenever chromiumSandbox !== true
(playwright-core/lib/server/chromium/chromium.js:291-292). The headless
chromium.launch() site set the option; the two headed sites
(launchHeaded() and handoff()) did not. Every headed launch on macOS
and Linux showed Chromium's yellow "unsupported command-line flag:
--no-sandbox" infobar.

shouldEnableChromiumSandbox() centralizes the Win32 / CI / CONTAINER /
root heuristic that previously lived only in the headless path's
explicit --no-sandbox push at :225. All three launch sites now use the
helper, and six unit tests pin the policy across darwin, linux, win32,
CI, CONTAINER, and root.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v1.40.0.2 fix(browse): Cmd+Q on managed Chromium stops triggering supervisor respawn

Three browser.on('disconnected') handlers in browse/src/browser-manager.ts
(launch, launchHeaded, handoff) each exited with a non-zero code on every
disconnect, regardless of cause. Process supervisors that consume our exit
code (gbrowser's gbd HealthMonitor in cmd/gbd/health.go) treated user
Cmd+Q identical to a Chromium crash and respawned with exponential
backoff, so the visible browser kept reappearing after the user closed it.

Add resolveDisconnectCause(browser) that reads the underlying ChildProcess
exitCode + signalCode (waiting up to 1s for the exit event if the
disconnected event fired first). Exit code 0 + no signal = clean user
quit; anything else = crash, signal-kill, or OOM.

Wire the resolver into all three disconnect handlers:
- launch() (headless): clean → exit 0, crash → exit 1 (was always 1)
- launchHeaded() (headed): clean → exit 0, crash → exit 2 (was always 2)
  onDisconnect() cleanup callback still runs in both cases.
- handoff() (re-launch): same as launch() via the helper.

Preserve the per-path crash codes (1 vs 2) so any supervisor that
differentiated headed vs headless crashes keeps working.

Seven new unit tests in browse-manager-unit.test.ts cover the resolver
across already-exited, signal-killed (SIGSEGV / SIGKILL), async exits,
and null-browser inputs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(browse): apply stealth on every launch path + share automation-artifact cleanup

handoff() built cmdline args but never called applyStealth, so a handed-off
browser had no JS stealth (no webdriver mask, no chrome.* shape, no toString
proxy). And the cdc_/Permissions cleanup shim lived inline in launchHeaded()
only, so headless launch() reported Notification.permission='default' without
the matching permissions.query='prompt' answer — the exact cross-source
inconsistency the shim exists to prevent.

Move the cleanup into AUTOMATION_ARTIFACT_CLEANUP_SCRIPT inside applyStealth so
all three launch paths (launch, launchHeaded, handoff) get identical stealth,
and call applyStealth(newContext) in handoff() before restoreState() navigates.

A static tripwire in browser-manager-unit.test.ts fails CI if any launch path
drops the applyStealth call again.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(browse): make --gstack-suppress-prepare-stack-trace opt-in, not default-on

buildGStackLaunchArgs() pushed the flag unless GSTACK_CDP_STEALTH=off, i.e.
on-by-default — contradicting its own comment ("off by default, only for
gbrowser builds"). The switch is read by a C++ patch that only exists in
gbrowser; on stock Playwright Chromium it is an unknown switch.

Flip to opt-in: emit only when GSTACK_CDP_STEALTH is on/1/true. gbd opts in by
exporting GSTACK_CDP_STEALTH=on; stock installs leave it unset so the flag
never reaches a Chromium that wouldn't understand it. Comment now matches code.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(browse): correct stale stealth comments

The file-level stealth.ts docstring claimed "we DON'T fake navigator.plugins"
while the same file now ships EXTENDED_STEALTH_SCRIPT, which does fake plugins
when GSTACK_STEALTH=extended. Clarify that Layer C (the always-on default)
doesn't fake plugins and the opt-in extended mode does, as the documented
"actively lies, may break sites" escape hatch.

Also fix the launch()/launchHeaded() comments that said "mask navigator.webdriver
only" — applyStealth (Layer C) also restores window.chrome.*, aligns
Notification.permission, and sets per-install hardware.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(browse): runtime + extended-mode coverage for the stealth blend

The stealth tests were all static string-shape assertions; nothing executed
the script in a real page. Add real-Chromium runtime checks via applyStealth +
page.evaluate:

- Layer C runtime: window.chrome.* rich shape, Notification.permission='default'
  paired with permissions.query notifications='prompt' (guards the shim now
  running on every path), and patched getters reporting [native code].
- Per-install hardware: navigator.hardwareConcurrency/deviceMemory reflect the
  GSTACK_* env profile.
- Extended-mode blend: navigator.plugins is faked when GSTACK_STEALTH=extended,
  Layer C still wins window.chrome.runtime, and navigator.webdriver stays false
  (own-prop getter survives extended's prototype delete).
- Persistent-context (launchHeaded/handoff) parity now uses a page created
  AFTER applyStealth — the old test checked pages()[0], which predates the init
  script, so webdriver was false only via the launch arg, not Layer C.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(browse): handoff() + launchHeaded() spread the shared STEALTH_LAUNCH_ARGS

handoff() built its launch args from only ['--hide-crash-restore-bubble',
...buildGStackLaunchArgs()], omitting STEALTH_LAUNCH_ARGS — so a handed-off
browser kept the --disable-blink-features=AutomationControlled tell that
launch() and launchHeaded() strip. launchHeaded() also hardcoded the flag as a
literal. Both now spread the shared constant, so the AutomationControlled flag
lives in one place across all three launch paths.

Tripwires: STEALTH_LAUNCH_ARGS spread into >= 3 sites (no inline literal) and
STEALTH_IGNORE_DEFAULT_ARGS wired into both persistent-context paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(browse): drop dead HostProfile.platform, export test internals

HostProfile.platform was set by readHostProfile but never read by
buildStealthScript — the platform spoof is owned by the UA-CH cmdline switch in
buildGStackLaunchArgs (which reads GSTACK_PLATFORM directly). Remove the dead
field. Export readHostProfile and AUTOMATION_ARTIFACT_CLEANUP_SCRIPT so their
clamp/shape invariants can be unit-tested. Correct the stale "25 Selenium
globals" count comment and note the extended cdc_ scan is redundant-but-retained
for standalone use.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(browse): cover readHostProfile clamp, toString depth-3, chrome.* calls

Pre-landing review coverage gaps:
- readHostProfile clamps 0/negative/NaN/missing env to 8 (a deviceMemory=0 or
  NaN would be a glaring bot tell) — now asserted.
- toString proxy survives the depth-3 recursion trick
  (fn.toString.toString.toString().includes('[native code]')), the headline
  claim that was only tested at depth-1.
- chrome.csi() and chrome.loadTimes() are invoked (not just typeof-checked) and
  runtime.connect() throws the native-shaped "No matching signature" error.
- AUTOMATION_ARTIFACT_CLEANUP_SCRIPT static shape (cdc_/__webdriver strip +
  notifications->prompt) as a hermetic backup for the live-Chromium pairing test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(browse): recreateContext() re-applies stealth (closes 4th un-stealth path)

useragent and viewport --scale route through recreateContext(), which rebuilds
the BrowserContext via newContext() — a fresh context with no init scripts. It
never called applyStealth, so a routine useragent/viewport-scale command
silently dropped webdriver masking, window.chrome.* shape, hardware spoof, and
the cdc/Permissions cleanup on every restored page. Caught by the cross-model
adversarial review (Codex) after the Claude pass and eng review missed it.

Both the main and fallback paths now call applyStealth before any page is
created. The launch-path tripwire is raised to >= 4 sites and now asserts the
recreateContext() body specifically, so the regression class can't recur.

Also documents the load-bearing trust assumption on buildGStackLaunchArgs /
readHostProfile (GSTACK_* must be gbd-sourced, never page/remote data — the
injection-safety argument depends on it) and the notifications-permission
spoof tradeoff.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.58.3.0)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: sync browser stealth docs to Layer C (v1.58.3.0)

BROWSER.md "Stealth scope" still described the default as navigator.webdriver
masking only; Layer C is now the always-on default across all four
context-creation paths. Update the stealth-scope prose, the "What GStack
Browser means" blurb (stock-Chrome UA, no GStackBrowser suffix, captchas can
still get through at the CDP layer), the stealth.ts source-map line, and the
env-vars table (GSTACK_STEALTH, GSTACK_CDP_STEALTH, GSTACK_GPU_*, GSTACK_PLATFORM,
GSTACK_HW_CONCURRENCY/GSTACK_DEVICE_MEMORY + the explicit --gstack-* switches and
ignoreDefaultArgs stripping). Correct the stale "narrows to navigator.webdriver
masking only" premise on the open CDP-patch TODO (the TODO itself stays open —
the CDP-protocol layer is still unaddressed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants