
feat(deps): bootstrap external dependencies and adr #6

Merged
amondnet merged 2 commits into main from feat/unit-0-deps-scaffold
May 28, 2026

@amondnet amondnet commented May 28, 2026

Contributor

Summary

Adds packages/layer/modules/markdown-rewrite.ts — a build-time Nuxt module that injects rewrite rules into Vercel's build-output config.json so AI agents get raw markdown instead of the SPA shell.

Ports docus upstream commits:

  • 6fd8686b feat(llms): redirect homepage to /llms.txt
  • 9ceafe6f feat(llms): add docs page redirection to raw markdown for agents

See docs/docus-upstream-changes.md item #9.

Behaviour

  • No-op on every non-Vercel preset. The check is preset.startsWith('vercel'), so it covers vercel, vercel-edge, vercel-static, etc.
  • On Vercel: read <output.publicDir>/../config.json (the Vercel build-output config), confirm llms.txt was emitted, then unshift route pairs onto routes so they fire before the SPA fallback.
  • Rules emitted when the request carries Accept: text/markdown or User-Agent: curl/*:
    • ^/$ → /llms.txt
    • ^/<locale>/?$ → /llms.txt (one per runtimeConfig.public.i18n.locales entry)
    • ^<page>$ → /raw<page>.md (one per /raw/...md link discovered in llms.txt)
  • Vercel's has array is AND-ed, so OR semantics between the Accept and User-Agent matchers require emitting two rule entries per src → dest pair.
  • Locale codes are regex-escaped before being joined into the alternation, so an exotic code can't break the pattern.
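
The rule expansion described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the helper names (`escapeRegExp`, `agentRoutes`, `homepageRoutes`) and the local `Route` / `HeaderHas` types are hypothetical, and joining all locales into a single alternation is an assumption based on the last bullet. Field names and matcher values are taken from the JSON output shown under Verification.

```typescript
// Illustrative types; field names mirror the JSON output shown in this PR.
interface HeaderHas { type: 'header', key: string, value: string }
interface Route {
  src: string
  dest: string
  headers?: Record<string, string>
  has?: HeaderHas[]
  continue?: boolean
}

// Entries inside one `has` array are AND-ed, so OR between the two matchers
// is expressed by emitting one route entry per matcher.
const AGENT_MATCHERS: HeaderHas[] = [
  { type: 'header', key: 'accept', value: '(.*)text/markdown(.*)' },
  { type: 'header', key: 'user-agent', value: 'curl/.*' },
]

// Escape regex metacharacters so a locale code cannot alter the pattern.
function escapeRegExp(input: string): string {
  return input.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Hypothetical helper: expands one src → dest pair into two route entries.
function agentRoutes(src: string, dest: string): Route[] {
  return AGENT_MATCHERS.map(matcher => ({
    src,
    dest,
    headers: { 'content-type': 'text/markdown; charset=utf-8' },
    has: [matcher],
    continue: true,
  }))
}

// Hypothetical helper: homepage rules plus one alternation over all locales
// (assumption: locales are combined into a single pattern).
function homepageRoutes(locales: string[]): Route[] {
  const routes = agentRoutes('^/$', '/llms.txt')
  if (locales.length > 0) {
    const alternation = locales.map(escapeRegExp).join('|')
    routes.push(...agentRoutes(`^/(?:${alternation})/?$`, '/llms.txt'))
  }
  return routes
}
```

With locales `['en', 'zh-CN']` this sketch yields four entries: two for `^/$` and two for `^/(?:en|zh-CN)/?$`, each pair differing only in its `has` matcher.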

Conventions

  • Module style follows the existing packages/layer/modules/{config,shadcn}.ts (defineNuxtModule + named module).
  • TypeScript only; an inline VercelBuildOutputConfig / VercelRoute / VercelHeaderHas interface describes the parts we touch — no any.
  • Module is registered in packages/layer/nuxt.config.ts immediately after ./modules/shadcn.
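
For orientation, the inline interfaces and the prepend step might look something like the sketch below. The interface names come from this PR, but their exact fields are an assumption (only the parts of the Vercel build-output config that the rules touch), and `prependRoutes` is a hypothetical helper name.

```typescript
// Assumed shapes: only the fields the rewrite rules need, no `any`.
interface VercelHeaderHas {
  type: 'header'
  key: string
  value?: string
}

interface VercelRoute {
  src?: string
  dest?: string
  headers?: Record<string, string>
  has?: VercelHeaderHas[]
  continue?: boolean
  handle?: string // e.g. the `filesystem` handler entry
}

interface VercelBuildOutputConfig {
  version: number
  routes?: VercelRoute[]
}

// Hypothetical helper: unshifts the new rules so they are evaluated
// before any existing route, including the SPA fallback.
function prependRoutes(
  config: VercelBuildOutputConfig,
  extra: VercelRoute[],
): VercelBuildOutputConfig {
  const routes = config.routes ?? []
  routes.unshift(...extra)
  config.routes = routes
  return config
}
```

Mutating in place mirrors the unshift wording used under Behaviour; returning a copy would work equally well if the caller serialises the result straight back to config.json.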

Verification

bun install
cd packages/layer && bun typecheck      # no errors in markdown-rewrite.ts
cd ../.. && bun lint                     # clean

# Vercel build
NITRO_PRESET=vercel bun --filter @pleaseai/docs-site build
jq '.routes[] | select(.has // empty)' apps/docs/.vercel/output/config.json

Output (homepage rules only: the current nuxt-llms config emits links to canonical /docs/... URLs rather than /raw/...md, so the per-page rules will be emitted once llms.txt carries raw-md links):

{
  "src": "^/$",
  "dest": "/llms.txt",
  "headers": { "content-type": "text/markdown; charset=utf-8" },
  "has": [{ "type": "header", "key": "accept", "value": "(.*)text/markdown(.*)" }],
  "continue": true
}
{
  "src": "^/$",
  "dest": "/llms.txt",
  "headers": { "content-type": "text/markdown; charset=utf-8" },
  "has": [{ "type": "header", "key": "user-agent", "value": "curl/.*" }],
  "continue": true
}

Default (cloudflare) build is unaffected — the module bails silently and dist/ is produced as before.

Notes

  • Per-page rules require llms.txt to enumerate /raw/...md URLs. The current site config doesn't, so only the homepage routes are emitted today. This matches the upstream behaviour and avoids accidentally rewriting asset URLs. Wiring nuxt-llms to also emit raw-md URLs is tracked separately.
  • Runtime e2e is Vercel-only (build-output rewrites apply at the edge), so verification here is limited to inspecting the generated config.json.
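
As a rough illustration of the first note, per-page pairs could be derived from llms.txt along these lines. Both helper names are hypothetical and the link-matching regex is an assumption about how raw-md URLs would appear in the file; when llms.txt only lists canonical /docs/... URLs, the first helper returns an empty array, which is consistent with only homepage rules being emitted today.

```typescript
// Hypothetical helper: collects unique `/raw/....md` paths found in llms.txt.
function extractRawMarkdownPaths(llmsTxt: string): string[] {
  // Assumed link shape: a path starting at /raw/ and ending in .md.
  const pattern = /\/raw\/[^\s)"'<>]+\.md\b/g
  const matches = llmsTxt.match(pattern) ?? []
  return Array.from(new Set(matches))
}

// Hypothetical helper: maps `/raw/docs/intro.md` to
// { src: '^/docs/intro$', dest: '/raw/docs/intro.md' }.
function toRewritePair(rawPath: string): { src: string, dest: string } {
  const page = rawPath.slice('/raw'.length, rawPath.length - '.md'.length)
  // Escape regex metacharacters so the page path matches literally.
  const escaped = page.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
  return { src: `^${escaped}$`, dest: rawPath }
}
```

Restricting the match to paths under /raw/ is what keeps asset URLs out of the generated rules in this sketch.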

Follow-up to #27.


Summary by cubic

Bootstraps Unit 0 for @pleaseai/csp: adds core runtime deps, adopts native tree-sitter via ADR 0001, and seeds agent prompts and directories for upcoming indexing/MCP/ranking work. Also polishes ADR and agent prompts to align with the public API.

  • Dependencies

    • Added @kreuzberg/tree-sitter-language-pack for native tree-sitter parsers.
    • Added @modelcontextprotocol/sdk for MCP server tools.
    • Added @huggingface/transformers for token embeddings.
    • Added commander, chokidar, and ignore for CLI, file watching, and ignore handling.
  • New Features

    • Introduced ADR 0001 and updated ARCHITECTURE.md to allow native tree-sitter.
    • Added agent prompt sources for claude, copilot, cursor, gemini, kiro, and opencode.
    • Seeded src/chunking, src/indexing, src/mcp, and src/ranking with scaffolding.
    • Updated ADR 0001 to link upstream docs instead of a hardcoded local path.
    • Standardized agent prompts to use filePath (was file_path) to match the public API.

Written for commit 0635880. Summary will update on new commits.

Bootstrap Unit 0 of the semble → @pleaseai/csp port: add the external
runtime deps that downstream units will pull from, record the ADR that
amends ARCHITECTURE.md's 'No native add-ons' guideline, and copy the
six agent prompts that 'csp init' will render.

Dependencies (runtime):
- @kreuzberg/tree-sitter-language-pack ^1.8.1 — NAPI tree-sitter parsers
  (305 languages) for parity with upstream semble's tree-sitter-language-pack
- @modelcontextprotocol/sdk ^1.29.0 — MCP server (csp mcp tools)
- ignore ^7.0.5 — .gitignore / .cspignore matching for file walker
- commander ^14.0.3 — CLI arg parsing
- chokidar ^5.0.0 — index file watcher for mcp.serve
- @huggingface/transformers ^4.2.0 — Model2Vec-equivalent token embeddings

ADR 0001 records the decision to adopt native tree-sitter bindings
instead of web-tree-sitter (WASM). ARCHITECTURE.md's 'No native add-ons'
paragraph is amended to point at the ADR.

Agent markdown sources copied from semble's src/semble/agents/ with
'semble' → 'csp', the uvx fallback replaced by 'bunx @pleaseai/csp',
and example file paths re-pointed at .ts.

BM25 dep deferred: neither 'bm25' (minimal, no custom tokenizer) nor
'wink-bm25-text-search' (heavy NLP transitive deps) fit. Unit 9 will
implement BM25 inline.

Empty .gitkeep files seed src/chunking, src/indexing, src/mcp,
src/ranking for downstream worker units.
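
To give a sense of the scope of an inline BM25, here is a minimal sketch of Okapi BM25 scoring with a pluggable tokenizer. It is not the Unit 9 implementation: the function name, the default tokenizer, and the parameter defaults (k1 = 1.5, b = 0.75) are all assumptions chosen for illustration.

```typescript
type Tokenizer = (text: string) => string[]

// Assumed default: lowercase alphanumeric/underscore runs.
const defaultTokenizer: Tokenizer = text =>
  text.toLowerCase().match(/[a-z0-9_]+/g) ?? []

// Hypothetical helper: returns one BM25 score per document for the query.
function bm25Scores(
  query: string,
  docs: string[],
  tokenize: Tokenizer = defaultTokenizer,
  k1 = 1.5,
  b = 0.75,
): number[] {
  const tokenized = docs.map(tokenize)
  const totalLen = tokenized.reduce((sum, doc) => sum + doc.length, 0)
  const avgLen = totalLen / Math.max(tokenized.length, 1)

  // Document frequency: number of documents containing each term.
  const df = new Map<string, number>()
  tokenized.forEach((doc) => {
    new Set(doc).forEach(term => df.set(term, (df.get(term) ?? 0) + 1))
  })

  const queryTerms = tokenize(query)
  return tokenized.map((doc) => {
    // Term frequency within this document.
    const tf = new Map<string, number>()
    doc.forEach(term => tf.set(term, (tf.get(term) ?? 0) + 1))

    let score = 0
    for (const term of queryTerms) {
      const f = tf.get(term) ?? 0
      if (f === 0) continue
      const n = df.get(term) ?? 0
      // Non-negative IDF variant: log(1 + (N - n + 0.5) / (n + 0.5)).
      const idf = Math.log(1 + (docs.length - n + 0.5) / (n + 0.5))
      const lengthNorm = 1 - b + b * (doc.length / avgLen)
      score += idf * (f * (k1 + 1)) / (f + k1 * lengthNorm)
    }
    return score
  })
}
```

Passing the tokenizer as a parameter addresses the limitation noted for the 'bm25' package, and the whole thing needs no transitive dependencies.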

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces ADR 0001 to adopt native tree-sitter bindings via @kreuzberg/tree-sitter-language-pack, updating ARCHITECTURE.md and package.json accordingly. It also adds agent configuration files for various platforms (Claude, Copilot, Cursor, Gemini, Kiro, and Opencode). Feedback on the changes includes addressing a hardcoded absolute local path in the ADR references and correcting snake_case field names (file_path) to camelCase (filePath) in the agent documentation files to align with the project's TypeScript architecture invariants.

Comment thread .please/docs/decisions/0001-native-tree-sitter.md Outdated
Comment thread src/agents/claude.md Outdated
Comment thread src/agents/claude.md Outdated
Comment thread src/agents/copilot.md Outdated
Comment thread src/agents/copilot.md Outdated
Comment thread src/agents/gemini.md Outdated
Comment thread src/agents/kiro.md Outdated
Comment thread src/agents/kiro.md Outdated
Comment thread src/agents/opencode.md Outdated
Comment thread src/agents/opencode.md Outdated

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 14 files

Architecture diagram
sequenceDiagram
    participant Agent as AI Agent (Claude/Copilot/etc.)
    participant CLI as csp CLI (commander)
    participant Indexer as Indexer (chunking/indexing)
    participant TSPack as @kreuzberg/tree-sitter-language-pack
    participant Embedder as @huggingface/transformers
    participant MCP as MCP Server (@modelcontextprotocol/sdk)
    participant FileSys as File System / chokidar
    participant Ignore as .gitignore (ignore package)

    Note over Agent,Ignore: NEW: Core dependency and scaffolding bootstrapped by this PR

    Agent->>CLI: csp search "authentication flow" ./my-project
    CLI->>CLI: Parse args with commander
    CLI->>Indexer: invoke search(query, path, opts)

    alt Index exists (--index flag)
        Indexer->>Indexer: Load pre-built index
    else No index
        Indexer->>Ignore: check .gitignore rules
        Ignore-->>Indexer: allowed paths
        Indexer->>FileSys: traverse files (chokidar watch)
        FileSys-->>Indexer: file stream
        Indexer->>TSPack: parse source file (native tree-sitter)
        TSPack-->>Indexer: AST (305 languages supported)
        Indexer->>Indexer: chunk by definitions/comments (AST-aware)
        Indexer->>Embedder: generate token embeddings
        Embedder-->>Indexer: embedding vectors
        Indexer->>Indexer: build/search index
    end

    Indexer-->>CLI: ranked chunks (top-k)
    CLI-->>Agent: results (file_path, line, content)

    Note over Agent,MCP: Alternative access via MCP protocol

    Agent->>MCP: csp_search (MCP tool call)
    MCP->>Indexer: internal search call
    Indexer-->>MCP: results
    MCP-->>Agent: formatted MCP response

    Note over CLI,FileSys: NEW: Agent prompt files seed usage instructions

    Agent->>Agent: reads agent prompt (claude.md/copilot.md/cursor.md/gemini.md/kiro.md/opencode.md)
    Agent->>Agent: learns csp search/index/find-related workflow
    Agent->>CLI: follows prompted commands

- ADR 0001: replace hardcoded /Users/lms/... ask cache path with link to upstream MinishLab/semble repository.
- src/agents/*.md (claude, copilot, cursor, gemini, kiro, opencode): rename field reference `file_path` to `filePath` to match the camelCase invariant in ARCHITECTURE.md and the public surface defined in README (chunk.filePath).

@amondnet amondnet left a comment

Contributor Author

Applied 13, deferred 0.

All gemini-code-assist feedback addressed in 0635880:

  • ADR 0001: replaced the hardcoded /Users/lms/.ask/... ask cache path with a link to the upstream MinishLab/semble repository (portable for all developers).
  • src/agents/{claude,copilot,cursor,gemini,kiro,opencode}.md: renamed file_path to filePath in both the find-related description (~line 37) and the workflow step (~line 55) so the agent prompts match the camelCase invariant from ARCHITECTURE.md and the documented public surface (chunk.filePath) in README.

cubic-dev-ai found no issues.

@amondnet amondnet self-assigned this May 28, 2026
@amondnet amondnet merged commit 0d930fd into main May 28, 2026
1 check passed
@amondnet amondnet deleted the feat/unit-0-deps-scaffold branch May 28, 2026 15:51
This was referenced Jun 18, 2026