docs(references): add cocoindex-code and model2vec reference analyses #36

Merged: amondnet merged 3 commits into main from amondnet/cocoindex on Jun 19, 2026

amondnet (Contributor) commented Jun 19, 2026

Summary

Adds two reference analyses under .please/docs/references/ and registers them in index.md. These are reference materials only (not contracts), per the references/index.md preamble — they inform design decisions but do not constrain the implementation.

Changes

  • .please/docs/references/cocoindex.md — Prior-art comparison of cocoindex-code, an independent AST-based code-search MCP server. Documents its dense-only retrieval approach (OpenAI embeddings), architecture, and a benchmark row with caveat footnote on methodology differences vs csp.
  • .please/docs/references/model2vec.md — Analysis of model2vec and its Rust port model2vec-rs, the dense-retrieval dependency csp uses (potion-code-16M). Includes published MTEB and CoIR benchmark figures.
  • .please/docs/references/index.md — Registered both new analyses in the Documents table and Sync baselines table.

Key Finding

model2vec's own published card shows that dense+BM25 hybrid (CoIR 40.41) beats dense-only (37.05). This directly validates csp's hybrid design choice over cocoindex-code's dense-only path.
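
To make the hybrid-versus-dense-only comparison concrete, here is a minimal sketch of one common way to fuse a dense ranking with a BM25 ranking: reciprocal rank fusion (RRF). This PR does not say which fusion method csp or the model2vec card uses, so RRF itself, the constant `k=60`, the function name, and the file names in the example are all assumptions for illustration only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Assumption: RRF is used here only as an illustrative fusion method;
    the PR does not confirm that csp fuses its signals this way.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; a higher position contributes a larger score.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), then by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from a dense retriever and a BM25 retriever.
dense_ranking = ["auth.rs", "session.rs", "token.rs"]
bm25_ranking = ["token.rs", "auth.rs", "config.rs"]

fused = reciprocal_rank_fusion([dense_ranking, bm25_ranking])
print(fused)
```

Documents that both retrievers rank highly rise to the top, while a document found by only one retriever still appears in the fused list. That is the intuition behind a hybrid outscoring a dense-only path.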

Test Plan

  • Reference docs render correctly in GitHub
  • index.md links resolve to the new files
  • No functional code changed

Summary by cubic

Docs-only: add reference analyses for cocoindex-code and model2vec, register them in .please/docs/references/index.md, fix [[...]] wiki-links (use backticked code), and clarify that the Rust dense path already uses model2vec-rs with a deterministic stub only as a fallback.
The model2vec card shows potion-code-16M + BM25 hybrid (CoIR 40.41) beats dense-only (37.05), validating csp’s hybrid design over cocoindex-code’s dense-only path.

Written for commit e22e340. Summary will update on new commits.

Summary by CodeRabbit

  • Documentation
    • Added a new reference page comparing CocoIndex Code with csp/semble, covering retrieval approach, embeddings, chunking, indexing, and ranking/reranking behavior, plus guidance on where it differs.
    • Added a new Model2Vec reference page explaining how csp/semble uses potion-code-16M, including the model lineup and benchmark context.
    • Expanded the references index with entries for CocoIndex Code and Model2Vec, including sync baseline tracking notes.

- cocoindex-code: prior-art comparison (dense-only MCP vs csp hybrid); includes
  published CoIR benchmark row and a caveat footnote on methodology differences
- model2vec: analysis of the model2vec / model2vec-rs dependency used by csp's
  dense retrieval leg (potion-code-16M); includes MTEB and CoIR benchmark figures
- index.md: registered both analyses in the Documents table and Sync baselines table

Key finding: model2vec's own card shows dense+BM25 hybrid (40.41 CoIR) beats
dense-only (37.05), directly validating csp's hybrid design over cocoindex's
dense-only approach.
coderabbitai Bot commented Jun 19, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 395ce829-9bc9-4fbc-8935-1a9a4d13b6cc

📥 Commits

Reviewing files that changed from the base of the PR and between 11d9469 and e22e340.

📒 Files selected for processing (2)
  • .please/docs/references/cocoindex.md
  • .please/docs/references/model2vec.md

📝 Walkthrough

Adds two new reference documentation pages: cocoindex.md contrasting CocoIndex Code against csp/semble, and model2vec.md describing the Model2Vec/model2vec-rs integration using potion-code-16M. Updates the reference index to register both documents with sync-baseline tracking rows.

Changes

Reference Documentation Additions

| Layer / File(s) | Summary |
| --- | --- |
| New reference pages: CocoIndex Code and Model2Vec (`.please/docs/references/cocoindex.md`, `.please/docs/references/model2vec.md`) | cocoindex.md documents CocoIndex Code as a prior-art comparator, covering retrieval signals, chunking, delta indexing, and CLI/MCP surface mappings against csp/semble. model2vec.md documents the potion-code-16M embedding model, CoIR benchmark figures (dense and BM25 hybrid), and how model2vec-rs is wired in csp. |
| Reference index registration (`.please/docs/references/index.md`) | Adds rows to the Documents table for cocoindex-code and model2vec, and adds corresponding Sync baseline rows with analyzed-at details, dependency classifications, and drift-recheck instructions. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~4 minutes

Possibly related PRs

  • pleaseai/code-search#35: Adds semble.md reference document to the same .please/docs/references/ system, with overlapping documentation of model2vec as a shared dense-embedding dependency.

Poem

🐇 Hop hop, the docs appear,
Two references crystal clear!
CocoIndex and Model2Vec in sight,
Indexed and tracked with pure delight.
The rabbit stamps the baseline right! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding reference documentation for cocoindex-code and model2vec to the .please/docs/references/ directory. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


codacy-production Bot commented Jun 19, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues


gemini-code-assist Bot left a comment

Code Review

This pull request adds reference analysis documentation for CocoIndex Code (cocoindex.md) and Model2Vec (model2vec.md), and updates the main reference index. The review feedback correctly identifies broken wiki-style links ([[dense-embedding-is-a-stub]]) in both files that will not render properly in standard GitHub Markdown, and suggests replacing them with code blocks or relative links, along with fixing a minor typo (an extra double quote) in model2vec.md.

Comment thread .please/docs/references/cocoindex.md Outdated
Comment thread .please/docs/references/model2vec.md Outdated
Comment thread .please/docs/references/model2vec.md Outdated
codecov Bot commented Jun 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

… wiki-links

Apply AI code review suggestions (gemini-code-assist): GitHub markdown does
not render [[wiki-link]] syntax, and the target (dense-embedding-is-a-stub) is
a memory note with no in-repo path. Switch the 3 occurrences to backtick code
format.
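
The wiki-link fix described in this commit message can be sketched as a small text transformation. This is only an illustration of the rewrite rule (`[[target]]` becomes a backticked `target`); the commit may well have been edited by hand, and the function name and regular expression are assumptions, not code from the repository.

```python
import re

# Matches [[target]] where target contains no brackets or pipe characters.
WIKI_LINK = re.compile(r"\[\[([^\[\]|]+)\]\]")


def wiki_links_to_code(text):
    """Replace each [[wiki-link]] with the same target in backticks."""
    return WIKI_LINK.sub(lambda m: "`" + m.group(1).strip() + "`", text)


line = "See [[dense-embedding-is-a-stub]] for background."
print(wiki_links_to_code(line))
```

Ordinary Markdown links such as `[text](url)` use single brackets, so the pattern leaves them untouched.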

coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.please/docs/references/index.md (1)

24-31: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Soften the baseline wording for reference-only analyses.

These new entries are intentionally source-based (web docs / HF cards) and not commit-pinned, so “each analysis records the exact upstream commit” is now too strict. Please switch this to “revision or source set,” or add an explicit non-code analysis escape hatch.

📝 Suggested wording change
- Each analysis records the exact upstream commit it was written against.
+ Each analysis records the exact upstream revision or source set it was written against.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.please/docs/references/index.md around lines 24 - 31, The opening statement
about analyses recording the exact upstream commit is too strict for entries
like cocoindex-code and model2vec which use web docs and HF cards instead of
commit-pinned sources. Update the introductory sentence (before the table) to
accommodate both commit-pinned and source-based analyses by using broader
language like "revision or source set" or by explicitly noting that some
analyses are based on non-code sources like documentation and HF cards, ensuring
the baseline description accurately reflects the mixed analysis approach shown
in the table rows below.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.please/docs/references/cocoindex.md:
- Around line 40-41: The Embeddings row in the CSP column of the comparison
table currently states that TS/Rust "ship a deterministic stub until
integration", which misleadingly implies the Model2Vec integration is still
pending. Update the wording to clarify that the real Model2Vec static embeddings
integration is already implemented and loaded from
`crates/csp/src/indexing/dense.rs` via `model2vec-rs::StaticModel`, with the
deterministic stub serving only as a fallback when the model loading fails.

In @.please/docs/references/model2vec.md:
- Around line 5-8: The current wording in the model2vec documentation suggests
the model2vec-rs integration is still pending ("until integration lands"), but
the actual implementation in crates/csp/src/indexing/dense.rs already wires the
model2vec-rs StaticModel directly. Rephrase the sentence about the deterministic
stub fallback to clarify that the StaticModel is already integrated and the stub
is only used as a fallback when model loading fails, removing any language that
implies the integration is incomplete or pending.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: d73f0be4-de3f-4ac2-a7db-21d9afd493ea

📥 Commits

Reviewing files that changed from the base of the PR and between 6e31708 and 199cdd0.

📒 Files selected for processing (3)
  • .please/docs/references/cocoindex.md
  • .please/docs/references/index.md
  • .please/docs/references/model2vec.md

Comment thread .please/docs/references/cocoindex.md Outdated
Comment thread .please/docs/references/model2vec.md Outdated

cubic-dev-ai Bot left a comment

1 issue found and verified against the latest diff

Comment thread .please/docs/references/model2vec.md Outdated
@amondnet enabled auto-merge (squash) June 19, 2026 18:55
Apply AI review (coderabbitai, cubic-dev-ai): the prior wording ('stub until
integration lands') implied the model2vec-rs wiring was still a TODO. Verified
against crates/csp/src/indexing/dense.rs — it already loads StaticModel via
StaticModel::from_pretrained; the deterministic stub is only a fallback on load
failure (offline/missing weights/bad path) or in tests. Reword cocoindex.md and
model2vec.md to reflect this, and keep the accurate note that the TS port's
dense signal is still a stub until Rust reaches parity.
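
The load-then-fall-back behaviour described in this commit message can be sketched as follows. The real code is Rust in `crates/csp/src/indexing/dense.rs` and loads the model through `StaticModel::from_pretrained`. This Python sketch does not call that API, and every name in it (`stub_embed`, `make_embedder`, `load_model`, the 8-dimensional vector) is hypothetical and chosen only to show the control flow.

```python
import hashlib


def stub_embed(text, dim=8):
    """Deterministic placeholder embedding derived from a hash of the text.

    Assumption: the real stub in csp may be built differently; this only
    shows that the same input always yields the same vector.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def make_embedder(load_model):
    """Try to load a real model; fall back to the stub if loading fails.

    `load_model` is a hypothetical callable that returns an embedding
    function, or raises (offline, missing weights, bad path).
    Returns (embed_fn, used_fallback).
    """
    try:
        embed_fn = load_model()
    except Exception:
        return stub_embed, True
    return embed_fn, False


def failing_loader():
    # Simulates a load failure such as missing weights.
    raise FileNotFoundError("model weights not found")


embed, used_fallback = make_embedder(failing_loader)
print(used_fallback, len(embed("fn main() {}")))
```

The point of the reworded docs is the ordering shown here: the real model is the default path, and the stub is reached only when loading fails or in tests.
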
@amondnet merged commit beae45d into main Jun 19, 2026
4 of 6 checks passed
@amondnet deleted the amondnet/cocoindex branch June 19, 2026 18:58
This was referenced Jun 19, 2026