
feat(utils): port isGitUrl/resolveChunk/formatResults from semble#2

Merged
amondnet merged 1 commit into main from feat/unit-3-utils on May 28, 2026

@amondnet amondnet commented May 28, 2026

Contributor

Port `src/semble/utils.py` to TypeScript as part of the parallel @pleaseai/csp port effort (Unit 3).

What changed

  • `src/utils.ts` — exports:
    • `isGitUrl(path)`: detects remote git URLs by scheme prefix (`https://`, `http://`, `ssh://`, `git://`, `git+ssh://`, `file://`) or scp-style `user@host:repo` (regex `/^[\w.-]+@[\w.-]+:(?!\/)/`, whose negative lookahead excludes `user@host:/abs/path`).
    • `resolveChunk(chunks, filePath, line)`: returns the chunk containing `line`. Strict inner match (`line < endLine`) wins immediately; boundary match (`line === endLine`) is kept only as a fallback for end-of-file chunks. Mirrors Python logic exactly.
    • `formatResults(query, results)`: wraps `SearchResult.toDict()` outputs as `{ query, results }`.
  • `src/utils.test.ts` — 25 unit tests covering scheme detection, scp-style URLs, the `user@host:/path` exclusion, inner/boundary chunk resolution, multi-fallback ordering, cross-file filtering, and `formatResults` shape/ordering.
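
For reference, a minimal sketch of the `isGitUrl` check as described above. It is an illustration and not the merged source: the constant names `GIT_URL_SCHEMES` and `SCP_STYLE` are assumptions, and only the scheme list and the regex come from this description.

```typescript
// Scheme prefixes listed in the PR description. The constant name is hypothetical.
const GIT_URL_SCHEMES = ['https://', 'http://', 'ssh://', 'git://', 'git+ssh://', 'file://']

// scp-style `user@host:repo`. The negative lookahead `(?!\/)` rejects
// `user@host:/abs/path`, where the colon is followed by an absolute path.
const SCP_STYLE = /^[\w.-]+@[\w.-]+:(?!\/)/

function isGitUrl(path: string): boolean {
  // A known scheme prefix is enough to classify the path as a remote URL.
  if (GIT_URL_SCHEMES.some(scheme => path.startsWith(scheme))) {
    return true
  }
  // Otherwise fall back to the scp-style pattern.
  return SCP_STYLE.test(path)
}

console.log(isGitUrl('https://github.com/org/repo')) // true
console.log(isGitUrl('git@github.com:org/repo.git')) // true
console.log(isGitUrl('user@host:/abs/path')) // false
console.log(isGitUrl('./local/dir')) // false
```

The character classes cannot match `:`, so the regex cannot backtrack past the first colon to dodge the lookahead.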

Stopgap types

`src/types.ts` doesn't exist yet (Unit 1's territory). Per the task brief, structural `Chunk` / `SearchResult` types are defined inline using camelCase fields (`filePath`, `startLine`, `endLine`) per the @pleaseai/csp public-API convention. These should be replaced with re-exports from `./types.ts` once Unit 1 merges.
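
A rough sketch of what the inline stopgap types and `formatResults` could look like. Only the camelCase `Chunk` fields, `toDict()`, and the `{ query, results }` shape come from this description. The `chunk` and `score` fields, the `FormattedResults` name, and the `makeResult` helper are hypothetical stand-ins for the demo.

```typescript
// Structural stopgap for Chunk, using the camelCase fields named in the PR.
interface Chunk {
  filePath: string
  startLine: number
  endLine: number
}

// Structural stopgap for SearchResult. Only toDict() is confirmed by the PR;
// `chunk` and `score` are assumptions made for this example.
interface SearchResult {
  chunk: Chunk
  score: number
  toDict(): Record<string, unknown>
}

// Hypothetical name for the returned shape.
interface FormattedResults {
  query: string
  results: Record<string, unknown>[]
}

function formatResults(query: string, results: SearchResult[]): FormattedResults {
  // Preserve input order and serialize each result through toDict().
  return { query, results: results.map(result => result.toDict()) }
}

// Hypothetical helper that builds a result for the demo below.
function makeResult(chunk: Chunk, score: number): SearchResult {
  return { chunk, score, toDict: () => ({ ...chunk, score }) }
}

const formatted = formatResults('parse url', [
  makeResult({ filePath: 'src/a.ts', startLine: 1, endLine: 20 }, 0.9),
  makeResult({ filePath: 'src/b.ts', startLine: 5, endLine: 12 }, 0.4),
])
console.log(JSON.stringify(formatted, null, 2))
```

Because the types are structural, swapping them for re-exports from `./types.ts` later should not require changes at call sites, provided the field names match.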

Verification

```
bun test src/utils.test.ts
25 pass
0 fail
25 expect() calls
```

Source

`src/semble/utils.py`


Summary by cubic

Ports isGitUrl, resolveChunk, and formatResults from Python to TypeScript with full test coverage. This delivers Unit 3 utils for the @pleaseai/csp port and keeps behavior 1:1 with the original.

  • New Features
    • src/utils.ts exports:
      • isGitUrl(path): detects git URLs by scheme or scp-style user@host:repo (excludes user@host:/abs/path).
      • resolveChunk(chunks, filePath, line): strict inner match wins; endLine boundary is a fallback.
      • formatResults(query, results): returns { query, results } via toDict().
    • Stopgap Chunk/SearchResult interfaces inline; replace with ./types.ts when available.
    • src/utils.test.ts: 25 tests for URL detection, chunk resolution, and result formatting.

Written for commit fb5cb4e. Summary will update on new commits.

Port src/semble/utils.py to TypeScript:

- isGitUrl: detects remote git URLs by scheme prefix
  (https/http/ssh/git/git+ssh/file) or scp-style user@host:repo
  (excludes user@host:/abs/path via negative lookahead).
- resolveChunk: returns the chunk containing line in filePath, with
  a strict inner match (line < endLine) winning over a boundary
  match (line === endLine) which is kept only as a fallback for
  end-of-file lines.
- formatResults: wraps SearchResult.toDict outputs as
  { query, results }.

Stopgap structural Chunk/SearchResult types are defined inline
until src/types.ts lands from Unit 1.

Ref: src/semble/utils.py
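
The `resolveChunk` rule in the commit message can be sketched as follows. This is an illustration under stated assumptions, not the merged code: the `startLine <= line` lower bound is inferred from "the chunk containing line", and the sample chunks are invented.

```typescript
interface Chunk {
  filePath: string
  startLine: number
  endLine: number
}

function resolveChunk(chunks: readonly Chunk[], filePath: string, line: number): Chunk | null {
  let fallback: Chunk | null = null
  for (const chunk of chunks) {
    // Strict path equality, no normalization.
    if (chunk.filePath !== filePath) continue
    // Assumed lower bound: the line must not precede the chunk.
    if (line < chunk.startLine) continue
    // Strict inner match wins immediately.
    if (line < chunk.endLine) return chunk
    // Boundary match: keep only the first one as a fallback.
    if (line === chunk.endLine && fallback === null) fallback = chunk
  }
  return fallback
}

// Invented sample data: two adjacent chunks in a.ts and one chunk in b.ts.
const chunks: Chunk[] = [
  { filePath: 'a.ts', startLine: 1, endLine: 10 },
  { filePath: 'a.ts', startLine: 10, endLine: 20 },
  { filePath: 'b.ts', startLine: 1, endLine: 10 },
]

// Line 10 is the boundary of the first chunk but strictly inside the second,
// so the second chunk is returned.
console.log(resolveChunk(chunks, 'a.ts', 10)?.startLine) // 10
// Line 20 has no inner match, so the boundary fallback applies.
console.log(resolveChunk(chunks, 'a.ts', 20)?.startLine) // 10
// Unknown file: no candidates at all.
console.log(resolveChunk(chunks, 'c.ts', 5)) // null
```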

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request ports utility functions and their corresponding tests from Python to TypeScript, introducing helper functions for identifying Git URLs, resolving code chunks by line number, and formatting search results. The review feedback suggests improving cross-platform compatibility in resolveChunk by normalizing path separators to handle Windows backslashes and adding a unit test to verify this behavior.

Comment thread src/utils.ts
Comment thread src/utils.test.ts

@cubic-dev-ai cubic-dev-ai Bot left a comment


No issues found across 2 files

Architecture diagram
sequenceDiagram
    participant Client as Caller (e.g. Search API)
    participant Utils as utils.ts
    participant Types as types.ts (future)
    participant Python as Original semble Python

    Note over Client,Python: Port of `src/semble/utils.py` to TypeScript

    Client->>Utils: isGitUrl(path)
    alt Scheme match (https://, http://, ssh://, git://, git+ssh://, file://)
        Utils->>Utils: Check path.startsWith(scheme)
        Utils-->>Client: true
    else SCP-style match (user@host:repo)
        Utils->>Utils: Test /^[\w.-]+@[\w.-]+:(?!\/)/
        Utils-->>Client: true / false
    else No match
        Utils-->>Client: false
    end

    Client->>Utils: resolveChunk(chunks, filePath, line)
    Utils->>Utils: Iterate chunks matching filePath
    alt line < endLine (strict inner match)
        Utils->>Utils: Return chunk immediately
    else line === endLine (boundary match)
        Utils->>Utils: Store as fallback (first only)
    end
    Note over Utils: After loop, return fallback or null

    Client->>Utils: formatResults(query, results)
    Utils->>Utils: Map results via toDict()
    Utils-->>Client: { query, results }

    Note over Types: Stopgap inline types until Unit 1 merges
    Note over Python: Behavior identical to Python original

@amondnet amondnet left a comment

Contributor Author


Applied 0, deferred 2.

Both gemini-code-assist comments asked to add backslash→slash path normalization inside resolveChunk plus a matching test. Deferring both:

  • Upstream semble.utils.resolve_chunk uses strict chunk.file_path == file_path equality — no normalization. Parity with semble is load-bearing for this port (per CLAUDE.md).
  • resolveChunk is a pure key lookup over chunks produced by the indexer. Cross-platform path canonicalization is a file-walker / indexing concern (Unit 8), where both stored and queried paths can share one canonical form. Normalizing only at the lookup site would mask, not fix, inconsistent emission.
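
To illustrate the second point, here is one way canonicalization could sit at the indexing boundary so that the lookup stays a strict equality check. Nothing here is part of this PR or of semble: `toCanonicalPath` is a hypothetical helper and the sample paths are invented.

```typescript
interface Chunk {
  filePath: string
  startLine: number
  endLine: number
}

// Hypothetical helper: converts Windows backslashes to forward slashes.
function toCanonicalPath(path: string): string {
  return path.replace(/\\/g, '/')
}

// Index time: every emitted chunk stores the canonical form.
const rawChunks: Chunk[] = [{ filePath: 'src\\utils.ts', startLine: 1, endLine: 40 }]
const emitted: Chunk[] = rawChunks.map(chunk => ({
  ...chunk,
  filePath: toCanonicalPath(chunk.filePath),
}))

// Query time: the caller canonicalizes once, before any lookup.
const queried = toCanonicalPath('src\\utils.ts')

// The lookup itself remains a plain `===` comparison, as in upstream.
const hit = emitted.find(chunk => chunk.filePath === queried) ?? null
console.log(hit?.filePath) // src/utils.ts
```

With both sides sharing one canonical form, `resolveChunk` needs no path handling of its own and parity with `resolve_chunk` is preserved.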

cubic-dev-ai found no issues.

@amondnet amondnet self-assigned this May 28, 2026
@amondnet amondnet merged commit 7a15929 into main May 28, 2026
1 check passed
@amondnet amondnet deleted the feat/unit-3-utils branch May 28, 2026 16:05
This was referenced Jun 18, 2026