feat(types): port Chunk/SearchResult/ContentType from semble #5

Merged
amondnet merged 3 commits into main from feat/unit-1-types on May 28, 2026
@amondnet amondnet commented May 28, 2026

Port of src/semble/types.py to TypeScript — Unit 1 of the parallel semble → csp port effort.

What's ported

  • ContentType — string-literal const (Code = 'code', Docs = 'docs', Config = 'config'). Values match Python str enum so CLI flags and persisted indices round-trip.
  • CallType — string-literal const (Search = 'search', FindRelated = 'find_related'). Values match Python str enum for ~/.csp/savings.jsonl telemetry parity.
  • Chunk interface — content, filePath, startLine, endLine, language?. Public fields are camelCase, per ARCHITECTURE.md ("Public field names are camelCase, not snake_case"). All readonly to mirror the Python frozen=True dataclass.
  • SearchResult interface — { chunk, score }.
  • IndexStats interface — { indexedFiles, totalChunks, languages }.
  • EmbeddingMatrix = Float32Array (flat row-major) + EmbeddingShape = { rows, dim } companion. Comment explains the rationale: dense retrieval is one contiguous BLAS-style sweep, so a flat buffer beats Float32Array[] for cache locality and persistence simplicity.
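
The shapes listed above could look roughly like the following. This is a minimal sketch reconstructed from the description, not the merged src/types.ts: the `as const` object pattern is an assumed way to express a "string-literal const", and `dotAll` is a hypothetical helper added only to illustrate the flat row-major layout.

```typescript
// Sketch only: reconstructed from the PR description, not copied from src/types.ts.

// String-literal consts; the values match the Python str enums.
const ContentType = { Code: 'code', Docs: 'docs', Config: 'config' } as const;
type ContentType = (typeof ContentType)[keyof typeof ContentType];

const CallType = { Search: 'search', FindRelated: 'find_related' } as const;
type CallType = (typeof CallType)[keyof typeof CallType];

// Public fields are camelCase and readonly (mirrors the frozen Python dataclass).
interface Chunk {
  readonly content: string;
  readonly filePath: string;
  readonly startLine: number;
  readonly endLine: number;
  readonly language?: string;
}

// Flat row-major buffer plus a companion shape descriptor.
type EmbeddingMatrix = Float32Array;
interface EmbeddingShape {
  rows: number;
  dim: number;
}

// Hypothetical helper (not in the PR): score every row against a query
// in one contiguous pass over the flat buffer.
function dotAll(
  matrix: EmbeddingMatrix,
  shape: EmbeddingShape,
  query: Float32Array,
): Float32Array {
  if (matrix.length !== shape.rows * shape.dim || query.length !== shape.dim) {
    throw new RangeError('shape does not match buffer length');
  }
  const scores = new Float32Array(shape.rows);
  for (let r = 0; r < shape.rows; r++) {
    const base = r * shape.dim; // row r starts at offset r * dim
    let acc = 0;
    for (let d = 0; d < shape.dim; d++) {
      // Non-null assertions keep this valid under noUncheckedIndexedAccess.
      acc += matrix[base + d]! * query[d]!;
    }
    scores[r] = acc;
  }
  return scores;
}
```

Row `r` occupies indices `r * dim` through `r * dim + dim - 1`, so the shape descriptor is the only extra state needed to persist or reload the buffer.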

Helper functions

  • chunkLocation(chunk) → filePath:startLine-endLine (port of Python Chunk.location @property; kept as a free function because Chunk is a plain interface).
  • chunkToDict(chunk) → ChunkDict including location. Emits language: null (not omitted) to mirror Python dataclasses.asdict JSON shape.
  • chunkFromDict(data) → Chunk. Strips location before reconstruction (it's derived; trusting it would let a malformed payload desync from the line range). Accepts null | undefined | string for language (wire-format tolerant).
  • searchResultToDict(result) → SearchResultDict.
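
A rough sketch of how these four helpers might fit together. The function names come from the list above; the bodies are assumptions, and in particular the snake_case wire keys (`file_path`, `start_line`, `end_line`) are a guess based on the Python field names, since the thread does not show the fields of ChunkDict.

```typescript
// Sketch only: helper names follow the PR description; bodies and wire keys are assumed.
interface Chunk {
  readonly content: string;
  readonly filePath: string;
  readonly startLine: number;
  readonly endLine: number;
  readonly language?: string;
}

interface SearchResult {
  chunk: Chunk;
  score: number;
}

// Assumed wire shape (snake_case keys are a guess, not confirmed by the thread).
interface ChunkDict {
  content: string;
  file_path: string;
  start_line: number;
  end_line: number;
  language: string | null;
  location: string;
}

// Input is more tolerant than output: language and location may be absent.
type ChunkDictInput = Omit<ChunkDict, 'language' | 'location'> & {
  language?: string | null | undefined;
  location?: string | undefined;
};

interface SearchResultDict {
  chunk: ChunkDict;
  score: number;
}

function chunkLocation(chunk: Chunk): string {
  return `${chunk.filePath}:${chunk.startLine}-${chunk.endLine}`;
}

function chunkToDict(chunk: Chunk): ChunkDict {
  return {
    content: chunk.content,
    file_path: chunk.filePath,
    start_line: chunk.startLine,
    end_line: chunk.endLine,
    language: chunk.language ?? null, // emit null rather than omitting the key
    location: chunkLocation(chunk),
  };
}

function chunkFromDict(data: ChunkDictInput): Chunk {
  // data.location is never read: it is derived from the line range.
  const base = {
    content: data.content,
    filePath: data.file_path,
    startLine: data.start_line,
    endLine: data.end_line,
  };
  // Omit the key entirely for null/undefined (friendly to exactOptionalPropertyTypes).
  return typeof data.language === 'string'
    ? { ...base, language: data.language }
    : base;
}

function searchResultToDict(result: SearchResult): SearchResultDict {
  return { chunk: chunkToDict(result.chunk), score: result.score };
}
```

Because `chunkFromDict` rebuilds the chunk from the line range alone, a payload carrying a stale or wrong `location` cannot disagree with the reconstructed chunk.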

Tests (src/types.test.ts)

9 tests covering:

  • ContentType / CallType enum-value parity with Python.
  • chunkLocation formatting (multi-line and single-line).
  • chunkToDict → chunkFromDict roundtrip with language set and omitted.
  • chunkFromDict strips location (verified by passing a deliberately-wrong location).
  • chunkFromDict accepts language: null (wire format).
  • searchResultToDict shape.

Verification

  • bun test src/types.test.ts — 9 pass, 0 fail.
  • tsc --noEmit — clean (strict + noUncheckedIndexedAccess + exactOptionalPropertyTypes + verbatimModuleSyntax).

Notes

  • No package.json changes.
  • File uses extensionless relative imports (from './types') to match the existing cli.ts convention; tsconfig does not have allowImportingTsExtensions enabled.
  • Source: /Users/lms/.ask/github/github.com/MinishLab/semble/main/src/semble/types.py.

Summary by cubic

Ports core types and helpers from semble (Python) to TypeScript to keep wire-format and telemetry parity. Adds enums, interfaces, embeddings, serialization helpers, and runtime validation for untrusted JSON, including rejecting non-finite line numbers.

  • New Features
    • ContentType and CallType string-literal consts matching Python enums.
    • Interfaces: Chunk (camelCase, readonly), SearchResult, IndexStats.
    • Embeddings: EmbeddingMatrix = Float32Array and EmbeddingShape { rows, dim }.
    • Helpers: chunkLocation, chunkToDict (emits language: null), chunkFromDict (ignores location, accepts language: null, validates input and throws TypeError on malformed payloads; rejects NaN/±Infinity for startLine/endLine), searchResultToDict.
    • Tests: 12 cases covering enum parity, chunkLocation, roundtrips, error handling for bad input (including non-finite line numbers), and result shape.

Written for commit 7dc025d. Summary will update on new commits.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request ports Python types and serialization helpers to TypeScript in src/types.ts, along with corresponding unit tests in src/types.test.ts. The feedback suggests adding runtime validation to chunkFromDict to safely handle malformed or untrusted input data, and adding unit tests to verify that invalid inputs correctly throw errors.

Comment thread src/types.ts
Comment thread src/types.test.ts

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 2 files

Architecture diagram
sequenceDiagram
    participant Client as Client/CLI
    participant Types as Types Module
    participant Storage as Index Storage
    participant Telemetry as Telemetry Logger

    Note over Client,Telemetry: NEW: Type definitions imported from semble

    Client->>Types: Import ContentType/CallType
    Types-->>Client: String literal values ('code'/'docs'/'config', 'search'/'find_related')

    Note over Client,Types: Chunk lifecycle

    Client->>Types: chunkToDict(chunk)
    alt language is undefined
        Types->>Types: Emit language: null
    else language is set
        Types->>Types: Emit language value
    end
    Types->>Types: Compute location from filePath:startLine-endLine
    Types-->>Client: ChunkDict (includes location)

    alt Persisting to storage
        Client->>Storage: Write ChunkDict (JSON)
        Storage-->>Client: Confirmation
    end

    alt Reading from storage
        Client->>Storage: Read ChunkDict (JSON)
        Storage-->>Client: ChunkDict data
        Client->>Types: chunkFromDict(data)
        alt location present
            Types->>Types: Strip location (derived, not trusted)
        end
        alt language is null
            Types->>Types: Convert to undefined
        end
        alt language is string
            Types->>Types: Keep as string
        end
        Types-->>Client: Chunk (immutable, readonly)
    end

    Note over Client,Types: SearchResult serialization

    Client->>Types: searchResultToDict(result)
    Types->>Types: chunkToDict(chunk) for nested chunk
    Types-->>Client: SearchResultDict (chunk + score)

    Note over Client,Telemetry: Telemetry parity

    Client->>Telemetry: Log with CallType ('search'/'find_related')
    Telemetry-->>Client: Acknowledged

    Note over Client,Types: Embedding matrix operations

    Client->>Types: Create EmbeddingMatrix (Float32Array)
    Types-->>Client: Flat row-major buffer
    Client->>Types: Create EmbeddingShape { rows, dim }
    Types-->>Client: Shape descriptor
    Client->>Client: Compute embeddings @ query (contiguous BLAS sweep)
Add runtime validation to chunkFromDict to fail loudly with TypeError on
malformed JSON / untrusted payloads, since TypeScript's compile-time
ChunkDictInput is bypassed at the JSON boundary. Cover null/non-object,
missing fields, wrong-typed fields, and bad language with tests.

Identified by gemini-code-assist.

@amondnet amondnet left a comment

Applied 2 (both gemini-code-assist findings), deferred 0.

  • src/types.ts: chunkFromDict now validates input at runtime — null/non-object, missing required fields, and wrong-typed language all throw TypeError. The compile-time ChunkDictInput is bypassed at the JSON boundary, so failing loudly here prevents bad data (e.g. NaN line numbers) from polluting the index.
  • src/types.test.ts: added 3 new tests covering invalid inputs (null/non-object, missing/wrong-typed required fields, wrong-typed language).
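
The validation described in this comment could be sketched as follows. This is an illustration under assumptions, not the merged code: the parameter is typed `unknown` here to make the JSON boundary explicit, the snake_case wire keys are a guess, and `requireString` / `requireNumber` are hypothetical helper names.

```typescript
// Sketch only: illustrates failing loudly with TypeError on malformed payloads.
interface Chunk {
  readonly content: string;
  readonly filePath: string;
  readonly startLine: number;
  readonly endLine: number;
  readonly language?: string;
}

// Hypothetical helper: read a required string field or throw.
function requireString(obj: Record<string, unknown>, key: string): string {
  const value = obj[key];
  if (typeof value !== 'string') {
    throw new TypeError(`chunk.${key} must be a string`);
  }
  return value;
}

// Hypothetical helper: read a required number field or throw.
// Note: typeof alone still admits NaN and ±Infinity, the gap that the
// cubic-dev-ai follow-up later in this thread points out.
function requireNumber(obj: Record<string, unknown>, key: string): number {
  const value = obj[key];
  if (typeof value !== 'number') {
    throw new TypeError(`chunk.${key} must be a number`);
  }
  return value;
}

function chunkFromDict(data: unknown): Chunk {
  if (typeof data !== 'object' || data === null) {
    throw new TypeError('chunk payload must be a non-null object');
  }
  const obj = data as Record<string, unknown>;
  const base = {
    content: requireString(obj, 'content'),
    filePath: requireString(obj, 'file_path'), // wire key is an assumption
    startLine: requireNumber(obj, 'start_line'), // wire key is an assumption
    endLine: requireNumber(obj, 'end_line'), // wire key is an assumption
  };
  const language = obj['language'];
  if (language === null || language === undefined) {
    return base; // wire-format tolerant: null or missing means "no language"
  }
  if (typeof language !== 'string') {
    throw new TypeError('chunk.language must be a string, null, or absent');
  }
  return { ...base, language };
}
```

A missing required field reads as `undefined` and therefore fails the same `typeof` check as a wrong-typed one, so both cases surface as a `TypeError` naming the field.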

cubic found no issues. All 12 tests pass under bun test.

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 2 files (changes from recent commits).

Comment thread src/types.ts
@amondnet amondnet self-assigned this May 28, 2026
cubic-dev-ai P2 follow-up: typeof === 'number' permits NaN, Infinity,
-Infinity, which would propagate as broken line numbers downstream
(file-saturation math, location strings, find-related boundary checks).

Add Number.isFinite() checks alongside the existing typeof guard and
cover NaN, +Infinity, -Infinity in the test matrix.
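
The gap described in this commit message can be shown with a small guard. The names `isFiniteNumber` and `requireLineNumber` are hypothetical and chosen for illustration; the thread only states that Number.isFinite() checks were added alongside the existing typeof guard.

```typescript
// Sketch only: helper names are illustrative, not taken from src/types.ts.

// typeof value === 'number' is true for NaN, Infinity, and -Infinity,
// so the finite check has to be added on top of it.
function isFiniteNumber(value: unknown): value is number {
  return typeof value === 'number' && Number.isFinite(value);
}

// Hypothetical use for startLine / endLine at the JSON boundary.
function requireLineNumber(value: unknown, field: string): number {
  if (!isFiniteNumber(value)) {
    throw new TypeError(`${field} must be a finite number`);
  }
  return value;
}
```

Non-finite values can arrive through JSON itself: an overflowing literal such as `1e999` is valid JSON and parses to `Infinity`, so the check matters even though JSON has no `NaN` token.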
@amondnet amondnet merged commit 94ba6aa into main May 28, 2026
1 check passed
@amondnet amondnet deleted the feat/unit-1-types branch May 28, 2026 15:57
This was referenced Jun 18, 2026