Summary
Semantic vector embeddings for report content via Gemini gemini-embedding-001 + pgvector, enabling natural-language search across all persisted memoranda and deliverables.
Implementation (merged 2026-03-13)
- Feature flag: EMBEDDING_PERSISTENCE=false (default OFF)
- Dependencies: HOOK_DB_PERSISTENCE=true, GEMINI_API_KEY, PostgreSQL with pgvector extension
- New dependencies: @google/genai ^1.45.0, pgvector ^0.2.1
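
The gating described above could be checked along these lines. This is a minimal sketch, not the code in src/config/featureFlags.js: the function name `embeddingPersistenceEnabled` and the rule that only the literal string "true" enables a flag are assumptions made for illustration.

```javascript
// Hypothetical gate; the real check in src/config/featureFlags.js may differ.
function embeddingPersistenceEnabled(env = process.env) {
  // Assumption: a flag is ON only when its value is the string "true" (case-insensitive).
  const on = (name) => String(env[name] ?? 'false').toLowerCase() === 'true';
  // Embeddings need the flag itself, DB persistence, and a Gemini API key.
  return (
    on('EMBEDDING_PERSISTENCE') &&
    on('HOOK_DB_PERSISTENCE') &&
    Boolean(env.GEMINI_API_KEY)
  );
}
```

With an empty environment the function returns false, which matches the default-OFF behavior stated above.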
Key Files
| File | Purpose |
| --- | --- |
| src/utils/embeddingService.js | Core service: chunk, embed, store, search |
| src/db/postgres.js | report_embeddings + citation_embeddings DDL |
| src/utils/hookDBBridge.js | Fire-and-forget integration via setImmediate |
| src/server/dbFrontendRouter.js | GET /api/db/search-semantic endpoint |
| src/config/featureFlags.js | EMBEDDING_PERSISTENCE flag |
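
The fire-and-forget integration listed for hookDBBridge.js might look roughly like the sketch below. `embedAndStore` is named in the commit log, but its signature, the `scheduleEmbedding` wrapper, the injectable `loadService` parameter, and the relative import path are all assumptions made so the example is self-contained.

```javascript
// Hypothetical sketch of the pattern; only setImmediate and embedAndStore come from the PR text.
function scheduleEmbedding(
  report,
  // Assumed path; the dynamic import keeps the service unloaded until first use.
  loadService = () => import('./embeddingService.js'),
  onError = console.error
) {
  // setImmediate defers the work to a later event-loop turn, so the caller returns right away.
  setImmediate(async () => {
    try {
      const { embedAndStore } = await loadService();
      await embedAndStore(report);
    } catch (err) {
      // Failures are reported but never propagate back into the hook chain.
      onError(err);
    }
  });
}
```

Because the callback catches its own errors, a failed embedding call cannot reject into the code that persisted the report.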
Architecture
- Chunks markdown by ## headers (max 8192 chars/chunk)
- Batch embeds via Gemini (1536 dimensions, RETRIEVAL_DOCUMENT task type)
- Transactional DELETE + batch INSERT (no partial states)
- HNSW index for fast cosine similarity search
- Fire-and-forget via setImmediate, so it never blocks the hook chain
- Dynamic import keeps service unloaded when flag is OFF
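
The header-based chunking in the first bullet can be illustrated with a small sketch. It is not the `chunkByHeaders` function from embeddingService.js, whose signature is not shown here; the name `chunkByHeadersSketch` and the choice to cut oversized sections into fixed-size slices are assumptions.

```javascript
// Hypothetical sketch; the real chunkByHeaders in embeddingService.js may behave differently.
const MAX_CHUNK_CHARS = 8192;

function chunkByHeadersSketch(markdown, maxChars = MAX_CHUNK_CHARS) {
  // Split before every line that starts with "## ", keeping each header with its body.
  // Deeper headers such as "### " do not match because the third character is not a space.
  const sections = markdown.split(/^(?=## )/m).filter((s) => s.trim() !== '');
  const chunks = [];
  for (const section of sections) {
    // Assumption: a section longer than maxChars is cut into fixed-size slices.
    for (let i = 0; i < section.length; i += maxChars) {
      chunks.push(section.slice(i, i + maxChars));
    }
  }
  return chunks;
}
```

Text before the first ## header becomes its own chunk, and the markdown inside each chunk is left unmodified.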
REST Endpoint
GET /api/db/search-semantic?q=...&limit=10&threshold=0.3&sessionId=...
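
A client could assemble that request path as sketched below. The helper name `buildSemanticSearchPath` is hypothetical, and treating limit=10 and threshold=0.3 as defaults and sessionId as optional are assumptions drawn only from the example URL above.

```javascript
// Hypothetical client-side helper; parameter names come from the endpoint URL above.
function buildSemanticSearchPath({ q, limit = 10, threshold = 0.3, sessionId } = {}) {
  if (typeof q !== 'string' || q.trim() === '') {
    throw new TypeError('q must be a non-empty string');
  }
  // URLSearchParams handles percent-encoding of the natural-language query.
  const params = new URLSearchParams({
    q,
    limit: String(limit),
    threshold: String(threshold),
  });
  // Assumption: sessionId is optional and only narrows the search when present.
  if (sessionId) params.set('sessionId', sessionId);
  return `/api/db/search-semantic?${params.toString()}`;
}
```

The returned path can be passed to `fetch` together with whatever host the server is deployed on.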
Test Coverage
- Unit: chunkByHeaders, normalizeReportType, markdown preservation
- Integration: pgvector schema + CRUD with fabricated vectors
- Cloud SQL: end-to-end with live Gemini API
Commits
0b22761 feat: add embedding-persistence feature flag and dependencies
903f993 feat: add pgvector embedding schema DDL and ensureEmbeddingSchema()
e48ee00 feat: add embeddingService.js — Gemini embedding API + pgvector storage
5c0aade feat: wire embedding generation into persistReport via setImmediate
3945971 feat: add embedding startup init + GET /api/db/search-semantic endpoint
babd0b0 test: add embedding unit, integration, and cloud-sql tests
2c323f7 refactor: use dynamic import for embedAndStore in hookDBBridge
ec4167d fix: close 4 gaps — transactional writes, UUID validation, batch INSERT, schema guard
Related