
feat: add codegraph snapshot for DB backup and restore#192

Merged
carlos-alm merged 1 commit into main from feat/snapshot
Mar 1, 2026

@carlos-alm
Contributor

Summary

  • Adds codegraph snapshot subcommand group with save, restore, list, and delete commands
  • Uses SQLite VACUUM INTO for atomic, WAL-free snapshots stored in .codegraph/snapshots/
  • Enables orchestrators and CI to checkpoint the graph DB before refactoring passes and instantly roll back without full rebuilds
  • Exposes all 6 functions (snapshotSave, snapshotRestore, snapshotList, snapshotDelete, snapshotsDir, validateSnapshotName) via the programmatic API

Test plan

  • 26 unit tests covering all functions, error cases, and edge cases
  • Full test suite passes (868/868)
  • Biome lint clean
  • Manual smoke test: snapshot save, list, list --json, restore, delete

@claude

claude bot commented Mar 1, 2026

Claude finished @carlos-alm's task


PR Review: codegraph snapshot for DB backup and restore

Todo List

  • Phase 1: Root cause analysis - understand the problem being solved
  • Phase 2: Check for configuration changes that might hide issues
  • Phase 3: Verify test integrity and coverage
  • Phase 4: Code quality, security, and performance review
  • Provide final assessment and rating

Phase 1: Root Cause Analysis ✅

Problem Being Solved: Enables checkpointing the graph database before refactoring operations so orchestrators and CI can instantly roll back without expensive full rebuilds. Uses SQLite's VACUUM INTO for atomic, WAL-free snapshots.

Why This Approach:

  • SQLite VACUUM INTO creates a clean, atomic copy without WAL/SHM files
  • Stored in .codegraph/snapshots/ with simple naming scheme
  • Provides both CLI and programmatic API for orchestration tools
  • Addresses real performance pain point of rebuilding large codebases

Risk Assessment: This appears to solve a legitimate operational need rather than hiding symptoms.

Phase 2: Configuration Change Detection ✅

HIGH-RISK Changes: ✅ NONE DETECTED

  • No biome.json rule changes or relaxations
  • No test configuration modifications
  • No build configuration changes
  • Package.json only adds snapshot imports to index.js exports

Phase 3: Test Integrity ✅

Test Coverage: Comprehensive 26-test suite covering all snapshot functions:

  • Input validation (validateSnapshotName) - proper error handling
  • Directory creation and path resolution (snapshotsDir)
  • Atomic save with SQLite VACUUM INTO - verifies data integrity
  • Force overwrite behavior
  • WAL/SHM cleanup during restore
  • List functionality with proper filtering and sorting
  • Delete operations with error handling

Test Quality: Tests are well-structured, isolated (temp dirs), and verify actual SQLite data integrity.
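
The listing behaviour those tests exercise could look roughly like the sketch below. It is not the code from src/snapshot.js: the helper name `listSnapshots`, the `.db` suffix, and the newest-first sort order are all assumptions made for illustration.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical listing helper: returns snapshot metadata, newest first.
// The ".db" suffix and the sort order are assumptions, not taken from the PR.
function listSnapshots(dir) {
  if (!fs.existsSync(dir)) return [];
  return fs
    .readdirSync(dir)
    .filter((file) => file.endsWith('.db')) // skip anything that is not a snapshot
    .map((file) => {
      const stat = fs.statSync(path.join(dir, file));
      return { name: path.basename(file, '.db'), size: stat.size, createdAt: stat.mtime };
    })
    .sort((a, b) => b.createdAt - a.createdAt);
}

// Usage against an isolated temp directory, in the spirit of the tests described above.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'snap-'));
fs.writeFileSync(path.join(dir, 'baseline.db'), 'x');
fs.writeFileSync(path.join(dir, 'notes.txt'), 'ignored');
console.log(listSnapshots(dir).map((s) => s.name));
```

Returning an empty array for a missing directory, rather than throwing, keeps a `list` call safe on a project that has never saved a snapshot; whether the real implementation does this is not stated in the PR.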

Phase 4: Code Quality, Security & Performance ✅

Security:

  • SQL Injection Protection: Line 60 uses proper SQLite quote escaping (`dest.replace(/'/g, "''")`)
  • Path Traversal Protection: validateSnapshotName() restricts to /^[a-zA-Z0-9_-]+$/
  • Atomic Operations: VACUUM INTO ensures consistency
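
The two input checks above cover different parts of the destination path: the name is restricted by the regex, but the directory it lives in can still contain a single quote, which is why the path is escaped before it is placed in the SQL string. A minimal sketch, assuming a helper named `vacuumIntoSql` (hypothetical) and a stand-in for `validateSnapshotName` whose error message is invented here:

```javascript
// Pattern quoted in the review above.
const NAME_PATTERN = /^[a-zA-Z0-9_-]+$/;

// Hypothetical stand-in for validateSnapshotName: rejects anything that
// could introduce path separators, dots, or quotes.
function validateSnapshotName(name) {
  if (typeof name !== 'string' || !NAME_PATTERN.test(name)) {
    throw new Error(`Invalid snapshot name: ${JSON.stringify(name)}`);
  }
}

// Hypothetical helper: build the VACUUM INTO statement for a destination path.
// SQLite string literals escape a single quote by doubling it.
function vacuumIntoSql(dest) {
  return `VACUUM INTO '${dest.replace(/'/g, "''")}'`;
}

validateSnapshotName('pre-refactor'); // passes silently
console.log(vacuumIntoSql("/tmp/it's/.codegraph/snapshots/pre-refactor.db"));
```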

Performance:

  • Efficient: VACUUM INTO writes a compacted copy of the database in a single statement
  • Resource Management: Proper DB connection cleanup in try/finally
  • Minimal I/O: Direct file operations, no unnecessary reads

Code Quality:

  • Clean API: 6 well-named functions with clear parameters
  • Error Handling: Appropriate error messages with context
  • Integration: Follows project patterns (uses findDbPath, debug logger)
  • CLI Integration: Commander.js commands with proper option handling

Minor Observations:

  • Good use of readonly DB connections for safety
  • Proper cleanup of WAL/SHM files during restore
  • Sensible defaults and flexible options
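
The WAL/SHM cleanup mentioned above matters because a stale `-wal` file left next to the database could otherwise be replayed against the restored file when SQLite next opens it. The sketch below shows one way to order the steps using only `node:fs`; the function name `restoreSnapshot` and its parameters are hypothetical, and the real `snapshotRestore` may differ.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical restore helper: dbPath is the live database,
// snapshotFile is the previously saved copy.
function restoreSnapshot(snapshotFile, dbPath) {
  if (!fs.existsSync(snapshotFile)) {
    throw new Error(`Snapshot not found: ${snapshotFile}`);
  }
  // Remove stale sidecar files before replacing the main database file.
  for (const suffix of ['-wal', '-shm']) {
    fs.rmSync(dbPath + suffix, { force: true }); // force: ignore missing files
  }
  fs.copyFileSync(snapshotFile, dbPath);
}

// Usage with placeholder files in a temp directory (no real SQLite data).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'restore-'));
const db = path.join(dir, 'graph.db');
const snap = path.join(dir, 'baseline.db');
fs.writeFileSync(db, 'current');
fs.writeFileSync(db + '-wal', 'stale');
fs.writeFileSync(snap, 'saved');
restoreSnapshot(snap, db);
console.log(fs.readFileSync(db, 'utf8'), fs.existsSync(db + '-wal'));
```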

Final Assessment

## Root Cause Analysis
**Problem Being Solved**: Snapshot feature addresses real operational need for fast rollback during refactoring operations without expensive rebuilds.
**Why This Approach**: Uses SQLite's native `VACUUM INTO` for atomic, clean snapshots. Well-architected with both CLI and programmatic interfaces.
**Risk Assessment**: LOW RISK - Solves legitimate performance issue, no configuration shortcuts or test compromises.

## Critical Concerns
None identified. This is a well-implemented, thoroughly tested feature addition.

## Final Recommendation  
- Rating: ⭐⭐⭐⭐☆ (4/5)
- Action: **APPROVE** 
- Reasoning: Clean implementation, comprehensive testing, addresses real need, no security issues, follows project conventions. Solid addition to the codebase.

@greptile-apps
Contributor

greptile-apps bot commented Mar 1, 2026

Greptile Summary

This PR adds a robust snapshot management system for the codegraph database, enabling atomic backups and instant rollbacks for CI orchestration and refactoring workflows.

Key Changes:

  • Implemented codegraph snapshot subcommand group with save, restore, list, and delete operations
  • Uses SQLite VACUUM INTO for atomic, WAL-free snapshots stored in .codegraph/snapshots/
  • Properly handles WAL/SHM sidecar file cleanup during restore to prevent corruption
  • Validates snapshot names to prevent path traversal and injection attacks
  • Exposes complete programmatic API (6 functions) for tooling integration
  • Comprehensive test coverage with 26 unit tests covering all scenarios

Security & Safety:

  • Input validation uses strict regex pattern (^[a-zA-Z0-9_-]+$) preventing directory traversal
  • SQL escaping properly handles single quotes in paths
  • Database opened in readonly mode during snapshot creation
  • Force flag required for overwrites to prevent accidental data loss
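
The force-flag rule interacts with SQLite itself: VACUUM INTO will not write over an existing non-empty file, so an overwrite has to delete the old snapshot before the statement runs. A rough sketch of such a pre-flight check follows; `prepareDestination` and its `force` option are hypothetical names, not the PR's API.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical pre-flight check to run before issuing VACUUM INTO.
function prepareDestination(dest, { force = false } = {}) {
  if (fs.existsSync(dest)) {
    if (!force) {
      throw new Error(`Snapshot already exists: ${dest} (pass force to overwrite)`);
    }
    fs.unlinkSync(dest); // clear the way so VACUUM INTO can create the file
  }
  fs.mkdirSync(path.dirname(dest), { recursive: true });
}

// Usage with a placeholder file in a temp directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'save-'));
const dest = path.join(dir, '.codegraph', 'snapshots', 'baseline.db');
prepareDestination(dest); // creates the snapshots directory
fs.writeFileSync(dest, 'old snapshot');
prepareDestination(dest, { force: true }); // removes the old file
console.log(fs.existsSync(dest));
```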

Dependency Updates:

  • Bumped @commitlint from v19 to v20
  • Updated GitHub Actions to latest versions (checkout v6, setup-node v6, cache v5, upload-artifact v7, download-artifact v8)

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • Clean implementation with proper validation, comprehensive test coverage (26/26 tests pass, 868/868 full suite), security best practices (input sanitization, SQL escaping, readonly DB access), and safe dependency updates
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/snapshot.js | Added complete snapshot management implementation with proper validation, atomic VACUUM INTO backups, and WAL cleanup |
| src/cli.js | Added snapshot subcommands (save/restore/list/delete) with proper error handling and formatting |
| src/index.js | Exported all 6 snapshot functions for programmatic API access |
| tests/unit/snapshot.test.js | Comprehensive test suite with 26 tests covering all functions, error cases, and edge cases |

Last reviewed commit: 9ce8985

Contributor

@greptile-apps greptile-apps bot left a comment

14 files reviewed, no comments


Adds save/restore/list/delete subcommands using VACUUM INTO for atomic
WAL-free snapshots. Enables orchestrators and CI to checkpoint before
refactoring passes and instantly rollback without full rebuilds.

Impact: 7 functions changed, 5 affected

carlos-alm merged commit 8d7416b into main on Mar 1, 2026
15 checks passed
carlos-alm deleted the feat/snapshot branch on March 1, 2026 23:49
carlos-alm added a commit that referenced this pull request Mar 2, 2026
Mark backlog items as DONE: streaming/pagination (#207), hybrid
BM25+semantic search (#198), CODEOWNERS (#195), snapshots (#192),
TF-IDF search (subsumed by #198).

Update README: add CODEOWNERS, snapshots, hybrid search, pagination
sections; update MCP tool count to 26/27; add --mode, --ndjson,
--limit, --offset flags; update feature comparison tables with verified
competitor data.
carlos-alm added a commit that referenced this pull request Mar 2, 2026
Keep DONE markings for snapshots (PR #192) while incorporating
main's structural changes (ID 30 repositioned, Foundation-aligned
wording update, OWASP moved to Tier 3).

Impact: 99 functions changed, 103 affected