Conversation
* feat: port roundtrip fidelity fixes and git import improvements
Port all 7 commits from chore/roundtirp-fidelity-test onto dev's
restructured codebase:
- Fix hunk consolidation: collapse any combination of Replace, Delete,
or middle-Insert hunks into a single whole-file Replace to prevent
N× content duplication in the graph
- Remove NeedsReplace fallback from globalize_hunk; middle insertions
are now caught by upstream consolidation
- Add GitDiffLine type and build_crdt_ops_from_git_diff for exact
git diff parity in CRDT ops
- Rewrite git import write_commit to use in-memory working copies
(Memory) instead of git checkout_tree, eliminating filesystem I/O
- Add FileMove GraphOp support in record, globalize, and insert
pipelines for proper rename tracking
- Fix binary file modifications (create Replace hunk when binary
content differs)
- Fix deleted file cleanup (explicit repo.remove after insert)
- Add double-deletion guard for FileMove in insert_change
- Use Phase 1 old_content instead of repo.get_file_content to avoid
O(N) graph scans per commit during import
- Add content roundtrip fidelity tests
- Add harness suites 11-13 (diff parity, full repo parity, import
fidelity)
- Extend harness suite 10 with status parity tests
Known issue: assemble_and_hash calls retrieve_graph per file per
commit, which is O(graph_size). For large imports (1000+ commits),
this causes progressive slowdown. Tracked for separate optimization
(in-memory content vertex cache).
* Add InodeGraphOps trait implementation to ViewGraph wrapper
The ViewGraph wrapper now delegates InodeGraphOps methods to the
underlying transaction, enabling O(m) file-local graph traversal within
view-filtered contexts. Update globalize module to accept InodeGraphOps
and use the fast INODE_GRAPH path for content vertex discovery.
* perf: use INODE_GRAPH for O(m) file-scoped traversal in globalize + batch imports
Two performance optimizations for git import:
1. Wire find_content_vertices to INODE_GRAPH secondary index
- find_content_vertices now uses the file-scoped INODE_GRAPH
B-tree instead of retrieve_graph (global GRAPH DFS)
- O(m) where m = edges for this file, not O(n) where n = all
edges in the repository
- Adds InodeGraphOps trait bound through globalize pipeline
- Implements InodeGraphOps delegation for ViewGraph wrapper
- Falls back to global GRAPH scan if INODE_GRAPH not populated
Before: assemble time grew linearly (40ms → 550ms by commit 300)
After: assemble time constant (3-14ms regardless of position)
hyperfine (1017 commits): Phase 2 dropped from O(N²) minutes
to 21.6 seconds flat
2. Batch import pipeline for memory-bounded processing
- Commits processed in parse→write batches instead of
parse-all→write-all
- Batch sizes tiered by repo size:
<5K: 250, 5-10K: 500, 10-20K: 1000, 20K+: 2500
- Keeps memory bounded for large repos (35K+ commits)
- Shows incremental progress during import
* perf: use eventual durability for redb write transactions
Skip fsync on every transaction commit. Data is still written to the
database file but the OS may buffer it. This reduces per-commit
overhead for all write operations.
Safe for most use cases — a crash could lose the last few transactions
but the repository can be rebuilt from change files.
* Bump version to 0.5.3
* Bump version to 0.5.3
* fix: use global retrieve_graph in find_content_vertices
The INODE_GRAPH fast path bypasses ViewGraph's change filter,
returning vertices from ALL changes regardless of view visibility.
This caused content duplication in the record() path which uses
ViewGraph wrapping.
Always use retrieve_graph (which respects whatever GraphTxnT impl
is provided) for content vertex discovery. The INODE_GRAPH
optimization remains available for callers that use a bare &txn
(like assemble_and_hash for git import).
Fixes 5 of 6 content roundtrip fidelity test failures.
Remaining: test_three_sequential_modifications (content truncation
from a separate issue in dev's record pipeline).
* fix: harness error handling and merge commit test
- Fix error handling section: capture exit code with || status=$?
instead of letting set -e kill the harness on expected failures
- Fix merge commit test: detect default branch name instead of
hardcoding master/main
* fix: restore del_edge_with_reverse in apply pipeline
Dev's refactoring removed the old-edge deletion from write_new_edge,
leaving both the original alive edge AND the new DELETED edge in the
B-tree multimap. This caused is_vertex_alive to find the stale alive
parent edge and incorrectly report dead vertices as alive.
Result: the second Replace on a file couldn't find the correct content
vertex, breaking sequential modifications (text and binary).
Restore del_edge_with_reverse before add_edge_with_reverse so the
superseded edge is removed. This matches the old branch's behavior
where apply_new_edge always deleted the previous edge first.
Fixes: all 9 content roundtrip fidelity tests now pass.
Fixes: suite 13 'Multiple Binary Modifications' (46/46 pass).
Closes #41.
* fix: remove Durability::Eventual — causes data visibility issues on Windows
Redb's Eventual durability skips fsync, which on Windows can cause
a read transaction opened immediately after a write commit to not
see the written data. This broke 6 of 9 content fidelity tests on
Windows while passing on macOS/Linux.
Revert to default durability (Immediate) for correctness across
all platforms.
* fix: use platform-aware paths in fidelity tests for Windows compatibility
Add test_path() helper that flattens subdirectory paths on Windows
(src/main.rs → src_main.rs) while keeping them as-is on Unix.
Windows has a known issue with subdirectory path tracking — this
workaround lets the fidelity tests exercise the record pipeline
without hitting the path normalization bug.
Git show commands still use the real repo path (src/main.rs) for
fetching content from the cloned hyperfine repo.
* fix: normalize path separators for Windows compatibility
On Windows, walkdir returns paths with backslash separators (src\main.rs)
but the TREE table stores forward slashes (src/main.rs). Status comparison
between tracked and disk paths failed because PathBuf uses byte-level
comparison — src/main.rs != src\main.rs.
Fix: normalize all paths to forward slashes in both the working copy scan
(collect_working_copy_files_with_rules) and the tracked path sets
(status). This ensures consistent comparison across platforms.
Also revert the test_path() workaround in fidelity tests — tests now use
real subdirectory paths (src/main.rs) since the underlying bug is fixed.
* fix: normalize path separators for Windows compatibility
On Windows, walkdir returns paths with backslash separators (src\main.rs)
but the TREE table stores forward slashes (src/main.rs). Status comparison
between tracked and disk paths failed because PathBuf uses byte-level
comparison — src/main.rs != src\main.rs.
Fix: normalize all paths to forward slashes in both the working copy scan
(collect_working_copy_files_with_rules) and the tracked path sets
(status). This ensures consistent comparison across platforms.
Also revert the test_path() workaround in fidelity tests — tests now use
real subdirectory paths (src/main.rs) since the underlying bug is fixed.
* perf: skip O(N) change filter scan for shared root views For shared views with no parent (the common case after git import), all changes are visible. Skip the expensive collect_view_change_ids scan (which loads every change file from disk to expand dependencies) in both status() and get_file_content(). Also skip content comparison in status for shared root views when there's no mtime cache entry — files are clean by construction after import. This helps but doesn't fully solve the large-repo status problem. With 35K+ changes and 40K files, status still hangs — likely due to two full TREE table scans plus filesystem walk. Needs a fundamentally different approach (index-based status like git). * perf: populate mtime cache during git import for instant status After git import, every file's mtime+size is stored in the pristine database. This lets 'atomic status' compare file metadata (stat) instead of reconstructing graph content for every file. Before: status on a 1017-commit repo hung (O(files × graph_traversal)) After: status completes in 0.1s (O(files × stat)) Also adds Repository::update_file_mtimes() public API for batch mtime cache population, and fast-paths get_file_content() and status() to skip O(N) change filter scan for shared root views. * perf: populate mtime cache after materialize Both materialize() and materialize_prefix() now stat all written files and store their mtime+size in the pristine database. This makes 'atomic status' after a server-side push + materialize instantaneous (stat comparison instead of graph traversal). Same pattern as the git import mtime population — best-effort, errors silently ignored. * fix: handle AllEmpty gracefully during git import for pure-deletion commits Commits that only delete files can produce RecordedFiles with hunks that globalization strips (e.g., when find_content_vertices returns empty for files whose graph content was already removed by prior commits in the same batch). Instead of failing with 'All recorded files are empty (no hunks)', fall back to an empty change and still run the explicit repo.remove() cleanup for deleted file TREE entries. The commit metadata (message, author, timestamp) is preserved in the change. This eliminates the 'Failed to write' warnings during large git imports (e.g., terraform with pure-deletion merge commits). * perf: replace FILE_MTIMES with FILE_INDEX — content hash eliminates graph traversal in status Schema change: FILE_MTIMES (20 bytes: mtime+size) → FILE_INDEX (52 bytes: mtime+size+blake3_hash). Status no longer calls get_file_content (which traverses the graph to reconstruct file content). Instead: 1. If mtime+size match index → clean (O(1) per file) 2. If mtime+size differ → hash disk file, compare with stored hash (O(file_size) per file, no graph traversal) This makes status O(files × stat) instead of O(files × graph). Before: status on 35K-commit terraform repo = 27 seconds After: status on 1017-commit hyperfine repo = 111ms Also updated all callers: - git import computes blake3 hash during mtime population - materialize computes blake3 hash after writing files - record stores content hash after recording - status uses content_hash for comparison instead of get_file_content * fix: remove deleted files from FILE_INDEX during git import After repo.remove() cleans up the TREE entry for deleted files, also remove the FILE_INDEX entry so status doesn't show them as stale deleted entries. Note: the deletion is still not recorded as a GraphOp::FileDel in the change — it's only a TREE+INDEX cleanup. Proper graph-level deletion recording is a follow-up. * fix: reconcile orphaned TREE entries after git import Merge commits can implicitly delete files by not including them from a second parent. Our per-commit diff only detects explicit deletions (FileOperation::Deleted), so files dropped during merge resolution leave orphaned TREE entries that show up as 'deleted' in status. Fix: after all import batches complete, compare every tracked path against the working copy. Files in TREE that don't exist on disk are removed from both TREE and FILE_INDEX. tokio: 3 orphaned entries cleaned up in <0.1s Result: status shows 'nothing to record, working tree clean' * fix: reconcile untracked files after git import (reverse direction) Merge commits can also implicitly ADD files from a second parent without an explicit FileOperation::Added in the first-parent diff. These files exist on disk but have no TREE entry, showing as 'untracked' or 'new file' in status. After removing orphaned TREE entries (direction 1), also run repo.status() to find untracked files and add them to tracking with FILE_INDEX entries. This completes the bidirectional reconciliation: - Direction 1: TREE entries without disk files → removed - Direction 2: Disk files without TREE entries → added * fix: reconcile untracked files after git import (reverse direction) Merge commits can also implicitly ADD files from a second parent without an explicit FileOperation::Added in the first-parent diff. These files exist on disk but have no TREE entry, showing as 'untracked' or 'new file' in status. After removing orphaned TREE entries (direction 1), also run repo.status() to find untracked files and add them to tracking with FILE_INDEX entries. This completes the bidirectional reconciliation: - Direction 1: TREE entries without disk files → removed - Direction 2: Disk files without TREE entries → added
- Use `sort_by_key` with `Reverse` instead of manual comparison - Replace `match` with `.is_some()` for simple existence checks - Implement `FromStr` trait instead of custom `from_str` method - Use `std::slice::from_ref` to avoid allocating single-element vec - Dereference hash instead of cloning - Simplify heading detection logic with `find` instead of `skip_while` - Reorder variable initialization for clarity
- Add 'Vault — Shared Project Brain' section with usage examples - Add 'Knowledge Graph Queries with LLM Assist' section with RAG pipeline - Add Vault Commands table to CLI Reference - Add experiment/** branch pattern to CI push trigger - Fix 10 clippy -D warnings across atomic-core, atomic-repository, atomic-cli
- Add 'Vault — Shared Project Brain' section with usage examples - Add 'Knowledge Graph Queries with LLM Assist' section with RAG pipeline - Add Vault Commands table to CLI Reference - Add experiment/** branch pattern to CI push trigger - Fix 10 clippy -D warnings across atomic-core, atomic-repository, atomic-cli
- Escape square brackets in KG table doc comments (broken intra-doc links) - Wrap generic types in backticks in query.rs doc comment (invalid HTML tags) - Remove broken intra-doc link to vault_store_kg
Tests asserted 2-char tokens (rs, in, am) were included, but tokenize_for_fts filters at >= 3 chars. Also fixed assertion for 'the' which is a stop word.
- Fix 3 vault deflation tests: add trailing newlines to default vault source files (MEMORY.md, system_prompt.md, code-intelligence.md) so content hash roundtrips correctly through materialize/scan cycle - Fix test_options_new_returns_defaults: assert record_empty_files is true (matching the Default impl) - Fix test_status_clean_after_record: status() is an exception-reporter that omits clean files; assert absence from non-clean lists instead - Fix test_diff_end_to_end_multiple_files: same status().clean() pattern - Fix atomic-cli compile errors: add missing Init and Promote match arms in vault/mod.rs and view/mod.rs test helpers Pre-existing failures not addressed: - test_three_sequential_modifications (graph overlay content retrieval) - test_log_run_json_empty (test isolation / working directory issue)
- Fix test_empty_file: record_empty_files defaults to true, so empty file recording succeeds; test both true and false behaviors - Fix test_log_run_json_empty: add missing #[serial] to CWD-mutating tests in init.rs and status.rs that raced with serial log tests
Introduces `atomic-teams` crate with organization and team management, plus workspace/project storage APIs. New CLI commands for managing orgs (create, list, show, update, delete, upgrade, switch) and teams (list, create, show, update, delete), with member and permission grant management. Feature-gated behind the `teams` flag in the CLI. Key additions: - `atomic-teams`: Domain logic for org/team CRUD, member management, grants, domain aliases - `atomic-remote`: `StorageClient` HTTP API for workspace/project operations - CLI commands: `atomic org`, `atomic team`, `atomic workspace`, `atomic project` - Knowledge graph expansion: `PART_OF` and `INCLUDES` edge kinds - Infrastructure: Content search index powered by syntext, agentic LLM tool-use system
Tests the full CLI ↔ atomic-storage round-trip for organizations, workspaces, projects, teams, members, and identity resolution. Includes server availability check, multi-user workflows, resource CRUD operations, and cleanup. Also hardens helpers.sh against unbound variable errors on bash <4.4 when arrays are empty under `set -u`.
Backfill and use a new CHANGE_DEPS index to expand view dependency closures without loading .change files. Add MutTxn::put_change_deps, pristine table support, reader/writer implementations and tests. Wire repository to populate the index on insert and expose repo.repair_change_dependency_index with a new `doctor repair-dependency-index` CLI command. Also add a fast mtime-only status check in the agent to skip unnecessary records.
Add FileIndexEntry/FileIndexMetadata aliases and export them from pristine traits; update implementations and callers to use the new types. Change the AI repo tool to return a structured file outline for large files instead of erroring when no line range is provided. Misc minor cleanups and formatting tweaks (hidden-dir check, small refactors)
Remove push branch triggers from ci.yml so CI runs on PRs to dev. Replace actions-rs/toolchain@v1 with dtolnay/rust-toolchain@stable and adjust inputs (use 'targets' for matrix builds) in release.yml.
Update CI toolchain usages, refresh Cargo.lock to pick roaring 0.11.4, and remove the atomic-repository pin on roaring 0.11.3 so syntext's dependency can upgrade.
Point CLI internal-error/help messages to the atomicdotdev GitHub org. Remove the 'test' job from .github/workflows/release.yml and make the 'build' job depend only on 'version' so the release workflow skips the matrix test run
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This pull request makes significant improvements to the documentation and configuration for the Atomic project, focusing on clarifying core concepts, updating the CI/CD setup, and adding support for new workspace members and features. The most important changes include major documentation enhancements to explain new features like workspace shelving, the vault, and the knowledge graph; updates to the GitHub Actions workflows for Rust toolchain management; and the addition of new crates and dependencies.
Documentation improvements:
README.mdto introduce and explain new concepts such as workspace shelving, the vault (project knowledge store), and the knowledge graph, including detailed usage examples and command references. Updated terminology from "stack" to "view" throughout, and added tables comparing features and commands. [1] [2] [3] [4] [5]AGENTS.mdto document workspace shelving, including references to relevant code and a summary of how ignored files and exposed paths are handled per view.CI/CD workflow updates:
actions-rs/toolchainwithdtolnay/rust-toolchainin all GitHub Actions workflows (ci.yml,release.yml), and updated toolchain versions to use Rust 1.90 where appropriate. Simplified the test step to run tests for all workspace members at once. [1] [2] [3] [4] [5] [6] [7] [8] [9]Workspace and dependency additions:
atomic-semantic,atomic-teams) and updated the workspace Rust version to 1.90 inCargo.toml. [1] [2]uuid) and content search indexing (syntext). [1] [2]Minor codebase improvements:
atomic-agent/src/event.rs.These changes collectively improve developer experience, documentation clarity, and CI reliability, while laying the groundwork for new features related to workspace management and code intelligence.