feat(agent): add Hermes discovery adapter (#27) #68
Open
jeanfbrito wants to merge 3 commits into
Introduces the read-path counterpart to the existing AgentHook write-path: discovery adapters that scan agent storage already on disk and produce normalized traces and events for the provenance import pipeline.
- `discovery/types.rs`: `DiscoveredTrace`, `DiscoveredEvent`, `DiscoveredEventType`, `StorageKind` with serde round-trip.
- `discovery/mod.rs`: object-safe `TraceDiscovery` trait and `DiscoveryRegistry` mirroring the `AgentRegistry` surface.
- `error.rs`: new `AgentError::AdapterNotFound` variant with a discovery-appropriate `suggestion()`, keeping the hook-side `AgentNotFound` intact.
- `lib.rs`: re-exports + module doc bullet flagged "foundation only".
- `docs/ATOMIC-AGENT-TASKS.md`: Phase 14.4 entry listing what landed and what is deferred.
Foundation only. Concrete adapters and reader helpers land in atomicdotdev#14, atomicdotdev#18–atomicdotdev#28.
Adds shared storage-backend readers for the discovery subsystem:
- `read_jsonl` / `read_jsonl_since`: streaming line-by-line JSONL with
skip+warn for malformed lines and invalid UTF-8; cursor-resumable via
byte offset (clamped to file size to keep past-EOF cursors valid).
- `read_json`: full-document read with optional UTF-8 BOM strip and a
64 MiB size cap to bound memory.
- `open_sqlite_readonly`: opens via `OpenFlags::SQLITE_OPEN_READ_ONLY`
only — never creates a sidecar or empty DB. Doc notes the
`Connection: !Sync` constraint for `TraceDiscovery` adapter authors.
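The cursor behavior described for `read_jsonl_since` can be sketched with the standard library alone. This is a minimal illustration, not the code in this PR: `read_lines_since` is a hypothetical name, it returns raw lines instead of parsed JSON values, and it warns through `eprintln!` where the real helper presumably uses a logger.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Seek, SeekFrom};
use std::path::Path;

/// Hypothetical stand-in for a cursor-resumable JSONL reader.
/// Returns the decodable lines found at or after `offset`, plus the byte
/// offset to pass in on the next call.
pub fn read_lines_since(path: &Path, offset: u64) -> io::Result<(Vec<String>, u64)> {
    let mut file = File::open(path)?;
    let len = file.metadata()?.len();
    // Clamp so a cursor past EOF stays valid and simply yields nothing.
    let start = offset.min(len);
    file.seek(SeekFrom::Start(start))?;

    let mut reader = BufReader::new(file);
    let mut lines = Vec::new();
    let mut cursor = start;
    let mut buf = Vec::new();
    loop {
        buf.clear();
        // Read raw bytes so one invalid UTF-8 line cannot abort the scan.
        let n = reader.read_until(b'\n', &mut buf)?;
        if n == 0 {
            break;
        }
        cursor += n as u64;
        match std::str::from_utf8(&buf) {
            Ok(text) => {
                let text = text.trim_end_matches(|c| c == '\n' || c == '\r');
                if !text.is_empty() {
                    lines.push(text.to_string());
                }
            }
            // Skip and warn; the cursor still advances past the bad line.
            Err(_) => eprintln!("skipping invalid UTF-8 line ending at byte {cursor}"),
        }
    }
    Ok((lines, cursor))
}
```

The sketch advances the cursor past a final line even if it has no trailing newline yet, so a writer caught mid-line would be read as a partial record. It also does nothing about truncation or rotation, which matches the deferral noted in this commit.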
Also:
- new `rusqlite = "0.31"` (bundled) dependency on atomic-agent
- new `AgentError::DiscoveryReadFailed { path, reason }` variant
- 17 unit tests with `tempfile` fixtures covering happy/edge/error paths
including BOM in both JSONL and JSON, UTF-8 invalid-byte skipping,
past-EOF offset clamp, oversize JSON rejection, and SQLite write-block
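The BOM and oversize cases listed in these tests can be illustrated with a small std-only sketch. `read_text_capped` and `MAX_JSON_BYTES` are hypothetical names chosen for this example, and the sketch stops at returning the text; the real `read_json` presumably goes on to parse it and reports failures through `AgentError::DiscoveryReadFailed` rather than `io::Error`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// 64 MiB, the cap mentioned in the commit message.
pub const MAX_JSON_BYTES: u64 = 64 * 1024 * 1024;

/// Hypothetical helper: reads a whole document as text, refusing files
/// larger than `max_bytes` and dropping a leading UTF-8 BOM if present.
pub fn read_text_capped(path: &Path, max_bytes: u64) -> io::Result<String> {
    // Check the size before reading so an oversize file is never loaded.
    let len = fs::metadata(path)?.len();
    if len > max_bytes {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("{} is {len} bytes, over the {max_bytes}-byte cap", path.display()),
        ));
    }
    let text = fs::read_to_string(path)?;
    // U+FEFF at the start of the file is a BOM, not part of the document.
    let stripped = text.strip_prefix('\u{feff}').unwrap_or(&text);
    Ok(stripped.to_string())
}
```

Checking the size from metadata is a cheap guard, but the file can still grow between the check and the read; a stricter variant could wrap the reader in `Read::take`.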
Deferred:
- File truncation/rotation detection (cursor invariant flagged in docs;
proper handling lands with the first polling adapter — Hermes, atomicdotdev#27)
- Cargo feature-gating for the discovery subsystem (callable today via
the always-on path; revisit if a lean atomic-cli build is needed)
Implements `TraceDiscovery` for the Hermes agent (NousResearch/hermes-agent, schema version 6). Reads conversation history from `~/.hermes/state.db` and wires Hermes into `DiscoveryRegistry::with_defaults()` so the import pipeline picks it up automatically.
Canonical schema (quoted from upstream `hermes_state.py`):
- `sessions(id TEXT PK, source, model, parent_session_id, started_at REAL, ended_at REAL, ..., title TEXT, ...)`
- `messages(id INTEGER PK AUTOINCREMENT, session_id, role TEXT NOT NULL, content TEXT, tool_call_id, tool_calls TEXT, tool_name, timestamp REAL, reasoning, reasoning_details, codex_reasoning_items, ...)`
There is no `part` table: tool calls live as a JSON array on `messages.tool_calls`, and tool results arrive as `messages` rows with `role = 'tool'`. Timestamps are seconds since epoch with fractional precision (REAL).
Adapter behavior:
- `list_traces()`: reads from `sessions`, filters `parent_session_id IS NULL` (matching upstream `list_sessions_rich`), pulls the title from the sessions row, and builds the preview via a correlated subquery against the first user message (capped to 120 chars).
- `read_events()`: reads `messages` under a `BEGIN DEFERRED` snapshot to keep WAL-mode concurrent writers from splitting the view. For each row, emits `AssistantThinking` before the role-derived event when `reasoning` or `reasoning_details` is set; parses `tool_calls` JSON into per-call `ToolCall` events; maps `role = 'tool'` rows to `ToolResult`. Defensive consecutive-equivalence dedup runs after the merge (Hermes already dedupes on write via `_last_flushed_db_idx`).
- `seconds_to_datetime(f64)` helper preserves fractional-second precision and clamps non-finite/negative values to an epoch sentinel with a `log::warn!`.
- `HermesDiscovery::new()` is infallible (falls back to a relative `.hermes/` path if the home directory cannot be resolved); `is_available()` is a cheap probe (`root.is_dir() && db.is_file()`).
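Two of the smaller behaviors above, the timestamp clamp and the consecutive dedup, can be sketched without any dependencies. This is an illustration under assumptions, not the adapter's code: it returns `std::time::SystemTime` instead of whatever datetime type `seconds_to_datetime` actually produces, uses `eprintln!` in place of `log::warn!`, and the `Event` struct and its notion of equivalence (same kind and content) are hypothetical.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Converts fractional seconds since the Unix epoch into a `SystemTime`.
/// Non-finite or negative inputs fall back to the epoch as a sentinel.
pub fn seconds_to_system_time(secs: f64) -> SystemTime {
    if !secs.is_finite() || secs < 0.0 {
        eprintln!("invalid timestamp {secs}; falling back to the Unix epoch");
        return UNIX_EPOCH;
    }
    // `try_from_secs_f64` keeps the fractional part as nanoseconds and
    // returns an error instead of panicking on out-of-range values.
    match Duration::try_from_secs_f64(secs) {
        Ok(elapsed) => UNIX_EPOCH.checked_add(elapsed).unwrap_or(UNIX_EPOCH),
        Err(_) => UNIX_EPOCH,
    }
}

/// Hypothetical event shape, used only to demonstrate the dedup step.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub kind: String,
    pub content: String,
}

/// Drops an event when it equals the one immediately before it.
/// Non-adjacent repeats are kept, since they may be legitimate.
pub fn dedup_consecutive(events: &mut Vec<Event>) {
    events.dedup();
}
```

`Vec::dedup` only compares neighbors, which fits a defensive pass over an already ordered stream: it removes a row flushed twice in a row without collapsing an identical message sent again later in the session.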
Tests: 17 unit tests cover the canonical schema end-to-end — trait shape, availability probing, session listing (empty, parent-only filtering, title source, chronological preview, MAX(timestamp) trace time), event reading (ordering, AssistantThinking-before-AssistantText, tool-call JSON parsing, tool-role → ToolResult, malformed-JSON tolerance, dedup, unknown session).
Summary
Implements `TraceDiscovery` for the Hermes agent (NousResearch/hermes-agent, schema version 6) and wires it into `DiscoveryRegistry::with_defaults()` so the import pipeline picks it up automatically. Built on the discovery foundation (#66) and reader helpers (#67).
Canonical schema
Verified against the upstream source — see `hermes_state.py` lines 36–90 (schema version 6, May 2026):
```sql
CREATE TABLE sessions (
id TEXT PRIMARY KEY,
source TEXT NOT NULL,
user_id TEXT,
model TEXT,
model_config TEXT,
system_prompt TEXT,
parent_session_id TEXT,
started_at REAL NOT NULL,
ended_at REAL,
...,
title TEXT,
...
);
CREATE TABLE messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_id TEXT NOT NULL REFERENCES sessions(id),
role TEXT NOT NULL,
content TEXT,
tool_call_id TEXT,
tool_calls TEXT, -- JSON array of OpenAI-style tool calls
tool_name TEXT,
timestamp REAL NOT NULL,
token_count INTEGER,
finish_reason TEXT,
reasoning TEXT,
reasoning_details TEXT,
codex_reasoning_items TEXT
);
```
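To make the link between the `messages` columns and the emitted events concrete, here is a rough sketch of per-row event ordering. Everything named here is hypothetical: `EventKind` stands in for `DiscoveredEventType` (whose full variant list is not shown in this PR), `MessageRow` carries a pre-counted `tool_call_count` instead of parsing the `tool_calls` JSON, and the choices to treat empty strings as unset and to place `ToolCall` events after the assistant text are assumptions.

```rust
/// Hypothetical stand-in for the adapter's event type enum.
#[derive(Debug, Clone, PartialEq)]
pub enum EventKind {
    UserMessage,
    AssistantThinking,
    AssistantText,
    ToolCall,
    ToolResult,
    Other,
}

/// Hypothetical view of one `messages` row, limited to the columns that
/// decide which events are emitted.
pub struct MessageRow<'a> {
    pub role: &'a str,
    pub content: Option<&'a str>,
    pub reasoning: Option<&'a str>,
    pub reasoning_details: Option<&'a str>,
    /// Number of entries in the `tool_calls` JSON array (parsing omitted).
    pub tool_call_count: usize,
}

/// Assumption: NULL and whitespace-only values both count as "not set".
fn is_set(value: Option<&str>) -> bool {
    value.map_or(false, |v| !v.trim().is_empty())
}

/// Returns the event kinds for one row, in emission order.
pub fn events_for_row(row: &MessageRow) -> Vec<EventKind> {
    let mut out = Vec::new();
    // Reasoning is emitted before the role-derived event.
    if is_set(row.reasoning) || is_set(row.reasoning_details) {
        out.push(EventKind::AssistantThinking);
    }
    match row.role {
        "tool" => out.push(EventKind::ToolResult),
        "assistant" => {
            if is_set(row.content) {
                out.push(EventKind::AssistantText);
            }
        }
        "user" => out.push(EventKind::UserMessage),
        _ => out.push(EventKind::Other),
    }
    // One event per tool call carried on the row.
    for _ in 0..row.tool_call_count {
        out.push(EventKind::ToolCall);
    }
    out
}
```

For an assistant row with reasoning, text, and two tool calls, this yields `AssistantThinking`, `AssistantText`, `ToolCall`, `ToolCall`; a `role = 'tool'` row yields a single `ToolResult`.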
Two important deltas from the issue body's initial description:
Adapter behavior
Test plan
Deferred (planned follow-ups)
Stacked on #66 and #67. Closes #27.