feat(tree-summarizer): hierarchical summary tree module with CLI #423

senamakel merged 10 commits into tinyhumansai:main from
Conversation
…d event handling

- Introduced a new `tree_summarizer` module to manage hierarchical time-based summaries, organizing data into a tree structure (root → year → month → day → hour).
- Added functionality to ingest raw content, summarize it into hour leaves, and propagate summaries upward through the tree.
- Implemented event handling for summarization completion and tree rebuild events, enhancing observability and modularity.
- Created RPC operations for ingesting content, triggering summarization, querying the tree, and retrieving tree status.
- Added comprehensive tests to ensure the reliability of the summarization process and event handling.

This update significantly enhances the summarization capabilities of the system, allowing for efficient data organization and retrieval.

- Introduced a new `tree-summarizer` command to the CLI, allowing users to ingest content, run summarization jobs, query the summary tree, check status, and rebuild the tree.
- Updated the CLI help documentation to include the new command and its subcommands.
- Added a new module `tree_summarizer_cli` to encapsulate the tree summarization functionality.

This enhancement improves the usability of the summarization features, providing a streamlined interface for managing hierarchical summaries directly from the command line.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Adds a new tree_summarizer namespace: types, storage, engine, RPCs, CLI entry, controller schemas, event definitions/subscriber, and startup/registry integration across multiple modules and tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant CLI as "CLI Handler"
    participant Ops as "RPC Ops"
    participant Engine as "Engine"
    participant Store as "Store"
    participant EventBus as "Event Bus"
    User->>CLI: tree-summarizer ingest --content "..."
    CLI->>Ops: tree_summarizer_ingest(namespace, content)
    Ops->>Store: buffer_write(content)
    Ops-->>CLI: RpcOutcome (buffer metadata)
    User->>CLI: tree-summarizer run
    CLI->>Ops: tree_summarizer_run(namespace)
    Ops->>Engine: run_summarization(namespace, ts)
    Engine->>Store: buffer_drain(namespace)
    Engine->>Engine: summarize hour node
    Engine->>Store: write_node(hour_node)
    Engine->>EventBus: emit TreeSummarizerHourCompleted
    loop propagate upward
        Engine->>Store: read_children(parent_id)
        Engine->>Engine: combine/summarize children
        Engine->>Store: write_node(parent_node)
        Engine->>EventBus: emit TreeSummarizerPropagated
    end
    Ops-->>CLI: RpcOutcome (result JSON)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~55 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 10
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/openhuman/tree_summarizer/bus.rs`:
- Around line 1-4: bus.rs is only a stub and lacks a subscriber, so events
TreeSummarizerHourCompleted, TreeSummarizerPropagated, and
TreeSummarizerRebuildCompleted (defined in event_bus/events.rs and emitted from
engine.rs) are never handled; implement a TreeSummarizerEventSubscriber struct
in src/openhuman/tree_summarizer/bus.rs that implements the EventHandler trait
(matching the handler signature used by other domain subscribers) and processes
those three event types, then register it at startup by calling
event_bus::subscribe_global(Box::new(TreeSummarizerEventSubscriber::new())) in
the startup registration code (follow the pattern used by CronDeliverySubscriber
in channels/runtime/startup.rs) so the subscriber is active on application boot.
In `@src/openhuman/tree_summarizer/engine.rs`:
- Around line 117-129: The rebuild flow deletes the whole tree
(store::delete_tree) after collect_hour_leaves_recursive (which skips the
"tree/buffer"), causing unsummarized buffered inputs to be lost; before calling
store::delete_tree, locate the buffer directory under the tree base (from
store::tree_dir(config, namespace) + "/buffer"), atomically move or copy its
contents to a temporary safe location, then proceed with delete and rebuild, and
finally replay or restore the buffered files back into the namespace (using your
ingestion/replay function or reinsert logic) so buffered inputs are preserved
and reprocessed after the rebuild.
- Around line 35-73: The code currently uses the run timestamp (ts) and
derive_node_ids(&ts) to make a single hour leaf for all buffered entries, which
causes wrong bucketing and overwrites; change the flow to group the buffered
entries returned by store::buffer_drain(...) by each entry's own timestamp
(e.g., entry.timestamp or equivalent), call derive_node_ids(...) per-group/hour
to get that hour_id, produce a combined text per-hour and call
summarize_to_limit(provider, &combined_per_hour, NodeLevel::Hour.max_tokens(),
"hour", &hour_id).await for each group, then build a TreeNode per hour_id and
call store::write_node(config, &hour_node) per-hour (before writing, read
existing hour node if any and merge/accumulate summaries or child_count to avoid
overwriting). Ensure references to hour_id, derive_parent_id, TreeNode,
summarize_to_limit, and store::write_node are updated to operate per-hour group
rather than once for the global ts.
- Around line 283-303: After receiving the LLM output from
provider.chat_with_system (used with SUMMARIZATION_MODEL and
SUMMARIZATION_TEMP), validate the response against the allowed summary size (max
tokens/characters) and enforce a postcondition: compute the response size (using
your tokenizer if available, otherwise a character limit), and if it exceeds the
configured maximum (e.g., SUMMARIZATION_MAX_TOKENS or a MAX_SUMMARY_CHARS
constant), either truncate the response to that limit or return an
error/contextual failure including node_id and level_name; update the code path
around the response variable in engine.rs to perform this check before
logging/tracing and returning Ok(response).
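The per-hour bucketing these engine.rs comments call for can be sketched independently of the repo. This is a hedged sketch, not the repo's code: `BufferedEntry` and its epoch-second `timestamp` field are assumptions (the real type lives in store.rs), and mapping a bucket index back to a `YYYY/MM/DD/HH` node id is left to the repo's own `derive_node_ids`.

```rust
use std::collections::BTreeMap;

// Assumed shape of a drained buffer entry; the real type lives in store.rs.
#[derive(Debug, Clone)]
struct BufferedEntry {
    timestamp: u64, // epoch seconds (assumption)
    content: String,
}

/// Group drained entries into per-hour buckets keyed by the hour index
/// (epoch seconds / 3600), so each hour leaf is summarized from its own
/// entries instead of all entries landing on the run timestamp's hour.
fn group_by_hour(entries: Vec<BufferedEntry>) -> BTreeMap<u64, Vec<BufferedEntry>> {
    let mut buckets: BTreeMap<u64, Vec<BufferedEntry>> = BTreeMap::new();
    for entry in entries {
        buckets.entry(entry.timestamp / 3600).or_default().push(entry);
    }
    buckets
}

/// Combine one bucket's contents into the text handed to the summarizer.
fn combine_bucket(entries: &[BufferedEntry]) -> String {
    entries
        .iter()
        .map(|e| e.content.as_str())
        .collect::<Vec<_>>()
        .join("\n\n")
}

fn main() {
    let entries = vec![
        BufferedEntry { timestamp: 3_600, content: "a".into() },
        BufferedEntry { timestamp: 3_700, content: "b".into() },
        BufferedEntry { timestamp: 7_300, content: "c".into() },
    ];
    let buckets = group_by_hour(entries);
    assert_eq!(buckets.len(), 2); // two distinct hours
    assert_eq!(combine_bucket(&buckets[&1]), "a\n\nb");
}
```

A `BTreeMap` keeps the buckets in chronological order, which makes the subsequent upward propagation deterministic.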
In `@src/openhuman/tree_summarizer/ops.rs`:
- Around line 84-96: Validate the user-controlled node_id before calling
store::read_node and store::read_children: reject or normalize any input that is
not exactly "root" or matching the canonical date-path patterns (e.g., "YYYY",
"YYYY/MM", "YYYY/MM/DD", "YYYY/MM/DD/HH") and disallow path traversal characters
like "..", leading slashes, or empty segments; perform this check where node_id
is read (the node_id / target_id handling) and return a clear error before
invoking read_node/read_children so invalid IDs never reach filesystem code. Use
a strict whitelist regex for the allowed formats and trim namespace separately
(namespace.trim()) as currently done, and ensure the same validation is applied
to target_id before passing it to read_node and read_children.
- Around line 12-38: The ingest path drops optional metadata from the public
IngestRequest; update tree_summarizer_ingest to accept and validate the metadata
field from the request (match the optional metadata shape declared in schemas.rs
/ IngestRequest), then pass it through to storage instead of discarding
it—either extend store::buffer_write to take a metadata parameter (or call a new
store function that persists content+metadata) and include metadata in the
returned RpcOutcome JSON; ensure you also update any callers such as
handle_ingest to forward metadata and adjust store::buffer_write usage
accordingly so metadata is not lost silently.
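The node-id whitelist described in the ops.rs comment is small enough to sketch in full. Everything here is illustrative: the segment widths (4/2/2/2) are inferred from the `YYYY/MM/DD/HH` patterns quoted above, and the real code may prefer a regex.

```rust
/// Whitelist check for user-supplied node ids: only "root" or canonical
/// date paths ("YYYY", "YYYY/MM", "YYYY/MM/DD", "YYYY/MM/DD/HH") may reach
/// filesystem code. Rejects traversal ("..", leading slashes, empty segments)
/// by construction, since no such input matches the digit-only segments.
fn validate_node_id(node_id: &str) -> Result<(), String> {
    if node_id == "root" {
        return Ok(());
    }
    let segments: Vec<&str> = node_id.split('/').collect();
    if segments.is_empty() || segments.len() > 4 {
        return Err(format!("invalid node id: {node_id:?}"));
    }
    // Expected digit widths per depth: year=4, then month/day/hour=2 each.
    let widths = [4usize, 2, 2, 2];
    for (seg, width) in segments.iter().zip(widths) {
        if seg.len() != width || !seg.chars().all(|c| c.is_ascii_digit()) {
            return Err(format!("invalid node id segment {seg:?} in {node_id:?}"));
        }
    }
    Ok(())
}

fn main() {
    assert!(validate_node_id("root").is_ok());
    assert!(validate_node_id("2024/05/17/09").is_ok());
    assert!(validate_node_id("../etc/passwd").is_err()); // traversal rejected
    assert!(validate_node_id("2024//17").is_err());      // empty segment rejected
    assert!(validate_node_id("/2024").is_err());         // leading slash rejected
}
```

Applying the same check to both `node_id` and `target_id` before any `read_node`/`read_children` call keeps invalid ids out of filesystem paths entirely.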
In `@src/openhuman/tree_summarizer/schemas.rs`:
- Around line 124-147: The ControllerSchema entries for the "status" and
"rebuild" handlers currently declare an output field named "status" but the
handlers actually return RpcOutcome::single_log(...) serialized via
into_cli_compatible_json(), which produces an envelope { "result": ..., "logs":
[...] }; update both ControllerSchema blocks ("status" and "rebuild") to rename
the output FieldSchema name from "status" to "result" (keep ty: TypeSchema::Json
and the comment but adjust wording if desired) so the published contract matches
the actual CLI/RPC response envelope.
In `@src/openhuman/tree_summarizer/store.rs`:
- Around line 42-47: The current sanitize(namespace: &str) function
lossy-replaces reserved chars which can collapse distinct namespaces (e.g.,
"a/b" vs "a:b"); change it to either (A) perform reversible encoding (e.g.,
percent-encoding or url-safe Base64) so the on-disk folder maps 1:1 back to the
original namespace, or (B) reject namespaces containing invalid characters by
returning a Result and propagating validation to callers; update sanitize's
signature and all call sites accordingly (reference: sanitize function in
store.rs) so that callers handle the Result or use the reversible encoding
output instead of the previous underscore-mapped string.
- Around line 315-318: The cleanup loop currently ignores errors from
std::fs::remove_file causing buffer_drain to report success even when files
weren't deleted; update buffer_drain (or the function that iterates over
entries) to check the Result of remove_file and return/propagate an Err when any
removal fails instead of discarding it—i.e., replace the throwaway let _ =
std::fs::remove_file(path) with proper error handling (map_err or ?), so
buffer_drain fails on cleanup errors and prevents duplicate reads on the next
run.
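A minimal sketch of the drain-and-propagate behavior the last store.rs comment asks for, using only std. This is illustrative only: the real `buffer_drain` also handles frontmatter and namespacing.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Drain a buffer directory: read every file, then delete it, propagating
/// deletion failures instead of discarding them so a failed cleanup cannot
/// silently cause duplicate reads on the next run.
fn buffer_drain(dir: &Path) -> io::Result<Vec<String>> {
    let mut contents = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if !path.is_file() {
            continue;
        }
        contents.push(fs::read_to_string(&path)?);
        // Previously `let _ = fs::remove_file(..)`; now a failed delete
        // fails the drain, with the offending path in the error message.
        fs::remove_file(&path).map_err(|e| {
            io::Error::new(e.kind(), format!("failed to remove {}: {e}", path.display()))
        })?;
    }
    Ok(contents)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("tree_summarizer_drain_demo");
    let _ = fs::remove_dir_all(&dir); // fresh start; missing dir is fine
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("one.md"), "hello")?;
    let drained = buffer_drain(&dir)?;
    assert_eq!(drained, vec!["hello".to_string()]);
    assert_eq!(fs::read_dir(&dir)?.count(), 0); // buffer is empty after drain
    fs::remove_dir_all(&dir)?;
    Ok(())
}
```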
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0518e168-93bb-44c9-91fa-0bb0a2f72b2e
📒 Files selected for processing (13)
- src/core/all.rs
- src/core/cli.rs
- src/core/mod.rs
- src/core/tree_summarizer_cli.rs
- src/openhuman/event_bus/events.rs
- src/openhuman/mod.rs
- src/openhuman/tree_summarizer/bus.rs
- src/openhuman/tree_summarizer/engine.rs
- src/openhuman/tree_summarizer/mod.rs
- src/openhuman/tree_summarizer/ops.rs
- src/openhuman/tree_summarizer/schemas.rs
- src/openhuman/tree_summarizer/store.rs
- src/openhuman/tree_summarizer/types.rs
```rust
fn parse_opts(args: &[String]) -> Result<(CliOpts, Vec<String>)> {
    let mut verbose = false;
    let mut content: Option<String> = None;
    let mut file: Option<String> = None;
    let mut node_id: Option<String> = None;
    let mut rest = Vec::new();
    let mut i = 0;

    while i < args.len() {
        match args[i].as_str() {
            "--content" | "-c" => {
                let val = args
                    .get(i + 1)
                    .ok_or_else(|| anyhow::anyhow!("missing value for --content"))?;
                content = Some(val.clone());
                i += 2;
            }
            "--file" | "-f" => {
                let val = args
                    .get(i + 1)
                    .ok_or_else(|| anyhow::anyhow!("missing value for --file"))?;
                file = Some(val.clone());
                i += 2;
            }
            "--node-id" | "--node" => {
                let val = args
                    .get(i + 1)
                    .ok_or_else(|| anyhow::anyhow!("missing value for --node-id"))?;
                node_id = Some(val.clone());
                i += 2;
            }
            "-v" | "--verbose" => {
                verbose = true;
                i += 1;
            }
            "-h" | "--help" => {
                rest.push(args[i].clone());
                i += 1;
            }
            _ => {
                rest.push(args[i].clone());
                i += 1;
            }
        }
    }

    Ok((
        CliOpts {
            verbose,
            content,
            file,
            node_id,
        },
        rest,
    ))
```
Unknown flags and extra args are silently ignored.
At Line 84, unrecognized options are treated as positional args; then subcommands (e.g., Lines 185, 270, 305) consume only rest[0] and ignore leftovers. This can mask user mistakes (--verbsoe, stray args) and execute the wrong command shape.
🔧 Suggested fix (strict option/arity validation)

```diff
 fn parse_opts(args: &[String]) -> Result<(CliOpts, Vec<String>)> {
 @@
-            _ => {
-                rest.push(args[i].clone());
-                i += 1;
-            }
+            other if other.starts_with('-') => {
+                return Err(anyhow::anyhow!("unknown option: {other}"));
+            }
+            _ => {
+                rest.push(args[i].clone());
+                i += 1;
+            }
         }
     }
 @@
 fn run_ingest(args: &[String]) -> Result<()> {
     let (opts, rest) = parse_opts(args)?;
 @@
-    let namespace = &rest[0];
+    if rest.len() != 1 {
+        return Err(anyhow::anyhow!(
+            "unexpected positional args: {:?}",
+            &rest[1..]
+        ));
+    }
+    let namespace = &rest[0];
 @@
 fn run_summarize(args: &[String]) -> Result<()> {
     let (opts, rest) = parse_opts(args)?;
 @@
-    let namespace = &rest[0];
+    if rest.len() != 1 {
+        return Err(anyhow::anyhow!(
+            "unexpected positional args: {:?}",
+            &rest[1..]
+        ));
+    }
+    let namespace = &rest[0];
 @@
 fn run_query(args: &[String]) -> Result<()> {
     let (opts, rest) = parse_opts(args)?;
 @@
+    if rest.len() > 2 {
+        return Err(anyhow::anyhow!(
+            "unexpected positional args: {:?}",
+            &rest[2..]
+        ));
+    }
     let namespace = &rest[0];
 @@
 fn run_status(args: &[String]) -> Result<()> {
     let (opts, rest) = parse_opts(args)?;
 @@
-    let namespace = &rest[0];
+    if rest.len() != 1 {
+        return Err(anyhow::anyhow!(
+            "unexpected positional args: {:?}",
+            &rest[1..]
+        ));
+    }
+    let namespace = &rest[0];
 @@
 fn run_rebuild(args: &[String]) -> Result<()> {
     let (opts, rest) = parse_opts(args)?;
 @@
-    let namespace = &rest[0];
+    if rest.len() != 1 {
+        return Err(anyhow::anyhow!(
+            "unexpected positional args: {:?}",
+            &rest[1..]
+        ));
+    }
+    let namespace = &rest[0];
```

Also applies to: 107-325
```rust
//! Event bus integration for tree_summarizer.
//!
//! Stub module — no active subscribers yet. Events are defined in
//! `event_bus/events.rs` and published from `engine.rs`.
```
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
# Verify whether tree_summarizer has a concrete event subscriber.
rg -n --type rust -C3 'impl\s+EventHandler\s+for|struct\s+\w+Subscriber|fn\s+name\(&self\)\s*->\s*&str' src/openhuman/tree_summarizer || true
# Verify startup/global registration points.
rg -n --type rust -C3 'subscribe_global\(|\.subscribe\(' src || true
# Verify any explicit tree_summarizer subscriber registration.
rg -n --type rust -C3 'tree_summarizer.*Subscriber|Subscriber.*tree_summarizer|tree_summarizer.*subscribe' src || true
```

Repository: tinyhumansai/openhuman
Length of output: 8876
🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
# Check if tree_summarizer has any event variants in DomainEvent enum
rg -n --type rust 'TreeSummarizer|tree_summarizer' src/openhuman/event_bus/events.rs || echo "No tree_summarizer events found"
# Show the DomainEvent enum definition to understand current structure
rg -A 50 'pub enum DomainEvent' src/openhuman/event_bus/events.rs | head -70
```

Repository: tinyhumansai/openhuman
Length of output: 2322
Complete event-bus integration for tree_summarizer: Add subscriber implementation to bus.rs and register it in startup.
Tree_summarizer defines three events (TreeSummarizerHourCompleted, TreeSummarizerPropagated, TreeSummarizerRebuildCompleted) in events.rs, but the bus.rs stub has no handler. Events emitted from engine.rs will not trigger any domain logic until a subscriber is registered. Create a <Purpose>Subscriber struct (e.g., TreeSummarizerEventSubscriber) that implements EventHandler, then register it via event_bus::subscribe_global() during startup—typically in channels/runtime/startup.rs following the pattern used for CronDeliverySubscriber and other domain handlers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/tree_summarizer/bus.rs` around lines 1 - 4, bus.rs is only a
stub and lacks a subscriber, so events TreeSummarizerHourCompleted,
TreeSummarizerPropagated, and TreeSummarizerRebuildCompleted (defined in
event_bus/events.rs and emitted from engine.rs) are never handled; implement a
TreeSummarizerEventSubscriber struct in src/openhuman/tree_summarizer/bus.rs
that implements the EventHandler trait (matching the handler signature used by
other domain subscribers) and processes those three event types, then register
it at startup by calling
event_bus::subscribe_global(Box::new(TreeSummarizerEventSubscriber::new())) in
the startup registration code (follow the pattern used by CronDeliverySubscriber
in channels/runtime/startup.rs) so the subscriber is active on application boot.
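Since the review never quotes the repo's actual `EventHandler` trait, a sketch can only assume a minimal stand-in. The trait definition, the `DomainEvent` variants' field names, and the handler body below are all assumptions; only the overall subscriber shape mirrors what the comment asks for.

```rust
// Stand-in definitions (assumptions): the real trait and event enum live in
// the repo's event_bus module and may have different signatures.
#[derive(Debug)]
enum DomainEvent {
    TreeSummarizerHourCompleted { namespace: String, hour_id: String },
    TreeSummarizerPropagated { namespace: String, node_id: String },
    TreeSummarizerRebuildCompleted { namespace: String },
    Other,
}

trait EventHandler {
    fn name(&self) -> &str;
    fn handle(&self, event: &DomainEvent);
}

struct TreeSummarizerEventSubscriber;

impl TreeSummarizerEventSubscriber {
    fn new() -> Self {
        Self
    }
}

impl EventHandler for TreeSummarizerEventSubscriber {
    fn name(&self) -> &str {
        "tree_summarizer"
    }

    // Handle only the three tree_summarizer events; ignore everything else.
    fn handle(&self, event: &DomainEvent) {
        match event {
            DomainEvent::TreeSummarizerHourCompleted { namespace, hour_id } => {
                println!("[tree_summarizer] hour {hour_id} completed in {namespace}");
            }
            DomainEvent::TreeSummarizerPropagated { namespace, node_id } => {
                println!("[tree_summarizer] propagated to {node_id} in {namespace}");
            }
            DomainEvent::TreeSummarizerRebuildCompleted { namespace } => {
                println!("[tree_summarizer] rebuild completed for {namespace}");
            }
            _ => {} // events owned by other domains
        }
    }
}

fn main() {
    let sub = TreeSummarizerEventSubscriber::new();
    sub.handle(&DomainEvent::TreeSummarizerRebuildCompleted { namespace: "notes".into() });
    assert_eq!(sub.name(), "tree_summarizer");
}
```

In the real repo the instance would then be registered at startup via `event_bus::subscribe_global(...)`, as the comment describes.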
```rust
fn sanitize(namespace: &str) -> String {
    namespace
        .replace(['/', '\\', ':', '*', '?', '"', '<', '>', '|'], "_")
        .trim()
        .to_string()
}
```
Lossy namespace sanitization can merge unrelated trees.
Replacing reserved characters with _ makes distinct namespaces collide on disk ("a/b" and "a:b" both become a_b). The hourly scanner then only sees the sanitized directory name, so logs/events can drift from the original namespace as well. Reject invalid characters up front or use a reversible encoding.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/tree_summarizer/store.rs` around lines 42 - 47, The current
sanitize(namespace: &str) function lossy-replaces reserved chars which can
collapse distinct namespaces (e.g., "a/b" vs "a:b"); change it to either (A)
perform reversible encoding (e.g., percent-encoding or url-safe Base64) so the
on-disk folder maps 1:1 back to the original namespace, or (B) reject namespaces
containing invalid characters by returning a Result and propagating validation
to callers; update sanitize's signature and all call sites accordingly
(reference: sanitize function in store.rs) so that callers handle the Result or
use the reversible encoding output instead of the previous underscore-mapped
string.
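Option A (reversible encoding) can be sketched with plain percent-encoding over the same reserved set the current `sanitize` replaces. This is an illustrative sketch, not the repo's code; in practice the `percent-encoding` crate would do the same job.

```rust
// Reserved characters plus '%' itself, so the encoding is unambiguous
// and distinct namespaces like "a/b" and "a:b" map to distinct directories.
const RESERVED: &[char] = &['/', '\\', ':', '*', '?', '"', '<', '>', '|', '%'];

fn encode_namespace(namespace: &str) -> String {
    let mut out = String::with_capacity(namespace.len());
    for c in namespace.chars() {
        if RESERVED.contains(&c) {
            let mut buf = [0u8; 4];
            for b in c.encode_utf8(&mut buf).as_bytes() {
                out.push_str(&format!("%{b:02X}"));
            }
        } else {
            out.push(c);
        }
    }
    out
}

fn decode_namespace(encoded: &str) -> Option<String> {
    let mut bytes = Vec::with_capacity(encoded.len());
    let mut chars = encoded.chars();
    while let Some(c) = chars.next() {
        if c == '%' {
            // Two hex digits follow every '%'; anything else is malformed.
            let hi = chars.next()?;
            let lo = chars.next()?;
            bytes.push(u8::from_str_radix(&format!("{hi}{lo}"), 16).ok()?);
        } else {
            let mut buf = [0u8; 4];
            bytes.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
        }
    }
    String::from_utf8(bytes).ok()
}

fn main() {
    assert_ne!(encode_namespace("a/b"), encode_namespace("a:b")); // no collision
    assert_eq!(encode_namespace("a/b"), "a%2Fb");
    assert_eq!(decode_namespace("a%2Fb").as_deref(), Some("a/b"));
    assert_eq!(decode_namespace(&encode_namespace("100% done")).as_deref(), Some("100% done"));
}
```

The round-trip property (`decode(encode(ns)) == ns`) is what restores the 1:1 mapping between on-disk folders and namespaces that the lossy underscore replacement loses.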
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/core/tree_summarizer_cli.rs (1)
45-87: ⚠️ Potential issue | 🟠 Major

Reject unknown flags and unexpected extra positional args.

Line 84 currently treats unknown options as positional args, and later handlers (Line 124, Line 185, Line 270, Line 305) consume only the first positional value. Typos like `--verbsoe` or stray args are silently accepted and can run the wrong command shape.

🔧 Suggested fix (strict option + arity validation)

```diff
 fn parse_opts(args: &[String]) -> Result<(CliOpts, Vec<String>)> {
 @@
-            _ => {
+            other if other.starts_with('-') => {
+                return Err(anyhow::anyhow!("unknown option: {other}"));
+            }
+            _ => {
                 rest.push(args[i].clone());
                 i += 1;
             }
         }
     }
 @@
 fn run_ingest(args: &[String]) -> Result<()> {
     let (opts, rest) = parse_opts(args)?;
 @@
-    let namespace = &rest[0];
+    if rest.len() != 1 {
+        return Err(anyhow::anyhow!("unexpected positional args: {:?}", &rest[1..]));
+    }
+    let namespace = &rest[0];
 @@
 fn run_summarize(args: &[String]) -> Result<()> {
 @@
-    let namespace = &rest[0];
+    if rest.len() != 1 {
+        return Err(anyhow::anyhow!("unexpected positional args: {:?}", &rest[1..]));
+    }
+    let namespace = &rest[0];
 @@
 fn run_query(args: &[String]) -> Result<()> {
 @@
+    if rest.len() > 2 {
+        return Err(anyhow::anyhow!("unexpected positional args: {:?}", &rest[2..]));
+    }
     let namespace = &rest[0];
 @@
 fn run_status(args: &[String]) -> Result<()> {
 @@
-    let namespace = &rest[0];
+    if rest.len() != 1 {
+        return Err(anyhow::anyhow!("unexpected positional args: {:?}", &rest[1..]));
+    }
+    let namespace = &rest[0];
 @@
 fn run_rebuild(args: &[String]) -> Result<()> {
 @@
-    let namespace = &rest[0];
+    if rest.len() != 1 {
+        return Err(anyhow::anyhow!("unexpected positional args: {:?}", &rest[1..]));
+    }
+    let namespace = &rest[0];
```

Also applies to: 124-125, 185-186, 230-235, 270-271, 305-306
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/tree_summarizer_cli.rs` around lines 45 - 87, In parse_opts, stop treating unknown options as positional: change the default branch so any arg starting with '-' returns an Err with a clear message (e.g., "unknown option: ...") instead of pushing into rest, and keep allowing non-flag tokens to be collected into rest; after the loop validate rest.len() against the expected arity (reject unexpected extra positional args) and return an Err if there are too many; refer to parse_opts and the local vars rest, content, file, node_id, verbose when making these changes so flag/arity validation is enforced consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/core/tree_summarizer_cli.rs`:
- Around line 214-215: The CLI help text claims the query returns a node and its
direct children, but the backend RPC in tree_summarizer::ops
(src/openhuman/tree_summarizer/ops.rs) currently returns only {"node":
node.summary}; fix by either updating the help text in tree_summarizer_cli (the
println lines) to accurately state that only the node summary is returned, or
extend the RPC response in tree_summarizer::ops (e.g., add a "children" field
with direct children summaries) and update any associated serialization/response
types so the CLI text remains correct.
- Around line 339-342: The code currently swallows config load/init errors by
calling unwrap_or_default on crate::openhuman::config::Config::load_or_init(),
which can hide I/O/parse/permission problems; change the handling to propagate
the error instead of falling back to defaults (e.g., remove unwrap_or_default
and use the ? operator or return a Result) so that the failure from
Config::load_or_init() surfaces, and then call config.apply_env_overrides() only
after a successful load.
---
Duplicate comments:
In `@src/core/tree_summarizer_cli.rs`:
- Around line 45-87: In parse_opts, stop treating unknown options as positional:
change the default branch so any arg starting with '-' returns an Err with a
clear message (e.g., "unknown option: ...") instead of pushing into rest, and
keep allowing non-flag tokens to be collected into rest; after the loop validate
rest.len() against the expected arity (reject unexpected extra positional args)
and return an Err if there are too many; refer to parse_opts and the local vars
rest, content, file, node_id, verbose when making these changes so flag/arity
validation is enforced consistently.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 376c2f80-ed5f-4d6f-8ea4-7a0e35a1dec0
📒 Files selected for processing (3)
- src/core/cli.rs
- src/core/mod.rs
- src/core/tree_summarizer_cli.rs
✅ Files skipped from review due to trivial changes (2)
- src/core/mod.rs
- src/core/cli.rs
```rust
println!("Read a summary tree node and its direct children.");
println!();
```
Query help text over-promises response shape.
Line 214 says query reads “direct children,” but src/openhuman/tree_summarizer/ops.rs currently returns only {"node": node.summary}. Please align the help text or expand the RPC response.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/core/tree_summarizer_cli.rs` around lines 214 - 215, The CLI help text
claims the query returns a node and its direct children, but the backend RPC in
tree_summarizer::ops (src/openhuman/tree_summarizer/ops.rs) currently returns
only {"node": node.summary}; fix by either updating the help text in
tree_summarizer_cli (the println lines) to accurately state that only the node
summary is returned, or extend the RPC response in tree_summarizer::ops (e.g.,
add a "children" field with direct children summaries) and update any associated
serialization/response types so the CLI text remains correct.
```rust
let mut config = crate::openhuman::config::Config::load_or_init()
    .await
    .unwrap_or_default();
config.apply_env_overrides();
```
Don’t silently fall back to default config on load/init failure.
Line 341 (unwrap_or_default) hides real config errors (I/O/parse/permission) and proceeds with defaults, which can cause unintended writes/behavior. Return the error instead.
🔧 Suggested fix

```diff
 async fn load_config() -> Result<crate::openhuman::config::Config> {
-    let mut config = crate::openhuman::config::Config::load_or_init()
-        .await
-        .unwrap_or_default();
+    let mut config = crate::openhuman::config::Config::load_or_init()
+        .await
+        .map_err(|e| anyhow::anyhow!("failed to load/init config: {e}"))?;
     config.apply_env_overrides();
     Ok(config)
 }
```

📝 Committable suggestion
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
let mut config = crate::openhuman::config::Config::load_or_init()
    .await
    .map_err(|e| anyhow::anyhow!("failed to load/init config: {e}"))?;
config.apply_env_overrides();
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/core/tree_summarizer_cli.rs` around lines 339 - 342, The code currently
swallows config load/init errors by calling unwrap_or_default on
crate::openhuman::config::Config::load_or_init(), which can hide
I/O/parse/permission problems; change the handling to propagate the error
instead of falling back to defaults (e.g., remove unwrap_or_default and use the
? operator or return a Result) so that the failure from Config::load_or_init()
surfaces, and then call config.apply_env_overrides() only after a successful
load.
…servability logging

- Added a new `TreeSummarizerEventSubscriber` to log events related to tree summarization, enhancing observability.
- Updated the `start_channels` function to register the new subscriber.
- Refactored the `run_summarization` function to group buffered entries by hour and publish events upon completion of summarization.
- Improved documentation and added tests for the new subscriber functionality.

This update aims to provide better insights into the summarization process and facilitate future cross-module workflows.

…ature

- Simplified the buffer backup process by consolidating the rename operation with context handling for better error reporting.
- Cleaned up the function signature of `derive_node_ids_from_hour_id` for improved readability.

These changes enhance code clarity and maintainability within the tree summarization engine.

- Updated the `tree_summarizer_ingest` function to accept an optional metadata parameter, allowing users to include additional context during content ingestion.
- Refactored related functions to validate and handle metadata, improving the overall robustness of the summarization process.
- Adjusted the buffer write functionality to store metadata alongside content, enhancing the data structure for future retrieval and processing.

These changes aim to enrich the summarization capabilities and provide more context for ingested content.

…D validation

- Enhanced the formatting of error messages in the `validate_node_id` function for better readability and consistency.
- Adjusted the string formatting to use multi-line syntax, improving clarity in error reporting.
- Minor formatting changes in the `strip_buffer_frontmatter` function to enhance code readability.

These changes aim to improve the maintainability and clarity of error handling within the tree summarization module.
Actionable comments posted: 2
♻️ Duplicate comments (2)
src/openhuman/tree_summarizer/engine.rs (2)
173-196: ⚠️ Potential issue | 🔴 Critical

Move the buffer backup outside `tree/` before calling `delete_tree()`.

`buffer_path.with_file_name("buffer_backup")` resolves to `<tree>/buffer_backup`, but `store::delete_tree()` removes the whole `<tree>` directory. The backup is gone before the restore branch runs, so unsummarized buffered input is still lost during rebuild.

💡 Proposed fix

```diff
 // Preserve the buffer directory by moving it to a temp location
 let buffer_path = store::buffer_dir(config, namespace);
-let buffer_backup = buffer_path.with_file_name("buffer_backup");
+let buffer_backup = base
+    .parent()
+    .unwrap_or(&base)
+    .join(".tree_buffer_backup");
 let buffer_existed = buffer_path.exists();
```
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/tree_summarizer/engine.rs` around lines 173 - 196, The buffer backup is being created inside the tree directory so store::delete_tree(namespace) removes it; change the backup location to be outside the tree before calling store::delete_tree() by computing buffer_backup as a path outside the tree (e.g. sibling or temp directory) instead of buffer_path.with_file_name("buffer_backup"), then perform the rename from store::buffer_dir(config, namespace) to that external buffer_backup, call store::delete_tree(config, namespace), and finally rename back from that external buffer_backup to store::buffer_dir(config, namespace) in the restore branch; update references to buffer_path, buffer_backup and the restore logic in engine.rs accordingly.
365-379: ⚠️ Potential issue | 🟠 Major

The per-node response cap is effectively disabled here.

Line 366 compares against `max_chars.max(MAX_SUMMARY_CHARS)`, which is always the global 80k ceiling for the current hour/day/month/year/root budgets. Non-root summaries can therefore exceed their own cap without truncation and keep inflating parent nodes.

💡 Proposed fix

```diff
-    let response = if response.len() > max_chars.max(MAX_SUMMARY_CHARS) {
+    let hard_limit = max_chars.min(MAX_SUMMARY_CHARS);
+    let response = if response.len() > hard_limit {
         tracing::warn!(
             "[tree_summarizer] LLM response for node {} (level={}) was {} chars, truncating to {} chars",
             node_id,
             level_name,
             response.len(),
-            max_chars
+            hard_limit
         );
         // Truncate at a char boundary
-        let truncated = &response[..response.floor_char_boundary(max_chars)];
+        let truncated = &response[..response.floor_char_boundary(hard_limit)];
         truncated.to_string()
     } else {
         response
     };
```
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/tree_summarizer/engine.rs` around lines 365 - 379, The per-node cap is effectively disabled because the code uses max_chars.max(MAX_SUMMARY_CHARS) (which yields the global MAX_SUMMARY_CHARS) when checking/truncating the LLM response for node_id/level_name; change the logic to compute an enforced limit as the minimum of max_chars and MAX_SUMMARY_CHARS (e.g., let limit = std::cmp::min(max_chars, MAX_SUMMARY_CHARS)) and then compare response.len() against that limit, use response.floor_char_boundary(limit) for truncation, and update the tracing::warn message to report the applied limit rather than max_chars.max(...).
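The `min`-capped truncation from the proposed fix above can be shown standalone. Note that `str::floor_char_boundary` is nightly-only at the time of writing, so this sketch uses a stable `is_char_boundary` walk instead; the cap values are illustrative, not the repo's constants.

```rust
/// Enforce the per-level cap as min(level_cap, global_cap) and truncate at
/// a UTF-8 char boundary. Using `.max(..)` instead of `.min(..)` is exactly
/// the bug flagged in the review: it lets every level grow to the global cap.
fn truncate_summary(response: &str, level_cap: usize, global_cap: usize) -> String {
    let hard_limit = level_cap.min(global_cap);
    if response.len() <= hard_limit {
        return response.to_string();
    }
    // Walk back to the nearest UTF-8 boundary at or below the limit
    // (stable stand-in for the nightly `floor_char_boundary`).
    let mut cut = hard_limit;
    while cut > 0 && !response.is_char_boundary(cut) {
        cut -= 1;
    }
    response[..cut].to_string()
}

fn main() {
    assert_eq!(truncate_summary("hello", 10, 80_000), "hello");
    assert_eq!(truncate_summary("hello world", 5, 80_000), "hello");
    // A multi-byte char straddling the limit is dropped, not split:
    assert_eq!(truncate_summary("héllo", 2, 80_000), "h");
}
```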
🧹 Nitpick comments (1)
src/openhuman/channels/runtime/startup.rs (1)
429-432: Use `event_bus::subscribe_global()` for startup registration.

This works because `bus` is the global singleton here, but new startup subscribers are supposed to register through the global helper rather than the instance method. Moving this one to `subscribe_global()` makes the tree-summarizer wiring match the repo convention and keeps new startup code from depending on the local bus handle.

Based on learnings: Register subscribers in startup (e.g., `channels/runtime/startup.rs`) via the singleton `event_bus::subscribe_global()` function.
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/channels/runtime/startup.rs` around lines 429 - 432, Replace the instance subscription call using the local bus handle with the global helper: instead of calling bus.subscribe(Arc::new(crate::openhuman::tree_summarizer::bus::TreeSummarizerEventSubscriber::new())), register the subscriber via event_bus::subscribe_global(Arc::new(...)). Locate the use of TreeSummarizerEventSubscriber and swap the subscribe invocation to event_bus::subscribe_global so startup registration follows the repo convention and stops depending on the local bus variable.
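The convention the reviewer cites can be illustrated with a generic singleton bus. The shapes below (`Subscriber`, `publish_global`, the `OnceLock` static) are illustrative stand-ins, not the repo's actual `event_bus` API:

```rust
use std::sync::{Arc, Mutex, OnceLock};

trait Subscriber: Send + Sync {
    fn on_event(&self, event: &str);
}

// Process-wide singleton bus, initialized lazily on first subscribe.
static BUS: OnceLock<Mutex<Vec<Arc<dyn Subscriber>>>> = OnceLock::new();

fn subscribe_global(sub: Arc<dyn Subscriber>) {
    BUS.get_or_init(|| Mutex::new(Vec::new()))
        .lock()
        .unwrap()
        .push(sub);
}

fn publish_global(event: &str) {
    if let Some(bus) = BUS.get() {
        for sub in bus.lock().unwrap().iter() {
            sub.on_event(event);
        }
    }
}

struct TreeSummarizerEventSubscriber;

impl Subscriber for TreeSummarizerEventSubscriber {
    fn on_event(&self, event: &str) {
        println!("tree_summarizer saw: {event}");
    }
}

fn main() {
    // Startup code registers through the global helper, never a local handle.
    subscribe_global(Arc::new(TreeSummarizerEventSubscriber));
    publish_global("summarization.completed");
}
```

The point of the convention is that startup wiring stays identical whether or not the caller happens to hold a reference to the bus instance.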
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/openhuman/tree_summarizer/engine.rs`:
- Around line 476-480: The code currently silently skips parse failures for hour
files causing malformed leaves to be lost; change the handling around
parse_node_markdown_pub so that instead of ignoring Err, you return an error
(propagate) including the node_id context (e.g., format!("failed to parse hour
leaf {}: {}", node_id, err)) — replace the if let Ok(...) block in the loop that
fills leaves with a match or ?-style propagation so parse_node_markdown_pub
errors abort the rebuild_tree, ensuring hour_leaves are not written when a file
is unreadable.
- Around line 41-45: buffer_drain() currently reads-and-deletes buffer files
causing permanent loss if subsequent writes fail; change the flow so
buffer_drain only reads non-destructively (or provides a read_without_delete
API) and delete each buffer file or batch only after its corresponding hour leaf
is durably written (i.e., after the hour-leaf write/flush/rename succeeds in the
tree-summarizer logic). In rebuild_tree(), stop creating the backup with
with_file_name() under the tree directory (which delete_tree() later removes);
instead move/rename the buffer directory to a sibling path outside the tree
(e.g., parent.join("buffer_backup")) before calling delete_tree(), so the backup
survives deletion; update references to buffer_drain, rebuild_tree, delete_tree,
and the with_file_name() backup creation to implement these two safeguards
(read-without-delete + post-write delete, and moving backup outside the tree).
---
Duplicate comments:
In `@src/openhuman/tree_summarizer/engine.rs`:
- Around line 173-196: The buffer backup is being created inside the tree
directory so store::delete_tree(namespace) removes it; change the backup
location to be outside the tree before calling store::delete_tree() by computing
buffer_backup as a path outside the tree (e.g. sibling or temp directory)
instead of buffer_path.with_file_name("buffer_backup"), then perform the rename
from store::buffer_dir(config, namespace) to that external buffer_backup, call
store::delete_tree(config, namespace), and finally rename back from that
external buffer_backup to store::buffer_dir(config, namespace) in the restore
branch; update references to buffer_path, buffer_backup and the restore logic in
engine.rs accordingly.
- Around line 365-379: The per-node cap is effectively disabled because the code
uses max_chars.max(MAX_SUMMARY_CHARS) (which yields the global
MAX_SUMMARY_CHARS) when checking/truncating the LLM response for
node_id/level_name; change the logic to compute an enforced limit as the minimum
of max_chars and MAX_SUMMARY_CHARS (e.g., let limit = std::cmp::min(max_chars,
MAX_SUMMARY_CHARS)) and then compare response.len() against that limit, use
response.floor_char_boundary(limit) for truncation, and update the tracing::warn
message to report the applied limit rather than max_chars.max(...).
---
Nitpick comments:
In `@src/openhuman/channels/runtime/startup.rs`:
- Around line 429-432: Replace the instance subscription call using the local
bus handle with the global helper: instead of calling
bus.subscribe(Arc::new(crate::openhuman::tree_summarizer::bus::TreeSummarizerEventSubscriber::new())),
register the subscriber via event_bus::subscribe_global(Arc::new(...)). Locate
the use of TreeSummarizerEventSubscriber and swap the subscribe invocation to
event_bus::subscribe_global so startup registration follows the repo convention
and stops depending on the local bus variable.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b2d48290-f2b1-419a-a9c5-a7419a058d6f
📒 Files selected for processing (3)
- src/openhuman/channels/runtime/startup.rs
- src/openhuman/tree_summarizer/bus.rs
- src/openhuman/tree_summarizer/engine.rs
…n process

- Replaced the buffer draining mechanism with a non-destructive read approach, allowing for safer data handling during summarization.
- Introduced a new `buffer_delete` function to explicitly manage the deletion of buffer entries after successful processing.
- Updated the `run_summarization` function to reflect these changes, ensuring that buffer entries are only deleted after durable writes are confirmed.
- Improved the backup process for the buffer directory during tree rebuilds, ensuring it is preserved outside the tree structure.

These modifications aim to improve data integrity and clarity in the summarization workflow.
…dling

- Updated the `parse_node_markdown` function to trim trailing whitespace from the body after splitting frontmatter, enhancing data cleanliness.
- Modified test cases to use specific timestamps instead of the current time, ensuring consistent and predictable test results.
- Adjusted assertions in tests to reflect the new timestamp-based ordering of entries.

These changes aim to improve the robustness of markdown parsing and the reliability of test outcomes in the tree summarization module.
Actionable comments posted: 3
♻️ Duplicate comments (3)
src/core/tree_summarizer_cli.rs (2)
338-343: ⚠️ Potential issue | 🟠 Major

Propagate config load/init errors instead of defaulting.

Falling back to `Config::default()` on any `load_or_init()` error masks parse/I/O/permission problems and can send the CLI to a different workspace than the user configured.

🩹 Proposed fix
```diff
 async fn load_config() -> Result<crate::openhuman::config::Config> {
     let mut config = crate::openhuman::config::Config::load_or_init()
         .await
-        .unwrap_or_default();
+        .map_err(|e| anyhow::anyhow!("failed to load/init config: {e}"))?;
     config.apply_env_overrides();
     Ok(config)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/tree_summarizer_cli.rs` around lines 338 - 343, The current load_config function swallows load/init errors by using unwrap_or_default on crate::openhuman::config::Config::load_or_init(), which can hide parse/I/O/permission problems; change that call to propagate errors (use ? on the awaited Result from Config::load_or_init()) so the function returns the actual error instead of falling back to Config::default(), keep calling config.apply_env_overrides() afterwards and return Ok(config) as before.
45-99: ⚠️ Potential issue | 🟠 Major

Fail fast on unknown flags and stray positional args.

`parse_opts()` still pushes every unrecognized token into `rest`, and each subcommand then reads only the first positional argument. Typos like `--verbsoe`, duplicate namespaces, or a conflicting positional `node_id` can silently run with the wrong inputs instead of erroring.

Also applies to: 124-125, 185-185, 230-234, 270-270, 305-305
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/tree_summarizer_cli.rs` around lines 45 - 99, parse_opts currently accepts any unknown token into rest which hides typos and stray args; change it to fail fast: inside parse_opts (function parse_opts and the CliOpts handling) update the default match arm so that if args[i] starts with '-' you return Err(anyhow::anyhow!("unknown flag: {}", args[i])); otherwise push positional args into rest but after parsing validate rest.len() (e.g. ensure at most one positional or whatever the CLI expects) and return Err(...) if there are unexpected extra positional arguments; apply the same change pattern to the other parse functions/locations referenced (lines around 124-125, 185, 230-234, 270, 305) so unknown flags and extra positional args error out instead of being silently accepted.

src/openhuman/tree_summarizer/store.rs (1)
24-30: ⚠️ Potential issue | 🟠 Major

Namespace-to-path mapping is still lossy.

`sanitize()` still collapses distinct namespaces onto the same on-disk folder (`a/b`, `a:b`, `a.b` → `a_b`), and `validate_namespace()` lets those inputs through. That can merge unrelated trees and make status/delete operate on the wrong namespace. Reject reserved characters up front or switch to a reversible encoding before joining the path.

Also applies to: 43-65
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/tree_summarizer/store.rs` around lines 24 - 30, The namespace-to-path mapping is lossy because sanitize() collapses distinct inputs (e.g. '/', ':', '.') into the same folder; update validate_namespace() to explicitly reject reserved/path characters (at minimum '/', '\\', ':', '.', and any OS path separator or null) so those inputs are refused rather than sanitized, and add the same validation checks where namespaces are accepted; keep tree_dir() (and the other namespace-to-path helpers in the same file around lines 43-65) unchanged in path construction but rely on the strengthened validate_namespace() to prevent ambiguous, colliding namespace values.
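A minimal sketch of the strict validation the comment asks for, assuming a `validate_namespace(&str) -> Result<(), String>` shape (the real function's signature and error type may differ):

```rust
fn validate_namespace(ns: &str) -> Result<(), String> {
    if ns.is_empty() {
        return Err("namespace must not be empty".into());
    }
    // '/', '\\', ':' and '.' are path-significant or get lossily sanitized,
    // so distinct namespaces could alias the same on-disk directory.
    const RESERVED: &[char] = &['/', '\\', ':', '.', '\0'];
    if let Some(c) = ns.chars().find(|c| RESERVED.contains(c)) {
        return Err(format!("namespace contains reserved character {c:?}"));
    }
    Ok(())
}

fn main() {
    assert!(validate_namespace("work_notes").is_ok());
    // All three of these previously sanitized to the same folder "a_b".
    for bad in ["a/b", "a:b", "a.b", ""] {
        assert!(validate_namespace(bad).is_err());
    }
    println!("ok");
}
```

Rejecting up front is simpler than a reversible encoding (e.g. percent-encoding) and keeps on-disk names human-readable, at the cost of refusing some inputs.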
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/openhuman/tree_summarizer/store.rs`:
- Around line 259-319: The current nested loops over std::fs::read_dir use
into_iter().flatten().flatten() which silently swallows IO and DirEntry errors;
change each loop (the one iterating over base, year_dir, month_dir, day_dir, and
hour entries) to explicitly iterate over the Result returned by
std::fs::read_dir (for entry_res in std::fs::read_dir(&some_dir) { let entry =
entry_res?; ... }) and similarly handle file_type() calls using ? where they
return io::Result so filesystem errors propagate instead of being ignored;
update the blocks that reference name, month_entry.file_name(),
day_entry.file_name(), hname, timestamp_from_hour_path, oldest, newest and depth
to use the unwrapped entry variables after the ? propagation so any
permission/corruption error bubbles up as an Err from this function rather than
producing a partial TreeStatus.
- Around line 370-376: buffer_write currently writes metadata into a YAML
frontmatter block but buffer_drain blindly strips the first frontmatter it
finds, causing store-owned metadata to be discarded and also removing legitimate
user frontmatter from ingested files; update buffer_drain (and the shared
frontmatter helper used around buffer_write/buffer_drain) to detect and parse a
store-specific marker (e.g., a unique key like "op_store_metadata" or similar)
in the frontmatter, return that parsed metadata separately instead of removing
arbitrary frontmatter, and only strip the frontmatter when it matches the
store-owned marker; keep standard Markdown frontmatter intact for files that
don’t include the store marker.
- Around line 421-430: The current loop over entries that calls
std::fs::remove_file(path) can abort partway and drop already-read buffers;
change this to a two-phase (staging) delete: first atomically move/rename each
file to a temporary "to-delete" location (use std::fs::rename into a temp dir or
path.with_extension(".deleting") inside the same filesystem) in the same loop
that currently iterates entries, then in a second loop remove_file() only from
the temp locations; ensure removal attempts continue for all staged files and
aggregate or return a combined error instead of early-returning on the first
failure. This touches the loop variable entries and the remove_file calls shown
in the diff — implement atomic rename for staging and robust final deletion over
the staged names.
---
Duplicate comments:
In `@src/core/tree_summarizer_cli.rs`:
- Around line 338-343: The current load_config function swallows load/init
errors by using unwrap_or_default on
crate::openhuman::config::Config::load_or_init(), which can hide
parse/I/O/permission problems; change that call to propagate errors (use ? on
the awaited Result from Config::load_or_init()) so the function returns the
actual error instead of falling back to Config::default(), keep calling
config.apply_env_overrides() afterwards and return Ok(config) as before.
- Around line 45-99: parse_opts currently accepts any unknown token into rest
which hides typos and stray args; change it to fail fast: inside parse_opts
(function parse_opts and the CliOpts handling) update the default match arm so
that if args[i] starts with '-' you return Err(anyhow::anyhow!("unknown flag:
{}", args[i])); otherwise push positional args into rest but after parsing
validate rest.len() (e.g. ensure at most one positional or whatever the CLI
expects) and return Err(...) if there are unexpected extra positional arguments;
apply the same change pattern to the other parse functions/locations referenced
(lines around 124-125, 185, 230-234, 270, 305) so unknown flags and extra
positional args error out instead of being silently accepted.
In `@src/openhuman/tree_summarizer/store.rs`:
- Around line 24-30: The namespace-to-path mapping is lossy because sanitize()
collapses distinct inputs (e.g. '/', ':', '.') into the same folder; update
validate_namespace() to explicitly reject reserved/path characters (at minimum
'/', '\\', ':', '.', and any OS path separator or null) so those inputs are
refused rather than sanitized, and add the same validation checks where
namespaces are accepted; keep tree_dir() (and the other namespace-to-path
helpers in the same file around lines 43-65) unchanged in path construction but
rely on the strengthened validate_namespace() to prevent ambiguous, colliding
namespace values.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2f21ba8a-b832-4e58-8119-1d39ba535846
⛔ Files ignored due to path filters (1)
`app/src-tauri/Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (4)
- src/core/tree_summarizer_cli.rs
- src/openhuman/tree_summarizer/ops.rs
- src/openhuman/tree_summarizer/schemas.rs
- src/openhuman/tree_summarizer/store.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- src/openhuman/tree_summarizer/schemas.rs
- src/openhuman/tree_summarizer/ops.rs
```rust
if base.exists() {
    for entry in std::fs::read_dir(&base).into_iter().flatten().flatten() {
        let name = entry.file_name().to_string_lossy().to_string();
        if entry.file_type().map(|t| t.is_dir()).unwrap_or(false) && name.len() == 4 {
            if depth < 2 {
                depth = 2;
            }
            // Scan months, days, hours inside
            let year_dir = entry.path();
            for month_entry in std::fs::read_dir(&year_dir).into_iter().flatten().flatten() {
                if month_entry.file_type().map(|t| t.is_dir()).unwrap_or(false) {
                    if depth < 3 {
                        depth = 3;
                    }
                    let month_dir = month_entry.path();
                    for day_entry in std::fs::read_dir(&month_dir)
                        .into_iter()
                        .flatten()
                        .flatten()
                    {
                        if day_entry.file_type().map(|t| t.is_dir()).unwrap_or(false) {
                            if depth < 4 {
                                depth = 4;
                            }
                            // Check for hour .md files
                            let day_dir = day_entry.path();
                            for hour_entry in
                                std::fs::read_dir(&day_dir).into_iter().flatten().flatten()
                            {
                                let hname =
                                    hour_entry.file_name().to_string_lossy().to_string();
                                if hname.ends_with(".md") && hname != "summary.md" {
                                    if depth < 5 {
                                        depth = 5;
                                    }
                                    // Try to parse timestamp from path
                                    if let Some(ts) = timestamp_from_hour_path(
                                        &name,
                                        &month_entry.file_name().to_string_lossy().to_string(),
                                        &day_entry.file_name().to_string_lossy().to_string(),
                                        &hname,
                                    ) {
                                        match &oldest {
                                            None => oldest = Some(ts),
                                            Some(o) if ts < *o => oldest = Some(ts),
                                            _ => {}
                                        }
                                        match &newest {
                                            None => newest = Some(ts),
                                            Some(n) if ts > *n => newest = Some(ts),
                                            _ => {}
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
Status scanning shouldn't silently suppress filesystem errors.

The nested `into_iter().flatten().flatten()` loops discard `read_dir` and `DirEntry` errors and then return a partial `TreeStatus` as success. A permission problem or corrupt entry will look like a smaller tree instead of surfacing an actual failure. Use explicit iteration with `?` here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/tree_summarizer/store.rs` around lines 259 - 319, The current
nested loops over std::fs::read_dir use into_iter().flatten().flatten() which
silently swallows IO and DirEntry errors; change each loop (the one iterating
over base, year_dir, month_dir, day_dir, and hour entries) to explicitly iterate
over the Result returned by std::fs::read_dir (for entry_res in
std::fs::read_dir(&some_dir) { let entry = entry_res?; ... }) and similarly
handle file_type() calls using ? where they return io::Result so filesystem
errors propagate instead of being ignored; update the blocks that reference
name, month_entry.file_name(), day_entry.file_name(), hname,
timestamp_from_hour_path, oldest, newest and depth to use the unwrapped entry
variables after the ? propagation so any permission/corruption error bubbles up
as an Err from this function rather than producing a partial TreeStatus.
```rust
// If metadata is provided, write it as a YAML frontmatter block
let file_content = if let Some(meta) = metadata {
    let meta_str = serde_json::to_string(meta).unwrap_or_default();
    format!("---\nmetadata: {meta_str}\n---\n\n{content}")
} else {
    content.to_string()
};
```
`buffer_drain()` treats any leading frontmatter as disposable metadata.

`buffer_write()` stores metadata only in the staged file's frontmatter, but `buffer_drain()` returns just the stripped body, so that metadata never reaches callers. The same helper also strips the first frontmatter block from ordinary Markdown content when the user ingests a file that already starts with `---`. Parse a store-owned marker and return metadata separately instead of blanket-stripping the first block.

Also applies to: 416-418, 442-455
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/tree_summarizer/store.rs` around lines 370 - 376, buffer_write
currently writes metadata into a YAML frontmatter block but buffer_drain blindly
strips the first frontmatter it finds, causing store-owned metadata to be
discarded and also removing legitimate user frontmatter from ingested files;
update buffer_drain (and the shared frontmatter helper used around
buffer_write/buffer_drain) to detect and parse a store-specific marker (e.g., a
unique key like "op_store_metadata" or similar) in the frontmatter, return that
parsed metadata separately instead of removing arbitrary frontmatter, and only
strip the frontmatter when it matches the store-owned marker; keep standard
Markdown frontmatter intact for files that don’t include the store marker.
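One way to implement the store-owned marker, sketched with a hypothetical `op_store_metadata` key written as the first frontmatter line (the real helper may choose a different key, or a proper YAML parser instead of string slicing):

```rust
// Splits store-owned frontmatter from the body. User frontmatter without
// the marker key is left intact in the returned body.
fn split_store_frontmatter(content: &str) -> (Option<String>, &str) {
    if let Some(rest) = content.strip_prefix("---\n") {
        if let Some(end) = rest.find("\n---\n") {
            let block = &rest[..end];
            if let Some(meta) = block.strip_prefix("op_store_metadata: ") {
                let body = rest[end + 5..].trim_start_matches('\n');
                return (Some(meta.to_string()), body);
            }
        }
    }
    // No store marker: return the content untouched.
    (None, content)
}

fn main() {
    let store_file = "---\nop_store_metadata: {\"source\":\"cli\"}\n---\n\nhello";
    let (meta, body) = split_store_frontmatter(store_file);
    assert_eq!(meta.as_deref(), Some("{\"source\":\"cli\"}"));
    assert_eq!(body, "hello");

    // Ordinary user frontmatter survives untouched.
    let user_file = "---\ntitle: notes\n---\n\nbody";
    let (meta, body) = split_store_frontmatter(user_file);
    assert!(meta.is_none());
    assert_eq!(body, user_file);
    println!("ok");
}
```

Because only the marker key triggers stripping, ingested Markdown that happens to start with `---` round-trips through the buffer unchanged.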
```rust
// Delete after successful reads — propagate errors to prevent duplicates
for (name, path) in &entries {
    std::fs::remove_file(path).with_context(|| {
        format!(
            "failed to remove buffer entry '{}' at {}",
            name,
            path.display()
        )
    })?;
}
```
Abort-on-first-delete can drop already-drained entries.

If one `remove_file()` call fails, earlier files in the loop have already been deleted but their contents are never returned because the function exits with `Err`. A single cleanup failure can therefore lose buffered data. This drain needs an all-or-nothing staging step instead of deleting in place.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/tree_summarizer/store.rs` around lines 421 - 430, The current
loop over entries that calls std::fs::remove_file(path) can abort partway and
drop already-read buffers; change this to a two-phase (staging) delete: first
atomically move/rename each file to a temporary "to-delete" location (use
std::fs::rename into a temp dir or path.with_extension(".deleting") inside the
same filesystem) in the same loop that currently iterates entries, then in a
second loop remove_file() only from the temp locations; ensure removal attempts
continue for all staged files and aggregate or return a combined error instead
of early-returning on the first failure. This touches the loop variable entries
and the remove_file calls shown in the diff — implement atomic rename for
staging and robust final deletion over the staged names.
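The two-phase staging delete can be sketched like this; `drain_delete` is an illustrative helper, and the rename-based staging is atomic only when the temp name stays on the same filesystem:

```rust
use std::io;
use std::path::PathBuf;

fn drain_delete(entries: &[PathBuf]) -> io::Result<()> {
    // Phase 1: stage every file with an atomic rename first, so a later
    // failure cannot strand some entries deleted and others not.
    let mut staged = Vec::new();
    for path in entries {
        let tmp = path.with_extension("deleting");
        std::fs::rename(path, &tmp)?;
        staged.push(tmp);
    }
    // Phase 2: best-effort removal; keep going and report the first error.
    let mut first_err: Option<io::Error> = None;
    for tmp in &staged {
        if let Err(e) = std::fs::remove_file(tmp) {
            first_err.get_or_insert(e);
        }
    }
    match first_err {
        Some(e) => Err(e),
        None => Ok(()),
    }
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("ts_drain_demo");
    std::fs::create_dir_all(&dir)?;
    let files: Vec<PathBuf> = (0..3).map(|i| dir.join(format!("{i}.md"))).collect();
    for f in &files {
        std::fs::write(f, "buffered")?;
    }
    drain_delete(&files)?;
    assert!(files.iter().all(|f| !f.exists()));
    std::fs::remove_dir_all(&dir)?;
    println!("ok");
    Ok(())
}
```

If phase 1 fails partway, the already-staged `.deleting` files still hold their contents and can be renamed back, so no drained data is lost.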
Summary
- `tree_summarizer` domain module that maintains a hierarchical tree of time-based summaries (Hour → Day → Month → Year → Root) stored as markdown files in memory namespaces
- CLI (`openhuman tree-summarizer`) for ingesting content, running summarization jobs, querying the tree, and inspecting status

New files
- src/openhuman/tree_summarizer/types.rs
- src/openhuman/tree_summarizer/store.rs
- src/openhuman/tree_summarizer/engine.rs
- src/openhuman/tree_summarizer/ops.rs
- src/openhuman/tree_summarizer/schemas.rs
- src/openhuman/tree_summarizer/bus.rs
- src/openhuman/tree_summarizer/mod.rs
- src/core/tree_summarizer_cli.rs

CLI usage
Test plan
- `cargo check` passes cleanly
- `cargo fmt` applied
- `cargo test -p openhuman -- tree_summarizer` — store unit tests (roundtrip read/write, children, buffer ops)

Summary by CodeRabbit
New Features
Observability
Documentation
Tests