Refactor: rust modules #633
Conversation
…nization

- Moved list_capabilities, lookup_capability, and search_capabilities functions from about_app/mod.rs to a new ops.rs module.
- Updated exports in mod.rs to reflect the new structure, improving code organization and maintainability.

…ions

- Added a new `loader.rs` file to manage built-in agent definitions, allowing for easy addition of new agents through a structured format.
- Implemented the `load_builtins` function to parse agent definitions from TOML files and inject system prompts.
- Updated `mod.rs` to include the new loader module, streamlining the organization of agent-related code.
- Ensured that the built-in agents are loaded in a stable order and validated against their defined IDs for consistency.

…oolkit integrations

- Introduced a new `descriptions.rs` file containing human-readable capability summaries for various Composio toolkit slugs, enhancing the prompt renderer's ability to inform the orchestrator about connected integrations.
- Added a `helpers.rs` file with a utility function `pick_str` to streamline JSON object parsing for user profile extraction across different providers.
- Updated `mod.rs` to include the new modules, improving code organization and maintainability.
- Created a `traits.rs` file defining the core provider trait for Composio toolkit implementations, ensuring a consistent interface for all providers.
- Established a `types.rs` file to define shared types used across provider implementations, including sync reasons and user profile structures.

…te module

- Moved the Gmail provider implementation from `mod.rs` to a new `provider.rs` file, enhancing code organization and maintainability.
- Retained the incremental sync logic and user profile fetching functionality, ensuring no change in behavior.
- Updated module imports accordingly to reflect the new structure.

…rate module

- Moved the Notion provider implementation from `mod.rs` to a new `provider.rs` file, improving code organization and maintainability.
- Retained all existing functionality related to incremental sync and user profile fetching.
- Updated module imports to reflect the new structure, ensuring seamless integration with the rest of the codebase.

- Added a new `dispatch.rs` file containing the `dispatch_subagent` function, which handles the logic for delegating tasks to subagents.
- The function includes error handling for uninitialized agent registries and unknown agent IDs, returning appropriate tool results.
- Integrated global event publishing for subagent lifecycle events, enhancing observability.
- Updated `mod.rs` to include the new dispatch module, improving code organization and maintainability.
- Added unit tests to ensure the correctness of the dispatch logic and error handling.

- Added a new `factory.rs` file containing functions to create embedding providers based on specified configurations, supporting Ollama, OpenAI, custom endpoints, and a no-op provider.
- Implemented error handling for unrecognized provider names to ensure immediate feedback on configuration issues.
- Updated `mod.rs` to include the new factory module, enhancing code organization and maintainability.
- Introduced a `provider_trait.rs` file defining the `EmbeddingProvider` trait, establishing a consistent interface for embedding implementations.
📝 Walkthrough

This pull request refactors Rust module structure across multiple subsystems, moving operational logic from `mod.rs` files into dedicated modules.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
Actionable comments posted: 3
🧹 Nitpick comments (11)
src/openhuman/about_app/ops.rs (2)
8-36: Consider adding `tracing::debug` entry logs for dev-oriented diagnostics.

The `RpcOutcome` log strings are user/audit-oriented and returned to callers. Per the repo's logging guideline, new/changed flows should also emit `tracing` at `debug`/`trace` level with stable prefixes like `[about_app]` and correlation fields (category, id, query) at entry and on branch decisions (e.g., the unknown-id error path in `lookup_capability`). The RPC handlers in `schemas.rs` already emit `[about_app]` debug logs on entry, so this is optional/duplicative at the wrapper layer; up to you whether to add exit logs here or rely on the handler-level ones.

As per coding guidelines: "Add substantial, development-oriented logs on new/changed flows; include logs at entry/exit points, branch decisions, external calls, retries/timeouts, state transitions, and error handling paths" and "Use `log`/`tracing` at `debug` or `trace` level".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/about_app/ops.rs` around lines 8 - 36, Add development-oriented tracing.debug logs in the three ops functions: emit an entry log in list_capabilities (include "[about_app]" prefix and the category value), in search_capabilities emit an entry log (include "[about_app]" and the trimmed query) and an exit/summary log if you want, and in lookup_capability emit an entry log (include "[about_app]" and the id) plus a branch/debug log on the unknown-id error path before returning the Err so the failure is visible; use tracing::debug with stable prefixes and include the correlation fields (category, id, query) in the log fields for easy filtering.
20-26: Minor: redundant `.trim()` in error message.

`super::lookup` already trims `id` internally (see the `lookup` in `catalog.rs`). Trimming again in the error message is harmless but slightly misleading; the user sees the trimmed form even if they passed whitespace. Not a bug; flagging only for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/about_app/ops.rs` around lines 20 - 26, The error message in lookup_capability redundantly calls id.trim() even though super::lookup already trims the input; update the error construction in lookup_capability so it formats the original id (without calling .trim())—i.e., change the ok_or_else closure used with lookup(id) to use format!("unknown capability id '{}'", id) while keeping the RpcOutcome::single_log behavior and the format that logs capability.id.

src/openhuman/tools/impl/agent/dispatch.rs (2)
112-116: Misplaced test: `ask_clarification_tool_re_exported` doesn't belong in `dispatch.rs`.

This test asserts a re-export of `AskClarificationTool`, which is unrelated to subagent dispatch. It also only incidentally touches this module (via `super::super::AskClarificationTool`). Consider moving it next to `AskClarificationTool` itself (or into a dedicated integration test), so the test module in `dispatch.rs` stays focused on dispatch behavior.

♻️ Proposed cleanup
```diff
 #[cfg(test)]
 mod tests {
     use super::*;
-    use crate::openhuman::tools::traits::Tool;
-
-    use super::super::AskClarificationTool;
-
-    #[test]
-    fn ask_clarification_tool_re_exported() {
-        let tool = AskClarificationTool::new();
-        assert_eq!(tool.name(), "ask_user_clarification");
-    }

     #[tokio::test]
     async fn dispatch_subagent_returns_tool_error_when_agent_unknown() {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/tools/impl/agent/dispatch.rs` around lines 112 - 116, The test ask_clarification_tool_re_exported in dispatch.rs is misplaced because it verifies the re-export of AskClarificationTool rather than dispatch behavior; move this test next to the AskClarificationTool implementation (or into a dedicated integration test) so dispatch.rs only contains dispatch-related tests, and update any module path imports accordingly (replace super::super::AskClarificationTool with the local module path where AskClarificationTool is defined and ensure the test still constructs AskClarificationTool::new() and asserts name() == "ask_user_clarification").
118-141: Test relies on global `OnceCell` state and is order-dependent.

`AgentDefinitionRegistry::global()` is backed by a process-wide `OnceCell`, so which branch this test exercises ("registry not initialised" vs. "not found in registry") depends on whether some earlier test in the same binary called `init_global*`. The assertion accepts either message, so it won't flake on correctness, but it also means this test won't reliably cover both branches.

If you want deterministic coverage of both paths, consider either:
- Splitting the two branches into separate `#[test]` binaries (integration tests under `tests/`), or
- Adding a test-only constructor/helper on `AgentDefinitionRegistry` that lets a unit test inject a registry without touching `GLOBAL`, so the "agent not found" path can be asserted directly.

Non-blocking; the current assertion is safe, just noting the coverage gap.
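As a sketch of the second option, assuming dispatch can take the registry as an explicit parameter (the `dispatch_error` function and the registry shape below are illustrative, not the repo's actual API):

```rust
use std::sync::OnceLock;

struct AgentDefinitionRegistry {
    ids: Vec<String>,
}

static GLOBAL: OnceLock<AgentDefinitionRegistry> = OnceLock::new();

impl AgentDefinitionRegistry {
    fn global() -> Option<&'static AgentDefinitionRegistry> {
        GLOBAL.get()
    }

    fn contains(&self, id: &str) -> bool {
        self.ids.iter().any(|known| known == id)
    }
}

// Dispatch takes the registry as a parameter, so a unit test can pass a
// controlled registry (or None) instead of touching the process-wide GLOBAL.
fn dispatch_error(registry: Option<&AgentDefinitionRegistry>, id: &str) -> Option<String> {
    match registry {
        None => Some("agent registry not initialised".to_string()),
        Some(reg) if !reg.contains(id) => Some(format!("agent '{id}' not found in registry")),
        Some(_) => None,
    }
}
```

The real `dispatch_subagent` would call `dispatch_error(AgentDefinitionRegistry::global(), id)` in production code, while tests inject either branch deterministically.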
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/tools/impl/agent/dispatch.rs` around lines 118 - 141, Test dispatch_subagent_returns_tool_error_when_agent_unknown depends on global OnceCell state (AgentDefinitionRegistry::global()) so its branch coverage is order-dependent; to fix, make the branches deterministic either by splitting into two integration tests that run in separate test binaries (so one can assert the "registry not initialised" case and the other initialises GLOBAL then asserts the "not found in registry" case), or add a test-only API on AgentDefinitionRegistry (e.g., AgentDefinitionRegistry::with_definitions_for_test or a temporary registry setter) and change the unit test to inject a controlled registry when calling dispatch_subagent so you can assert the "agent not found" branch reliably without touching the global OnceCell; reference the existing test function dispatch_subagent_returns_tool_error_when_agent_unknown and the symbols AgentDefinitionRegistry::global() and dispatch_subagent when applying the change.src/openhuman/agent/agents/loader.rs (3)
243-249: Minor: `find()` re-parses all built-ins per call.

Each test helper call invokes `load_builtins()` and walks all 12 built-ins. With ~10 tests relying on `find`, the TOML is parsed dozens of times per test run. Not a correctness issue, and unit-test cost is negligible, but a `once_cell::sync::Lazy<Vec<AgentDefinition>>` or a single shared fixture would cut redundant work if the suite grows.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/agents/loader.rs` around lines 243 - 249, The current find(id: &str) re-calls load_builtins() each time; replace repeated parsing with a static cached Vec by adding a once_cell::sync::Lazy static (e.g. BUILTINS: Lazy<Vec<AgentDefinition>>) initialized with load_builtins().unwrap(), update find to iterate BUILTINS.iter().find(|d| d.id == id).cloned().unwrap_or_else(|| panic!("missing built-in {id}")), and add the once_cell import; this ensures built-ins are parsed once and reused across calls to find and tests.
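If the suite does grow, the caching could also use the standard library's `OnceLock` (stable since Rust 1.70) instead of `once_cell`; a self-contained sketch with a stand-in loader:

```rust
use std::sync::OnceLock;

#[derive(Clone, Debug, PartialEq)]
struct AgentDefinition {
    id: String,
}

// Stand-in for the real TOML-parsing loader.
fn load_builtins() -> Vec<AgentDefinition> {
    vec![
        AgentDefinition { id: "researcher".to_string() },
        AgentDefinition { id: "planner".to_string() },
    ]
}

// Parse once, reuse across every find() call and every test.
fn builtins() -> &'static [AgentDefinition] {
    static CACHE: OnceLock<Vec<AgentDefinition>> = OnceLock::new();
    CACHE.get_or_init(load_builtins)
}

fn find(id: &str) -> AgentDefinition {
    builtins()
        .iter()
        .find(|def| def.id == id)
        .cloned()
        .unwrap_or_else(|| panic!("missing built-in {id}"))
}
```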
120-122: Consider adding a debug log on load.
`load_builtins` is the single ingress point for baked-in agent definitions and currently runs silently. A one-line `tracing::debug!` emitting the count (and ideally the ids) would make startup traceable and help diagnose downstream registry issues without any behavior change.

🪵 Proposed trace log
```diff
 pub fn load_builtins() -> Result<Vec<AgentDefinition>> {
-    BUILTINS.iter().map(parse_builtin).collect()
+    let defs: Vec<AgentDefinition> = BUILTINS.iter().map(parse_builtin).collect::<Result<_>>()?;
+    tracing::debug!(
+        target: "[agent-loader]",
+        count = defs.len(),
+        ids = ?defs.iter().map(|d| d.id.as_str()).collect::<Vec<_>>(),
+        "loaded built-in agent definitions"
+    );
+    Ok(defs)
 }
```

As per coding guidelines: "Add substantial, development-oriented logs on new/changed flows" and "Use `log`/`tracing` at `debug` or `trace` level for development-oriented diagnostics in Rust".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/agents/loader.rs` around lines 120 - 122, load_builtins quietly returns parsed BUILTINS; add a development-oriented debug trace so startup is observable: inside the load_builtins function (which maps BUILTINS via parse_builtin to produce Vec<AgentDefinition>), after collecting results log a tracing::debug! that reports the number of loaded agents and ideally their ids (derive or map ids from the returned AgentDefinition items) before returning the Vec, ensuring no behavior change and using debug level for diagnostics.
154-159: Nit: hardcoded `12` couples two invariants in one assertion.

`assert_eq!(defs.len(), BUILTINS.len())` already enforces loader/slice consistency; the subsequent `assert_eq!(defs.len(), 12, ...)` exists purely as a tripwire for someone adding/removing a built-in without updating tests. That's fine, but consider asserting it against `BUILTINS.len()` only once and letting the folder-ids test carry the intent, or keep the literal and add a brief `// bump when adding a built-in` comment so the next contributor understands why this magic number exists.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/agents/loader.rs` around lines 154 - 159, The test all_builtins_parse currently has a hardcoded assert_eq!(defs.len(), 12, ...) that duplicates the ASSERT against BUILTINS.len(); either remove the literal-number assertion and rely on assert_eq!(defs.len(), BUILTINS.len()) or keep the literal but make its purpose obvious by adding a brief comment like "// bump when adding/removing a built-in" next to the assert_eq!(defs.len(), 12, ...) so future contributors understand the magic number; update the all_builtins_parse test accordingly (referencing defs.len() and BUILTINS).

src/openhuman/embeddings/factory.rs (1)
19-47: Add branch-decision tracing to the factory.

Per the repo's logging guideline ("Add substantial, development-oriented logs on new/changed flows; include logs at entry/exit points, branch decisions..."), `create_embedding_provider` is a key configuration entry point but currently emits no logs. A single `tracing::debug!` per arm (with a stable prefix like `[embeddings.factory]` and the resolved `provider`, `model`, `dims`) would make misconfiguration much easier to diagnose end-to-end. Take care not to log `api_key`; only whether it was `Some`/`None`.

As per coding guidelines: "Add substantial, development-oriented logs on new/changed flows; include logs at entry/exit points, branch decisions, external calls..." and "Never log secrets, API keys, JWTs, credentials, or full PII...".
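A dependency-free sketch of what such a branch log might carry (the real code would emit this through `tracing::debug!`; field names here are illustrative):

```rust
// Build the log line for one factory branch decision. The key point is
// logging only whether an API key is present, never the key itself.
fn describe_branch(provider: &str, model: &str, dims: usize, api_key: Option<&str>) -> String {
    format!(
        "[embeddings.factory] provider={provider} model={model} dims={dims} api_key_present={}",
        api_key.is_some()
    )
}
```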
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/embeddings/factory.rs` around lines 19 - 47, Add tracing::debug logs to create_embedding_provider: at function entry and in each match arm emit a debug with prefix "[embeddings.factory]" that records the resolved provider string, model, dims, and whether api_key.is_some() (do not log the key itself). Place one debug before returning for the "ollama" arm (OllamaEmbedding::new), for "openai" and "custom:" arms (OpenAiEmbedding::new), for the "none" arm (NoopEmbedding), and for the unknown branch that returns an error so branch decisions are fully traceable.

src/openhuman/composio/providers/notion/provider.rs (1)
244-244: Using `extract_item_id` to pull a timestamp is semantically confusing.

`extract_item_id` is (per its name) a helper for extracting item identifiers, but here it's reused to pull `last_edited_time`, a timestamp used as the sync cursor. Functionally fine if it just walks JSON paths and returns a string, but the intent at the call site is obscured. Consider using (or introducing) a `pick_str`-style helper, or renaming to convey that it's a generic "first-matching string at path" utility.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/composio/providers/notion/provider.rs` at line 244, The call to extract_item_id(page, PAGE_EDITED_PATHS) is semantically misleading because extract_item_id implies an identifier helper but is being used to fetch a timestamp (last_edited_time) for the sync cursor; change this by either renaming extract_item_id to a generic helper like pick_str (or add a new helper pick_str/first_matching_str) and use that at this call site, or create a dedicated function (e.g., pick_timestamp or pick_str) and replace the use in provider.rs where PAGE_EDITED_PATHS is passed; ensure the new name is used consistently where non-ID strings are extracted (including the variable edited_time / last_edited_time) so intent is clear.

src/openhuman/composio/providers/gmail/provider.rs (2)
246-256: Don't silently drop the state save error on the failure path.

`let _ = state.save(&memory).await;` discards any persistence failure right before returning `Err`. Given the comment explicitly flags that this save exists to preserve budget accounting, a failing save here means the budget counter is also lost, which is exactly the scenario the save was trying to avoid. A `tracing::warn!` with the error gives you a breadcrumb without changing the return behavior.

♻️ Proposed fix
```diff
-        // Save state so budget accounting isn't lost.
-        let _ = state.save(&memory).await;
+        // Save state so budget accounting isn't lost.
+        if let Err(save_err) = state.save(&memory).await {
+            tracing::warn!(
+                page = page_num,
+                error = %save_err,
+                "[composio:gmail] failed to persist sync state after tool error"
+            );
+        }
```
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/composio/providers/gmail/provider.rs` around lines 246 - 256, The state.save(&memory).await call in the error path currently discards any error (let _ = state.save(&memory).await;), which can hide failures to persist budget accounting; change this to capture and log the save error (e.g., store the Result from state.save(&memory).await and if Err call tracing::warn! with context) before returning the Err from the function so the return behavior stays the same; update the block around resp, error handling (the branch using ACTION_FETCH_EMAILS, page_num, resp.error) to log persistence failures while preserving the existing Err(...) return value.
78-80: Prefer `self.toolkit_slug()` over repeated `"gmail"` string literals.

The toolkit slug is duplicated as a literal in `SyncState::load`, `persist_single_item` (twice), and both the `SyncOutcome` and `ProviderUserProfile` builders. If the trait method ever changes the slug, these call sites will silently drift. Reusing `self.toolkit_slug()` keeps a single source of truth and makes this impl more mechanical to copy for new providers.

♻️ Example fix
```diff
-    let mut state = SyncState::load(&memory, "gmail", &connection_id).await?;
+    let toolkit = self.toolkit_slug();
+    let mut state = SyncState::load(&memory, toolkit, &connection_id).await?;
@@
-    match persist_single_item(
-        &memory,
-        "gmail",
-        &doc_id,
-        &title,
-        msg,
-        "gmail",
-        ctx.connection_id.as_deref(),
-    )
+    match persist_single_item(
+        &memory,
+        toolkit,
+        &doc_id,
+        &title,
+        msg,
+        toolkit,
+        ctx.connection_id.as_deref(),
+    )
```

Also applies to: 131-134, 173-173, 182-183, 307-313, 371-373
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/composio/providers/gmail/provider.rs` around lines 78 - 80, The implementation hardcodes the "gmail" toolkit slug in multiple places; replace those literals with calls to the provider's toolkit_slug method to centralize the canonical slug. Update uses in SyncState::load, persist_single_item (both occurrences), the SyncOutcome builder, and the ProviderUserProfile builder to call the impl's toolkit_slug (use self.toolkit_slug() inside instance methods, and GmailProvider::toolkit_slug() or Self::toolkit_slug() where self isn't available) so all sites derive the slug from the single toolkit_slug() definition.
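Reduced to a sketch, the single-source-of-truth shape looks like this (the trait name and the `state_key` call site are hypothetical, standing in for the repo's real provider trait and call sites):

```rust
trait ComposioProvider {
    fn toolkit_slug(&self) -> &'static str;
}

struct GmailProvider;

impl ComposioProvider for GmailProvider {
    fn toolkit_slug(&self) -> &'static str {
        "gmail"
    }
}

impl GmailProvider {
    // Every call site derives the slug from toolkit_slug(), so renaming the
    // toolkit touches exactly one definition instead of six literals.
    fn state_key(&self, connection_id: &str) -> String {
        format!("{}:{connection_id}", self.toolkit_slug())
    }
}
```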
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/openhuman/composio/providers/gmail/provider.rs`:
- Around line 42-45: Doc comment for MAX_PAGES_PER_SYNC is stale: it states "at
most 500 items per sync pass" but with SyncReason::ConnectionCreated the code
uses INITIAL_PAGE_SIZE = 50 so the real maximum is MAX_PAGES_PER_SYNC *
INITIAL_PAGE_SIZE (1000). Update the comment to reflect both cases or present a
range (e.g., "up to MAX_PAGES_PER_SYNC * PAGE_SIZE, or up to MAX_PAGES_PER_SYNC
* INITIAL_PAGE_SIZE for ConnectionCreated (500–1000 items)") and mention the
related constants MAX_PAGES_PER_SYNC, PAGE_SIZE and INITIAL_PAGE_SIZE and the
SyncReason::ConnectionCreated backfill behavior so readers see the correct
ceiling.
- Around line 277-285: The code compares mixed-format date strings via
extract_item_id and MESSAGE_DATE_PATHS, so lexicographic comparison (date_val >
*existing) can pick the wrong "newest" cursor; change to only use Gmail's
numeric internalDate (e.g. "internalDate" / "data.internalDate") and compare
them as parsed integers (u64) rather than raw strings. Replace the
extract_item_id usage here with a new helper (e.g. extract_internal_date_ms)
that returns Option<u64>, update the newest_date variable to Option<u64>,
perform numeric comparisons to pick the max, and ensure the value passed to
state.advance_cursor and consumed by sync::cursor_to_gmail_after_filter remains
the consistent numeric representation the cursor expects.
In `@src/openhuman/embeddings/factory.rs`:
- Around line 36-40: The custom: arm currently accepts "custom:" with an empty
base_url and passes it to OpenAiEmbedding::new, causing opaque failures later;
modify the match arm handling name.starts_with("custom:") to validate base_url
after strip_prefix and if base_url.is_empty() return an Err (or treat as unknown
provider) with a clear message instead of constructing OpenAiEmbedding;
reference the symbols name, base_url, api_key, OpenAiEmbedding::new and the test
factory_custom_empty_url when updating the factory function so misconfigured
"custom:" is rejected immediately.
---
Nitpick comments:
In `@src/openhuman/about_app/ops.rs`:
- Around line 8-36: Add development-oriented tracing.debug logs in the three ops
functions: emit an entry log in list_capabilities (include "[about_app]" prefix
and the category value), in search_capabilities emit an entry log (include
"[about_app]" and the trimmed query) and an exit/summary log if you want, and in
lookup_capability emit an entry log (include "[about_app]" and the id) plus a
branch/debug log on the unknown-id error path before returning the Err so the
failure is visible; use tracing::debug with stable prefixes and include the
correlation fields (category, id, query) in the log fields for easy filtering.
- Around line 20-26: The error message in lookup_capability redundantly calls
id.trim() even though super::lookup already trims the input; update the error
construction in lookup_capability so it formats the original id (without calling
.trim())—i.e., change the ok_or_else closure used with lookup(id) to use
format!("unknown capability id '{}'", id) while keeping the
RpcOutcome::single_log behavior and the format that logs capability.id.
In `@src/openhuman/agent/agents/loader.rs`:
- Around line 243-249: The current find(id: &str) re-calls load_builtins() each
time; replace repeated parsing with a static cached Vec by adding a
once_cell::sync::Lazy static (e.g. BUILTINS: Lazy<Vec<AgentDefinition>>)
initialized with load_builtins().unwrap(), update find to iterate
BUILTINS.iter().find(|d| d.id == id).cloned().unwrap_or_else(|| panic!("missing
built-in {id}")), and add the once_cell import; this ensures built-ins are
parsed once and reused across calls to find and tests.
- Around line 120-122: load_builtins quietly returns parsed BUILTINS; add a
development-oriented debug trace so startup is observable: inside the
load_builtins function (which maps BUILTINS via parse_builtin to produce
Vec<AgentDefinition>), after collecting results log a tracing::debug! that
reports the number of loaded agents and ideally their ids (derive or map ids
from the returned AgentDefinition items) before returning the Vec, ensuring no
behavior change and using debug level for diagnostics.
- Around line 154-159: The test all_builtins_parse currently has a hardcoded
assert_eq!(defs.len(), 12, ...) that duplicates the ASSERT against
BUILTINS.len(); either remove the literal-number assertion and rely on
assert_eq!(defs.len(), BUILTINS.len()) or keep the literal but make its purpose
obvious by adding a brief comment like "// bump when adding/removing a built-in"
next to the assert_eq!(defs.len(), 12, ...) so future contributors understand
the magic number; update the all_builtins_parse test accordingly (referencing
defs.len() and BUILTINS).
In `@src/openhuman/composio/providers/gmail/provider.rs`:
- Around line 246-256: The state.save(&memory).await call in the error path
currently discards any error (let _ = state.save(&memory).await;), which can
hide failures to persist budget accounting; change this to capture and log the
save error (e.g., store the Result from state.save(&memory).await and if Err
call tracing::warn! with context) before returning the Err from the function so
the return behavior stays the same; update the block around resp, error handling
(the branch using ACTION_FETCH_EMAILS, page_num, resp.error) to log persistence
failures while preserving the existing Err(...) return value.
- Around line 78-80: The implementation hardcodes the "gmail" toolkit slug in
multiple places; replace those literals with calls to the provider's
toolkit_slug method to centralize the canonical slug. Update uses in
SyncState::load, persist_single_item (both occurrences), the SyncOutcome
builder, and the ProviderUserProfile builder to call the impl's toolkit_slug
(use self.toolkit_slug() inside instance methods, and
GmailProvider::toolkit_slug() or Self::toolkit_slug() where self isn't
available) so all sites derive the slug from the single toolkit_slug()
definition.
In `@src/openhuman/composio/providers/notion/provider.rs`:
- Line 244: The call to extract_item_id(page, PAGE_EDITED_PATHS) is semantically
misleading because extract_item_id implies an identifier helper but is being
used to fetch a timestamp (last_edited_time) for the sync cursor; change this by
either renaming extract_item_id to a generic helper like pick_str (or add a new
helper pick_str/first_matching_str) and use that at this call site, or create a
dedicated function (e.g., pick_timestamp or pick_str) and replace the use in
provider.rs where PAGE_EDITED_PATHS is passed; ensure the new name is used
consistently where non-ID strings are extracted (including the variable
edited_time / last_edited_time) so intent is clear.
In `@src/openhuman/embeddings/factory.rs`:
- Around line 19-47: Add tracing::debug logs to create_embedding_provider: at
function entry and in each match arm emit a debug with prefix
"[embeddings.factory]" that records the resolved provider string, model, dims,
and whether api_key.is_some() (do not log the key itself). Place one debug
before returning for the "ollama" arm (OllamaEmbedding::new), for "openai" and
"custom:" arms (OpenAiEmbedding::new), for the "none" arm (NoopEmbedding), and
for the unknown branch that returns an error so branch decisions are fully
traceable.
In `@src/openhuman/tools/impl/agent/dispatch.rs`:
- Around line 112-116: The test ask_clarification_tool_re_exported in
dispatch.rs is misplaced because it verifies the re-export of
AskClarificationTool rather than dispatch behavior; move this test next to the
AskClarificationTool implementation (or into a dedicated integration test) so
dispatch.rs only contains dispatch-related tests, and update any module path
imports accordingly (replace super::super::AskClarificationTool with the local
module path where AskClarificationTool is defined and ensure the test still
constructs AskClarificationTool::new() and asserts name() ==
"ask_user_clarification").
- Around line 118-141: Test
dispatch_subagent_returns_tool_error_when_agent_unknown depends on global
OnceCell state (AgentDefinitionRegistry::global()) so its branch coverage is
order-dependent; to fix, make the branches deterministic either by splitting
into two integration tests that run in separate test binaries (so one can assert
the "registry not initialised" case and the other initialises GLOBAL then
asserts the "not found in registry" case), or add a test-only API on
AgentDefinitionRegistry (e.g.,
AgentDefinitionRegistry::with_definitions_for_test or a temporary registry
setter) and change the unit test to inject a controlled registry when calling
dispatch_subagent so you can assert the "agent not found" branch reliably
without touching the global OnceCell; reference the existing test function
dispatch_subagent_returns_tool_error_when_agent_unknown and the symbols
AgentDefinitionRegistry::global() and dispatch_subagent when applying the
change.
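Several of the prompts above converge on a generic pick_str-style helper ("first non-empty string at a list of dotted paths"). A dependency-free sketch, with a minimal `Json` enum standing in for `serde_json::Value`:

```rust
use std::collections::HashMap;

enum Json {
    Str(String),
    Obj(HashMap<String, Json>),
}

/// Return the first non-empty string found at any of the dotted paths.
fn pick_str<'a>(root: &'a Json, paths: &[&str]) -> Option<&'a str> {
    paths.iter().find_map(|path| {
        let mut node = root;
        for key in path.split('.') {
            match node {
                Json::Obj(map) => node = map.get(key)?,
                Json::Str(_) => return None,
            }
        }
        match node {
            Json::Str(s) if !s.is_empty() => Some(s.as_str()),
            _ => None,
        }
    })
}
```

The same walk serves both the user-profile extraction and the `last_edited_time` cursor lookup, with a name that no longer implies "identifier".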
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 24e3d5ed-ae4f-4f5f-8138-0a4b5f364a9a
📒 Files selected for processing (19)
- src/openhuman/about_app/mod.rs
- src/openhuman/about_app/ops.rs
- src/openhuman/agent/agents/loader.rs
- src/openhuman/agent/agents/mod.rs
- src/openhuman/composio/providers/descriptions.rs
- src/openhuman/composio/providers/gmail/mod.rs
- src/openhuman/composio/providers/gmail/provider.rs
- src/openhuman/composio/providers/helpers.rs
- src/openhuman/composio/providers/mod.rs
- src/openhuman/composio/providers/notion/mod.rs
- src/openhuman/composio/providers/notion/provider.rs
- src/openhuman/composio/providers/notion/sync.rs
- src/openhuman/composio/providers/traits.rs
- src/openhuman/composio/providers/types.rs
- src/openhuman/embeddings/factory.rs
- src/openhuman/embeddings/mod.rs
- src/openhuman/embeddings/provider_trait.rs
- src/openhuman/tools/impl/agent/dispatch.rs
- src/openhuman/tools/impl/agent/mod.rs
```rust
/// Maximum pages to fetch in a single sync pass (guards against infinite
/// pagination loops). Combined with PAGE_SIZE this yields at most
/// 500 items per sync pass, well within the daily budget.
const MAX_PAGES_PER_SYNC: u32 = 20;
```
Stale bound in doc comment.
The comment claims "at most 500 items per sync pass", but on SyncReason::ConnectionCreated the loop uses INITIAL_PAGE_SIZE = 50, so the ceiling is actually MAX_PAGES_PER_SYNC * INITIAL_PAGE_SIZE = 1000. Worth updating to reflect both cases (or phrasing it as a per-reason range) so future readers don't under-estimate backfill size.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/composio/providers/gmail/provider.rs` around lines 42 - 45, Doc
comment for MAX_PAGES_PER_SYNC is stale: it states "at most 500 items per sync
pass" but with SyncReason::ConnectionCreated the code uses INITIAL_PAGE_SIZE =
50 so the real maximum is MAX_PAGES_PER_SYNC * INITIAL_PAGE_SIZE (1000). Update
the comment to reflect both cases or present a range (e.g., "up to
MAX_PAGES_PER_SYNC * PAGE_SIZE, or up to MAX_PAGES_PER_SYNC * INITIAL_PAGE_SIZE
for ConnectionCreated (500–1000 items)") and mention the related constants
MAX_PAGES_PER_SYNC, PAGE_SIZE and INITIAL_PAGE_SIZE and the
SyncReason::ConnectionCreated backfill behavior so readers see the correct
ceiling.
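The corrected ceiling is plain arithmetic. In this sketch, `PAGE_SIZE = 25` is inferred from the review's claim that 20 pages yield 500 items, and is an assumption about the real constant:

```rust
const MAX_PAGES_PER_SYNC: u32 = 20;
const PAGE_SIZE: u32 = 25; // assumed: 20 pages * 25 = the review's "500 items"
const INITIAL_PAGE_SIZE: u32 = 50;

// Worst-case items fetched in one sync pass, by sync reason.
fn max_items_per_sync(connection_created_backfill: bool) -> u32 {
    let per_page = if connection_created_backfill { INITIAL_PAGE_SIZE } else { PAGE_SIZE };
    MAX_PAGES_PER_SYNC * per_page
}
```

So the doc comment should quote a range (500 for steady-state syncs, 1000 for the `ConnectionCreated` backfill), not a single bound.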
```rust
// Track the newest date we've seen for cursor advancement.
if let Some(date_val) = extract_item_id(msg, MESSAGE_DATE_PATHS) {
    if newest_date
        .as_ref()
        .map_or(true, |existing| date_val > *existing)
    {
        newest_date = Some(date_val);
    }
}
```
Mixed-format date comparison may pick the wrong "newest" cursor.
`MESSAGE_DATE_PATHS` mixes `internalDate` (Gmail returns epoch-millis as a numeric string, ~13 digits) with `date` / `receivedAt` (typically RFC-2822 / ISO-8601 strings). The lexicographic `date_val > *existing` comparison only holds within a single format; across formats it can happily treat `"Wed, 01 Jan 2025 …"` as greater than `"1736000000000"`, then feed that value back into `state.advance_cursor(...)` and into `sync::cursor_to_gmail_after_filter` on the next run.
Also, `extract_item_id` is semantically an ID extractor — reusing it for dates relies on the helper happening to return any string. Consider either (a) restricting to `internalDate`/`data.internalDate` only and parsing to `u64` before comparing, or (b) introducing an `extract_item_str`/`extract_internal_date_ms` helper so the cursor is always the same representation `cursor_to_gmail_after_filter` expects.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/composio/providers/gmail/provider.rs` around lines 277 - 285,
The code compares mixed-format date strings via extract_item_id and
MESSAGE_DATE_PATHS, so lexicographic comparison (date_val > *existing) can pick
the wrong "newest" cursor; change to only use Gmail's numeric internalDate (e.g.
"internalDate" / "data.internalDate") and compare them as parsed integers (u64)
rather than raw strings. Replace the extract_item_id usage here with a new
helper (e.g. extract_internal_date_ms) that returns Option<u64>, update the
newest_date variable to Option<u64>, perform numeric comparisons to pick the
max, and ensure the value passed to state.advance_cursor and consumed by
sync::cursor_to_gmail_after_filter remains the consistent numeric representation
the cursor expects.
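A std-only sketch of the numeric-cursor approach suggested here. `parse_internal_date_ms` is a hypothetical helper; in the real code it would sit behind the JSON path lookup that `extract_item_id` currently performs, restricted to the `internalDate` paths:

```rust
/// Hypothetical helper: Gmail's internalDate is epoch millis serialized as
/// a string; parse it to u64 so cursor comparison is numeric, never
/// lexicographic across mixed formats.
fn parse_internal_date_ms(raw: &str) -> Option<u64> {
    raw.trim().parse::<u64>().ok()
}

fn main() {
    // Simulated values a mixed MESSAGE_DATE_PATHS lookup could yield: one
    // numeric internalDate, one RFC-2822 `date` header.
    let candidates = ["1736000000000", "Wed, 01 Jan 2025 00:00:00 +0000"];

    let mut newest_ms: Option<u64> = None;
    for raw in candidates {
        // RFC-2822 strings fail to parse and are skipped, so they can
        // never win the "newest cursor" comparison.
        if let Some(ms) = parse_internal_date_ms(raw) {
            newest_ms = Some(newest_ms.map_or(ms, |existing| existing.max(ms)));
        }
    }
    assert_eq!(newest_ms, Some(1736000000000));
}
```

Keeping the cursor as `Option<u64>` end to end also guarantees `cursor_to_gmail_after_filter` always receives the representation it expects.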
```rust
name if name.starts_with("custom:") => {
    let base_url = name.strip_prefix("custom:").unwrap_or("");
    let key = api_key.unwrap_or("");
    Ok(Box::new(OpenAiEmbedding::new(base_url, key, model, dims)))
}
```
Consider rejecting empty custom: URL rather than silently accepting it.
`"custom:"` (no URL) resolves to `base_url = ""`, which is then handed to `OpenAiEmbedding::new("", …)`. Unlike the `"ollama"` arm — where `OllamaEmbedding::new` substitutes a sensible default for an empty base URL — `OpenAiEmbedding` will just store `""` and fail opaquely at request time. The test `factory_custom_empty_url` asserts construction succeeds, which locks in this silent-misconfiguration behaviour.
Surfacing this as a configuration error here would be consistent with the "unknown provider" arm and the rationale stated in the doc comment ("configuration mistakes surface immediately rather than silently degrading").
♻️ Proposed fix

```diff
 name if name.starts_with("custom:") => {
     let base_url = name.strip_prefix("custom:").unwrap_or("");
+    if base_url.trim().is_empty() {
+        return Err(anyhow::anyhow!(
+            "custom embedding provider requires a URL, e.g. \"custom:http://host:port\""
+        ));
+    }
     let key = api_key.unwrap_or("");
     Ok(Box::new(OpenAiEmbedding::new(base_url, key, model, dims)))
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/embeddings/factory.rs` around lines 36 - 40, The custom: arm
currently accepts "custom:" with an empty base_url and passes it to
OpenAiEmbedding::new, causing opaque failures later; modify the match arm
handling name.starts_with("custom:") to validate base_url after strip_prefix and
if base_url.is_empty() return an Err (or treat as unknown provider) with a clear
message instead of constructing OpenAiEmbedding; reference the symbols name,
base_url, api_key, OpenAiEmbedding::new and the test factory_custom_empty_url
when updating the factory function so misconfigured "custom:" is rejected
immediately.
refactor: enforce light `mod.rs` rule across Rust modules

Closes #559

Summary

- Refactored the `src/openhuman/**/mod.rs` files that contained operational logic.
- Extracted that logic into dedicated modules (`ops.rs`, `loader.rs`, `provider.rs`, `dispatch.rs`, `factory.rs`, `traits.rs`, `types.rs`, `helpers.rs`, `descriptions.rs`, `provider_trait.rs`).
- Reduced each `mod.rs` to `mod` declarations and re-exports only.
- Existing behavior preserved.
Modules refactored

| Module | What moved | Destination |
| --- | --- | --- |
| `about_app` | `list_capabilities`, `lookup_capability`, `search_capabilities` | `ops.rs` |
| `agent/agents` | `BuiltinAgent`, `BUILTINS`, `load_builtins`, `parse_builtin` + 13 tests | `loader.rs` |
| `composio/providers` | shared traits, types, descriptions, helpers | `traits.rs`, `types.rs`, `descriptions.rs`, `helpers.rs` |
| `composio/providers/gmail` | `GmailProvider` impl + sync logic | `provider.rs` |
| `composio/providers/notion` | `NotionProvider` impl + sync logic | `provider.rs` |
| `embeddings` | `EmbeddingProvider` trait, factory fns | `provider_trait.rs`, `factory.rs` |
| `tools/impl/agent` | `dispatch_subagent` + event publishing + unit tests | `dispatch.rs` |

Stats
structural moves)
Test plan

- `cargo check` — all modules resolve with updated imports
- `cargo test` — all existing + newly added unit tests pass (agents loader: 13 tests; subagent dispatch: new tests)
- `cargo fmt --check` — formatting clean (final commit runs `cargo fmt`)
- Other crates (`core_server`, Tauri shell) compile without edits
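The light `mod.rs` rule can be illustrated with inline modules. Names here echo the gmail provider from this PR, but the `slug()` method is a made-up stand-in (a real `mod.rs` file is not runnable on its own):

```rust
// Shape of the rule: the mod.rs carries only `mod` declarations and
// re-exports; all logic lives in sibling files such as provider.rs.
mod gmail {
    // In the source tree this body would be gmail/provider.rs.
    pub mod provider {
        pub struct GmailProvider;

        impl GmailProvider {
            // Stand-in for the real sync / profile-fetching logic.
            pub fn slug(&self) -> &'static str {
                "gmail"
            }
        }
    }

    // gmail/mod.rs equivalent: declaration + re-export, no logic.
    pub use self::provider::GmailProvider;
}

fn main() {
    // Callers keep the short path even though the impl moved.
    assert_eq!(gmail::GmailProvider.slug(), "gmail");
}
```

The re-export is what keeps the refactor behavior-preserving: downstream code importing `gmail::GmailProvider` compiles without edits.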