Refactor: rust modules #633

Merged

senamakel merged 8 commits into tinyhumansai:main from YellowSnnowmann:refactor/rust-modules on Apr 17, 2026

Conversation


@YellowSnnowmann YellowSnnowmann commented Apr 17, 2026

refactor: enforce light mod.rs rule across Rust modules

Closes #559

Summary

  • Audit and refactor src/openhuman/**/mod.rs files that contained operational
    logic.
  • Extract implementation code into focused sibling modules (ops.rs, loader.rs,
    provider.rs, dispatch.rs, factory.rs, traits.rs, types.rs, helpers.rs,
    descriptions.rs, provider_trait.rs).
  • Reduce every touched mod.rs to mod declarations and re-exports only.
  • Zero behavioral changes — public API surface, tests, and runtime behavior are
    preserved.
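As a sketch of the pattern being enforced (module and function names below are illustrative stand-ins, not the actual openhuman files), a "light" mod.rs keeps only declarations and re-exports while the operational logic lives in a sibling module:

```rust
// Single-file sketch: the inline `ops` module stands in for a sibling
// ops.rs file; the body of `about_app` plays the role of mod.rs.
mod about_app {
    pub mod ops {
        // Operational logic lives here (ops.rs in the real layout).
        pub fn list_capabilities() -> Vec<&'static str> {
            vec!["search", "lookup"]
        }
    }

    // mod.rs equivalent: declarations and re-exports only, so the
    // public path `about_app::list_capabilities` is unchanged.
    pub use self::ops::list_capabilities;
}

fn main() {
    // Downstream callers compile without edits after the move.
    assert_eq!(about_app::list_capabilities(), vec!["search", "lookup"]);
    println!("ok");
}
```

Because the re-export preserves the public path, this is exactly the "zero behavioral changes" property the summary claims: only the file layout moves.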

Modules refactored

  • about_app: moved list_capabilities, lookup_capability, search_capabilities → ops.rs
  • agent/agents: moved BuiltinAgent, BUILTINS, load_builtins, parse_builtin + 13 unit tests → loader.rs
  • composio/providers: moved Provider trait, shared types, descriptions, JSON helpers → traits.rs, types.rs, descriptions.rs, helpers.rs
  • composio/providers/gmail: moved GmailProvider impl + sync logic → provider.rs
  • composio/providers/notion: moved NotionProvider impl + sync logic → provider.rs
  • embeddings: moved EmbeddingProvider trait, factory fns for Ollama/OpenAI/custom/no-op → provider_trait.rs, factory.rs
  • tools/impl/agent: moved dispatch_subagent + event publishing + unit tests → dispatch.rs

Stats

  • 19 files changed
  • 1,735 insertions / 1,688 deletions (net +47 lines, almost entirely
    structural moves)
  • 8 commits across the branch

Test plan

  • cargo check — all modules resolve with updated imports
  • cargo test — all existing + newly added unit tests pass (agents loader: 13
    tests, subagent dispatch: new tests)
  • cargo fmt --check — formatting clean (final commit runs cargo fmt)
  • Verify no public API changes: downstream consumers (core_server, Tauri shell)
    compile without edits

Summary by CodeRabbit

Release Notes

  • New Features

    • Added Gmail integration with automatic message syncing and user profile support.
    • Added Notion integration with automatic page syncing and user profile support.
    • Added toolkit descriptions for Composio providers, improving service discoverability.
  • Refactor

    • Reorganized internal architecture for capability catalog, agent loading, and embedding provider factory for better maintainability.

…nization

- Moved list_capabilities, lookup_capability, and search_capabilities functions from about_app/mod.rs to a new ops.rs module.
- Updated exports in mod.rs to reflect the new structure, improving code organization and maintainability.
…ions

- Added a new `loader.rs` file to manage built-in agent definitions, allowing for easy addition of new agents through a structured format.
- Implemented the `load_builtins` function to parse agent definitions from TOML files and inject system prompts.
- Updated `mod.rs` to include the new loader module, streamlining the organization of agent-related code.
- Ensured that the built-in agents are loaded in a stable order and validated against their defined IDs for consistency.
…oolkit integrations

- Introduced a new `descriptions.rs` file containing human-readable capability summaries for various Composio toolkit slugs, enhancing the prompt renderer's ability to inform the orchestrator about connected integrations.
- Added a `helpers.rs` file with a utility function `pick_str` to streamline JSON object parsing for user profile extraction across different providers.
- Updated `mod.rs` to include the new modules, improving code organization and maintainability.
- Created a `traits.rs` file defining the core provider trait for Composio toolkit implementations, ensuring a consistent interface for all providers.
- Established a `types.rs` file to define shared types used across provider implementations, including sync reasons and user profile structures.
…te module

- Moved the Gmail provider implementation from `mod.rs` to a new `provider.rs` file, enhancing code organization and maintainability.
- Retained the incremental sync logic and user profile fetching functionality, ensuring no change in behavior.
- Updated module imports accordingly to reflect the new structure.
…rate module

- Moved the Notion provider implementation from `mod.rs` to a new `provider.rs` file, improving code organization and maintainability.
- Retained all existing functionality related to incremental sync and user profile fetching.
- Updated module imports to reflect the new structure, ensuring seamless integration with the rest of the codebase.
- Added a new `dispatch.rs` file containing the `dispatch_subagent` function, which handles the logic for delegating tasks to subagents.
- The function includes error handling for uninitialized agent registries and unknown agent IDs, returning appropriate tool results.
- Integrated global event publishing for subagent lifecycle events, enhancing observability.
- Updated `mod.rs` to include the new dispatch module, improving code organization and maintainability.
- Added unit tests to ensure the correctness of the dispatch logic and error handling.
- Added a new `factory.rs` file containing functions to create embedding providers based on specified configurations, supporting Ollama, OpenAI, custom endpoints, and a no-op provider.
- Implemented error handling for unrecognized provider names to ensure immediate feedback on configuration issues.
- Updated `mod.rs` to include the new factory module, enhancing code organization and maintainability.
- Introduced a `provider_trait.rs` file defining the `EmbeddingProvider` trait, establishing a consistent interface for embedding implementations.

coderabbitai Bot commented Apr 17, 2026

📝 Walkthrough

This pull request refactors Rust module structure across multiple subsystems, moving operational logic from mod.rs files into dedicated sibling modules (ops.rs, loader.rs, provider.rs, traits.rs, types.rs, factory.rs, dispatch.rs, etc.) while preserving public API exports. The refactoring targets the about_app, agent, composio providers, embeddings, and tools subsystems.

Changes

Cohort / File(s) Summary
About App Refactoring
src/openhuman/about_app/mod.rs, src/openhuman/about_app/ops.rs
Moved list_capabilities, lookup_capability, search_capabilities implementations from mod.rs to new ops.rs module. Functions now re-exported from mod.rs for unchanged public API surface.
Agent Loader Refactoring
src/openhuman/agent/agents/mod.rs, src/openhuman/agent/agents/loader.rs
Extracted built-in agent metadata, parsing, and loading logic (12 agents, TOML/prompt embedding, validation) from mod.rs to new loader.rs module with comprehensive test suite. Public exports re-routed via pub use loader::*.
Composio Providers Modularization
src/openhuman/composio/providers/mod.rs, src/openhuman/composio/providers/descriptions.rs, src/openhuman/composio/providers/traits.rs, src/openhuman/composio/providers/types.rs, src/openhuman/composio/providers/helpers.rs
Split provider infrastructure: toolkit_description → descriptions.rs, ComposioProvider trait → traits.rs, shared types (SyncReason, ProviderUserProfile, SyncOutcome, ProviderContext) → types.rs, helper utilities (pick_str) → helpers.rs. Module now exclusively re-exports from submodules.
Gmail & Notion Provider Extraction
src/openhuman/composio/providers/gmail/mod.rs, src/openhuman/composio/providers/gmail/provider.rs, src/openhuman/composio/providers/notion/mod.rs, src/openhuman/composio/providers/notion/provider.rs, src/openhuman/composio/providers/notion/sync.rs
Moved full Gmail and Notion provider implementations (with trait implementations, incremental sync, budget tracking, per-item persistence) from mod.rs to dedicated provider.rs files. Updated import paths for pick_str utility.
Embeddings Trait & Factory Extraction
src/openhuman/embeddings/mod.rs, src/openhuman/embeddings/provider_trait.rs, src/openhuman/embeddings/factory.rs
Moved EmbeddingProvider trait to provider_trait.rs and factory functions (create_embedding_provider, default_local_embedding_provider) to factory.rs. Module now re-exports both for unchanged public surface.
Agent Dispatch Extraction
src/openhuman/tools/impl/agent/mod.rs, src/openhuman/tools/impl/agent/dispatch.rs
Extracted dispatch_subagent async helper with subagent registry lookup, event emission, toolkit override logic, and error handling to new dispatch.rs. Removed inline tests; logic re-exported from module.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • senamakel

Poem

🐰 Modules organized, logic spread wide,
From mod.rs files they now hide,
In ops and traits and loaders neat,
Refactoring makes the structure sweet!
Import, export, keep it clean—
The clearest rabbit code you've seen! 🌽✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Title check: ❓ Inconclusive. The title 'Refactor: rust modules' is vague and generic: it uses a broad term ('rust modules') without specifying which aspect of the refactoring is primary. Resolution: consider a more descriptive title such as 'Refactor: move module logic out of mod.rs files' or 'Refactor: enforce import/export-only mod.rs pattern' to convey the primary change more clearly.
✅ Passed checks (4 passed)
  • Description check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Linked Issues check: ✅ Passed. The pull request addresses all requirements from issue #559: operational logic has been moved from mod.rs files into sibling modules (ops.rs, loader.rs, provider.rs, dispatch.rs, factory.rs, traits.rs, types.rs, helpers.rs, descriptions.rs, provider_trait.rs), mod.rs files now contain only declarations and re-exports, public APIs are preserved, and tests remain unchanged across about_app, agent/agents, composio/providers, embeddings, and tools/impl/agent.
  • Out of Scope Changes check: ✅ Passed. All changes are scoped to the refactoring objective: logic moves out of mod.rs files while preserving public API surface and behavior, with no unrelated functionality introduced.
  • Docstring Coverage check: ✅ Passed. Docstring coverage is 100.00%, above the required threshold of 80.00%.



coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (11)
src/openhuman/about_app/ops.rs (2)

8-36: Consider adding tracing::debug entry logs for dev-oriented diagnostics.

The RpcOutcome log strings are user/audit-oriented and returned to callers. Per the repo's logging guideline, new/changed flows should also emit tracing at debug/trace level with stable prefixes like [about_app] and correlation fields (category, id, query) at entry and on branch decisions (e.g., the unknown-id error path in lookup_capability). The RPC handlers in schemas.rs already emit [about_app] debug logs on entry, so this is optional/duplicative at the wrapper layer — up to you whether to add exit logs here or rely on the handler-level ones.

As per coding guidelines: "Add substantial, development-oriented logs on new/changed flows; include logs at entry/exit points, branch decisions, external calls, retries/timeouts, state transitions, and error handling paths" and "Use log/tracing at debug or trace level".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/about_app/ops.rs` around lines 8 - 36, Add development-oriented
tracing.debug logs in the three ops functions: emit an entry log in
list_capabilities (include "[about_app]" prefix and the category value), in
search_capabilities emit an entry log (include "[about_app]" and the trimmed
query) and an exit/summary log if you want, and in lookup_capability emit an
entry log (include "[about_app]" and the id) plus a branch/debug log on the
unknown-id error path before returning the Err so the failure is visible; use
tracing::debug with stable prefixes and include the correlation fields
(category, id, query) in the log fields for easy filtering.
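The entry-log pattern described above can be sketched in a std-only form. Here eprintln! stands in for tracing::debug! so the snippet runs without the tracing crate, and the capability data is invented; the real code would emit structured fields (e.g. id = %id) instead of formatted strings:

```rust
// Hypothetical lookup showing the "[about_app]" stable-prefix entry
// log and a branch-decision log on the unknown-id error path.
// eprintln! is a stand-in for tracing::debug!.
fn lookup_capability(id: &str) -> Result<&'static str, String> {
    eprintln!("[about_app] lookup_capability entry id={id}");
    match id {
        "search" => Ok("full-text capability search"),
        _ => {
            // Log the failing branch before returning the error so
            // the failure is visible in development logs.
            eprintln!("[about_app] lookup_capability unknown id={id}");
            Err(format!("unknown capability id '{id}'"))
        }
    }
}

fn main() {
    assert!(lookup_capability("search").is_ok());
    assert!(lookup_capability("bogus").is_err());
    println!("ok");
}
```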

20-26: Minor: redundant .trim() in error message.

super::lookup already trims id internally (see catalog.rs lookup). Trimming again in the error message is harmless but slightly misleading — the user sees the trimmed form even if they passed whitespace. Not a bug; flagging only for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/about_app/ops.rs` around lines 20 - 26, The error message in
lookup_capability redundantly calls id.trim() even though super::lookup already
trims the input; update the error construction in lookup_capability so it
formats the original id (without calling .trim())—i.e., change the ok_or_else
closure used with lookup(id) to use format!("unknown capability id '{}'", id)
while keeping the RpcOutcome::single_log behavior and the format that logs
capability.id.
src/openhuman/tools/impl/agent/dispatch.rs (2)

112-116: Misplaced test: ask_clarification_tool_re_exported doesn't belong in dispatch.rs.

This test asserts a re-export of AskClarificationTool, which is unrelated to subagent dispatch. It also only incidentally touches this module (via super::super::AskClarificationTool). Consider moving it next to AskClarificationTool itself (or into a dedicated integration test), so dispatch.rs's test module stays focused on dispatch behavior.

♻️ Proposed cleanup
 #[cfg(test)]
 mod tests {
     use super::*;
-    use crate::openhuman::tools::traits::Tool;
-
-    use super::super::AskClarificationTool;
-
-    #[test]
-    fn ask_clarification_tool_re_exported() {
-        let tool = AskClarificationTool::new();
-        assert_eq!(tool.name(), "ask_user_clarification");
-    }

     #[tokio::test]
     async fn dispatch_subagent_returns_tool_error_when_agent_unknown() {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/tools/impl/agent/dispatch.rs` around lines 112 - 116, The test
ask_clarification_tool_re_exported in dispatch.rs is misplaced because it
verifies the re-export of AskClarificationTool rather than dispatch behavior;
move this test next to the AskClarificationTool implementation (or into a
dedicated integration test) so dispatch.rs only contains dispatch-related tests,
and update any module path imports accordingly (replace
super::super::AskClarificationTool with the local module path where
AskClarificationTool is defined and ensure the test still constructs
AskClarificationTool::new() and asserts name() == "ask_user_clarification").

118-141: Test relies on global OnceCell state and is order-dependent.

AgentDefinitionRegistry::global() is backed by a process-wide OnceCell, so which branch this test exercises ("registry not initialised" vs. "not found in registry") depends on whether some earlier test in the same binary called init_global*. The assertion accepts either message, so it won't flake on correctness, but it also means this test won't reliably cover both branches.

If you want deterministic coverage of both paths, consider either:

  • Splitting the two branches into separate #[test] binaries (integration tests under tests/), or
  • Adding a test-only constructor/helper on AgentDefinitionRegistry that lets a unit test inject a registry without touching GLOBAL, so the "agent not found" path can be asserted directly.

Non-blocking — the current assertion is safe, just noting the coverage gap.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/tools/impl/agent/dispatch.rs` around lines 118 - 141, Test
dispatch_subagent_returns_tool_error_when_agent_unknown depends on global
OnceCell state (AgentDefinitionRegistry::global()) so its branch coverage is
order-dependent; to fix, make the branches deterministic either by splitting
into two integration tests that run in separate test binaries (so one can assert
the "registry not initialised" case and the other initialises GLOBAL then
asserts the "not found in registry" case), or add a test-only API on
AgentDefinitionRegistry (e.g.,
AgentDefinitionRegistry::with_definitions_for_test or a temporary registry
setter) and change the unit test to inject a controlled registry when calling
dispatch_subagent so you can assert the "agent not found" branch reliably
without touching the global OnceCell; reference the existing test function
dispatch_subagent_returns_tool_error_when_agent_unknown and the symbols
AgentDefinitionRegistry::global() and dispatch_subagent when applying the
change.
src/openhuman/agent/agents/loader.rs (3)

243-249: Minor: find() re-parses all built-ins per call.

Each test helper call invokes load_builtins() and walks all 12 built-ins. With ~10 tests relying on find, the TOML is parsed dozens of times per test run. Not a correctness issue and unit-test cost is negligible, but a once_cell::sync::Lazy<Vec<AgentDefinition>> or a single shared fixture would cut redundant work if the suite grows.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/agents/loader.rs` around lines 243 - 249, The current
find(id: &str) re-calls load_builtins() each time; replace repeated parsing with
a static cached Vec by adding a once_cell::sync::Lazy static (e.g. BUILTINS:
Lazy<Vec<AgentDefinition>>) initialized with load_builtins().unwrap(), update
find to iterate BUILTINS.iter().find(|d| d.id == id).cloned().unwrap_or_else(||
panic!("missing built-in {id}")), and add the once_cell import; this ensures
built-ins are parsed once and reused across calls to find and tests.

120-122: Consider adding a debug log on load.

load_builtins is the single ingress point for baked-in agent definitions and currently runs silently. A one-line tracing::debug! emitting the count (and ideally the ids) would make startup traceable and help diagnose downstream registry issues without any behavior change.

🪵 Proposed trace log
 pub fn load_builtins() -> Result<Vec<AgentDefinition>> {
-    BUILTINS.iter().map(parse_builtin).collect()
+    let defs: Vec<AgentDefinition> = BUILTINS.iter().map(parse_builtin).collect::<Result<_>>()?;
+    tracing::debug!(
+        target: "[agent-loader]",
+        count = defs.len(),
+        ids = ?defs.iter().map(|d| d.id.as_str()).collect::<Vec<_>>(),
+        "loaded built-in agent definitions"
+    );
+    Ok(defs)
 }

As per coding guidelines: "Add substantial, development-oriented logs on new/changed flows" and "Use log/tracing at debug or trace level for development-oriented diagnostics in Rust".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/agents/loader.rs` around lines 120 - 122, load_builtins
quietly returns parsed BUILTINS; add a development-oriented debug trace so
startup is observable: inside the load_builtins function (which maps BUILTINS
via parse_builtin to produce Vec<AgentDefinition>), after collecting results log
a tracing::debug! that reports the number of loaded agents and ideally their ids
(derive or map ids from the returned AgentDefinition items) before returning the
Vec, ensuring no behavior change and using debug level for diagnostics.

154-159: Nit: hardcoded 12 couples two invariants in one assertion.

assert_eq!(defs.len(), BUILTINS.len()) already enforces loader/slice consistency; the subsequent assert_eq!(defs.len(), 12, ...) exists purely as a tripwire for someone adding/removing a built-in without updating tests. That's fine, but consider asserting it against BUILTINS.len() only once and letting the folder-ids test carry the intent, or keep the literal and add a brief // bump when adding a built-in comment so the next contributor understands why this magic number exists.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/agents/loader.rs` around lines 154 - 159, The test
all_builtins_parse currently has a hardcoded assert_eq!(defs.len(), 12, ...)
that duplicates the ASSERT against BUILTINS.len(); either remove the
literal-number assertion and rely on assert_eq!(defs.len(), BUILTINS.len()) or
keep the literal but make its purpose obvious by adding a brief comment like "//
bump when adding/removing a built-in" next to the assert_eq!(defs.len(), 12,
...) so future contributors understand the magic number; update the
all_builtins_parse test accordingly (referencing defs.len() and BUILTINS).
src/openhuman/embeddings/factory.rs (1)

19-47: Add branch-decision tracing to the factory.

Per the repo's logging guideline ("Add substantial, development-oriented logs on new/changed flows; include logs at entry/exit points, branch decisions..."), create_embedding_provider is a key configuration entry point but currently emits no logs. A single tracing::debug! per arm (with a stable prefix like [embeddings.factory] and the resolved provider, model, dims) would make misconfiguration much easier to diagnose end-to-end. Take care not to log api_key — only whether it was Some/None.

As per coding guidelines: "Add substantial, development-oriented logs on new/changed flows; include logs at entry/exit points, branch decisions, external calls..." and "Never log secrets, API keys, JWTs, credentials, or full PII...".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/embeddings/factory.rs` around lines 19 - 47, Add tracing::debug
logs to create_embedding_provider: at function entry and in each match arm emit
a debug with prefix "[embeddings.factory]" that records the resolved provider
string, model, dims, and whether api_key.is_some() (do not log the key itself).
Place one debug before returning for the "ollama" arm (OllamaEmbedding::new),
for "openai" and "custom:" arms (OpenAiEmbedding::new), for the "none" arm
(NoopEmbedding), and for the unknown branch that returns an error so branch
decisions are fully traceable.
src/openhuman/composio/providers/notion/provider.rs (1)

244-244: Using extract_item_id to pull a timestamp is semantically confusing.

extract_item_id is (per its name) a helper for extracting item identifiers, but here it's reused to pull last_edited_time — a timestamp used as the sync cursor. Functionally fine if it just walks JSON paths and returns a string, but the intent at the call site is obscured. Consider using (or introducing) a pick_str-style helper or renaming to convey that it's a generic "first-matching string at path" utility.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/composio/providers/notion/provider.rs` at line 244, The call to
extract_item_id(page, PAGE_EDITED_PATHS) is semantically misleading because
extract_item_id implies an identifier helper but is being used to fetch a
timestamp (last_edited_time) for the sync cursor; change this by either renaming
extract_item_id to a generic helper like pick_str (or add a new helper
pick_str/first_matching_str) and use that at this call site, or create a
dedicated function (e.g., pick_timestamp or pick_str) and replace the use in
provider.rs where PAGE_EDITED_PATHS is passed; ensure the new name is used
consistently where non-ID strings are extracted (including the variable
edited_time / last_edited_time) so intent is clear.
src/openhuman/composio/providers/gmail/provider.rs (2)

246-256: Don't silently drop the state save error on the failure path.

let _ = state.save(&memory).await; discards any persistence failure right before returning Err. Given the comment explicitly flags that this save exists to preserve budget accounting, a failing save here means the budget counter is also lost — which is exactly the scenario the save was trying to avoid. A tracing::warn! with the error gives you a breadcrumb without changing the return behavior.

♻️ Proposed fix
-                // Save state so budget accounting isn't lost.
-                let _ = state.save(&memory).await;
+                // Save state so budget accounting isn't lost.
+                if let Err(save_err) = state.save(&memory).await {
+                    tracing::warn!(
+                        page = page_num,
+                        error = %save_err,
+                        "[composio:gmail] failed to persist sync state after tool error"
+                    );
+                }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/composio/providers/gmail/provider.rs` around lines 246 - 256,
The state.save(&memory).await call in the error path currently discards any
error (let _ = state.save(&memory).await;), which can hide failures to persist
budget accounting; change this to capture and log the save error (e.g., store
the Result from state.save(&memory).await and if Err call tracing::warn! with
context) before returning the Err from the function so the return behavior stays
the same; update the block around resp, error handling (the branch using
ACTION_FETCH_EMAILS, page_num, resp.error) to log persistence failures while
preserving the existing Err(...) return value.

78-80: Prefer self.toolkit_slug() over repeated "gmail" string literals.

The toolkit slug is duplicated as a literal in SyncState::load, persist_single_item (twice), and both SyncOutcome/ProviderUserProfile builders. If the trait method ever changes the slug, these call sites will silently drift. Reusing self.toolkit_slug() keeps a single source of truth and makes this impl more mechanical to copy for new providers.

♻️ Example fix
-        let mut state = SyncState::load(&memory, "gmail", &connection_id).await?;
+        let toolkit = self.toolkit_slug();
+        let mut state = SyncState::load(&memory, toolkit, &connection_id).await?;
@@
-                match persist_single_item(
-                    &memory,
-                    "gmail",
-                    &doc_id,
-                    &title,
-                    msg,
-                    "gmail",
-                    ctx.connection_id.as_deref(),
-                )
+                match persist_single_item(
+                    &memory,
+                    toolkit,
+                    &doc_id,
+                    &title,
+                    msg,
+                    toolkit,
+                    ctx.connection_id.as_deref(),
+                )

Also applies to: 131-134, 173-173, 182-183, 307-313, 371-373

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/composio/providers/gmail/provider.rs` around lines 78 - 80, The
implementation hardcodes the "gmail" toolkit slug in multiple places; replace
those literals with calls to the provider's toolkit_slug method to centralize
the canonical slug. Update uses in SyncState::load, persist_single_item (both
occurrences), the SyncOutcome builder, and the ProviderUserProfile builder to
call the impl's toolkit_slug (use self.toolkit_slug() inside instance methods,
and GmailProvider::toolkit_slug() or Self::toolkit_slug() where self isn't
available) so all sites derive the slug from the single toolkit_slug()
definition.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/openhuman/composio/providers/gmail/provider.rs`:
- Around line 42-45: Doc comment for MAX_PAGES_PER_SYNC is stale: it states "at
most 500 items per sync pass" but with SyncReason::ConnectionCreated the code
uses INITIAL_PAGE_SIZE = 50 so the real maximum is MAX_PAGES_PER_SYNC *
INITIAL_PAGE_SIZE (1000). Update the comment to reflect both cases or present a
range (e.g., "up to MAX_PAGES_PER_SYNC * PAGE_SIZE, or up to MAX_PAGES_PER_SYNC
* INITIAL_PAGE_SIZE for ConnectionCreated (500–1000 items)") and mention the
related constants MAX_PAGES_PER_SYNC, PAGE_SIZE and INITIAL_PAGE_SIZE and the
SyncReason::ConnectionCreated backfill behavior so readers see the correct
ceiling.
- Around line 277-285: The code compares mixed-format date strings via
extract_item_id and MESSAGE_DATE_PATHS, so lexicographic comparison (date_val >
*existing) can pick the wrong "newest" cursor; change to only use Gmail's
numeric internalDate (e.g. "internalDate" / "data.internalDate") and compare
them as parsed integers (u64) rather than raw strings. Replace the
extract_item_id usage here with a new helper (e.g. extract_internal_date_ms)
that returns Option<u64>, update the newest_date variable to Option<u64>,
perform numeric comparisons to pick the max, and ensure the value passed to
state.advance_cursor and consumed by sync::cursor_to_gmail_after_filter remains
the consistent numeric representation the cursor expects.
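The numeric-cursor fix above can be sketched as follows. The helper name extract_internal_date_ms matches the one suggested in the comment, but the JSON-path walking is elided (it just parses a string here) and the sample values are invented; the point is that the lexicographic max over mixed-width strings picks the wrong cursor:

```rust
// Parse Gmail's numeric internalDate (milliseconds since epoch) as an
// integer instead of comparing raw strings.
fn extract_internal_date_ms(raw: &str) -> Option<u64> {
    raw.trim().parse::<u64>().ok()
}

// Pick the newest cursor by numeric comparison, skipping values that
// don't parse (e.g. RFC 2822 Date headers mixed into the same field).
fn newest_cursor<'a>(dates: impl Iterator<Item = &'a str>) -> Option<u64> {
    dates.filter_map(extract_internal_date_ms).max()
}

fn main() {
    // "999" > "1700000000000" lexicographically, but not numerically.
    let msgs = ["999", "1700000000000", "not-a-date"];
    assert_eq!(newest_cursor(msgs.iter().copied()), Some(1_700_000_000_000));
    println!("ok");
}
```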

In `@src/openhuman/embeddings/factory.rs`:
- Around line 36-40: The custom: arm currently accepts "custom:" with an empty
base_url and passes it to OpenAiEmbedding::new, causing opaque failures later;
modify the match arm handling name.starts_with("custom:") to validate base_url
after strip_prefix and if base_url.is_empty() return an Err (or treat as unknown
provider) with a clear message instead of constructing OpenAiEmbedding;
reference the symbols name, base_url, api_key, OpenAiEmbedding::new and the test
factory_custom_empty_url when updating the factory function so misconfigured
"custom:" is rejected immediately.

---

Nitpick comments:
In `@src/openhuman/about_app/ops.rs`:
- Around line 8-36: Add development-oriented tracing.debug logs in the three ops
functions: emit an entry log in list_capabilities (include "[about_app]" prefix
and the category value), in search_capabilities emit an entry log (include
"[about_app]" and the trimmed query) and an exit/summary log if you want, and in
lookup_capability emit an entry log (include "[about_app]" and the id) plus a
branch/debug log on the unknown-id error path before returning the Err so the
failure is visible; use tracing::debug with stable prefixes and include the
correlation fields (category, id, query) in the log fields for easy filtering.
- Around line 20-26: The error message in lookup_capability redundantly calls
id.trim() even though super::lookup already trims the input; update the error
construction in lookup_capability so it formats the original id (without calling
.trim())—i.e., change the ok_or_else closure used with lookup(id) to use
format!("unknown capability id '{}'", id) while keeping the
RpcOutcome::single_log behavior and the format that logs capability.id.

In `@src/openhuman/agent/agents/loader.rs`:
- Around lines 243-249: The current find(id: &str) re-calls load_builtins() on
every call. Replace the repeated parsing with a cached static: add a
once_cell::sync::Lazy static (e.g. BUILTINS: Lazy<Vec<AgentDefinition>>)
initialized with load_builtins().unwrap(), update find to iterate
BUILTINS.iter().find(|d| d.id == id).cloned().unwrap_or_else(|| panic!("missing
built-in {id}")), and add the once_cell import. This ensures built-ins are
parsed once and reused across calls to find and tests.
- Around lines 120-122: load_builtins quietly returns the parsed BUILTINS; add
a development-oriented debug trace so startup is observable. Inside
load_builtins (which maps BUILTINS via parse_builtin to produce
Vec<AgentDefinition>), after collecting the results, log a tracing::debug! that
reports the number of loaded agents and ideally their ids (mapped from the
returned AgentDefinition items) before returning the Vec, keeping behavior
unchanged and using debug level for diagnostics.
- Around lines 154-159: The test all_builtins_parse has a hardcoded
assert_eq!(defs.len(), 12, ...) that duplicates the assertion against
BUILTINS.len(). Either remove the literal-number assertion and rely on
assert_eq!(defs.len(), BUILTINS.len()), or keep the literal but make its
purpose obvious with a brief comment like "// bump when adding/removing a
built-in" next to it so future contributors understand the magic number.
Update the all_builtins_parse test accordingly (referencing defs.len() and
BUILTINS).
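
  The caching suggestion can be sketched with std's OnceLock in place of once_cell::sync::Lazy (AgentDefinition and load_builtins here are simplified stand-ins for the real loader types):

  ```rust
  use std::sync::OnceLock;

  #[derive(Clone, Debug, PartialEq)]
  struct AgentDefinition {
      id: String,
  }

  // Stand-in for the real loader, which parses the embedded BUILTINS sources.
  fn load_builtins() -> Result<Vec<AgentDefinition>, String> {
      Ok(vec![
          AgentDefinition { id: "researcher".into() },
          AgentDefinition { id: "coder".into() },
      ])
  }

  fn builtins() -> &'static Vec<AgentDefinition> {
      static CACHE: OnceLock<Vec<AgentDefinition>> = OnceLock::new();
      // Parse once on first access; later calls reuse the cached Vec.
      CACHE.get_or_init(|| load_builtins().expect("built-in agents must parse"))
  }

  fn find(id: &str) -> AgentDefinition {
      builtins()
          .iter()
          .find(|d| d.id == id)
          .cloned()
          .unwrap_or_else(|| panic!("missing built-in {id}"))
  }
  ```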

In `@src/openhuman/composio/providers/gmail/provider.rs`:
- Around lines 246-256: The state.save(&memory).await call in the error path
currently discards any error (let _ = state.save(&memory).await;), which can
hide failures to persist budget accounting. Capture and log the save error
(e.g., store the Result from state.save(&memory).await and, if it is Err, call
tracing::warn! with context) before returning the Err from the function, so
the return behavior stays the same. Update the block around resp and its error
handling (the branch using ACTION_FETCH_EMAILS, page_num, resp.error) to log
persistence failures while preserving the existing Err(...) return value.
- Around line 78-80: The implementation hardcodes the "gmail" toolkit slug in
multiple places; replace those literals with calls to the provider's
toolkit_slug method to centralize the canonical slug. Update uses in
SyncState::load, persist_single_item (both occurrences), the SyncOutcome
builder, and the ProviderUserProfile builder to call the impl's toolkit_slug
(use self.toolkit_slug() inside instance methods, and
GmailProvider::toolkit_slug() or Self::toolkit_slug() where self isn't
available) so all sites derive the slug from the single toolkit_slug()
definition.
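
  The "log, don't swallow" pattern from the first item can be sketched as below. eprintln! stands in for tracing::warn!, and SyncState/Memory are simplified synchronous stand-ins for the real async types:

  ```rust
  struct Memory;

  struct SyncState {
      fail_save: bool,
  }

  impl SyncState {
      fn save(&self, _memory: &Memory) -> Result<(), String> {
          if self.fail_save {
              Err("disk full".into())
          } else {
              Ok(())
          }
      }
  }

  fn handle_fetch_error(state: &SyncState, memory: &Memory, api_error: &str) -> Result<(), String> {
      // Before: let _ = state.save(&memory); (persistence failures vanish).
      // After: capture the Result and surface it, then return the original Err.
      if let Err(save_err) = state.save(memory) {
          eprintln!("[gmail] failed to persist budget state after fetch error: {save_err}");
      }
      Err(format!("fetch_emails failed: {api_error}"))
  }
  ```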

In `@src/openhuman/composio/providers/notion/provider.rs`:
- Line 244: The call to extract_item_id(page, PAGE_EDITED_PATHS) is
semantically misleading: extract_item_id implies an identifier helper, but
here it fetches a timestamp (last_edited_time) for the sync cursor. Either
rename extract_item_id to a generic helper like pick_str (or add a new helper
pick_str/first_matching_str) and use that at this call site, or create a
dedicated function (e.g., pick_timestamp) and replace the use in provider.rs
where PAGE_EDITED_PATHS is passed. Ensure the new name is used consistently
wherever non-ID strings are extracted (including the variable edited_time /
last_edited_time) so intent is clear.
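
  A hypothetical pick_str could look like the sketch below, so call sites read as "pick a string", not "extract an id". The real code works over JSON values; here the page object is simplified to a flat key -> value map:

  ```rust
  use std::collections::HashMap;

  // Return the first candidate path that yields a string value.
  fn pick_str<'a>(obj: &'a HashMap<&str, String>, paths: &[&str]) -> Option<&'a str> {
      paths.iter().find_map(|p| obj.get(p).map(String::as_str))
  }
  ```

  The same helper then serves both id extraction (with the ID paths) and cursor timestamps (with PAGE_EDITED_PATHS), with the intent carried by the path constant rather than the function name.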

In `@src/openhuman/embeddings/factory.rs`:
- Around line 19-47: Add tracing::debug logs to create_embedding_provider: at
function entry and in each match arm emit a debug with prefix
"[embeddings.factory]" that records the resolved provider string, model, dims,
and whether api_key.is_some() (do not log the key itself). Place one debug
before returning for the "ollama" arm (OllamaEmbedding::new), for "openai" and
"custom:" arms (OpenAiEmbedding::new), for the "none" arm (NoopEmbedding), and
for the unknown branch that returns an error so branch decisions are fully
traceable.
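
  The suggested entry and branch logging can be sketched as follows, with eprintln! standing in for tracing::debug! and a provider label returned instead of a boxed trait object so the example stays self-contained:

  ```rust
  fn resolve_provider(
      provider: &str,
      model: &str,
      dims: usize,
      api_key: Option<&str>,
  ) -> Result<String, String> {
      // Entry log: record the resolved inputs, never the key itself.
      eprintln!(
          "[embeddings.factory] provider={provider} model={model} dims={dims} has_api_key={}",
          api_key.is_some()
      );
      match provider {
          "ollama" => Ok("ollama".into()),
          "openai" => Ok("openai".into()),
          name if name.starts_with("custom:") => Ok("custom".into()),
          "none" => Ok("noop".into()),
          other => {
              // Branch log on the error path so the decision is traceable.
              eprintln!("[embeddings.factory] unknown provider '{other}'");
              Err(format!("unknown embedding provider '{other}'"))
          }
      }
  }
  ```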

In `@src/openhuman/tools/impl/agent/dispatch.rs`:
- Around lines 112-116: The test ask_clarification_tool_re_exported in
dispatch.rs is misplaced: it verifies the re-export of AskClarificationTool
rather than dispatch behavior. Move it next to the AskClarificationTool
implementation (or into a dedicated integration test) so dispatch.rs only
contains dispatch-related tests, and update the module path imports
accordingly (replace super::super::AskClarificationTool with the local module
path where AskClarificationTool is defined, and ensure the test still
constructs AskClarificationTool::new() and asserts name() ==
"ask_user_clarification").
- Around lines 118-141: The test
dispatch_subagent_returns_tool_error_when_agent_unknown depends on global
OnceCell state (AgentDefinitionRegistry::global()), so its branch coverage is
order-dependent. Make the branches deterministic in one of two ways: split it
into two integration tests that run in separate test binaries (so one can
assert the "registry not initialised" case and the other initialises GLOBAL
and then asserts the "not found in registry" case), or add a test-only API on
AgentDefinitionRegistry (e.g.,
AgentDefinitionRegistry::with_definitions_for_test or a temporary registry
setter) and change the unit test to inject a controlled registry when calling
dispatch_subagent, so the "agent not found" branch can be asserted reliably
without touching the global OnceCell. Reference the existing test function
dispatch_subagent_returns_tool_error_when_agent_unknown and the symbols
AgentDefinitionRegistry::global() and dispatch_subagent when applying the
change.
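
  The injection option can be sketched as follows. The types, the with_definitions_for_test constructor, and the string-valued definitions are simplified stand-ins, not the real AgentDefinitionRegistry API:

  ```rust
  use std::collections::HashMap;

  struct AgentDefinitionRegistry {
      agents: HashMap<String, String>, // id -> agent definition (simplified)
  }

  impl AgentDefinitionRegistry {
      // Test-only constructor: builds a controlled registry without touching
      // any global OnceCell state.
      fn with_definitions_for_test(ids: &[&str]) -> Self {
          Self {
              agents: ids
                  .iter()
                  .map(|id| (id.to_string(), format!("def:{id}")))
                  .collect(),
          }
      }
  }

  // Dispatch takes the registry as a parameter, so unit tests can inject a
  // controlled instance and reliably hit the "not found" branch.
  fn dispatch_subagent(registry: &AgentDefinitionRegistry, agent_id: &str) -> Result<String, String> {
      registry
          .agents
          .get(agent_id)
          .cloned()
          .ok_or_else(|| format!("agent '{agent_id}' not found in registry"))
  }
  ```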
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 24e3d5ed-ae4f-4f5f-8138-0a4b5f364a9a

📥 Commits

Reviewing files that changed from the base of the PR and between 6733720 and 43e68ad.

📒 Files selected for processing (19)
  • src/openhuman/about_app/mod.rs
  • src/openhuman/about_app/ops.rs
  • src/openhuman/agent/agents/loader.rs
  • src/openhuman/agent/agents/mod.rs
  • src/openhuman/composio/providers/descriptions.rs
  • src/openhuman/composio/providers/gmail/mod.rs
  • src/openhuman/composio/providers/gmail/provider.rs
  • src/openhuman/composio/providers/helpers.rs
  • src/openhuman/composio/providers/mod.rs
  • src/openhuman/composio/providers/notion/mod.rs
  • src/openhuman/composio/providers/notion/provider.rs
  • src/openhuman/composio/providers/notion/sync.rs
  • src/openhuman/composio/providers/traits.rs
  • src/openhuman/composio/providers/types.rs
  • src/openhuman/embeddings/factory.rs
  • src/openhuman/embeddings/mod.rs
  • src/openhuman/embeddings/provider_trait.rs
  • src/openhuman/tools/impl/agent/dispatch.rs
  • src/openhuman/tools/impl/agent/mod.rs

Comment on lines +42 to +45
/// Maximum pages to fetch in a single sync pass (guards against infinite
/// pagination loops). Combined with PAGE_SIZE this yields at most
/// 500 items per sync pass, well within the daily budget.
const MAX_PAGES_PER_SYNC: u32 = 20;
⚠️ Potential issue | 🟡 Minor

Stale bound in doc comment.

The comment claims "at most 500 items per sync pass", but on SyncReason::ConnectionCreated the loop uses INITIAL_PAGE_SIZE = 50, so the ceiling is actually MAX_PAGES_PER_SYNC * INITIAL_PAGE_SIZE = 1000. Worth updating to reflect both cases (or phrasing it as a per-reason range) so future readers don't under-estimate backfill size.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/composio/providers/gmail/provider.rs` around lines 42 - 45, Doc
comment for MAX_PAGES_PER_SYNC is stale: it states "at most 500 items per sync
pass" but with SyncReason::ConnectionCreated the code uses INITIAL_PAGE_SIZE =
50 so the real maximum is MAX_PAGES_PER_SYNC * INITIAL_PAGE_SIZE (1000). Update
the comment to reflect both cases or present a range (e.g., "up to
MAX_PAGES_PER_SYNC * PAGE_SIZE, or up to MAX_PAGES_PER_SYNC * INITIAL_PAGE_SIZE
for ConnectionCreated (500–1000 items)") and mention the related constants
MAX_PAGES_PER_SYNC, PAGE_SIZE and INITIAL_PAGE_SIZE and the
SyncReason::ConnectionCreated backfill behavior so readers see the correct
ceiling.
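
The corrected ceilings can be written out directly. PAGE_SIZE = 25 is an assumption inferred from the original "at most 500 items" claim; MAX_PAGES_PER_SYNC = 20 and INITIAL_PAGE_SIZE = 50 come from the snippet and the review comment:

```rust
const MAX_PAGES_PER_SYNC: u32 = 20;
const PAGE_SIZE: u32 = 25; // assumed: 500 / MAX_PAGES_PER_SYNC
const INITIAL_PAGE_SIZE: u32 = 50;

/// At most MAX_PAGES_PER_SYNC * PAGE_SIZE = 500 items on an incremental sync,
/// or MAX_PAGES_PER_SYNC * INITIAL_PAGE_SIZE = 1000 items on the
/// SyncReason::ConnectionCreated backfill.
const MAX_ITEMS_INCREMENTAL: u32 = MAX_PAGES_PER_SYNC * PAGE_SIZE;
const MAX_ITEMS_BACKFILL: u32 = MAX_PAGES_PER_SYNC * INITIAL_PAGE_SIZE;
```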

Comment on lines +277 to +285
// Track the newest date we've seen for cursor advancement.
if let Some(date_val) = extract_item_id(msg, MESSAGE_DATE_PATHS) {
if newest_date
.as_ref()
.map_or(true, |existing| date_val > *existing)
{
newest_date = Some(date_val);
}
}
⚠️ Potential issue | 🟡 Minor

Mixed-format date comparison may pick the wrong "newest" cursor.

MESSAGE_DATE_PATHS mixes internalDate (Gmail returns epoch-millis as a numeric string, ~13 digits) with date / receivedAt (typically RFC-2822 / ISO-8601 strings). The lexicographic date_val > *existing comparison only holds within a single format; across formats it can happily treat "Wed, 01 Jan 2025 …" as greater than "1736000000000", then feed that value back into state.advance_cursor(...) and into sync::cursor_to_gmail_after_filter on the next run.

Also, extract_item_id is semantically an ID extractor — reusing it for dates relies on the helper happening to return any string. Consider either (a) restricting to internalDate/data.internalDate only and parsing to u64 before comparing, or (b) introducing an extract_item_str/extract_internal_date_ms helper so the cursor is always the same representation cursor_to_gmail_after_filter expects.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/composio/providers/gmail/provider.rs` around lines 277 - 285,
The code compares mixed-format date strings via extract_item_id and
MESSAGE_DATE_PATHS, so lexicographic comparison (date_val > *existing) can pick
the wrong "newest" cursor; change to only use Gmail's numeric internalDate (e.g.
"internalDate" / "data.internalDate") and compare them as parsed integers (u64)
rather than raw strings. Replace the extract_item_id usage here with a new
helper (e.g. extract_internal_date_ms) that returns Option<u64>, update the
newest_date variable to Option<u64>, perform numeric comparisons to pick the
max, and ensure the value passed to state.advance_cursor and consumed by
sync::cursor_to_gmail_after_filter remains the consistent numeric representation
the cursor expects.
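
A hypothetical extract_internal_date_ms following this prompt could look like the sketch below: it only accepts Gmail's numeric internalDate (epoch millis as a decimal string) and compares as u64, so a mixed-format string can never win the "newest" comparison. The JSON message is simplified here to a flat key -> value map:

```rust
use std::collections::HashMap;

fn extract_internal_date_ms(msg: &HashMap<&str, &str>) -> Option<u64> {
    ["internalDate", "data.internalDate"]
        .iter()
        .find_map(|k| msg.get(k))
        .and_then(|v| v.parse::<u64>().ok()) // rejects RFC-2822 / ISO strings
}

fn newest_internal_date(messages: &[HashMap<&str, &str>]) -> Option<u64> {
    // Numeric max over parsed values replaces lexicographic string comparison.
    messages.iter().filter_map(extract_internal_date_ms).max()
}
```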

Comment on lines +36 to +40
name if name.starts_with("custom:") => {
let base_url = name.strip_prefix("custom:").unwrap_or("");
let key = api_key.unwrap_or("");
Ok(Box::new(OpenAiEmbedding::new(base_url, key, model, dims)))
}
⚠️ Potential issue | 🟡 Minor

Consider rejecting empty custom: URL rather than silently accepting it.

"custom:" (no URL) resolves to base_url = "", which is then handed to OpenAiEmbedding::new("", …). Unlike the "ollama" arm — where OllamaEmbedding::new substitutes a sensible default for an empty base URL — OpenAiEmbedding will just store "" and fail opaquely at request time. The test factory_custom_empty_url asserts construction succeeds, which locks in this silent-misconfiguration behaviour.

Surfacing this as a configuration error here would be consistent with the "unknown provider" arm and the rationale stated in the doc comment ("configuration mistakes surface immediately rather than silently degrading").

♻️ Proposed fix
         name if name.starts_with("custom:") => {
             let base_url = name.strip_prefix("custom:").unwrap_or("");
+            if base_url.trim().is_empty() {
+                return Err(anyhow::anyhow!(
+                    "custom embedding provider requires a URL, e.g. \"custom:http://host:port\""
+                ));
+            }
             let key = api_key.unwrap_or("");
             Ok(Box::new(OpenAiEmbedding::new(base_url, key, model, dims)))
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/embeddings/factory.rs` around lines 36 - 40, The custom: arm
currently accepts "custom:" with an empty base_url and passes it to
OpenAiEmbedding::new, causing opaque failures later; modify the match arm
handling name.starts_with("custom:") to validate base_url after strip_prefix and
if base_url.is_empty() return an Err (or treat as unknown provider) with a clear
message instead of constructing OpenAiEmbedding; reference the symbols name,
base_url, api_key, OpenAiEmbedding::new and the test factory_custom_empty_url
when updating the factory function so misconfigured "custom:" is rejected
immediately.
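
The validated arm can be sketched in isolation; it returns a descriptive label instead of a boxed provider so the example stays self-contained:

```rust
fn resolve_custom(name: &str) -> Result<String, String> {
    let base_url = name.strip_prefix("custom:").unwrap_or("");
    if base_url.trim().is_empty() {
        // Surface the misconfiguration now, rather than failing opaquely at
        // request time inside OpenAiEmbedding.
        return Err(
            "custom embedding provider requires a URL, e.g. \"custom:http://host:port\"".into(),
        );
    }
    Ok(format!("openai-compatible endpoint at {base_url}"))
}
```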

@YellowSnnowmann YellowSnnowmann changed the title Refactorrust modules Refactor: rust modules Apr 17, 2026
Development

Successfully merging this pull request may close these issues.

Refactor Rust modules: keep mod.rs files import/export-only
