feat(agent): fuzzy-filter skills_agent toolkit actions by task prompt (#579)
Conversation
Narrow large Composio toolkits (e.g. github ~500 actions) down to the handful relevant to a given delegation prompt before registering them as native tools on a spawned skills_agent. Falls back to the full catalogue when the filter yields fewer than MIN_CONFIDENT_HITS hits to avoid starving the sub-agent on under-specified prompts. Filter is only invoked when both `definition.id == "skills_agent"` and a `toolkit=` argument is present, so orchestrator and other sub-agents are unaffected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
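The gate and fallback contract described above can be sketched roughly as follows. Only the `"skills_agent"` id, the `toolkit=` argument, and `MIN_CONFIDENT_HITS` come from the PR text; every other name and shape here is a hypothetical stand-in, not the PR's actual code:

```rust
// Hedged sketch of the spawn-time gate and fallback contract.
const MIN_CONFIDENT_HITS: usize = 3;

struct Definition {
    id: String,
}

/// The filter only runs for a skills_agent spawn that names a toolkit;
/// orchestrator and other sub-agents never hit this path.
fn filter_applies(definition: &Definition, toolkit: Option<&str>) -> bool {
    definition.id == "skills_agent" && toolkit.is_some()
}

/// Register the filtered subset when it is confident enough, otherwise
/// fall back to the full catalogue so an under-specified prompt does not
/// starve the sub-agent.
fn actions_to_register(all: Vec<String>, hits: Vec<String>) -> Vec<String> {
    if hits.len() >= MIN_CONFIDENT_HITS {
        hits
    } else {
        all
    }
}

fn main() {
    let skills = Definition { id: "skills_agent".into() };
    let other = Definition { id: "researcher".into() };
    assert!(filter_applies(&skills, Some("github")));
    assert!(!filter_applies(&skills, None));
    assert!(!filter_applies(&other, Some("github")));

    let all: Vec<String> = (0..500).map(|i| format!("action_{i}")).collect();
    // Two hits is below MIN_CONFIDENT_HITS: keep the whole catalogue.
    let thin = vec!["a".to_string(), "b".to_string()];
    assert_eq!(actions_to_register(all.clone(), thin).len(), 500);
    // Three or more hits: register only the filtered subset.
    let ok: Vec<String> = vec!["a".into(), "b".into(), "c".into()];
    assert_eq!(actions_to_register(all, ok).len(), 3);
}
```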
📝 Walkthrough

A new fuzzy-filtering system for Composio integration tools is introduced through a dedicated `tool_filter` module.
Sequence Diagram

```mermaid
sequenceDiagram
    participant SubagentRunner
    participant ToolFilter
    participant IntegrationTools
    participant SubAgent
    SubagentRunner->>ToolFilter: filter_actions_by_prompt(task_prompt, integration.tools, max_results)
    ToolFilter->>ToolFilter: detect_verb_intent(prompt)
    Note over ToolFilter: Extract CRUD-like verbs<br/>(Create, Send, Read, etc.)
    ToolFilter->>ToolFilter: tokenize & normalize(prompt)
    Note over ToolFilter: Expand abbreviations,<br/>remove stopwords
    ToolFilter->>IntegrationTools: iterate & score each tool
    Note over ToolFilter: Verb-gate match +<br/>token overlap scoring +<br/>bonus/penalty
    ToolFilter->>ToolFilter: sort_by_score()
    ToolFilter->>ToolFilter: truncate_to_max_results()
    ToolFilter-->>SubagentRunner: Vec<usize> (top action indices)
    alt confidence >= MIN_CONFIDENT_HITS
        SubagentRunner->>SubAgent: register filtered actions only
        Note over SubagentRunner: High-confidence subset
    else
        SubagentRunner->>SubAgent: register all integration.tools
        Note over SubagentRunner: Fallback: full toolkit
    end
```
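The tokenize/normalize and overlap-scoring steps in the diagram can be sketched like this. The stopword list, abbreviation table, and weighting below are illustrative stand-ins, not the real `tool_filter` tables:

```rust
use std::collections::HashSet;

/// Tokenize and normalize a prompt or tool description: lowercase, split
/// on non-alphanumerics, drop stopwords, expand abbreviations. The real
/// module's tables are larger; these entries are illustrative.
fn tokenize(text: &str) -> HashSet<String> {
    const STOPWORDS: [&str; 8] = ["the", "a", "an", "to", "for", "of", "and", "on"];
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty() && !STOPWORDS.contains(t))
        .flat_map(|t| match t {
            // small abbreviation table, e.g. `pr` -> `pull request`
            "pr" => vec!["pull".to_string(), "request".to_string()],
            other => vec![other.to_string()],
        })
        .collect()
}

/// Score one action by token overlap between the prompt and the action's
/// name + description.
fn overlap_score(prompt: &str, name: &str, description: &str) -> usize {
    let prompt_tokens = tokenize(prompt);
    let tool_tokens = tokenize(&format!("{name} {description}"));
    prompt_tokens.intersection(&tool_tokens).count()
}

fn main() {
    let create = overlap_score(
        "open a pr against main",
        "GITHUB_CREATE_PULL_REQUEST",
        "Create a pull request in a repository",
    );
    let star = overlap_score(
        "open a pr against main",
        "GITHUB_STAR_REPOSITORY",
        "Star a repository",
    );
    // The `pr` -> `pull request` expansion makes the right action win.
    assert!(create > star);
}
```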
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ Passed checks (3 passed)
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/openhuman/agent/harness/subagent_runner.rs (1)
293-307: Use `debug!` for these filter diagnostics.

These are per-spawn branch diagnostics, so `info!` will be noisy in normal runs. `debug!` keeps the traceability without promoting routine filtering decisions to the default log level. As per coding guidelines, "In Rust, use `log`/`tracing` at `debug` or `trace` level; prefer stable prefixes..."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/harness/subagent_runner.rs` around lines 293 - 307, Replace the two tracing::info! diagnostics in the fuzzy filter branch of subagent_runner (the calls with agent_id = %definition.id, toolkit = %tk, total = integration.tools.len(), kept = filter_hits.len(), "[subagent_runner:typed] fuzzy tool filter narrowed toolkit" and the else branch with filter_hits = filter_hits.len(), "[subagent_runner:typed] fuzzy filter thin; falling back to full toolkit") with tracing::debug! so these per-spawn filter decisions are logged at debug level rather than info; keep the exact structured fields (agent_id, toolkit, total, kept/filter_hits) and message text unchanged except for using debug! to reduce noise.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/openhuman/agent/harness/subagent_runner.rs`:
- Around line 286-310: The prompt shows the full toolkit because
narrowed_integrations is rebuilt from parent.connected_integrations instead of
using the filtered "selected" tools; update the code so the same filtered list
created by filter_actions_by_prompt (and stored in selected/dynamic_tools) is
used to construct narrowed_integrations passed into
render_subagent_system_prompt: map the selected Vec<&ConnectedIntegrationTool>
back to the corresponding ConnectedIntegration entries (or build new
ConnectedIntegration objects containing only those tools) and replace the
current parent.connected_integrations-derived list with this narrowed list
before calling render_subagent_system_prompt, ensuring symbols involved are
filter_actions_by_prompt, MIN_CONFIDENT_HITS, selected, dynamic_tools,
narrowed_integrations, parent.connected_integrations, and
render_subagent_system_prompt.
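A hedged sketch of the suggested fix follows. Only the type names `ConnectedIntegration` / `ConnectedIntegrationTool` and the idea of rebuilding the prompt's integration list from the filtered selection come from the review comment; the field shapes and the `narrow` helper are hypothetical:

```rust
// Hypothetical stand-ins for the real types named in the review.
#[derive(Clone, Debug, PartialEq)]
struct ConnectedIntegrationTool {
    name: String,
}

#[derive(Clone, Debug)]
struct ConnectedIntegration {
    toolkit: String,
    tools: Vec<ConnectedIntegrationTool>,
}

/// Build the narrowed integration from the same filtered indices that were
/// registered as native tools, so the rendered system prompt and the
/// actual toolset cannot drift apart.
fn narrow(integration: &ConnectedIntegration, selected: &[usize]) -> ConnectedIntegration {
    ConnectedIntegration {
        toolkit: integration.toolkit.clone(),
        tools: selected
            .iter()
            .filter_map(|&i| integration.tools.get(i).cloned())
            .collect(),
    }
}

fn main() {
    let github = ConnectedIntegration {
        toolkit: "github".into(),
        tools: vec![
            ConnectedIntegrationTool { name: "CREATE_PULL_REQUEST".into() },
            ConnectedIntegrationTool { name: "STAR_REPOSITORY".into() },
            ConnectedIntegrationTool { name: "LIST_PULL_REQUESTS".into() },
        ],
    };
    let narrowed = narrow(&github, &[0, 2]);
    assert_eq!(narrowed.tools.len(), 2);
    assert_eq!(narrowed.tools[0].name, "CREATE_PULL_REQUEST");
}
```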
In `@src/openhuman/agent/harness/tool_filter.rs`:
- Around line 56-71: The current hard verb gate in tool_filter.rs (the gated Vec
creation that filters actions by tool_verb(&a.name) and verbs) can remove valid
tools whose API names use generic verbs; instead stop filtering out actions here
and convert this hard gate into a score adjustment: keep all actions in gated
(or remove this filter entirely) and apply a positive boost when
tool_verb(&a.name) matches the detected verbs and a smaller penalty (or zero
boost) when it does not during the ranking/scoring step (where action scores are
computed downstream); update references to gated and any callers expecting
filtered length/top-K to use the full candidate set, and ensure the scoring
function uses tool_verb and verbs to nudge matches rather than drop them.
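The soft gate the reviewer suggests could look roughly like this; `tool_verb`'s parsing rule and the boost/penalty magnitudes are assumptions, not the module's actual values:

```rust
/// Hypothetical verb extraction: Composio-style names look like
/// TOOLKIT_VERB_OBJECT, e.g. GITHUB_CREATE_PULL_REQUEST -> "create".
fn tool_verb(name: &str) -> Option<String> {
    name.split('_').nth(1).map(|v| v.to_lowercase())
}

/// Soft gate: nudge the score on verb match instead of dropping the
/// action outright, so tools whose API names use generic verbs still
/// survive ranking.
fn verb_adjusted_score(base: i32, action_name: &str, detected_verbs: &[&str]) -> i32 {
    match tool_verb(action_name) {
        Some(v) if detected_verbs.contains(&v.as_str()) => base + 5, // boost on match
        Some(_) if !detected_verbs.is_empty() => base - 1, // mild penalty, never removal
        _ => base, // no verb signal: score unchanged
    }
}

fn main() {
    assert_eq!(verb_adjusted_score(3, "GITHUB_CREATE_PULL_REQUEST", &["create"]), 8);
    // A non-matching verb is demoted, not dropped:
    assert_eq!(verb_adjusted_score(3, "GITHUB_LIST_ISSUES", &["create"]), 2);
    // No detected verbs: scores pass through untouched.
    assert_eq!(verb_adjusted_score(3, "GITHUB_LIST_ISSUES", &[]), 3);
}
```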
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8663eb8c-ac53-49e4-a85c-4f18e1873c0f
⛔ Files ignored due to path filters (2)
- `Cargo.lock` is excluded by `!**/*.lock`
- `app/src-tauri/Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (12)
- src/openhuman/agent/harness/mod.rs
- src/openhuman/agent/harness/subagent_runner.rs
- src/openhuman/agent/harness/tool_filter.rs
- tests/fixtures/composio_facebook.json
- tests/fixtures/composio_github.json
- tests/fixtures/composio_gmail.json
- tests/fixtures/composio_googledrive.json
- tests/fixtures/composio_googlesheets.json
- tests/fixtures/composio_instagram.json
- tests/fixtures/composio_notion.json
- tests/fixtures/composio_reddit.json
- tests/fixtures/composio_slack.json
Summary
- New `agent::harness::tool_filter` module that scores a Composio toolkit's actions against a delegation prompt and returns the top-K matches.
- `subagent_runner` calls the filter when spawning a `skills_agent` with a `toolkit=` argument, so large catalogues (e.g. github ~500 actions) get narrowed to the ones actually relevant to the task before they are registered as native tools.
- Falls back to the full catalogue below `MIN_CONFIDENT_HITS` (3) hits — a too-narrow filter is worse than none.

Problem
When the orchestrator delegates to `skills_agent` for a connected toolkit, every Composio action for that toolkit was being registered as a native tool on the spawned sub-agent. For toolkits like GitHub that's ~500 tool specs shoved into the LLM context per delegation — wasted tokens, slower first token, and more opportunity for the model to mis-route to a near-miss action it didn't actually need.

The orchestrator's `SkillDelegationTool` schema already forces the delegation prompt to be a clear, context-rich instruction, which makes it a reliable matching target — but nothing was using it to prune the toolset before load.

Solution
Add a small lexical scoring filter (`tool_filter::filter_actions_by_prompt`) and call it from `subagent_runner::run_typed_mode` exactly when:

- `definition.id == "skills_agent"`, and
- `toolkit_filter.is_some()`, and
- the toolkit is connected in `parent.connected_integrations`.

Scoring combines:

- token overlap between the prompt and each action's `name` + `description` (with stopword removal and a small abbreviation table — e.g. `pr` → `pull request`)

Top-K is capped at 25 (`TOOL_FILTER_TOP_K`). If hits < `MIN_CONFIDENT_HITS` (3) the filter is skipped and every action is registered, with both branches logged at info level so the narrowing ratio is visible in production.

No other agent path is touched — orchestrator, researcher, planner, code_executor, critic, archivist, and any non-toolkit
`skills_agent` spawn all keep their parent-filtered indices unchanged.

Submission Checklist
- Tests in `tool_filter::tests`, including 9 real-data integration tests (`real_data_gmail_send_email`, `real_data_github_create_pr`, `real_data_github_list_prs`, `real_data_slack_send_message`, `real_data_notion_create_page`, `real_data_gmail_delete_emails`, `real_data_full_funnel_report`, plus boundary tests for stopwords / abbreviation expansion / verb detection / plurals)
- Module docs for `tool_filter`; doc comments on `filter_actions_by_prompt`, `MIN_CONFIDENT_HITS`, and the verb table
- `subagent_runner.rs` documents both the rationale and the fallback contract

Impact
- Context cost of `skills_agent` delegations against large toolkits shrinks proportionally.

Related
- Future: swap `weighted_overlap` for an embedding-based scorer behind the same `filter_actions_by_prompt` signature.

Summary by CodeRabbit
New Features
Tests