
feat(agent): fuzzy-filter skills_agent toolkit actions by task prompt #579

Merged
graycyrus merged 2 commits into tinyhumansai:main from sanil-23:feat/optimization-skills-agent-tool-filter
Apr 15, 2026


@sanil-23 sanil-23 commented Apr 15, 2026

Summary

  • New agent::harness::tool_filter module that scores a Composio toolkit's actions against a delegation prompt and returns the top-K matches.
  • subagent_runner calls the filter when spawning a skills_agent with a toolkit= argument, so large catalogues (e.g. github ~500 actions) get narrowed to the ones actually relevant to the task before they are registered as native tools.
  • Falls back to the full toolkit when the filter yields fewer than MIN_CONFIDENT_HITS (3) hits — a too-narrow filter is worse than none.
  • 15 unit tests, including 9 real-data integration tests that exercise the filter against captured Composio dumps for gmail / github / slack / notion / googlesheets / googledrive / instagram / reddit / facebook (~1.8MB of fixtures, 1000+ actions total).

Problem

When the orchestrator delegates to skills_agent for a connected toolkit, every Composio action for that toolkit was being registered as a native tool on the spawned sub-agent. For toolkits like GitHub that's ~500 tool specs shoved into the LLM context per delegation — wasted tokens, slower first token, and more opportunity for the model to mis-route to a near-miss action it didn't actually need.

The orchestrator's SkillDelegationTool schema already forces the delegation prompt to be a clear, context-rich instruction, which makes it a reliable matching target — but nothing was using it to prune the toolset before load.

Solution

Add a small lexical scoring filter (tool_filter::filter_actions_by_prompt) and call it from subagent_runner::run_typed_mode exactly when:

  1. definition.id == "skills_agent", and
  2. toolkit_filter.is_some(), and
  3. The toolkit is found and connected in parent.connected_integrations.
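
The three-part gate above can be sketched as a single predicate. This is a minimal illustration, not the code in subagent_runner: the struct shapes, the `connected` flag, and the helper name `integration_to_filter` are assumptions made for the example.

```rust
/// Hypothetical stand-in for the agent definition; only `id` matters here.
struct AgentDefinition {
    id: String,
}

/// Hypothetical stand-in for an entry in parent.connected_integrations.
struct ConnectedIntegration {
    toolkit: String,
    connected: bool,
}

/// Returns the integration whose actions should be filtered, or None when
/// any of the three gate conditions fails (in which case nothing changes).
fn integration_to_filter<'a>(
    definition: &AgentDefinition,
    toolkit_filter: Option<&str>,
    connected_integrations: &'a [ConnectedIntegration],
) -> Option<&'a ConnectedIntegration> {
    // Condition 1: only skills_agent spawns are eligible.
    if definition.id != "skills_agent" {
        return None;
    }
    // Condition 2: a toolkit= argument must be present.
    let toolkit = toolkit_filter?;
    // Condition 3: the toolkit must be found and connected on the parent.
    connected_integrations
        .iter()
        .find(|i| i.connected && i.toolkit.as_str() == toolkit)
}
```

Returning an Option keeps the "no filter" path identical to the previous behavior: callers that get None register tools exactly as before.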

Scoring combines:

  • Tokenized weighted overlap between the prompt and each action's name + description (with stopword removal and a small abbreviation table — e.g. `pr` → `pull request`)
  • A verb bonus that rewards actions whose name starts with a verb prefix matching a verb the prompt expresses (`create`, `list`, `delete`, `update`, `get`, `search`, `send`)
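
A rough sketch of how the two scoring components could combine. The stopword list, the abbreviation entries beyond `pr`, the plural stripping rule, and all three weights are assumptions chosen for illustration; the real tables and weights live in tool_filter and are not reproduced in this description.

```rust
use std::collections::HashSet;

// Illustrative tables and weights only; the real module's values may differ.
const STOPWORDS: &[&str] = &["a", "an", "the", "to", "for", "of", "in", "on", "my", "and"];
const ABBREVIATIONS: &[(&str, &str)] = &[("pr", "pull request"), ("prs", "pull requests")];
const VERBS: &[&str] = &["create", "list", "delete", "update", "get", "search", "send"];
const NAME_WEIGHT: f32 = 2.0;
const DESCRIPTION_WEIGHT: f32 = 1.0;
const VERB_BONUS: f32 = 1.5;

/// Lowercases, splits on non-alphanumerics, expands abbreviations, drops
/// stopwords, and strips a trailing "s" so plurals match singulars.
fn tokenize(text: &str) -> Vec<String> {
    let lowered = text.to_lowercase();
    let mut tokens = Vec::new();
    for raw in lowered.split(|c: char| !c.is_alphanumeric()) {
        if raw.is_empty() {
            continue;
        }
        let expanded = ABBREVIATIONS
            .iter()
            .find(|(abbr, _)| *abbr == raw)
            .map(|(_, full)| *full)
            .unwrap_or(raw);
        for word in expanded.split(' ') {
            if STOPWORDS.contains(&word) {
                continue;
            }
            let stem = if word.len() > 3 && word.ends_with('s') {
                &word[..word.len() - 1]
            } else {
                word
            };
            tokens.push(stem.to_string());
        }
    }
    tokens
}

/// Weighted overlap plus a flat bonus when the first verb in the action
/// name is a verb the prompt also uses.
fn score(prompt: &str, name: &str, description: &str) -> f32 {
    let prompt_tokens: HashSet<String> = tokenize(prompt).into_iter().collect();
    let name_tokens = tokenize(name);
    let description_tokens = tokenize(description);

    let mut total = 0.0;
    for token in &prompt_tokens {
        // A name hit outranks a description hit; each prompt token counts once.
        if name_tokens.contains(token) {
            total += NAME_WEIGHT;
        } else if description_tokens.contains(token) {
            total += DESCRIPTION_WEIGHT;
        }
    }
    let name_verb = name_tokens.iter().find(|t| VERBS.contains(&t.as_str()));
    if let Some(verb) = name_verb {
        if prompt_tokens.contains(verb) {
            total += VERB_BONUS;
        }
    }
    total
}
```

With these illustrative weights, a prompt such as "Create a PR for the login fix" ranks a create-pull-request action above a list-pull-requests action, because both share the expanded `pull request` tokens but only the first matches the verb.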

Top-K is capped at 25 (TOOL_FILTER_TOP_K). If hits < MIN_CONFIDENT_HITS (3) the filter is skipped and every action is registered, with both branches logged at info level so the narrowing ratio is visible in production:

[subagent_runner:typed] fuzzy tool filter narrowed toolkit total=487 kept=18
[subagent_runner:typed] fuzzy filter thin; falling back to full toolkit
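
The cap and fallback contract can be sketched as follows, assuming the scorer has already produced one score per action and that a hit means a strictly positive score (that definition of a hit is an assumption). The helper names `top_k_indices` and `select_actions` are hypothetical; only the two constants and their values come from this PR.

```rust
const TOOL_FILTER_TOP_K: usize = 25;
const MIN_CONFIDENT_HITS: usize = 3;

/// Returns indices of actions with a positive score, best first, capped at `top_k`.
fn top_k_indices(scores: &[f32], top_k: usize) -> Vec<usize> {
    let mut hits: Vec<usize> = (0..scores.len()).filter(|&i| scores[i] > 0.0).collect();
    // Descending by score; the sort is stable, so ties keep catalogue order.
    hits.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]));
    hits.truncate(top_k);
    hits
}

/// Applies the fallback contract: with fewer than MIN_CONFIDENT_HITS hits,
/// every action index is returned so the full toolkit is registered.
fn select_actions(scores: &[f32]) -> Vec<usize> {
    let hits = top_k_indices(scores, TOOL_FILTER_TOP_K);
    if hits.len() >= MIN_CONFIDENT_HITS {
        hits
    } else {
        (0..scores.len()).collect()
    }
}
```

Returning indices rather than cloned tools keeps the selection cheap and lets the call site decide how to register or log the result.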

No other agent path is touched — orchestrator, researcher, planner, code_executor, critic, archivist, and any non-toolkit skills_agent spawn all keep their parent-filtered indices unchanged.

Submission Checklist

  • Unit tests: 15 tests in tool_filter::tests, including 9 real-data integration tests (among them real_data_gmail_send_email, real_data_github_create_pr, real_data_github_list_prs, real_data_slack_send_message, real_data_notion_create_page, real_data_gmail_delete_emails, real_data_full_funnel_report) and boundary tests for stopwords / abbreviation expansion / verb detection / plurals
  • E2E / integration: N/A. The change is internal to sub-agent spawn; surfaced behavior is identical when the filter falls back, and the narrowed-set behavior is covered by the real-data unit tests against the captured Composio dumps
  • Doc comments: Module-level rustdoc on tool_filter, doc comments on filter_actions_by_prompt, MIN_CONFIDENT_HITS, and the verb table
  • Inline comments: Filter call site in subagent_runner.rs documents both the rationale and the fallback contract

Impact

  • Runtime: Desktop only (sub-agent spawn happens inside the core sidecar's agent harness). No frontend, RPC, or schema changes.
  • Performance: Reduces per-spawn tool spec count for large toolkits (github ~500 → ≤25 in the common case). Filter itself is pure-Rust string scoring over an in-memory list — sub-millisecond on 1000 actions.
  • Token cost: Shrinks the tool catalogue in the system prompt for skills_agent delegations against large toolkits, in proportion to how far the filter narrows the toolkit.
  • Compatibility: Behavior is identical to the prior code path whenever the filter falls back (under-specified prompts, small toolkits). No config flag required; activation is automatic via the existing gate.
  • Security / migration: None.

Related

  • Issue(s): none — exploratory optimization
  • Follow-up PR(s)/TODOs:
    • Consider extending the filter to surface a "rejected actions" debug dump alongside the existing info log
    • If lexical scoring proves insufficient on toolkits with terse action names, swap weighted_overlap for an embedding-based scorer behind the same filter_actions_by_prompt signature

Summary by CodeRabbit

  • New Features

    • Added intelligent tool filtering to the agent that fuzzy-matches tools to task prompts, improving relevance and selection accuracy. Falls back to all available tools when confidence is low.
  • Tests

    • Added integration test fixtures for Facebook, Instagram, and Reddit tool catalogs.

Narrow large Composio toolkits (e.g. github ~500 actions) down to the
handful relevant to a given delegation prompt before registering them
as native tools on a spawned skills_agent. Falls back to the full
catalogue when the filter yields fewer than MIN_CONFIDENT_HITS hits to
avoid starving the sub-agent on under-specified prompts.

Filter is only invoked when both `definition.id == "skills_agent"` and
a `toolkit=` argument is present, so orchestrator and other sub-agents
are unaffected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Apr 15, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 520fb5df-440c-44f3-bada-00f05a47412b

📥 Commits

Reviewing files that changed from the base of the PR and between 7c7450e and bbe1555.

📒 Files selected for processing (1)
  • src/openhuman/composio/ops.rs
Walkthrough

A new fuzzy-filtering system for Composio integration tools is introduced through a dedicated tool_filter module, which ranks and selects relevant actions based on the verb intent of the prompt and lexical token overlap. The subagent_runner now applies this filter so that only high-confidence matches are exposed to sub-agents, falling back to all actions when confidence is low. Test fixtures are added for validation.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Module Declaration & Core Filter: src/openhuman/agent/harness/mod.rs, src/openhuman/agent/harness/tool_filter.rs | New tool_filter module implementing fuzzy ranking pipeline with verb detection, token overlap scoring, and bonus/penalty logic to rank ConnectedIntegrationTool actions by relevance to a given prompt; exports filter_actions_by_prompt() function, Verb enum, and MIN_CONFIDENT_HITS constant. |
| Subagent Runner Integration: src/openhuman/agent/harness/subagent_runner.rs | Modified run_typed_mode to replace unconditional tool registration with filtered selection using tool_filter::filter_actions_by_prompt(); selects filtered subset when confidence is high or falls back to full toolkit when confidence is low. |
| Test Fixtures: tests/fixtures/composio_*.json (Facebook, Instagram, Reddit) | New JSON fixture files containing Composio tool schemas and metadata for testing tool discovery and filtering logic across multiple social media integration providers. |

Sequence Diagram

sequenceDiagram
    participant SubagentRunner
    participant ToolFilter
    participant IntegrationTools
    participant SubAgent
    
    SubagentRunner->>ToolFilter: filter_actions_by_prompt(task_prompt, integration.tools, max_results)
    
    ToolFilter->>ToolFilter: detect_verb_intent(prompt)
    Note over ToolFilter: Extract CRUD-like verbs<br/>(Create, Send, Read, etc.)
    
    ToolFilter->>ToolFilter: tokenize & normalize(prompt)
    Note over ToolFilter: Expand abbreviations,<br/>remove stopwords
    
    ToolFilter->>IntegrationTools: iterate & score each tool
    Note over ToolFilter: Verb-gate match +<br/>token overlap scoring +<br/>bonus/penalty
    
    ToolFilter->>ToolFilter: sort_by_score()
    ToolFilter->>ToolFilter: truncate_to_max_results()
    
    ToolFilter-->>SubagentRunner: Vec<usize> (top action indices)
    
    alt confidence >= MIN_CONFIDENT_HITS
        SubagentRunner->>SubAgent: register filtered actions only
        Note over SubagentRunner: High-confidence subset
    else
        SubagentRunner->>SubAgent: register all integration.tools
        Note over SubagentRunner: Fallback: full toolkit
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • senamakel


🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main feature: adding fuzzy-filtering to skills_agent toolkit actions based on task prompt, which is the core change across all modified files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
src/openhuman/agent/harness/subagent_runner.rs (1)

293-307: Use debug! for these filter diagnostics.

These are per-spawn branch diagnostics, so info! will be noisy in normal runs. debug! keeps the traceability without promoting routine filtering decisions to the default log level. As per coding guidelines, "In Rust, use log/tracing at debug or trace level; prefer stable prefixes..."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/subagent_runner.rs` around lines 293 - 307,
Replace the two tracing::info! diagnostics in the fuzzy filter branch of
subagent_runner (the calls with agent_id = %definition.id, toolkit = %tk, total
= integration.tools.len(), kept = filter_hits.len(), "[subagent_runner:typed]
fuzzy tool filter narrowed toolkit" and the else branch with filter_hits =
filter_hits.len(), "[subagent_runner:typed] fuzzy filter thin; falling back to
full toolkit") with tracing::debug! so these per-spawn filter decisions are
logged at debug level rather than info; keep the exact structured fields
(agent_id, toolkit, total, kept/filter_hits) and message text unchanged except
for using debug! to reduce noise.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/openhuman/agent/harness/subagent_runner.rs`:
- Around line 286-310: The prompt shows the full toolkit because
narrowed_integrations is rebuilt from parent.connected_integrations instead of
using the filtered "selected" tools; update the code so the same filtered list
created by filter_actions_by_prompt (and stored in selected/dynamic_tools) is
used to construct narrowed_integrations passed into
render_subagent_system_prompt: map the selected Vec<&ConnectedIntegrationTool>
back to the corresponding ConnectedIntegration entries (or build new
ConnectedIntegration objects containing only those tools) and replace the
current parent.connected_integrations-derived list with this narrowed list
before calling render_subagent_system_prompt, ensuring symbols involved are
filter_actions_by_prompt, MIN_CONFIDENT_HITS, selected, dynamic_tools,
narrowed_integrations, parent.connected_integrations, and
render_subagent_system_prompt.
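
One possible shape for the fix described above, sketched under assumptions: the struct fields shown here are hypothetical stand-ins for the real ConnectedIntegration types, the helper name `narrow_integration` is invented for the example, and the filter is assumed to return indices into integration.tools (as the sequence diagram's Vec<usize> suggests).

```rust
/// Hypothetical stand-in for a single toolkit action.
#[derive(Clone, Debug, PartialEq)]
struct ConnectedIntegrationTool {
    name: String,
    description: String,
}

/// Hypothetical stand-in for a connected toolkit and its actions.
#[derive(Clone, Debug, PartialEq)]
struct ConnectedIntegration {
    toolkit: String,
    tools: Vec<ConnectedIntegrationTool>,
}

/// Builds a copy of `integration` that carries only the tools at the
/// `selected` indices, in ranked order. Out-of-range indices are skipped.
fn narrow_integration(
    integration: &ConnectedIntegration,
    selected: &[usize],
) -> ConnectedIntegration {
    ConnectedIntegration {
        toolkit: integration.toolkit.clone(),
        tools: selected
            .iter()
            .filter_map(|&i| integration.tools.get(i).cloned())
            .collect(),
    }
}
```

Passing the narrowed copy to the prompt renderer, instead of the parent's entry, would keep the rendered catalogue and the registered native tools in sync.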

In `@src/openhuman/agent/harness/tool_filter.rs`:
- Around line 56-71: The current hard verb gate in tool_filter.rs (the gated Vec
creation that filters actions by tool_verb(&a.name) and verbs) can remove valid
tools whose API names use generic verbs; instead stop filtering out actions here
and convert this hard gate into a score adjustment: keep all actions in gated
(or remove this filter entirely) and apply a positive boost when
tool_verb(&a.name) matches the detected verbs and a smaller penalty (or zero
boost) when it does not during the ranking/scoring step (where action scores are
computed downstream); update references to gated and any callers expecting
filtered length/top-K to use the full candidate set, and ensure the scoring
function uses tool_verb and verbs to nudge matches rather than drop them.
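
A minimal sketch of the score-adjustment alternative to a hard gate. The Verb variants, the boost and penalty values, and the helper name `adjust_for_verb` are assumptions for illustration; the real Verb enum and tool_verb function are not reproduced here.

```rust
/// Hypothetical subset of verb intents.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Verb {
    Create,
    Read,
    Send,
}

// Illustrative magnitudes; the boost is deliberately larger than the penalty.
const VERB_MATCH_BOOST: f32 = 1.5;
const VERB_MISMATCH_PENALTY: f32 = 0.5;

/// Nudges a base lexical score by verb agreement instead of dropping the action.
fn adjust_for_verb(base: f32, tool_verb: Option<Verb>, prompt_verbs: &[Verb]) -> f32 {
    match tool_verb {
        // The action's verb is one the prompt expresses: boost it.
        Some(v) if prompt_verbs.contains(&v) => base + VERB_MATCH_BOOST,
        // The prompt expresses some other verb: lower the score, never below zero.
        Some(_) if !prompt_verbs.is_empty() => (base - VERB_MISMATCH_PENALTY).max(0.0),
        // No detectable verb on either side: leave the score untouched.
        _ => base,
    }
}
```

Because a mismatch only subtracts a small penalty, an action with a generic verb in its name can still reach the top-K on strong token overlap, which is the failure case the hard gate cannot recover from.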


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8663eb8c-ac53-49e4-a85c-4f18e1873c0f

📥 Commits

Reviewing files that changed from the base of the PR and between 70a2a6f and 7c7450e.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • app/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • src/openhuman/agent/harness/mod.rs
  • src/openhuman/agent/harness/subagent_runner.rs
  • src/openhuman/agent/harness/tool_filter.rs
  • tests/fixtures/composio_facebook.json
  • tests/fixtures/composio_github.json
  • tests/fixtures/composio_gmail.json
  • tests/fixtures/composio_googledrive.json
  • tests/fixtures/composio_googlesheets.json
  • tests/fixtures/composio_instagram.json
  • tests/fixtures/composio_notion.json
  • tests/fixtures/composio_reddit.json
  • tests/fixtures/composio_slack.json

Comment thread src/openhuman/agent/harness/subagent_runner.rs
Comment thread src/openhuman/agent/harness/tool_filter.rs
@sanil-23 sanil-23 marked this pull request as draft April 15, 2026 15:16
@sanil-23 sanil-23 marked this pull request as ready for review April 15, 2026 15:25
@graycyrus graycyrus merged commit d545c19 into tinyhumansai:main Apr 15, 2026
8 checks passed