feat(agent): fuzzy-filter skills_agent toolkit actions by task prompt (#579)
Conversation
Narrow large Composio toolkits (e.g. github ~500 actions) down to the handful relevant to a given delegation prompt before registering them as native tools on a spawned skills_agent. Falls back to the full catalogue when the filter yields fewer than MIN_CONFIDENT_HITS hits to avoid starving the sub-agent on under-specified prompts. Filter is only invoked when both `definition.id == "skills_agent"` and a `toolkit=` argument is present, so orchestrator and other sub-agents are unaffected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
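The gate and fallback contract described above can be sketched roughly as follows. Only the `"skills_agent"` id, the `toolkit=` argument, and `MIN_CONFIDENT_HITS` come from the PR text; every other name and shape here is a hypothetical stand-in, not the PR's actual code:

```rust
// Hedged sketch of the spawn-time gate and fallback contract.
const MIN_CONFIDENT_HITS: usize = 3;

struct Definition {
    id: String,
}

/// The filter only runs for a skills_agent spawn that names a toolkit;
/// orchestrator and other sub-agents never hit this path.
fn filter_applies(definition: &Definition, toolkit: Option<&str>) -> bool {
    definition.id == "skills_agent" && toolkit.is_some()
}

/// Register the filtered subset when it is confident enough, otherwise
/// fall back to the full catalogue so an under-specified prompt does not
/// starve the sub-agent.
fn actions_to_register(all: Vec<String>, hits: Vec<String>) -> Vec<String> {
    if hits.len() >= MIN_CONFIDENT_HITS {
        hits
    } else {
        all
    }
}

fn main() {
    let skills = Definition { id: "skills_agent".into() };
    let other = Definition { id: "researcher".into() };
    assert!(filter_applies(&skills, Some("github")));
    assert!(!filter_applies(&skills, None));
    assert!(!filter_applies(&other, Some("github")));

    let all: Vec<String> = (0..500).map(|i| format!("action_{i}")).collect();
    // Two hits is below MIN_CONFIDENT_HITS: keep the whole catalogue.
    let thin = vec!["a".to_string(), "b".to_string()];
    assert_eq!(actions_to_register(all.clone(), thin).len(), 500);
    // Three or more hits: register only the filtered subset.
    let ok: Vec<String> = vec!["a".into(), "b".into(), "c".into()];
    assert_eq!(actions_to_register(all, ok).len(), 3);
}
```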
📝 Walkthrough

A new fuzzy-filtering system for Composio integration tools is introduced through a dedicated `tool_filter` module.
Sequence Diagram

```mermaid
sequenceDiagram
    participant SubagentRunner
    participant ToolFilter
    participant IntegrationTools
    participant SubAgent
    SubagentRunner->>ToolFilter: filter_actions_by_prompt(task_prompt, integration.tools, max_results)
    ToolFilter->>ToolFilter: detect_verb_intent(prompt)
    Note over ToolFilter: Extract CRUD-like verbs<br/>(Create, Send, Read, etc.)
    ToolFilter->>ToolFilter: tokenize & normalize(prompt)
    Note over ToolFilter: Expand abbreviations,<br/>remove stopwords
    ToolFilter->>IntegrationTools: iterate & score each tool
    Note over ToolFilter: Verb-gate match +<br/>token overlap scoring +<br/>bonus/penalty
    ToolFilter->>ToolFilter: sort_by_score()
    ToolFilter->>ToolFilter: truncate_to_max_results()
    ToolFilter-->>SubagentRunner: Vec<usize> (top action indices)
    alt confidence >= MIN_CONFIDENT_HITS
        SubagentRunner->>SubAgent: register filtered actions only
        Note over SubagentRunner: High-confidence subset
    else
        SubagentRunner->>SubAgent: register all integration.tools
        Note over SubagentRunner: Fallback: full toolkit
    end
```
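The tokenize/normalize and overlap-scoring steps in the diagram can be sketched like this. The stopword list, abbreviation table, and weighting below are illustrative stand-ins, not the real `tool_filter` tables:

```rust
use std::collections::HashSet;

/// Tokenize and normalize a prompt or tool description: lowercase, split
/// on non-alphanumerics, drop stopwords, expand abbreviations. The real
/// module's tables are larger; these entries are illustrative.
fn tokenize(text: &str) -> HashSet<String> {
    const STOPWORDS: [&str; 8] = ["the", "a", "an", "to", "for", "of", "and", "on"];
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty() && !STOPWORDS.contains(t))
        .flat_map(|t| match t {
            // small abbreviation table, e.g. `pr` -> `pull request`
            "pr" => vec!["pull".to_string(), "request".to_string()],
            other => vec![other.to_string()],
        })
        .collect()
}

/// Score one action by token overlap between the prompt and the action's
/// name + description.
fn overlap_score(prompt: &str, name: &str, description: &str) -> usize {
    let prompt_tokens = tokenize(prompt);
    let tool_tokens = tokenize(&format!("{name} {description}"));
    prompt_tokens.intersection(&tool_tokens).count()
}

fn main() {
    let create = overlap_score(
        "open a pr against main",
        "GITHUB_CREATE_PULL_REQUEST",
        "Create a pull request in a repository",
    );
    let star = overlap_score(
        "open a pr against main",
        "GITHUB_STAR_REPOSITORY",
        "Star a repository",
    );
    // The `pr` -> `pull request` expansion makes the right action win.
    assert!(create > star);
}
```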
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ Passed checks (3 passed)
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/openhuman/agent/harness/subagent_runner.rs (1)
293-307: Use `debug!` for these filter diagnostics.

These are per-spawn branch diagnostics, so `info!` will be noisy in normal runs. `debug!` keeps the traceability without promoting routine filtering decisions to the default log level. As per coding guidelines, "In Rust, use `log`/`tracing` at `debug` or `trace` level; prefer stable prefixes..."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/harness/subagent_runner.rs` around lines 293 - 307, Replace the two tracing::info! diagnostics in the fuzzy filter branch of subagent_runner (the calls with agent_id = %definition.id, toolkit = %tk, total = integration.tools.len(), kept = filter_hits.len(), "[subagent_runner:typed] fuzzy tool filter narrowed toolkit" and the else branch with filter_hits = filter_hits.len(), "[subagent_runner:typed] fuzzy filter thin; falling back to full toolkit") with tracing::debug! so these per-spawn filter decisions are logged at debug level rather than info; keep the exact structured fields (agent_id, toolkit, total, kept/filter_hits) and message text unchanged except for using debug! to reduce noise.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/openhuman/agent/harness/subagent_runner.rs`:
- Around line 286-310: The prompt shows the full toolkit because
narrowed_integrations is rebuilt from parent.connected_integrations instead of
using the filtered "selected" tools; update the code so the same filtered list
created by filter_actions_by_prompt (and stored in selected/dynamic_tools) is
used to construct narrowed_integrations passed into
render_subagent_system_prompt: map the selected Vec<&ConnectedIntegrationTool>
back to the corresponding ConnectedIntegration entries (or build new
ConnectedIntegration objects containing only those tools) and replace the
current parent.connected_integrations-derived list with this narrowed list
before calling render_subagent_system_prompt, ensuring symbols involved are
filter_actions_by_prompt, MIN_CONFIDENT_HITS, selected, dynamic_tools,
narrowed_integrations, parent.connected_integrations, and
render_subagent_system_prompt.
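A hedged sketch of the suggested fix follows. Only the type names `ConnectedIntegration` / `ConnectedIntegrationTool` and the idea of rebuilding the prompt's integration list from the filtered selection come from the review comment; the field shapes and the `narrow` helper are hypothetical:

```rust
// Hypothetical stand-ins for the real types named in the review.
#[derive(Clone, Debug, PartialEq)]
struct ConnectedIntegrationTool {
    name: String,
}

#[derive(Clone, Debug)]
struct ConnectedIntegration {
    toolkit: String,
    tools: Vec<ConnectedIntegrationTool>,
}

/// Build the narrowed integration from the same filtered indices that were
/// registered as native tools, so the rendered system prompt and the
/// actual toolset cannot drift apart.
fn narrow(integration: &ConnectedIntegration, selected: &[usize]) -> ConnectedIntegration {
    ConnectedIntegration {
        toolkit: integration.toolkit.clone(),
        tools: selected
            .iter()
            .filter_map(|&i| integration.tools.get(i).cloned())
            .collect(),
    }
}

fn main() {
    let github = ConnectedIntegration {
        toolkit: "github".into(),
        tools: vec![
            ConnectedIntegrationTool { name: "CREATE_PULL_REQUEST".into() },
            ConnectedIntegrationTool { name: "STAR_REPOSITORY".into() },
            ConnectedIntegrationTool { name: "LIST_PULL_REQUESTS".into() },
        ],
    };
    let narrowed = narrow(&github, &[0, 2]);
    assert_eq!(narrowed.tools.len(), 2);
    assert_eq!(narrowed.tools[0].name, "CREATE_PULL_REQUEST");
}
```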
In `@src/openhuman/agent/harness/tool_filter.rs`:
- Around line 56-71: The current hard verb gate in tool_filter.rs (the gated Vec
creation that filters actions by tool_verb(&a.name) and verbs) can remove valid
tools whose API names use generic verbs; instead stop filtering out actions here
and convert this hard gate into a score adjustment: keep all actions in gated
(or remove this filter entirely) and apply a positive boost when
tool_verb(&a.name) matches the detected verbs and a smaller penalty (or zero
boost) when it does not during the ranking/scoring step (where action scores are
computed downstream); update references to gated and any callers expecting
filtered length/top-K to use the full candidate set, and ensure the scoring
function uses tool_verb and verbs to nudge matches rather than drop them.
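The soft gate the reviewer suggests could look roughly like this; `tool_verb`'s parsing rule and the boost/penalty magnitudes are assumptions, not the module's actual values:

```rust
/// Hypothetical verb extraction: Composio-style names look like
/// TOOLKIT_VERB_OBJECT, e.g. GITHUB_CREATE_PULL_REQUEST -> "create".
fn tool_verb(name: &str) -> Option<String> {
    name.split('_').nth(1).map(|v| v.to_lowercase())
}

/// Soft gate: nudge the score on verb match instead of dropping the
/// action outright, so tools whose API names use generic verbs still
/// survive ranking.
fn verb_adjusted_score(base: i32, action_name: &str, detected_verbs: &[&str]) -> i32 {
    match tool_verb(action_name) {
        Some(v) if detected_verbs.contains(&v.as_str()) => base + 5, // boost on match
        Some(_) if !detected_verbs.is_empty() => base - 1, // mild penalty, never removal
        _ => base, // no verb signal: score unchanged
    }
}

fn main() {
    assert_eq!(verb_adjusted_score(3, "GITHUB_CREATE_PULL_REQUEST", &["create"]), 8);
    // A non-matching verb is demoted, not dropped:
    assert_eq!(verb_adjusted_score(3, "GITHUB_LIST_ISSUES", &["create"]), 2);
    // No detected verbs: scores pass through untouched.
    assert_eq!(verb_adjusted_score(3, "GITHUB_LIST_ISSUES", &[]), 3);
}
```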
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8663eb8c-ac53-49e4-a85c-4f18e1873c0f
⛔ Files ignored due to path filters (2)
- `Cargo.lock` is excluded by `!**/*.lock`
- `app/src-tauri/Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (12)
- src/openhuman/agent/harness/mod.rs
- src/openhuman/agent/harness/subagent_runner.rs
- src/openhuman/agent/harness/tool_filter.rs
- tests/fixtures/composio_facebook.json
- tests/fixtures/composio_github.json
- tests/fixtures/composio_gmail.json
- tests/fixtures/composio_googledrive.json
- tests/fixtures/composio_googlesheets.json
- tests/fixtures/composio_instagram.json
- tests/fixtures/composio_notion.json
- tests/fixtures/composio_reddit.json
- tests/fixtures/composio_slack.json
Summary
- New `agent::harness::tool_filter` module that scores a Composio toolkit's actions against a delegation prompt and returns the top-K matches.
- `subagent_runner` calls the filter when spawning a `skills_agent` with a `toolkit=` argument, so large catalogues (e.g. github ~500 actions) get narrowed to the ones actually relevant to the task before they are registered as native tools.
- Falls back to the full catalogue below `MIN_CONFIDENT_HITS` (3) hits — a too-narrow filter is worse than none.

Problem
When the orchestrator delegates to `skills_agent` for a connected toolkit, every Composio action for that toolkit was being registered as a native tool on the spawned sub-agent. For toolkits like GitHub that's ~500 tool specs shoved into the LLM context per delegation — wasted tokens, slower first token, and more opportunity for the model to mis-route to a near-miss action it didn't actually need.

The orchestrator's `SkillDelegationTool` schema already forces the delegation prompt to be a clear, context-rich instruction, which makes it a reliable matching target — but nothing was using it to prune the toolset before load.

Solution
Add a small lexical scoring filter (`tool_filter::filter_actions_by_prompt`) and call it from `subagent_runner::run_typed_mode` exactly when:

- `definition.id == "skills_agent"`, and
- `toolkit_filter.is_some()`, and
- the toolkit is connected in `parent.connected_integrations`.

Scoring combines:

- token overlap between the prompt and each action's `name` + `description` (with stopword removal and a small abbreviation table — e.g. `pr` → `pull request`)

Top-K is capped at 25 (`TOOL_FILTER_TOP_K`). If hits < `MIN_CONFIDENT_HITS` (3) the filter is skipped and every action is registered, with both branches logged at info level so the narrowing ratio is visible in production.

No other agent path is touched — orchestrator, researcher, planner, code_executor, critic, archivist, and any non-toolkit
`skills_agent` spawn all keep their parent-filtered indices unchanged.

Submission Checklist
- Tests in `tool_filter::tests`, including 9 real-data integration tests (`real_data_gmail_send_email`, `real_data_github_create_pr`, `real_data_github_list_prs`, `real_data_slack_send_message`, `real_data_notion_create_page`, `real_data_gmail_delete_emails`, `real_data_full_funnel_report`, plus boundary tests for stopwords / abbreviation expansion / verb detection / plurals)
- Module docs for `tool_filter`; doc comments on `filter_actions_by_prompt`, `MIN_CONFIDENT_HITS`, and the verb table
- `subagent_runner.rs` documents both the rationale and the fallback contract

Impact
- Context cost of `skills_agent` delegations against large toolkits shrinks proportionally.

Related
- Future: swap `weighted_overlap` for an embedding-based scorer behind the same `filter_actions_by_prompt` signature.

Summary by CodeRabbit
New Features
Tests