
fix: reduce agent loop hallucination and improve tool call reliability#156

Merged
graycyrus merged 2 commits into tinyhumansai:main from sanil-23:fix/agent-loop-hallucination
Apr 1, 2026

Conversation

@sanil-23
Contributor

@sanil-23 sanil-23 commented Apr 1, 2026

Summary

  • Strengthen tool-use instructions with explicit anti-hallucination rules
  • Wire context guard into the tool loop to prevent context overflow
  • Add 120-second timeout on tool execution to prevent hangs
  • Add debug logging at all loop boundaries for diagnosis

Problem

  • Agent sometimes narrates tool use ("I'll run ls") without emitting <tool_call> tags — loop exits with no execution
  • Context guard exists (context_guard.rs) but was not wired into run_tool_call_loop() — context could silently overflow
  • No timeout on tool execution — hanging tools stall the entire turn
  • No logging at loop boundaries — impossible to diagnose missed tool calls

Solution

  • instructions.rs: Replace single "CRITICAL" line with 6 explicit rules: always use tags, never narrate without calling, use exact names, multiple calls ok, continue after results, only skip tools for knowledge-only answers
  • tool_loop.rs:
    • Call context_guard.check() before each LLM request; abort on ContextExhausted
    • Update guard with UsageInfo from each ChatResponse
    • Wrap tool.execute() in tokio::time::timeout(120s)
    • Add tracing::debug/warn/error at: loop start, LLM response (with token counts), tool call parsing, each tool execution, unknown tools, timeouts, final response
  • context_guard.rs: Make COMPACTION_TRIGGER_THRESHOLD pub(crate) for log messages
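The guard-plus-timeout wiring described above follows a simple pattern: check context headroom before each model call, run the tool with a deadline, and record token usage afterwards. The sketch below illustrates that pattern using plain std threads instead of tokio, with a made-up `ContextGuard` whose fields and 95% threshold are assumptions drawn from this PR's description, not the real `context_guard.rs` API:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the PR's ContextGuard: trips once estimated
/// utilization crosses a threshold (the real guard lives in context_guard.rs).
struct ContextGuard {
    used_tokens: u64,
    max_tokens: u64,
}

impl ContextGuard {
    /// Check before each LLM request; Err models the "ContextExhausted" abort.
    fn check(&self) -> Result<(), String> {
        if self.used_tokens as f64 / self.max_tokens as f64 > 0.95 {
            Err("context exhausted".to_string())
        } else {
            Ok(())
        }
    }

    /// Update from the usage info of each response.
    fn update(&mut self, new_tokens: u64) {
        self.used_tokens += new_tokens;
    }
}

/// Run a blocking "tool" on a worker thread and give up after `timeout`,
/// a std-only analogue of wrapping tool.execute(...) in tokio::time::timeout.
fn execute_with_timeout<F>(tool: F, timeout: Duration) -> Result<String, String>
where
    F: FnOnce() -> String + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(tool());
    });
    rx.recv_timeout(timeout)
        .map_err(|_| "tool timed out".to_string())
}
```

A caller would check the guard, run the tool through `execute_with_timeout`, then feed the response's token count back via `update` before looping again.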

Submission Checklist

  • Unit tests — cargo check + cargo fmt pass; existing context_guard tests unaffected
  • E2E / integration — Requires real LLM backend for hallucination validation; structural correctness verified by compilation
  • Doc comments — Existing doc comments preserved
  • Inline comments — Section markers in tool loop (── Context guard ──, ── Approval hook ──)

Impact

  • Desktop + CLI: All agent loop paths affected (agent_turn, run_tool_call_loop)
  • Performance: 120s timeout per tool prevents indefinite hangs; context guard adds negligible overhead
  • Observability: [agent_loop] prefixed logs enable grep-based diagnosis across sidecar output

Related

Summary by CodeRabbit

  • Bug Fixes & Improvements

    • Clarified and tightened system prompt rules for consistent tool invocation.
    • Added context utilization checks with compaction warnings and exhaustion handling.
    • Introduced timeouts and improved logging for tool execution and LLM interactions.
    • Strengthened provider retry/failover behavior to improve chat reliability.
  • New Features

    • Added support for an alternate bracket-style tool-call syntax to improve parsing robustness.

- Strengthen tool-use instructions with explicit anti-hallucination rules:
  "NEVER narrate tool use without emitting tags", "use exact tool names",
  "only respond without tool call when no tool is needed"
- Wire context guard into tool loop: check utilization before each LLM
  call, abort on context exhaustion (>95% with circuit breaker tripped)
- Add 120-second timeout on tool execution to prevent hangs
- Add debug/warn/error logging at all loop boundaries: LLM request,
  response (with token counts), tool call parsing, tool execution,
  unknown tools, timeouts, and final response

Closes tinyhumansai#144
Contributor

coderabbitai Bot commented Apr 1, 2026

📝 Walkthrough

Exposed a crate-internal compaction threshold, tightened and restructured tool-invocation prompt rules, added bracket-style parsing for alternate tool-call syntax, integrated context-usage checks and timeouts into the tool loop, and added a chat method implementation with retry/backoff in the reliable provider.

Changes

  • Context & Loop Management (src/openhuman/agent/loop_/context_guard.rs, src/openhuman/agent/loop_/tool_loop.rs): Made COMPACTION_TRIGGER_THRESHOLD pub(crate). Integrated ContextGuard checks before LLM requests, bail on context exhaustion, update the guard from LLM usage, added warnings/logging, and enforced a 120s timeout around each tool.execute(...) with distinct handling for timeouts vs. failures.
  • Prompt / Instruction Text (src/openhuman/agent/loop_/instructions.rs): Rewrote the system prompt's tool-invocation section into a structured "### Rules (MUST follow)" list with six explicit rules and moved the example into a separate "### Example" section.
  • Tool-call Parsing (src/openhuman/agent/loop_/parse.rs): Added parse_bracket_tool_call(inner: &str) to parse bracket-style pseudo-syntax, extended the recognized open tags to include [TOOL_CALL]/[tool_call], and updated the fallback parsing logic to try bracket syntax before issuing malformed warnings.
  • Provider Retry Path (src/openhuman/providers/reliable.rs): Added async fn chat(&self, request: ChatRequest<'_>, model: &str, temperature: f64) to ReliableProvider with per-model/provider retry/backoff, rate-limit/key rotation handling, early bail on context-window errors, and aggregated failure reporting.

Sequence Diagram(s)

sequenceDiagram
    actor Agent
    participant ContextGuard
    participant LLM
    participant Parser
    participant ToolExecutor
    participant Tool

    Agent->>ContextGuard: check utilization
    alt exhausted
        ContextGuard-->>Agent: bail (context exhausted)
    else compaction suggested
        ContextGuard-->>Agent: warn (compaction threshold)
    end

    Agent->>LLM: send request (history + prompt)
    LLM-->>Agent: response + usage?
    Agent->>ContextGuard: update(token usage) 

    Agent->>Parser: parse response for tool calls (JSON or bracket)
    alt tool call found
        Parser-->>Agent: ParsedToolCall
        Agent->>ToolExecutor: invoke tool (timeout 120s)
        ToolExecutor->>Tool: execute(args)
        alt timeout
            Tool-->>ToolExecutor: Err(timeout)
            ToolExecutor-->>Agent: log timeout, continue loop
        else success/failure payload
            Tool-->>ToolExecutor: Ok(result)
            ToolExecutor-->>Agent: return result in <tool_result>
        end
        Agent->>Agent: continue tool loop or finalize
    else no tool
        Agent-->>Agent: return final response
    end

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Poem

🐰 I hopped through code with careful cheer,
I added guards so nothing veers,
Timeouts set, and prompts made clear,
Tool-calls parsed both far and near,
A tidy loop — now cheer, my dear! 🎩

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly and specifically summarizes the main changes: reducing hallucination through improved instructions and increasing tool call reliability through context guards and timeouts.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.



Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/openhuman/agent/loop_/instructions.rs (1)

9-10: Mismatch between documented and accepted tag formats.

Instructions document only <tool_call> format (lines 9-10), but the parser at line 85 accepts four variants: <tool_call>, <toolcall>, <tool-call>, and <invoke>. Since the LLM will follow the documented format, this is functionally acceptable. Consider either documenting the parser's flexibility in the instructions or reducing the parser to support only the documented format.

Also applies to: 15-16
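The flexibility the comment describes can be illustrated with a small matcher. This is a hypothetical sketch, not the parser's actual code; only the list of accepted tags comes from the review comment:

```rust
/// Lenient open-tag matching: the instructions document only `<tool_call>`,
/// but the parser tolerates several synonyms (per the review comment).
const OPEN_TAGS: [&str; 4] = ["<tool_call>", "<toolcall>", "<tool-call>", "<invoke>"];

/// Return the byte offset and tag of the earliest recognized open tag, if any.
fn find_tool_call_start(text: &str) -> Option<(usize, &'static str)> {
    OPEN_TAGS
        .iter()
        .filter_map(|tag| text.find(tag).map(|pos| (pos, *tag)))
        .min_by_key(|(pos, _)| *pos) // earliest tag wins if several appear
}
```

Either the instructions should enumerate these variants or the matcher should shrink to the single documented tag, which is exactly the trade-off the comment raises.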

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/loop_/instructions.rs` around lines 9 - 10, The docs in
instructions.push_str currently only show the <tool_call> tag but the parser
accepts four variants (<tool_call>, <toolcall>, <tool-call>, <invoke>); update
the instructions text to list and show examples for all accepted variants (add
the additional tag forms in the push_str strings) OR change the parser to only
accept the documented <tool_call> form so they match; locate the instruction
strings in instructions.rs (the instructions.push_str calls) and either expand
the example to include "<toolcall>", "<tool-call>", and "<invoke>" or modify the
parser's tag-matching logic to restrict it to "<tool_call>" only.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b741908e-39de-4fc5-999f-dcb6c4df8d24

📥 Commits

Reviewing files that changed from the base of the PR and between 7fd4568 and 838096e.

📒 Files selected for processing (3)
  • src/openhuman/agent/loop_/context_guard.rs
  • src/openhuman/agent/loop_/instructions.rs
  • src/openhuman/agent/loop_/tool_loop.rs

Comment on lines +288 to +329
 // Execute with a 120-second timeout to prevent hangs.
 match tokio::time::timeout(
     std::time::Duration::from_secs(120),
     tool.execute(call.arguments.clone()),
 )
 .await
 {
     Ok(Ok(r)) => {
         if r.success {
             tracing::debug!(
                 iteration,
                 tool = call.name.as_str(),
                 output_len = r.output.len(),
                 "[agent_loop] tool succeeded"
             );
             scrub_credentials(&r.output)
         } else {
-            format!("Error: {}", r.error.unwrap_or(r.output))
+            let err_msg = r.error.unwrap_or(r.output);
+            tracing::warn!(
+                iteration,
+                tool = call.name.as_str(),
+                "[agent_loop] tool returned error: {err_msg}"
+            );
+            format!("Error: {err_msg}")
         }
     }
-    Err(e) => {
+    Ok(Err(e)) => {
         tracing::error!(
             iteration,
             tool = call.name.as_str(),
             "[agent_loop] tool execution failed: {e}"
         );
         format!("Error executing {}: {e}", call.name)
     }
+    Err(_) => {
+        tracing::error!(
+            iteration,
+            tool = call.name.as_str(),
+            "[agent_loop] tool execution timed out after 120s"
+        );
+        format!("Error: tool '{}' timed out after 120 seconds", call.name)
+    }
Contributor


⚠️ Potential issue | 🟡 Minor

Timeout value conflicts with delegate tool's internal timeout.

The 120-second timeout matches exactly the DELEGATE_TIMEOUT_SECS in src/openhuman/tools/delegate.rs:12-13. When a delegated agent call hangs, both timeouts fire simultaneously, and the outer timeout may shadow the delegate's more specific error message. Additionally, the hardcoded 120 should be extracted to a named constant for maintainability.

Proposed fix: extract constant and adjust value
+/// Timeout for individual tool execution to prevent hangs.
+/// Set slightly above the longest internal tool timeout (delegate: 120s).
+const TOOL_EXECUTION_TIMEOUT_SECS: u64 = 125;
+
 // ... in the tool execution block:
-                match tokio::time::timeout(
-                    std::time::Duration::from_secs(120),
-                    tool.execute(call.arguments.clone()),
-                )
+                match tokio::time::timeout(
+                    std::time::Duration::from_secs(TOOL_EXECUTION_TIMEOUT_SECS),
+                    tool.execute(call.arguments.clone()),
+                )

Note: The timeout does not propagate a cancellation signal to the tool—tools continue running after the outer timeout fires. This is acceptable given the Tool trait lacks a cancellation mechanism, but be aware that timed-out operations may complete in the background.
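The "keeps running in the background" caveat is easy to demonstrate with plain threads. This is an illustrative analogue, not the PR's code: with tokio, dropping the timed-out future cancels pending async work, but spawned tasks and child processes behave like the abandoned worker below:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::Duration;

/// Show that a timed-out worker is abandoned, not cancelled:
/// the caller gives up, yet the worker still finishes its job.
fn abandoned_worker_finishes() -> bool {
    let finished = Arc::new(AtomicBool::new(false));
    let flag = finished.clone();
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(100)); // the "hanging" tool
        flag.store(true, Ordering::SeqCst);
        let _ = tx.send(());
    });
    // The caller times out long before the worker is done...
    let timed_out = rx.recv_timeout(Duration::from_millis(10)).is_err();
    // ...but after waiting, the worker has completed anyway.
    thread::sleep(Duration::from_millis(200));
    timed_out && finished.load(Ordering::SeqCst)
}
```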

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

/// Timeout for individual tool execution to prevent hangs.
/// Set slightly above the longest internal tool timeout (delegate: 120s).
const TOOL_EXECUTION_TIMEOUT_SECS: u64 = 125;

// Execute with a 120-second timeout to prevent hangs.
match tokio::time::timeout(
    std::time::Duration::from_secs(TOOL_EXECUTION_TIMEOUT_SECS),
    tool.execute(call.arguments.clone()),
)
.await
{
    Ok(Ok(r)) => {
        if r.success {
            tracing::debug!(
                iteration,
                tool = call.name.as_str(),
                output_len = r.output.len(),
                "[agent_loop] tool succeeded"
            );
            scrub_credentials(&r.output)
        } else {
            let err_msg = r.error.unwrap_or(r.output);
            tracing::warn!(
                iteration,
                tool = call.name.as_str(),
                "[agent_loop] tool returned error: {err_msg}"
            );
            format!("Error: {err_msg}")
        }
    }
    Ok(Err(e)) => {
        tracing::error!(
            iteration,
            tool = call.name.as_str(),
            "[agent_loop] tool execution failed: {e}"
        );
        format!("Error executing {}: {e}", call.name)
    }
    Err(_) => {
        tracing::error!(
            iteration,
            tool = call.name.as_str(),
            "[agent_loop] tool execution timed out after 120s"
        );
        format!("Error: tool '{}' timed out after 120 seconds", call.name)
    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/loop_/tool_loop.rs` around lines 288 - 329, Extract the
hardcoded 120s into a named constant (e.g., TOOL_EXEC_TIMEOUT_SECS) and use it
in the tokio::time::timeout call wrapping tool.execute(call.arguments.clone());
ensure the constant value is changed so it does not equal DELEGATE_TIMEOUT_SECS
(from delegate.rs) to avoid simultaneous firing (pick a slightly shorter or
longer value and document the choice in a comment), and update any related log
messages to reference the new constant name rather than the literal 120.

…tool call parser

Root cause: ReliableProvider did not implement the chat(ChatRequest)
trait method. The agent loop called provider.chat() which fell through
to the default trait implementation — this used chat_with_history()
which strips native tool support and sends raw tool-role messages
without the required assistant tool_calls, causing the backend Jinja
template to reject the request with "Message has tool role, but there
was no previous assistant message with a tool call!"

Fixes:
- Add chat(ChatRequest) to ReliableProvider with full retry/failover
  logic, matching the existing chat_with_system/chat_with_history
  implementations. Delegates to inner provider's chat() which properly
  converts messages to native OpenAI format with tool_calls.
- Add [TOOL_CALL]/[/TOOL_CALL] bracket format to the tool call parser
  (parse.rs) — some models emit this format instead of <tool_call> XML.
- Add parse_bracket_tool_call() for the pseudo-syntax format:
  {tool => "name", args => { --key "value" }}

Verified with real staging backend (agentic-v1 model):
- Shell tool calls execute successfully
- File read tool calls return real content
- Knowledge questions return without tool calls
- No Jinja template errors

Closes tinyhumansai#144
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
src/openhuman/providers/reliable.rs (1)

522-630: Implementation is correct but duplicates retry/failover logic.

The new chat method correctly implements the same retry/failover pattern as chat_with_system, chat_with_history, and chat_with_tools. The error classification, backoff computation, key rotation, and failure aggregation are all consistent.

However, this introduces a fourth copy of nearly identical retry/failover logic (~100 lines). Consider extracting a generic helper in a future refactor to reduce maintenance burden:

async fn with_retry<F, T, Fut>(&self, model: &str, operation: F) -> anyhow::Result<T>
where
    F: Fn(&dyn Provider, &str) -> Fut,
    Fut: std::future::Future<Output = anyhow::Result<T>>,

This is not blocking for this PR since it matches the existing code style, but worth considering for maintainability.
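The shape of such a helper can be sketched synchronously, with boxed closures standing in for providers. Everything here is illustrative: the real helper would be async, take the crate's Provider trait, and carry the backoff/rate-limit handling the review lists:

```rust
/// Illustrative retry/failover skeleton: try each provider in order, with
/// up to `max_retries` extra attempts per provider, aggregating failures
/// so the final error reports every attempt.
fn with_retry<T>(
    providers: &[(&str, Box<dyn Fn() -> Result<T, String>>)],
    max_retries: usize,
) -> Result<T, String> {
    let mut failures = Vec::new();
    for (name, call) in providers {
        for attempt in 0..=max_retries {
            match call() {
                Ok(v) => return Ok(v), // first success short-circuits
                Err(e) => failures.push(format!("{name} attempt {}: {e}", attempt + 1)),
            }
        }
    }
    Err(format!("all providers failed:\n{}", failures.join("\n")))
}
```

With a helper of this shape, chat, chat_with_system, chat_with_history, and chat_with_tools would each shrink to a request constructor plus one call.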

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/providers/reliable.rs` around lines 522 - 630, The chat method
duplicates the same retry/failover loop used in chat_with_system,
chat_with_history, and chat_with_tools; extract that logic into a shared helper
to avoid repetition. Implement an async helper like with_retry(&self, model:
&str, operation: F) -> anyhow::Result<T> (where F: Fn(&dyn Provider, &str) ->
Fut and Fut: Future<Output = anyhow::Result<T>>) that contains the nested loops
over model_chain, providers, attempts, backoff computation (compute_backoff),
key rotation (rotate_key), failure aggregation (push_failure/failures),
non-retryable/rate-limit handling (is_non_retryable, is_rate_limited,
is_non_retryable_rate_limit, is_context_window_exceeded), and tracing; then have
chat, chat_with_system, chat_with_history, and chat_with_tools call with_retry
passing a closure that invokes provider.chat (or the appropriate provider
method) so each method only constructs the provider request and delegates
retries to the shared helper.
src/openhuman/agent/loop_/parse.rs (1)

85-120: Bracket syntax is undocumented—models are not instructed to use it.

The system instructions in instructions.rs (lines 6-40 per context snippet 3) explicitly mandate <tool_call> XML tags with JSON format. There's no mention of the bracket syntax {tool => "name", args => { --key "value" }}.

This fallback parser handles non-standard model outputs gracefully, which is good for resilience. However, consider:

  1. Adding a comment clarifying this is a compatibility fallback for models that don't follow instructions
  2. The bracket syntax only produces string values (line 103), which works for tools expecting strings but will fail for tools expecting numbers, booleans, or nested objects
📝 Suggested documentation comment
 /// Parse bracket-style pseudo-syntax emitted by some models:
 /// `{tool => "shell", args => { --command "ls" }}`
 ///
 /// Extracts tool name and converts `--key "value"` args into a JSON object.
+///
+/// NOTE: This is a compatibility fallback for models that emit non-standard syntax
+/// despite being instructed to use `<tool_call>` XML tags with JSON. All argument
+/// values are extracted as strings; tools requiring typed parameters (numbers,
+/// booleans, objects) should use the primary JSON format.
 fn parse_bracket_tool_call(inner: &str) -> Option<ParsedToolCall> {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/loop_/parse.rs` around lines 85 - 120, Add a short doc
comment above parse_bracket_tool_call that this is a compatibility fallback for
non-conforming model outputs, and change the argument parsing so values are not
always treated as strings: when extracting values in parse_bracket_tool_call
attempt to interpret each captured value as a JSON scalar/object by trying
serde_json::from_str (or a small helper that tries parsing as
number/bool/null/object/array) and on parse failure fallback to a String; update
the ARG_RE capture logic if needed to allow unquoted primitives or nested braces
and ensure ParsedToolCall.arguments stores the parsed serde_json::Value (not
always String) so tools that expect numbers/booleans/nested objects receive
proper types.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 63f5ea0b-5fc6-4481-8558-beae7f7b6d1e

📥 Commits

Reviewing files that changed from the base of the PR and between 838096e and caba8de.

📒 Files selected for processing (2)
  • src/openhuman/agent/loop_/parse.rs
  • src/openhuman/providers/reliable.rs

Comment on lines +89 to +104
fn parse_bracket_tool_call(inner: &str) -> Option<ParsedToolCall> {
    static BRACKET_TOOL_RE: LazyLock<Regex> = LazyLock::new(|| {
        Regex::new(r#"(?s)\{?\s*tool\s*=>\s*"([^"]+)"\s*,\s*args\s*=>\s*\{(.*?)\}\s*\}?"#).unwrap()
    });
    static ARG_RE: LazyLock<Regex> =
        LazyLock::new(|| Regex::new(r#"--(\w+)\s+"([^"]*)"#).unwrap());

    let cap = BRACKET_TOOL_RE.captures(inner.trim())?;
    let name = cap.get(1)?.as_str().to_string();
    let args_body = cap.get(2)?.as_str();

    let mut args = serde_json::Map::new();
    for arg_cap in ARG_RE.captures_iter(args_body) {
        let key = arg_cap.get(1)?.as_str().to_string();
        let value = arg_cap.get(2)?.as_str().to_string();
        args.insert(key, serde_json::Value::String(value));
    }
Contributor


⚠️ Potential issue | 🟡 Minor

Early return on None from captures_iter inside loop can skip remaining arguments.

Line 101-102 uses ? operator inside the for loop, which causes the entire function to return None if any single argument capture group fails to extract. This means if one --key "value" pair is malformed, all previously parsed arguments are discarded.

This is likely unintentional—typically you'd want to skip the malformed argument and continue parsing the rest.

🔧 Proposed fix to continue on malformed arguments
     let mut args = serde_json::Map::new();
     for arg_cap in ARG_RE.captures_iter(args_body) {
-        let key = arg_cap.get(1)?.as_str().to_string();
-        let value = arg_cap.get(2)?.as_str().to_string();
-        args.insert(key, serde_json::Value::String(value));
+        if let (Some(key_match), Some(value_match)) = (arg_cap.get(1), arg_cap.get(2)) {
+            let key = key_match.as_str().to_string();
+            let value = value_match.as_str().to_string();
+            args.insert(key, serde_json::Value::String(value));
+        }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/loop_/parse.rs` around lines 89 - 104, The loop in
parse_bracket_tool_call currently uses the ? operator on arg_cap.get(1) and
arg_cap.get(2) inside the for over ARG_RE.captures_iter, which causes the whole
function to return None on any malformed argument; change the loop to explicitly
check those captures (e.g., with if let Some(k) = arg_cap.get(1) { ... } else {
continue } and similarly for get(2)) and skip malformed pairs instead of
returning, so that only the bad argument is ignored and the rest are inserted
into the serde_json::Map before the function returns the ParsedToolCall.

Comment on lines +522 to +630
async fn chat(
    &self,
    request: ChatRequest<'_>,
    model: &str,
    temperature: f64,
) -> anyhow::Result<ChatResponse> {
    let models = self.model_chain(model);
    let mut failures = Vec::new();

    for current_model in &models {
        for (provider_name, provider) in &self.providers {
            let mut backoff_ms = self.base_backoff_ms;

            for attempt in 0..=self.max_retries {
                let req = ChatRequest {
                    messages: request.messages,
                    tools: request.tools,
                    system_prompt_cache_boundary: request.system_prompt_cache_boundary,
                };
                match provider.chat(req, current_model, temperature).await {
                    Ok(resp) => {
                        if attempt > 0 || *current_model != model {
                            tracing::info!(
                                provider = provider_name,
                                model = *current_model,
                                attempt,
                                original_model = model,
                                "Provider recovered (failover/retry)"
                            );
                        }
                        return Ok(resp);
                    }
                    Err(e) => {
                        let non_retryable_rate_limit = is_non_retryable_rate_limit(&e);
                        let non_retryable = is_non_retryable(&e) || non_retryable_rate_limit;
                        let rate_limited = is_rate_limited(&e);
                        let failure_reason = failure_reason(rate_limited, non_retryable);
                        let error_detail = compact_error_detail(&e);

                        push_failure(
                            &mut failures,
                            provider_name,
                            current_model,
                            attempt + 1,
                            self.max_retries + 1,
                            failure_reason,
                            &error_detail,
                        );

                        if rate_limited && !non_retryable_rate_limit {
                            if let Some(new_key) = self.rotate_key() {
                                tracing::info!(
                                    provider = provider_name,
                                    error = %error_detail,
                                    "Rate limited, rotated API key (key ending ...{})",
                                    &new_key[new_key.len().saturating_sub(4)..]
                                );
                            }
                        }

                        if non_retryable {
                            tracing::warn!(
                                provider = provider_name,
                                model = *current_model,
                                error = %error_detail,
                                "Non-retryable error, moving on"
                            );

                            if is_context_window_exceeded(&e) {
                                anyhow::bail!(
                                    "Request exceeds model context window; retries and fallbacks were skipped. Attempts:\n{}",
                                    failures.join("\n")
                                );
                            }

                            break;
                        }

                        if attempt < self.max_retries {
                            let wait = self.compute_backoff(backoff_ms, &e);
                            tracing::warn!(
                                provider = provider_name,
                                model = *current_model,
                                attempt = attempt + 1,
                                backoff_ms = wait,
                                reason = failure_reason,
                                error = %error_detail,
                                "Provider call failed, retrying"
                            );
                            tokio::time::sleep(Duration::from_millis(wait)).await;
                            backoff_ms = (backoff_ms.saturating_mul(2)).min(10_000);
                        }
                    }
                }
            }

            tracing::warn!(
                provider = provider_name,
                model = *current_model,
                "Exhausted retries, trying next provider/model"
            );
        }
    }

    anyhow::bail!(
        "All providers/models failed. Attempts:\n{}",
        failures.join("\n")
    )
}
Contributor


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if MockProvider or ModelAwareMock implement chat method
rg -n -A5 'impl Provider for (Mock|ModelAware)' src/openhuman/providers/reliable.rs

Repository: tinyhumansai/openhuman

Length of output: 491


🏁 Script executed:

#!/bin/bash
# Check the full impl Provider for MockProvider block
rg -n -A50 'impl Provider for MockProvider' src/openhuman/providers/reliable.rs | head -70

Repository: tinyhumansai/openhuman

Length of output: 1969


🏁 Script executed:

#!/bin/bash
# Search for any chat method tests (not chat_with_system or chat_with_history)
rg -n 'fn test.*chat\(' src/openhuman/providers/reliable.rs | grep -v 'chat_with_system\|chat_with_history'

Repository: tinyhumansai/openhuman

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Check the exact lines 522-630 to inspect logging
sed -n '522,630p' src/openhuman/providers/reliable.rs

Repository: tinyhumansai/openhuman

Length of output: 5057


Consider adding tests for the new chat method, and adjust logging levels to match coding guidelines.

The new chat method has no dedicated tests. While its implementation mirrors the well-tested chat_with_system and chat_with_history methods, explicit test coverage would help catch regressions.

The existing MockProvider doesn't implement chat(), which defaults to the trait's fallback. Consider extending MockProvider to support testing chat() directly in a follow-up.

Additionally, the logging in this method uses tracing::info!() and tracing::warn!() levels. Per coding guidelines, new/changed flows should use debug or trace level for Rust files. Consider adjusting log levels accordingly to align with project standards.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/providers/reliable.rs` around lines 522 - 630, Add unit tests
for the new chat method mirroring scenarios covered by chat_with_system and
chat_with_history (successful provider response, retries, rate-limit key
rotation, non-retryable errors, and context-window exceeded) and extend
MockProvider to implement chat() so tests can drive provider behavior directly;
update logging statements inside chat (the tracing::info! and tracing::warn!
calls around provider recovery, rate-limit key rotation, retries, exhausted
retries, and non-retryable errors) to use the project's preferred levels (debug
or trace) to comply with coding guidelines, and ensure helper methods referenced
(rotate_key, compute_backoff, is_context_window_exceeded) are exercised by the
tests.

@graycyrus graycyrus merged commit 305c58a into tinyhumansai:main Apr 1, 2026
8 checks passed
sanil-23 added a commit to sanil-23/openhuman that referenced this pull request Apr 1, 2026
…i#156

The actual fix in tinyhumansai#156 was adding chat(ChatRequest) to
ReliableProvider. The prompt changes in instructions.rs and the
bracket tool call parser in parse.rs were added during investigation
but are not needed — the model uses native tool calls when
ReliableProvider properly delegates to the inner provider.
graycyrus pushed a commit that referenced this pull request Apr 1, 2026
The actual fix in #156 was adding chat(ChatRequest) to
ReliableProvider. The prompt changes in instructions.rs and the
bracket tool call parser in parse.rs were added during investigation
but are not needed — the model uses native tool calls when
ReliableProvider properly delegates to the inner provider.