
fix: reduce agent loop hallucination and improve tool call reliability#156

Merged
graycyrus merged 2 commits into tinyhumansai:main from sanil-23:fix/agent-loop-hallucination
Apr 1, 2026

Conversation

@sanil-23
Contributor

@sanil-23 sanil-23 commented Apr 1, 2026

Summary

  • Strengthen tool-use instructions with explicit anti-hallucination rules
  • Wire context guard into the tool loop to prevent context overflow
  • Add 120-second timeout on tool execution to prevent hangs
  • Add debug logging at all loop boundaries for diagnosis

Problem

  • Agent sometimes narrates tool use ("I'll run ls") without emitting <tool_call> tags — loop exits with no execution
  • Context guard exists (context_guard.rs) but was not wired into run_tool_call_loop() — context could silently overflow
  • No timeout on tool execution — hanging tools stall the entire turn
  • No logging at loop boundaries — impossible to diagnose missed tool calls

Solution

  • instructions.rs: Replace single "CRITICAL" line with 6 explicit rules: always use tags, never narrate without calling, use exact names, multiple calls ok, continue after results, only skip tools for knowledge-only answers
  • tool_loop.rs:
    • Call context_guard.check() before each LLM request; abort on ContextExhausted
    • Update guard with UsageInfo from each ChatResponse
    • Wrap tool.execute() in tokio::time::timeout(120s)
    • Add tracing::debug/warn/error at: loop start, LLM response (with token counts), tool call parsing, each tool execution, unknown tools, timeouts, final response
  • context_guard.rs: Make COMPACTION_TRIGGER_THRESHOLD pub(crate) for log messages
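The guard-plus-timeout wiring described above follows a simple pattern: check context headroom before each model call, run the tool with a deadline, and record token usage afterwards. The sketch below illustrates that pattern using plain std threads instead of tokio, with a made-up `ContextGuard` whose fields and 95% threshold are assumptions drawn from this PR's description, not the real `context_guard.rs` API:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the PR's ContextGuard: trips once estimated
/// utilization crosses a threshold (the real guard lives in context_guard.rs).
struct ContextGuard {
    used_tokens: u64,
    max_tokens: u64,
}

impl ContextGuard {
    /// Check before each LLM request; Err models the "ContextExhausted" abort.
    fn check(&self) -> Result<(), String> {
        if self.used_tokens as f64 / self.max_tokens as f64 > 0.95 {
            Err("context exhausted".to_string())
        } else {
            Ok(())
        }
    }

    /// Update from the usage info of each response.
    fn update(&mut self, new_tokens: u64) {
        self.used_tokens += new_tokens;
    }
}

/// Run a blocking "tool" on a worker thread and give up after `timeout`,
/// a std-only analogue of wrapping tool.execute(...) in tokio::time::timeout.
fn execute_with_timeout<F>(tool: F, timeout: Duration) -> Result<String, String>
where
    F: FnOnce() -> String + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(tool());
    });
    rx.recv_timeout(timeout)
        .map_err(|_| "tool timed out".to_string())
}
```

A caller would check the guard, run the tool through `execute_with_timeout`, then feed the response's token count back via `update` before looping again.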

Submission Checklist

  • Unit tests — cargo check + cargo fmt pass; existing context_guard tests unaffected
  • E2E / integration — Requires real LLM backend for hallucination validation; structural correctness verified by compilation
  • Doc comments — Existing doc comments preserved
  • Inline comments — Section markers in tool loop (── Context guard ──, ── Approval hook ──)

Impact

  • Desktop + CLI: All agent loop paths affected (agent_turn, run_tool_call_loop)
  • Performance: 120s timeout per tool prevents indefinite hangs; context guard adds negligible overhead
  • Observability: [agent_loop] prefixed logs enable grep-based diagnosis across sidecar output

Related

Summary by CodeRabbit

  • Bug Fixes & Improvements

    • Clarified and tightened system prompt rules for consistent tool invocation.
    • Added context utilization checks with compaction warnings and exhaustion handling.
    • Introduced timeouts and improved logging for tool execution and LLM interactions.
    • Strengthened provider retry/failover behavior to improve chat reliability.
  • New Features

    • Added support for an alternate bracket-style tool-call syntax to improve parsing robustness.

- Strengthen tool-use instructions with explicit anti-hallucination rules:
  "NEVER narrate tool use without emitting tags", "use exact tool names",
  "only respond without tool call when no tool is needed"
- Wire context guard into tool loop: check utilization before each LLM
  call, abort on context exhaustion (>95% with circuit breaker tripped)
- Add 120-second timeout on tool execution to prevent hangs
- Add debug/warn/error logging at all loop boundaries: LLM request,
  response (with token counts), tool call parsing, tool execution,
  unknown tools, timeouts, and final response

Closes tinyhumansai#144
Contributor

coderabbitai Bot commented Apr 1, 2026

📝 Walkthrough

Exposed a crate-internal compaction threshold, tightened and restructured tool-invocation prompt rules, added bracket-style parsing for alternate tool-call syntax, integrated context-usage checks and timeouts into the tool loop, and added a chat method implementation with retry/backoff in the reliable provider.

Changes

  • Context & Loop Management (src/openhuman/agent/loop_/context_guard.rs, src/openhuman/agent/loop_/tool_loop.rs): Made COMPACTION_TRIGGER_THRESHOLD pub(crate). Integrated ContextGuard checks before LLM requests, bail on context exhaustion, update the guard from LLM usage, added warnings/logging, and enforced a 120s timeout around each tool.execute(...) with distinct handling for timeouts vs. failures.
  • Prompt / Instruction Text (src/openhuman/agent/loop_/instructions.rs): Rewrote the system prompt's tool-invocation section into a structured "### Rules (MUST follow)" list with six explicit rules and moved the example into a separate "### Example" section.
  • Tool-call Parsing (src/openhuman/agent/loop_/parse.rs): Added parse_bracket_tool_call(inner: &str) to parse bracket-style pseudo-syntax, extended the recognized open tags to include [TOOL_CALL]/[tool_call], and updated the fallback parsing logic to try bracket syntax before issuing malformed warnings.
  • Provider Retry Path (src/openhuman/providers/reliable.rs): Added async fn chat(&self, request: ChatRequest<'_>, model: &str, temperature: f64) to ReliableProvider with per-model/provider retry/backoff, rate-limit/key rotation handling, early bail on context-window errors, and aggregated failure reporting.

Sequence Diagram(s)

sequenceDiagram
    actor Agent
    participant ContextGuard
    participant LLM
    participant Parser
    participant ToolExecutor
    participant Tool

    Agent->>ContextGuard: check utilization
    alt exhausted
        ContextGuard-->>Agent: bail (context exhausted)
    else compaction suggested
        ContextGuard-->>Agent: warn (compaction threshold)
    end

    Agent->>LLM: send request (history + prompt)
    LLM-->>Agent: response + usage?
    Agent->>ContextGuard: update(token usage) 

    Agent->>Parser: parse response for tool calls (JSON or bracket)
    alt tool call found
        Parser-->>Agent: ParsedToolCall
        Agent->>ToolExecutor: invoke tool (timeout 120s)
        ToolExecutor->>Tool: execute(args)
        alt timeout
            Tool-->>ToolExecutor: Err(timeout)
            ToolExecutor-->>Agent: log timeout, continue loop
        else success/failure payload
            Tool-->>ToolExecutor: Ok(result)
            ToolExecutor-->>Agent: return result in <tool_result>
        end
        Agent->>Agent: continue tool loop or finalize
    else no tool
        Agent-->>Agent: return final response
    end

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Poem

🐰 I hopped through code with careful cheer,
I added guards so nothing veers,
Timeouts set, and prompts made clear,
Tool-calls parsed both far and near,
A tidy loop — now cheer, my dear! 🎩

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly and specifically summarizes the main changes: reducing hallucination through improved instructions and increasing tool call reliability through context guards and timeouts.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.



Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/openhuman/agent/loop_/instructions.rs (1)

9-10: Mismatch between documented and accepted tag formats.

Instructions document only <tool_call> format (lines 9-10), but the parser at line 85 accepts four variants: <tool_call>, <toolcall>, <tool-call>, and <invoke>. Since the LLM will follow the documented format, this is functionally acceptable. Consider either documenting the parser's flexibility in the instructions or reducing the parser to support only the documented format.

Also applies to: 15-16
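The flexibility the comment describes can be illustrated with a small matcher. This is a hypothetical sketch, not the parser's actual code; only the list of accepted tags comes from the review comment:

```rust
/// Lenient open-tag matching: the instructions document only `<tool_call>`,
/// but the parser tolerates several synonyms (per the review comment).
const OPEN_TAGS: [&str; 4] = ["<tool_call>", "<toolcall>", "<tool-call>", "<invoke>"];

/// Return the byte offset and tag of the earliest recognized open tag, if any.
fn find_tool_call_start(text: &str) -> Option<(usize, &'static str)> {
    OPEN_TAGS
        .iter()
        .filter_map(|tag| text.find(tag).map(|pos| (pos, *tag)))
        .min_by_key(|(pos, _)| *pos) // earliest tag wins if several appear
}
```

Either the instructions should enumerate these variants or the matcher should shrink to the single documented tag, which is exactly the trade-off the comment raises.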

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/loop_/instructions.rs` around lines 9 - 10, The docs in
instructions.push_str currently only show the <tool_call> tag but the parser
accepts four variants (<tool_call>, <toolcall>, <tool-call>, <invoke>); update
the instructions text to list and show examples for all accepted variants (add
the additional tag forms in the push_str strings) OR change the parser to only
accept the documented <tool_call> form so they match; locate the instruction
strings in instructions.rs (the instructions.push_str calls) and either expand
the example to include "<toolcall>", "<tool-call>", and "<invoke>" or modify the
parser's tag-matching logic to restrict it to "<tool_call>" only.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b741908e-39de-4fc5-999f-dcb6c4df8d24

📥 Commits

Reviewing files that changed from the base of the PR and between 7fd4568 and 838096e.

📒 Files selected for processing (3)
  • src/openhuman/agent/loop_/context_guard.rs
  • src/openhuman/agent/loop_/instructions.rs
  • src/openhuman/agent/loop_/tool_loop.rs

Comment on lines +288 to +329
 // Execute with a 120-second timeout to prevent hangs.
 match tokio::time::timeout(
     std::time::Duration::from_secs(120),
     tool.execute(call.arguments.clone()),
 )
 .await
 {
     Ok(Ok(r)) => {
         if r.success {
             tracing::debug!(
                 iteration,
                 tool = call.name.as_str(),
                 output_len = r.output.len(),
                 "[agent_loop] tool succeeded"
             );
             scrub_credentials(&r.output)
         } else {
-            format!("Error: {}", r.error.unwrap_or(r.output))
+            let err_msg = r.error.unwrap_or(r.output);
+            tracing::warn!(
+                iteration,
+                tool = call.name.as_str(),
+                "[agent_loop] tool returned error: {err_msg}"
+            );
+            format!("Error: {err_msg}")
         }
     }
-    Err(e) => {
+    Ok(Err(e)) => {
         tracing::error!(
             iteration,
             tool = call.name.as_str(),
             "[agent_loop] tool execution failed: {e}"
         );
         format!("Error executing {}: {e}", call.name)
     }
+    Err(_) => {
+        tracing::error!(
+            iteration,
+            tool = call.name.as_str(),
+            "[agent_loop] tool execution timed out after 120s"
+        );
+        format!("Error: tool '{}' timed out after 120 seconds", call.name)
+    }
Contributor


⚠️ Potential issue | 🟡 Minor

Timeout value conflicts with delegate tool's internal timeout.

The 120-second timeout matches exactly the DELEGATE_TIMEOUT_SECS in src/openhuman/tools/delegate.rs:12-13. When a delegated agent call hangs, both timeouts fire simultaneously, and the outer timeout may shadow the delegate's more specific error message. Additionally, the hardcoded 120 should be extracted to a named constant for maintainability.

Proposed fix: extract constant and adjust value
+/// Timeout for individual tool execution to prevent hangs.
+/// Set slightly above the longest internal tool timeout (delegate: 120s).
+const TOOL_EXECUTION_TIMEOUT_SECS: u64 = 125;
+
 // ... in the tool execution block:
-                match tokio::time::timeout(
-                    std::time::Duration::from_secs(120),
-                    tool.execute(call.arguments.clone()),
-                )
+                match tokio::time::timeout(
+                    std::time::Duration::from_secs(TOOL_EXECUTION_TIMEOUT_SECS),
+                    tool.execute(call.arguments.clone()),
+                )

Note: The timeout does not propagate a cancellation signal to the tool—tools continue running after the outer timeout fires. This is acceptable given the Tool trait lacks a cancellation mechanism, but be aware that timed-out operations may complete in the background.
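The "keeps running in the background" caveat is easy to demonstrate with plain threads. This is an illustrative analogue, not the PR's code: with tokio, dropping the timed-out future cancels pending async work, but spawned tasks and child processes behave like the abandoned worker below:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::Duration;

/// Show that a timed-out worker is abandoned, not cancelled:
/// the caller gives up, yet the worker still finishes its job.
fn abandoned_worker_finishes() -> bool {
    let finished = Arc::new(AtomicBool::new(false));
    let flag = finished.clone();
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(100)); // the "hanging" tool
        flag.store(true, Ordering::SeqCst);
        let _ = tx.send(());
    });
    // The caller times out long before the worker is done...
    let timed_out = rx.recv_timeout(Duration::from_millis(10)).is_err();
    // ...but after waiting, the worker has completed anyway.
    thread::sleep(Duration::from_millis(200));
    timed_out && finished.load(Ordering::SeqCst)
}
```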

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

/// Timeout for individual tool execution to prevent hangs.
/// Set slightly above the longest internal tool timeout (delegate: 120s).
const TOOL_EXECUTION_TIMEOUT_SECS: u64 = 125;

// Execute with a 120-second timeout to prevent hangs.
match tokio::time::timeout(
    std::time::Duration::from_secs(TOOL_EXECUTION_TIMEOUT_SECS),
    tool.execute(call.arguments.clone()),
)
.await
{
    Ok(Ok(r)) => {
        if r.success {
            tracing::debug!(
                iteration,
                tool = call.name.as_str(),
                output_len = r.output.len(),
                "[agent_loop] tool succeeded"
            );
            scrub_credentials(&r.output)
        } else {
            let err_msg = r.error.unwrap_or(r.output);
            tracing::warn!(
                iteration,
                tool = call.name.as_str(),
                "[agent_loop] tool returned error: {err_msg}"
            );
            format!("Error: {err_msg}")
        }
    }
    Ok(Err(e)) => {
        tracing::error!(
            iteration,
            tool = call.name.as_str(),
            "[agent_loop] tool execution failed: {e}"
        );
        format!("Error executing {}: {e}", call.name)
    }
    Err(_) => {
        tracing::error!(
            iteration,
            tool = call.name.as_str(),
            "[agent_loop] tool execution timed out after 120s"
        );
        format!("Error: tool '{}' timed out after 120 seconds", call.name)
    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/loop_/tool_loop.rs` around lines 288 - 329, Extract the
hardcoded 120s into a named constant (e.g., TOOL_EXEC_TIMEOUT_SECS) and use it
in the tokio::time::timeout call wrapping tool.execute(call.arguments.clone());
ensure the constant value is changed so it does not equal DELEGATE_TIMEOUT_SECS
(from delegate.rs) to avoid simultaneous firing (pick a slightly shorter or
longer value and document the choice in a comment), and update any related log
messages to reference the new constant name rather than the literal 120.

…tool call parser

Root cause: ReliableProvider did not implement the chat(ChatRequest)
trait method. The agent loop called provider.chat() which fell through
to the default trait implementation — this used chat_with_history()
which strips native tool support and sends raw tool-role messages
without the required assistant tool_calls, causing the backend Jinja
template to reject the request with "Message has tool role, but there
was no previous assistant message with a tool call!"

Fixes:
- Add chat(ChatRequest) to ReliableProvider with full retry/failover
  logic, matching the existing chat_with_system/chat_with_history
  implementations. Delegates to inner provider's chat() which properly
  converts messages to native OpenAI format with tool_calls.
- Add [TOOL_CALL]/[/TOOL_CALL] bracket format to the tool call parser
  (parse.rs) — some models emit this format instead of <tool_call> XML.
- Add parse_bracket_tool_call() for the pseudo-syntax format:
  {tool => "name", args => { --key "value" }}

Verified with real staging backend (agentic-v1 model):
- Shell tool calls execute successfully
- File read tool calls return real content
- Knowledge questions return without tool calls
- No Jinja template errors

Closes tinyhumansai#144
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
src/openhuman/providers/reliable.rs (1)

522-630: Implementation is correct but duplicates retry/failover logic.

The new chat method correctly implements the same retry/failover pattern as chat_with_system, chat_with_history, and chat_with_tools. The error classification, backoff computation, key rotation, and failure aggregation are all consistent.

However, this introduces a fourth copy of nearly identical retry/failover logic (~100 lines). Consider extracting a generic helper in a future refactor to reduce maintenance burden:

async fn with_retry<F, T, Fut>(&self, model: &str, operation: F) -> anyhow::Result<T>
where
    F: Fn(&dyn Provider, &str) -> Fut,
    Fut: std::future::Future<Output = anyhow::Result<T>>,

This is not blocking for this PR since it matches the existing code style, but worth considering for maintainability.
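The shape of such a helper can be sketched synchronously, with boxed closures standing in for providers. Everything here is illustrative: the real helper would be async, take the crate's Provider trait, and carry the backoff/rate-limit handling the review lists:

```rust
/// Illustrative retry/failover skeleton: try each provider in order, with
/// up to `max_retries` extra attempts per provider, aggregating failures
/// so the final error reports every attempt.
fn with_retry<T>(
    providers: &[(&str, Box<dyn Fn() -> Result<T, String>>)],
    max_retries: usize,
) -> Result<T, String> {
    let mut failures = Vec::new();
    for (name, call) in providers {
        for attempt in 0..=max_retries {
            match call() {
                Ok(v) => return Ok(v), // first success short-circuits
                Err(e) => failures.push(format!("{name} attempt {}: {e}", attempt + 1)),
            }
        }
    }
    Err(format!("all providers failed:\n{}", failures.join("\n")))
}
```

With a helper of this shape, chat, chat_with_system, chat_with_history, and chat_with_tools would each shrink to a request constructor plus one call.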

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/providers/reliable.rs` around lines 522 - 630, The chat method
duplicates the same retry/failover loop used in chat_with_system,
chat_with_history, and chat_with_tools; extract that logic into a shared helper
to avoid repetition. Implement an async helper like with_retry(&self, model:
&str, operation: F) -> anyhow::Result<T> (where F: Fn(&dyn Provider, &str) ->
Fut and Fut: Future<Output = anyhow::Result<T>>) that contains the nested loops
over model_chain, providers, attempts, backoff computation (compute_backoff),
key rotation (rotate_key), failure aggregation (push_failure/failures),
non-retryable/rate-limit handling (is_non_retryable, is_rate_limited,
is_non_retryable_rate_limit, is_context_window_exceeded), and tracing; then have
chat, chat_with_system, chat_with_history, and chat_with_tools call with_retry
passing a closure that invokes provider.chat (or the appropriate provider
method) so each method only constructs the provider request and delegates
retries to the shared helper.
src/openhuman/agent/loop_/parse.rs (1)

85-120: Bracket syntax is undocumented—models are not instructed to use it.

The system instructions in instructions.rs (lines 6-40 per context snippet 3) explicitly mandate <tool_call> XML tags with JSON format. There's no mention of the bracket syntax {tool => "name", args => { --key "value" }}.

This fallback parser handles non-standard model outputs gracefully, which is good for resilience. However, consider:

  1. Adding a comment clarifying this is a compatibility fallback for models that don't follow instructions
  2. The bracket syntax only produces string values (line 103), which works for tools expecting strings but will fail for tools expecting numbers, booleans, or nested objects
📝 Suggested documentation comment
 /// Parse bracket-style pseudo-syntax emitted by some models:
 /// `{tool => "shell", args => { --command "ls" }}`
 ///
 /// Extracts tool name and converts `--key "value"` args into a JSON object.
+///
+/// NOTE: This is a compatibility fallback for models that emit non-standard syntax
+/// despite being instructed to use `<tool_call>` XML tags with JSON. All argument
+/// values are extracted as strings; tools requiring typed parameters (numbers,
+/// booleans, objects) should use the primary JSON format.
 fn parse_bracket_tool_call(inner: &str) -> Option<ParsedToolCall> {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/loop_/parse.rs` around lines 85 - 120, Add a short doc
comment above parse_bracket_tool_call that this is a compatibility fallback for
non-conforming model outputs, and change the argument parsing so values are not
always treated as strings: when extracting values in parse_bracket_tool_call
attempt to interpret each captured value as a JSON scalar/object by trying
serde_json::from_str (or a small helper that tries parsing as
number/bool/null/object/array) and on parse failure fallback to a String; update
the ARG_RE capture logic if needed to allow unquoted primitives or nested braces
and ensure ParsedToolCall.arguments stores the parsed serde_json::Value (not
always String) so tools that expect numbers/booleans/nested objects receive
proper types.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 63f5ea0b-5fc6-4481-8558-beae7f7b6d1e

📥 Commits

Reviewing files that changed from the base of the PR and between 838096e and caba8de.

📒 Files selected for processing (2)
  • src/openhuman/agent/loop_/parse.rs
  • src/openhuman/providers/reliable.rs

Comment on lines +89 to +104
fn parse_bracket_tool_call(inner: &str) -> Option<ParsedToolCall> {
    static BRACKET_TOOL_RE: LazyLock<Regex> = LazyLock::new(|| {
        Regex::new(r#"(?s)\{?\s*tool\s*=>\s*"([^"]+)"\s*,\s*args\s*=>\s*\{(.*?)\}\s*\}?"#).unwrap()
    });
    static ARG_RE: LazyLock<Regex> =
        LazyLock::new(|| Regex::new(r#"--(\w+)\s+"([^"]*)"#).unwrap());

    let cap = BRACKET_TOOL_RE.captures(inner.trim())?;
    let name = cap.get(1)?.as_str().to_string();
    let args_body = cap.get(2)?.as_str();

    let mut args = serde_json::Map::new();
    for arg_cap in ARG_RE.captures_iter(args_body) {
        let key = arg_cap.get(1)?.as_str().to_string();
        let value = arg_cap.get(2)?.as_str().to_string();
        args.insert(key, serde_json::Value::String(value));
    }
Contributor


⚠️ Potential issue | 🟡 Minor

Early return on None from captures_iter inside loop can skip remaining arguments.

Line 101-102 uses ? operator inside the for loop, which causes the entire function to return None if any single argument capture group fails to extract. This means if one --key "value" pair is malformed, all previously parsed arguments are discarded.

This is likely unintentional—typically you'd want to skip the malformed argument and continue parsing the rest.

🔧 Proposed fix to continue on malformed arguments
     let mut args = serde_json::Map::new();
     for arg_cap in ARG_RE.captures_iter(args_body) {
-        let key = arg_cap.get(1)?.as_str().to_string();
-        let value = arg_cap.get(2)?.as_str().to_string();
-        args.insert(key, serde_json::Value::String(value));
+        if let (Some(key_match), Some(value_match)) = (arg_cap.get(1), arg_cap.get(2)) {
+            let key = key_match.as_str().to_string();
+            let value = value_match.as_str().to_string();
+            args.insert(key, serde_json::Value::String(value));
+        }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/loop_/parse.rs` around lines 89 - 104, The loop in
parse_bracket_tool_call currently uses the ? operator on arg_cap.get(1) and
arg_cap.get(2) inside the for over ARG_RE.captures_iter, which causes the whole
function to return None on any malformed argument; change the loop to explicitly
check those captures (e.g., with if let Some(k) = arg_cap.get(1) { ... } else {
continue } and similarly for get(2)) and skip malformed pairs instead of
returning, so that only the bad argument is ignored and the rest are inserted
into the serde_json::Map before the function returns the ParsedToolCall.

Comment on lines +522 to +630
async fn chat(
    &self,
    request: ChatRequest<'_>,
    model: &str,
    temperature: f64,
) -> anyhow::Result<ChatResponse> {
    let models = self.model_chain(model);
    let mut failures = Vec::new();

    for current_model in &models {
        for (provider_name, provider) in &self.providers {
            let mut backoff_ms = self.base_backoff_ms;

            for attempt in 0..=self.max_retries {
                let req = ChatRequest {
                    messages: request.messages,
                    tools: request.tools,
                    system_prompt_cache_boundary: request.system_prompt_cache_boundary,
                };
                match provider.chat(req, current_model, temperature).await {
                    Ok(resp) => {
                        if attempt > 0 || *current_model != model {
                            tracing::info!(
                                provider = provider_name,
                                model = *current_model,
                                attempt,
                                original_model = model,
                                "Provider recovered (failover/retry)"
                            );
                        }
                        return Ok(resp);
                    }
                    Err(e) => {
                        let non_retryable_rate_limit = is_non_retryable_rate_limit(&e);
                        let non_retryable = is_non_retryable(&e) || non_retryable_rate_limit;
                        let rate_limited = is_rate_limited(&e);
                        let failure_reason = failure_reason(rate_limited, non_retryable);
                        let error_detail = compact_error_detail(&e);

                        push_failure(
                            &mut failures,
                            provider_name,
                            current_model,
                            attempt + 1,
                            self.max_retries + 1,
                            failure_reason,
                            &error_detail,
                        );

                        if rate_limited && !non_retryable_rate_limit {
                            if let Some(new_key) = self.rotate_key() {
                                tracing::info!(
                                    provider = provider_name,
                                    error = %error_detail,
                                    "Rate limited, rotated API key (key ending ...{})",
                                    &new_key[new_key.len().saturating_sub(4)..]
                                );
                            }
                        }

                        if non_retryable {
                            tracing::warn!(
                                provider = provider_name,
                                model = *current_model,
                                error = %error_detail,
                                "Non-retryable error, moving on"
                            );

                            if is_context_window_exceeded(&e) {
                                anyhow::bail!(
                                    "Request exceeds model context window; retries and fallbacks were skipped. Attempts:\n{}",
                                    failures.join("\n")
                                );
                            }

                            break;
                        }

                        if attempt < self.max_retries {
                            let wait = self.compute_backoff(backoff_ms, &e);
                            tracing::warn!(
                                provider = provider_name,
                                model = *current_model,
                                attempt = attempt + 1,
                                backoff_ms = wait,
                                reason = failure_reason,
                                error = %error_detail,
                                "Provider call failed, retrying"
                            );
                            tokio::time::sleep(Duration::from_millis(wait)).await;
                            backoff_ms = (backoff_ms.saturating_mul(2)).min(10_000);
                        }
                    }
                }
            }

            tracing::warn!(
                provider = provider_name,
                model = *current_model,
                "Exhausted retries, trying next provider/model"
            );
        }
    }

    anyhow::bail!(
        "All providers/models failed. Attempts:\n{}",
        failures.join("\n")
    )
}
Contributor


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if MockProvider or ModelAwareMock implement chat method
rg -n -A5 'impl Provider for (Mock|ModelAware)' src/openhuman/providers/reliable.rs

Repository: tinyhumansai/openhuman

Length of output: 491


🏁 Script executed:

#!/bin/bash
# Check the full impl Provider for MockProvider block
rg -n -A50 'impl Provider for MockProvider' src/openhuman/providers/reliable.rs | head -70

Repository: tinyhumansai/openhuman

Length of output: 1969


🏁 Script executed:

#!/bin/bash
# Search for any chat method tests (not chat_with_system or chat_with_history)
rg -n 'fn test.*chat\(' src/openhuman/providers/reliable.rs | grep -v 'chat_with_system\|chat_with_history'

Repository: tinyhumansai/openhuman

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Check the exact lines 522-630 to inspect logging
sed -n '522,630p' src/openhuman/providers/reliable.rs

Repository: tinyhumansai/openhuman

Length of output: 5057


Consider adding tests for the new chat method, and adjust logging levels to match coding guidelines.

The new chat method has no dedicated tests. While its implementation mirrors the well-tested chat_with_system and chat_with_history methods, explicit test coverage would help catch regressions.

The existing MockProvider doesn't implement chat(), which defaults to the trait's fallback. Consider extending MockProvider to support testing chat() directly in a follow-up.

Additionally, the logging in this method uses tracing::info!() and tracing::warn!() levels. Per coding guidelines, new/changed flows should use debug or trace level for Rust files. Consider adjusting log levels accordingly to align with project standards.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/providers/reliable.rs` around lines 522 - 630, Add unit tests
for the new chat method mirroring scenarios covered by chat_with_system and
chat_with_history (successful provider response, retries, rate-limit key
rotation, non-retryable errors, and context-window exceeded) and extend
MockProvider to implement chat() so tests can drive provider behavior directly;
update logging statements inside chat (the tracing::info! and tracing::warn!
calls around provider recovery, rate-limit key rotation, retries, exhausted
retries, and non-retryable errors) to use the project's preferred levels (debug
or trace) to comply with coding guidelines, and ensure helper methods referenced
(rotate_key, compute_backoff, is_context_window_exceeded) are exercised by the
tests.

@graycyrus graycyrus merged commit 305c58a into tinyhumansai:main Apr 1, 2026
8 checks passed
sanil-23 added a commit to sanil-23/openhuman that referenced this pull request Apr 1, 2026
…i#156

The actual fix in tinyhumansai#156 was adding chat(ChatRequest) to
ReliableProvider. The prompt changes in instructions.rs and the
bracket tool call parser in parse.rs were added during investigation
but are not needed — the model uses native tool calls when
ReliableProvider properly delegates to the inner provider.
graycyrus pushed a commit that referenced this pull request Apr 1, 2026
The actual fix in #156 was adding chat(ChatRequest) to
ReliableProvider. The prompt changes in instructions.rs and the
bracket tool call parser in parse.rs were added during investigation
but are not needed — the model uses native tool calls when
ReliableProvider properly delegates to the inner provider.