fix: reduce agent loop hallucination and improve tool call reliability#156
Conversation
- Strengthen tool-use instructions with explicit anti-hallucination rules: "NEVER narrate tool use without emitting tags", "use exact tool names", "only respond without tool call when no tool is needed"
- Wire context guard into tool loop: check utilization before each LLM call, abort on context exhaustion (>95% with circuit breaker tripped)
- Add 120-second timeout on tool execution to prevent hangs
- Add debug/warn/error logging at all loop boundaries: LLM request, response (with token counts), tool call parsing, tool execution, unknown tools, timeouts, and final response

Closes tinyhumansai#144
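The guard behavior described above (warn at a compaction threshold, abort above 95% utilization once the circuit breaker has tripped) can be sketched as follows. The struct and method names here are illustrative stand-ins, not the actual API in `context_guard.rs`:

```rust
// Illustrative sketch of the context-guard decision, assuming a simple
// token-utilization model. The 95% exhaustion threshold comes from the
// commit message; everything else is hypothetical.

const EXHAUSTION_THRESHOLD: f64 = 0.95;

#[derive(Debug, PartialEq)]
enum GuardDecision {
    Proceed,
    SuggestCompaction,
    Abort,
}

struct ContextGuard {
    used_tokens: u64,
    max_tokens: u64,
    compaction_threshold: f64, // e.g. 0.8: warn and suggest compaction
    breaker_tripped: bool,     // circuit-breaker state
}

impl ContextGuard {
    fn check(&self) -> GuardDecision {
        let utilization = self.used_tokens as f64 / self.max_tokens as f64;
        if utilization > EXHAUSTION_THRESHOLD && self.breaker_tripped {
            GuardDecision::Abort // >95% with circuit breaker tripped
        } else if utilization > self.compaction_threshold {
            GuardDecision::SuggestCompaction
        } else {
            GuardDecision::Proceed
        }
    }
}

fn main() {
    let guard = ContextGuard {
        used_tokens: 96_000,
        max_tokens: 100_000,
        compaction_threshold: 0.8,
        breaker_tripped: true,
    };
    assert_eq!(guard.check(), GuardDecision::Abort);
    println!("ok");
}
```

Running this check before every LLM request is what prevents the loop from silently overflowing the context window.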
📝 Walkthrough

Exposed a crate-internal compaction threshold, tightened and restructured tool-invocation prompt rules, added bracket-style parsing for alternate tool-call syntax, and integrated context-usage checks and timeouts into the tool loop.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Agent
    participant ContextGuard
    participant LLM
    participant Parser
    participant ToolExecutor
    participant Tool
    Agent->>ContextGuard: check utilization
    alt exhausted
        ContextGuard-->>Agent: bail (context exhausted)
    else compaction suggested
        ContextGuard-->>Agent: warn (compaction threshold)
    end
    Agent->>LLM: send request (history + prompt)
    LLM-->>Agent: response + usage?
    Agent->>ContextGuard: update(token usage)
    Agent->>Parser: parse response for tool calls (JSON or bracket)
    alt tool call found
        Parser-->>Agent: ParsedToolCall
        Agent->>ToolExecutor: invoke tool (timeout 120s)
        ToolExecutor->>Tool: execute(args)
        alt timeout
            Tool-->>ToolExecutor: Err(timeout)
            ToolExecutor-->>Agent: log timeout, continue loop
        else success/failure payload
            Tool-->>ToolExecutor: Ok(result)
            ToolExecutor-->>Agent: return result in <tool_result>
        end
        Agent->>Agent: continue tool loop or finalize
    else no tool
        Agent-->>Agent: return final response
    end
```
Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues
🚥 Pre-merge checks | ✅ Passed checks (3 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/openhuman/agent/loop_/instructions.rs (1)
9-10: Mismatch between documented and accepted tag formats.

Instructions document only the `<tool_call>` format (lines 9-10), but the parser at line 85 accepts four variants: `<tool_call>`, `<toolcall>`, `<tool-call>`, and `<invoke>`. Since the LLM will follow the documented format, this is functionally acceptable. Consider either documenting the parser's flexibility in the instructions or reducing the parser to support only the documented format.

Also applies to: 15-16
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/loop_/instructions.rs` around lines 9 - 10, The docs in instructions.push_str currently only show the <tool_call> tag but the parser accepts four variants (<tool_call>, <toolcall>, <tool-call>, <invoke>); update the instructions text to list and show examples for all accepted variants (add the additional tag forms in the push_str strings) OR change the parser to only accept the documented <tool_call> form so they match; locate the instruction strings in instructions.rs (the instructions.push_str calls) and either expand the example to include "<toolcall>", "<tool-call>", and "<invoke>" or modify the parser's tag-matching logic to restrict it to "<tool_call>" only.
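For illustration, variant-tolerant tag extraction along the lines this comment describes can be sketched with plain `str::find`. This is a hypothetical, dependency-free helper, not the crate's actual parser:

```rust
// Minimal sketch of tolerant tag detection for the four variants the
// review mentions. Hypothetical helper, not the crate's real code.

const TAG_PAIRS: [(&str, &str); 4] = [
    ("<tool_call>", "</tool_call>"),
    ("<toolcall>", "</toolcall>"),
    ("<tool-call>", "</tool-call>"),
    ("<invoke>", "</invoke>"),
];

/// Return the trimmed body of the first recognized tool-call tag, if any.
fn extract_tool_call_body(response: &str) -> Option<&str> {
    for (open, close) in TAG_PAIRS {
        if let Some(start) = response.find(open) {
            let after = &response[start + open.len()..];
            if let Some(end) = after.find(close) {
                return Some(after[..end].trim());
            }
        }
    }
    None
}

fn main() {
    let r = r#"Sure. <invoke>{"name":"shell","arguments":{"command":"ls"}}</invoke>"#;
    assert_eq!(
        extract_tool_call_body(r),
        Some(r#"{"name":"shell","arguments":{"command":"ls"}}"#)
    );
    assert_eq!(extract_tool_call_body("no tool needed"), None);
    println!("ok");
}
```

If the parser is instead reduced to the documented `<tool_call>` form, the array shrinks to a single pair and the instructions stay accurate as written.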
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/openhuman/agent/loop_/tool_loop.rs`:
- Around line 288-329: Extract the hardcoded 120s into a named constant (e.g.,
TOOL_EXEC_TIMEOUT_SECS) and use it in the tokio::time::timeout call wrapping
tool.execute(call.arguments.clone()); ensure the constant value is changed so it
does not equal DELEGATE_TIMEOUT_SECS (from delegate.rs) to avoid simultaneous
firing (pick a slightly shorter or longer value and document the choice in a
comment), and update any related log messages to reference the new constant name
rather than the literal 120.
---
Nitpick comments:
In `@src/openhuman/agent/loop_/instructions.rs`:
- Around line 9-10: The docs in instructions.push_str currently only show the
<tool_call> tag but the parser accepts four variants (<tool_call>, <toolcall>,
<tool-call>, <invoke>); update the instructions text to list and show examples
for all accepted variants (add the additional tag forms in the push_str strings)
OR change the parser to only accept the documented <tool_call> form so they
match; locate the instruction strings in instructions.rs (the
instructions.push_str calls) and either expand the example to include
"<toolcall>", "<tool-call>", and "<invoke>" or modify the parser's tag-matching
logic to restrict it to "<tool_call>" only.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b741908e-39de-4fc5-999f-dcb6c4df8d24
📒 Files selected for processing (3)
src/openhuman/agent/loop_/context_guard.rs
src/openhuman/agent/loop_/instructions.rs
src/openhuman/agent/loop_/tool_loop.rs
```diff
 // Execute with a 120-second timeout to prevent hangs.
 match tokio::time::timeout(
     std::time::Duration::from_secs(120),
     tool.execute(call.arguments.clone()),
 )
 .await
 {
     Ok(Ok(r)) => {
         if r.success {
             tracing::debug!(
                 iteration,
                 tool = call.name.as_str(),
                 output_len = r.output.len(),
                 "[agent_loop] tool succeeded"
             );
             scrub_credentials(&r.output)
         } else {
-            format!("Error: {}", r.error.unwrap_or(r.output))
+            let err_msg = r.error.unwrap_or(r.output);
+            tracing::warn!(
+                iteration,
+                tool = call.name.as_str(),
+                "[agent_loop] tool returned error: {err_msg}"
+            );
+            format!("Error: {err_msg}")
         }
     }
-    Err(e) => {
+    Ok(Err(e)) => {
         tracing::error!(
             iteration,
             tool = call.name.as_str(),
             "[agent_loop] tool execution failed: {e}"
         );
         format!("Error executing {}: {e}", call.name)
     }
+    Err(_) => {
+        tracing::error!(
+            iteration,
+            tool = call.name.as_str(),
+            "[agent_loop] tool execution timed out after 120s"
+        );
+        format!("Error: tool '{}' timed out after 120 seconds", call.name)
+    }
```
Timeout value conflicts with delegate tool's internal timeout.
The 120-second timeout matches exactly the DELEGATE_TIMEOUT_SECS in src/openhuman/tools/delegate.rs:12-13. When a delegated agent call hangs, both timeouts fire simultaneously, and the outer timeout may shadow the delegate's more specific error message. Additionally, the hardcoded 120 should be extracted to a named constant for maintainability.
Proposed fix: extract constant and adjust value

```diff
+/// Timeout for individual tool execution to prevent hangs.
+/// Set slightly above the longest internal tool timeout (delegate: 120s).
+const TOOL_EXECUTION_TIMEOUT_SECS: u64 = 125;
+
 // ... in the tool execution block:
-    match tokio::time::timeout(
-        std::time::Duration::from_secs(120),
-        tool.execute(call.arguments.clone()),
-    )
+    match tokio::time::timeout(
+        std::time::Duration::from_secs(TOOL_EXECUTION_TIMEOUT_SECS),
+        tool.execute(call.arguments.clone()),
+    )
```

Note: The timeout does not propagate a cancellation signal to the tool; tools continue running after the outer timeout fires. This is acceptable given the Tool trait lacks a cancellation mechanism, but be aware that timed-out operations may complete in the background.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
/// Timeout for individual tool execution to prevent hangs.
/// Set slightly above the longest internal tool timeout (delegate: 120s).
const TOOL_EXECUTION_TIMEOUT_SECS: u64 = 125;

// Execute with a timeout to prevent hangs.
match tokio::time::timeout(
    std::time::Duration::from_secs(TOOL_EXECUTION_TIMEOUT_SECS),
    tool.execute(call.arguments.clone()),
)
.await
{
    Ok(Ok(r)) => {
        if r.success {
            tracing::debug!(
                iteration,
                tool = call.name.as_str(),
                output_len = r.output.len(),
                "[agent_loop] tool succeeded"
            );
            scrub_credentials(&r.output)
        } else {
            let err_msg = r.error.unwrap_or(r.output);
            tracing::warn!(
                iteration,
                tool = call.name.as_str(),
                "[agent_loop] tool returned error: {err_msg}"
            );
            format!("Error: {err_msg}")
        }
    }
    Ok(Err(e)) => {
        tracing::error!(
            iteration,
            tool = call.name.as_str(),
            "[agent_loop] tool execution failed: {e}"
        );
        format!("Error executing {}: {e}", call.name)
    }
    Err(_) => {
        tracing::error!(
            iteration,
            tool = call.name.as_str(),
            timeout_secs = TOOL_EXECUTION_TIMEOUT_SECS,
            "[agent_loop] tool execution timed out"
        );
        format!(
            "Error: tool '{}' timed out after {} seconds",
            call.name, TOOL_EXECUTION_TIMEOUT_SECS
        )
    }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/agent/loop_/tool_loop.rs` around lines 288 - 329, Extract the
hardcoded 120s into a named constant (e.g., TOOL_EXEC_TIMEOUT_SECS) and use it
in the tokio::time::timeout call wrapping tool.execute(call.arguments.clone());
ensure the constant value is changed so it does not equal DELEGATE_TIMEOUT_SECS
(from delegate.rs) to avoid simultaneous firing (pick a slightly shorter or
longer value and document the choice in a comment), and update any related log
messages to reference the new constant name rather than the literal 120.
…tool call parser
Root cause: ReliableProvider did not implement the chat(ChatRequest)
trait method. The agent loop called provider.chat() which fell through
to the default trait implementation — this used chat_with_history()
which strips native tool support and sends raw tool-role messages
without the required assistant tool_calls, causing the backend Jinja
template to reject the request with "Message has tool role, but there
was no previous assistant message with a tool call!"
Fixes:
- Add chat(ChatRequest) to ReliableProvider with full retry/failover
logic, matching the existing chat_with_system/chat_with_history
implementations. Delegates to inner provider's chat() which properly
converts messages to native OpenAI format with tool_calls.
- Add [TOOL_CALL]/[/TOOL_CALL] bracket format to the tool call parser
(parse.rs) — some models emit this format instead of <tool_call> XML.
- Add parse_bracket_tool_call() for the pseudo-syntax format:
{tool => "name", args => { --key "value" }}
Verified with real staging backend (agentic-v1 model):
- Shell tool calls execute successfully
- File read tool calls return real content
- Knowledge questions return without tool calls
- No Jinja template errors
Closes tinyhumansai#144
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/openhuman/providers/reliable.rs (1)
522-630: Implementation is correct but duplicates retry/failover logic.

The new `chat` method correctly implements the same retry/failover pattern as `chat_with_system`, `chat_with_history`, and `chat_with_tools`. The error classification, backoff computation, key rotation, and failure aggregation are all consistent.

However, this introduces a fourth copy of nearly identical retry/failover logic (~100 lines). Consider extracting a generic helper in a future refactor to reduce maintenance burden:

```rust
async fn with_retry<F, T, Fut>(&self, model: &str, operation: F) -> anyhow::Result<T>
where
    F: Fn(&dyn Provider, &str) -> Fut,
    Fut: std::future::Future<Output = anyhow::Result<T>>,
```

This is not blocking for this PR since it matches the existing code style, but worth considering for maintainability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/providers/reliable.rs` around lines 522 - 630, The chat method duplicates the same retry/failover loop used in chat_with_system, chat_with_history, and chat_with_tools; extract that logic into a shared helper to avoid repetition. Implement an async helper like with_retry(&self, model: &str, operation: F) -> anyhow::Result<T> (where F: Fn(&dyn Provider, &str) -> Fut and Fut: Future<Output = anyhow::Result<T>>) that contains the nested loops over model_chain, providers, attempts, backoff computation (compute_backoff), key rotation (rotate_key), failure aggregation (push_failure/failures), non-retryable/rate-limit handling (is_non_retryable, is_rate_limited, is_non_retryable_rate_limit, is_context_window_exceeded), and tracing; then have chat, chat_with_system, chat_with_history, and chat_with_tools call with_retry passing a closure that invokes provider.chat (or the appropriate provider method) so each method only constructs the provider request and delegates retries to the shared helper.

src/openhuman/agent/loop_/parse.rs (1)
85-120: Bracket syntax is undocumented; models are not instructed to use it.

The system instructions in `instructions.rs` (lines 6-40 per context snippet 3) explicitly mandate `<tool_call>` XML tags with JSON format. There's no mention of the bracket syntax `{tool => "name", args => { --key "value" }}`.

This fallback parser handles non-standard model outputs gracefully, which is good for resilience. However, consider:

- Adding a comment clarifying this is a compatibility fallback for models that don't follow instructions
- The bracket syntax only produces string values (line 103), which works for tools expecting strings but will fail for tools expecting numbers, booleans, or nested objects
📝 Suggested documentation comment

```diff
 /// Parse bracket-style pseudo-syntax emitted by some models:
 /// `{tool => "shell", args => { --command "ls" }}`
 ///
 /// Extracts tool name and converts `--key "value"` args into a JSON object.
+///
+/// NOTE: This is a compatibility fallback for models that emit non-standard syntax
+/// despite being instructed to use `<tool_call>` XML tags with JSON. All argument
+/// values are extracted as strings; tools requiring typed parameters (numbers,
+/// booleans, objects) should use the primary JSON format.
 fn parse_bracket_tool_call(inner: &str) -> Option<ParsedToolCall> {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/loop_/parse.rs` around lines 85 - 120, Add a short doc comment above parse_bracket_tool_call that this is a compatibility fallback for non-conforming model outputs, and change the argument parsing so values are not always treated as strings: when extracting values in parse_bracket_tool_call attempt to interpret each captured value as a JSON scalar/object by trying serde_json::from_str (or a small helper that tries parsing as number/bool/null/object/array) and on parse failure fallback to a String; update the ARG_RE capture logic if needed to allow unquoted primitives or nested braces and ensure ParsedToolCall.arguments stores the parsed serde_json::Value (not always String) so tools that expect numbers/booleans/nested objects receive proper types.
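The typed-value fallback the prompt asks for can be sketched dependency-free. `ArgValue` below is a stand-in for `serde_json::Value`, and the parse order (bool, then integer, then float, then string) is one reasonable choice, not the crate's actual behavior:

```rust
// Sketch of typed-value interpretation with a string fallback, as the
// nitpick suggests. `ArgValue` is a hypothetical stand-in for
// serde_json::Value so the example stays free of dependencies.

#[derive(Debug, PartialEq)]
enum ArgValue {
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
}

fn parse_arg_value(raw: &str) -> ArgValue {
    if let Ok(b) = raw.parse::<bool>() {
        return ArgValue::Bool(b);
    }
    if let Ok(i) = raw.parse::<i64>() {
        return ArgValue::Int(i);
    }
    if let Ok(f) = raw.parse::<f64>() {
        return ArgValue::Float(f);
    }
    // Anything that is not a recognizable scalar stays a string.
    ArgValue::Str(raw.to_string())
}

fn main() {
    assert_eq!(parse_arg_value("true"), ArgValue::Bool(true));
    assert_eq!(parse_arg_value("42"), ArgValue::Int(42));
    assert_eq!(parse_arg_value("0.5"), ArgValue::Float(0.5));
    assert_eq!(parse_arg_value("ls -la"), ArgValue::Str("ls -la".into()));
    println!("ok");
}
```

In the real parser, trying `serde_json::from_str` first (as the prompt recommends) would additionally cover nested objects and arrays before falling back to a string.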
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/openhuman/agent/loop_/parse.rs`:
- Around line 89-104: The loop in parse_bracket_tool_call currently uses the ?
operator on arg_cap.get(1) and arg_cap.get(2) inside the for over
ARG_RE.captures_iter, which causes the whole function to return None on any
malformed argument; change the loop to explicitly check those captures (e.g.,
with if let Some(k) = arg_cap.get(1) { ... } else { continue } and similarly for
get(2)) and skip malformed pairs instead of returning, so that only the bad
argument is ignored and the rest are inserted into the serde_json::Map before
the function returns the ParsedToolCall.
In `@src/openhuman/providers/reliable.rs`:
- Around line 522-630: Add unit tests for the new chat method mirroring
scenarios covered by chat_with_system and chat_with_history (successful provider
response, retries, rate-limit key rotation, non-retryable errors, and
context-window exceeded) and extend MockProvider to implement chat() so tests
can drive provider behavior directly; update logging statements inside chat (the
tracing::info! and tracing::warn! calls around provider recovery, rate-limit key
rotation, retries, exhausted retries, and non-retryable errors) to use the
project's preferred levels (debug or trace) to comply with coding guidelines,
and ensure helper methods referenced (rotate_key, compute_backoff,
is_context_window_exceeded) are exercised by the tests.
---
Nitpick comments:
In `@src/openhuman/agent/loop_/parse.rs`:
- Around line 85-120: Add a short doc comment above parse_bracket_tool_call that
this is a compatibility fallback for non-conforming model outputs, and change
the argument parsing so values are not always treated as strings: when
extracting values in parse_bracket_tool_call attempt to interpret each captured
value as a JSON scalar/object by trying serde_json::from_str (or a small helper
that tries parsing as number/bool/null/object/array) and on parse failure
fallback to a String; update the ARG_RE capture logic if needed to allow
unquoted primitives or nested braces and ensure ParsedToolCall.arguments stores
the parsed serde_json::Value (not always String) so tools that expect
numbers/booleans/nested objects receive proper types.
In `@src/openhuman/providers/reliable.rs`:
- Around line 522-630: The chat method duplicates the same retry/failover loop
used in chat_with_system, chat_with_history, and chat_with_tools; extract that
logic into a shared helper to avoid repetition. Implement an async helper like
with_retry(&self, model: &str, operation: F) -> anyhow::Result<T> (where F:
Fn(&dyn Provider, &str) -> Fut and Fut: Future<Output = anyhow::Result<T>>) that
contains the nested loops over model_chain, providers, attempts, backoff
computation (compute_backoff), key rotation (rotate_key), failure aggregation
(push_failure/failures), non-retryable/rate-limit handling (is_non_retryable,
is_rate_limited, is_non_retryable_rate_limit, is_context_window_exceeded), and
tracing; then have chat, chat_with_system, chat_with_history, and
chat_with_tools call with_retry passing a closure that invokes provider.chat (or
the appropriate provider method) so each method only constructs the provider
request and delegates retries to the shared helper.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 63f5ea0b-5fc6-4481-8558-beae7f7b6d1e
📒 Files selected for processing (2)
src/openhuman/agent/loop_/parse.rs
src/openhuman/providers/reliable.rs
```rust
fn parse_bracket_tool_call(inner: &str) -> Option<ParsedToolCall> {
    static BRACKET_TOOL_RE: LazyLock<Regex> = LazyLock::new(|| {
        Regex::new(r#"(?s)\{?\s*tool\s*=>\s*"([^"]+)"\s*,\s*args\s*=>\s*\{(.*?)\}\s*\}?"#).unwrap()
    });
    static ARG_RE: LazyLock<Regex> =
        LazyLock::new(|| Regex::new(r#"--(\w+)\s+"([^"]*)"#).unwrap());

    let cap = BRACKET_TOOL_RE.captures(inner.trim())?;
    let name = cap.get(1)?.as_str().to_string();
    let args_body = cap.get(2)?.as_str();

    let mut args = serde_json::Map::new();
    for arg_cap in ARG_RE.captures_iter(args_body) {
        let key = arg_cap.get(1)?.as_str().to_string();
        let value = arg_cap.get(2)?.as_str().to_string();
        args.insert(key, serde_json::Value::String(value));
    }
```
Early return on None from captures_iter inside loop can skip remaining arguments.
Lines 101-102 use the `?` operator inside the `for` loop, which causes the entire function to return None if any single argument capture group fails to extract. This means that if one `--key "value"` pair is malformed, all previously parsed arguments are discarded.
This is likely unintentional—typically you'd want to skip the malformed argument and continue parsing the rest.
🔧 Proposed fix to continue on malformed arguments

```diff
     let mut args = serde_json::Map::new();
     for arg_cap in ARG_RE.captures_iter(args_body) {
-        let key = arg_cap.get(1)?.as_str().to_string();
-        let value = arg_cap.get(2)?.as_str().to_string();
-        args.insert(key, serde_json::Value::String(value));
+        if let (Some(key_match), Some(value_match)) = (arg_cap.get(1), arg_cap.get(2)) {
+            let key = key_match.as_str().to_string();
+            let value = value_match.as_str().to_string();
+            args.insert(key, serde_json::Value::String(value));
+        }
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/agent/loop_/parse.rs` around lines 89 - 104, The loop in
parse_bracket_tool_call currently uses the ? operator on arg_cap.get(1) and
arg_cap.get(2) inside the for over ARG_RE.captures_iter, which causes the whole
function to return None on any malformed argument; change the loop to explicitly
check those captures (e.g., with if let Some(k) = arg_cap.get(1) { ... } else {
continue } and similarly for get(2)) and skip malformed pairs instead of
returning, so that only the bad argument is ignored and the rest are inserted
into the serde_json::Map before the function returns the ParsedToolCall.
```rust
async fn chat(
    &self,
    request: ChatRequest<'_>,
    model: &str,
    temperature: f64,
) -> anyhow::Result<ChatResponse> {
    let models = self.model_chain(model);
    let mut failures = Vec::new();

    for current_model in &models {
        for (provider_name, provider) in &self.providers {
            let mut backoff_ms = self.base_backoff_ms;

            for attempt in 0..=self.max_retries {
                let req = ChatRequest {
                    messages: request.messages,
                    tools: request.tools,
                    system_prompt_cache_boundary: request.system_prompt_cache_boundary,
                };
                match provider.chat(req, current_model, temperature).await {
                    Ok(resp) => {
                        if attempt > 0 || *current_model != model {
                            tracing::info!(
                                provider = provider_name,
                                model = *current_model,
                                attempt,
                                original_model = model,
                                "Provider recovered (failover/retry)"
                            );
                        }
                        return Ok(resp);
                    }
                    Err(e) => {
                        let non_retryable_rate_limit = is_non_retryable_rate_limit(&e);
                        let non_retryable = is_non_retryable(&e) || non_retryable_rate_limit;
                        let rate_limited = is_rate_limited(&e);
                        let failure_reason = failure_reason(rate_limited, non_retryable);
                        let error_detail = compact_error_detail(&e);

                        push_failure(
                            &mut failures,
                            provider_name,
                            current_model,
                            attempt + 1,
                            self.max_retries + 1,
                            failure_reason,
                            &error_detail,
                        );

                        if rate_limited && !non_retryable_rate_limit {
                            if let Some(new_key) = self.rotate_key() {
                                tracing::info!(
                                    provider = provider_name,
                                    error = %error_detail,
                                    "Rate limited, rotated API key (key ending ...{})",
                                    &new_key[new_key.len().saturating_sub(4)..]
                                );
                            }
                        }

                        if non_retryable {
                            tracing::warn!(
                                provider = provider_name,
                                model = *current_model,
                                error = %error_detail,
                                "Non-retryable error, moving on"
                            );

                            if is_context_window_exceeded(&e) {
                                anyhow::bail!(
                                    "Request exceeds model context window; retries and fallbacks were skipped. Attempts:\n{}",
                                    failures.join("\n")
                                );
                            }

                            break;
                        }

                        if attempt < self.max_retries {
                            let wait = self.compute_backoff(backoff_ms, &e);
                            tracing::warn!(
                                provider = provider_name,
                                model = *current_model,
                                attempt = attempt + 1,
                                backoff_ms = wait,
                                reason = failure_reason,
                                error = %error_detail,
                                "Provider call failed, retrying"
                            );
                            tokio::time::sleep(Duration::from_millis(wait)).await;
                            backoff_ms = (backoff_ms.saturating_mul(2)).min(10_000);
                        }
                    }
                }
            }

            tracing::warn!(
                provider = provider_name,
                model = *current_model,
                "Exhausted retries, trying next provider/model"
            );
        }
    }

    anyhow::bail!(
        "All providers/models failed. Attempts:\n{}",
        failures.join("\n")
    )
}
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Check if MockProvider or ModelAwareMock implement chat method
rg -n -A5 'impl Provider for (Mock|ModelAware)' src/openhuman/providers/reliable.rs
```

Repository: tinyhumansai/openhuman
Length of output: 491

🏁 Script executed:

```shell
#!/bin/bash
# Check the full impl Provider for MockProvider block
rg -n -A50 'impl Provider for MockProvider' src/openhuman/providers/reliable.rs | head -70
```

Repository: tinyhumansai/openhuman
Length of output: 1969

🏁 Script executed:

```shell
#!/bin/bash
# Search for any chat method tests (not chat_with_system or chat_with_history)
rg -n 'fn test.*chat\(' src/openhuman/providers/reliable.rs | grep -v 'chat_with_system\|chat_with_history'
```

Repository: tinyhumansai/openhuman
Length of output: 48

🏁 Script executed:

```shell
#!/bin/bash
# Check the exact lines 522-630 to inspect logging
sed -n '522,630p' src/openhuman/providers/reliable.rs
```

Repository: tinyhumansai/openhuman
Length of output: 5057
Consider adding tests for the new chat method, and adjust logging levels to match coding guidelines.
The new chat method has no dedicated tests. While its implementation mirrors the well-tested chat_with_system and chat_with_history methods, explicit test coverage would help catch regressions.
The existing MockProvider doesn't implement chat(), which defaults to the trait's fallback. Consider extending MockProvider to support testing chat() directly in a follow-up.
Additionally, the logging in this method uses tracing::info!() and tracing::warn!() levels. Per coding guidelines, new/changed flows should use debug or trace level for Rust files. Consider adjusting log levels accordingly to align with project standards.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/providers/reliable.rs` around lines 522 - 630, Add unit tests
for the new chat method mirroring scenarios covered by chat_with_system and
chat_with_history (successful provider response, retries, rate-limit key
rotation, non-retryable errors, and context-window exceeded) and extend
MockProvider to implement chat() so tests can drive provider behavior directly;
update logging statements inside chat (the tracing::info! and tracing::warn!
calls around provider recovery, rate-limit key rotation, retries, exhausted
retries, and non-retryable errors) to use the project's preferred levels (debug
or trace) to comply with coding guidelines, and ensure helper methods referenced
(rotate_key, compute_backoff, is_context_window_exceeded) are exercised by the
tests.
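The backoff arithmetic visible in the `chat` method above (`backoff_ms = (backoff_ms.saturating_mul(2)).min(10_000)`) can be isolated into a small sketch. `backoff_schedule` is a hypothetical helper for illustration; it omits the error-dependent adjustment the real `compute_backoff` applies:

```rust
// Sketch of the retry-backoff schedule implied by the code above:
// exponential doubling from a base delay, capped at 10 seconds.

fn backoff_schedule(base_ms: u64, attempts: u32) -> Vec<u64> {
    let mut backoff_ms = base_ms;
    let mut waits = Vec::new();
    for _ in 0..attempts {
        waits.push(backoff_ms);
        // Mirrors: backoff_ms = (backoff_ms.saturating_mul(2)).min(10_000);
        backoff_ms = backoff_ms.saturating_mul(2).min(10_000);
    }
    waits
}

fn main() {
    // With a 500ms base, six attempts wait 500, 1s, 2s, 4s, 8s, then cap at 10s.
    assert_eq!(
        backoff_schedule(500, 6),
        vec![500, 1000, 2000, 4000, 8000, 10_000]
    );
    println!("ok");
}
```

A unit test over a pure function like this is one cheap way to lock in the retry timing while the fuller `chat` tests suggested above are added.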
…i#156

The actual fix in tinyhumansai#156 was adding chat(ChatRequest) to ReliableProvider. The prompt changes in instructions.rs and the bracket tool call parser in parse.rs were added during investigation but are not needed; the model uses native tool calls when ReliableProvider properly delegates to the inner provider.
Summary
Problem
- The model could narrate tool use without emitting `<tool_call>` tags, so the loop exits with no execution
- The context guard (`context_guard.rs`) was not wired into `run_tool_call_loop()`, so context could silently overflow

Solution
- `instructions.rs`: replace the single "CRITICAL" line with 6 explicit rules: always use tags, never narrate without calling, use exact names, multiple calls ok, continue after results, only skip tools for knowledge-only answers
- `tool_loop.rs`:
  - `context_guard.check()` before each LLM request; abort on `ContextExhausted`
  - record `UsageInfo` from each `ChatResponse`
  - wrap `tool.execute()` in `tokio::time::timeout` (120s)
  - `tracing::debug/warn/error` at: loop start, LLM response (with token counts), tool call parsing, each tool execution, unknown tools, timeouts, final response
- `context_guard.rs`: make `COMPACTION_TRIGGER_THRESHOLD` `pub(crate)` for log messages

Submission Checklist
- `cargo check` + `cargo fmt` pass; existing context_guard tests unaffected
- Section markers (`── Context guard ──`, `── Approval hook ──`)

Impact
- `[agent_loop]`-prefixed logs enable grep-based diagnosis across sidecar output

Related
Summary by CodeRabbit
Bug Fixes & Improvements
New Features