feat: rewrite prompt templates, add prompt caching and tool-loop trimming#188
Conversation
…ming - Overhaul oracle/cmd_error/error_detect prompts for better AI interaction - Add render_static_core/render_env_block for cache-friendly prompt splitting - Add inject_knowledge_stable for idempotent knowledge injection - Add Anthropic prompt caching (CacheControl) on system messages - Improve Langfuse observability: session-level trace, per-iteration spans - Add trim_tool_loop_messages to prevent unbounded context growth - New guess_command prompt template
|
Thanks for the pull request. A maintainer will review it when available. Please keep the PR focused, explain the why in the description, and make sure local checks pass before requesting review. Contribution guide: https://github.com/AI-Shell-Team/aish/blob/main/CONTRIBUTING.md |
|
This pull request description looks incomplete. Please update the missing sections below before review. Missing items:
|
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughThis PR enhances the LLM interaction loop with stable prompt caching and improved session-level observability. It introduces idempotent knowledge injection, Anthropic cache-control headers, session-scoped Langfuse traces, context-growth limits for tool calls, split prompt rendering to preserve cache boundaries, and system-wide integration of CWD tracking and deterministic system info. ChangesStable Prompt Caching & Langfuse Session Tracing
🎯 4 (Complex) | ⏱️ ~75 minutes Possibly Related PRs
Suggested Labels
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/aish-llm/src/session.rs (1)
1747-1756:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winPotential slice panic when preserved-recent overlaps system prefix
At Line 1756,
messages[system_count..recent_start]can panic whenrecent_start < system_count(possible with many leading system messages and largepreserve_recent).Proposed fix
- let recent_start = messages.len().saturating_sub(preserve_recent); + let recent_start = messages + .len() + .saturating_sub(preserve_recent) + .max(system_count);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/aish-llm/src/session.rs` around lines 1747 - 1756, The slice messages[system_count..recent_start] can panic when recent_start < system_count; fix by clamping the middle slice bounds before taking it — compute a safe middle_start = system_count.min(recent_start) (or check if recent_start <= system_count and set middle to empty) and then build middle from messages[middle_start..recent_start]. Update any logic that relies on middle (the variable named middle and the surrounding token calculations using system_count, recent_start, preserve_recent, estimate_tokens, and max_tokens) so you never create an out-of-bounds slice.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/aish-context/src/manager.rs`:
- Around line 875-933: The new tests for ContextManager (functions
inject_knowledge_stable_first_call_adds,
inject_knowledge_stable_same_content_is_noop,
inject_knowledge_stable_different_content_updates,
inject_knowledge_stable_empty_content_clears,
inject_knowledge_stable_empty_twice_is_noop) are not formatted to project
standard; run rustfmt (cargo fmt --all) to reformat the test block and commit
the resulting changes so the CI cargo fmt --check passes, ensuring any minor
whitespace/indentation differences around the inject_knowledge_stable tests are
fixed.
- Around line 440-444: The retain call currently removes any
MemoryType::Knowledge entry whose content equals old (cached.clone()) regardless
of tag; change the predicate in self.messages.retain to also match the tag
(e.g., compare m.tag to cached.tag) so only the exact tagged knowledge entry is
deleted—locate the retain usage around self.messages.retain, the variables
cached/old, and the MemoryType::Knowledge/m.content checks and add an additional
check for m.tag == cached.tag (or equivalent) to avoid deleting unrelated
knowledge entries.
In `@crates/aish-llm/src/session.rs`:
- Around line 260-277: The failing formatting is due to unformatted Rust code in
the modified block (and other ranges); run rustfmt by executing `cargo fmt
--all` (or `rustfmt`/`cargo +stable fmt`) and commit the changes so the sections
referencing langfuse, langfuse_session_id, langfuse_turn_counter, and the
trace_session call are properly formatted; ensure locks/unwrapping lines around
self.langfuse_session_id.lock().unwrap(), the fetch_add on
self.langfuse_turn_counter, and the async trace_session invocation keep
idiomatic spacing and line breaks after running the formatter.
- Around line 264-272: You're holding the std::sync::MutexGuard from
langfuse_session_id across an .await inside the async path (involving
langfuse.trace_session), which blocks other tasks; fix by not awaiting while the
mutex is held: acquire the lock and check if session id is None, and if so
take/drop the guard (e.g., let needs_init = session_id_guard.is_none();
drop(session_id_guard); if needs_init { let id =
langfuse.trace_session(...).await; let mut session_id_guard =
self.langfuse_session_id.lock().unwrap(); if session_id_guard.is_none() {
*session_id_guard = Some(id); } } — this ensures trace_session is awaited
without holding the mutex and still sets langfuse_session_id (use the symbols
langfuse, trace_session, langfuse_session_id, session_id_guard).
In `@crates/aish-shell/src/ai_handler.rs`:
- Around line 623-629: The code is double-wrapping the recall text with the
long-term-memory tags causing a duplicate closing tag when the prior branch
already appended </long-term-memory>; update the construction of content passed
to self.context_manager.inject_knowledge_stable so it does not add the wrapper
if text already contains the closing tag (i.e. check the string or use the
already-wrapped variable), ensuring inject_knowledge_stable("memory_recall",
&content) receives a single well-formed <long-term-memory>...</long-term-memory>
block; locate the content variable and the call to
self.context_manager.inject_knowledge_stable in ai_handler.rs to implement the
conditional or avoid re-wrapping.
In `@crates/aish-shell/src/app.rs`:
- Around line 1472-1483: The code injects raw shell output and cwd into XML-like
tags (see variables builtin_output, self.state.cwd and the constructed entry
passed to self.ai_handler.add_shell_context), which allows insertion of fake
tags; fix by creating a shared escaping helper (e.g., escape_for_xml_like or
similar) that replaces &, <, and > with safe entities and use it when formatting
both <output> and <cwd> fields (also apply the same helper at the other site
noted around lines 1855-1862) so all serialized shell data is escaped before
calling add_shell_context.
---
Outside diff comments:
In `@crates/aish-llm/src/session.rs`:
- Around line 1747-1756: The slice messages[system_count..recent_start] can
panic when recent_start < system_count; fix by clamping the middle slice bounds
before taking it — compute a safe middle_start = system_count.min(recent_start)
(or check if recent_start <= system_count and set middle to empty) and then
build middle from messages[middle_start..recent_start]. Update any logic that
relies on middle (the variable named middle and the surrounding token
calculations using system_count, recent_start, preserve_recent, estimate_tokens,
and max_tokens) so you never create an out-of-bounds slice.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 08783f97-976c-4ffc-83af-365f5b335a37
📒 Files selected for processing (8)
crates/aish-context/src/manager.rscrates/aish-llm/src/langfuse.rscrates/aish-llm/src/session.rscrates/aish-llm/src/types.rscrates/aish-llm/tests/llm_integration_test.rscrates/aish-prompts/src/manager.rscrates/aish-shell/src/ai_handler.rscrates/aish-shell/src/app.rs
- Fix potential slice panic in trim_messages when recent_start < system_count - Fix double-closing </long-term-memory> tag in recall text truncation - Fix tool name mismatch: bash_exec/python_exec → bash in prompt templates - Run cargo fmt to fix CI formatting failures
Summary
render_static_core/render_env_blockfor cache-friendly prompt splittinginject_knowledge_stablefor idempotent knowledge injection in context managerCacheControl) on system messagestrim_tool_loop_messagesto prevent unbounded context growth during agent loopsguess_commandprompt templateTest plan
cargo testin affected crates)Summary by CodeRabbit
New Features
Improvements