feat: rewrite prompt templates, add prompt caching and tool-loop trimming #188

Merged
jexShain merged 2 commits into AI-Shell-Team:rust from jexShain:rust-prompt-caching
May 15, 2026

Conversation

@jexShain
Collaborator

@jexShain jexShain commented May 15, 2026

Summary

  • Overhaul oracle/cmd_error/error_detect prompts for better AI interaction
  • Add render_static_core/render_env_block for cache-friendly prompt splitting
  • Add inject_knowledge_stable for idempotent knowledge injection in context manager
  • Add Anthropic prompt caching (CacheControl) on system messages
  • Improve Langfuse observability: session-level trace, per-iteration spans
  • Add trim_tool_loop_messages to prevent unbounded context growth during agent loops
  • New guess_command prompt template
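
The cache-friendly split listed above can be pictured roughly as follows. This is a minimal sketch, not the PR's code: the function names come from the summary, but the signatures, the `StaticInfo` and `EnvInfo` structs, and the template text are assumptions made for illustration.

```rust
/// Hypothetical inputs that stay fixed for a whole session.
struct StaticInfo<'a> {
    uname_info: &'a str,
    output_language: &'a str,
}

/// Hypothetical inputs that may change between calls.
struct EnvInfo<'a> {
    cwd: &'a str,
    last_exit_code: i32,
}

/// Renders the part of the system prompt that does not change within a session.
/// Identical output across turns is what allows a provider-side prefix cache to hit.
fn render_static_core(info: &StaticInfo) -> String {
    format!(
        "You are a shell assistant.\nSystem: {}\nReply in: {}\n",
        info.uname_info, info.output_language
    )
}

/// Renders the volatile part. It is sent after the static core, so changes
/// here do not alter the cached prefix.
fn render_env_block(env: &EnvInfo) -> String {
    format!(
        "<env>\ncwd: {}\nlast_exit_code: {}\n</env>\n",
        env.cwd, env.last_exit_code
    )
}
```

The point of the split is ordering: anything that varies per call goes after the stable text, so the stable text can be marked cacheable as a unit.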

Test plan

  • Verify existing unit tests pass (cargo test in affected crates)
  • Verify prompt templates render correctly
  • Verify Langfuse traces group correctly per session
  • Verify tool-loop trimming works for long agent sessions

Summary by CodeRabbit

  • New Features

    • Added prompt caching support for improved LLM response performance
    • Enhanced long-term memory recall with source attribution formatting
  • Improvements

    • Improved working directory tracking during command execution
    • Better handling of whitespace-only AI responses
    • Enhanced command result context injection and formatting
    • Refined skill presentation in AI context
    • More stable AI session management across multiple turns


…ming

- Overhaul oracle/cmd_error/error_detect prompts for better AI interaction
- Add render_static_core/render_env_block for cache-friendly prompt splitting
- Add inject_knowledge_stable for idempotent knowledge injection
- Add Anthropic prompt caching (CacheControl) on system messages
- Improve Langfuse observability: session-level trace, per-iteration spans
- Add trim_tool_loop_messages to prevent unbounded context growth
- New guess_command prompt template
@github-actions
Contributor

Thanks for the pull request. A maintainer will review it when available.

Please keep the PR focused, explain the why in the description, and make sure local checks pass before requesting review.

Contribution guide: https://github.com/AI-Shell-Team/aish/blob/main/CONTRIBUTING.md

@github-actions
Contributor

This pull request description looks incomplete. Please update the missing sections below before review.

Missing items:

  • User-visible Changes
  • Compatibility
  • Testing
  • Change Type
  • Scope

@coderabbitai

coderabbitai Bot commented May 15, 2026

Warning

Rate limit exceeded

@jexShain has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 45 minutes and 32 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 4ac8fcfa-6ef7-45de-b8f4-2b907f6def63

📥 Commits

Reviewing files that changed from the base of the PR and between 890ba24 and f8fa76e.

📒 Files selected for processing (4)
  • crates/aish-llm/src/session.rs
  • crates/aish-prompts/src/manager.rs
  • crates/aish-shell/src/ai_handler.rs
  • crates/aish-shell/src/app.rs
📝 Walkthrough

Walkthrough

This PR enhances the LLM interaction loop with stable prompt caching and improved session-level observability. It introduces idempotent knowledge injection, Anthropic cache-control headers, session-scoped Langfuse traces, context-growth limits for tool calls, split prompt rendering to preserve cache boundaries, and system-wide integration of CWD tracking and deterministic system info.
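
For the "deterministic system info" part, one way a value can be computed once and then reused unchanged is `std::sync::OnceLock`. The sketch below is an assumption-laden illustration: `probe_uname` and `PROBE_CALLS` are hypothetical stand-ins, and the real helper may gather its data differently.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the (hypothetical) probe runs, to show it runs only once.
static PROBE_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for an expensive probe such as running `uname`.
fn probe_uname() -> String {
    PROBE_CALLS.fetch_add(1, Ordering::SeqCst);
    format!("{} {}", std::env::consts::OS, std::env::consts::ARCH)
}

/// Returns the cached system description; the probe runs on first use only.
fn uname_info() -> &'static str {
    static CACHE: OnceLock<String> = OnceLock::new();
    CACHE.get_or_init(probe_uname).as_str()
}
```

Because every call returns the same string, a prompt built from it stays stable for the lifetime of the process.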

Changes

Stable Prompt Caching & Langfuse Session Tracing

| Layer | File(s) | Summary |
|---|---|---|
| Stable Knowledge Injection Foundation | crates/aish-context/src/manager.rs | ContextManager::inject_knowledge_stable detects content changes via knowledge_cache and idempotently updates or removes knowledge messages only when needed; new unit tests cover insertion, no-ops, replacement, and clearing scenarios. |
| Prompt Caching Infrastructure | crates/aish-llm/src/types.rs, crates/aish-llm/tests/llm_integration_test.rs | New CacheControl type and optional cache_control field on ChatMessage enable Anthropic prompt caching; field is omitted from JSON when unset and includes ephemeral markers when set. |
| Langfuse Session-Level Tracing and Span Naming | crates/aish-llm/src/langfuse.rs, crates/aish-llm/src/session.rs | Session-level trace ID created once per session and reused across turns; monotonic per-turn counter used for span naming; span_generation accepts name: &str parameter; system messages receive ephemeral cache-control on Anthropic models. |
| Context Management and Tool-Call Trimming | crates/aish-llm/src/session.rs | New trim_tool_loop_messages helper removes old tool-call rounds while preserving stable prefix and leading system messages; trims both non-streaming and streaming response paths to prevent unbounded message growth. |
| Prompt Template Refactoring and Split Rendering | crates/aish-prompts/src/manager.rs | New render_static_core and render_env_block methods split prompt rendering into session-static and per-call components; embedded prompts rewritten with new variable placeholders ({{uname_info}}, {{basic_env_info}}, etc.) for cache stability. |
| AI Handler: Knowledge Injection and System Info Helpers | crates/aish-shell/src/ai_handler.rs | Splits system prompt into static core and env block for stable caching; integrates stable knowledge injection for memory recall and skills; adds cached system info helpers (uname_info, basic_env_info, output_language) using OnceLock. |
| App-Level Integration: CWD Tracking and Response Handling | crates/aish-shell/src/app.rs | Whitespace-only responses treated as empty; builtin command results immediately injected into context; external command CWD changes tracked and synchronized; SSH prompt variables updated to use new system info and role helpers. |
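
The tool-call trimming described in the walkthrough can be sketched as below. This is an illustration under stated assumptions, not the PR's implementation: the `Msg` and `Role` types, the `has_tool_calls` flag, the `msg` constructor, and the `stable_len` / `keep_rounds` parameters are all hypothetical.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Role {
    System,
    User,
    Assistant,
    Tool,
}

#[derive(Clone, Debug, PartialEq)]
struct Msg {
    role: Role,
    content: String,
    /// True for an assistant message that requested tool calls.
    has_tool_calls: bool,
}

/// Small constructor used by the examples.
fn msg(role: Role, content: &str, has_tool_calls: bool) -> Msg {
    Msg { role, content: content.to_string(), has_tool_calls }
}

/// Drops the oldest tool-call rounds found after the stable prefix, keeping the
/// most recent `keep_rounds` rounds (at least one). A "round" here means an
/// assistant message with tool calls plus everything up to the next such message.
fn trim_tool_loop_messages(messages: &mut Vec<Msg>, stable_len: usize, keep_rounds: usize) {
    // Leading system messages are never trimmed, even if `stable_len` is smaller.
    let system_count = messages.iter().take_while(|m| m.role == Role::System).count();
    let start = stable_len.max(system_count).min(messages.len());

    // Index of the first message of each tool-call round in the trimmable tail.
    let round_starts: Vec<usize> = (start..messages.len())
        .filter(|&i| messages[i].role == Role::Assistant && messages[i].has_tool_calls)
        .collect();

    let keep = keep_rounds.max(1);
    if round_starts.len() <= keep {
        return; // nothing old enough to drop
    }
    let first = round_starts[0];
    let cut_to = round_starts[round_starts.len() - keep];
    messages.drain(first..cut_to);
}
```

Messages between the stable prefix and the first round (for example the current user request) are left in place, so only completed older rounds are removed.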

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly Related PRs

  • AI-Shell-Team/aish#131: Both PRs refactor Langfuse integration in aish-llm; #131 introduces langfuse-ergonomic while this PR adds session-level trace IDs and span naming on top of that foundation.

Suggested Labels

size: XL, experienced-contributor

Poem

🐰 Hop through caches with stable core,
Sessions traced from shore to shore,
Prompts split neat—static, then live,
Context trimmed, tools forgive,
Memory recalled, skills in view,
One cache to bind them all—how true!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the three main changes: prompt template rewrite, prompt caching support, and tool-loop message trimming, all of which are substantial components of this changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/aish-llm/src/session.rs (1)

1747-1756: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Potential slice panic when preserved-recent overlaps system prefix

At Line 1756, messages[system_count..recent_start] can panic when recent_start < system_count (possible with many leading system messages and large preserve_recent).

Proposed fix
-    let recent_start = messages.len().saturating_sub(preserve_recent);
+    let recent_start = messages
+        .len()
+        .saturating_sub(preserve_recent)
+        .max(system_count);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/aish-llm/src/session.rs` around lines 1747 - 1756, The slice
messages[system_count..recent_start] can panic when recent_start < system_count;
fix by clamping the middle slice bounds before taking it — compute a safe
middle_start = system_count.min(recent_start) (or check if recent_start <=
system_count and set middle to empty) and then build middle from
messages[middle_start..recent_start]. Update any logic that relies on middle
(the variable named middle and the surrounding token calculations using
system_count, recent_start, preserve_recent, estimate_tokens, and max_tokens) so
you never create an out-of-bounds slice.
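
To make the failure mode concrete: a Rust range whose start exceeds its end panics when used to index a slice. The sketch below applies the clamp from the proposed fix in a standalone helper; the function name `middle` and its generic signature are assumptions for illustration only.

```rust
/// Returns the slice between the leading system messages and the preserved
/// recent tail. `system_count` and `preserve_recent` mirror the names used in
/// the review comment; this helper itself is hypothetical.
fn middle<T>(messages: &[T], system_count: usize, preserve_recent: usize) -> &[T] {
    // Guard the precondition so the example is safe for any inputs.
    let system_count = system_count.min(messages.len());
    // Clamp: the recent tail may not start inside the system prefix.
    let recent_start = messages
        .len()
        .saturating_sub(preserve_recent)
        .max(system_count);
    &messages[system_count..recent_start]
}
```

With five messages, three of them system messages, and `preserve_recent = 4`, the unclamped start would be 1, and `messages[3..1]` would panic. With the clamp the middle slice is simply empty.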
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/aish-context/src/manager.rs`:
- Around line 875-933: The new tests for ContextManager (functions
inject_knowledge_stable_first_call_adds,
inject_knowledge_stable_same_content_is_noop,
inject_knowledge_stable_different_content_updates,
inject_knowledge_stable_empty_content_clears,
inject_knowledge_stable_empty_twice_is_noop) are not formatted to project
standard; run rustfmt (cargo fmt --all) to reformat the test block and commit
the resulting changes so the CI cargo fmt --check passes, ensuring any minor
whitespace/indentation differences around the inject_knowledge_stable tests are
fixed.
- Around line 440-444: The retain call currently removes any
MemoryType::Knowledge entry whose content equals old (cached.clone()) regardless
of tag; change the predicate in self.messages.retain to also match the tag
(e.g., compare m.tag to cached.tag) so only the exact tagged knowledge entry is
deleted—locate the retain usage around self.messages.retain, the variables
cached/old, and the MemoryType::Knowledge/m.content checks and add an additional
check for m.tag == cached.tag (or equivalent) to avoid deleting unrelated
knowledge entries.
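
A tag-aware `retain` predicate of the kind requested above might look like this. The `Entry` struct, its field names, and `remove_cached_knowledge` are assumptions for illustration; only `MemoryType::Knowledge` and the idea of matching on tag as well as content come from the comment.

```rust
#[derive(Debug, PartialEq)]
enum MemoryType {
    Knowledge,
    Conversation,
}

/// Hypothetical shape of a stored context entry.
#[derive(Debug)]
struct Entry {
    memory_type: MemoryType,
    tag: String,
    content: String,
}

/// Removes only the knowledge entry that matches both the tag and the cached
/// content, leaving other knowledge entries with identical text untouched.
fn remove_cached_knowledge(messages: &mut Vec<Entry>, tag: &str, old: &str) {
    messages.retain(|m| {
        !(m.memory_type == MemoryType::Knowledge && m.tag == tag && m.content == old)
    });
}
```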

In `@crates/aish-llm/src/session.rs`:
- Around line 260-277: The failing formatting is due to unformatted Rust code in
the modified block (and other ranges); run rustfmt by executing `cargo fmt
--all` (or `rustfmt`/`cargo +stable fmt`) and commit the changes so the sections
referencing langfuse, langfuse_session_id, langfuse_turn_counter, and the
trace_session call are properly formatted; ensure locks/unwrapping lines around
self.langfuse_session_id.lock().unwrap(), the fetch_add on
self.langfuse_turn_counter, and the async trace_session invocation keep
idiomatic spacing and line breaks after running the formatter.
- Around line 264-272: You're holding the std::sync::MutexGuard from
langfuse_session_id across an .await inside the async path (involving
langfuse.trace_session), which blocks other tasks; fix by not awaiting while the
mutex is held: acquire the lock and check if session id is None, and if so
take/drop the guard (e.g., let needs_init = session_id_guard.is_none();
drop(session_id_guard); if needs_init { let id =
langfuse.trace_session(...).await; let mut session_id_guard =
self.langfuse_session_id.lock().unwrap(); if session_id_guard.is_none() {
*session_id_guard = Some(id); } } — this ensures trace_session is awaited
without holding the mutex and still sets langfuse_session_id (use the symbols
langfuse, trace_session, langfuse_session_id, session_id_guard).

In `@crates/aish-shell/src/ai_handler.rs`:
- Around line 623-629: The code is double-wrapping the recall text with the
long-term-memory tags causing a duplicate closing tag when the prior branch
already appended </long-term-memory>; update the construction of content passed
to self.context_manager.inject_knowledge_stable so it does not add the wrapper
if text already contains the closing tag (i.e. check the string or use the
already-wrapped variable), ensuring inject_knowledge_stable("memory_recall",
&content) receives a single well-formed <long-term-memory>...</long-term-memory>
block; locate the content variable and the call to
self.context_manager.inject_knowledge_stable in ai_handler.rs to implement the
conditional or avoid re-wrapping.
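
One way to guarantee a single well-formed block is to make the wrapping step idempotent. The helper below is a hypothetical sketch (`wrap_long_term_memory` is not a name from the PR); it strips a stray opening or closing tag before wrapping, so already-wrapped or half-wrapped text ends up with exactly one pair of tags.

```rust
const OPEN: &str = "<long-term-memory>";
const CLOSE: &str = "</long-term-memory>";

/// Wraps `text` in long-term-memory tags exactly once.
fn wrap_long_term_memory(text: &str) -> String {
    let trimmed = text.trim();
    // Remove an existing opening and/or closing tag if present.
    let inner = trimmed.strip_prefix(OPEN).unwrap_or(trimmed);
    let inner = inner.strip_suffix(CLOSE).unwrap_or(inner);
    format!("{}\n{}\n{}", OPEN, inner.trim(), CLOSE)
}
```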

In `@crates/aish-shell/src/app.rs`:
- Around line 1472-1483: The code injects raw shell output and cwd into XML-like
tags (see variables builtin_output, self.state.cwd and the constructed entry
passed to self.ai_handler.add_shell_context), which allows insertion of fake
tags; fix by creating a shared escaping helper (e.g., escape_for_xml_like or
similar) that replaces &, <, and > with safe entities and use it when formatting
both <output> and <cwd> fields (also apply the same helper at the other site
noted around lines 1855-1862) so all serialized shell data is escaped before
calling add_shell_context.
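
A shared escaping helper along the lines suggested above could be as small as the following. The name `escape_for_xml_like` is taken from the comment's own example; `format_shell_entry` and the exact entry layout are assumptions for illustration.

```rust
/// Replaces the characters that could open or close a tag with entities.
fn escape_for_xml_like(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    for ch in raw.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            _ => out.push(ch),
        }
    }
    out
}

/// Hypothetical formatter: escapes both fields before placing them in tags.
fn format_shell_entry(cwd: &str, output: &str) -> String {
    format!(
        "<cwd>{}</cwd>\n<output>{}</output>",
        escape_for_xml_like(cwd),
        escape_for_xml_like(output)
    )
}
```

Escaping `&` as well keeps the transformation unambiguous, since literal text such as `&lt;` in command output would otherwise be indistinguishable from an escaped `<`.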

---

Outside diff comments:
In `@crates/aish-llm/src/session.rs`:
- Around line 1747-1756: The slice messages[system_count..recent_start] can
panic when recent_start < system_count; fix by clamping the middle slice bounds
before taking it — compute a safe middle_start = system_count.min(recent_start)
(or check if recent_start <= system_count and set middle to empty) and then
build middle from messages[middle_start..recent_start]. Update any logic that
relies on middle (the variable named middle and the surrounding token
calculations using system_count, recent_start, preserve_recent, estimate_tokens,
and max_tokens) so you never create an out-of-bounds slice.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 08783f97-976c-4ffc-83af-365f5b335a37

📥 Commits

Reviewing files that changed from the base of the PR and between 7567317 and 890ba24.

📒 Files selected for processing (8)
  • crates/aish-context/src/manager.rs
  • crates/aish-llm/src/langfuse.rs
  • crates/aish-llm/src/session.rs
  • crates/aish-llm/src/types.rs
  • crates/aish-llm/tests/llm_integration_test.rs
  • crates/aish-prompts/src/manager.rs
  • crates/aish-shell/src/ai_handler.rs
  • crates/aish-shell/src/app.rs

Comment thread crates/aish-context/src/manager.rs
Comment thread crates/aish-context/src/manager.rs
Comment thread crates/aish-llm/src/session.rs
Comment thread crates/aish-llm/src/session.rs
Comment thread crates/aish-prompts/src/manager.rs
Comment thread crates/aish-shell/src/ai_handler.rs Outdated
Comment thread crates/aish-shell/src/app.rs
- Fix potential slice panic in trim_messages when recent_start < system_count
- Fix double-closing </long-term-memory> tag in recall text truncation
- Fix tool name mismatch: bash_exec/python_exec → bash in prompt templates
- Run cargo fmt to fix CI formatting failures
@jexShain jexShain merged commit df4808b into AI-Shell-Team:rust May 15, 2026
6 of 7 checks passed