
[Refactor] improve context compression budgeting #874

Open
dingyi222666 wants to merge 8 commits into v1-dev from feat/context-compression-refactor

Conversation

@dingyi222666
Member

@dingyi222666 dingyi222666 commented May 23, 2026

This PR refactors ChatLuna context compression to better account for chat history, tool messages, and agent scratchpad tokens.

New Features

  • Add smarter context compression flow for chat history token budgeting.
  • Support agent scratchpad compression in the legacy executor path.
  • Use actual usage_metadata.input_tokens as the scratchpad compression trigger baseline.

Bug fixes

  • Count AI and tool messages in the same round when deriving baseline token usage.
  • Adjust scratchpad compression threshold so compression triggers closer to the configured token budget.
  • Clean up stale token counter and unused prompt imports found by lint.

Other Changes

  • Streamline context compression formatting and type annotations.
  • Apply review-feedback formatting and warning text cleanup.
  • Validation: yarn lint-fix completed with no errors. Existing max-len warnings remain in read_chat_message.ts.

…rt token counting

- Rewrite infinite_context.ts: class -> function, structured output (summary + recent messages)
- Rewrite infinite_context_chain.ts: class -> simple compressChunk function
- Add scratchpad compression in agent loop (legacy-executor.ts)
- Extract shared countMessageTokens/countMessagesTokens to utils/count_tokens.ts
  with usage_metadata baseline optimization
- Update chat_history.ts and model.ts cropMessages to use baseline optimization
- Fix multimodal warning: 'chatluna-multimodal-service' -> 'multimodal-service'
@coderabbitai
Contributor

coderabbitai Bot commented May 23, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ee85ccbb-1e33-4552-8c96-50fa8299c117

📥 Commits

Reviewing files that changed from the base of the PR and between 9256c50 and c8b84c1.

📒 Files selected for processing (6)
  • packages/core/src/llm-core/agent/legacy-executor.ts
  • packages/core/src/llm-core/chat/app.ts
  • packages/core/src/llm-core/chat/infinite_context.ts
  • packages/core/src/llm-core/platform/model.ts
  • packages/core/src/llm-core/utils/count_tokens.ts
  • packages/core/src/middlewares/chat/read_chat_message.ts

Walkthrough

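This PR refactors ChatLuna's context compression pipeline: it replaces the class-based InfiniteContextManager with a functional compressIfNeeded API, adds token counting utilities with baseline-driven truncation optimization, integrates scratchpad compression into agent execution, and updates the multimodal plugin name.

Changes

Infinite context and scratchpad compression refactor

Layer / File(s) Summary
Token counting utilities extracted and re-exported
packages/core/src/llm-core/utils/count_tokens.ts, packages/core/src/llm-core/prompt/system_prompts.ts
Extracts countMessageTokens and countMessagesTokens from the system prompts module into a standalone utility module and re-exports them. Adds a single-message token counting function (with base64 image removal) and a message-list counting function (with baseline optimization).
Baseline optimization in token truncation
packages/core/src/llm-core/platform/model.ts
cropMessages gains a baseline search strategy: it locates the known input_tokens of the last AI message and uses it to speed up token accumulation when rounds are added in bulk, avoiding repeated counting. When no baseline is found, it falls back to accumulating round by round.
Compression chain moved from class to function
packages/core/src/llm-core/chain/infinite_context_chain.ts
Refactors the ChatLunaInfiniteContextChain class into the pure function compressChunk, adds the CompressChunkResult interface, removes the class wrapper, and enhances COMPRESS_PROMPT to support tool calls and result summaries.
Infinite context manager refactored into a function API
packages/core/src/llm-core/chat/infinite_context.ts
Refactors the InfiniteContextManager class into the compressIfNeeded function and adds the CompressContextOptions and CompressContextResult interfaces. Implements splitting messages into rounds (keeping at most 3 rounds), placeholder compression of stale tool results, transcript formatting (keeping tool_calls and truncating arguments), and assembly of the compressed result.
ChatInterface integrates the new compression API
packages/core/src/llm-core/chat/app.ts
Removes the _infiniteContextManager member and the _ensureInfiniteContextManager() factory method and calls compressIfNeeded directly. Compresses context in processChat before invoking the chain, applies the new API in compressContext with force and threshold parameters, and adds separate error handling for compression.
Scratchpad compression implementation and integration
packages/core/src/llm-core/agent/legacy-executor.ts
After each round of tool calls, runAgent adds compression trigger logic: when the scratchpad has more than 6 entries and input_tokens >= 85% of the model context, it calls compressScratchpad, which combines the early entries with chat_history into a transcript, calls compressChunk to obtain a summary, replaces chat_history with a HumanMessage, and keeps the 3 most recent entries.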
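
The scratchpad trigger described in the legacy-executor.ts row can be sketched roughly as follows. This is only an illustration under stated assumptions, not the PR's code: `AgentStep`, `shouldCompressScratchpad`, and `splitScratchpad` are hypothetical names, and only the constants (more than 6 entries, 85% of the model context, keep the last 3 entries) come from the summary above.

```typescript
// Hypothetical, simplified step shape; the real executor types are not shown here.
interface AgentStep {
    tool: string
    observation: string
}

// Values taken from the walkthrough summary.
const MIN_STEPS = 6
const THRESHOLD_RATIO = 0.85
const KEEP_RECENT = 3

// Decide whether the scratchpad should be compressed after a tool-call round.
function shouldCompressScratchpad(
    steps: AgentStep[],
    inputTokens: number | undefined,
    maxContextTokens: number
): boolean {
    // Assumption: without usage metadata there is no baseline, so do not trigger.
    if (inputTokens == null) return false
    return (
        steps.length > MIN_STEPS &&
        inputTokens >= maxContextTokens * THRESHOLD_RATIO
    )
}

// Split the scratchpad into early steps (to summarize) and recent steps (to keep).
function splitScratchpad(steps: AgentStep[]): {
    early: AgentStep[]
    recent: AgentStep[]
} {
    const cut = Math.max(0, steps.length - KEEP_RECENT)
    return { early: steps.slice(0, cut), recent: steps.slice(cut) }
}
```

In this sketch the early steps would be formatted into a transcript together with chat_history and summarized, while the recent steps stay in the scratchpad unchanged.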

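Multimodal plugin name update

Layer / File(s) Summary
Plugin name update in middleware
packages/core/src/middlewares/chat/read_chat_message.ts
In the image and audio processing interceptors, updates the multimodal service plugin name from chatluna-multimodal-service to multimodal-service, including the detection, warning, and GIF handling text.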

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

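  • ChatLunaLab/chatluna#820: The rewrite of the infinite context compression flow and the placeholder compression of stale tool results overlap heavily between the two PRs.
  • ChatLunaLab/chatluna#845: Both PRs modify the round/token truncation logic in cropMessages (this PR adds a baseline-driven path; the target PR changes round boundary detection).
  • ChatLunaLab/chatluna#572: This PR updates the plugin name in the GIF/image handling middleware in read_chat_message.ts, which relates to the target PR's refactoring/extension of that middleware.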

Poem

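🐰 The dream of compression turns into pure functions,
Baseline optimization sheds weight without toil.
The scratchpad, when pressed, knows to trim itself early,
ChatLuna's conversations breathe easily from now on,
The multimodal plugin name is freshly updated,
And agent execution runs smoothly all the way! ✨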

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

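Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 63.16% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The PR title accurately summarizes the core change: improving the budgeting mechanism for context compression, consistent with the PR's main goals of refactoring the context compression flow and optimizing token budget calculation.
Description check | ✅ Passed | The PR description details the new features (smarter compression flow, agent scratchpad compression, use of usage_metadata), bug fixes (baseline token calculation, compression threshold adjustment), and other improvements; it is relevant to the code changes and sufficiently informative.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.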


Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the infinite context management system, moving from a class-based manager to functional utilities like compressIfNeeded and compressChunk. It introduces scratchpad compression for the agent executor to handle long tool-call loops and optimizes token counting by leveraging usage_metadata from previous AI responses as a baseline. Feedback focuses on ensuring that AbortSignal is correctly propagated through the new asynchronous compression paths to prevent unnecessary background processing and addressing a logic error in the token counting optimization that skips valid baseline messages. Additionally, it was noted that compression thresholds should be unified across the codebase.

Comment thread packages/core/src/llm-core/utils/count_tokens.ts Outdated
Comment thread packages/core/src/llm-core/agent/legacy-executor.ts Outdated
Comment thread packages/core/src/llm-core/agent/legacy-executor.ts
Comment thread packages/core/src/llm-core/agent/legacy-executor.ts Outdated
Comment thread packages/core/src/llm-core/chat/infinite_context.ts
Comment thread packages/core/src/llm-core/chat/infinite_context.ts Outdated
Comment thread packages/core/src/llm-core/chat/app.ts
Comment thread packages/core/src/llm-core/agent/legacy-executor.ts Outdated
…n trigger

Instead of estimating tokens by formatting scratchpad text, use the real
input_tokens from the AI message's usage_metadata returned by the LLM call.
This is accurate since it's what the model actually consumed.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9256c50b33


Comment thread packages/core/src/llm-core/chat/infinite_context.ts Outdated
Comment thread packages/core/src/llm-core/prompt/chat_history.ts Outdated
Comment thread packages/core/src/llm-core/prompt/chat_history.ts Outdated
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/core/src/llm-core/agent/legacy-executor.ts`:
- Around line 381-395: The compression condition uses only scratchpadTokens
(from formatScratchpadForCount and tokenCounter) against maxTokenLimit * 0.84,
but the actual prompt also includes input['chat_history']; update the check in
legacy-executor.ts to include chat history tokens: format and count
input['chat_history'] (using the same tokenCounter), then compute either
totalTokens = scratchpadTokens + chatHistoryTokens and compare totalTokens to
maxTokenLimit * 0.84, or compute remainingBudget = maxTokenLimit -
chatHistoryTokens and compare scratchpadTokens to remainingBudget * 0.84;
trigger compression when the combined/remaining-based threshold is exceeded
(adjust the existing if that currently tests scratchpadTokens).

In `@packages/core/src/llm-core/chat/infinite_context.ts`:
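- Around line 111-123: The current implementation uses splitMessages() to keep
a fixed number of recent rounds (1~3), and compressIfNeeded() only records
outputTokens without re-checking threshold/maxTokenLimit, so if the retained
recent rounds are long the result can still exceed the budget and fail on the
next call. Instead, backfill recent rounds from newest to oldest according to a
token budget: in splitMessages or compressIfNeeded, introduce a budget
calculation based on threshold/maxTokenLimit (using threshold and
maxTokenLimit, inputTokens, outputTokens), and accumulate the most recent
complete rounds one by one until the accumulated tokens reach the budget limit;
after building resultMessages, recompute and set outputTokens, the compressed
flag, remainingMessageCount, and the messages field to reflect the actual
compression result (referenced symbols: splitMessages, compressIfNeeded,
resultMessages, outputTokens, threshold, maxTokenLimit,
remainingMessageCount).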

In `@packages/core/src/llm-core/platform/model.ts`:
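- Around line 835-891: Adding baselineTokens to totalTokens in one step (when
baselineIdx/baselineRoundIdx is used) underestimates the cost of the AI reply
and tool messages that follow the baseline within the same round. Fix: do not
use baselineTokens as the cost of the whole 0..baselineRoundIdx range; in the
branch where i <= baselineRoundIdx and selectedRounds is empty, call
countRoundTokens(conversationRounds[j]) round by round to accumulate the real
token count of each round in 0..baselineRoundIdx, use that to determine
exceedsLimit/truncated, and then unshift those rounds into selectedRounds one
at a time (rather than adding baselineTokens directly and unshifting
baselineRoundIdx repeatedly in a single step). Referenced symbols: baselineIdx,
baselineRoundIdx, baselineTokens, conversationRounds, selectedRounds,
totalTokens, countRoundTokens, maxTokenLimit.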

In `@packages/core/src/llm-core/prompt/chat_history.ts`:
- Around line 72-137: The baseline calculation underestimates historical tokens
because findBaseline/baseline.tokens is treated as the full cost up to baseline
while runtime.usedTokens has already subtracted the current request
(input/scratchpad) and the baseline AI reply token count is not added back; this
causes selectedRounds to include too much history when chatHistory ends with an
AI message. Fix by computing the true baseline cost as baseline.tokens plus the
token count of the baseline AI message if that message is not already included
in runtime.usedTokens (i.e., when current request tokens were removed), or
alternatively recompute the baseline segment by calling countMessagesTokens on
rounds[0..baselineRoundIdx] instead of trusting baseline.tokens; update the
logic in the loop that unwraps the bulkRounds (the block using baselineRoundIdx,
baseline.tokens, runtime.usedTokens, selectedRounds, availableLimit and
countMessagesTokens) and likewise apply the same correction in the analogous
code at lines 198-217 so usedTokens correctly reflects all messages up to and
including the baseline AI message before comparing to availableLimit.

In `@packages/core/src/middlewares/chat/read_chat_message.ts`:
- Line 252: The warning strings reference the old plugin name
"chatluna-multimodal-service" while the code checks for
ctx.chatluna.getPlugin('multimodal-service'); update all warning/error messages
in this file that mention "chatluna-multimodal-service" (the messages near the
checks around ctx.chatluna.getPlugin('multimodal-service')) to use
"multimodal-service" so the logged/printed plugin name matches the actual plugin
id the code looks up (apply to the other similar messages in the same file).
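
The budget-based backfill suggested above for infinite_context.ts could look roughly like this. It is a minimal sketch under assumptions, not the PR's implementation: `Msg`, `splitIntoRounds`, and `selectRecentRounds` are hypothetical names, a round is assumed to start at each human message, and `countTokens` stands in for whatever token counter the caller has.

```typescript
// Hypothetical minimal message shape; the real code uses LangChain message classes.
interface Msg {
    role: 'human' | 'ai' | 'tool'
    content: string
}

// Group messages into rounds. Assumption: each human message starts a new round.
function splitIntoRounds(messages: Msg[]): Msg[][] {
    const rounds: Msg[][] = []
    for (const m of messages) {
        if (m.role === 'human' || rounds.length === 0) rounds.push([])
        rounds[rounds.length - 1].push(m)
    }
    return rounds
}

// Walk rounds from newest to oldest and keep whole rounds while they fit the budget.
// Everything older than the first kept round is handed over for summarization.
function selectRecentRounds(
    messages: Msg[],
    budget: number,
    countTokens: (m: Msg) => number
): { toCompress: Msg[]; recent: Msg[] } {
    const rounds = splitIntoRounds(messages)
    let used = 0
    let firstKept = rounds.length
    for (let i = rounds.length - 1; i >= 0; i--) {
        const cost = rounds[i].reduce((sum, m) => sum + countTokens(m), 0)
        if (used + cost > budget) break
        used += cost
        firstKept = i
    }
    const empty: Msg[] = []
    return {
        toCompress: empty.concat(...rounds.slice(0, firstKept)),
        recent: empty.concat(...rounds.slice(firstKept))
    }
}
```

Unlike a fixed "keep the last 1~3 rounds" rule, this variant can keep zero rounds when even the newest round exceeds the budget, so a caller would still need to decide how to handle that case.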

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b2d3d4ce-1642-4765-a201-18f725a050f0

📥 Commits

Reviewing files that changed from the base of the PR and between f5422c9 and 9256c50.

📒 Files selected for processing (9)
  • packages/core/src/llm-core/agent/legacy-executor.ts
  • packages/core/src/llm-core/chain/infinite_context_chain.ts
  • packages/core/src/llm-core/chat/app.ts
  • packages/core/src/llm-core/chat/infinite_context.ts
  • packages/core/src/llm-core/platform/model.ts
  • packages/core/src/llm-core/prompt/chat_history.ts
  • packages/core/src/llm-core/prompt/system_prompts.ts
  • packages/core/src/llm-core/utils/count_tokens.ts
  • packages/core/src/middlewares/chat/read_chat_message.ts

Comment thread packages/core/src/llm-core/agent/legacy-executor.ts Outdated
Comment thread packages/core/src/llm-core/chat/infinite_context.ts Outdated
Comment thread packages/core/src/llm-core/platform/model.ts Outdated
Comment thread packages/core/src/llm-core/prompt/chat_history.ts Outdated
Comment thread packages/core/src/middlewares/chat/read_chat_message.ts
- count_tokens.ts: allow baseline when it's the last message (baselineIdx >= 0)
- Pass AbortSignal through compression chain (app.ts -> infinite_context -> compressChunk, legacy-executor -> compressScratchpad -> compressChunk)
- Unify compression threshold to 0.85
- Fix compacted messages detection: use reference equality (compacted !== messages) instead of length comparison
- Revert chat_history.ts baseline optimization (unreliable in prompt pipeline context where system tokens differ between calls)
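
The reference-equality fix in the list above is easiest to see with a small sketch. The assumption here is that compaction replaces stale tool results with a placeholder of the same message count, so a length comparison cannot tell whether anything changed; `Msg`, `PLACEHOLDER`, and `compactStaleToolResults` are hypothetical names, not the PR's actual identifiers.

```typescript
// Hypothetical minimal message shape for illustration.
interface Msg {
    role: 'human' | 'ai' | 'tool'
    content: string
}

// Hypothetical placeholder text for a stale tool result.
const PLACEHOLDER = '[tool result omitted]'

// Replace tool results older than the last `keepRecent` messages with a placeholder.
// Returns the original array (same reference) when nothing needed replacing.
function compactStaleToolResults(messages: Msg[], keepRecent: number): Msg[] {
    const cutoff = messages.length - keepRecent
    let changed = false
    const next = messages.map((m, i) => {
        if (i < cutoff && m.role === 'tool' && m.content !== PLACEHOLDER) {
            changed = true
            return { role: m.role, content: PLACEHOLDER }
        }
        return m
    })
    return changed ? next : messages
}

// Detect compaction by reference, since the length is the same either way.
function wasCompacted(original: Msg[], compacted: Msg[]): boolean {
    return compacted !== original
}
```

Because the placeholder swap keeps the array length constant, `compacted.length !== messages.length` would always be false here, which is the kind of miss the reference check avoids.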
…text

- cropMessages baseline now counts the AI message itself and subsequent
  tool messages in the same round (usage_metadata.input_tokens only covers
  messages before the AI response)
- Update warning messages to show both plugin names for clarity
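
The baseline rule from the first item above (usage_metadata.input_tokens only covers messages before the AI response) can be sketched as follows. This is an illustrative approximation, not the cropMessages code: `Msg`, its `inputTokens` field, and `countWithBaseline` are hypothetical stand-ins, and `countTokens` is whatever per-message counter the caller supplies.

```typescript
// Hypothetical minimal message shape. `inputTokens` stands in for
// usage_metadata.input_tokens on an AI message and is optional.
interface Msg {
    role: 'human' | 'ai' | 'tool'
    content: string
    inputTokens?: number
}

// Estimate total tokens using the last AI message that reports input tokens.
function countWithBaseline(
    messages: Msg[],
    countTokens: (m: Msg) => number
): number {
    let baselineIdx = -1
    let baselineTokens = 0
    // Search backwards; the baseline may be the very last message.
    for (let i = messages.length - 1; i >= 0; i--) {
        const m = messages[i]
        if (m.role === 'ai' && m.inputTokens != null) {
            baselineIdx = i
            baselineTokens = m.inputTokens
            break
        }
    }
    // The baseline covers only what was sent before the AI response, so the
    // AI message itself and everything after it are counted explicitly.
    // With no baseline, every message is counted from the start.
    const start = Math.max(baselineIdx, 0)
    let total = baselineTokens
    for (let i = start; i < messages.length; i++) {
        total += countTokens(messages[i])
    }
    return total
}
```

Note that a reported input token count would typically also include anything else sent in that request (for example a system prompt), so in this sketch the result is a budget estimate for the whole prompt rather than for the listed messages alone.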
@dingyi222666 dingyi222666 changed the title from "[Refactor] streamline context compression" to "[Refactor] improve context compression budgeting" May 23, 2026