[Refactor] Simplify infinite context compression system#774
Conversation
- Simplify compression prompt and logic to focus on conversation summarization - Replace chunked compression with single-pass summary of entire conversation - Add CompressContextResult interface to track compression metrics (input tokens, output tokens, reduction percentage) - Add manual compression command via 'chatluna.chat.compress' with force flag - Update room and chat commands to display compression metrics in messages - Update i18n messages for both English and Chinese to show token reduction stats - Improve compression flow to handle edge cases and provide better feedback - Extract formatTranscript helper function for cleaner code
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (2)
📒 Files selected for processing (7)
总体概述本次变更引入了新的聊天压缩命令和改进的压缩流程。新增 文件变更
序列图sequenceDiagram
participant User as 用户
participant ChatCmd as 聊天命令
participant RoomCmd as 房间命令
participant Middleware as 压缩中间件
participant Service as 聊天服务
participant Manager as 无限上下文管理器
participant Chain as 压缩链
participant LLM as 语言模型
User->>ChatCmd: 执行 chatluna.chat.compress -r room1
ChatCmd->>Middleware: 触发 compress_room 中间件
Middleware->>Service: compressContext(room, force=true)
Service->>Manager: compressIfNeeded(wrapper, force=true)
Manager->>Manager: 统计消息令牌数
alt 需要压缩
Manager->>Chain: 创建压缩任务
Chain->>LLM: 调用 LLM 压缩对话
LLM-->>Chain: 返回压缩摘要和用量
Chain-->>Manager: 返回 ChatLunaInfiniteContextChunkResult
Manager->>Manager: 更新聊天历史记录
Manager-->>Service: 返回 CompressContextResult
else 跳过压缩
Manager-->>Service: 返回 CompressContextResult (compressed=false)
end
Service-->>Middleware: 返回压缩结果
Middleware-->>User: 显示压缩统计信息
代码审查工作量估计🎯 3 (中等) | ⏱️ ~25 分钟 相关 PR
诗歌
✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the infinite context compression system to be more streamlined and user-friendly. The core change involves simplifying the underlying compression mechanism from a multi-chunk approach to a single-pass summarization of the entire conversation. This not only reduces complexity but also enhances the clarity of the compression process by providing detailed token reduction statistics to the user, improving the overall feedback and control over conversation history management. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request refactors the chat history compression mechanism, introducing a new chatluna.chat.compress command and updating the existing chatluna.room.compress command to support forced compression and detailed output. The core compression logic in InfiniteContextManager has been simplified, moving from a complex chunking and merging strategy to a direct summarization of the entire conversation transcript by the LLM, with the return type now including detailed token usage metadata. Localization messages have been updated to display these new compression statistics. However, the formatTranscript function is vulnerable to prompt injection due to unsanitized user input, which could allow attackers to manipulate the summarization LLM. Additionally, the compressIfNeeded function contains duplicated logic for handling skipped compression, which could be improved with a helper function for better maintainability.
| function formatTranscript(messages: BaseMessage[]) { | ||
| return messages | ||
| .map((message) => { | ||
| const role = message.getType().toUpperCase() | ||
| const name = message.name ? ` (${message.name})` : '' | ||
| const content = getMessageContent(message.content).trim() | ||
| return `[${role}${name}]\n${content || '(empty)'}` | ||
| }) | ||
| .join('\n\n---\n\n') | ||
| } |
There was a problem hiding this comment.
The formatTranscript function constructs a conversation transcript by concatenating message roles and contents using a simple string format with \n\n---\n\n as a separator. This transcript is then passed to an LLM for summarization. Because user-provided message content is not sanitized or escaped, an attacker can inject the separator and fake role headers (e.g., [SYSTEM]) into their messages. This allows them to spoof the conversation history seen by the summarization LLM, potentially leading to a manipulated summary. Since this summary is added back to the chat history and used as context for future AI responses, it can be used to influence the bot's behavior or misrepresent the conversation state.
function formatTranscript(messages: BaseMessage[]) {
return messages
.map((message) => {
const role = message.getType().toUpperCase()
const name = message.name ? ` (${message.name})` : ''
const content = getMessageContent(message.content).trim()
// Escape the separator to prevent transcript spoofing
const escapedContent = content.replace(/\n\n---\n\n/g, '\n\n - -- \n\n')
return `[${role}${name}]\n${escapedContent || '(empty)'}`
})
.join('\n\n---\n\n')
}| if (!model) { | ||
| return | ||
| return { | ||
| inputTokens: 0, | ||
| outputTokens: 0, | ||
| reducedTokens: 0, | ||
| reducedPercent: 0, | ||
| compressed: false | ||
| } | ||
| } | ||
|
|
||
| const messages = await this.options.chatHistory.getMessages() | ||
|
|
||
| if (messages.length === 0) { | ||
| return | ||
| } | ||
|
|
||
| const invocation = model.invocationParams() | ||
| const maxTokenLimit = | ||
| invocation.maxTokenLimit && invocation.maxTokenLimit > 0 | ||
| ? invocation.maxTokenLimit | ||
| : model.getModelMaxContextSize() | ||
|
|
||
| if (!maxTokenLimit || maxTokenLimit <= 0) { | ||
| return | ||
| } | ||
|
|
||
| const presetMessages = Array.isArray( | ||
| this.options.preset?.value?.messages | ||
| ) | ||
| ? (this.options.preset?.value.messages as BaseMessage[]) | ||
| : [] | ||
|
|
||
| const presetTokens = await this._calculateMessageTokenStats( | ||
| model, | ||
| presetMessages | ||
| ).then((stats) => | ||
| stats.reduce((sum, current) => sum + current.tokens, 0) | ||
| ) | ||
|
|
||
| const threshold = Math.floor( | ||
| maxTokenLimit * (this.options.threshold ?? 0.85) | ||
| ) | ||
|
|
||
| const stats = await this._calculateMessageTokenStats(model, messages) | ||
| const totalTokens = | ||
| stats.reduce((sum, current) => sum + current.tokens, 0) + | ||
| presetTokens | ||
|
|
||
| if (totalTokens <= threshold) { | ||
| return | ||
| return { | ||
| inputTokens: 0, | ||
| outputTokens: 0, | ||
| reducedTokens: 0, | ||
| reducedPercent: 0, | ||
| compressed: false | ||
| } | ||
| } |
There was a problem hiding this comment.
There are multiple places in this function where you return a CompressContextResult for cases where compression is skipped. This logic is duplicated. To improve maintainability and reduce code duplication, consider creating a helper function to generate this "uncompressed" result object.
For example:
function createUncompressedResult(inputTokens: number): CompressContextResult {
return {
inputTokens,
outputTokens: inputTokens,
reducedTokens: 0,
reducedPercent: 0,
compressed: false,
};
}Then you could simplify the early returns, for example:
if (!model || messages.length === 0) {
return createUncompressedResult(0);
}And for other cases:
if (!maxTokenLimit || maxTokenLimit <= 0) {
return createUncompressedResult(inputTokens);
}
This PR refactors the infinite context compression system to be simpler and more straightforward.
New Features
chatluna.chat.compresswith optional room selection and force flagCompressContextResultinterface to track compression metrics (input tokens, output tokens, reduction percentage)Bug fixes
Other Changes
formatTranscripthelper function for cleaner code