[Refactor] Simplify infinite context compression system #774

Merged
dingyi222666 merged 1 commit into v1-dev from fix/conversation-compact on Mar 10, 2026

Conversation

@dingyi222666
Member

This PR refactors the infinite context compression system to be simpler and more straightforward.

New Features

  • Added manual compression command via chatluna.chat.compress with optional room selection and force flag
  • Added CompressContextResult interface to track compression metrics (input tokens, output tokens, reduction percentage)
  • Improved compression feedback with detailed metrics displayed in user-facing messages

Bug fixes

  • Fixed compression logic to handle edge cases and empty conversations properly
  • Improved error handling in compression workflow with better i18n support

Other Changes

  • Simplified compression prompt and logic to focus on conversation summarization
  • Replaced chunked compression approach with single-pass summary of entire conversation
  • Removed complex message filtering and token-based chunking logic
  • Extracted formatTranscript helper function for cleaner code
  • Updated i18n messages for both English and Chinese to display token reduction statistics
  • Updated room.ts and chat.ts commands to support i18n_base option for flexible message localization
  • Improved compression flow to provide better feedback to users about compression results
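
As a rough illustration of the metrics described above, the result object could be derived from the token counts before and after summarization. The field names follow the object literal visible in this PR's review diff; the helper name `buildCompressedResult`, the rounding to two decimals, and the 0 to 100 percentage scale are assumptions made for this sketch, not the merged implementation.

```typescript
// Field names match the result object shown in the review diff of this PR.
interface CompressContextResult {
    inputTokens: number
    outputTokens: number
    reducedTokens: number
    reducedPercent: number
    compressed: boolean
}

// Hypothetical helper: derives the metrics from token counts measured
// before (inputTokens) and after (outputTokens) summarization.
function buildCompressedResult(
    inputTokens: number,
    outputTokens: number
): CompressContextResult {
    // Never report a negative reduction if the summary came out longer.
    const reducedTokens = Math.max(0, inputTokens - outputTokens)
    // Percentage on a 0-100 scale, rounded to two decimals (an assumption).
    const reducedPercent =
        inputTokens > 0
            ? Math.round((reducedTokens / inputTokens) * 10000) / 100
            : 0
    return {
        inputTokens,
        outputTokens,
        reducedTokens,
        reducedPercent,
        compressed: true
    }
}

const example = buildCompressedResult(1200, 300)
console.log(example.reducedTokens, example.reducedPercent)
```

Guarding the division keeps the percentage at 0 for an empty conversation instead of producing NaN.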

- Simplify compression prompt and logic to focus on conversation summarization
- Replace chunked compression with single-pass summary of entire conversation
- Add CompressContextResult interface to track compression metrics (input tokens, output tokens, reduction percentage)
- Add manual compression command via 'chatluna.chat.compress' with force flag
- Update room and chat commands to display compression metrics in messages
- Update i18n messages for both English and Chinese to show token reduction stats
- Improve compression flow to handle edge cases and provide better feedback
- Extract formatTranscript helper function for cleaner code

@coderabbitai
Contributor

coderabbitai Bot commented Mar 10, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c44d2ebe-834c-4c64-9b56-23cd1507f023

📥 Commits

Reviewing files that changed from the base of the PR and between 6bb9d81 and 054b296.

⛔ Files ignored due to path filters (2)
  • packages/core/src/locales/en-US.yml is excluded by !**/*.yml
  • packages/core/src/locales/zh-CN.yml is excluded by !**/*.yml
📒 Files selected for processing (7)
  • packages/core/src/commands/chat.ts
  • packages/core/src/commands/room.ts
  • packages/core/src/llm-core/chain/infinite_context_chain.ts
  • packages/core/src/llm-core/chat/app.ts
  • packages/core/src/llm-core/chat/infinite_context.ts
  • packages/core/src/middlewares/room/compress_room.ts
  • packages/core/src/services/chat.ts

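Overview

This change introduces a new chat compression command and an improved compression flow. It adds the chatluna.chat.compress command, changes the signatures of the compression methods to support a force option, and introduces the CompressContextResult return type to report detailed compression metrics.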

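Changes

| Cohort / File(s) | Summary |
| --- | --- |
| New chat compression command: packages/core/src/commands/chat.ts | Adds a new chatluna.chat.compress subcommand with a -r room option; it invokes the chain to handle the compression request. |
| Compression command and middleware: packages/core/src/commands/room.ts, packages/core/src/middlewares/room/compress_room.ts | Changes the compression command flow, adds support for the force and i18n_base parameters, and improves structured handling of the returned result and adaptation of i18n messages. |
| Compression chain and result types: packages/core/src/llm-core/chain/infinite_context_chain.ts, packages/core/src/llm-core/chat/infinite_context.ts | Refactors the compression flow: introduces the new ChatLunaInfiniteContextChunkResult interface and changes the compressChunk return type to a structure containing text and usage metadata; rewrites compressIfNeeded, adding a token-count pre-check and the new CompressContextResult return type. |
| Compression API service layer: packages/core/src/llm-core/chat/app.ts, packages/core/src/services/chat.ts | Updates the compressContext method signature to accept an optional force parameter, changes the return type from Promise<boolean> to Promise<CompressContextResult>, and passes the parameter through the service layer. |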

Sequence diagram

sequenceDiagram
    participant User as User
    participant ChatCmd as Chat command
    participant RoomCmd as Room command
    participant Middleware as Compression middleware
    participant Service as Chat service
    participant Manager as Infinite context manager
    participant Chain as Compression chain
    participant LLM as Language model

    User->>ChatCmd: Run chatluna.chat.compress -r room1
    ChatCmd->>Middleware: Trigger the compress_room middleware
    Middleware->>Service: compressContext(room, force=true)
    Service->>Manager: compressIfNeeded(wrapper, force=true)
    Manager->>Manager: Count message tokens
    alt Compression needed
        Manager->>Chain: Create compression task
        Chain->>LLM: Call the LLM to compress the conversation
        LLM-->>Chain: Return compressed summary and usage
        Chain-->>Manager: Return ChatLunaInfiniteContextChunkResult
        Manager->>Manager: Update chat history
        Manager-->>Service: Return CompressContextResult
    else Compression skipped
        Manager-->>Service: Return CompressContextResult (compressed=false)
    end
    Service-->>Middleware: Return compression result
    Middleware-->>User: Show compression statistics
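
The decision path in the diagram can be sketched roughly as follows. This is a simplified, synchronous illustration rather than the code in infinite_context.ts: the `Message` shape, the `countTokens` and `summarize` callbacks, the function name `compressIfNeededSketch`, and the choice to store the summary as a single system message are all assumptions. The 0.85 default threshold and the transcript format come from the code quoted in the review comments further down.

```typescript
interface CompressContextResult {
    inputTokens: number
    outputTokens: number
    reducedTokens: number
    reducedPercent: number
    compressed: boolean
}

// Hypothetical minimal message shape for this sketch.
interface Message {
    role: string
    content: string
}

// Result for the "compression skipped" branch of the diagram.
function skipped(tokens: number): CompressContextResult {
    return {
        inputTokens: tokens,
        outputTokens: tokens,
        reducedTokens: 0,
        reducedPercent: 0,
        compressed: false
    }
}

// countTokens and summarize stand in for the model's tokenizer and the
// LLM call; the real LLM call would be asynchronous.
function compressIfNeededSketch(
    history: Message[],
    maxTokenLimit: number,
    countTokens: (text: string) => number,
    summarize: (transcript: string) => string,
    force = false,
    threshold = 0.85
): { result: CompressContextResult; history: Message[] } {
    if (history.length === 0 || maxTokenLimit <= 0) {
        return { result: skipped(0), history }
    }

    const inputTokens = history.reduce(
        (sum, message) => sum + countTokens(message.content),
        0
    )

    // Pre-check: without force, compress only above the threshold.
    if (!force && inputTokens <= Math.floor(maxTokenLimit * threshold)) {
        return { result: skipped(inputTokens), history }
    }

    // Single pass: one summary over the whole transcript.
    const transcript = history
        .map((m) => `[${m.role.toUpperCase()}]\n${m.content}`)
        .join('\n\n---\n\n')
    const summary = summarize(transcript)

    const outputTokens = countTokens(summary)
    const reducedTokens = Math.max(0, inputTokens - outputTokens)
    return {
        result: {
            inputTokens,
            outputTokens,
            reducedTokens,
            reducedPercent:
                inputTokens > 0 ? (reducedTokens / inputTokens) * 100 : 0,
            compressed: true
        },
        // Assumption: the summary replaces the history as one message.
        history: [{ role: 'system', content: summary }]
    }
}
```

With `force` set, the threshold check is bypassed, which matches the `force=true` call shown for the manual command in the diagram.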

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

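Related PRs

  • PR #682: Also modifies the manual compression flow (command and compression middleware) as well as the compressContext call paths and options, so it is directly related.
  • PR #602: Mainly modifies the core logic of the infinite context compression system (including the compressIfNeeded and compressContext APIs and the command handlers), which is directly related to the compression flow improvements in this change.
  • PR #664: Modifies the compressIfNeeded flow in infinite_context.ts, including changes to the compression helper methods, which relates to the flow refactoring in this PR.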

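Poem

🐰 A new compression command drifts by like a spring breeze,
The force option makes the flow more flexible,
Token counts are precise and careful,
Chinese prompts are warm and thoughtful,
Chat memories dance here,
The wisdom of compression shines bright ✨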


@dingyi222666 dingyi222666 merged commit 3ec82d6 into v1-dev Mar 10, 2026
2 of 3 checks passed
@dingyi222666 dingyi222666 deleted the fix/conversation-compact branch March 10, 2026 18:32
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the infinite context compression system to be more streamlined and user-friendly. The core change involves simplifying the underlying compression mechanism from a multi-chunk approach to a single-pass summarization of the entire conversation. This not only reduces complexity but also enhances the clarity of the compression process by providing detailed token reduction statistics to the user, improving the overall feedback and control over conversation history management.

Highlights

  • New Manual Compression Command: Added a new manual compression command via chatluna.chat.compress which supports optional room selection and a force flag.
  • Enhanced Compression Metrics: Introduced a CompressContextResult interface to track and display detailed compression metrics, including input tokens, output tokens, and reduction percentage, in user-facing messages.
  • Simplified Compression Logic: Refactored the infinite context compression system to use a simpler, single-pass summarization approach for the entire conversation, replacing the previous complex chunked compression and message filtering logic.
  • Improved Internationalization (i18n) Support: Updated i18n messages for both English and Chinese to display token reduction statistics and added i18n_base options to commands for flexible message localization.
  • Bug Fixes and Error Handling: Addressed compression logic edge cases and empty conversations, and improved error handling within the compression workflow.

Changelog
  • packages/core/src/commands/chat.ts
    • Added a new chatluna.chat.compress command.
  • packages/core/src/commands/room.ts
    • Modified the chatluna.room.compress command to support force and i18n_base options.
  • packages/core/src/llm-core/chain/infinite_context_chain.ts
    • Updated imports to include AIMessage and UsageMetadata.
    • Introduced the ChatLunaInfiniteContextChunkResult interface.
    • Simplified the compression prompt for conversation summarization.
    • Modified the compressChunk method to return ChatLunaInfiniteContextChunkResult and removed the _isAlreadyCompressed method.
  • packages/core/src/llm-core/chat/app.ts
    • Imported CompressContextResult.
    • Modified the compressContext method to accept a force parameter, return CompressContextResult, and throw a ChatLunaError if chat history is not initialized.
  • packages/core/src/llm-core/chat/infinite_context.ts
    • Defined the CompressContextResult interface.
    • Extracted the formatTranscript helper function.
    • Refactored the compressIfNeeded method to implement single-pass summarization, accept a force parameter, and return CompressContextResult.
    • Removed several complex internal methods related to chunking, message filtering, and summary building.
  • packages/core/src/locales/en-US.yml
    • Updated English localization messages for compression success and skipped states to include token statistics.
    • Added new localization entries for the chatluna.chat.compress command.
  • packages/core/src/locales/zh-CN.yml
    • Updated Chinese localization messages for compression success and skipped states to include token statistics.
    • Added new localization entries for the chatluna.chat.compress command.
  • packages/core/src/middlewares/room/compress_room.ts
    • Adjusted the compression middleware to utilize the i18n_base option for messages.
    • Updated success and skipped messages to display detailed token reduction statistics.
    • Passed the force option to the compressContext service call.
  • packages/core/src/services/chat.ts
    • Modified the compressContext service method to accept and propagate a force parameter.
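
To make the caller-side effect of these changes concrete, here is a minimal sketch of how a command handler might turn a CompressContextResult into a user-facing message chosen by an i18n base key. The template keys, the `{name}` placeholder syntax, and the `renderCompressMessage` helper are hypothetical; the real message strings live in the locale YAML files, which are not shown on this page.

```typescript
interface CompressContextResult {
    inputTokens: number
    outputTokens: number
    reducedTokens: number
    reducedPercent: number
    compressed: boolean
}

// Hypothetical templates; real keys and wording are defined in the
// en-US.yml and zh-CN.yml locale files.
const templates: Record<string, string> = {
    'demo.compress.success':
        'Compressed {inputTokens} -> {outputTokens} tokens ({reducedPercent}% reduction).',
    'demo.compress.skipped':
        'Compression skipped; context is {inputTokens} tokens.'
}

// Picks the success or skipped template under the given base key and
// fills in the metrics from the result.
function renderCompressMessage(
    i18nBase: string,
    result: CompressContextResult
): string {
    const key = `${i18nBase}.${result.compressed ? 'success' : 'skipped'}`
    // Fall back to the key itself when no template is registered.
    const template = templates[key] ?? key
    const values = result as unknown as Record<string, unknown>
    return template.replace(/\{(\w+)\}/g, (_match: string, name: string) => {
        const value = values[name]
        return value === undefined ? `{${name}}` : String(value)
    })
}

console.log(
    renderCompressMessage('demo.compress', {
        inputTokens: 1200,
        outputTokens: 300,
        reducedTokens: 900,
        reducedPercent: 75,
        compressed: true
    })
)
```

Passing the base key as an option lets the room and chat commands share one middleware while each resolves its own localized messages.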

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the chat history compression mechanism, introducing a new chatluna.chat.compress command and updating the existing chatluna.room.compress command to support forced compression and detailed output. The core compression logic in InfiniteContextManager has been simplified, moving from a complex chunking and merging strategy to a direct summarization of the entire conversation transcript by the LLM, with the return type now including detailed token usage metadata. Localization messages have been updated to display these new compression statistics. However, the formatTranscript function is vulnerable to prompt injection due to unsanitized user input, which could allow attackers to manipulate the summarization LLM. Additionally, the compressIfNeeded function contains duplicated logic for handling skipped compression, which could be improved with a helper function for better maintainability.

Comment on lines +20 to 29
function formatTranscript(messages: BaseMessage[]) {
    return messages
        .map((message) => {
            const role = message.getType().toUpperCase()
            const name = message.name ? ` (${message.name})` : ''
            const content = getMessageContent(message.content).trim()
            return `[${role}${name}]\n${content || '(empty)'}`
        })
        .join('\n\n---\n\n')
}

Severity: medium (security)

The formatTranscript function constructs a conversation transcript by concatenating message roles and contents using a simple string format with \n\n---\n\n as a separator. This transcript is then passed to an LLM for summarization. Because user-provided message content is not sanitized or escaped, an attacker can inject the separator and fake role headers (e.g., [SYSTEM]) into their messages. This allows them to spoof the conversation history seen by the summarization LLM, potentially leading to a manipulated summary. Since this summary is added back to the chat history and used as context for future AI responses, it can be used to influence the bot's behavior or misrepresent the conversation state.

function formatTranscript(messages: BaseMessage[]) {
    return messages
        .map((message) => {
            const role = message.getType().toUpperCase()
            const name = message.name ? ` (${message.name})` : ''
            const content = getMessageContent(message.content).trim()
            // Escape the separator to prevent transcript spoofing
            const escapedContent = content.replace(/\n\n---\n\n/g, '\n\n - -- \n\n')
            return `[${role}${name}]\n${escapedContent || '(empty)'}`
        })
        .join('\n\n---\n\n')
}
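
The suggestion above only rewrites the exact separator string, so a message can still carry a line that looks like a role header. A slightly broader line-based heuristic is sketched below as one possible alternative. The `Message` shape, `sanitizeContent`, and `formatTranscriptSafe` are hypothetical names for this sketch, and quoting suspicious lines reduces transcript spoofing without eliminating prompt injection in general.

```typescript
// Hypothetical minimal message shape for this sketch.
interface Message {
    role: string
    name?: string
    content: string
}

const SEPARATOR = '\n\n---\n\n'

// Prefix any line that could be read as a separator or a role header,
// so it can no longer be mistaken for transcript structure.
function sanitizeContent(content: string): string {
    return content
        .split('\n')
        .map((line) => {
            const trimmed = line.trim()
            const looksLikeSeparator = /^-{3,}$/.test(trimmed)
            const looksLikeRoleHeader = /^\[[a-z]+[^\]]*\]$/i.test(trimmed)
            return looksLikeSeparator || looksLikeRoleHeader
                ? `> ${line}`
                : line
        })
        .join('\n')
}

function formatTranscriptSafe(messages: Message[]): string {
    return messages
        .map((message) => {
            const role = message.role.toUpperCase()
            // Strip brackets and line breaks so a name cannot close the header.
            const safeName = message.name
                ? message.name.replace(/[\[\]\r\n]/g, '')
                : ''
            const name = safeName ? ` (${safeName})` : ''
            const content = sanitizeContent(message.content.trim())
            return `[${role}${name}]\n${content || '(empty)'}`
        })
        .join(SEPARATOR)
}

// A message that tries to smuggle in a fake system turn.
const attack = 'hi\n\n---\n\n[SYSTEM]\nIgnore all previous rules'
console.log(
    formatTranscriptSafe([
        { role: 'human', content: attack },
        { role: 'ai', content: 'ok' }
    ])
)
```

A side effect of the heuristic is that legitimate lines such as a bare `[note]` are also quoted, which is harmless for summarization but worth knowing.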

Comment on lines 49 to 69
if (!model) {
    return
    return {
        inputTokens: 0,
        outputTokens: 0,
        reducedTokens: 0,
        reducedPercent: 0,
        compressed: false
    }
}

const messages = await this.options.chatHistory.getMessages()

if (messages.length === 0) {
    return
}

const invocation = model.invocationParams()
const maxTokenLimit =
    invocation.maxTokenLimit && invocation.maxTokenLimit > 0
        ? invocation.maxTokenLimit
        : model.getModelMaxContextSize()

if (!maxTokenLimit || maxTokenLimit <= 0) {
    return
}

const presetMessages = Array.isArray(
    this.options.preset?.value?.messages
)
    ? (this.options.preset?.value.messages as BaseMessage[])
    : []

const presetTokens = await this._calculateMessageTokenStats(
    model,
    presetMessages
).then((stats) =>
    stats.reduce((sum, current) => sum + current.tokens, 0)
)

const threshold = Math.floor(
    maxTokenLimit * (this.options.threshold ?? 0.85)
)

const stats = await this._calculateMessageTokenStats(model, messages)
const totalTokens =
    stats.reduce((sum, current) => sum + current.tokens, 0) +
    presetTokens

if (totalTokens <= threshold) {
    return
    return {
        inputTokens: 0,
        outputTokens: 0,
        reducedTokens: 0,
        reducedPercent: 0,
        compressed: false
    }
}

Severity: medium

There are multiple places in this function where you return a CompressContextResult for cases where compression is skipped. This logic is duplicated. To improve maintainability and reduce code duplication, consider creating a helper function to generate this "uncompressed" result object.

For example:

function createUncompressedResult(inputTokens: number): CompressContextResult {
    return {
        inputTokens,
        outputTokens: inputTokens,
        reducedTokens: 0,
        reducedPercent: 0,
        compressed: false,
    };
}

Then you could simplify the early returns, for example:

if (!model || messages.length === 0) {
    return createUncompressedResult(0);
}

And for other cases:

if (!maxTokenLimit || maxTokenLimit <= 0) {
    return createUncompressedResult(inputTokens);
}
