Skip to content

fix(multi-review): filter lock files from PR diff to prevent LLM context overflow#195

Merged
Svtter merged 1 commit into
mainfrom
farm/d9cd57e8/multi-review-filter-lock-files
Jun 5, 2026
Merged

fix(multi-review): filter lock files from PR diff to prevent LLM context overflow#195
Svtter merged 1 commit into
mainfrom
farm/d9cd57e8/multi-review-filter-lock-files

Conversation

@Svitter

@Svitter Svitter commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Repro

PR with pnpm-lock.yaml (790KB) + bun.lock (411KB) = ~1.2MB total diff. The multi-review action loads the full diff (PR diff loaded from pre-fetched file: 1207658 chars) and concatenates it inline into a single text part sent to all 4 reviewers. Within ~280ms, every reviewer fails with Unexpected server error — the request body exceeds model limits.

Cause

orchestrator.ts:76 constructs the prompt by appending the raw prDiff string directly:

parts: [{ type: "text", text: reviewer.prompt + "\n\nPR Diff:\n```\n" + prDiff + "\n```" }]

No filtering occurs between diff load and LLM submission. Lock files routinely exceed 500KB each, pushing total diff well past typical model request body limits.

Fix

  • New: multi-review/src/diff-filter.ts — Parses unified diff headers, extracts basenames from diff --git a/<path> b/<path> lines, and strips sections for files matching lock/auto-generated patterns (*.lock, *.lockb, pnpm-lock.yaml, yarn.lock, package-lock.json, bun.lock, go.sum, Cargo.lock, Gemfile.lock, uv.lock, poetry.lock, composer.lock, Pipfile.lock, requirements.lock, flake.lock).
  • Edit: multi-review/src/index.ts — Calls filterDiff() after diff load, logs excluded files, passes filtered diff to reviewers.
  • New: multi-review/src/diff-filter.test.ts — 14 tests covering lock file patterns, nested paths, empty diffs, all-lock-file diffs, and non-lock files with "lock" in the name.
  • Edit: multi-review/package.json — Adds diff-filter.test.ts to test script.
  • Build: multi-review/dist/index.cjs — Regenerated bundle.

Verification

$ cd multi-review && bun test
32 pass, 0 fail across 3 files

$ bun run check   # tsc --noEmit
(clean)

Fixes #194

…ext overflow

Lock files (pnpm-lock.yaml, bun.lock, package-lock.json, yarn.lock, etc.)
can push diffs past 1MB. The entire diff was sent inline as a single text
part to each reviewer, exceeding model request limits and causing all 4
reviewers to fail with 'Unexpected server error'.

Add diff-filter.ts that parses unified diff headers and strips sections
for files matching known lock file / auto-generated patterns. Apply the
filter in index.ts before the diff reaches the LLM.

Fixes #194
@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown

Let me verify the issues against the current code first.
Now I have all the context needed. Here is the synthesized report:


最终决策:可合并 / CAN MERGE

所有 4 位 reviewer 均同意合并,无阻塞项。该 PR 功能清晰、实现轻量、测试覆盖完整,且实际上通过过滤大锁文件改善了整体性能。

阻塞项:无 / Blocking Issues: None

建议项 / Suggestions:

  1. LOCK_PATTERNS 存在冗余条目 — 已确认 (quality + performance + architecture 一致指出)/\.lockb?$/ 已覆盖所有 .lock/.lockb 后缀文件,后续 ^(uv|poetry|Gemfile|Cargo|composer)\.lock$^(Pipfile|requirements)\.lock$^flake\.lock$、以及 yarn.lock/bun.lock 均属重复匹配。建议精简以降低维护认知负担。

  2. multi-review/src/index.ts:89 const diffForReview = reviewDiff; 多余变量重赋值filterDiff 的返回值 { filtered: reviewDiff } 可直接传入 runParallelReviewers,无需额外强制命名。

  3. multi-review/src/diff-filter.ts:44 正则 (.+?)(?:\s|$) 遇含空格文件名会截断 — 虽然锁文件极少含空格,但若未来扩展过滤范围可能导致误判。可考虑解析 git diff 引用格式的文件名。

  4. 未来可考虑将 pattern 列表参数化 — 当前硬编码在模块内足以满足需求,若需支持更多文件类型或允许用户配置,可注入或从配置文件读取。

  5. 超大 diff 的流式处理考量 — 当前 split + join 对超大 diff 有额外 O(n) 字符串分配。若未来成为瓶颈,可改逐行扫描匹配 diff --git 前缀以避免一次性 split。


📋 各 Reviewer 详细审查结果
quality

可合并 / CAN MERGE

总结

此 PR 在 multi-review action 中添加了 filterDiff 函数,用于在发送给 LLM 之前从 PR diff 中过滤掉锁文件(如 package-lock.jsonyarn.lockpnpm-lock.yaml 等),避免大文件超出模型上下文窗口导致「Unexpected server error」。同时添加了完整的单元测试。代码结构清晰,测试覆盖全面。


阻塞项:无

建议项

  1. src/index.ts:87 — 多余变量重赋值const diffForReview = reviewDiff; 是多余的,可以直接将 reviewDiff 传给 runParallelReviewers,无需再赋给一个新变量。

  2. src/diff-filter.ts:13 — 正则匹配弱覆盖含空格文件名:正则 (.+?)(?:\s|$) 遇到文件名中的空格会在第一个空格处截断,捕获不完整的文件名。对于锁文件虽不常见,但如果未来需要扩展过滤范围,可能导致误判。可考虑使用 git diff 的引用格式(双引号包裹)来处理带空格的文件名。

  3. src/diff-filter.ts:9-23 — LOCK_PATTERNS 冗余:由于 \.lockb?$ 已覆盖所有 .lock / .lockb 后缀文件,后面的 ^(uv|poetry|Gemfile|Cargo|composer)\.lock$^(Pipfile|requirements)\.lock$ 是冗余的(但不影响正确性)。只有 go.sumflake.lock 需要单独列出。可以考虑精简或添加注释说明显式列出的意图。

  4. src/diff-filter.ts — 函数职责单一:目前 filterDiff 只过滤文件名,功能已足够。未来如果需要过滤更多类型(如 go.sum、二进制文件等),可考虑将 pattern 列表作为参数注入或从配置文件读取,增强可扩展性。

security

安全无虞 / SECURE

该 PR 实现了从 PR diff 中过滤锁定文件(lock files)的功能,用于控制 LLM 请求大小。代码逻辑简单清晰,仅对 diff 文本进行字符串分割和正则匹配,不涉及数据库查询、外部命令执行或网络请求。

  • 注入风险:diff 内容来自 GitHub API 的 git diff 输出,非用户自由输入;basename 仅用于模式匹配和 console.log 日志输出,不传入任何执行上下文(shell、eval 等),无注入风险。
  • 认证授权:未修改相关逻辑,不受影响。
  • 敏感数据:不处理密钥、token 或 PII。
  • 依赖安全:无新增外部依赖。

阻塞项:无

建议项:无

performance

性能良好 / GOOD

该 PR 的主要变更(filterDiff 函数)实际上改善了性能——通过过滤大幅缩减发送给 LLM 的 diff 体积,避免超大 lock file 导致 API 超时或上下文窗口溢出。实现本身轻量高效:

  • 算法复杂度:O(n×m),n 为 diff 段数(通常几十至几百),m 为锁文件模式数(~10),可忽略
  • 内存分配:对 diff 做 split 创建数组,再 join 重建,对大 diff 有额外 O(n) 的 string 分配,但相较后续 LLM API 调用可忽略
  • 无 I/O/网络/数据库操作:纯内存计算,无阻塞风险
  • 无并发问题:纯同步函数,无共享状态

阻塞项:无

建议项:

  1. LOCK_PATTERNS 中部分模式存在冗余——\.lockb?$ 已覆盖 *.lock/*.lockb,后面多个特定模式(如 flack\.lock$Cargo\.lock)实际是重复匹配。虽不产生功能性影响,但可考虑简化去掉冗余条目以减少每次正则遍历。
  2. 对于超大 diff(数万段),sections = diff.split(...) 会产生大量中等长度字符串。若未来成为瓶颈,可改为逐行扫描匹配 diff --git 前缀的流式处理,避免一次性 split 全部。
architecture

架构合理 / SOUND

该 PR 引入 diff-filter.ts 作为独立模块,从 diff 字符串中过滤锁文件,降低 LLM 请求大小避免超时。整体架构质量良好。

分析要点:

  • 耦合度diff-filter.ts 是纯函数工具,零依赖,仅由 index.ts 导入,耦合度极低。
  • 模块放置:位于 multi-review/src/diff-filter.ts,命名清晰(diff-filter 准确表达职责),与现有模块(platform.tsreviewers.tsorchestrator.ts)平级,位置合理。
  • 分层:职责单一明确——纯文本处理,与 GitHub API 调用(platform.ts)、评审编排(orchestrator.ts)完全解耦,没有跨域混合。
  • 接口设计filterDiff(diff: string) => { filtered, removedFiles } 接口简洁,输入输出类型清晰,抽象层次合适。
  • 改动范围:新增 2 个文件 + 3 行 index.ts 修改 + 1 行 package.json 修改,改动高度局部化,无散弹式修改。
  • 一致性:采用与现有模块相同的 src/*.ts + *.test.ts 测试伴生模式,与代码库风格一致。

阻塞项:无

建议项:

  1. LOCK_PATTERNS/\.lockb?$/ 已经覆盖了所有 *.lock/*.lockb 文件,后续专门列出的模式(如 yarn.lockbun.lock 等)是冗余的。虽然不会造成错误,但可考虑整理去重以降低维护认知负担。
  2. 当前仅匹配 basename,若未来出现同名非锁文件位于不同目录的场景可能需要更精确的路径匹配策略(目前无此问题)。

@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown

无遗漏

分析总结: PR 实现了 Issue #194 中建议的方案一(硬编码排除列表)。具体实现了:

  • diff-filter.ts:通过 diff --git 头信息解析 basename,匹配 \.lockb?$go.sum 及特定生态锁文件名(pnpm-lock.yaml、bun.lock、Cargo.lock、Gemfile.lock、poetry.lock 等 10+ 种),过滤掉对应的 diff 段
  • index.ts 第 85-89 行:在 diff 加载后调用 filterDiff(),记录被排除的文件,并将过滤后的 diff 传入后续审查流程
  • diff-filter.test.ts:14 个测试覆盖锁文件模式、嵌套路径、空 diff、全锁文件 diff、非锁文件含 "lock" 名称等边界
  • package.json:测试脚本已包含新测试文件
  • dist/index.cjsfilterDiff 函数和调用均已正确打包(第 5451-5485 行、第 5543-5547 行)

Issue #194 中列出的三个建议为或选方案("1. ... 2. Or ... 3. Or ..."),PR 选择方案一实现完整,所有被提及的锁文件类型均已覆盖,集成路径正确,无遗漏。

New%20session%20-%202026-06-05T14%3A58%3A13.168Z
opencode session  |  github run

@Svtter Svtter merged commit 25cf539 into main Jun 5, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Multi-review fails when PR diff contains large lock files (>1MB)

2 participants