fix(runtime): Fix resume_verify_closure first-turn verify startup failure and close the state machine loop #666

Merged
phantom5099 merged 5 commits into 1024XEngineer:main from Cai-Tang-www:feat/feishu-approval-strict-fsm
May 20, 2026
Conversation

Cai-Tang-www (Collaborator) commented May 19, 2026

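Background

The Runtime's run-state mechanism is nearly complete, but the resume path has a critical startup gap:

  • When the recovery strategy resolves to resume_verify_closure, the first turn of the new Run overrides the base lifecycle to verify.
  • If the lifecycle is still the initial empty value at that point, setBaseRunState triggers a "" -> "verify" transition.
  • The existing transition validation rejects that path, so the resume flow fails before the first provider call.

This issue was explicitly flagged in review and is a functional correctness problem, not a style problem.

What this PR does

1) Fix the resume_verify_closure startup failure

In internal/runtime/run.go, the first-turn base state is now set through a new bootstrap entry point:

  • The direct call to setBaseRunState is replaced with applyTurnBaseRunState.

internal/runtime/run_lifecycle.go adds:

  • applyTurnBaseRunState(ctx, state, next)
    • When state.lifecycle == "" and next == verify, it first bootstraps safely to plan and then enters verify.
    • All other paths keep their existing behavior.

This approach does not relax the global state transition rules, which keeps a special recovery scenario from spreading into a general loosening of state machine semantics.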
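
The bootstrap idea can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: the `Lifecycle` type, the transition table, and the helper names `setBase` and `applyTurnBase` are hypothetical stand-ins for `setBaseRunState` and `applyTurnBaseRunState`, and the real functions also take a context. The sketch only shows that a strict validator can stay unchanged while a narrow entry point routes the empty state through plan first.

```go
package main

import (
	"errors"
	"fmt"
)

// Lifecycle is a hypothetical stand-in for the runtime's run-state type.
type Lifecycle string

const (
	LifecycleEmpty  Lifecycle = ""
	LifecyclePlan   Lifecycle = "plan"
	LifecycleVerify Lifecycle = "verify"
)

var errInvalidTransition = errors.New("invalid run-state transition")

// validateTransition models a strict global rule set. The table is an
// assumption made for illustration; it deliberately has no "" -> "verify" edge.
func validateTransition(from, to Lifecycle) error {
	allowed := map[Lifecycle][]Lifecycle{
		LifecycleEmpty:  {LifecyclePlan},
		LifecyclePlan:   {LifecycleVerify},
		LifecycleVerify: {LifecyclePlan},
	}
	for _, next := range allowed[from] {
		if next == to {
			return nil
		}
	}
	return fmt.Errorf("%w: %q -> %q", errInvalidTransition, from, to)
}

type runState struct {
	lifecycle Lifecycle
	history   []Lifecycle // every state entered, in order
}

// setBase applies one validated transition (hypothetical analogue of setBaseRunState).
func setBase(state *runState, next Lifecycle) error {
	if state.lifecycle == next {
		return nil
	}
	if err := validateTransition(state.lifecycle, next); err != nil {
		return err
	}
	state.lifecycle = next
	state.history = append(state.history, next)
	return nil
}

// applyTurnBase bootstraps through plan only when starting from the empty
// state with a verify target; every other path goes straight to setBase.
func applyTurnBase(state *runState, next Lifecycle) error {
	if state.lifecycle == LifecycleEmpty && next == LifecycleVerify {
		if err := setBase(state, LifecyclePlan); err != nil {
			return err
		}
	}
	return setBase(state, next)
}

func main() {
	direct := &runState{}
	fmt.Println("direct:", setBase(direct, LifecycleVerify))

	boot := &runState{}
	fmt.Println("bootstrap:", applyTurnBase(boot, LifecycleVerify), boot.history)
}
```

In this sketch the direct call returns an error while the bootstrap call records two transitions, plan then verify, which mirrors the "" -> "plan" -> "verify" sequence described above.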

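2) Add a regression test

internal/runtime/run_lifecycle_test.go adds:

  • TestApplyTurnBaseRunStateBootstrapsVerifyFromEmpty
    • Asserts that, starting from the empty state with a target of verify, the run goes through "" -> "plan" -> "verify" and settles on verify.

3) State machine completion changes included in the same submission

This branch also carries the runtime state machine completion work from the previous commit (already in the branch history):

  • Adds the waiting_user_question lifecycle state and wires it into the priority calculation.
  • Introduces explicit suspend/resume and fallback cleanup in the ask_user execution path to prevent pending state from leaking.
  • Adds the resume_applied event and the recovery strategy projection (replay_plan / resume_verify_closure).
  • Fills in the verify lifecycle events (started / stage_finished / finished).
  • Fixes flakiness in the Feishu config tests caused by environment variable pollution.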

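Value of the change

  • Fixes a real failure path: resume_verify_closure can now start normally on the first turn of a new Run instead of failing before the provider call.
  • Keeps the state machine boundary clear: the general rules in ValidateRunStateTransition are untouched, which minimizes side effects.
  • Improves observability and recoverability: the lifecycle event loop for ask_user, resume, and verify is more complete.
  • Improves test stability: removes spurious failures caused by environment variable pollution.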

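Risk and compatibility assessment

  • The risk surface is small: the new bootstrap only triggers when the lifecycle is empty and the target is verify.
  • At the event level there is one extra explicit phase change ("" -> "plan" -> "verify"), which is an explainable and consumable evolution of the state machine.
  • All other lifecycle paths and the provider/tool layering responsibilities are unchanged.

Verification

Completed locally:

  • go test ./internal/runtime/...
  • go test ./...

Both pass.

Key files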

  • internal/runtime/run.go
  • internal/runtime/run_lifecycle.go
  • internal/runtime/run_lifecycle_test.go
  • internal/runtime/permission.go
  • internal/runtime/checkpoint_resume.go
  • internal/runtime/verification_events.go
  • internal/config/feishu_test.go

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

codecov Bot commented May 19, 2026

Codecov Report

❌ Patch coverage is 93.61702% with 9 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/runtime/permission.go | 85.71% | 2 Missing and 2 partials ⚠️ |
| internal/runtime/checkpoint_resume.go | 95.65% | 1 Missing and 1 partial ⚠️ |
| internal/runtime/run_lifecycle.go | 88.88% | 1 Missing and 1 partial ⚠️ |
| internal/runtime/run.go | 88.88% | 0 Missing and 1 partial ⚠️ |

fennoai[bot]

This comment was marked as outdated.

@Cai-Tang-www Cai-Tang-www changed the base branch from feat/feishu-approval-strict-fsm to main May 19, 2026 08:45
@Cai-Tang-www Cai-Tang-www changed the title from "feat(runtime): Complete the run-state machine (ask_user/restore/verify) and fix Feishu config test stability" to "fix(runtime): Fix resume_verify_closure first-turn verify startup failure and close the state machine loop" May 19, 2026

Cai-Tang-www (Collaborator, Author) commented

/code Raise test coverage to 90, then push to my branch when done

fennoai Bot commented May 19, 2026

Result

Test coverage improvements have been added and committed as d680c60d, and the code has been pushed to the fork branch fork-pr-666-1779179913.

What Changed

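  • internal/runtime/checkpoint_flow_test.go adds tests for the resume checkpoint fallback branches, table-driven tests for deriveResumeBaseLifecycle, and one Run-level regression test for the resume_verify_closure first-turn bootstrap.
  • internal/runtime/run_lifecycle_test.go adds tests for the non-bootstrap paths and the invalid-state branch of applyTurnBaseRunState.
  • internal/runtime/verification_events_test.go adds coverage for the accepted scenario, the nil guard, and the permission/timeout/not found error classifications.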
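
For readers unfamiliar with the table-driven style mentioned for deriveResumeBaseLifecycle, here is a minimal sketch. The real function's signature and behavior are not shown in this PR, so `deriveBaseLifecycleSketch` and its mapping (replay_plan to plan, resume_verify_closure to verify, anything else to no override) are assumptions made purely to show the testing pattern.

```go
package main

import "fmt"

// deriveBaseLifecycleSketch is a hypothetical stand-in: it maps a recovery
// strategy name to the base lifecycle the next run should start in.
// The second return value reports whether an override applies at all.
func deriveBaseLifecycleSketch(strategy string) (string, bool) {
	switch strategy {
	case "replay_plan":
		return "plan", true
	case "resume_verify_closure":
		return "verify", true
	default:
		return "", false
	}
}

func main() {
	// Each row is one case: input strategy, expected lifecycle, expected ok flag.
	cases := []struct {
		strategy string
		want     string
		wantOK   bool
	}{
		{"replay_plan", "plan", true},
		{"resume_verify_closure", "verify", true},
		{"unknown_strategy", "", false},
		{"", "", false},
	}

	for _, tc := range cases {
		got, ok := deriveBaseLifecycleSketch(tc.strategy)
		status := "ok"
		if got != tc.want || ok != tc.wantOK {
			status = "MISMATCH"
		}
		fmt.Printf("%-24q -> %q, %v [%s]\n", tc.strategy, got, ok, status)
	}
}
```

In a real test file the loop body would call `t.Run` with the case name and report mismatches through `t.Errorf`; the table itself stays the same.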

Verification

  • go test ./... passes.
  • Key coverage results: checkpoint_resume.go 96.3%, deriveResumeBaseLifecycle 100%, verification_events.go 100%.

Fork PR

A back-merge PR has been created inside the fork: Cai-Tang-www#65

xgopilot and others added 2 commits May 19, 2026 09:31
Generated with [codeagent](https://github.com/qbox/codeagent)
Co-authored-by: Cai-Tang-www <106404101+Cai-Tang-www@users.noreply.github.com>
test(runtime): raise resume verify coverage

Cai-Tang-www (Collaborator, Author) commented

/review

fennoai Bot left a comment

Found 2 noteworthy issues after code quality, performance, security, and documentation/contract passes.

return
}

resume, err := s.checkpointStore.GetLatestResumeCheckpoint(ctx, sessionID)

Medium: applyResumeCheckpoint auto-applies the latest row for session_id, but it never validates the stored workspace_key or transcript_revision before forcing resumeNextBaseLifecycle/pendingSystemReminder. Because SetResumeCheckpoint persists both fields, a stale checkpoint from before a restore or workdir change can still steer an unrelated next run into plan/verify. This should be gated on the checkpoint still matching the current session/workspace state before applying it.
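
The gating suggested in this comment could look roughly like the sketch below. The struct shapes, field types, and the `checkpointMatches` helper are hypothetical; only the field concepts (session ID, workspace key, transcript revision) come from the comment. Exact equality on the revision is also an assumption; the real check might need a different comparison.

```go
package main

import "fmt"

// resumeCheckpoint is a hypothetical view of a persisted resume checkpoint.
type resumeCheckpoint struct {
	SessionID          string
	WorkspaceKey       string
	TranscriptRevision int64
	Strategy           string
}

// sessionView is a hypothetical snapshot of the session as it is right now.
type sessionView struct {
	SessionID          string
	WorkspaceKey       string
	TranscriptRevision int64
}

// checkpointMatches reports whether the checkpoint was taken against the same
// session, workspace, and transcript revision that the next run will see.
func checkpointMatches(cp resumeCheckpoint, cur sessionView) bool {
	return cp.SessionID == cur.SessionID &&
		cp.WorkspaceKey == cur.WorkspaceKey &&
		cp.TranscriptRevision == cur.TranscriptRevision
}

func main() {
	cp := resumeCheckpoint{
		SessionID:          "s1",
		WorkspaceKey:       "ws-a",
		TranscriptRevision: 7,
		Strategy:           "resume_verify_closure",
	}

	current := sessionView{SessionID: "s1", WorkspaceKey: "ws-a", TranscriptRevision: 7}
	afterRestore := sessionView{SessionID: "s1", WorkspaceKey: "ws-a", TranscriptRevision: 3}
	movedWorkdir := sessionView{SessionID: "s1", WorkspaceKey: "ws-b", TranscriptRevision: 7}

	fmt.Println("current:", checkpointMatches(cp, current))
	fmt.Println("after restore:", checkpointMatches(cp, afterRestore))
	fmt.Println("moved workdir:", checkpointMatches(cp, movedWorkdir))
}
```

With a guard of this kind, a stale checkpoint would be skipped rather than steering the next run into plan or verify.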

Comment thread internal/runtime/run.go
}

report := s.evaluateAcceptGate(ctx, &state, turnOutput.assistant)
s.emitVerificationLifecycleEvents(ctx, &state, completionState, report)

Medium: this PR starts emitting verification_started, verification_stage_finished, and verification_finished, but the downstream TUI contract is not wired for them yet. internal/tui/services/gateway_stream_client.go still falls through to the default branch for these event types, and internal/tui/core/app/update.go has no handlers registered, so the payloads arrive as raw maps and are effectively ignored. As written, the new observability path is incomplete for existing clients.
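
To make the gap concrete, here is a sketch of what typed handling of the three event types might look like on the client side. It is not the TUI code: `streamEvent`, `verificationUpdate`, `decodeVerificationEvent`, and the `stage` payload key are all hypothetical. Only the three event type names come from the comment above.

```go
package main

import "fmt"

// streamEvent is a hypothetical envelope for an event arriving from the gateway.
type streamEvent struct {
	Type    string
	Payload map[string]any
}

// verificationUpdate is a hypothetical typed form that a UI handler could consume.
type verificationUpdate struct {
	Phase string // "started", "stage_finished", or "finished"
	Stage string // assumed payload field; empty when absent
}

// decodeVerificationEvent converts the three verification event types into a
// typed update. It returns false for any other type so the caller can keep
// its existing default handling.
func decodeVerificationEvent(ev streamEvent) (verificationUpdate, bool) {
	var phase string
	switch ev.Type {
	case "verification_started":
		phase = "started"
	case "verification_stage_finished":
		phase = "stage_finished"
	case "verification_finished":
		phase = "finished"
	default:
		return verificationUpdate{}, false
	}
	// A failed type assertion leaves stage empty instead of panicking.
	stage, _ := ev.Payload["stage"].(string)
	return verificationUpdate{Phase: phase, Stage: stage}, true
}

func main() {
	events := []streamEvent{
		{Type: "verification_started"},
		{Type: "verification_stage_finished", Payload: map[string]any{"stage": "tests"}},
		{Type: "verification_finished"},
		{Type: "some_other_event"},
	}
	for _, ev := range events {
		update, handled := decodeVerificationEvent(ev)
		fmt.Printf("%s -> handled=%v update=%+v\n", ev.Type, handled, update)
	}
}
```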

@phantom5099 phantom5099 merged commit 6de9eaa into 1024XEngineer:main May 20, 2026
3 checks passed

3 participants