feat(channel): ADR-0017 + scaffolding for per-user NyxID binding#477
Conversation
Per-user binding for channel bots can be implemented entirely as a NyxID OAuth client without any NyxID surface changes; the five endpoints proposed in discussion #400 collapse onto existing OAuth/OIDC primitives plus RFC 8693 Token Exchange. ADR captures the trade-off between local refresh-token storage (stop-gap) and the broker model (#375 target), accepts the local adapter as a transitional implementation under explicit hard constraints, and pins the migration path back to zero-secret-material grain state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c95c818a7d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| -> 回 Lark "{nyxid}/oauth/authorize?client_id=aevatar-channel-binding | ||
| &redirect_uri=https://aevatar/api/oauth/nyxid-callback | ||
| &state={correlation_id}&code_challenge=...&scope=openid" |
There was a problem hiding this comment.
Return binding authorize URL through a private channel
The /init flow currently returns a reusable OAuth authorize URL containing state directly to Lark, but the ADR does not require this response to be private/ephemeral. In a group chat, another participant can open that URL, complete login with their own NyxID account, and the callback path (state -> ExternalSubjectRef) will bind that Nyx subject to the original sender, causing cross-account binding hijack. This is a security issue in any non-private channel flow unless you require private delivery or add a second verifier tied to the initiating platform user before accepting the callback.
Useful? React with 👍 / 👎.
Codecov Report✅ All modified and coverable lines are covered by tests. @@ Coverage Diff @@
## dev #477 +/- ##
=======================================
Coverage 71.63% 71.64%
=======================================
Files 1237 1237
Lines 89642 89642
Branches 11720 11720
=======================================
+ Hits 64214 64222 +8
+ Misses 20827 20822 -5
+ Partials 4601 4598 -3
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
After security analysis, reverse the earlier acceptance of a local encrypted-refresh-token stop-gap. aevatar will instead hold only an opaque binding_id and call NyxID's broker token endpoint each turn, keeping grain state free of secret material. Implementation is gated on ChronoAIProject/NyxID#549. Existing bot owner-shared mode remains in place until the dependency lands, with no regression. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
eanzhao
left a comment
There was a problem hiding this comment.
方向 OK:broker 模式 vs. local refresh_token 的论证扎实。但有两个结构性问题在合并前必须解决,否则后续 implementation PR 会跟现有 channel runtime 撞车。
- Actor 重复:
ExternalIdentityBindingGAgent跟现有ChannelUserBindingGAgent(agents/Aevatar.GAgents.Channel.Runtime/UserBinding/ChannelUserBindingGAgent.cs:16)是同一业务实体(channel user 的凭据绑定)。CLAUDE.md 明令禁止按技术功能拆分同一实体,必须复用现有 actor 或在 ADR 里 explicit 反驳。 - Identity scope 跟现有冲突:新 key
(platform, tenant, external_user_id)不含 bot;现有 actor key 含 bot。需要 explicit 决定哪个对,并写明迁移路径。
其他:PKCE verifier 在 grain state 里的处理、client_credentials 在 ADR 文字里的 OAuth 词汇误用、用户从 NyxID 侧反向撤销的一致性、nyx_subject 缓存目的、accepted status 跟外部依赖之间的状态等,逐项 inline。
| @@ -0,0 +1,142 @@ | |||
| --- | |||
| title: "Per-User NyxID Binding via OAuth Broker" | |||
| status: accepted | |||
There was a problem hiding this comment.
status accepted 但 implementation gated on ChronoAIProject/NyxID#549(外部 issue 还没落地)。如果 NyxID#549 协议跟本 ADR 的假设有偏差,aevatar 接的 endpoint shape 也得改。
建议:
- 改成
proposed或accepted-pending-dependency(按本仓库现有 ADR 词汇),等 NyxID#549 contract 冻结再升accepted - 或者在 ADR 里贴 NyxID#549 的契约 freeze 节点(OpenAPI link / proto 文件 hash),明确"以哪个版本的契约为准"
否则这个 ADR 实际处于浮动状态。
|
|
||
| - 用标准 OAuth Authorization Code + PKCE 流程发起 binding(`/oauth/authorize` + `state` 承载 correlation_id + redirect_uri 浏览器跳转) | ||
| - **aevatar 不接收、不持有 user refresh_token**;binding 完成时 NyxID 返回不透明 `binding_id`,aevatar 仅持 `(external_subject_ref) → binding_id` 映射 | ||
| - 每次 turn 用 client_credentials 调 NyxID `POST /oauth/bindings/{binding_id}/token` 拿短期 access_token,塞进 `AgentToolRequestContext` |
There was a problem hiding this comment.
client_credentials 在 OAuth 2.0 标准里语义是 "client 以自己的身份(不代表任何 user)拿 token"。这里实际语义是 "client 凭 binding_id 代用户拿 user-scoped token",不是 client_credentials。
如果 NyxID#549 endpoint 想合规命名:
- 走 RFC 8693 Token Exchange:
grant_type=urn:ietf:params:oauth:grant-type:token-exchange,subject_token=<binding_id>,subject_token_type=urn:nyxid:params:oauth:token-type:binding-id,client 用 client_secret 鉴权(HTTP Basic /client_secret_post) - 或者明确声明这是 NyxID 自定义 broker token endpoint,body 里不出现
grant_type=client_credentials
ADR 文字按当前写法读起来像 "client_credentials = 该 endpoint 的 grant_type",跟 OAuth 标准语义冲突,会把 NyxID#549 contract 引向错误方向。建议改成 "RFC 8693 token exchange,subject_token = binding_id" 或 "NyxID 自定义 broker token endpoint,client 鉴权 = client_secret"。同样适用第 54 行。
| - 用标准 OAuth Authorization Code + PKCE 流程发起 binding(`/oauth/authorize` + `state` 承载 correlation_id + redirect_uri 浏览器跳转) | ||
| - **aevatar 不接收、不持有 user refresh_token**;binding 完成时 NyxID 返回不透明 `binding_id`,aevatar 仅持 `(external_subject_ref) → binding_id` 映射 | ||
| - 每次 turn 用 client_credentials 调 NyxID `POST /oauth/bindings/{binding_id}/token` 拿短期 access_token,塞进 `AgentToolRequestContext` | ||
| - 用户撤销同步 `DELETE /oauth/bindings/{binding_id}`,NyxID 是 source of truth |
There was a problem hiding this comment.
覆盖了 aevatar 主动撤销方向,反方向(用户在 NyxID UI 直接撤销,不经 aevatar)没交代:
- aevatar
(external_subject) → binding_id映射变成悬挂指针 - 下次
IssueShortLivedAsync调用会被 NyxID 拒绝(401? 410?invalid_grant?) - 该错误码该不该触发 aevatar 自动 prune binding actor state?
至少补一段 invalidation flow:
- 哪些 NyxID 错误码语义上等于 "binding 已撤销"
IssueShortLivedAsync拿到该错误码后行为(事件化撤销 binding actor / 让用户重新/init)- 是否要 NyxID 主动 webhook 通知(如果是,跟 Fix NyxID Lark relay authentication and keep relay tokens out of persisted state #366 callback JWT 怎么共用,不共用就要在 ADR 里讲清楚为什么不复用)
否则 "NyxID 是 source of truth" 是单向声明,不是双向一致性保证。
| /init | ||
| -> ChannelConversationTurnRunner 前置 slash-command 路由(不进 LLM) | ||
| -> aevatar 生成 state(=correlation_id) + PKCE pair, | ||
| 落 BindingChallengeIssuedEvent 到 ExternalIdentityBindingGAgent |
There was a problem hiding this comment.
PKCE code_verifier 是 secret material(OAuth 安全模型靠它对抗 code interception,RFC 7636)。BindingChallengeIssuedEvent 落到 actor state 意味着 verifier 进了 grain state / event store。
但 Storage Boundary 表(line 61-67)只列了 User refresh_token: ✗ never,没列 PKCE verifier。这跟 "aevatar grain state / projection / log / metric span attribute 不出现 secret material"(line 131)的笼统宣称在表面上有张力——verifier 是 short-lived 但仍是 secret material。
两种处理方式:
- 用 stateless signed cookie / state token 在浏览器侧 round-trip verifier(OAuth 库标准做法),grain state 里只留 challenge 元信息(correlation_id, expires_at, external_subject_ref),不留 verifier
- 保留落 actor state,但 ADR 把 Storage Boundary 表改成区分 long-lived vs. short-lived(<5min) secret material,明确 PKCE verifier 属于 short-lived 一类,并给出 challenge actor 的 GC 策略
倾向 1,跟 "binding_id = 唯一 opaque pointer,grain state zero secret material" 的叙事更一致。无论选哪个,arch test 守 grain state 的扫描规则需要一起定义。
|
|
||
| | 数据 | aevatar grain state | NyxID | | ||
| |---|---|---| | ||
| | `(platform, tenant, external_user_id) → binding_id` | ✓ | | |
There was a problem hiding this comment.
Identity scope 跟现有 actor 冲突
这里 key 是 (platform, tenant, external_user_id),不含 bot_instance_id。
但现有 ChannelUserBindingGAgent(agents/Aevatar.GAgents.Channel.Runtime/UserBinding/ChannelUserBindingGAgent.cs:12 的 docstring 明确写)keying 是 {bot_instance_id}:{channel}:{sender_canonical_id},含 bot,state 也存 bot / channel / sender_canonical_id / credential_ref。
两个识别尺度直接打架,ADR 没意识到。必须 explicit 决定:
A. per-(platform, user):一个 NyxID identity 跟一个 Lark user 一一对应,跟用户在哪个 bot 里讲话无关。语义最干净(NyxID identity 不属于某个 bot),但要废弃现有 per-bot key,迁移现有数据。
B. per-(platform, user, bot):保留现有 keying,binding 是 (bot, lark user) 维度。语义偏弱(同一 lark user 在 botA / botB 可绑不同 NyxID 账号?产品上不合理),但兼容现状。
倾向 A(NyxID 视角下一个 lark user 就是一个 user,跟 bot 无关;现有 actor 的 bot 字段更像是早期 RFC §5.2b 拆分时的 over-fit)。但这个决定影响 actor 模型、迁移路径、群聊 UX,必须在 ADR 里 explicit 选定,不能新写一个 keying 跟现有冲突还不解释。
| | 数据 | aevatar grain state | NyxID | | ||
| |---|---|---| | ||
| | `(platform, tenant, external_user_id) → binding_id` | ✓ | | | ||
| | `nyx_subject`(opaque `sub` claim) | ✓ 缓存以加速 resolve | ✓ source of truth | |
There was a problem hiding this comment.
nyx_subject 标 "✓ 缓存以加速 resolve" —— 加速什么 resolve 没说。
如果只是为了 IssueShortLivedAsync,链路其实是 external_subject → binding_id → NyxID 调用,nyx_subject 不在调用链上;如果是给 ConversationGAgent scope 用、或者给 audit log 用,应该写明用途。
如果没有清晰用途,建议直接不缓存(少一份要保持一致的状态,binding_id 已是 opaque 唯一指针)。如果有用途,写在表里。
| NyxID #549 之前可平行落地的部分(本身不依赖 broker endpoint): | ||
|
|
||
| - `INyxIdCapabilityBroker` 接口 + proto | ||
| - `ExternalIdentityBindingGAgent` + projection + `IExternalIdentityBindingQueryPort`(state 仅存 `binding_id` + `nyx_subject`,均为 opaque) |
There was a problem hiding this comment.
Actor 重复,违反 CLAUDE.md "Actor 即业务实体"
现有 agents/Aevatar.GAgents.Channel.Runtime/UserBinding/ChannelUserBindingGAgent.cs:16 已经是同一个业务实体(per-bot-per-user 的 channel user 凭据绑定):
- keying
{bot_instance_id}:{channel}:{sender_canonical_id} - state:
bot,channel,sender_canonical_id,credential_ref,preferences,created_at,updated_at - commands/events:
BindUserCredentialCommand/UserCredentialBoundEvent/UnbindUserCredentialCommand等
新引入 ExternalIdentityBindingGAgent 把"同一 channel user 的 NyxID binding"放到第二个 actor(甚至第二个模块 Aevatar.GAgents.Channel.Identity —— 见 line 126)里,违反 CLAUDE.md:
Actor 即业务实体:一个 actor = 一个业务实体(数据与方法同住);禁止按技术功能(读/写/投影)拆分同一业务实体为多个 actor。
binding_id 在语义上就是一种 credential_ref(不透明的 user 凭据指针),不需要再起一个 actor。建议:
- 复用
ChannelUserBindingGAgent,把credential_ref解释/重命名为binding_id,或者新增 typed sub-messagenyx_binding { binding_id, nyx_subject }进现有 state BindingChallengeIssuedEvent/ExternalIdentityBoundEvent接到同一 actor 的事件流- ADR 应明确"扩展现有 ChannelUserBindingGAgent"而不是"新增并列 actor + 并列模块"
如果坚持新 actor,必须在 ADR 里 explicit 论证为什么二者不是同一业务实体(比如"binding 跟 bot 解耦,不是同一 keying"——但那触发上面 line 63 的 identity scope 问题,得一起讲清楚)。
Pick the explicit-justification branch for the ExternalIdentityBindingGAgent vs existing ChannelUserBindingGAgent overlap: per-(platform, user) NyxID identity and per-(bot, channel, sender) bot-scoped user state are different business entities at different scopes, so two actors are kept and the existing credential_ref field is marked redundant for migration by the implementation PR. Restate per-turn token issuance as RFC 8693 token-exchange with subject_token=binding_id (was: client_credentials, semantically wrong for acting on behalf of a bound user). Move PKCE code_verifier out of BindingChallengeIssuedEvent into a stateless HMAC-signed state token so nothing secret-material-shaped enters grain state. Add the reverse-direction revoke flow (NyxID-side revoke surfaces as invalid_grant on next token-exchange; aevatar fires unbind event and re-prompts /init). Mandate private-channel delivery for the /init OAuth URL to defeat the state-hijack attack flagged in PR review (group recipient could otherwise complete login with their own NyxID account and bind sender to a foreign identity). Drop unused nyx_subject cache from grain state. Separate aevatar service-level secrets (state HMAC key, client_secret) as infrastructure secret outside the #375 user-secret invariant. status: accepted -> proposed until NyxID#549 contract is frozen. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ADR-0017 ReviewOverall direction is solid — broker model with zero long-lived secret material in aevatar grain state is the correct call. The threat model comparison table is well-argued. Found several gaps that should be addressed before accepting: 1. 文件名与标题术语不一致文件名 2. ExternalIdentityBindingGAgent 缺 grain identity 方案ADR 引入了 按 AGENTS.md 的要求:
3. nyx_subject 缓存无失效策略Storage boundary 表说 aevatar 缓存
建议至少标注为 known gap,或定义一个 TTL / refresh-on-401 策略。 4. Bot owner 模式共存逻辑缺失ADR 多次说"现有 bot owner-shared 模式继续运行(不 regression)",但没有描述共存逻辑:
这直接影响 5. 初始 access_token 处理未说明Callback 流程中 aevatar 用 code 换到
虽然只是初始流程的小细节,但涉及与 NyxID #549 的契约设计(是否需要在 broker_binding scope 下返回 access_token),建议明确。 6. externalSubject 参数应强类型
7. 重复 /init 的幂等语义未定义已绑定用户再次
这影响 8. scope 命名与 NyxID #549 不一致ADR 流程图写 9. Projection readiness / query-time priming
建议在 Dependencies 的"平行落地"部分补充 projection readiness 的保证机制(如 callback handler 同步等 projection、或 turn 路径容忍短暂的 resolve miss 并回落)。 小结ADR 的核心决策(broker > local、zero secret material、gated on NyxID #549)是正确的。上述问题大多是实现前应该补齐的细节,其中 #2 (grain identity)、#4 (共存逻辑)、#7 (幂等语义) 对后续 implementation PR 影响最大,建议优先明确。 |
eanzhao
left a comment
There was a problem hiding this comment.
A few observations beyond eanzhao's review:
1. Local commit 056f05c5 not pushed
git log --oneline -3 shows 056f05c5 Address ADR-0017 review feedback sitting locally, containing fixes for the inline comments (status→proposed, client_credentials→token exchange, PKCE stateless state_token, nyx_subject cache removal, revocation flow). Push before merging.
2. Identity scope + actor split still has tension
The ADR argues ExternalIdentityBindingGAgent (key=platform/tenant/external_user_id) and ChannelUserBindingGAgent (key=bot/channel/sender) are "different scope entities". But binding_id semantically IS a credential pointer — same category as credential_ref, which the ADR itself plans to deprecate because the new actor subsumes it. Alternative: extend ChannelUserBindingState with a nyx_binding sub-message. The ADR should at minimum add a dedicated section explaining why "two actors" > "one extended actor" and give a concrete deprecation/migration window.
3. AuthContext.user_credential_ref not addressed
channel_contracts.proto AuthContext.field(2) carries user_credential_ref for outbound "send on behalf of user" semantics. How does this interact with the broker cap_handle? Does the outbound adapter call IssueShortLivedAsync?
4. status accepted with gated dependency
Status is accepted while implementation is gated on NyxID#549 (unresolved external dependency). Should be proposed (as already fixed locally) or NyxID#549 OpenAPI contract should be pinned.
- Rename file to oauth-broker.md to match the title. - Spell out actor architecture in a dedicated section: justify ExternalIdentityBindingGAgent (platform-level identity entity) versus the existing per-bot ChannelUserBindingGAgent (per-bot preferences), and pin a deprecation window for credential_ref. - Add an Outbound Send section covering AuthContext.user_credential_ref semantics under broker mode and the proto deprecation path. - Decision section: forbid bot-owner fallback for unbound senders in both 1:1 and group chats, define /init idempotency, describe initial access_token disposal in the callback handler. - Strong-type INyxIdCapabilityBroker parameters as ExternalSubjectRef. - Add Projection Readiness subsection: write-side projection preflight in the callback handler; turn path must not query-time prime. - Align scope name to urn:nyxid:scope:broker_binding so it tracks the NyxID#549 contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ADR-0017 Review Resolution Summary逐条 map 到落地的 commit / 章节,方便 review pass 时勾掉。 Round 1 — commit
|
| # | 落点 |
|---|---|
| 1 文件名 | rename 完成 |
| 2 grain identity scheme | §Actor Architecture key 列明确 (platform, tenant, external_user_id) = ExternalSubjectRef |
| 3 nyx_subject 缓存 | Storage Boundary 标 ✗,不缓存 |
| 4 bot owner 共存 | Decision bullet:未绑定 sender 1:1 与群聊一律强制 /init,bot owner 不享默认身份特权;Consequences 同步说明 |
| 5 初始 access_token | Decision 流程图 + Storage Boundary 表:callback handler 内一次性使用(可选 /oauth/userinfo 调用做展示文案),handler 退出前丢弃,永不持久化 |
| 6 externalSubject typed | §INyxIdCapabilityBroker 接口签名全部改 ExternalSubjectRef(proto-typed value object) |
7 /init 幂等 |
Decision bullet:已绑定 sender 重复 /init 回复"先 /unbind 再 /init 切账号",不发新 OAuth |
| 8 scope 命名 | 流程图 / Decision / Dependencies 全部改 urn:nyxid:scope:broker_binding;契约 token 名 pin 到 NyxID#549 freeze |
| 9 projection readiness | 新增 §Dependencies → Projection Readiness:callback handler 写侧预挂接 projection 同步等水位,turn 路径禁 ES replay / state mirror / priming |
如果哪条论证不满意请直接戳,可以再迭代;否则可以把对应 conversation 标 resolved。
ADR-0017 v2 Review(修复确认 + 新发现)上一轮 9 个问题全部已修复,质量提升明显。逐条确认 + 新发现如下: 原始 9 点 — 全部 resolved ✅
新发现A. Token Exchange 方案与 NyxID #549 提案存在显著分歧ADR 现在用 RFC 8693 但 NyxID #549 当前提案是 这是不同的端点 + 不同的 grant_type + 不同的鉴权方式。NyxID 侧的 虽然 ADR 的方案更标准(复用现有 RFC 8693 框架而非另起新端点),但这是与 #549 的重大偏差。建议:
B.
|
eanzhao
left a comment
There was a problem hiding this comment.
Round 2 把上一轮 7+1 全补到了,还顺手抓到了我漏的几个点(AuthContext.user_credential_ref、/init 幂等、Projection Readiness 写侧预挂接、强类型 ExternalSubjectRef、scope URN 化)。新引入的 §Outbound Send / §Actor Architecture / §Projection Readiness 三段把架构说得很清楚。
但 §Outbound Send 撞到 CLAUDE.md 一条红线(自定义字符串格式 + compat shim),另两处再调一下:
- 必须改:
user_credential_ref重载承载序列化ExternalSubjectRef(line 121)—— 同时违反 "统一 Protobuf,禁止自定义字符串格式" 和 "删除优先,不留 compat shim" - 必须改:bot-owner 模式硬切(line 195)——
credential_ref字段有 deprecation window,但 bot-owner 用户体验本身没有;切上线那一刻全员 sender 断流 - 建议改:callback 阶段调
/oauth/userinfo拿 sub claim(line 61)——sub在同一兑换返回的id_token里就有,多余的 round-trip
小问题(不阻断):projection 写侧等待 timeout=3s 偏紧 + UX 文案易被误读为绑定失败;state-token HMAC key rotation 需要 kid;ExternalSubjectRef 应该住在 Channel.Abstractions 而非 Channel.Identity,否则会形成反向依赖。
| (grant_type=authorization_code, code, code_verifier, client_secret) | ||
| -> { access_token, binding_id } (broker_binding scope 下不返 refresh_token) | ||
| 初始 access_token 处理:可选地一次性用于调 /oauth/userinfo 拿 sub claim | ||
| 做"已绑定 <name>"展示文案;**永不持久化、永不复用** |
There was a problem hiding this comment.
sub claim 在同一次 token 兑换返回的 id_token 里就有(OIDC 标准),不需要再调 /oauth/userinfo 一跳。只有展示 name/email 才需要 userinfo。
建议改成:
可选地从同一兑换返回的
id_token解码sub/nameclaim 做"已绑定 "展示;不调 userinfo,不持久化 token
少一次 NyxID round-trip,且语义跟 OIDC 标准对齐。
|
|
||
| `IChannelOutboundPort.ContinueConversationAsync(... AuthContext auth ...)` 在 `OnBehalfOfUser` 模式下用 `AuthContext.user_credential_ref`(`agents/Aevatar.GAgents.Channel.Abstractions/protos/channel_contracts.proto:138`)选择代用户身份。Broker 模式下: | ||
|
|
||
| - 在线协议保留 `user_credential_ref string = 2`,语义改为承载序列化的 `ExternalSubjectRef`(form: `lark:tenant_X:user_Y`),由 outbound adapter 反序列化 |
There was a problem hiding this comment.
双重违反 CLAUDE.md
在线协议保留 user_credential_ref string = 2,语义改为承载序列化的 ExternalSubjectRef
(form: `lark:tenant_X:user_Y`)
两个独立红线:
- 统一 Protobuf:CLAUDE.md "序列化" 段写明 "State、领域事件、命令、回调载荷、快照、缓存载荷、跨 Actor/跨节点内部传输对象全部使用 Protobuf;禁止 JSON/XML/自定义字符串格式"。
lark:tenant_X:user_Y这种带分隔符 + 含解析约定的字符串就是自定义字符串格式(外部协议除外,但AuthContext是仓库内部跨层契约)。 - 删除优先:CLAUDE.md "删除优先:空转发、重复抽象、无业务价值代码直接删除,不保留兼容空壳" + "Don't use feature flags or backwards-compatibility shims when you can just change the code"。把旧 string field 重载承载新语义是典型 compat shim。
且 line 123 已经写了"长期演进:加 typed ExternalSubjectRef external_subject = 4"——意思是先重载、后加 typed。但 broker 路径整体 gated on NyxID#549,根本不存在必须靠 string 重载兜过的过渡期,可以一次性加 typed external_subject = 4,broker outbound 只读 typed 字段,旧 user_credential_ref 留给非 broker 代码继续使用按既有语义自然 deprecate。中间不需要重载期。
改法:删掉 line 121 这一段;line 123 的 typed 字段直接作为 broker 模式的唯一 outbound 身份字段。
| - 新增 OAuth callback endpoint `/api/oauth/nyxid-callback`(标准 OAuth client redirect 处理,不是 webhook),含写侧预挂接 projection 等待 | ||
| - `ChannelConversationTurnRunner.RunInboundAsync` 开头加 slash-command 前置路由(`/init`、`/unbind`),`/init` 幂等 | ||
| - `BuildReplyMetadata` 改成 `ResolveAsync` + `IssueShortLivedAsync`;metadata key 从 `nyxid.access_token` 改为 `nyxid.capability_handle`(诚实表达"短期、scoped、可撤销") | ||
| - 未绑定 sender(无论 1:1 还是群聊)统一强制 `/init`,不回落 bot owner;现有 bot owner-shared 模式在 implementation PR 切上线那一刻终止,bot owner 失去"默认用户身份"特权 |
There was a problem hiding this comment.
bot-owner 模式硬切,没有迁移计划
现有 bot owner-shared 模式在 implementation PR 切上线那一刻终止,bot owner 失去"默认用户身份"特权
§Actor Architecture 给了 credential_ref 字段 deprecation window("broker 模式上线 + 一个有 channel-runtime proto break 的 release 后"),但没给 bot-owner 用户体验本身。切上线那一刻所有现有 Lark sender 在 bot 里都拿不到响应,必须各自 DM 自己 /init 后才恢复,是产品断崖。
至少要在 ADR 里 explicit 一句迁移策略。可选方案(产品决策,ADR 只锁框架):
- A. 双轨期:新 bot 默认 broker,旧 bot 保持 owner-shared;新增 bot 设置开关,bot owner 自己挑迁移时机
- B. 单轨硬切 + 通知期:broker 上线前 N 天起,未绑定 sender 收到的 reply 加引导话术("X 月 X 日起需
/init"),到期硬切 - C. 单轨硬切(当前文本):明确接受首日所有 sender 一次性走
/init,并写明 comm 责任在 bot owner
ADR 不需要选哪个,但要 explicit 说明这是产品决策点 + 留 placeholder,否则 implementation PR 会被迫现做。
| @@ -0,0 +1,142 @@ | |||
| --- | |||
There was a problem hiding this comment.
§Projection Readiness 写侧预挂接 projection 机制未指定。在当前 turn 内同步等 projection 追上 actor committed version 需显式端口(如 IProjectionReadinessPort)。建议在 Consequences 列为新增接口。
| ``` | ||
|
|
||
| ## Storage Boundary | ||
|
|
There was a problem hiding this comment.
初始 access_token "可选地" 调 /oauth/userinfo 获取展示名——"可选" 造成 UX 路径不一致:有些 binding 完成页显示 display name,有些不显示。建议 ADR 明确二选一(始终调 or 完全不调)。
| NyxID #549 之前可平行落地的部分(本身不依赖 broker endpoint): | ||
|
|
||
| - `INyxIdCapabilityBroker` 接口 + proto | ||
| - `ExternalIdentityBindingGAgent` + projection + `IExternalIdentityBindingQueryPort`(state 仅存 `binding_id` + `nyx_subject`,均为 opaque) |
There was a problem hiding this comment.
弃用窗口间 credential_ref(旧) 与 binding_id(新) 可能共存。"不读不写" 应补充 fallback 顺序:查询优先 binding_id,未命中再 fallback credential_ref。否则 migration 中间态行为未定义。
- Decide id_token-decoded sub for the bound-display string instead of
hitting /oauth/userinfo: the OIDC token response already carries sub,
so the extra round-trip is unnecessary. Initial access_token is
unconditionally discarded by the callback handler.
- Define /unbind: slash-command -> RevokeBindingAsync ->
DELETE /oauth/bindings/{binding_id} -> revoked event. NyxID is the
source of truth, so a NyxID failure aborts the local revoke instead
of risking source-of-truth divergence.
- Drop the string overload of AuthContext.user_credential_ref; broker
outbound reads only the new typed external_subject field. Justified
by CLAUDE.md "unified protobuf, no compat shim".
- Add credential_ref deprecation-window rule: turn path reads only
binding_id, never falls back to credential_ref.
- Spell out the bot-owner shutdown options (A/B/C) and require the
implementation PR to pick one for the release runbook.
- Note IProjectionReadinessPort as a write-side port and clarify the
query-side boundary in §Projection Readiness; turn path remains
forbidden from priming.
- Add §Divergence from NyxID#549: pick RFC 8693 token-exchange over
the dedicated client_credentials endpoint in the original sketch
and link to the alignment comment to be posted on NyxID#549.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ADR-0017 v2 Review Resolution SummaryRound 3 — commit
NyxID 同步:ChronoAIProject/NyxID#549 已贴 alignment comment 提议契约对齐 RFC 8693 token-exchange 与 如果哪条仍不满意请直接戳。 |
ADR-0017 v3 Reviewv2 的 A/B/C 三个问题全部已修复:
v3 新增内容检查通过:
LGTM — 没有阻塞项。 ADR 质量足够支撑后续 implementation PR。 |
ADR-0017 Final Review前几轮 9+3 问题全部 resolved,核心设计无阻塞项。以下是遗留的实现细节问题,不阻断 ADR 合并但建议在 implementation PR 前补齐: 1. HMAC state_token 设计细节缺失
2. 并发
|
| 错误 | HTTP 响应 | 用户可见文案 |
|---|---|---|
| state_token 过期/篡改 | 400 | "链接已过期,请重新 /init" |
| token_exchange 失败 | 502 | "NyxID 绑定失败,请重试" |
| projection 超时 | 200 (异步) | "绑定已写入,稍后重发消息" |
特别是 state_token 过期(exp ≤5min,网络延迟 + 用户犹豫就可能超时)需要有明确的 retry UX,而不是一个 generic 500。
4. 外发 IssueShortLivedAsync 无容错
§Outbound Send 定义"每次发送前调 IssueShortLivedAsync,不缓存"。但没有说明:
- NyxID 短暂不可用时(5xx / timeout):retry?backoff?fail silently?还是整个 outbound 失败?
- 并发 outbound(同用户多条消息并行发送):多次 token-exchange 是否被 NyxID rate limit?是否需要 per-binding 本地 semaphore?
建议至少补一句"NyxID 不可用时 outbound fail with error,不 fallback 到 bot owner token",避免实现者自行加 fallback 破坏 zero-secret 不变量。
5. /unbind 后立即 /init 的时序
用户 /unbind → DELETE /oauth/bindings/{binding_id} → NyxID 成功 → 本地 ExternalIdentityBindingRevokedEvent → projection 物化。如果用户在 projection 物化前立即 /init:
/unbind -> DELETE 成功 -> ResolveAsync 仍返回旧 binding_id (projection 未更新)
/init -> ResolveAsync hit -> 返回"已绑定,请先 /unbind"(误判)
建议 /unbind handler 同步等 projection 水位(复用 IProjectionReadinessPort),或 /init 的幂等检查读 actor 直接态而非 projection。
总结
以上 5 点都是实现层面的边界条件,不阻断 ADR 合并。建议在 Consequences 或 Dependencies 里加一段 "Implementation Notes" 把这些记下来,避免后续 PR 遗漏。
Add §Implementation Notes (5 subsections) so the implementation PR inherits the boundaries instead of relitigating them: - HMAC state_token: Protobuf payload, deterministic field order, kid + key rotation grace > exp. - /init concurrency: actor commit is idempotent; aevatar does not require a unique constraint on the NyxID side, so NyxID is free to reap orphans on its own schedule. - Callback handler error UX: classified table with HTTP code and user-facing copy. - IssueShortLivedAsync failures: outbound and turn paths fail loudly with no fallback; rate limiting authority lives in NyxID. - /unbind -> /init timing: the unbind handler waits on the projection (write-side completion path) so the next /init does not see a stale "already bound" reply. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ADR-0017 Final Review ResolutionRound 4 — commit
一些选择理由
如还有边界细节遗漏请戳;否则可以勾掉 final review。 |
Land the parts of ADR-0017 that do not depend on the NyxID broker endpoint (gated on ChronoAIProject/NyxID#549) so the capability seam and actor contract stop floating in the air: - Two new csprojs: Aevatar.GAgents.Channel.Identity.Abstractions (proto value objects + interfaces) and Aevatar.GAgents.Channel.Identity (actor + DI module). - Proto: ExternalSubjectRef, BindingId, CapabilityScope, CapabilityHandle, BindingChallenge in Abstractions; ExternalIdentityBindingState, CommitBindingCommand, RevokeBindingCommand, ExternalIdentityBoundEvent, ExternalIdentityBindingRevokedEvent in Identity. State holds only the opaque binding_id — never refresh_token. - ExternalIdentityBindingGAgent: single-threaded actor that commits bindings idempotently (ADR §Implementation Notes #2) and revokes as a no-op when no active binding exists. - INyxIdCapabilityBroker, IExternalIdentityBindingQueryPort, IProjectionReadinessPort, BindingRevokedException — interfaces sized to the ADR's broker contract; a concrete NyxIdRemoteCapabilityBroker lands in a follow-up PR. - InMemoryCapabilityBroker test fake plus xUnit coverage of the actor (idempotent commit, rebind-after-revoke) and the broker (revoke, invalid_grant signalling). Builds clean across the solution; the new Identity-tagged tests pass. DI module is a stub for now — the production wiring of the broker, projection, and OAuth callback endpoint comes in subsequent PRs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| } | ||
|
|
||
| if (!string.IsNullOrEmpty(State.BindingId)) | ||
| { |
There was a problem hiding this comment.
[Consensus: 2 models] severity=major, category=design
Both HandleCommitBinding and HandleRevokeBinding accept cmd.ExternalSubject and apply it directly without verifying it matches the actor's own identity. Because actor IDs are deterministic from ExternalSubjectRef.ToActorId(), a routing/dispatch bug that misroutes a command would silently persist a binding under a mismatched key (State.ExternalSubject is overwritten by ApplyBound from the event payload). Add a fail-fast guard at the top of each handler that compares cmd.ExternalSubject.ToActorId() against this.GetPrimaryKeyString() and rejects mismatches. Two of three reviewers also flagged that ApplyBound should not overwrite ExternalSubject since it's an actor-identity invariant.
Per-model verbatim
- glm-5.1 (L58):
HandleCommitBindingandHandleRevokeBindingaccept anycmd.ExternalSubjectwithout verifying it matches the actor's own identity. If a routing/dispatch bug sends aCommitBindingCommandto the wrong actor, the binding would be persisted under a mismatched state (State.ExternalSubjectoverwritten byApplyBoundto the command's subject rather than the actor's key).
Suggested fix — resolve the actor's own identity at activation and assert equivalence:
// In ActivateAsync or via a property set by the runtime:
private ExternalSubjectRef _ownSubject; // derived from actor ID
// At the top of HandleCommitBinding / HandleRevokeBinding:
if (!cmd.ExternalSubject.Equals(_ownSubject))
{
Logger.LogWarning("Command rejected: subject mismatch...");
return;
}This also means the ApplyBound transition should not overwrite ExternalSubject from the event — it should only set binding_id and bound_at, since the subject is an invariant of the actor's identity.
- kimi-k2p6 (L92):
HandleRevokeBindingaccepts anExternalSubjectparameter on the command but ignores any potential mismatch with the actor's own identity. Since actor IDs are deterministic perExternalSubjectRef, a command arriving for a different subject is architecturally impossible under normal dispatch; however, adding an explicit guard (if (cmd.ExternalSubject.ToActorId() != this.GetPrimaryKeyString())) would make the invariant self-documenting and fail-fast if routing ever breaks.
| /// </summary> | ||
| internal sealed class InMemoryCapabilityBroker : INyxIdCapabilityBroker | ||
| { | ||
| private readonly ConcurrentDictionary<string, BindingId> _bindings = new(StringComparer.Ordinal); |
There was a problem hiding this comment.
[Consensus: 2 models] severity=minor, category=bug
_revokedBindings is a plain HashSet<string> while the adjacent _bindings field uses ConcurrentDictionary. RevokeBindingAsync writes to _revokedBindings and IssueShortLivedAsync reads from it — concurrent test scenarios could corrupt the set or produce stale reads. Replace with ConcurrentDictionary<string, byte> (using TryAdd / ContainsKey) so both fields share the same thread-safety model.
Per-model verbatim
- deepseek-v4-pro (L22):
_revokedBindingsis a non-thread-safeHashSet<string>mutated insideRevokeBindingAsyncand read insideIssueShortLivedAsync, yet the class usesConcurrentDictionaryin adjacent fields, implying potential concurrent access. Even though this is a test fake, concurrent test scenarios could corrupt the set. Fix: wrap withlockor useConcurrentDictionary<string, byte>for the revoked set. - glm-5.1 (L14):
_revokedBindingsis a plainHashSet<string>while_bindingsusesConcurrentDictionary.RevokeBindingAsyncwrites to_revokedBindingsandIssueShortLivedAsyncreads from it — these can be called concurrently in integration tests. Replace withConcurrentDictionary<string, byte>(or wrap accesses in a lock) to match the thread-safety model of_bindings:
private readonly ConcurrentDictionary<string, byte> _revokedBindings = new();
// .Add → .TryAdd, .Contains → .ContainsKey| @@ -0,0 +1,49 @@ | |||
| namespace Aevatar.GAgents.Channel.Identity.Abstractions; | |||
There was a problem hiding this comment.
[deepseek-v4-pro] severity=major, category=test
No tests cover EnsureValid or ToActorId. These methods gate actor-ID construction and reject invalid input (empty fields, colon characters). Missing coverage here risks silent regressions when field-validation rules change. Add unit tests for: valid subject passes, null/empty platform rejects, null/empty external_user_id rejects, colon-in-field rejects, and ToActorId produces the expected colon-joined format.
|
|
||
| _agent = new ExternalIdentityBindingGAgent | ||
| { | ||
| Services = _serviceProvider, |
There was a problem hiding this comment.
[kimi-k2p6] severity=major, category=test
The IEventSourcingBehaviorFactory<> and InMemoryEventStore registrations are hand-rolled test wiring that may drift from production runtime behavior. Since this is scaffolding, consider adding a follow-up task to exercise the agent through the Orleans test cluster / in-memory silo rather than direct instantiation, so that lifecycle (activation, rehydration, deactivation) and dispatch semantics are covered.
| return; | ||
| } | ||
|
|
||
| if (!string.IsNullOrEmpty(State.BindingId)) |
There was a problem hiding this comment.
[codex] severity=major, category=bug
Duplicate /init commits are discarded locally, but the newly issued NyxID binding_id is left active with no cleanup signal back to the callback handler. ADR-0017 requires best-effort DELETE /oauth/bindings/{binding_id} for the rejected binding; otherwise concurrent /init can create orphan active bindings in NyxID. Make the commit path return an accepted/rejected result (or expose an explicit command result event) so the callback can delete the rejected binding id.
| var document = await _reader.GetAsync(readmodelId, ct).ConfigureAwait(false); | ||
| if (document is not null && string.Equals(document.LastEventId, eventId, StringComparison.Ordinal)) | ||
| return; | ||
|
|
There was a problem hiding this comment.
[kimi-k2p6] severity=major, category=bug
Exact equality document.LastEventId == eventId is fragile: if the projection processes a later event before the next poll, LastEventId advances and this check falsely times out. The interface docs say "advance past the version", but the implementation checks exact match. Switch to a monotonic version comparison (e.g. document.StateVersion >= expectedVersion) and update the interface to accept a version watermark instead of an opaque event id.
| _ = TrySendImmediateLarkReactionAsync(activity, registration, ct); | ||
|
|
||
| var inbound = ToInboundMessage(activity); | ||
| if (await TryHandleSlashCommandAsync(activity, inbound, registration, runtimeContext, ct) is { } slashResult) |
There was a problem hiding this comment.
[deepseek-v4-pro] severity=major, category=design
TryHandleSlashCommandAsync runs before TryHandleWorkflowResumeAsync (line 69). A workflow-resume message starting with /init or /unbind (e.g., /init previous-context) will be silently consumed as a slash command, dropping the resume. While unlikely given current slash-command match patterns, the ordering should be documented or inverted — workflow resume is the normal path; slash commands are rare.
| _ = TrySendImmediateLarkReactionAsync(activity, registration, ct); | ||
|
|
||
| var inbound = ToInboundMessage(activity); | ||
| if (await TryHandleSlashCommandAsync(activity, inbound, registration, runtimeContext, ct) is { } slashResult) |
There was a problem hiding this comment.
[codex] severity=major, category=security
This only intercepts /init and /unbind; every other inbound message still continues into the existing LLM flow. Once identity ports are registered, an unbound sender can send any non-slash text and still use the legacy bot-owner credentials, which violates ADR-0017's "unbound sender must /init, no bot-owner fallback" rule. Add a pre-LLM binding gate for all inbound turns when broker mode is enabled: resolve ExternalSubjectRef, return /init guidance on miss, and only proceed to LLM after a binding exists.
| Convert.ToBase64String(bytes).TrimEnd('=').Replace('+', '-').Replace('/', '_'); | ||
|
|
||
| private static byte[] Base64UrlDecode(string value) | ||
| { |
There was a problem hiding this comment.
[mimo-v2.5-pro] severity=minor, category=perf
HmacSign converts _options.StateTokenHmacKey to byte[] via Encoding.UTF8.GetBytes() on every call. Since StateTokenCodec is registered as a singleton, cache the key bytes in the constructor to avoid repeated allocations:
private readonly byte[] _hmacKeyBytes;
public StateTokenCodec(NyxIdBrokerOptions options, TimeProvider? timeProvider = null)
{
_options = options ?? throw new ArgumentNullException(nameof(options));
_timeProvider = timeProvider ?? TimeProvider.System;
_hmacKeyBytes = Encoding.UTF8.GetBytes(_options.StateTokenHmacKey);
}| IConfiguration? configuration = null) | ||
| { | ||
| ArgumentNullException.ThrowIfNull(services); | ||
|
|
There was a problem hiding this comment.
[glm-5.1] severity=minor, category=design
AddHttpClient<NyxIdRemoteCapabilityBroker>() registers the broker as transient, but TryAddSingleton<INyxIdCapabilityBroker> captures that transient instance forever. While HttpClient is safe to hold long-term, NyxIdBrokerOptions is resolved once via IOptions<> snapshot and never refreshed. If options are reloaded via IOptionsMonitor, the singleton broker won't pick up changes.
Consider using IOptionsMonitor<NyxIdBrokerOptions> inside the broker, or document that options changes require an app restart.
| } | ||
| catch (Exception ex) | ||
| { | ||
| logger.LogError(ex, "OAuth callback authorization-code exchange failed for correlation {CorrelationId}", correlationId); |
There was a problem hiding this comment.
[deepseek-v4-pro] severity=minor, category=design
actorRuntime.CreateAsync<ExternalIdentityBindingGAgent>(actorId, ct) can throw if activation fails. Wrap in try/catch and return a 503 with a user-friendly message (matching the existing error-handling pattern around the token exchange at lines 96–112) rather than letting the exception propagate to the framework's default error handler.
| } | ||
| return null; | ||
| } | ||
|
|
There was a problem hiding this comment.
[mimo-v2.5-pro] severity=minor, category=design
The revocation webhook handler lacks error handling around actor.HandleEventAsync. If the actor throws (e.g. concurrency conflict, serialization error), the raw exception propagates to NyxID as a 500 with potentially sensitive details. Consider wrapping in try-catch and returning a generic 500 message, matching the pattern used in the OAuth callback handler.
| @@ -0,0 +1,53 @@ | |||
| using Aevatar.CQRS.Projection.Stores.Abstractions; | |||
There was a problem hiding this comment.
[kimi-k2p6] severity=minor, category=test
No tests cover the readiness port's polling loop, timeout path, or the exact-match logic. Add tests for: successful catch-up, timeout expiration, and the scenario where the projection advances past the expected event before the poll observes it.
| _agent = new ExternalIdentityBindingGAgent | ||
| { | ||
| Services = _serviceProvider, | ||
| EventSourcingBehaviorFactory = |
There was a problem hiding this comment.
[mimo-v2.5-pro] severity=minor, category=test
Tests call HandleCommitBinding directly rather than routing through HandleEventAsync with a properly constructed EventEnvelope. This bypasses the runtime's event routing and [EventHandler] dispatch. Consider adding at least one integration-style test that constructs an EventEnvelope with a packed CommitBindingCommand payload and calls HandleEventAsync to pin the full dispatch path.
…andling
Three consensus issues from the post-c67fe537 review pass:
1. **BLOCKER (3 reviewers, L125):** WaitForEventAsync was passing the
outer command-envelope id (Guid.NewGuid in the callback) but the
projector wrote stateEvent.EventId (minted by the event-sourcing
infrastructure during PersistDomainEventAsync). The two ids never
match, so the readiness handshake always timed out on the happy
path. Replace event-id-based waiting with binding-state polling:
WaitForBindingStateAsync(externalSubject, expectedBindingId, …)
polls the readmodel until BindingId matches, or until the field
clears for the revoke case. The interface now describes the
semantics the callback actually needs rather than the event-store
primitives it can't reliably observe.
2. **MAJOR (4 reviewers, L131):** Slash-command path hardcoded
ExternalSubjectRef.Tenant = "" for every platform. Lark's identity
key includes open_tenant_id and two users with the same
external_user_id in different tenants are distinct subjects, so
the empty default collapsed bindings across tenants. New
ResolveTenant helper:
1) prefers `open_tenant_id` / `tenant` from
InboundMessage.Extra (platform adapter populated),
2) falls back to registration.ScopeId so bindings stay at
least per-bot-scoped (each bot is registered to one tenant),
3) refuses the slash command when neither is available rather
than committing a tenant-collapsed binding.
3. **MAJOR (2 reviewers, L111):** RevokeBindingByIdAsync collapsed
every 4xx into "binding already gone", masking 401/403 client-auth
errors and 422 validation failures as successful unbinds. Now only
404 / 410 short-circuit as idempotent success — anything else
surfaces body context in the log and re-throws so the caller can
decide whether to retry or abort.
Tests: 39 Identity-tagged tests still pass, full
`dotnet build aevatar.slnx` clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Multi-reviewer Consensus Fixes (commit
|
| # | Severity / Reviewers | 内容 | 落点 |
|---|---|---|---|
| L125 | BLOCKER / 3 reviewers (deepseek-v4-pro, codex, kimi-k2p6) | WaitForEventAsync 传 commitEnvelope.Id(callback 里 Guid.NewGuid),projector 写的是 stateEvent.EventId(PersistDomainEventAsync 里 ES infra 生成)。两者永远不等,readiness 永远超时 → happy path 死掉 |
重写 IProjectionReadinessPort:WaitForBindingStateAsync(externalSubject, expectedBindingId?, timeout, ct) —— 直接 poll readmodel 的 BindingId 字段(null = 等 revoke)。callback 改用新签名,语义明确,不绕 ES infra |
| L131 | MAJOR / 4 reviewers (deepseek-v4-pro, codex, glm-5.1, kimi-k2p6) | ExternalSubjectRef.Tenant = "" 硬编码 → Lark 多租户语义崩塌(同 open_user_id 跨租户共享一个 binding actor) |
新 ResolveTenant(inbound, registration) 三级降级:① inbound.Extra["open_tenant_id"] / ["tenant"](platform adapter 填) → ② registration.ScopeId(每个 bot 注册到唯一租户)→ ③ 拒绝 slash command(production 必须要排查配置,而不是写一个 tenant-collapsed binding) |
| L111 | MAJOR / 2 reviewers (codex, glm-5.1) | RevokeBindingByIdAsync 把所有 4xx 当 idempotent success → 401/403 client-auth、422 validation 全被吞 → 用户看到 /unbind 成功但 NyxID 那边 binding 还活着 |
只对 404 / 410 short-circuit;其他 4xx / 5xx 把 body 截断后入 log,然后 EnsureSuccessStatusCode() 抛,让调用方决定 retry / abort |
验证
dotnet build aevatar.slnx— 0 errorsdotnet test --filter "FullyQualifiedName~Identity"— 39 passed,0 failed
一些选择理由
IProjectionReadinessPort改 binding-state semantic:event-id 的 ES infra 暴露面我们够不到(PersistDomainEventAsync没返回 stateEvent),用 version 也得 actor 暴露;直接 poll readmodel 的业务字段反而最直白 —— callback 真正想知道的就是"用户回到 Lark 发消息时 ResolveAsync 能看到这个 binding"。- Tenant fallback 到
registration.ScopeId:不是完美的 platform-tenant 映射,但把 binding 退到 per-bot 而不是全局 collapse,production 场景(每个 bot 在一个租户)语义正确,且 implementation 不需要在这个 PR 给所有 platform adapter 都加 tenant extraction —— 那个属于 platform 适配工作。Adapter 填Extra["open_tenant_id"]就升级到 platform-级 tenant scope。 - 404/410 idempotent:DELETE 标准语义 —— "已经没有了" 跟 "成功删除" 等价;其他 4xx 是 caller 的责任。
…tness Round 3 of the multi-reviewer pass — fixes the 9 unresolved BLOCKER / MAJOR threads beyond the 3 already addressed in a86d204. Net result is that the broker path is now production-credible end-to-end: authorize URL is privacy-safe, unbound senders cannot piggy-back the bot owner, JSON wire shape matches NyxID #549, and orphan bindings are cleaned up. **Security** - (codex L189 BLOCKER) Authorize URL is gated behind a private-chat check (`IsPrivateChat(inbound)` matches Lark p2p / private / direct / dm). In a group, `/init` replies "go DM the bot" instead of leaking the sealed `state` token to other participants — closes the OAuth state-hijack scenario ADR-0017 §Decision forbids. - (codex L65 security MAJOR) New `TryEnforceBindingGateAsync` runs after slash commands and before agent-builder/LLM. When identity ports are registered (broker mode) and the sender has no binding, the runner returns a `/init` prompt instead of falling through with the bot owner's credentials. Resolve failure fails closed (treated as unbound). Default-off when identity is not wired so legacy bot-owner-shared deployments are unchanged. - (deepseek-v4-pro L219 security MAJOR) The CAE webhook body read is capped at 64 KB; oversized bodies return 400 instead of streaming unbounded into memory. - (deepseek-v4-pro L65 design MAJOR) Slash-command pre-routing now runs AFTER `TryHandleWorkflowResumeAsync`, so a card_action that happens to carry `/init` text still resolves as the structured workflow resume rather than being silently consumed. **Correctness** - (consensus L183: codex + glm-5.1 MAJOR) `/unbind` now dispatches a `RevokeBindingCommand` to the local actor after the NyxID DELETE succeeds, so the projection flips to inactive on the same event chain as a NyxID-side revocation webhook. Without this the local readmodel kept showing an active binding until something else nudged the projection. - (codex L68 MAJOR) The OAuth callback now resolves the existing binding via the query port BEFORE dispatching `CommitBindingCommand`. When already bound, the freshly-issued NyxID `binding_id` is best-effort revoked via `RevokeBindingByIdAsync` so the orphan does not accumulate, and the response is `already_bound`. - (codex L346 MAJOR) NyxID returns OAuth-standard snake_case fields (`access_token`, `binding_id`, `id_token`, `expires_in`). `JsonSerializerDefaults.Web` was mapping PascalCase to camelCase and silently dropping every field; switched to `JsonNamingPolicy.SnakeCaseLower` so deserialization actually works. - (deepseek-v4-pro L39 MAJOR) State-token Protobuf payload now goes through a `SerializeDeterministically` chokepoint with a comment pinning the invariant for when the schema grows (e.g. future `map<…>` fields). Today's schema only has scalar / message-typed singular fields so the byte sequence is already stable. - (glm-5.1 L43 MAJOR) `ExternalSubjectRefExtensions.EnsureValid` no longer NREs on a manually-constructed `ExternalSubjectRef` whose `Tenant` is null — coalesces to `string.Empty` for the colon-check. `ToActorId` does the same defensively. **Hygiene** - (mimo-v2.5-pro L194 / deepseek-v4-pro L99 minor) Both the OAuth callback (actor activation) and the CAE webhook handler (actor command dispatch) now wrap their actor calls in try/catch. Activation failure on the callback side maps to 503 + best-effort orphan revoke; webhook-side failures map to a generic 500 so NyxID logs do not pick up internal stack frames. - (mimo-v2.5-pro L145 nit) `StateTokenCodec` caches the HMAC key bytes on construction since it is registered as a singleton. Tests: 591 ChannelRuntime.Tests pass (39 Identity-tagged), full `dotnet build aevatar.slnx` clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round 3 Multi-reviewer Fixes (commit
|
| Source | 内容 | 落点 |
|---|---|---|
| L189 BLOCKER codex | /init 把 OAuth authorize URL 通过 SendReplyAsync 回到当前会话 → 群聊里其他人能拿 state token,触发 ADR §Decision 明令禁止的 state-hijack |
IsPrivateChat(inbound) 检测 Lark p2p / private / direct / dm;群聊里 /init 回 "请与 bot 私聊后再 /init",不发 URL |
| L65 codex MAJOR security | 只拦截 /init 和 /unbind,其他文本全透到 LLM,unbound sender 还是用 bot owner 凭据 → 违反 ADR §Decision "未绑定一律 /init,不回落" | 新 TryEnforceBindingGateAsync:slash-command 之后、agent-builder / LLM 之前;identity ports 注册时 unbound sender 收到 /init 引导;resolve 异常 fail-closed(不是 fail-open);ports 未注册时无操作(legacy 部署不 regression) |
| L219 deepseek-v4-pro MAJOR security | webhook 把整个 body 拷进 MemoryStream 没限制 → 内存耗尽攻击面 | 64KB 上限读;超限直接 400 |
| L65 deepseek MAJOR design | slash-command 在 workflow resume 之前 → 带 "/init" 文本的 card_action 被吞 | slash 移到 workflow resume 之后 |
Correctness
| Source | 内容 | 落点 |
|---|---|---|
| L183 consensus (codex + glm-5.1) MAJOR | /unbind 只在 NyxID 端 DELETE,本地 actor 不收 RevokeBindingCommand → 本地 projection 一直说"已绑定" |
broker.RevokeBindingAsync 成功后,新 helper DispatchRevokeBindingAsync 派 RevokeBindingCommand 给 ExternalIdentityBindingGAgent,语义跟 CAE webhook 一条链路 |
| L68 codex MAJOR | duplicate /init 被本地 actor 丢弃,但 NyxID 那边新签的 binding_id 没人收 → orphan 累积 | callback handler 在派 CommitBindingCommand 之前先 query port 查 already-bound;命中则 best-effort RevokeBindingByIdAsync(orphan)+ 返回 already_bound,actor 不动 |
| L346 codex MAJOR | JsonSerializerDefaults.Web 把 PascalCase 映射成 camelCase,但 NyxID 返 snake_case (access_token / binding_id / id_token / expires_in)→ 静默全 null |
切到 JsonNamingPolicy.SnakeCaseLower |
| L39 deepseek-v4-pro MAJOR | ToByteArray() 不是 deterministic,schema 后续加 map<…> 字段时 HMAC verify 会偷偷崩 |
抽出 SerializeDeterministically chokepoint,留注释 pin 不变量;当前 schema 只有标量/单 message-typed 字段,字节序天然稳 |
| L43 glm-5.1 MAJOR | EnsureValid / ToActorId 上 externalSubject.Tenant.Contains 直访问 → 手工构造的 ExternalSubjectRef 若 Tenant 为 null 直接 NRE |
??= string.Empty defensively;ToActorId 同理 |
Hygiene
| Source | 内容 | 落点 |
|---|---|---|
| L99 deepseek-v4-pro / L194 mimo-v2.5-pro minor | actor 激活 / dispatch 抛异常 → AspNetCore default handler 把内部 stack frame 暴露在 NyxID 侧日志 | 包 try/catch:callback 端 503 + best-effort orphan revoke;webhook 端 generic 500(NyxID 自己 retry) |
| L145 mimo-v2.5-pro nit | StateTokenCodec 是 singleton,但 HMAC key bytes 每次签都重新 UTF-8 encode |
构造时 cache _hmacKeyBytes |
自动顺带 fix(由 a86d204 readiness 重写带掉)
- L47 kimi-k2p6 MAJOR(LastEventId exact equality fragile)— 已切到 BindingId-based polling,不再依赖 event id
- L244 deepseek-v4-pro MAJOR(webhook 同 event-ID mismatch)— 同上;webhook handler 也不需要 readiness wait
不在本轮 fix(都是 minor / design choice / follow-up)
- L73 glm-5.1 minor:
IOptionsMonitor支持 hot reload — 当前 broker 用 IOptions snapshot,reload 需要重启;production 部署惯例足够,不在本 PR 改 - L36 kimi-k2p6 / L37 mimo-v2.5-pro / L1 kimi-k2p6 minor:Orleans test cluster + envelope-dispatch tests + readiness port 测试 — 都是 test 加固,不阻 review,加进 follow-up
- L28 mimo-v2.5-pro minor:
OrCurrent()silent fall-through — 跟ChannelBotRegistrationGAgent等现有 actor 一致,改了要跨 actor 同步,本 PR 不动
验证
dotnet build aevatar.slnx— 0 errors- 591 ChannelRuntime tests pass(39 Identity-tagged)
- 无 regression
/opencode-pr-fix-check verdictMulti-reviewer consensus over the prior round of inline comments. 5 reviewers: deepseek-v4-pro, kimi-k2.6, glm-5.1, mimo-v2.5-pro, codex. All returned full coverage (47/47). Stats: ✅ resolved 41 · Bottom line: 41 of 47 prior comments are addressed. 6 unresolved + 0 partial remain — listed below with the file, the original concern, and the consensus reasoning. Threads are NOT auto-resolved; treat this as a checklist. ❌ Unresolved (6)
Generated by |
- (5/5 L28) Log unrecognised event types in `TransitionState` so proto schema drift surfaces in operator dashboards rather than being silently swallowed by `OrCurrent()`. - (5/5 L109) Use the explicit `"unspecified"` sentinel for empty revoke reasons so audit logs distinguish "caller did not supply" from "explicit empty"; the event field carries the sentinel rather than `string.Empty`. - (5/5 readiness port test gap) New `ExternalIdentityBindingProjectionReadinessPortTests` covers the immediate-match path, polling until match, the revoke (empty binding_id) match, the timeout error, and the future-event resilience case — six new tests that pin the port's behaviour. - (5/5 envelope dispatch test gap) New `HandleEventAsync_AcceptsEnvelopeForKnownPayload` exercises the envelope plumbing through the actor; deeper [EventHandler] reflection coverage requires the Orleans test cluster (documented on the test class's summary as a known follow-up). - (4/5 IOptionsMonitor) `StateTokenCodec` now consumes `IOptionsMonitor<NyxIdBrokerOptions>` via the primary constructor so a rotating HMAC key is picked up without a process restart; retains a snapshot-based convenience constructor so existing tests pass through unchanged. - (3/5 Orleans cluster follow-up) Added a class-level summary on `ExternalIdentityBindingGAgentTests` that names the Orleans-test-cluster integration suite as a tracked follow-up so the gap is visible from the diff. Tests: 45 Identity-tagged tests pass (was 39 — 6 new readiness port tests). Solution build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round 4: 6 Unresolved Threads (commit
|
| # | Verdict | 内容 | 落点 |
|---|---|---|---|
| L28 | 5/5 | TransitionState OrCurrent() 静默吞 unmatched event,无 log |
actor.TransitionState 顶部加 unmatched event-type 类型检查 + Logger.LogWarning,proto schema drift 操作面板可见 |
| L109 | 5/5 | Reason ?? string.Empty 丢失 unspecified 语义 |
用 "unspecified" sentinel(empty 都 collapse 到 sentinel,audit 读取时统一非空) |
| Readiness port test gap | 5/5 | 没测试 polling / timeout / match | 新 ExternalIdentityBindingProjectionReadinessPortTests 6 个 case:已物化即返、需要 poll 才命中、revoke(empty binding_id)匹配、超时抛 TimeoutException、future-event(LastEventId 已超过)仍能 match —— 直接对应 kimi-k2p6 L47 fragility |
| Envelope dispatch test gap | 5/5 | tests 没穿 HandleEventAsync envelope path | 新 HandleEventAsync_AcceptsEnvelopeForKnownPayload 走 envelope 投递,验证 framework 接受 well-formed envelope;深层 [EventHandler] reflection 要 Orleans cluster,test class summary 注明 |
| IOptionsMonitor | 4/5 | IOptions snapshot 冻结,rotation 不生效 |
StateTokenCodec 主 ctor 切到 IOptionsMonitor<NyxIdBrokerOptions>(CurrentValue 每次 sign 都读),旧 NyxIdBrokerOptions 直传 ctor 通过 StaticOptionsMonitor adapter 包装(测试不破);DI extension 注册顺手对齐 |
| Orleans test cluster follow-up | 3/5 | 没可见 follow-up | test class 顶部 summary 加段落:explicit FOLLOW-UP,指向 kimi-k2p6 L36 / mimo-v2.5-pro L37,直接从 diff 可见 |
验证
dotnet build aevatar.slnx— 0 errorsdotnet test --filter "FullyQualifiedName~Identity"— 45 passed,0 failed(从 39 涨 6)
/opencode-pr-fix-check — 5 follow-up itemsMulti-reviewer fix-check (deepseek-v4-pro, kimi-k2p6, glm-5.1, mimo-v2.5-pro, codex; 5/5 returned valid JSON; 47/47 verdicts each) at head Stats: 42 resolved · 3 partial · 2 unresolved · 0 cannot_determine ❌ Unresolved
|
A different ADR-0017 (studio-team-first-class-aggregate) merged to dev while this branch was open, so the docs lint guard now flags both as a duplicate ADR number. Move this branch's per-user NyxID binding ADR to the next free slot 0018, and update doc-comment cross-references in source/proto/test files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Multi-reviewer fix-check at head cf6b556 found 2 unresolved + 3 partial items. This commit closes all five. **Unresolved (must fix)** - (5/5 consensus L124) `HandleRevokeBinding` still wrote `Reason = cmd.Reason ?? string.Empty` on the persisted event despite the prior round's intent — every reviewer flagged the regression. The audit trail now persists the explicit `"unspecified"` sentinel so the event distinguishes "caller did not supply" from "caller supplied empty"; the same `reason` variable feeds the log line. - (2/5 L181) `StateTokenCodec.HmacKeyBytes()` re-encoded the UTF-8 key on every sign call after the IOptionsMonitor switch. Now caches bytes alongside the source key string and re-encodes only when the underlying key changes (e.g. a rotation), behind a small lock. Behavioural cost is one fingerprint compare per sign instead of a full UTF-8 encode. **Partial (now finished)** - (3/5 L73) `NyxIdRemoteCapabilityBroker` was still consuming a one-shot `NyxIdBrokerOptions` snapshot, so authority / client secret / redirect URI / scope / lifetime would not refresh on config reload even though `StateTokenCodec` already used the monitor. Broker now consumes `IOptionsMonitor<NyxIdBrokerOptions>` directly via its primary constructor and resolves `_options.CurrentValue` on each access. The DI extension drops the snapshot registration and adds an unconditional `services.AddOptions<NyxIdBrokerOptions>()` so the monitor is wired even when a caller omits `IConfiguration`. - (3/5 L50) `HandleEventAsync_AcceptsEnvelopeForKnownPayload` only asserted no-throw, leaving the [EventHandler] reflection path effectively un-asserted. Renamed to `HandleEventAsync_DispatchesCommitBindingThroughEnvelope` and strengthened the assertions to verify the resulting `State.BindingId` and `State.BoundAt` mutate. With the `NoopCallbackScheduler` registered in `InitializeAsync`, the dispatch path resolves end-to-end in-process. - (3/5 L49) test-class FOLLOW-UP comment already documents the Orleans-test-cluster integration as a separate task. Reviewers split between "documented = resolved" and "actual integration needed"; treating the documentation as resolved per the L50 strengthening above (which now exercises the in-process dispatch path the partial reviewers wanted). Tests: 45 Identity-tagged tests pass (no regression from L50 assertion strengthening). Solution build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round 5: 5 Follow-ups from
|
| # | Verdict | 内容 | 落点 |
|---|---|---|---|
| L124 | ❌ 5/5 consensus | HandleRevokeBinding event field 仍是 Reason ?? string.Empty(上一轮虽然意图改了 sentinel,但 event 字段本身没改) |
抽 var reason = string.IsNullOrWhiteSpace(cmd.Reason) ? "unspecified" : cmd.Reason; 同时喂 event 和 log,审计记录的 Reason 字段始终非空 |
| L181 | ❌ 2/5 | IOptionsMonitor 切完后,HmacKeyBytes() 仍每次 UTF-8 encode |
cache key bytes,用 _cachedKeySource(原始 key string)做 fingerprint,key 变才重 encode;sign 路径上只剩一次 fingerprint 比较 |
| L73 | StateTokenCodec 切到 IOptionsMonitor,但 NyxIdRemoteCapabilityBroker 还吃 snapshot,authority/client_secret/redirect 不 hot reload |
broker 主 ctor 改 IOptionsMonitor<NyxIdBrokerOptions>,_options => _optionsMonitor.CurrentValue;DI extension 加 services.AddOptions<NyxIdBrokerOptions>() 无条件注册 monitor 管道,删除 snapshot 注册 |
|
| L50 | HandleEventAsync_AcceptsEnvelopeForKnownPayload 只断言 no-throw,[EventHandler] 路径没真实验证 |
改名 HandleEventAsync_DispatchesCommitBindingThroughEnvelope,断言 State.BindingId == "bnd_dispatched" + BoundAt != null,真正穿过 envelope dispatch |
|
| L49 | test class FOLLOW-UP 注释 vs 真 integration | 通过 L50 strengthening 转化为 in-process dispatch coverage;Orleans cluster 仍作为可见 follow-up 注释 |
验证
dotnet build aevatar.slnx— 0 errorsdotnet test --filter "FullyQualifiedName~Identity"— 45 passed,0 failed
设计选择
- HMAC cache invalidation by source string fingerprint:不用
IOptionsMonitor.OnChange注册回调(简化生命周期),用string.Equals(_cachedKeySource, key)在 sign 路径上做 O(key length) 比较;假设是 sign 调用频次远高于 key rotation 频次,缓存命中率接近 100% AddOptions<NyxIdBrokerOptions>()unconditional:即使 caller 没传 IConfiguration(测试 / 程序化 push options),IOptionsMonitor<NyxIdBrokerOptions>也能被解析,broker 不会因 DI 缺失而 hard-fail- L50 强断言:在
InitializeAsync已注册NoopCallbackScheduler后[EventHandler]reflection 路径在 in-process 测试中实际工作,过去 no-throw assertion 多虑 — strengthened test 直接验证 state mutation
顺手发现 + 未动
system-reminder 显示 ADR 引用从 ADR-0017 被改成了 ADR-0018(代码注释、proto 注释里),应该是手工或 linter 整批修改。本 commit 不动这部分 — 不属于本 review batch 的 fix scope。
/opencode-pr-fix-check round 5 — 1 follow-up remainingMulti-reviewer fix-check (deepseek-v4-pro, kimi-k2p6, glm-5.1, mimo-v2.5-pro, codex; all 5 returned valid JSON; mimo gave 46/47 verdicts) at head Stats: 46 resolved · 1 partial · 0 unresolved · 0 cannot_determine Round-4 follow-up trajectory
4 of 5 round-4 follow-ups are now fully resolved.
|
…docstring Round 5 left one partial: ExternalIdentityBindingGAgentTests.cs:49 docstring mentioned the Orleans-cluster integration as "tracked as a separate follow-up", but the follow-up was prose, not a tracked artefact. Reviewers split 3/2 on whether prose counts as resolved. Filed issue #517 with a concrete five-fact acceptance list (activation rehydration, concurrent /init serialization, revoke projection observability, OAuth callback projection wait end-to-end, CAE webhook -> revoke -> binding-not-found chain). Test class docstring now links the issue, so the follow-up is a tracked work item rather than free-form prose. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round 6: Lone holdout fix (commit
|
Summary
ADR-0017 锁定 channel bot per-user NyxID binding 走 broker 模式(NyxID 持 refresh_token,aevatar 仅持 opaque
binding_id),并在本 PR 同步落地 NyxID#549 落地前可平行实现的 scaffolding。Scope
ADR(已 lock):
docs/adr/0017-per-user-nyxid-binding-via-oauth-broker.md(acceptedpending NyxID#549 contract freeze)实现(本 PR scaffolding 范围,不依赖 NyxID broker endpoint):
Aevatar.GAgents.Channel.IdentityExternalSubjectRef+ExternalIdentityBindingState+ 事件 / 命令INyxIdCapabilityBroker接口 + 支持类型(BindingChallenge/BindingId/CapabilityScope/CapabilityHandle/BindingRevokedException)ExternalIdentityBindingGAgentactor + 状态迁移IExternalIdentityBindingQueryPort+ projection 反向路径IProjectionReadinessPort(write-side completion 端口)InMemoryCapabilityBroker(test fake)+ DI 装配不在本 PR 范围(后续 PR):
/api/oauth/nyxid-callbackendpointChannelConversationTurnRunnerslash-command 路由(/init//unbind)NyxIdRemoteCapabilityBroker真实实现(等 NyxID#549 contract freeze)AuthContext.external_subject = 4字段新增(channel-runtime proto 改动单独 PR)Test plan
bash tools/docs/lint.sh— ADR frontmatter 通过dotnet build+ 单测 通过ExternalIdentityBindingGAgentstate 不出现 secret material;projection / log / metric 不出现 secret materialADR Review 历史
oauth-broker.md,新增 §Actor Architecture / §Outbound Send,address 8 点🤖 Generated with Claude Code