[codex] Add tool-driven Aevatar core invocation sources#830

Open: eanzhao wants to merge 6 commits into dev from feature/core-loop

@eanzhao eanzhao commented May 22, 2026

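Background

This PR is Stage 1 of ADR-0026: it repositions aevatar's core capabilities as LLM tool sources, so that the model decides through function calls when to use workflow, GAgent, team, readmodel observation, and similar capabilities, instead of the ingress layer continuing to maintain parallel routing dialects such as ForwardToGAgent / ForwardToTeam.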

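Changes

  • Add an ADR, docs/adr/0026-tool-first-chat-ingress.md, that defines the goals, boundaries, and later stages of tool-first chat ingress.
  • Add Aevatar.AI.ToolProviders.AevatarInvocation, which provides 5 invocation tools:
    • aevatar_invoke_gagent
    • aevatar_invoke_team
    • aevatar_start_workflow
    • aevatar_observe_run
    • aevatar_query_readmodel
  • Add a shared AevatarInvocationDispatcher that handles proto argument parsing, caller-scope injection, dispatch, readmodel queries, and structured error results in one place.
  • Generate strict JSON schemas from proto descriptors, so that core semantics are not pushed into an unconstrained JSON bag.
  • Register the tools in the Mainnet Host DI so that these tool sources enter the existing IAgentToolSource discovery chain.
  • Add Lark caller-scope regression tests proving that the Lark send tool uses the trusted AgentToolRequestContext.NyxIdAccessToken and that the payload / external metadata cannot override the caller's credentials.
  • Add a /v1/responses E2E test proving that an aevatar_invoke_gagent additive tool call emitted by the model goes through the tool loop and delivers an actor envelope via IActorDispatchPort, rather than taking the legacy ForwardToGAgent static invocation path.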
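
To make the "strict JSON schema" point concrete, here is a minimal language-neutral sketch in Python (the PR itself is C#). It assumes a hypothetical `FieldSpec` stand-in for a proto field descriptor and illustrative field names (`actor_id`, `prompt`, `wait`); none of these are taken from the actual generator, which is not shown on this page. The idea is only that a closed schema (`additionalProperties: false`) leaves no room for a free-form metadata bag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical stand-in for a proto field descriptor (assumption)."""
    name: str
    json_type: str            # e.g. "string", "integer", "boolean"
    required: bool = False
    enum: tuple = ()


def strict_schema(fields):
    """Build a closed JSON Schema object from the declared fields only."""
    properties = {}
    for field in fields:
        prop = {"type": field.json_type}
        if field.enum:
            prop["enum"] = list(field.enum)
        properties[field.name] = prop
    return {
        "type": "object",
        "properties": properties,
        "required": [f.name for f in fields if f.required],
        # Closed schema: keys that are not declared above are invalid.
        "additionalProperties": False,
    }


def unknown_keys(schema, payload):
    """Return the payload keys that the schema does not declare, sorted."""
    return sorted(set(payload) - set(schema["properties"]))


# Illustrative field list; the real tool payloads may differ.
INVOKE_GAGENT_FIELDS = [
    FieldSpec("actor_id", "string", required=True),
    FieldSpec("prompt", "string", required=True),
    FieldSpec("wait", "string", enum=("ack", "stream", "complete")),
]
SCHEMA = strict_schema(INVOKE_GAGENT_FIELDS)
```

With this shape, a payload that tries to smuggle an extra `metadata` key is detectable before dispatch, because `unknown_keys(SCHEMA, payload)` reports it.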

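Impact

  • This is the first step of the tool-driven core loop; it does not remove the existing legacy forward path.
  • wait=complete for GAgent / workflow still returns a structured wait_complete_unavailable result; the current stage supports ack / stream, and a later session actor / observation pipeline will take over continuation of long-running tasks.
  • aevatar_query_readmodel can only query a closed set of readmodels; it does not expose arbitrary document collections.
  • No external repositories such as NyxID or chrono-* were modified.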
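
The two guard rails above (wait-mode handling and the closed readmodel set) can be sketched as follows. This is a Python illustration under stated assumptions, not the PR's C# code: the readmodel names, the `invalid_wait` and `readmodel_not_registered` error codes, and the result shape are hypothetical. Only `wait_complete_unavailable`, the `ack` / `stream` / `complete` modes, and `stream` as the default for long-running tools come from the PR text.

```python
SUPPORTED_WAIT_MODES = ("ack", "stream", "complete")

# Hypothetical closed set; the registered readmodel names are not listed in the PR.
REGISTERED_READMODELS = frozenset({"run_summary", "team_roster"})


def error(code, message):
    """Structured error result (shape is an assumption)."""
    return {"ok": False, "error": {"code": code, "message": message}}


def resolve_wait(wait="stream"):
    """Validate the requested wait mode for a GAgent / workflow invocation."""
    if wait not in SUPPORTED_WAIT_MODES:
        return error("invalid_wait", f"unsupported wait mode: {wait}")
    if wait == "complete":
        # Stage 1 has no session actor to carry a long-running continuation.
        return error("wait_complete_unavailable",
                     "use wait=ack or wait=stream in this stage")
    return {"ok": True, "wait": wait}


def query_readmodel(name, store):
    """Look up a readmodel, refusing anything outside the registered set."""
    if name not in REGISTERED_READMODELS:
        return error("readmodel_not_registered", f"unknown readmodel: {name}")
    return {"ok": True, "document": store.get(name)}
```

Returning a structured error rather than raising keeps the failure visible to the model as an ordinary tool result, so it can retry with a supported mode.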

Verification

  • dotnet test test/Aevatar.AI.ToolProviders.AevatarInvocation.Tests/Aevatar.AI.ToolProviders.AevatarInvocation.Tests.csproj --nologo: passed (21 tests).
  • dotnet test test/Aevatar.AI.ToolProviders.Lark.Tests/Aevatar.AI.ToolProviders.Lark.Tests.csproj --nologo: passed (61 tests).
  • dotnet test test/Aevatar.Hosting.Tests/Aevatar.Hosting.Tests.csproj --filter FullyQualifiedName~PostResponses_StreamWithAevatarInvokeGAgentAdditiveTool_ShouldDispatchActorEnvelope --nologo: passed (1 test).
  • bash tools/ci/test_stability_guards.sh: passed.
  • bash tools/ci/architecture_guards.sh: passed.
  • git diff --check origin/dev..HEAD: passed.

Note: local test runs still show pre-existing NuGet source mapping / analyzer warnings; no tests failed.

eanzhao and others added 4 commits May 22, 2026 16:41
Records the architectural decision to collapse ChatRouteAction to
Reject + ForwardToModel, exposing GAgent/Team/Workflow invocation
as IAgentToolSource tools through the existing ToolCallLoop. Supersedes
ADR-0024 §D5 (v1 action set) and ADR-0025 (voice v1 ForwardToGAgent);
ADR-0024 D1/D2/D3/D4/D6 stand.

Tracked end-to-end in epic #808; voice GA prerequisite
in #809.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Implements ADR-0026 Stage 1 unit-1 (epic #808). New project
src/Aevatar.AI.ToolProviders.AevatarInvocation/ exposes aevatar_invoke_gagent /
_invoke_team / _start_workflow / _observe_run / _query_readmodel as
IAgentToolSource, so the LLM can drive orchestration through the existing
ToolCallLoop instead of parallel router branches.

Design:
- Tool payloads are proto-derived strict JSON-Schema (no map<string,string> bags)
- wait=ack|stream|complete supported; stream is default for long-running tools;
  GAgent/workflow wait=complete returns wait_complete_unavailable until Stage 2
  session actor lands
- Caller scope flows through AgentToolRequestContext only; protected caller-scope
  keys (LLMRequestMetadataKeys.*) are stripped from LLM-supplied payload.headers
  before server values are stamped, so the LLM cannot inject overrides for
  nyxid.access_token / scope_id / owner_subject etc.
- query_readmodel is bounded to a closed registered set
- Dispatch reuses existing surfaces (IActorDispatchPort,
  ITeamEntryMemberResolver + IStaticGAgentStreamInvocationPort<AGUIEvent>,
  ICommandDispatchService<WorkflowChatRunRequest,...>); no new dispatch chain
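
The caller-scope rule in the design notes above (strip protected keys from LLM-supplied headers, then stamp server values) can be sketched like this. It is a Python illustration rather than the C# implementation: the exact key strings, in particular `aevatar.owner_subject`, and the case-insensitive match are assumptions standing in for `LLMRequestMetadataKeys.*`.

```python
# Assumed key strings; the real set lives in LLMRequestMetadataKeys.
PROTECTED_KEYS = frozenset({
    "nyxid.access_token",
    "aevatar.scope_id",
    "aevatar.owner_subject",
})


def stamp_caller_scope(llm_headers, trusted):
    """Return headers where protected keys come only from the trusted context."""
    # 1. Drop every protected key the model supplied (case-insensitive match
    #    is an assumption made for this sketch).
    cleaned = {
        key: value
        for key, value in llm_headers.items()
        if key.lower() not in PROTECTED_KEYS
    }
    # 2. Stamp the server-side values last, and only for protected keys.
    cleaned.update({k: v for k, v in trusted.items() if k in PROTECTED_KEYS})
    return cleaned


trusted_context = {"nyxid.access_token": "trusted-token", "aevatar.scope_id": "scope-1"}
malicious_headers = {
    "nyxid.access_token": "stolen",
    "Aevatar.Scope_Id": "other-scope",
    "aevatar.owner_subject": "someone-else",
    "x-trace": "t1",
}
stamped = stamp_caller_scope(malicious_headers, trusted_context)
```

Stripping before stamping matters: a plain overwrite would leave `aevatar.owner_subject` in place whenever the trusted context has no value for it, whereas here a protected key can only survive if the server supplied it.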

21 tests pass (4 credential-injection regression + 1 ObserveRun fast-fail
added in post-review hardening); arch_guards + test_stability + docs lint all
PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Implements ADR-0026 Stage 1 unit-2 (D7 prerequisite) for the Lark
outbound caller-scope guarantee. After auditing the existing path
(LarkMessagesSendTool → LarkNyxClient → NyxIdApiClient) no production
refactor was required: the tool already reads
AgentToolRequestContext.NyxIdAccessToken (no credential parameters) and
forwards the caller bearer through NyxID's api-lark-bot proxy, which
exchanges to a Lark tenant_access_token without seeing the caller's
authorization header. The metadata-bag credential-injection surface that
unit-1 had to harden is structurally absent here (no headers/metadata
bag at the dispatch boundary).

Added 2 regression tests:
- Asserts the dispatched NyxID call carries AgentToolRequestContext's
  trusted typed NyxIdAccessToken
- Asserts a malicious LLM payload (smuggled nyx_id_access_token, fake
  headers, ExternalMetadata overriding LLMRequestMetadataKeys.NyxIdAccessToken)
  cannot override the trusted caller token at dispatch

NyxID investigation summary (verified via gh against ChronoAIProject/NyxID
backend source): /api/v1/proxy/s/api-lark-bot/open-apis/im/v1/messages
accepts only the caller's NyxID bearer; NyxID resolves caller's
api-lark-bot binding, exchanges {app_id, app_secret} → tenant_access_token
per channel_adapters/lark.rs::lark_family_token_exchange_config, strips
the inbound authorization, and injects bearer for outbound to Lark.
Semantic: messages post as the caller's bound Lark bot (NyxID-mediated),
not as the human user's OAuth identity and not as Aevatar's service-level
identity. This satisfies ADR-0026 §D7's "lands in the caller's Lark
account" use case.

61/61 tests pass; arch_guards + test_stability + docs lint all PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes ADR-0026 Stage 1 (epic #808). Integration test
demonstrates the new tool-first ingress path works end-to-end after
units 1+2 landed, without touching any production code.

Test: MainnetResponsesEndpointsTests.PostResponses_StreamWithAevatarInvokeGAgentAdditiveTool_ShouldDispatchActorEnvelope

Scenario:
- /v1/responses streamed request with real DI registration of unit-1's
  AddAevatarInvocationTools (5 production IAgentToolSource instances)
- Stubbed LLM emits aevatar_invoke_gagent tool call with a malicious
  payload that smuggles nyxid.access_token + aevatar.scope_id overrides
- ResponsesCompletionApplicationService executes the local tool call
  inline (not as function_call SSE output — verified against production
  StreamAsync behavior)
- AevatarInvocationDispatcher dispatches through IActorDispatchPort
  (captured by RecordingActorDispatchPort)
- LLM round 2 continues after tool result, SSE lifecycle completes

Assertions:
- Dispatched envelope's Route.PublisherActorId == DirectGAgentPublisherId
- Dispatched ChatRequestEvent.Headers carry the trusted bearer/scope
  (caller-scope protection from unit-1 verified end-to-end)
- ThrowingStaticGAgentStreamInvocationPort.InvocationCount == 0
  (the legacy ForwardToGAgent/ForwardToTeam path in
  ResponsesEndpoints.cs:779-927 is NOT entered)
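
The assertion pattern above pairs a recording double with a throwing one. A minimal Python sketch of that pattern follows; the class names mirror the test doubles named in this commit, but the method names, the `handle_tool_call` helper, and the payload fields are hypothetical and do not reproduce the C# test.

```python
class RecordingDispatchPort:
    """Test double that records every dispatched envelope."""

    def __init__(self):
        self.envelopes = []

    def dispatch(self, actor_id, envelope):
        self.envelopes.append((actor_id, envelope))


class ThrowingLegacyPort:
    """Test double that fails loudly if the legacy path is ever entered."""

    def __init__(self):
        self.invocation_count = 0

    def invoke(self, *args, **kwargs):
        self.invocation_count += 1
        raise AssertionError("legacy forward path must not be entered")


def handle_tool_call(name, payload, dispatch_port):
    """Hypothetical tool-loop step: route the tool call to the dispatch port."""
    if name != "aevatar_invoke_gagent":
        return {"ok": False, "error": {"code": "unknown_tool"}}
    dispatch_port.dispatch(payload["actor_id"], {"prompt": payload["prompt"]})
    return {"ok": True, "status": "accepted"}


recording = RecordingDispatchPort()
legacy = ThrowingLegacyPort()  # wired into the host, expected to stay untouched
result = handle_tool_call(
    "aevatar_invoke_gagent",
    {"actor_id": "agent-1", "prompt": "hello"},
    recording,
)
```

Counting invocations on the throwing double gives a positive proof that the old path was skipped, which a passing dispatch assertion alone would not show.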

202/202 tests pass in Aevatar.Hosting.Tests; arch_guards +
test_stability_guards + docs lint all PASS. No production code changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@eanzhao eanzhao marked this pull request as ready for review May 22, 2026 17:19

codecov Bot commented May 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.07%. Comparing base (fccb80d) to head (853a8a1).

@@            Coverage Diff             @@
##              dev     #830      +/-   ##
==========================================
+ Coverage   83.06%   83.07%   +0.01%     
==========================================
  Files         981      981              
  Lines       61936    61936              
  Branches     8069     8069              
==========================================
+ Hits        51447    51454       +7     
+ Misses       7009     6996      -13     
- Partials     3480     3486       +6     
Flag Coverage Δ
ci 83.07% <ø> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.
see 2 files with indirect coverage changes

