Lift Lark identifier-ladder retry to ILarkOutboundDispatcher boundary #415

@eanzhao

Filed as the third architectural follow-up from the PR #412 long-form review. Issues #408 (typed OutboundTarget proto sub-message) and #414 (DRY TrySendWithFallbackAsync between SkillRunner and FeishuCardHumanInteractionPort) cover the other two; this one captures the third.

Problem

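In PR #412, the logic that retries against a fallback target on Lark error 230002 (bot not in chat) lives in two actor/port-side call sites: SkillRunnerGAgent.SendOutputAsync and FeishuCardHumanInteractionPort.SendMessageAsync.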

Both call sites today know:

  1. What identifier classes Lark accepts (chat_id / union_id / open_id).
  2. The ordering invariant (chat_id is most specific cross-app cross-tenant, union_id second, open_id last).
  3. Which Lark error code triggers a fallback: 230002 (bot not in chat).
  4. That 99992361 (open_id cross app) and 99992364 (user id cross tenant) are NOT retryable and should propagate with actionable hints.

That platform-specific knowledge should not live in actor/port code, per the CLAUDE.md rule "Actor 即业务实体" ("an actor is a business entity"): actors should only know "send this content to this conversation". The identifier ladder is a transport-layer dispatch concern.

Proposed shape

Introduce an ILarkOutboundDispatcher boundary owned by ChannelRuntime infrastructure:

internal interface ILarkOutboundDispatcher
{
    Task<LarkSendResult> SendAsync(
        LarkOutboundEnvelope envelope,    // typed targets (primary + ordered fallbacks) + payload
        CancellationToken ct);
}

The dispatcher internally:

  • Walks the persisted target list (primary → fallback → fallback…) with the platform's known retry policy (currently: try next on 230002).
  • Surfaces unrecoverable Lark codes (99992361, 99992364) with the right last_error text built once.
  • Carries error / lark code / Nyx HTTP status back as a typed LarkSendResult so callers (SkillRunner, FeishuCardHumanInteractionPort) just observe success/failure without re-implementing the ladder.
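The three behaviors above can be sketched roughly as follows. This is a minimal illustration and not the final design: only ILarkOutboundDispatcher, LarkOutboundEnvelope, and LarkSendResult are named in this issue. The record shapes, the LarkTarget type, the LarkSendAttempt delegate, and the hint strings are assumptions made so the sketch compiles on its own.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shapes: the issue names these types but does not define their members.
internal sealed record LarkTarget(string IdType, string Id);
internal sealed record LarkOutboundEnvelope(IReadOnlyList<LarkTarget> Targets, string Payload);
internal sealed record LarkSendResult(bool Succeeded, int? LarkCode, int? HttpStatus, string? LastError);

internal interface ILarkOutboundDispatcher
{
    Task<LarkSendResult> SendAsync(LarkOutboundEnvelope envelope, CancellationToken ct);
}

// Hypothetical transport seam: one send attempt against exactly one target.
internal delegate Task<LarkSendResult> LarkSendAttempt(
    LarkTarget target, string payload, CancellationToken ct);

internal sealed class LarkOutboundDispatcher : ILarkOutboundDispatcher
{
    private const int BotNotInChat = 230002;        // retryable: try the next target
    private const int OpenIdCrossApp = 99992361;    // not retryable
    private const int UserIdCrossTenant = 99992364; // not retryable

    private readonly LarkSendAttempt _attempt;

    public LarkOutboundDispatcher(LarkSendAttempt attempt) => _attempt = attempt;

    public async Task<LarkSendResult> SendAsync(LarkOutboundEnvelope envelope, CancellationToken ct)
    {
        var last = new LarkSendResult(false, null, null, "envelope has no targets");

        // Walk primary -> fallback -> fallback... in the persisted order.
        foreach (var target in envelope.Targets)
        {
            ct.ThrowIfCancellationRequested();
            last = await _attempt(target, envelope.Payload, ct);

            if (last.Succeeded) return last;

            // Only 230002 moves down the ladder; every other failure stops here.
            if (last.LarkCode != BotNotInChat) return WithHint(last);
        }

        // Ladder exhausted: report the last failure seen.
        return last;
    }

    // Builds the last_error text once, in one place. The hint wording is a placeholder.
    private static LarkSendResult WithHint(LarkSendResult r) => r.LarkCode switch
    {
        OpenIdCrossApp => r with { LastError = "open_id cross app (placeholder hint)" },
        UserIdCrossTenant => r with { LastError = "user id cross tenant (placeholder hint)" },
        _ => r,
    };
}
```

Passing the single-attempt send in as a delegate is only a convenience for this sketch. It keeps the ladder testable without the Nyx HTTP client, which is not shown in this issue.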

Once the dispatcher exists:

  • SkillRunnerGAgent.SendOutputAsync and FeishuCardHumanInteractionPort.SendMessageAsync reduce to "build envelope + dispatch + map result" — no LarkBotErrorCodes.BotNotInChat strings, no TrySendWithFallbackAsync, no per-call-site fallback log lines that drift.
  • Adding a third identifier (open_id as final fallback) is one place, not two.
  • Adding new retryable Lark codes or a new platform (Telegram retryable codes when Telegram outbound lands) is one place, not two.
  • tools/ci/architecture_guards.sh can grep for LarkBotErrorCodes references in non-dispatcher code as a lint.
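As a rough illustration of the "build envelope + dispatch + map result" shape, a call site might end up looking like the sketch below. The type members, the DeliveryOutcome record, and the SkillRunnerCallSite helper are hypothetical stand-ins so the example is self-contained; the real actor state and method signatures are not shown in this issue.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins: the issue names these types but not their members.
internal sealed record LarkTarget(string IdType, string Id);
internal sealed record LarkOutboundEnvelope(IReadOnlyList<LarkTarget> Targets, string Payload);
internal sealed record LarkSendResult(bool Succeeded, int? LarkCode, int? HttpStatus, string? LastError);

internal interface ILarkOutboundDispatcher
{
    Task<LarkSendResult> SendAsync(LarkOutboundEnvelope envelope, CancellationToken ct);
}

// Hypothetical outcome the actor records; no Lark error codes appear at this layer.
internal sealed record DeliveryOutcome(bool Delivered, string? LastError);

internal static class SkillRunnerCallSite
{
    public static async Task<DeliveryOutcome> SendOutputAsync(
        ILarkOutboundDispatcher dispatcher,
        IReadOnlyList<LarkTarget> persistedTargets,
        string output,
        CancellationToken ct)
    {
        // 1. Build the envelope from the persisted, already ordered targets.
        var envelope = new LarkOutboundEnvelope(persistedTargets, output);

        // 2. Dispatch; the ladder and retry policy live behind the interface.
        var result = await dispatcher.SendAsync(envelope, ct);

        // 3. Map the typed result onto what the actor cares about.
        return new DeliveryOutcome(result.Succeeded, result.LastError);
    }
}
```

Under these assumptions the call site contains no LarkBotErrorCodes reference at all, which is the property the proposed architecture guard would check for.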

Dependencies

Out of scope

  • No behavior change. Dispatcher-side retry policy and error mapping must match today's call-site logic exactly. Pin with the existing SkillRunnerGAgent + FeishuCardHumanInteractionPort test suites.

References
