Commit f9af684

feat: remote permission msg (#1404)
* feat(remote): add tool interactions
* fix(remote): update tg prompt immediately
* fix(remote): refine remote interactions
1 parent a05c3f4 commit f9af684

23 files changed, +3181 -124 lines changed
Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@
# Remote Tool Interactions Plan

## Summary

Implement a structured remote interaction loop for Telegram and Feishu so remote endpoints can resolve paused permission and question interactions without falling back to a generic desktop-only notice. The feature stays inside Electron main and reuses the existing `RemoteConversationRunner`, `RemoteCommandRouter`, `FeishuCommandRouter`, and `newAgentPresenter.respondToolInteraction(...)` flow.

## Goals

- Expose `RemoteConversationSnapshot.pendingInteraction` as the canonical paused-interaction state for remote delivery.
- Preserve the current detached-session and bound-endpoint model without adding renderer IPC.
- Let Telegram resolve interactions with inline buttons plus text fallback.
- Let Feishu render interaction cards and fall back to complete plain-text prompts when card delivery fails.
- Keep command/session state safe while an interaction is unresolved.

## Readiness

- No open clarification items remain.
- The feature is ready for implementation and regression verification.

## Rollout Steps

1. Extend remote snapshot and runner contracts to surface `pendingInteraction`.
2. Parse assistant `tool_call_permission` and `question_request` blocks into a shared `RemotePendingInteraction` model.
3. Gate remote command routing around pending interactions and add `/pending`.
4. Add Telegram-specific rendering, callback token state, callback refresh, and text fallback.
5. Add Feishu-specific card rendering, text fallback, and inbound text parsing.
6. Add regression coverage for runner extraction, callback refresh, prompt resend, and channel-specific prompt delivery.
7. Update spec artifacts so acceptance, rollout, and compatibility are reviewable without tracing code.
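
Step 2 could look roughly like the sketch below. It is an illustration only, not the runner's actual code: the `AssistantBlock` shape, its `status` field, and the function name `extractPendingInteraction` are assumptions. Only the action type names `tool_call_permission` and `question_request` come from this plan.

```typescript
// Hypothetical block shape; the real assistant block type may differ.
type ActionType = 'tool_call_permission' | 'question_request'

interface AssistantBlock {
  type: 'content' | 'action'
  actionType?: ActionType
  status?: 'pending' | 'resolved' // assumed status field
  toolCallId?: string
  toolName?: string
}

interface PendingInteraction {
  kind: 'permission' | 'question'
  messageId: string
  toolCallId: string
  toolName: string
}

// Returns the first unresolved action block of a message, or null when
// nothing is waiting on the user.
function extractPendingInteraction(
  messageId: string,
  blocks: AssistantBlock[]
): PendingInteraction | null {
  for (const block of blocks) {
    if (block.type !== 'action' || block.status !== 'pending') continue
    if (!block.actionType || !block.toolCallId) continue
    return {
      kind: block.actionType === 'tool_call_permission' ? 'permission' : 'question',
      messageId,
      toolCallId: block.toolCallId,
      toolName: block.toolName ?? 'unknown'
    }
  }
  return null
}
```

Returning only the first pending block is one way to let chained interactions on the same assistant message surface one at a time.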

## Dependencies

- `RemoteConversationSnapshot.pendingInteraction` in `RemoteConversationRunner`
- `newAgentPresenter.respondToolInteraction(...)`
- Existing Telegram outbound edit/send flows in `TelegramPoller`
- Existing Feishu outbound text flow extended with card sending in `FeishuRuntime`
- In-memory callback/token state in `RemoteBindingStore`

## Data And API Changes

- `RemoteConversationSnapshot`
  - Add `pendingInteraction: RemotePendingInteraction | null`
  - Preserve `text` and `completed` semantics so remote delivery can send visible text plus a follow-up interaction prompt
- `RemoteRunnerStatus`
  - Add `pendingInteraction`
  - Suppress `isGenerating` while the assistant is explicitly waiting on user action
- `RemotePendingInteraction`
  - Include `messageId`, `toolCallId`, `toolName`, `toolArgs`
  - Include permission metadata for `tool_call_permission`
  - Include question metadata for `question_request`
- `RemoteCommandRouteResult` / `FeishuCommandRouteResult`
  - Allow outbound interaction prompt actions in addition to normal replies/conversation execution
53+
## Telegram Rendering Behavior
54+
55+
- Permission interactions render a dedicated prompt with inline `Allow` / `Deny` buttons.
56+
- Single-choice questions render inline option buttons and `Other` when custom text is allowed.
57+
- `question.multiple === true` does not render fake multi-select buttons and instead instructs the user to reply in plain text.
58+
- Text fallback accepts:
59+
- `ALLOW` / `DENY` for permissions
60+
- Exact numeric replies for question options
61+
- Exact option labels for question options
62+
- Custom text when allowed
63+
- Expired callback tokens do not hard-fail if the interaction still exists; the router re-reads the current pending interaction and refreshes the prompt.
64+
- After a button press, Telegram edits the original prompt into a resolved state immediately, then continues any deferred execution in the background.
65+
66+
## Feishu Rendering Behavior
67+
68+
- Pending interactions render as interactive-card style outbound messages when the card API succeeds.
69+
- Card fallback uses the full plain-text prompt, not only a short reply hint, so the user still sees permission/question details.
70+
- Feishu remains text-response only on the inbound side:
71+
- `ALLOW` / `DENY` for permissions
72+
- Exact numeric replies for question options
73+
- Exact option labels for question options
74+
- Custom text when allowed
75+
- `question.multiple === true` always uses plain-text answers.
76+
77+
## Command Gating While Waiting
78+
79+
- Blocked commands while a pending interaction exists:
80+
- `/new`
81+
- `/use`
82+
- `/model`
83+
- Unrelated plain-text new-turn input
84+
- Allowed commands while a pending interaction exists:
85+
- `/help`
86+
- `/status`
87+
- `/open`
88+
- `/pending`
89+
- `/pending` re-sends the current prompt for the endpoint-bound session.
90+
91+
## Migration And Compatibility
92+
93+
- `RemoteConversationSnapshot.pendingInteraction` is additive and does not require a persisted config migration.
94+
- Existing Telegram and Feishu bindings remain valid.
95+
- Existing remote sessions continue to use detached session creation and the same runner/session binding path.
96+
- Telegram keeps inline-button interaction handling; Feishu does not introduce public callback endpoints.
97+
- The former generic "Desktop confirmation is required" message becomes a fallback path only, not the primary remote behavior.
98+
99+
## Risks And Mitigations
100+
101+
- Stale callback tokens
102+
- Mitigation: rebind tokens to `endpointKey + messageId + toolCallId` and refresh prompts when the current interaction still matches.
103+
- Session drift while waiting
104+
- Mitigation: block `/new`, `/use`, `/model`, and unrelated plain-text turns until the interaction is resolved.
105+
- Feishu card delivery failures
106+
- Mitigation: fall back to the full plain-text prompt and keep inbound parsing text-only.
107+
- Telegram callback latency
108+
- Mitigation: edit the prompt immediately and run continuation work off the poll loop.
109+
110+
## Test Strategy
111+
112+
- Runner tests
113+
- Extract `pendingInteraction` from assistant action blocks
114+
- Resume after tool interaction response
115+
- Handle chained interactions on the same assistant message
116+
- Telegram tests
117+
- Button callbacks and text fallback
118+
- Expired callback token refresh
119+
- `/pending` prompt resend
120+
- Prompt edit timing and non-blocking deferred continuation
121+
- Feishu tests
122+
- Card prompt generation
123+
- Plain-text fallback content
124+
- Text parsing for permission/question answers
125+
- Pending command gating and `/pending`
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
# Remote Tool Interactions

## Summary

Extend remote control so Telegram and Feishu can surface structured pending tool interactions instead of collapsing them into a generic desktop-only notice. Remote users must be able to resolve permission requests and `user ask` style questions from the chat channel itself, while the desktop app keeps the existing agent execution and permission backends.

## User Stories

- As a Telegram remote user, I can approve or deny a tool permission request directly from inline buttons.
- As a Telegram remote user, I can answer a pending question by tapping an option or replying with text when custom input is allowed.
- As a Feishu remote user, I can see a clear card-style prompt for a pending permission or question and reply with a supported text answer.
- As a desktop user, I do not lose remote session continuity when a tool interaction pauses the assistant.
- As a paired remote user, I can ask the bot to re-show the current pending interaction without opening the desktop app.

## Acceptance Criteria

- `RemoteConversationSnapshot` includes `pendingInteraction` with structured `permission` or `question` data when the latest assistant message is waiting on user action.
- Remote delivery no longer relies on the generic "Desktop confirmation is required" path as the primary behavior.
- Telegram pending permission prompts render inline `Allow` and `Deny` buttons and also accept `ALLOW` / `DENY` text replies.
- Telegram single-choice question prompts render inline option buttons and an `Other` button when custom answers are allowed.
- Telegram multi-answer questions do not render fake multi-select buttons and instruct the user to reply with plain text.
- Expired Telegram interaction callback tokens refresh the prompt when the underlying pending interaction still exists.
- Feishu pending prompts render as interactive-card style outbound messages when possible and fall back to plain text when card delivery fails.
- Feishu accepts `ALLOW` / `DENY`, option numbers, exact option labels, and custom text according to the pending question shape.
- `/pending` re-sends the current prompt for both Telegram and Feishu.
- While a pending interaction exists, `/new`, `/use`, `/model`, and plain new-turn messages are blocked from creating unrelated session state changes.
- `/help`, `/status`, `/open`, and `/pending` remain available while a pending interaction exists.
- Existing remote pairing, binding, `/open`, `/status`, and normal non-interaction conversations continue to work.

## Constraints

- Keep all logic in Electron main; do not add a new renderer IPC surface for this feature.
- Telegram continues to use callback-query buttons; Feishu does not introduce a public HTTP callback service for card clicks.
- Remote bot copy remains English in this increment.
- Each endpoint only resolves the first pending interaction for its bound session at a time.

## Non-Goals

- Feishu clickable approval callbacks.
- Locale negotiation for remote bot messages.
- Arbitrary rich remote workflows beyond permission requests and question requests.

## Compatibility

- Existing Telegram and Feishu bindings remain valid.
- Existing remote sessions continue to use `RemoteConversationRunner` and detached session creation.
- Structured pending interaction handling is additive and only changes how remote channels render and answer paused assistant states.
Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
# Remote Tool Interactions Tasks

## Readiness

- No open clarification items remain.
- All tasks below map back to the acceptance criteria in [spec.md](./spec.md).

## T0 Spec Artifacts

- [x] Create and align `spec.md`, `plan.md`, and `tasks.md`
- Owner: Remote control maintainer
- Estimate: 0.5d
- Acceptance Criteria:
  - Spec acceptance criteria for `pendingInteraction`, channel rendering, `/pending`, and command gating are explicitly represented in the plan/tasks artifacts.
  - No unresolved clarification markers remain before the work is marked ready.

## T1 Remote Snapshot And API Changes

- [x] Extend `RemoteConversationSnapshot` with `pendingInteraction`
- [x] Extend runner status to expose `pendingInteraction`
- [x] Parse assistant `tool_call_permission` and `question_request` blocks into `RemotePendingInteraction`
- Owner: Electron main
- Estimate: 1d
- Acceptance Criteria:
  - Satisfies spec acceptance criteria for structured `pendingInteraction`.
  - Remote delivery no longer depends on the generic desktop confirmation notice as the primary state.

## T2 Electron Main Integration

- [x] Add `RemoteConversationRunner.getPendingInteraction()`
- [x] Add `RemoteConversationRunner.respondToPendingInteraction()`
- [x] Continue polling the same assistant message after tool interaction responses
- Owner: Electron main
- Estimate: 1d
- Acceptance Criteria:
  - Satisfies spec acceptance criteria for remote session continuity during paused interactions.
  - Chained interactions can surface one at a time without losing the bound session.

## T3 Telegram Buttons, Callback Handling, And Text Fallback

- [x] Render permission prompts with `Allow` / `Deny` inline buttons
- [x] Render single-choice question prompts with option buttons and `Other` when custom input is allowed
- [x] Parse `ALLOW` / `DENY`, exact numeric replies, exact labels, and custom text as appropriate
- [x] Edit the original Telegram prompt into a resolved state immediately after button selection
- Owner: Telegram remote
- Estimate: 1.5d
- Acceptance Criteria:
  - Satisfies spec acceptance criteria for Telegram permission buttons, single-choice buttons, and text fallback.
  - `question.multiple === true` stays plain-text only.
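
The button layout for T3 might be built as in this sketch. The `inline_keyboard` / `callback_data` structure follows the Telegram Bot API's `InlineKeyboardMarkup`; everything else is assumed for illustration, including the `TelegramPrompt` shape, the function name `buildKeyboard`, and the `token:action` format of the callback data.

```typescript
// Minimal subset of the Telegram Bot API inline keyboard types.
interface InlineKeyboardButton {
  text: string
  callback_data: string // Telegram limits this to 1-64 bytes
}

interface InlineKeyboardMarkup {
  inline_keyboard: InlineKeyboardButton[][]
}

interface TelegramPrompt {
  kind: 'permission' | 'question'
  options: string[]
  allowCustom: boolean
  multiple: boolean
}

// Returns the keyboard for a prompt, or null when the prompt must be
// answered in plain text (multi-answer questions).
function buildKeyboard(prompt: TelegramPrompt, token: string): InlineKeyboardMarkup | null {
  if (prompt.kind === 'permission') {
    return {
      inline_keyboard: [
        [
          { text: 'Allow', callback_data: `${token}:allow` },
          { text: 'Deny', callback_data: `${token}:deny` }
        ]
      ]
    }
  }

  if (prompt.multiple) return null

  // One option per row; the index, not the label, goes into callback_data
  // so long labels cannot exceed the size limit.
  const rows: InlineKeyboardButton[][] = prompt.options.map((label, i) => [
    { text: label, callback_data: `${token}:opt:${i}` }
  ])
  if (prompt.allowCustom) rows.push([{ text: 'Other', callback_data: `${token}:other` }])
  return { inline_keyboard: rows }
}
```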

## T4 Feishu Card Rendering And Full Plain-Text Fallback

- [x] Render pending interactions as Feishu card-style outbound actions
- [x] Fall back to the complete plain-text prompt when card delivery fails
- [x] Parse `ALLOW` / `DENY`, exact numeric replies, exact labels, and custom text as appropriate
- Owner: Feishu remote
- Estimate: 1d
- Acceptance Criteria:
  - Satisfies spec acceptance criteria for Feishu card rendering and fallback behavior.
  - Card failure still preserves permission/question details in the fallback message.

## T5 Token Refresh And Expired Callback Recovery

- [x] Store Telegram pending interaction callback tokens in `RemoteBindingStore`
- [x] Refresh the pending prompt when an expired callback token is used and the interaction still exists
- Owner: Telegram remote
- Estimate: 0.5d
- Acceptance Criteria:
  - Satisfies spec acceptance criteria for expired Telegram callback token refresh.
  - Prompt refresh only succeeds when `endpointKey`, `messageId`, and `toolCallId` still match.

## T6 Pending Prompt Re-Send And Command Gating

- [x] Add `/pending` for Telegram and Feishu
- [x] Block `/new`, `/use`, `/model`, and unrelated plain-text turns while waiting
- [x] Keep `/help`, `/status`, `/open`, and `/pending` available while waiting
- Owner: Remote router
- Estimate: 0.5d
- Acceptance Criteria:
  - Satisfies spec acceptance criteria for `/pending`.
  - Satisfies spec acceptance criteria for blocked and allowed commands while waiting.

## T7 Tests

- [x] Add runner tests for extraction and follow-up execution
- [x] Add Telegram tests for callback handling, `/pending`, prompt refresh, and non-blocking continuation
- [x] Add Feishu tests for text parsing and fallback behavior
- [x] Add binding/token lifecycle tests
- Owner: QA + Electron main
- Estimate: 1d
- Acceptance Criteria:
  - Test coverage maps to the acceptance criteria in `spec.md`.
  - Regressions in pairing, binding, `/open`, `/status`, and normal non-interaction flows are covered by targeted tests.

## T8 Documentation And Review Notes

- [x] Document compatibility, rollout behavior, and command gating in `plan.md`
- [x] Keep the feature scope explicit: Telegram buttons, Feishu cards, no Feishu callback endpoint
- Owner: Remote control maintainer
- Estimate: 0.5d
- Acceptance Criteria:
  - Reviewers can understand rollout steps, dependencies, and compatibility notes without reading implementation files.
  - The blocked commands list and allowed commands list match the implemented router behavior and `spec.md`.

package.json

Lines changed: 2 additions & 2 deletions
@@ -73,7 +73,7 @@
     "@larksuiteoapi/node-sdk": "^1.60.0",
     "@modelcontextprotocol/sdk": "^1.28.0",
     "axios": "^1.13.6",
-    "better-sqlite3-multiple-ciphers": "12.4.1",
+    "better-sqlite3-multiple-ciphers": "12.8.0",
     "cheerio": "^1.2.0",
     "chokidar": "^5.0.0",
     "compare-versions": "^6.1.1",
@@ -148,7 +148,7 @@
     "clsx": "^2.1.1",
     "cross-env": "^10.1.0",
     "dayjs": "^1.11.19",
-    "electron": "^37.10.3",
+    "electron": "^39.8.5",
     "electron-builder": "26.0.12",
     "electron-vite": "^4.0.1",
     "jsdom": "^26.1.0",

src/main/presenter/remoteControlPresenter/feishu/feishuClient.ts

Lines changed: 31 additions & 1 deletion
@@ -1,6 +1,6 @@
 import * as Lark from '@larksuiteoapi/node-sdk'
 import type { EventHandles } from '@larksuiteoapi/node-sdk'
-import type { FeishuTransportTarget } from '../types'
+import type { FeishuInteractiveCardPayload, FeishuTransportTarget } from '../types'
 
 const FEISHU_OUTBOUND_TEXT_LIMIT = 8_000
 
@@ -18,6 +18,8 @@ const createTextPayload = (text: string): string =>
     text
   })
 
+const createCardPayload = (card: FeishuInteractiveCardPayload): string => JSON.stringify(card)
+
 const chunkFeishuText = (text: string): string[] => {
   const normalized = text.trim() || '(No text output)'
   if (normalized.length <= FEISHU_OUTBOUND_TEXT_LIMIT) {
@@ -157,4 +159,32 @@ export class FeishuClient {
       })
     }
   }
+
+  async sendCard(target: FeishuTransportTarget, card: FeishuInteractiveCardPayload): Promise<void> {
+    const content = createCardPayload(card)
+    if (target.replyToMessageId) {
+      await this.sdk.im.message.reply({
+        path: {
+          message_id: target.replyToMessageId
+        },
+        data: {
+          content,
+          msg_type: 'interactive',
+          reply_in_thread: Boolean(target.threadId)
+        }
+      })
+      return
+    }
+
+    await this.sdk.im.message.create({
+      params: {
+        receive_id_type: 'chat_id'
+      },
+      data: {
+        receive_id: target.chatId,
+        msg_type: 'interactive',
+        content
+      }
+    })
+  }
 }
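
For context, a card passed to `sendCard` might be assembled as in the sketch below. The real `FeishuInteractiveCardPayload` type is not shown in this diff, so the local `CardPayload` type and the helper `buildInteractionCard` are hypothetical. The layout follows the Feishu message card JSON format (`config`, `header`, `elements`) as I understand it, and should be checked against the Feishu documentation before use.

```typescript
// Hypothetical stand-in for FeishuInteractiveCardPayload.
interface CardTextElement {
  tag: 'div'
  text: { tag: 'lark_md'; content: string }
}

interface CardPayload {
  config: { wide_screen_mode: boolean }
  header: {
    title: { tag: 'plain_text'; content: string }
    template: string
  }
  elements: CardTextElement[]
}

// Builds a read-only card: details first, then the text reply instructions,
// since inbound answers stay text-only and the card carries no buttons.
function buildInteractionCard(title: string, body: string, replyHint: string): CardPayload {
  return {
    config: { wide_screen_mode: true },
    header: {
      title: { tag: 'plain_text', content: title },
      template: 'orange'
    },
    elements: [
      { tag: 'div', text: { tag: 'lark_md', content: body } },
      { tag: 'div', text: { tag: 'lark_md', content: replyHint } }
    ]
  }
}

// sendCard serializes the card with JSON.stringify before sending it.
const exampleCard = buildInteractionCard(
  'Permission required',
  'Tool `shell` wants to run: ls -la',
  'Reply ALLOW or DENY.'
)
const exampleContent = JSON.stringify(exampleCard)
```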
