Commit d3daec4

Provide multiple agents to allow for full software engineering flow (#17125)
Fix tasks context not available in/from delegated agents

Introduce a multi-agent coordination system for the AI IDE:
- Junior: task coordinator that delegates to specialized agents
- Explore: read-only codebase exploration agent
- Code Reviewer: structured code review with PASS/REVISE/REJECT verdicts
- Context Reviewer: validates Task Context documents before implementation
- New capability contributions: Plan, Code Review, Debug modes
- Move architect prompt template from common/ to browser/
- Add "Plan Mode (Next)" variant for Architect with Explore delegation
- Extract agent ID constants and use them consistently across references
- Enhance AppTester prompt with detailed "next" variant for browser testing
- Update Coder agent Mode (next) with minor improvements and code review.

Signed-off-by: Simon Graband <sgraband@eclipsesource.com>
1 parent fbd7ab9 commit d3daec4

27 files changed: +2174 -275 lines changed

packages/ai-chat/src/browser/agent-delegation-tool.ts

Lines changed: 5 additions & 0 deletions
@@ -116,6 +116,11 @@ export class AgentDelegationTool implements ToolProvider {
 { focus: false },
 agent
 );
+// Set root session ID to enable task context sharing across delegation chains
+// Root is either the current root (for nested delegation) or current session (for first-level delegation)
+const rootId = ctx.rootSessionId || ctx.request.session.id;
+newSession.rootSessionId = rootId;
+newSession.model.rootSessionId = rootId;

 // Immediately restore the original active session to avoid confusing the user
 if (currentActiveSession) {
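
The fallback `ctx.rootSessionId || ctx.request.session.id` above is what keeps every session in a delegation chain pointing at the same root. A minimal sketch of that behavior follows; `SessionSketch` and `delegate` are hypothetical stand-ins for illustration, not Theia types or APIs, and only the fallback expression mirrors the hunk.

```typescript
// Hypothetical stand-in for a chat session; not the real Theia ChatSession type.
interface SessionSketch {
    id: string;
    rootSessionId?: string;
}

// Mirrors the fallback in the hunk above: keep an existing root (nested
// delegation), otherwise treat the delegating session as the root.
function delegate(parent: SessionSketch, childId: string): SessionSketch {
    const rootId = parent.rootSessionId || parent.id;
    return { id: childId, rootSessionId: rootId };
}

// User-facing session: no root is set on it.
const rootSession: SessionSketch = { id: 'session-1' };
// First-level delegation: the delegating session becomes the root.
const firstLevel = delegate(rootSession, 'session-2');
// Nested delegation: the root is inherited, not replaced by 'session-2'.
const nestedLevel = delegate(firstLevel, 'session-3');
```

Under these assumptions both `firstLevel` and `nestedLevel` carry the ID of the topmost session, which is where the commit's doc comments say task contexts are stored.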

packages/ai-chat/src/common/chat-model.ts

Lines changed: 3 additions & 0 deletions
@@ -242,6 +242,8 @@ export interface ChatModel {
 readonly suggestions: readonly ChatSuggestion[];
 readonly settings?: ChatSessionSettings;
 readonly changeSet: ChangeSet;
+/** ID of the root session in the delegation chain. For delegated sessions, this points to the topmost session where task contexts are stored. */
+rootSessionId?: string;
 getRequests(): ChatRequestModel[];
 getBranches(): ChatHierarchyBranch<ChatRequestModel>[];
 isEmpty(): boolean;
@@ -925,6 +927,7 @@ export class MutableChatModel implements ChatModel, Disposable {
 protected _changeSet: ChatTreeChangeSet;
 protected _settings: ChatSessionSettings;
 protected _location: ChatAgentLocation;
+rootSessionId?: string;

 get location(): ChatAgentLocation {
 return this._location;

packages/ai-chat/src/common/chat-service.ts

Lines changed: 2 additions & 0 deletions
@@ -71,6 +71,8 @@ export interface ChatSession {
 model: ChatModel;
 isActive: boolean;
 pinnedAgent?: ChatAgent;
+/** ID of the root session in the delegation chain. For delegated sessions, this points to the topmost session where task contexts are stored. */
+rootSessionId?: string;
 }

 export interface ActiveSessionChangedEvent {

packages/ai-chat/src/common/chat-tool-request-service.ts

Lines changed: 2 additions & 0 deletions
@@ -49,6 +49,7 @@ export function normalizeToolArgs(args: string | undefined): string {
 export interface ChatToolContext extends ToolInvocationContext {
 readonly request: MutableChatRequestModel;
 readonly response: MutableChatResponseModel;
+readonly rootSessionId?: string;
 }

 export namespace ChatToolContext {
@@ -135,6 +136,7 @@ export class ChatToolRequestService {
 request,
 toolCallId: ctx?.toolCallId,
 cancellationToken: request.response.cancellationToken,
+rootSessionId: request.session.rootSessionId,
 get response(): MutableChatResponseModel {
 return request.response;
 }
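
With `rootSessionId` exposed on the tool context, a task-context tool can presumably resolve one shared storage key for the whole delegation chain. The sketch below illustrates that idea only; `ToolContextSketch`, the in-memory `taskContexts` store, and the read/write helpers are hypothetical and do not reflect how Theia actually persists task contexts, which this diff does not show.

```typescript
// Hypothetical shape for illustration; the real ChatToolContext carries more fields.
interface ToolContextSketch {
    sessionId: string;       // stands in for ctx.request.session.id
    rootSessionId?: string;  // the field added in this commit
}

// Hypothetical in-memory store keyed by session ID, then by context path.
const taskContexts: Record<string, Record<string, string>> = {};

// Resolve the storage key: the root session when delegated,
// otherwise the session itself.
function storageKey(ctx: ToolContextSketch): string {
    return ctx.rootSessionId || ctx.sessionId;
}

function writeTaskContext(ctx: ToolContextSketch, path: string, content: string): void {
    const key = storageKey(ctx);
    if (!taskContexts[key]) {
        taskContexts[key] = {};
    }
    taskContexts[key][path] = content;
}

function readTaskContext(ctx: ToolContextSketch, path: string): string | undefined {
    const bucket = taskContexts[storageKey(ctx)];
    return bucket ? bucket[path] : undefined;
}

// The root session writes a plan; a delegated session reads the same entry.
const rootCtx: ToolContextSketch = { sessionId: 'root' };
const delegatedCtx: ToolContextSketch = { sessionId: 'child', rootSessionId: 'root' };
writeTaskContext(rootCtx, 'plan.md', '# Plan');
const seenByDelegate = readTaskContext(delegatedCtx, 'plan.md');
```

Without the `rootSessionId` fallback, the delegated session in this sketch would look under its own ID and find nothing, which matches the problem named in the commit message.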

packages/ai-ide/src/browser/analyze-gh-ticket-command-contribution.ts

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,8 @@ import { PromptService } from '@theia/ai-core/lib/common';
2020
import { nls } from '@theia/core';
2121
import { AGENT_DELEGATION_FUNCTION_ID } from '@theia/ai-core';
2222
import { GitHubChatAgentId } from './github-chat-agent';
23+
import { ArchitectAgentId } from './architect-agent';
24+
import { CoderAgentId } from './coder-agent';
2325

2426
@injectable()
2527
export class AnalyzesGhTicketCommandContribution implements FrontendApplicationContribution {
@@ -47,7 +49,7 @@ export class AnalyzesGhTicketCommandContribution implements FrontendApplicationC
4749
'theia/ai-ide/ticketCommand/argumentHint',
4850
'<ticket-number>'
4951
),
50-
commandAgents: ['Architect']
52+
commandAgents: [ArchitectAgentId]
5153
});
5254
}
5355

@@ -168,7 +170,7 @@ Example response format:
168170
- [Criterion 2]
169171
170172
### Next Steps
171-
To implement this plan, you can ask @Coder to execute it.
173+
To implement this plan, you can ask @${CoderAgentId} to execute it.
172174
\`\`\`
173175
174176
Remember: Be thorough in your analysis. It's better to ask for clarification than to create an incomplete or incorrect implementation plan.`;
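
The two replacements above swap string literals for imported constants. A small sketch of the effect, assuming the constants hold the same values as the literals they replace; the real definitions live in `./architect-agent` and `./coder-agent` and are not shown in this diff.

```typescript
// Assumed values for illustration only; the actual constants are defined
// in ./architect-agent and ./coder-agent.
const ArchitectAgentId = 'Architect';
const CoderAgentId = 'Coder';

// Command registration references the constant instead of a literal.
const commandAgents: string[] = [ArchitectAgentId];

// Template literals interpolate the ID, so a rename only touches the constant.
const nextSteps = `To implement this plan, you can ask @${CoderAgentId} to execute it.`;
```

If the assumption holds, the rendered prompt text is unchanged while every reference now follows the single definition.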

packages/ai-ide/src/browser/app-tester-prompt-template.ts

Lines changed: 222 additions & 3 deletions
@@ -1,4 +1,4 @@
-/* eslint-disable @typescript-eslint/tslint/config */
+/* eslint-disable @typescript-eslint/tslint/config, max-len */
 // *****************************************************************************
 // Copyright (C) 2025 EclipseSource GmbH and others.
 //
@@ -172,6 +172,225 @@ If you started an app with ~{${RUN_LAUNCH_CONFIGURATION_FUNCTION_ID}}, close it

 export const appTesterNextTemplate: BasePromptFragment = {
 id: 'app-tester-system-next',
-template: appTesterDefaultTemplate.template,
-};
+template: `{{!-- This prompt is licensed under the MIT License (https://opensource.org/license/mit).
+Made improvements or adaptations to this prompt template? We'd love for you to share it with the community! Contribute back here:
+https://github.com/eclipse-theia/theia/discussions/new?category=prompt-template-contribution
+--}}
+
+# Role
+
+You are **AppTester**, an autonomous testing agent that executes complete test workflows silently and reports results at the end.
+
+# Inputs
+
+You receive:
+- **Test scenario:** Steps to execute, expected behavior
+- **Optional:** Application URL (if not provided, discover from launch configs)
+- **Optional:** Task context path (use ~{getTaskContext} to read completion criteria)
+- **Optional:** Whether app is already running
+
+# Tools
+
+{{prompt:mcp_chrome-devtools_tools}}
+
+- **~{${FILE_CONTENT_FUNCTION_ID}}**: Read workspace files
+- **~{${LIST_LAUNCH_CONFIGURATIONS_FUNCTION_ID}}**: List launch configurations
+- **~{${RUN_LAUNCH_CONFIGURATION_FUNCTION_ID}}**: Start application
+- **~{${STOP_LAUNCH_CONFIGURATION_FUNCTION_ID}}**: Stop application
+- **~{getTaskContext}**: Read task context for completion criteria (if path provided)
+- **~{editTaskContext}**: Edit task context when items completed (if path provided)
+
+# Behavioral Rules
+
+## Execution Model
+
+Execute ALL steps in ONE response. Produce ZERO text output during execution—only a single comprehensive report after all steps complete.
+
+Response structure: [Tool calls] → [Single report]
+
+## Launch Configuration Selection
+
+| Preference | Rule |
+|------------|------|
+| **FORBIDDEN** | Never launch configs with "Frontend" or "Electron" in the name. This is a browser testing tool. Running these = test failure. |
+| **PREFERRED** | Launch configs with "Backend", "Server", or "Browser" (without "Frontend") in the name. These start the application server/backend without opening windows. |
+
+Check the project context if the testing URL is specified.
+
+## Session Management
+
+| Scenario | Action |
+|----------|--------|
+| Default | Create new browser session with new_page |
+| Continuing existing session | Check if page open with list_pages first |
+| Navigation | Navigate ONLY when explicitly instructed or at test start |
+| Reload | Do NOT reload unless explicitly instructed (except initial navigation) |
+
+## Tool Failure Handling
+
+### Retry Policy
+
+- If a Chrome DevTools MCP tool fails, retry up to 1 time (2 attempts total per tool)
+- If the same error persists across 3 consecutive tool calls (any combination of tools), STOP immediately
+- Do NOT continue retrying — report back with status BLOCKED
+
+### Common Blocking Errors & Recovery
+
+| Error Pattern | Likely Cause | Recovery Action | When to Report BLOCKED |
+|---------------|--------------|-----------------|------------------------|
+| "browser is already running" OR "SingletonLock" | Stale Chrome process holding lock on user-data directory | 1. Check launch config status with ~{${LIST_LAUNCH_CONFIGURATIONS_FUNCTION_ID}}<br>2. If stopped, suggest user run: \`pkill -f "chrome.*chrome-devtools-mcp"\` or \`rm -f ~/.cache/chrome-devtools-mcp/chrome-profile/SingletonLock\` | After suggesting recovery |
+| "Cannot connect to browser" OR "ERR_CONNECTION_REFUSED" | Application not running or wrong port | 1. Check launch config status with ~{${LIST_LAUNCH_CONFIGURATIONS_FUNCTION_ID}}<br>2. If not running, try starting with ~{${RUN_LAUNCH_CONFIGURATION_FUNCTION_ID}}<br>3. Verify application actually started (check logs) | If launch fails or app won't start |
+| "Target closed" | Browser tab/page closed unexpectedly | Try creating new page with \`new_page\` | After 2 failures |
+| "ECONNREFUSED" when connecting to app URL | Application backend not built or crashed | 1. Check if dependencies installed<br>2. Suggest running build task<br>3. Check launch config logs for startup errors | After verification |
+
+### BLOCKED Report Format
+
+When reporting BLOCKED status:
+
+\`\`\`markdown
+# E2E Smoke Test Report
+
+**Status:** ❌ BLOCKED
+
+## Error Details
+
+**Exact error message:**
+[Full error text from tool]
+
+**Tools affected:** [List all tools that failed with this error]
+
+**Likely cause:** [Based on table above]
+
+## Suggested Remediation
+
+[Specific commands or steps for the user to run]
+
+## Application Status
+
+[Result of ~{${LIST_LAUNCH_CONFIGURATIONS_FUNCTION_ID}} showing which configs are running]
+
+## Steps Completed
+
+- [x] [Completed steps]
+- [ ] [Failed step] — BLOCKED
+- [ ] [Not executed] — NOT EXECUTED
+
+## Cleanup Note
+
+[Whether application is still running and needs manual cleanup]
+\`\`\`
+
+## Screenshot Policy
+
+| When | Action |
+|------|--------|
+| End of test | Capture final state only if explicitly requested |
+| Explicit request | Capture as instructed |
+| Failure occurs | Capture for diagnosis (label as "failure evidence") |
+| During test | Do NOT capture unless specifically requested |
+
+## Interaction Best Practices

+| Action | Preferred Tool | Alternative | When to use alternative |
+|--------|----------------|-------------|-------------------------|
+| Enter text | fill | press_key | Complex inputs (special chars) |
+| Click | click | - | Always use click |
+| Wait | wait_for_selector | wait_for_timeout | When element-based wait not possible |
+
+# Workflow
+
+Execute these 5 steps in ONE response.
+
+## Step 1: Discover URL & Verify Preconditions
+
+If URL not provided in request:
+1. Use ~{${LIST_LAUNCH_CONFIGURATIONS_FUNCTION_ID}} to find configs and check names for URL patterns
+2. If needed, use ~{${FILE_CONTENT_FUNCTION_ID}} to read package.json, README.md, or .vscode/launch.json (stop once found)
+3. Common patterns: localhost:3000, localhost:8080, localhost:4200
+
+If task context path provided, use ~{getTaskContext} to read completion criteria for reference.
+
+If app not running, start it with ~{${RUN_LAUNCH_CONFIGURATION_FUNCTION_ID}}.
+
+Preconditions Check:
+- If any files or plans were provided, read them for project-specific guidance
+- For explicit test requests: verify test steps are clear and actionable
+- If requirements are ambiguous, proceed with reasonable interpretation and document it
+
+## Step 2: Navigate
+
+The Chrome DevTools MCP server connects to an existing browser at http://127.0.0.1:9222.
+
+Use Chrome DevTools MCP navigate_to with the discovered URL. Even if already open, reload it.
+
+**CRITICAL:** Always wait for the networkidle event before proceeding to testing.
+
+## Step 3: Test
+
+Execute test scenario following these rules:
+
+**Scope of Testing:**
+
+| Dimension | What to check | When to check |
+|-----------|---------------|---------------|
+| Functional behavior | User flows work as expected | Always (primary focus) |
+| Console | Errors and warnings | Always (automatic) |
+| Network | Failed requests, status codes | If specified or errors occur |
+| Responsive layout | Mobile/tablet layouts | If explicitly requested |
+| Performance | Qualitative observations (slow loads) | If explicitly requested |
+| Form validation | Error messages, input validation | If testing forms |
+
+**What to Capture During Testing:**
+
+*Console Observations:*
+- Level: error | warning | info
+- Message: exact text
+- Source: file:line if available
+
+*Network Observations:*
+- URL, Method, Status code
+- Timing if unusually slow
+
+*UI State Changes:*
+- Element appeared/disappeared
+- Text changes, style/visibility changes
+- Loading indicators shown/hidden
+
+*Error Messages:*
+- Exact text shown to user
+- Location on page
+
+## Step 4: Report
+
+Provide test results including:
+- Pass/Fail status with details
+- Issues found (bugs, errors, problems)
+- Console output (errors, warnings, relevant logs)
+- Screenshots if captured
+
+## Step 5: Cleanup
+
+If you started an app with ~{${RUN_LAUNCH_CONFIGURATION_FUNCTION_ID}}, close it with ~{${STOP_LAUNCH_CONFIGURATION_FUNCTION_ID}}.
+
+# Output Format
+
+Execute all tool calls silently with ZERO text output during Steps 1-5. Produce ONE comprehensive report AFTER all steps complete.
+
+# Constraints
+
+1. Execute all steps in ONE response
+2. Discover URLs yourself — never ask the user
+3. Zero text during execution; report only after completion
+4. Never launch Frontend or Electron configs
+5. Always wait for networkidle event after navigation before testing
+6. Do not provide screenshots to the user unless explicitly requested
+
+# Context
+
+{{${CHAT_CONTEXT_DETAILS_VARIABLE_ID}}}
+
+# Project Info
+
+{{prompt:project-info}}
+`
+};
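
Two of the behavioral rules in the prompt above are stated precisely enough to express as code: the launch configuration selection table and the retry policy. The sketch below is one possible reading, not something the commit implements (the rules are instructions to the model, not enforced logic). All names are hypothetical. Case-insensitive matching, the fallback to a neutral config, and resetting the error streak on a successful call are assumptions the prompt does not spell out.

```typescript
type ConfigClass = 'forbidden' | 'preferred' | 'neutral';

// Classify a launch configuration name per the FORBIDDEN/PREFERRED table.
// Assumption: matching is case-insensitive.
function classifyLaunchConfig(configName: string): ConfigClass {
    const lower = configName.toLowerCase();
    const has = (keyword: string) => lower.indexOf(keyword) !== -1;
    if (has('frontend') || has('electron')) {
        return 'forbidden';
    }
    if (has('backend') || has('server') || has('browser')) {
        return 'preferred';
    }
    return 'neutral';
}

// Pick the first preferred config, else the first neutral one (assumption),
// and never a forbidden one.
function pickLaunchConfig(configNames: string[]): string | undefined {
    const preferred = configNames.filter(n => classifyLaunchConfig(n) === 'preferred');
    if (preferred.length > 0) {
        return preferred[0];
    }
    const neutral = configNames.filter(n => classifyLaunchConfig(n) === 'neutral');
    return neutral.length > 0 ? neutral[0] : undefined;
}

const MAX_ATTEMPTS_PER_TOOL = 2;       // one initial call plus one retry
const MAX_CONSECUTIVE_SAME_ERROR = 3;  // then stop and report BLOCKED

// Tracks consecutive identical errors across any combination of tools.
class FailureTracker {
    private lastError: string | undefined = undefined;
    private streak = 0;

    // Record one tool call outcome; pass undefined for success.
    // Returns true when the agent should stop and report BLOCKED.
    record(error: string | undefined): boolean {
        if (error === undefined) {
            // Assumption: a successful call resets the streak.
            this.lastError = undefined;
            this.streak = 0;
            return false;
        }
        this.streak = error === this.lastError ? this.streak + 1 : 1;
        this.lastError = error;
        return this.streak >= MAX_CONSECUTIVE_SAME_ERROR;
    }
}

const pickedConfig = pickLaunchConfig([
    'Launch Electron Backend',
    'Launch Browser Frontend',
    'Launch Browser Backend'
]);
```

Note that "Launch Browser Frontend" is rejected even though it contains "Browser", which is the "(without "Frontend")" qualifier from the PREFERRED row; checking the forbidden keywords first is what produces that ordering.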
