Return harness session IDs from spawns#974
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds sessionId metadata across broker spawn flows and SDKs, pre-creates resumable Codex threads via JSON‑RPC, includes sessionId in broker events and HTTP replies, and surfaces sessionId on SDK agent handles and spawn results. ChangesSession ID support across broker and SDKs
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 911c0b6bce
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| crate::codex_session::create_resumable_codex_thread( | ||
| &resolved_cli, | ||
| cwd, | ||
| &self.worker_env, | ||
| ) |
There was a problem hiding this comment.
Pass effective Codex args into thread bootstrap
The pre-created thread is started with only resolved_cli, cwd, and env, so inline Codex options already parsed into effective_args (for example --profile, --config, or --cwd) are ignored during thread/start. The spawned PTY then runs Codex with those options plus resume <thread_id>, which can point at a different profile/config context than the one used to mint the thread and causes resume failures or incorrect session context.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Addressed in 8350f42. create_resumable_codex_thread now accepts cli_args and forwards the --profile, --config / -c, --cd / --cwd, --sandbox / -s, --ask-for-approval / --approval-policy, --dangerously-bypass-approvals-and-sandbox, and --full-auto flags to the app-server bootstrap so the pre-created thread runs under the same context the PTY will resume into.
Generated by Claude Code
| .await | ||
| .with_context(|| { | ||
| format!( | ||
| "failed to create resumable Codex session for '{}'", | ||
| spec.name | ||
| ) | ||
| })?; |
There was a problem hiding this comment.
Fall back when Codex session precreation fails
Any failure in create_resumable_codex_thread(...) is propagated with ?, which aborts agent spawn before the PTY is launched. That turns optional session metadata into a hard dependency: if codex app-server is unavailable, slow to start, or temporarily errors, Codex spawns now fail entirely instead of starting without sessionId as they did before this change.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Addressed in 8350f42. The ? propagation was replaced with a match that logs a warning and falls through without setting session_id, so a flaky/unavailable codex app-server no longer aborts the agent spawn.
Generated by Claude Code
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/sdk-py/src/agent_relay/relay.py (1)
598-613:⚠️ Potential issue | 🟠 Major | ⚡ Quick winBackfill
session_idfor existing handles inlist_agents.On Line [601], existing handles are returned directly, so Line [609] never updates
session_idfor already-known agents. This can leave stale/missing session metadata.Proposed fix
async def list_agents(self) -> list[Agent]: client = await self._ensure_started() raw_list = await client.list_agents() agents = [] for entry in raw_list: name = entry.get("name", "") - existing = self._known_agents.get(name) - if existing: - agents.append(existing) - else: - agent = Agent( - name=name, - runtime=entry.get("runtime", "pty"), - channels=entry.get("channels", []), - relay=self, - session_id=entry.get("sessionId"), - ) - self._known_agents[name] = agent - agents.append(agent) + agent = self._ensure_agent_handle( + name, + entry.get("runtime", "pty"), + entry.get("channels", []), + session_id=entry.get("sessionId"), + ) + agents.append(agent) return agents🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/sdk-py/src/agent_relay/relay.py` around lines 598 - 613, The loop in list_agents returns existing Agent instances from self._known_agents without updating their session_id, leaving stale/missing session metadata; modify the branch where existing = self._known_agents.get(name) to set/update existing.session_id (using entry.get("sessionId")) before appending to agents so known Agent objects are backfilled with the current session id; reference the list_agents function, the _known_agents dict, and the Agent.session_id attribute when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/broker/src/codex_session.rs`:
- Around line 18-25: The timeout can drop the inner future and leave the spawned
codex process running; update the process spawn in
create_resumable_codex_thread_inner (the Command used to run `{codex_bin}
app-server --listen stdio://`) to call kill_on_drop(true) so that dropping the
child handle (when tokio::time::timeout around
create_resumable_codex_thread_inner fires) will terminate the child; keep
existing cleanup logic that does child.kill().await / child.wait().await as-is
for normal shutdown paths.
In `@packages/sdk/src/relay.ts`:
- Around line 1099-1103: In listAgents (the map callback that checks
this.knownAgents), update the existing agent's sessionId from the incoming entry
before returning it so known handles don't keep stale/missing sessionId; i.e.,
when const existing = this.knownAgents.get(entry.name) is truthy, set
existing.sessionId = entry.sessionId (or use the agent's setter) and then return
existing instead of returning it unchanged; keep the makeAgent and
this.knownAgents.set logic for new agents as-is.
In `@src/cli/commands/agent-management.ts`:
- Around line 442-449: The spawn logging currently only uses spawned?.pid and
drops spawnResult?.pid leading to missing PID info; update the code around
sessionId, spawned and spawnResult to compute a pid variable (e.g., const pid =
spawned?.pid ?? spawnResult?.pid) and use pid in the deps.log branches (replace
spawned?.pid checks/mentions with pid) so the log falls back to spawnResult.pid
when spawned is unavailable while preserving sessionId handling.
---
Outside diff comments:
In `@packages/sdk-py/src/agent_relay/relay.py`:
- Around line 598-613: The loop in list_agents returns existing Agent instances
from self._known_agents without updating their session_id, leaving stale/missing
session metadata; modify the branch where existing =
self._known_agents.get(name) to set/update existing.session_id (using
entry.get("sessionId")) before appending to agents so known Agent objects are
backfilled with the current session id; reference the list_agents function, the
_known_agents dict, and the Agent.session_id attribute when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: fde0b003-f312-42de-8af3-94e8260e72f7
📒 Files selected for processing (20)
CHANGELOG.mdcrates/broker/src/codex_session.rscrates/broker/src/lib.rscrates/broker/src/protocol.rscrates/broker/src/runtime/api.rscrates/broker/src/runtime/relaycast_events.rscrates/broker/src/runtime/spawn_spec.rscrates/broker/src/runtime/tests.rscrates/broker/src/runtime/worker_events.rscrates/broker/src/supervisor.rscrates/broker/src/worker.rspackages/sdk-py/src/agent_relay/relay.pypackages/sdk-swift/Sources/AgentRelaySDK/RelayTypes.swiftpackages/sdk/src/client.tspackages/sdk/src/index.tspackages/sdk/src/lifecycle-hooks.tspackages/sdk/src/protocol.tspackages/sdk/src/relay.tspackages/sdk/src/types.tssrc/cli/commands/agent-management.ts
There was a problem hiding this comment.
4 issues found across 20 files
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
911c0b6 to
bf8d941
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/sdk-py/src/agent_relay/relay.py (1)
598-610:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRefresh
session_idfor existing handles inlist_agents.When an agent is already in
_known_agents, the newsessionIdfrom broker list output is dropped, so handles can retain stale/missing session IDs.Suggested fix
for entry in raw_list: name = entry.get("name", "") - existing = self._known_agents.get(name) - if existing: - agents.append(existing) - else: - agent = Agent( - name=name, - runtime=entry.get("runtime", "pty"), - channels=entry.get("channels", []), - relay=self, - session_id=entry.get("sessionId"), - ) - self._known_agents[name] = agent - agents.append(agent) + agent = self._ensure_agent_handle( + name=name, + runtime=entry.get("runtime", "pty"), + channels=entry.get("channels", []), + session_id=entry.get("sessionId"), + ) + agents.append(agent) return agents🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/sdk-py/src/agent_relay/relay.py` around lines 598 - 610, In list_agents, when an entry matches an existing handle in self._known_agents you must refresh the handle's session id with the broker's sessionId instead of dropping it; inside the loop where existing = self._known_agents.get(name) is truthy, set the existing object's session_id (and optionally sessionId attribute name used by Agent) from entry.get("sessionId") before appending the existing handle to agents so it doesn't retain a stale/missing session id.
🧹 Nitpick comments (1)
packages/sdk/src/protocol.ts (1)
24-24: ⚡ Quick winComplete
sessionIdtyping onBrokerStatus.agents.
sessionIdis now modeled on events and list-agent SDK types, butBrokerStatus.agentsstill omits it. That leavesgetStatus()consumers without typed access to the same field.🔧 Suggested patch
export interface BrokerStatus { agent_count: number; agents: Array<{ name: string; runtime: AgentRuntime; provider?: HeadlessProvider; cli?: string; model?: string; + sessionId?: string; team?: string; channels: string[]; parent?: string; pid?: number; last_activity_at?: string;Also applies to: 261-261, 402-402
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/sdk/src/protocol.ts` at line 24, BrokerStatus.agents is missing the sessionId field, causing getStatus() consumers to lack typed access; add sessionId?: string to the agent/interface definition used by BrokerStatus.agents (the same agent type used by event and list-agent SDK types) and update any other agent-related type declarations so they consistently include sessionId; ensure the exported types referenced by getStatus(), the events, and list-agent SDK all reflect this addition.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@packages/sdk-py/src/agent_relay/relay.py`:
- Around line 598-610: In list_agents, when an entry matches an existing handle
in self._known_agents you must refresh the handle's session id with the broker's
sessionId instead of dropping it; inside the loop where existing =
self._known_agents.get(name) is truthy, set the existing object's session_id
(and optionally sessionId attribute name used by Agent) from
entry.get("sessionId") before appending the existing handle to agents so it
doesn't retain a stale/missing session id.
---
Nitpick comments:
In `@packages/sdk/src/protocol.ts`:
- Line 24: BrokerStatus.agents is missing the sessionId field, causing
getStatus() consumers to lack typed access; add sessionId?: string to the
agent/interface definition used by BrokerStatus.agents (the same agent type used
by event and list-agent SDK types) and update any other agent-related type
declarations so they consistently include sessionId; ensure the exported types
referenced by getStatus(), the events, and list-agent SDK all reflect this
addition.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: aaaa048b-597b-450a-91e1-8e8c9d39e96e
📒 Files selected for processing (22)
CHANGELOG.mdcrates/broker/src/codex_session.rscrates/broker/src/lib.rscrates/broker/src/protocol.rscrates/broker/src/runtime/api.rscrates/broker/src/runtime/relaycast_events.rscrates/broker/src/runtime/spawn_spec.rscrates/broker/src/runtime/tests.rscrates/broker/src/runtime/worker_events.rscrates/broker/src/supervisor.rscrates/broker/src/worker.rspackages/sdk-py/src/agent_relay/relay.pypackages/sdk-swift/Sources/AgentRelaySDK/RelayTypes.swiftpackages/sdk/src/__tests__/lifecycle-hooks.test.tspackages/sdk/src/client.tspackages/sdk/src/index.tspackages/sdk/src/lifecycle-hooks.tspackages/sdk/src/protocol.tspackages/sdk/src/relay.tspackages/sdk/src/types.tssrc/cli/commands/agent-management.test.tssrc/cli/commands/agent-management.ts
✅ Files skipped from review due to trivial changes (5)
- crates/broker/src/supervisor.rs
- CHANGELOG.md
- packages/sdk/src/index.ts
- crates/broker/src/runtime/tests.rs
- crates/broker/src/worker.rs
- broker/codex_session: forward profile/config/cwd flags to app-server bootstrap so the pre-created thread runs under the same context the PTY will resume into; add kill_on_drop(true) so timeout cancellation cleanly terminates the child app-server process. - broker/worker: degrade gracefully when Codex session pre-creation fails — log a warning and spawn without sessionId instead of aborting the whole agent spawn; treat dangling value-taking flags as Unknown in codex_session_reference to avoid pre-creating sessions for malformed CLI input. - sdk-swift: add CodingKeys session_id mapping on AgentSpawnedEvent and WorkerReadyEvent so JSON decoding matches the broker's snake_case key. - sdk relay.ts: route listAgents through ensureAgentHandle so known handles refresh sessionId from the broker's view. - cli agent-management: fall back to spawnResult.pid when listAgents is stale so spawn output still surfaces a PID. - cli new.test: include runtime in the /api/spawn mock body so it satisfies the new Zod SpawnAgentResult schema.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/broker/src/worker.rs`:
- Around line 276-301: The pre-created Codex session is minted with the
pre-mutation context (using create_resumable_codex_thread with resolved_cli,
cwd, self.worker_env, effective_args) but later mutations (e.g., --cd/--cwd in
effective_args or injected --dangerously-bypass-approvals-and-sandbox) can
change the real PTY context; fix by moving or re-running the session-creation
step so it uses the final PTY context: either defer calling
create_resumable_codex_thread until after all mutations to effective_args and
any auto-injections are applied, or recompute the final cwd/flags from
resolved_cli and final effective_args and pass those final values into
create_resumable_codex_thread; ensure spec.session_id and harness_session_args
are set only after a successful creation with that final context.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 35a0fbfb-04a6-4542-858b-8de041a8f12c
📒 Files selected for processing (6)
crates/broker/src/codex_session.rscrates/broker/src/worker.rspackages/sdk-swift/Sources/AgentRelaySDK/RelayTypes.swiftpackages/sdk/src/relay.tssrc/cli/commands/agent-management.tssrc/cli/commands/new.test.ts
| match crate::codex_session::create_resumable_codex_thread( | ||
| &resolved_cli, | ||
| cwd, | ||
| &self.worker_env, | ||
| &effective_args, | ||
| ) | ||
| .await | ||
| { | ||
| Ok(thread_id) => { | ||
| tracing::info!( | ||
| worker = %spec.name, | ||
| session_id = %thread_id, | ||
| "created resumable Codex session for spawned PTY" | ||
| ); | ||
| spec.session_id = Some(thread_id.clone()); | ||
| harness_session_args.push("resume".to_string()); | ||
| harness_session_args.push(thread_id); | ||
| } | ||
| Err(err) => { | ||
| tracing::warn!( | ||
| worker = %spec.name, | ||
| error = %err, | ||
| "failed to pre-create resumable Codex session; spawning without sessionId" | ||
| ); | ||
| } | ||
| } |
There was a problem hiding this comment.
Bootstrap the Codex thread with the final PTY context.
This pre-create path still uses spec.cwd plus the pre-mutation effective_args, but the actual Codex spawn can later run with a different context: --cd/--cwd inside effective_args overrides the working directory, and --dangerously-bypass-approvals-and-sandbox may be auto-injected after this block. That means the minted session_id can belong to a thread created under different cwd/approval settings than the PTY that resumes it.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/broker/src/worker.rs` around lines 276 - 301, The pre-created Codex
session is minted with the pre-mutation context (using
create_resumable_codex_thread with resolved_cli, cwd, self.worker_env,
effective_args) but later mutations (e.g., --cd/--cwd in effective_args or
injected --dangerously-bypass-approvals-and-sandbox) can change the real PTY
context; fix by moving or re-running the session-creation step so it uses the
final PTY context: either defer calling create_resumable_codex_thread until
after all mutations to effective_args and any auto-injections are applied, or
recompute the final cwd/flags from resolved_cli and final effective_args and
pass those final values into create_resumable_codex_thread; ensure
spec.session_id and harness_session_args are set only after a successful
creation with that final context.
Workspace packages with pinned deps (e.g. packages/telemetry pins its own posthog-node ^5.29.2 while @posthog/next pulls in 5.35.0) install into packages/*/node_modules. The previous cache only restored root node_modules, so on a cache hit the conditional `npm ci` step was skipped and the workspace deps were missing — failing the build with "Cannot find module 'posthog-node'". Extend the cache path on both package-validation.yml and workflow-validation.yml to include packages/*/node_modules.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
.github/workflows/workflow-validation.yml (1)
36-38: ⚡ Quick winAdd explanatory comments for consistency.
The
package-validation.ymlworkflow includes detailed comments (lines 69-73) explaining whypackages/*/node_modulesmust be cached. This workflow would benefit from the same documentation for consistency and future maintainability.📝 Suggested comment addition
+ # Cache node_modules - skip npm ci entirely if unchanged. + # Workspace packages with pinned deps (e.g. packages/telemetry pins + # its own posthog-node) install into packages/*/node_modules, so we + # have to cache those too — otherwise the build can't resolve them + # on a cache hit and `npm ci` is skipped. - name: Cache node_modules🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/workflow-validation.yml around lines 36 - 38, Add an explanatory comment above the cache "path" block that lists "node_modules" and "packages/*/node_modules" describing why we cache both the root node_modules and per-package node_modules (to preserve hoisted and workspace-specific dependencies between runs and avoid install churn), mirroring the note in package-validation.yml; update the same path block that contains the entries "node_modules" and "packages/*/node_modules" so maintainers see the rationale for caching both locations.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In @.github/workflows/workflow-validation.yml:
- Around line 36-38: Add an explanatory comment above the cache "path" block
that lists "node_modules" and "packages/*/node_modules" describing why we cache
both the root node_modules and per-package node_modules (to preserve hoisted and
workspace-specific dependencies between runs and avoid install churn), mirroring
the note in package-validation.yml; update the same path block that contains the
entries "node_modules" and "packages/*/node_modules" so maintainers see the
rationale for caching both locations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 7b332514-2a43-431b-90cd-121ee2fd9e6c
📒 Files selected for processing (2)
.github/workflows/package-validation.yml.github/workflows/workflow-validation.yml
Summary:
Validation:
Note: