[codex] Add request-scoped environment context #28936
Draft
sayan-oai wants to merge 8 commits into
Base automatically changed from codex/turn-environment-starting-snapshots to main (June 19, 2026 05:06)
Why
Environment availability will eventually change between sampling requests, while `TurnContext` is intentionally stable for an entire turn. Before enabling those updates, one model request needs a single frozen environment view so its model-visible context, advertised tools, and actual tool execution cannot disagree.

This PR is stacked on #28683.

What changed
- Add `StepContext` as the request-scoped owner of `TurnEnvironmentSnapshot`, and remove environments from `TurnContext`.
- `ToolRouter` owns the step it was built with.

Most handler, runtime, and test changes mechanically replace `turn.environments` with the frozen step snapshot. The main review path is `session/step_context.rs` -> `session/turn.rs` -> `tools/spec_plan.rs` / `tools/router.rs` -> `tools/context.rs`. Code mode and delegated reviews are the non-mechanical pieces.

This does not yet recapture steps between sampling requests, persist a `StepContextBaseline`, or refresh AGENTS, skills, plugins, and MCP servers when an environment attaches. Those remain follow-up work.

Test plan
- `just test -p codex-core sampling_request_keeps_one_environment_view_for_context_and_tool_execution`
- `just test -p codex-core code_mode_background_keeps_running_on_later_turn_without_wait`
- `just test -p codex-core snapshot_keeps_starting_environment_until_it_can_be_attached`

Reviewer notes
Overall goal
A turn can involve several model requests.

Environment availability may eventually change between those requests. `TurnContext` is intentionally stable for the whole turn, so it is the wrong place for environment state that may change more frequently.

This PR introduces `StepContext`, which holds the environment snapshot used for a model request. Its central invariant is that one model request's model-visible context, advertised tools, and actual tool execution all use the same environment snapshot.

Without that invariant, the model could see environment A, receive tools for A, but have an actual tool call run in newly selected environment B.
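
A minimal sketch of that invariant, under stated assumptions: every type below (`EnvironmentSnapshot`, `LiveSelection`, and these simplified versions of `StepContext`, `ToolRouter`, and `ToolInvocation`) is a hypothetical stand-in, not the actual codex-core definition. It only illustrates how capturing one snapshot and sharing it through an `Arc` keeps the prompt context, the advertised tools, and the executed tool call on the same environment even if the live selection changes mid-request.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical, simplified stand-ins; the real codex-core types are richer.
#[derive(Clone, Debug, PartialEq)]
struct EnvironmentSnapshot {
    environment_id: String,
}

// Request-scoped context: frozen when the request is prepared.
#[derive(Debug)]
struct StepContext {
    environments: EnvironmentSnapshot,
}

// Live, thread-level selection that may change at any time.
struct LiveSelection(Mutex<EnvironmentSnapshot>);

impl LiveSelection {
    fn new(id: &str) -> Self {
        Self(Mutex::new(EnvironmentSnapshot { environment_id: id.to_string() }))
    }

    fn select(&self, id: &str) {
        *self.0.lock().unwrap() = EnvironmentSnapshot { environment_id: id.to_string() };
    }

    // Copy the current selection into a frozen, shareable step.
    fn capture_step(&self) -> Arc<StepContext> {
        Arc::new(StepContext { environments: self.0.lock().unwrap().clone() })
    }
}

// The router keeps the step it was built with.
struct ToolRouter {
    step: Arc<StepContext>,
}

// Each invocation receives the router's step, never a fresh snapshot.
struct ToolInvocation {
    step: Arc<StepContext>,
}

impl ToolRouter {
    fn new(step: Arc<StepContext>) -> Self {
        Self { step }
    }

    fn advertised_environment(&self) -> String {
        self.step.environments.environment_id.clone()
    }

    fn dispatch(&self) -> ToolInvocation {
        ToolInvocation { step: Arc::clone(&self.step) }
    }
}

impl ToolInvocation {
    fn execution_environment(&self) -> String {
        self.step.environments.environment_id.clone()
    }
}

// Returns (prompt context env, advertised tools env, executed tool env).
fn frozen_view_demo() -> (String, String, String) {
    let live = LiveSelection::new("A");
    let step = live.capture_step();
    let router = ToolRouter::new(Arc::clone(&step));
    // The live selection moves to B while the response is "streaming".
    live.select("B");
    let invocation = router.dispatch();
    (
        step.environments.environment_id.clone(),
        router.advertised_environment(),
        invocation.execution_environment(),
    )
}
```

Calling `frozen_view_demo()` from a `main` function or a test yields environment A in all three positions; only a step captured after `select("B")` would observe B.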

This is a consistency foundation, not the reactive update feature. Today `run_turn()` captures one `StepContext` before its sampling loop and reuses it. A later PR will capture replacements between sampling requests, inject environment changes, and add `StepContextBaseline`. AGENTS, skills, plugins, and MCP reconciliation are deliberately not migrated here.

Change groups
- `session/step_context.rs`, `session/turn_context.rs`: `TurnContext.environments` is removed. `StepContext` contains the environment snapshot, computes tool availability, and chooses an effective cwd.
- `session/turn.rs`, `session/mod.rs`
- `tools/spec_plan.rs`, `tools/router.rs`, `tools/context.rs`, `tools/parallel.rs`: `ToolInvocation` receives that exact step from the router, making router/execution mismatch difficult to express.
- `context/environment_context.rs`, `context_manager/updates.rs`, `prompt_debug.rs`, `codex_thread.rs`, `session_startup_prewarm.rs`: `TurnContextItem` behavior is intentionally unchanged.
- `compact.rs`, `compact_remote.rs`, `compact_remote_v2.rs`, `tasks/compact.rs`
- `tools/handlers/{apply_patch,extension_tools,mcp,request_permissions,shell,view_image}.rs`, `tools/handlers/unified_exec/exec_command.rs`, `mcp_openai_file.rs`: Replace `turn.environments` with `invocation.step.environments`.
- `tools/{orchestrator,sandboxing}.rs`, `tools/runtimes/{apply_patch,shell,unified_exec}.rs`, `unified_exec/{mod,process_manager}.rs`: `ToolCtx`, `ApprovalCtx`, and `UnifiedExecContext` carry the already-frozen snapshot through retries, sandboxing, and approvals. They do not take a new snapshot.
- `codex_delegate.rs`, `guardian/{review,review_session}.rs`, `session/review.rs`, `tasks/review.rs`, multi-agent spawn files, agent-job files
- `tools/code_mode/{delegate,mod,execute_handler}.rs`
- `session/mcp.rs`, `tools/network_approval.rs`, `tasks/user_shell.rs`
- `*_tests.rs`, `session/tests.rs`, `spec_plan_tests.rs`, `router_tests.rs`: Tests use `StepContext::local_for_test()` after removing environments from `TurnContext`.

Non-obvious pieces
`StepContext::effective_cwd()` chooses the attached primary environment's cwd, then the first starting environment's known cwd, then the legacy turn cwd. This lets context mention the intended workspace before its filesystem is usable, while tool availability still counts only attached environments.

`ToolRouter` owns `Arc<StepContext>` rather than receiving one snapshot for construction and another during dispatch. This structurally enforces the main invariant.

Child inheritance is deliberately asymmetric: the parent request may know about starting environments, but children receive only attached selections. This PR does not share pending startup operations across threads.
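
The cwd fallback order and the attached-only child inheritance described above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `EnvironmentState`, its variants, the `legacy_turn_cwd` field, and the helpers `has_usable_environment`, `selections_for_child`, `attached`, `starting`, and `step_from` are hypothetical names. The sketch also assumes the first attached entry is the primary environment and that a starting environment without a known cwd is skipped.

```rust
use std::path::{Path, PathBuf};

// Hypothetical lifecycle states; names are illustrative only.
#[derive(Clone, Debug, PartialEq)]
enum EnvironmentState {
    // Filesystem is usable; tools may run here.
    Attached { cwd: PathBuf },
    // Still starting; the intended cwd may or may not be known yet.
    Starting { known_cwd: Option<PathBuf> },
}

struct StepContext {
    environments: Vec<EnvironmentState>,
    legacy_turn_cwd: PathBuf,
}

impl StepContext {
    // Attached cwd first, then a starting environment's known cwd,
    // then the legacy turn cwd.
    fn effective_cwd(&self) -> &Path {
        let attached = self.environments.iter().find_map(|e| match e {
            EnvironmentState::Attached { cwd } => Some(cwd.as_path()),
            _ => None,
        });
        let starting = self.environments.iter().find_map(|e| match e {
            EnvironmentState::Starting { known_cwd } => known_cwd.as_deref(),
            _ => None,
        });
        attached.or(starting).unwrap_or(self.legacy_turn_cwd.as_path())
    }

    // Only attached environments count toward tool availability.
    fn has_usable_environment(&self) -> bool {
        self.environments
            .iter()
            .any(|e| matches!(e, EnvironmentState::Attached { .. }))
    }

    // Children inherit attached selections only; starting ones are dropped.
    fn selections_for_child(&self) -> Vec<EnvironmentState> {
        self.environments
            .iter()
            .filter(|e| matches!(e, EnvironmentState::Attached { .. }))
            .cloned()
            .collect()
    }
}

// Small constructors so callers do not need to build paths by hand.
fn attached(cwd: &str) -> EnvironmentState {
    EnvironmentState::Attached { cwd: PathBuf::from(cwd) }
}

fn starting(known_cwd: Option<&str>) -> EnvironmentState {
    EnvironmentState::Starting { known_cwd: known_cwd.map(PathBuf::from) }
}

fn step_from(environments: Vec<EnvironmentState>, legacy_turn_cwd: &str) -> StepContext {
    StepContext { environments, legacy_turn_cwd: PathBuf::from(legacy_turn_cwd) }
}
```

With only a starting environment present, `effective_cwd()` can already name the intended workspace while `has_usable_environment()` stays false and `selections_for_child()` returns nothing, which mirrors the asymmetry described above.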

The skills/plugin path accepts `StepContext` only so extension input receives the correct attached environment handles. It does not rediscover skills or plugins.

Tests worth reading
- `sampling_request_keeps_one_environment_view_for_context_and_tool_execution` starts with environment A, changes the live thread selection to B while the response is streaming, then proves the prompt, tools, and actual filesystem write all still use A.
- `code_mode_background_keeps_running_on_later_turn_without_wait` proves a yielded cell created in workspace A does not start executing nested tools in workspace B after a later turn begins.

For an efficient review, read the core ownership, request lifecycle, router ownership, these two tests, and code mode closely. The tool-consumer and test-construction groups are largely safe to pattern-scan.
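
The code-mode behavior that the second test checks can be pictured with a small sketch. It assumes, hypothetically, that a yielded cell holds on to the step it was created with; `BackgroundCell`, `spawn`, `run_nested_tool`, and the `workspace` field are illustrative names, not the actual code-mode API.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the request-scoped context.
struct StepContext {
    workspace: String,
}

// A yielded code-mode cell that keeps running in the background.
struct BackgroundCell {
    step: Arc<StepContext>,
}

impl BackgroundCell {
    // The cell captures the step that was current when it was created.
    fn spawn(step: &Arc<StepContext>) -> Self {
        Self { step: Arc::clone(step) }
    }

    // Nested tool calls resolve their workspace from the captured step,
    // not from whichever turn happens to be active now.
    fn run_nested_tool(&self, tool: &str) -> String {
        format!("{} in {}", tool, self.step.workspace)
    }
}

// Returns (foreground call on the later turn, nested call from the old cell).
fn later_turn_demo() -> (String, String) {
    let first_turn = Arc::new(StepContext { workspace: "A".to_string() });
    let cell = BackgroundCell::spawn(&first_turn);

    // A later turn begins with a different workspace selected.
    let later_turn = Arc::new(StepContext { workspace: "B".to_string() });
    let foreground = format!("write_file in {}", later_turn.workspace);

    // The background cell still runs its nested tool in workspace A.
    let background = cell.run_nested_tool("write_file");
    (foreground, background)
}
```

In this sketch the later turn's own calls target workspace B, while the cell created earlier keeps targeting workspace A.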