[codex] rollout budget implementation (2/N) #28494
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cbb207243d
| self.state.get_or_init(|| { | ||
| Mutex::new(RolloutBudgetState { | ||
| config, | ||
| weighted_tokens_used: 0.0, |
Seed budget usage when resuming rollouts
When a root thread is resumed or forked from an existing rollout that already contains TokenCount usage, this initialization still starts weighted_tokens_used at zero. The session later restores TokenUsageInfo from the rollout history, but the new budget state is never seeded from it, so the first resumed request can incorrectly tell the model it has the full budget remaining and only counts post-resume usage.
AGENTS.md reference: AGENTS.md:L103-L110
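A minimal sketch of the seeding this comment asks for. `RestoredUsage`, `BudgetWeights`, and `seed_weighted_usage` are hypothetical names, and the split of restored usage into prefill and sampling tokens is an assumption rather than the actual shape of `TokenUsageInfo`:

```rust
/// Hypothetical stand-in for the usage restored from rollout history.
#[derive(Clone, Copy, Debug)]
pub struct RestoredUsage {
    pub prefill_tokens: u64,
    pub sampling_tokens: u64,
}

/// Hypothetical subset of the budget configuration.
#[derive(Clone, Copy, Debug)]
pub struct BudgetWeights {
    pub prefill_token_weight: f64,
    pub sampling_token_weight: f64,
}

/// Initial value for `weighted_tokens_used`: zero for a fresh rollout,
/// the weighted historical usage for a resumed or forked one.
pub fn seed_weighted_usage(weights: BudgetWeights, restored: Option<RestoredUsage>) -> f64 {
    match restored {
        Some(usage) => {
            usage.prefill_tokens as f64 * weights.prefill_token_weight
                + usage.sampling_tokens as f64 * weights.sampling_token_weight
        }
        None => 0.0,
    }
}
```

With a seed like this, the first resumed request would report the remainder after historical usage instead of the full budget.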
| }); | ||
| sess.record_conversation_items(turn_context, std::slice::from_ref(&response_item)) | ||
| .await; | ||
| budget.mark_reminder_delivered(sess.thread_id(), window_id, reminder); |
Re-arm reminders after rollback
When a turn is interrupted or rolled back after this call, the contextual <rollout_budget> developer item can be trimmed from history, but the deliveries entry remains advanced. A later retry in the same thread/window at the same reminder index will skip the reminder even though it is no longer in model-visible context, so crossed budget thresholds can be silently missed after rollback.
Added support for a fresh reminder with the latest remainder after rollback. A rollback will not refund the shared budget, for two reasons: (1) it keeps complexity low, and (2) work was technically done in those turns and should rightfully consume token budget from the ledger.
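A rough sketch of the behavior described in this reply, under stated assumptions: thread and window ids are plain `u64` values here, and `ReminderLedger`, `needs_reminder`, and `on_rollback` are hypothetical names, not the PR's API. Rollback forgets the delivery record so the next request gets a fresh reminder, while the usage already charged stays on the ledger:

```rust
use std::collections::HashMap;

/// Hypothetical per-thread record: the context window and reminder index
/// most recently shown to the model.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Delivery {
    pub window_id: u64,
    pub reminder_index: u64,
}

#[derive(Default)]
pub struct ReminderLedger {
    pub weighted_tokens_used: f64,
    deliveries: HashMap<u64, Delivery>,
}

impl ReminderLedger {
    /// True when the thread has not yet seen `reminder_index` in `window_id`.
    pub fn needs_reminder(&self, thread_id: u64, window_id: u64, reminder_index: u64) -> bool {
        match self.deliveries.get(&thread_id) {
            Some(d) => d.window_id != window_id || d.reminder_index < reminder_index,
            None => true,
        }
    }

    /// Records that the reminder is now part of the thread's context.
    pub fn mark_delivered(&mut self, thread_id: u64, window_id: u64, reminder_index: u64) {
        self.deliveries.insert(thread_id, Delivery { window_id, reminder_index });
    }

    /// Rollback drops the delivery record but leaves charged usage untouched.
    pub fn on_rollback(&mut self, thread_id: u64) {
        self.deliveries.remove(&thread_id);
    }
}
```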
| token_usage, | ||
| } = compaction_output_result?; | ||
| if let Some(token_usage) = token_usage { | ||
| sess.record_rollout_budget_usage(&token_usage); |
Charge legacy remote compactions against the budget
This budget charge only runs for the RemoteCompactionV2 stream. When a provider uses remote compaction with RemoteCompactionV2 disabled, tasks/compact.rs routes through compact_remote.rs, whose compact_conversation_history path never records compaction token usage, so those potentially large compaction calls do not reduce the shared rollout budget while local/v2 compactions do.
| config: RolloutBudgetConfig, | ||
| weighted_tokens_used: f64, | ||
| /// Last reminder delivered to each thread, so every thread observes crossed thresholds. | ||
| deliveries: HashMap<ThreadId, ThreadBudgetDelivery>, |
what happens to these post resume?
| if !session_configuration.session_source.is_non_root_agent() | ||
| && let Some(rollout_budget) = config.rollout_budget | ||
| { | ||
| agent_control.rollout_budget().configure(rollout_budget); |
can we do this when we create agent_control?
pakrym-oai
left a comment
Let's init agent control in one place if possible.
## What

This PR defines the structured configuration contract for shared rollout token budgets (across ALL agent threads under 1 rollout).

```toml
[features.rollout_budget]
enabled = true
limit_tokens = 100000
reminder_interval_tokens = 10000
sampling_token_weight = 1.0
prefill_token_weight = 0.1
```

The reminder interval defaults to 10% of the rollout limit. Sampling and prefill weights default to `1.0`.

## Scope

This PR only defines and validates configuration. It does not track usage, inject reminders, or stop a rollout. Accounting and reminders are implemented in the stacked follow-up #28494.

The existing `token_budget` feature remains unchanged. `rollout_budget` has its own feature key and configuration type.

## Tests

The config test verifies that the structured fields resolve into `RolloutBudgetConfig` and do not enable the existing `token_budget` feature.

Local checks:

- `just write-config-schema`
- `just test -p codex-core load_config_resolves_rollout_budget`
- `cargo check -p codex-thread-manager-sample`
- `git diff --check`

The full workspace test suite was not run locally.
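The defaults stated in the description above (reminder interval at 10% of the limit, both weights at `1.0`) could be resolved roughly as follows. This is only a sketch: `RawRolloutBudget`, `ResolvedBudget`, and `resolve_rollout_budget` are hypothetical names, and the specific validation rules are assumptions, since the description does not list them.

```rust
/// Hypothetical TOML-shaped input; optional fields fall back to defaults.
#[derive(Clone, Copy, Debug, Default)]
pub struct RawRolloutBudget {
    pub limit_tokens: u64,
    pub reminder_interval_tokens: Option<u64>,
    pub sampling_token_weight: Option<f64>,
    pub prefill_token_weight: Option<f64>,
}

/// Hypothetical resolved form with every field populated.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ResolvedBudget {
    pub limit_tokens: u64,
    pub reminder_interval_tokens: u64,
    pub sampling_token_weight: f64,
    pub prefill_token_weight: f64,
}

pub fn resolve_rollout_budget(raw: RawRolloutBudget) -> Result<ResolvedBudget, String> {
    // Assumed rule: a zero limit is rejected.
    if raw.limit_tokens == 0 {
        return Err("limit_tokens must be positive".to_string());
    }
    // Default interval: 10% of the limit, but never zero.
    let interval = raw
        .reminder_interval_tokens
        .unwrap_or((raw.limit_tokens / 10).max(1));
    if interval == 0 {
        return Err("reminder_interval_tokens must be positive".to_string());
    }
    let sampling = raw.sampling_token_weight.unwrap_or(1.0);
    let prefill = raw.prefill_token_weight.unwrap_or(1.0);
    // Assumed rule: weights must be finite and non-negative.
    for (name, weight) in [("sampling_token_weight", sampling), ("prefill_token_weight", prefill)] {
        if !weight.is_finite() || weight < 0.0 {
            return Err(format!("{name} must be a finite, non-negative number"));
        }
    }
    Ok(ResolvedBudget {
        limit_tokens: raw.limit_tokens,
        reminder_interval_tokens: interval,
        sampling_token_weight: sampling,
        prefill_token_weight: prefill,
    })
}
```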
Stack
Depends on #28746. This PR implements shared rollout-budget accounting and model-visible reminders using the configuration defined in #28746.
Description / Main changes to Core:
AgentControl will now be where rollout-level features and accounting live. It is poorly named for this responsibility, but I think it can hold all the necessary shared state and features (rollout token budget, multiple-thread interruption responsibility, etc.).
In this PR, we have one "token ledger" that each thread subtracts from when sampling. The charge occurs once response.completed() is done, and it is computed from the Responses API usage carrier. The calculation weights sampling and prefill tokens as configured.
Every time the budget crosses the configured reminder threshold, a developer message is appended before the thread's next request.
This remaining budget will always be restated/reminded after a compaction event.
Expiration and fan-out interruption will be in the stacked follow-up (and also live in Agent Control).
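To make the ledger arithmetic above concrete, here is a minimal sketch of a shared ledger that charges weighted usage per completed response. `TokenLedger` and `Usage` are hypothetical types, not the PR's `RolloutBudgetState`, and saturating the remainder at zero is an assumption:

```rust
use std::sync::Mutex;

/// Hypothetical usage reported for one completed response.
pub struct Usage {
    pub prefill_tokens: u64,
    pub sampling_tokens: u64,
}

/// Hypothetical ledger shared by all threads in a rollout.
pub struct TokenLedger {
    limit_tokens: f64,
    prefill_weight: f64,
    sampling_weight: f64,
    used: Mutex<f64>,
}

impl TokenLedger {
    pub fn new(limit_tokens: u64, prefill_weight: f64, sampling_weight: f64) -> Self {
        Self {
            limit_tokens: limit_tokens as f64,
            prefill_weight,
            sampling_weight,
            used: Mutex::new(0.0),
        }
    }

    /// Charges one completed response and returns the remaining weighted budget.
    pub fn charge(&self, usage: &Usage) -> f64 {
        let cost = usage.prefill_tokens as f64 * self.prefill_weight
            + usage.sampling_tokens as f64 * self.sampling_weight;
        // Recover the guard even if another thread panicked while holding it.
        let mut used = self.used.lock().unwrap_or_else(|e| e.into_inner());
        *used += cost;
        (self.limit_tokens - *used).max(0.0)
    }

    /// Remaining weighted budget, saturating at zero (an assumption).
    pub fn remaining(&self) -> f64 {
        let used = self.used.lock().unwrap_or_else(|e| e.into_inner());
        (self.limit_tokens - *used).max(0.0)
    }
}
```

Because the counter sits behind a single mutex, every thread sees the same aggregate usage regardless of which thread issued the request.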
Reminders
"You have weighted {session_tokens_left} tokens left in the shared session token budget."
The first request in each thread context receives the current remainder. Later reminders are emitted after aggregate weighted usage crosses a configured interval. If several intervals are crossed before a thread sends another request, Core inserts one reminder with the latest remainder.
Compaction response usage is charged before the next context starts. The next reminder is appended after the compaction summary, leaving the initial context content stable.
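The reminder rules above can be sketched as a pure function. `reminder_index` and `next_reminder` are hypothetical helpers, and rounding the remainder to a whole number in the message is an assumption. A thread with no prior delivery always gets a reminder, and a thread that has fallen several intervals behind gets a single reminder carrying the latest remainder:

```rust
/// Index of the latest reminder interval crossed by `used`.
pub fn reminder_index(used: f64, interval: f64) -> u64 {
    if interval <= 0.0 {
        return 0;
    }
    (used / interval).floor() as u64
}

/// Returns the reminder to inject before the thread's next request, if any.
/// `last_delivered` is the index most recently shown in this thread context.
pub fn next_reminder(
    limit: f64,
    used: f64,
    interval: f64,
    last_delivered: Option<u64>,
) -> Option<(u64, String)> {
    let index = reminder_index(used, interval);
    let due = match last_delivered {
        None => true,              // first request in this thread context
        Some(prev) => index > prev, // at least one new interval was crossed
    };
    if !due {
        return None;
    }
    let left = (limit - used).max(0.0).round() as u64;
    let message = format!(
        "You have weighted {left} tokens left in the shared session token budget."
    );
    Some((index, message))
}
```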
Tests
Integration coverage verifies:
Local checks:
- `just fmt`
- `just test -p codex-core rollout_budget`
- `git diff --check`

The full workspace test suite was not run locally.