
common: add two-phase graceful reasoning budget termination ...#21141

Open
zeel2104 wants to merge 2 commits into ggml-org:master from zeel2104:feat/reasoning-budget-conclusion

Conversation

@zeel2104

Overview

Implements Option B from issue #20632: a two-phase graceful termination for the reasoning budget sampler, replacing hard truncation with a structured conclusion phase.
Previously, --reasoning-budget-message injected a wrap-up string at exactly token N, leaving the model zero tokens to act on it, which is functionally equivalent to raw truncation. This PR splits the budget into a thinking phase and a conclusion phase:

At the end of the thinking budget, the message is forced token-by-token (INJECTING state).
The model is then given --reasoning-budget-conclusion N free tokens to terminate naturally (CONCLUDING state).
If the model does not produce the end tag within those tokens, the hard-cutoff safety net fires (FORCING state), preserving existing behavior.

New state machine: IDLE → COUNTING → INJECTING → CONCLUDING → FORCING → DONE
Setting --reasoning-budget-conclusion 0 (the default) preserves existing behavior exactly, so the change is fully backward compatible.
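
To make the transitions concrete, here is a minimal, self-contained sketch of the state machine described above. It is not the code from this PR: `budget_sketch`, `run`, `NO_FORCE`, and the single-token start and end tags are all hypothetical simplifications, and the legacy path (message plus end tag forced back to back when the conclusion budget is 0) is an assumption based on the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch; names and structure are NOT taken from the PR.
using token = int32_t;
constexpr token NO_FORCE = -1; // "the model may sample freely"

enum class budget_state { IDLE, COUNTING, INJECTING, CONCLUDING, FORCING, DONE };

struct budget_sketch {
    int thinking_budget;        // free tokens in the thinking phase
    int conclusion_budget;      // free tokens in the conclusion phase (0 = disabled)
    std::vector<token> message; // wrap-up message tokens
    token start_tag;            // assumption: single-token start tag
    token end_tag;              // assumption: single-token end tag

    budget_state state = budget_state::IDLE;
    int remaining = 0;          // free tokens left in the current phase
    std::vector<token> queue;   // tokens still to be forced
    std::size_t pos = 0;        // next index into queue

    budget_sketch(int thinking, int conclusion, std::vector<token> msg, token start, token end)
        : thinking_budget(thinking), conclusion_budget(conclusion),
          message(std::move(msg)), start_tag(start), end_tag(end) {
        assert(thinking >= 0 && conclusion >= 0);
    }

    // Token the sampler would force next, or NO_FORCE if the model is free.
    token forced() const {
        bool forcing = state == budget_state::INJECTING || state == budget_state::FORCING;
        return forcing ? queue[pos] : NO_FORCE;
    }

    // Advance the state machine after token t has been emitted.
    void accept(token t) {
        switch (state) {
            case budget_state::IDLE:
                if (t == start_tag) {
                    state = budget_state::COUNTING;
                    remaining = thinking_budget;
                    if (remaining <= 0) thinking_exhausted();
                }
                break;
            case budget_state::COUNTING:
                if (t == end_tag) { state = budget_state::DONE; break; }
                if (--remaining <= 0) thinking_exhausted();
                break;
            case budget_state::INJECTING:
                if (++pos == queue.size()) enter_concluding();
                break;
            case budget_state::CONCLUDING:
                if (t == end_tag) { state = budget_state::DONE; break; }
                // Safety net: the model did not close the block in time.
                if (--remaining <= 0) force({end_tag}, budget_state::FORCING);
                break;
            case budget_state::FORCING:
                if (++pos == queue.size()) state = budget_state::DONE;
                break;
            case budget_state::DONE:
                break;
        }
    }

    void thinking_exhausted() {
        if (conclusion_budget <= 0) {
            // Assumed legacy path: message and end tag are forced back to back.
            std::vector<token> q = message;
            q.push_back(end_tag);
            force(std::move(q), budget_state::FORCING);
        } else if (!message.empty()) {
            force(message, budget_state::INJECTING);
        } else {
            enter_concluding(); // no message: go straight to the free window
        }
    }

    void force(std::vector<token> q, budget_state s) {
        queue = std::move(q);
        pos = 0;
        state = s;
    }

    void enter_concluding() {
        state = budget_state::CONCLUDING;
        remaining = conclusion_budget;
    }
};

// Drives the sketch: a forced token overrides the model's proposal for that step.
std::vector<token> run(budget_sketch & s, const std::vector<token> & proposals) {
    std::vector<token> out;
    for (token p : proposals) {
        if (s.state == budget_state::DONE) break;
        token f = s.forced();
        token t = (f == NO_FORCE) ? p : f;
        out.push_back(t);
        s.accept(t);
    }
    return out;
}
```

With a thinking budget of 3, a conclusion budget of 2, and a two-token message, a model that emits the end tag inside the conclusion window reaches DONE without ever entering FORCING. With a conclusion budget of 0, the same input goes from COUNTING straight to FORCING, matching the backward-compatible case.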

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: I used AI for minor assistance. I have reviewed all changes, understand them fully, and take responsibility for the submitted code.

@zeel2104 zeel2104 requested a review from a team as a code owner March 29, 2026 03:03
@ggml-gh-bot

This comment was marked as off-topic.

@zeel2104
Author

I did disclose that I used AI for minor assistance. Thank you.

Member

@pwilkin pwilkin left a comment


Nice extension, but please add tests to test-reasoning-budget.cpp.

@zeel2104 zeel2104 requested a review from ggerganov as a code owner March 30, 2026 01:42
@github-actions github-actions Bot added the testing Everything test related label Mar 30, 2026
Comment thread common/common.h Outdated
common_reasoning_format reasoning_format = COMMON_REASONING_FORMAT_DEEPSEEK;
int enable_reasoning = -1; // -1 = auto, 0 = disable, 1 = enable
int reasoning_budget = -1;
int reasoning_budget_conclusion = 0; // tokens reserved for conclusion phase (0 = disabled)
Member


This should not be added in common_params. Better to first fix #20429 before merging this.

Add --reasoning-budget-conclusion N flag that splits the reasoning budget
into a thinking phase and a conclusion phase:

- At end of thinking budget, inject --reasoning-budget-message and enter
  INJECTING state (forces message tokens token-by-token)
- After message is injected, enter CONCLUDING state giving the model N
  free tokens to terminate naturally
- If model does not self-terminate, fall through to FORCING (hard cutoff)
  as a safety net

New states added to the sampler state machine:
  IDLE -> COUNTING -> INJECTING -> CONCLUDING -> FORCING -> DONE

Setting --reasoning-budget-conclusion 0 (the default) preserves existing
behavior exactly, so the change is fully backward compatible.

Add 5 new tests to test-reasoning-budget.cpp covering:
- natural end in conclusion window (no FORCING)
- conclusion budget exhausted, safety net fires
- no message tokens, conclusion budget only
- backward compat with conclusion_budget=0
- multi-token message injection

Implements Option B from issue ggml-org#20632.
@zeel2104 zeel2104 force-pushed the feat/reasoning-budget-conclusion branch from deb1c35 to 02d4c32 Compare April 1, 2026 04:19
@zeel2104
Author

zeel2104 commented Apr 1, 2026

reasoning_budget_conclusion has been removed from common_params. It now lives only in common_params_sampling, consistent with the direction of #20429. The CLI arg writes directly to params.sampling.reasoning_budget_conclusion only.

@MarkErik

MarkErik commented Apr 2, 2026

I hope that this gets added soon!

I recently updated from 8500 to 8600, which I believe must have included the new reasoning budget, and now many coding tasks in Roo Code just stall during the thinking phase. (Using GLM4.7)

@vanmilleru

Can it also support text-based reasoning effort (none, low, medium, high)?
The OpenAI API has this, and many projects such as Open WebUI already use it as a reasoning budget.

@pwilkin
Member

pwilkin commented May 1, 2026

Can you please fix conflicts?


Labels

testing Everything test related


5 participants