common: add two-phase graceful reasoning budget termination ...#21141
zeel2104 wants to merge 2 commits into
Conversation
I did disclose I used AI for minor assistance. Thank you.
pwilkin left a comment
Nice extension, but please add tests into test-reasoning-budget.cpp
common_reasoning_format reasoning_format = COMMON_REASONING_FORMAT_DEEPSEEK;
int enable_reasoning = -1; // -1 = auto, 0 = disable, 1 = enable
int reasoning_budget = -1;
int reasoning_budget_conclusion = 0; // tokens reserved for conclusion phase (0 = disabled)
This should not be added in common_params. Better to first fix #20429 before merging this.
Add --reasoning-budget-conclusion N flag that splits the reasoning budget into a thinking phase and a conclusion phase:
- At end of thinking budget, inject --reasoning-budget-message and enter INJECTING state (forces message tokens token-by-token)
- After message is injected, enter CONCLUDING state giving the model N free tokens to terminate naturally
- If model does not self-terminate, fall through to FORCING (hard cutoff) as a safety net

New states added to the sampler state machine:
IDLE -> COUNTING -> INJECTING -> CONCLUDING -> FORCING -> DONE

Setting --reasoning-budget-conclusion 0 (the default) preserves existing behavior exactly and is fully backward compatible.

Add 5 new tests to test-reasoning-budget.cpp covering:
- natural end in conclusion window (no FORCING)
- conclusion budget exhausted, safety net fires
- no message tokens, conclusion budget only
- backward compat with conclusion_budget=0
- multi-token message injection

Implements Option B from issue ggml-org#20632.
deb1c35 to 02d4c32
reasoning_budget_conclusion has been removed from common_params. It now lives only in common_params_sampling, consistent with the direction of #20429. The CLI arg writes directly to params.sampling.reasoning_budget_conclusion only.
I hope that this gets added soon! I recently updated from 8500 to 8600, which I believe must have included the new reasoning budget, and now many coding tasks in Roo Code just stall during the thinking phase. (Using GLM4.7)
Can it also support text-based reasoning effort (none, low, medium, high)?
Can you please fix conflicts?
Overview
Implements Option B from issue #20632: a two-phase graceful termination for the reasoning budget sampler, replacing hard truncation with a structured conclusion phase.
Previously, --reasoning-budget-message injected a wrap-up string at exactly token N, leaving the model zero tokens to act on it, which is functionally equivalent to raw truncation. This PR splits the budget into a thinking phase and a conclusion phase:
- At the end of the thinking budget, the message is forced token-by-token (INJECTING state)
- The model is then given --reasoning-budget-conclusion N free tokens to terminate naturally (CONCLUDING state)
- If the model does not produce the end tag within those tokens, the hard-cutoff safety net fires (FORCING state), preserving existing behavior
New state machine: IDLE → COUNTING → INJECTING → CONCLUDING → FORCING → DONE
Setting --reasoning-budget-conclusion 0 (the default) preserves existing behavior exactly, so the change is fully backward compatible.
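For readers who want to trace the transitions, here is a minimal, self-contained sketch of the state machine described above. It is not the code from this PR: `budget_sketch`, its fields, and `accept()` are hypothetical names for illustration, tokens are reduced to a single "is this the end tag?" flag, and the forced end sequence is modeled only by its length. The treatment of `conclusion_budget == 0` (going straight to FORCING) is an assumption about how the backward-compatible path behaves.

```cpp
#include <cassert>

// States named in the PR description (this enum itself is illustrative only).
enum class budget_state { IDLE, COUNTING, INJECTING, CONCLUDING, FORCING, DONE };

// Hypothetical, simplified model of the two-phase budget; not the real sampler.
struct budget_sketch {
    budget_state state = budget_state::IDLE;
    int thinking_budget;    // tokens allowed in the thinking phase
    int conclusion_budget;  // free tokens after the message (0 = disabled)
    int message_len;        // length of the injected message, in tokens
    int forced_len;         // length of the forced end sequence, in tokens
    int remaining = 0;      // tokens left in the current phase

    budget_sketch(int thinking, int conclusion, int msg_len, int forced)
        : thinking_budget(thinking), conclusion_budget(conclusion),
          message_len(msg_len), forced_len(forced) {
        assert(thinking >= 0 && conclusion >= 0 && msg_len >= 0 && forced >= 0);
    }

    // Called when the reasoning start tag is seen.
    void start() {
        if (state == budget_state::IDLE) {
            state     = budget_state::COUNTING;
            remaining = thinking_budget;
        }
    }

    // Called once per accepted token; is_end_tag marks the reasoning end tag.
    void accept(bool is_end_tag) {
        switch (state) {
            case budget_state::COUNTING:
                if (is_end_tag) { state = budget_state::DONE; break; }
                if (--remaining <= 0) {
                    if (conclusion_budget <= 0) {
                        // Assumed legacy path: hard cutoff right away.
                        state = budget_state::FORCING;    remaining = forced_len;
                    } else if (message_len > 0) {
                        state = budget_state::INJECTING;  remaining = message_len;
                    } else {
                        state = budget_state::CONCLUDING; remaining = conclusion_budget;
                    }
                }
                break;
            case budget_state::INJECTING:
                // Message tokens are forced, so the end-tag flag is ignored here.
                if (--remaining <= 0) {
                    state = budget_state::CONCLUDING; remaining = conclusion_budget;
                }
                break;
            case budget_state::CONCLUDING:
                if (is_end_tag) { state = budget_state::DONE; break; }
                if (--remaining <= 0) {
                    // Safety net: the model did not close the block itself.
                    state = budget_state::FORCING; remaining = forced_len;
                }
                break;
            case budget_state::FORCING:
                if (--remaining <= 0) { state = budget_state::DONE; }
                break;
            default:
                break; // IDLE and DONE ignore tokens
        }
    }
};
```

With a thinking budget of 2, a 2-token message, and a conclusion budget of 3, the sketch moves from COUNTING to INJECTING after two tokens, to CONCLUDING after two more, and then either to DONE as soon as the end tag appears or to FORCING once the three conclusion tokens are used up.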
Requirements
I have read and agree with the contributing guidelines
AI usage disclosure: I used AI for minor assistance. I have reviewed all changes, understand them fully, and take responsibility for the submitted code.