Feature summary
No response
What problem are you trying to solve?
Cost per turn varies depending on the selected model, the reasoning effort, how much context is loaded, and how the assistant is being run (e.g. a single interactive reply vs. a longer autonomous run). Today users only learn the cost after spending it, which makes it hard to make informed choices (switch to a cheaper model, lower reasoning effort, trim context, etc.) before sending.
Proposed solution
Add a lightweight "Next message" estimate: a single, compact prediction of what the upcoming turn will roughly cost, learned only from the user's own recent usage (on‑device, no server calls or extra data collection). Changing the model, reasoning effort, or run mode should visibly move the number.
UX
- Visualized in a usage popover, e.g. "Next message — ~X credits (est.)", on a single line alongside "Session" spend.
- Framing: it's an estimate of the typical next turn, not a guarantee.
How the estimate is built
- Learn a typical cost ("anchor") at several granularities. Maintain a smoothed, geometric (log‑space) moving average of realized per‑turn cost so a few unusually large or small turns don't dominate. Track and blend it at a few levels:
  - Per‑configuration: keyed by the cost‑relevant choices the user controls (model, reasoning effort, context size tier, and run mode). This is what makes the estimate react when the user switches any of those.
  - Per‑session: captures the "weight" of the current conversation (a heavy session tends to keep being heavy), ramped in as the session accumulates turns.
  - Global: a cross‑session fallback used before a given configuration has any history.
- Cold start. Before any history exists, fall back to a context‑proportional estimate, i.e. one that scales with the amount of context loaded.
- Self‑calibrate. After each turn, compare what actually happened to what was predicted and fold the realized cost back into the averages, so the estimate improves over time and adapts to the user's habits.
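To make the steps above concrete, here is a minimal Python sketch of how the estimator could be structured. It is an illustration under stated assumptions, not a proposed implementation: the class names (`LogEma`, `NextMessageEstimator`), the smoothing factor, the blend weights (0.6 / 0.3 / 0.1), the ramp length, and the cold-start rate are all hypothetical placeholders that would need tuning against real usage.

```python
import math
from dataclasses import dataclass, field


@dataclass
class LogEma:
    """Exponential moving average of log(cost), i.e. a smoothed geometric mean."""
    alpha: float = 0.3  # assumed smoothing factor
    log_mean: float = 0.0
    count: int = 0

    def update(self, cost: float) -> None:
        x = math.log(max(cost, 1e-9))  # clamp so a zero-cost turn does not break log()
        if self.count == 0:
            self.log_mean = x
        else:
            self.log_mean += self.alpha * (x - self.log_mean)
        self.count += 1


@dataclass
class NextMessageEstimator:
    cold_start_rate: float = 0.5  # hypothetical credits per 1k context tokens
    ramp_turns: int = 5           # assumed turns before a level reaches full weight
    per_config: dict = field(default_factory=dict)
    session: LogEma = field(default_factory=LogEma)
    overall: LogEma = field(default_factory=LogEma)  # the "global" level

    def start_session(self) -> None:
        """Reset the per-session anchor when a new conversation begins."""
        self.session = LogEma()

    def estimate(self, config: tuple, context_tokens: int) -> float:
        """Blend the available anchors in log space; fall back to cold start."""
        levels = [
            (0.6, self.per_config.get(config)),  # assumed base weights
            (0.3, self.session),
            (0.1, self.overall),
        ]
        total_weight = 0.0
        weighted_log = 0.0
        for base, ema in levels:
            if ema is None or ema.count == 0:
                continue  # this level has no history yet
            weight = base * min(1.0, ema.count / self.ramp_turns)  # ramp in
            total_weight += weight
            weighted_log += weight * ema.log_mean
        if total_weight == 0.0:
            # Cold start: scale with the amount of context loaded.
            return self.cold_start_rate * context_tokens / 1000
        return math.exp(weighted_log / total_weight)

    def record(self, config: tuple, predicted: float, actual: float) -> float:
        """Fold the realized cost into every level; return the log-ratio error."""
        config_ema = self.per_config.setdefault(config, LogEma())
        for ema in (config_ema, self.session, self.overall):
            ema.update(actual)
        return math.log(max(actual, 1e-9) / max(predicted, 1e-9))


# Usage: the config key is (model, reasoning effort, context tier, run mode).
estimator = NextMessageEstimator()
config = ("model-a", "high", "large", "autonomous")
predicted = estimator.estimate(config, context_tokens=8000)  # cold-start path
estimator.record(config, predicted, actual=10.0)
```

Because the weights are renormalized over the levels that have history, the global anchor carries the estimate for a configuration that has never been used, and the per-configuration anchor takes over as it accumulates turns. Averaging in log space means a single turn costing 100 times the usual amount shifts the anchor far less than it would shift an arithmetic mean.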
Workflow impact
No response
Installation context
No response
Additional context
No response