feat(node): add fee prediction API for upcoming L2 slots #22116
Adds a `getPredictedMinFees` method to the Aztec Node that predicts minimum fees for the current slot and the next LAG (2) slots. The prediction accounts for L1 gas oracle transitions and configurable congestion assumptions via ManaUsageEstimate. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LHerskind left a comment:
fees.ts still calls getCurrentMinFees directly, so wouldn't the CLI flow still use the old behaviour? factory.ts in the bot also seems to rely on direct calls to getCurrentMinFees when deploying accounts.
| @@ -224,8 +224,7 @@ export abstract class BaseWallet implements Wallet { | |||
| feePayer?: AztecAddress, | |||
| gasSettings?: Partial<FieldsOf<GasSettings>>, | |||
| ): Promise<FeeOptions> { | |||
| const maxFeesPerGas = | |||
| gasSettings?.maxFeesPerGas ?? (await this.aztecNode.getCurrentMinFees()).mul(1 + this.minFeePadding); | |||
| const maxFeesPerGas = gasSettings?.maxFeesPerGas ?? (await this.getMinFees()).mul(1 + this.minFeePadding); | |||
The PR description says the wallet should pick getPredictedMinFees(Limit), but completeFeeOptions() calls getMinFees() with no argument and getMinFees() defaults to ManaUsageEstimate.Target. That only models steady-state congestion, so this path can still underprice transactions when congestion grows between submission and inclusion.
Changed the default from ManaUsageEstimate.Target to ManaUsageEstimate.Limit in getMinFees(), so the wallet always estimates against worst-case congestion growth.
| } | ||
| return predicted.reduce((worst, fees) => (fees.feePerL2Gas > worst.feePerL2Gas ? fees : worst)); | ||
| } catch { | ||
| // Fallback for old nodes that don't support getPredictedMinFees |
This falls back on any failure. Could we end up talking to a new node where an unrelated RPC issue triggers the fallback, leaving us with the same problem as before? The impact is more limited, so it is not as big a deal.
Narrowed the catch to only fall back on JSON-RPC method-not-found errors (code -32601). Other errors (network, timeouts) now rethrow.
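A minimal sketch of the narrowed fallback described above, assuming the RPC client exposes the JSON-RPC error code on a `code` property of the thrown error. The `Fees` and `FeeSource` shapes, the `isMethodNotFound` helper, and the empty-array handling are hypothetical stand-ins for illustration, not the wallet's actual types.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface Fees {
  feePerDaGas: bigint;
  feePerL2Gas: bigint;
}

interface FeeSource {
  getPredictedMinFees(): Promise<Fees[]>;
  getCurrentMinFees(): Promise<Fees>;
}

// Standard JSON-RPC 2.0 "Method not found" error code.
const JSON_RPC_METHOD_NOT_FOUND = -32601;

// Assumption: the client attaches the JSON-RPC error code as `code` on the thrown value.
function isMethodNotFound(err: unknown): boolean {
  return typeof err === 'object' && err !== null && (err as { code?: unknown }).code === JSON_RPC_METHOD_NOT_FOUND;
}

async function getMinFees(node: FeeSource): Promise<Fees> {
  try {
    const predicted = await node.getPredictedMinFees();
    if (predicted.length === 0) {
      // Assumption: an empty prediction is treated like a missing one.
      return await node.getCurrentMinFees();
    }
    // Pick the entry with the highest L2 gas fee across the window.
    return predicted.reduce((worst, fees) => (fees.feePerL2Gas > worst.feePerL2Gas ? fees : worst));
  } catch (err) {
    // Only nodes that lack the method trigger the fallback.
    if (isMethodNotFound(err)) {
      return node.getCurrentMinFees();
    }
    // Network errors, timeouts, and everything else propagate to the caller.
    throw err;
  }
}
```

Keeping the check in a small predicate makes it easy to unit test the two paths (fallback versus rethrow) separately.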
| // Most of the items below are cached by the rollup contract | ||
| const [lastCheckpoint, currentSlot, manaTarget, manaLimit, provingCostPerManaEth, epochDuration] = | ||
| await Promise.all([ | ||
| this.rollupContract.getPendingCheckpoint(), |
Using the last checkpoint instead of the effective one might lead to differences when pruning happens. In the L1 part of it, we are using the effective pending which depends on whether a prune can happen or not.
When looking at the rollup.ts it is not clear to me if we should actually replace the pending it got in there with the effective or not, probably something that would blow up an unimaginable amount of tests 😅
Added getEffectivePendingCheckpoint() to the TS RollupContract. It checks canPruneAtTime(timestamp) and returns the proven checkpoint when a prune is imminent, matching STFLib.getEffectivePendingCheckpointNumber() on L1. The fee predictor now uses this instead of the raw getPendingCheckpoint().
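The selection rule described above can be sketched as follows. The `RollupView` interface and its method names (other than `canPruneAtTime`, which the reply mentions) are hypothetical; the real `RollupContract` API may differ.

```typescript
// Hypothetical minimal view of the rollup contract, for illustration only.
interface RollupView {
  canPruneAtTime(timestamp: bigint): Promise<boolean>;
  getPendingCheckpointNumber(): Promise<number>;
  getProvenCheckpointNumber(): Promise<number>;
}

// If a prune can happen at `timestamp`, the pending chain would be rolled back
// to the proven tip, so the proven checkpoint is the effective parent.
async function getEffectivePendingCheckpointNumber(rollup: RollupView, timestamp: bigint): Promise<number> {
  const [canPrune, pending, proven] = await Promise.all([
    rollup.canPruneAtTime(timestamp),
    rollup.getPendingCheckpointNumber(),
    rollup.getProvenCheckpointNumber(),
  ]);
  return canPrune ? proven : pending;
}
```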
| } | ||
| /** Compresses a FeeHeader into a uint256 following the FeeHeaderLib bit layout in Solidity. */ | ||
| static compressFeeHeader(fh: FeeHeader): bigint { |
A bunch of the following functions seem to be used only in tests. I understand why they are here, but it feels a little odd 🤷
Acknowledged. Leaving these in RollupContract for now since moving them would be a pure refactor. Can revisit later.
| @@ -12,6 +13,9 @@ import type { CheckpointGlobalVariables, GlobalVariables } from './global_variab | |||
| export interface GlobalVariableBuilder { | |||
| getCurrentMinFees(): Promise<GasFees>; | |||
| /** Returns predicted min fees for the current slot and next N slots. */ | |||
| getPredictedMinFees(manaUsage?: ManaUsageEstimate): Promise<GasFees[]>; | |||
It feels a bit like we are starting to overload some of these builders
Extracted a separate FeeProvider interface in stdlib/src/tx/fee_provider.ts and split the implementation into FeeProviderImpl (in its own file) and GlobalVariableBuilder. AztecNodeService now takes both as separate dependencies. The sequencer and checkpoint proposal job only see GlobalVariableBuilder (no fee methods).
The current tests validate a single prediction window, not the predictor’s roll-forward behavior over time.
They assert that one call returns exactly FEE_ORACLE_LAG + 1 entries and that those entries match L1, but they do not advance one slot at a time and verify that repeated calls continue to track rollup.getManaMinFeeAt(...) across successive windows.
A stronger regression test would step through 5-10 successive slots, call getPredictedMinFees(...) at each step, assert that the result still has FEE_ORACLE_LAG + 1 entries, compare each predicted entry against rollup.getManaMinFeeAt(...) for slot + 0..LAG, then advance the slot and checkpoint/oracle state before repeating. That would more directly validate the behavior described.
Added a roll-forward regression test that steps through 6 successive slots, creating a fresh FeePredictor at each step, asserting the expected array length, and comparing every predicted entry against rollup.getManaMinFeeAt(). Uses ManaUsageEstimate.None to keep it simple.
| return { | ||
| lastSlot, | ||
| excessMana: computeExcessMana(feeHeader.excessMana, feeHeader.manaUsed, manaTarget), | ||
| ethPerFeeAsset: feeHeader.ethPerFeeAsset, |
The ethPerFeeAsset is used as if fixed between slots, but it can move in every one of them, so the actual value might be off by a few % by the time it reaches the chain 👀 Since its per-slot change has a fixed upper limit, you can predict the bounds though.
Now decaying ethPerFeeAsset by MAX_FEE_ASSET_PRICE_MODIFIER_BPS (1%) per slot in the prediction. Since a lower ethPerFeeAsset means higher fees in fee-asset terms, this gives a conservative (worst-case) estimate. Slot 0 uses the current value, each subsequent slot applies ethPerFeeAsset * 9900 / 10000, clamped to MIN_ETH_PER_FEE_ASSET.
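As a rough illustration of the decay described above, the sketch below applies the 1% reduction once per slot ahead and clamps at the minimum. It assumes the decay compounds across slots; the constant values follow the figures quoted in this thread (1% per slot, protocol minimum of 100), and the function name is hypothetical.

```typescript
// Values taken from the discussion: 1% max move per slot, protocol minimum of 100.
const MAX_FEE_ASSET_PRICE_MODIFIER_BPS = 100n;
const BPS_DENOMINATOR = 10000n;
const MIN_ETH_PER_FEE_ASSET = 100n;

// Hypothetical helper: worst-case (lowest) ethPerFeeAsset `slotsAhead` slots from now.
// Slot 0 returns the current value unchanged.
function worstCaseEthPerFeeAsset(current: bigint, slotsAhead: number): bigint {
  let value = current;
  for (let i = 0; i < slotsAhead; i++) {
    // Integer math mirrors `ethPerFeeAsset * 9900 / 10000`.
    value = (value * (BPS_DENOMINATOR - MAX_FEE_ASSET_PRICE_MODIFIER_BPS)) / BPS_DENOMINATOR;
    if (value < MIN_ETH_PER_FEE_ASSET) {
      // Clamp at the protocol minimum; further decay cannot go lower.
      return MIN_ETH_PER_FEE_ASSET;
    }
  }
  return value;
}
```

Because a lower ethPerFeeAsset translates into a higher fee in fee-asset terms, using the decayed value for later slots errs on the side of overestimating.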
| const lastSlot = lastCheckpoint.slotNumber; | ||
| // Start from the later of: the slot after the last checkpoint, or the current slot. | ||
| const nextSlot = SlotNumber.add(lastSlot, 1) > currentSlot ? SlotNumber.add(lastSlot, 1) : currentSlot; |
Previously this logic used the earliest timestamp a new checkpoint could actually land at, including the next possible L1 block timestamp. Here we only use max(lastCheckpoint + 1, currentSlot).
Can you elaborate on the reasoning for dropping the next-L1-block adjustment?
It looks like nextSlot can still be the current slot even when no further checkpoint can actually land in that slot, which would shift the prediction window one slot too early.
Reintroduced the L1 block timing adjustment. The fee predictor now uses a DateProvider to compute getSlotAtNextL1Block(now) and takes the max of that, currentSlot, and lastSlot + 1 as the prediction start. If the next L1 block would land in the next L2 slot, the window starts from there. Also moved publicClient and dateProvider to constructor args instead of passing on each call.
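A simplified sketch of that start-slot computation, assuming slots are numbered from a genesis timestamp and that the next L1 block lands exactly one L1 block interval after `now`. The `SlotTiming` shape and the helper signatures are hypothetical; only the max-of-three rule comes from the reply above.

```typescript
// Hypothetical timing parameters, all in seconds.
interface SlotTiming {
  genesisTime: number;
  slotDuration: number; // L2 slot length
  l1BlockTime: number; // interval between L1 blocks
}

function slotAt(timestamp: number, t: SlotTiming): number {
  return Math.floor((timestamp - t.genesisTime) / t.slotDuration);
}

// Assumption: the next L1 block is mined `l1BlockTime` seconds after `now`.
function getSlotAtNextL1Block(now: number, t: SlotTiming): number {
  return slotAt(now + t.l1BlockTime, t);
}

// Prediction starts at the latest of: the slot of the next L1 block,
// the current slot, and the slot after the last checkpoint.
function predictionStartSlot(lastCheckpointSlot: number, now: number, t: SlotTiming): number {
  const currentSlot = slotAt(now, t);
  return Math.max(getSlotAtNextL1Block(now, t), currentSlot, lastCheckpointSlot + 1);
}
```

With a 72 s slot and 12 s L1 blocks, a call made in the last 12 s of a slot starts the window at the following slot, since no further checkpoint can land in the current one.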
| ## Prediction Window | ||
| The prediction covers `LAG + 1 = 3` entries (the next available slot plus 2 more). |
The LAG + 1 guarantee looks one slot too optimistic. The predictor returns fees through nextSlot + LAG, but if the oracle cooldown has already elapsed, a new oracle update can still be enqueued immediately after the prediction is computed and will activate LAG slots later. That means the last slot in the returned window can still change after the prediction is made. In other words, the stable window appears to be the next LAG slots, not LAG + 1 entries.
For a test, I would try to validate exactly that race:
1. Move the chain forward until the oracle cooldown has elapsed.
2. Make sure nextSlot is the current slot, so an oracle update queued now would activate at the last slot of the predicted window.
3. Call the predictor and store:
   - the returned array
   - the computed start slot
   - the last predicted entry
4. Without advancing an Aztec slot yet, enqueue a fresh oracle update with a very different base fee.
5. Advance time to the timestamp of startSlot + LAG.
6. Read the actual L1 value for that slot with getManaMinFeeAt.
7. Assert that:
   - the predictor originally returned FEE_ORACLE_LAG + 1 entries
   - the last predicted entry is no longer equal to the actual L1 fee at startSlot + LAG
The important part is step 4: the update must happen after the prediction is computed, but still early enough that its slotOfChange is exactly startSlot + LAG. That is what demonstrates the off-by-one in the guarantee.
A practical way to make that test deterministic with the existing helpers would be:
- Advance enough slots that the oracle update is allowed.
- Compute the prediction.
- Set a sharply different next block base fee.
- Mine and call updateL1GasFeeOracle while still in the same Aztec slot as the prediction start.
- Then advance to the last predicted slot and compare the old prediction against the live contract result.
This could get worse the further the start slot is in the future 😬 Which seems to be something that can happen with the last + 1 quite easily.
Agreed with the off-by-one analysis. Reduced the prediction window from LAG + 1 to LAG entries. All returned entries are now guaranteed stable — no oracle update can change them within the window. Updated the README, tests, and TXE mock accordingly.
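The off-by-one can be illustrated with a small model, assuming (as described in the review comment) that an oracle update enqueued during slot `s` activates at slot `s + LAG`, and that the prediction starts at the slot in which the update is enqueued. The helper names are hypothetical and LAG is taken as 2 from the PR description.

```typescript
const FEE_ORACLE_LAG = 2;

// Slots covered by a prediction of `entries` entries starting at `startSlot`.
function predictionSlots(startSlot: number, entries: number): number[] {
  return Array.from({ length: entries }, (_, i) => startSlot + i);
}

// Assumption: an update enqueued during `enqueueSlot` takes effect at `enqueueSlot + lag`.
// Returns the predicted slots whose fee could still change after the prediction is made.
function slotsExposedToUpdate(slots: number[], enqueueSlot: number, lag: number = FEE_ORACLE_LAG): number[] {
  const slotOfChange = enqueueSlot + lag;
  return slots.filter(slot => slot >= slotOfChange);
}

// A window of LAG + 1 entries starting at slot 10 leaves the last slot exposed,
// while a window of LAG entries leaves none exposed.
const exposedWithLagPlusOne = slotsExposedToUpdate(predictionSlots(10, FEE_ORACLE_LAG + 1), 10);
const exposedWithLag = slotsExposedToUpdate(predictionSlots(10, FEE_ORACLE_LAG), 10);
```

This model only covers the case where the window starts in the current slot; it does not attempt to capture a start slot further in the future.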
| @@ -10,6 +10,10 @@ export class TXEGlobalVariablesBuilder implements GlobalVariableBuilder { | |||
| return Promise.resolve(new GasFees(0, 0)); | |||
| } | |||
| public getPredictedMinFees(): Promise<GasFees[]> { | |||
| return Promise.resolve([new GasFees(0, 0)]); | |||
The length here is slightly odd when other places talk about the result being a fixed size that is not 1.
Updated TXEFeeProvider to return FEE_ORACLE_LAG zero-fee entries using times(), matching the production prediction window size.
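A small sketch of a mock that matches the production window size. The local `times` function is a stand-in for the helper mentioned above, and `MockFeeProvider` and the `Fees` shape are hypothetical simplifications.

```typescript
const FEE_ORACLE_LAG = 2; // LAG value stated in the PR description

// Simplified fee shape for illustration only.
interface Fees {
  feePerDaGas: bigint;
  feePerL2Gas: bigint;
}

// Local stand-in for a `times`-style helper: builds an array of `n` items.
function times<T>(n: number, fn: (i: number) => T): T[] {
  return Array.from({ length: n }, (_, i) => fn(i));
}

class MockFeeProvider {
  // Returns one zero-fee entry per predicted slot, so callers see the same array length as in production.
  getPredictedMinFees(): Promise<Fees[]> {
    return Promise.resolve(times(FEE_ORACLE_LAG, () => ({ feePerDaGas: 0n, feePerL2Gas: 0n })));
  }
}
```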
Addresses PR #22116 review comment 1: wallet was defaulting to Target mana usage estimate which can underprice during congestion growth. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rrors Addresses PR #22116 review comment 2: the catch-all fallback could mask RPC connectivity issues by silently falling back to getCurrentMinFees. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…arios Addresses PR #22116 review comment 3: fee predictor was using the raw pending checkpoint which could be stale when a prune is imminent. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ictor Addresses PR #22116 review comment 6: validates that the predictor tracks rollup.getManaMinFeeAt across successive slots over time. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… fee prediction Addresses PR #22116 review comment 7: ethPerFeeAsset was treated as fixed across prediction slots but can change by up to 1% per checkpoint. Now assumes worst-case (decreasing) ethPerFeeAsset for higher fee estimates. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ries Addresses PR #22116 review comment 9: the last entry in the LAG+1 window could be invalidated by a concurrent oracle update. LAG entries are all guaranteed stable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses PR #22116 review comment 10: mock was returning a single-entry array instead of matching the production prediction window size. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses PR #22116 review comment 11: CLI wallet was bypassing the fee prediction system by calling getCurrentMinFees directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ilder Addresses PR #22116 review comment 5: the GlobalVariableBuilder interface was overloaded with both fee and global-variables concerns. FeeProvider is now a separate interface with its own implementation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…on window Addresses PR #22116 review comment 8: if the next L1 block lands in the next L2 slot, the prediction window should start from that slot. Also moves publicClient to FeePredictor constructor instead of passing it on every call. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Re: review comment about Updated the CLI wallet ( |
|
Yes, all responses above were posted by Claude |
Allows callers of simulateTx, sendTx, and profileTx to override the assumed congestion level (None, Target, Limit) used for fee prediction. Defaults to Limit (worst case) when not specified. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ASSET Ensures the conservative ethPerFeeAsset decay doesn't go below the protocol minimum (100), matching the on-chain clamp in computeManaMinFee. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LHerskind left a comment:
The bot is still using getCurrentMinFees in factory.ts to deploy; we should probably update it.
| // Most of the items below are cached by the rollup contract | ||
| const [lastCheckpoint, currentSlot, manaTarget, manaLimit, provingCostPerManaEth, epochDuration] = | ||
| await Promise.all([ | ||
| this.rollupContract.getEffectivePendingCheckpoint(), |
Small edge case: getEffectivePendingCheckpoint() is resolved against the current block timestamp, while the predictor starts from a future nextSlot. Since pruneability is time-dependent, there is a narrow epoch-boundary case where the effective parent at now and at nextSlot can differ.
Pretty rare, but is this intended to be evaluated at the prediction start timestamp instead?
I don't think it is a super big issue, but something that could happen once in a while. Not blocking.
| return Promise.resolve(GasFees.empty()); | ||
| } | ||
| getPredictedMinFees(): Promise<GasFees[]> { | ||
| return Promise.resolve([GasFees.empty()]); |
Should we be more strict with sizing and force that as part of the schema?
I changed it to a Tuple<GasFees, typeof FEE_ORACLE_LAG>, but honestly it seemed messy, so I rolled it back in the end.
…fee-prediction # Conflicts: # yarn-project/end-to-end/src/e2e_fees/fee_settings.test.ts # yarn-project/txe/src/state_machine/global_variable_builder.ts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The bot was using getCurrentMinFees to set maxFeesPerGas for deploy and fee juice top-up transactions. This switches to getPredictedMinFees with worst-case across predicted slots, matching the wallet's approach. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…iction timestamp Evaluate pruneability at the prediction start timestamp instead of the current L1 block time, fixing an epoch-boundary edge case where the effective parent checkpoint could differ between now and nextSlot. Pin all non-constant rollup queries to a single L1 block number for a consistent snapshot. Add blockNumber option to getCheckpointNumber, getSlotNumber, getCheckpoint, getL1FeesAt, and getEffectivePendingCheckpoint. Bypass viem's getBlockNumber cache to avoid stale pins. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
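The snapshot-pinning idea in this commit message can be sketched as follows: read the L1 block number once, then pass it to every subsequent query so all values come from the same block. The `PinnedRollupReader` interface and `readSnapshot` function are hypothetical; only the `blockNumber` option and the method names `getCheckpointNumber` and `getSlotNumber` appear in the commit message.

```typescript
// Hypothetical read options and reader interface, for illustration only.
interface ReadOptions {
  blockNumber: bigint;
}

interface PinnedRollupReader {
  getBlockNumber(): Promise<bigint>;
  getCheckpointNumber(opts: ReadOptions): Promise<number>;
  getSlotNumber(opts: ReadOptions): Promise<number>;
}

interface Snapshot {
  blockNumber: bigint;
  checkpointNumber: number;
  slotNumber: number;
}

async function readSnapshot(reader: PinnedRollupReader): Promise<Snapshot> {
  // Resolve the block number once, before any state reads.
  const blockNumber = await reader.getBlockNumber();
  const opts: ReadOptions = { blockNumber };
  // Every query is pinned to the same block, so a new L1 block arriving
  // mid-flight cannot produce a mixed view of the rollup state.
  const [checkpointNumber, slotNumber] = await Promise.all([
    reader.getCheckpointNumber(opts),
    reader.getSlotNumber(opts),
  ]);
  return { blockNumber, checkpointNumber, slotNumber };
}
```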
Fixed in 66825b1 |
The wallet now calls getPredictedMinFees before getCurrentMinFees, so the test mock on getCurrentMinFees alone was being bypassed. Mock both methods so the test controls the fee values seen during tx proving. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Flakey Tests🤖 says: This CI run detected 1 tests that failed, but were tolerated due to a .test_patterns.yml entry. |
## Summary

The `epochs_mbps.pipeline.parallel` test expects at least 12 blocks in a single checkpoint, which requires near-ideal timing (72s slot / 5.5s per block = 13 max blocks). On constrained CI runners (2 CPUs, 8GB RAM), block building is slower and only 11 blocks were achieved, causing a flaky failure unrelated to the PR that triggered it (#22116). Lowered `EXPECTED_BLOCKS_PER_CHECKPOINT` from 12 to 8, which still validates MBPS behavior while giving sufficient margin for CI.

## Details

Full analysis: https://gist.github.com/AztecBot/7779b7de743711f18899ef57e2060c68
ClaudeBox log: https://claudebox.work/s/6228f03c1549fb57?run=1
BEGIN_COMMIT_OVERRIDE
fix(p2p): back off on repeated auth handshake failures (#22435)
chore(pipeline): add metrics for pipeling building timelines (#21591)
fix: no division by zero in sentinel (#22467)
chore(pipelining): update next net (#22466)
feat(claude): add skill to read gists (#22471)
feat(node): add fee prediction API for upcoming L2 slots (#22116)
fix: lower EXPECTED_BLOCKS_PER_CHECKPOINT for CI stability (#22480)
END_COMMIT_OVERRIDE
## Motivation

Wallets currently use `getCurrentMinFees` to set `maxFeesPerGas`, but this only reflects the fee at the current moment. If L1 fees change (via the oracle's LAG-delayed transition) or congestion grows before the transaction lands, the fee could be too low and the tx gets rejected. We need a prediction API that accounts for upcoming L1 fee transitions and congestion growth so wallets can set fees that guarantee inclusion.

Fixes A-648

## Approach

Ports the fee computation logic from `FeeLib.sol` into TypeScript (`fee_math.ts`) so fees can be predicted locally without state overrides. A new `FeePredictor` class queries the L1 rollup state once per L1 block (cached), then computes per-slot fees for a `LAG + 1 = 3` slot window. The window is LAG (not LIFETIME) because a new oracle update could be enqueued at any time and activate after LAG slots, making longer predictions unreliable. The wallet picks the max fee across the window with a backwards-compatible fallback to `getCurrentMinFees` for old nodes.

## Changes

- **stdlib/src/gas/fee_math.ts**: TypeScript port of FeeLib.sol fee computation (fakeExponential, congestion multiplier, full fee calculation) with `ManaUsageEstimate` enum (None/Target/Limit)
- **stdlib/src/gas/fee_math.test.ts**: Unit tests for all fee math functions
- **stdlib/src/gas/README.md**: Documentation on L1 gas oracle LAG/LIFETIME and the fee prediction window
- **sequencer-client/src/global_variable_builder/fee_predictor.ts**: `FeePredictor` class that caches L1 state and computes per-slot predictions with configurable mana usage assumptions
- **sequencer-client/src/global_variable_builder/fee_predictor.test.ts**: Integration tests against Anvil + deployed Rollup verifying exact match with L1 `getManaMinFeeAt` across all mana usage estimates
- **ethereum/src/contracts/rollup.ts**: Added `compressFeeHeader`, `packChainTips`, `chainTipsStorageSlot`, `getTempCheckpointLogStorageSlot`, `TempCheckpointLogField` enum, and `getFeeHeader` wrapper
- **ethereum/src/contracts/rollup.test.ts**: Unit tests for the new RollupContract helpers
- **stdlib/src/interfaces/aztec-node.ts**: Added `getPredictedMinFees(manaUsage?)` to AztecNode interface and schema
- **aztec-node/src/aztec-node/server.ts**: Delegates to GlobalVariableBuilder
- **wallet-sdk/src/base-wallet/base_wallet.ts**: Uses `getPredictedMinFees(Limit)` with fallback to `getCurrentMinFees`
- **wallet-sdk/src/base-wallet/base_wallet.test.ts**: Unit tests for `getMinFees` (max selection, estimate forwarding, fallback paths)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
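The `fakeExponential` helper named in the changes list is not shown in this thread. As a rough illustration only, assuming it follows the integer Taylor-series approximation of `factor * e^(numerator / denominator)` defined as `fake_exponential` in EIP-4844, a TypeScript version could look like the sketch below. The actual `fee_math.ts` and `FeeLib.sol` may use different parameter names, scaling, or termination rules.

```typescript
// Integer approximation of factor * e^(numerator / denominator),
// following the `fake_exponential` pseudocode in EIP-4844.
// Assumption: the ported function has this shape; it is not confirmed by the thread.
function fakeExponential(factor: bigint, numerator: bigint, denominator: bigint): bigint {
  let i = 1n;
  let output = 0n;
  // Running Taylor term, scaled by `denominator` to keep precision in integer math.
  let numeratorAccum = factor * denominator;
  while (numeratorAccum > 0n) {
    output += numeratorAccum;
    // Next term: previous term * (numerator / denominator) / i.
    numeratorAccum = (numeratorAccum * numerator) / (denominator * i);
    i += 1n;
  }
  return output / denominator;
}
```

With `numerator = 0n` the function returns `factor` unchanged, which corresponds to no excess and therefore no congestion multiplier in this style of fee formula.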