Configurable Tempo & Owner-Triggered Epochs — Specification #2633

@evgeny-s

Status: Draft / WIP

Audience: Subtensor pallet engineers, validator-tooling maintainers

Goal: Give subnet owners control over their epoch cadence, while keeping a network-wide guarantee that yield is distributed at least once per maximum tempo.


1. Motivation

The current design fixes every subnet to a single rigid cadence (tempo = 360 blocks ≈ 72 minutes) chosen by the protocol, not the builder. Builders whose subnet does not benefit from a 72-minute weight cycle (e.g. long-running training, infrequent inference benchmarks, asynchronous human-in-the-loop tasks) are forced to design around the cadence rather than have the cadence support their use case.

This feature deliberately moves authority from the protocol to the owner, with one network-level guarantee preserved: stakers always receive yield at least once per MaxTempo, regardless of owner behaviour.


2. Requirements

A subnet owner should be able to:

  1. R1 — Set the subnet's tempo to any value in [MinTempo, MaxTempo] blocks. The call is rate-limited to a fixed MinTempo blocks (= 360) regardless of the current tempo, which lets owners recover from a mistake within ~72 minutes instead of up to a week.
  2. R2 — Set the activity-cutoff factor for the subnet (in epochs of validator inactivity tolerated), rate-limited via the same OwnerHyperparamUpdate pattern.
  3. R3 — Manually trigger an epoch for their own subnet, rate-limited via the same OwnerHyperparamUpdate pattern. The triggered epoch becomes eligible to fire after AdminFreezeWindow blocks (giving validators their freeze window of warning before execution). Concurrent triggers from many owners are de-collided by the per-block epoch cap (§6.1) — at most MaxEpochsPerBlock epochs execute per block, the rest are deferred to the next block via EpochDeferred. Actual execution latency is therefore AdminFreezeWindow + (queue position / MaxEpochsPerBlock) blocks.

The protocol shall guarantee:

  1. G1 — An epoch runs for every subnet at least once every MaxTempo blocks, regardless of owner action or inaction. (Edge case: the legacy runtime short-circuits should_run_epoch to false when Tempo == 0, and we preserve that behaviour for compatibility — see §9. No production subnet has Tempo == 0 and the bounds enforced on owner-side set_tempo make this state unreachable through this PR's new extrinsics.)
  2. G2 — A manual trigger and the next automatic epoch are mutually exclusive in time: when a triggered epoch fires it resets the automatic schedule so the next automatic epoch is Tempo blocks later.
  3. G3 — Validators always receive at least AdminFreezeWindow blocks of stable subnet state before any epoch — automatic, safety-net forced, or manually triggered. No admin operation that mutates epoch-relevant state may occur within that window.

Out of scope:

  • A separate "pause" extrinsic. Pause is emulated by setting Tempo = MaxTempo. See §9.
  • Per-subnet weight-version-key, registration, or other hyperparameter changes. Those follow existing patterns and are unaffected.

3. Design overview

Two structural changes underlie everything else:

3.1 Scheduler becomes stateful

Today the scheduler is a pure function of the current block:

should_run_epoch(netuid, block) =
    (block + netuid + 1) % (Tempo[netuid] + 1) == 0

This makes "reset the tempo on manual trigger" and "make set_tempo predictable mid-cycle" both impossible — there is no per-subnet state to update.

We introduce a new storage map LastEpochBlock<NetUid, u64> and switch the rule to:

if Tempo[netuid] == 0:                                // legacy defensive short-circuit (§9)
    return false
blocks_since = block - LastEpochBlock[netuid]
pending      = PendingEpochAt[netuid]
should_run_epoch(netuid, block) =
       (pending > 0 && block >= pending)             // manual trigger ripe (R3)
    || BlocksSinceLastStep[netuid] > MaxTempo        // safety net (G1) — anchored to last *successful* epoch, owner-immune
    || blocks_since >= Tempo[netuid]                 // normal cadence — period is exactly tempo

After any decision that the epoch slot has been reached (i.e. should_run_epoch returns true), LastEpochBlock[netuid] = current_block and PendingEpochAt[netuid] = 0, regardless of whether the epoch then runs successfully or is skipped due to inconsistent input state. The schedule advances independently of execution. This is essential to avoid permanently locking the subnet's freeze window if the epoch cannot run (see §6.1 for the failure mode).

LastEpochBlock is also reset to current_block on a successful set_tempo (see §5.1), which guarantees that no epoch can fire without a full freeze window of warning after a tempo change. The safety-net branch is anchored on BlocksSinceLastStep (existing storage, only resets on a successful epoch run, owner cannot mutate it), so G1 holds even under adversarial owner behaviour like alternating grow/shrink.

Newly registered subnets initialise LastEpochBlock in init_new_network (see §10) with a per-netuid stagger so the normal-cadence branch does not fire on the very first block_step after registration and so multiple subnets registered in the same block do not all fire their first epoch on the same future block. Without this initialisation, the ValueQuery default of 0 would cause blocks_since = current_block − 0 to exceed tempo on any live chain, triggering an empty epoch immediately on subnet creation; without stagger, mass-registered subnets would synchronise into a single block of heavy concurrent epochs once tempo blocks have elapsed.
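As a minimal sketch, the rule above can be modelled as a pure Rust function over plain values. The SubnetSchedule struct and the EpochReason enum are illustrative stand-ins for the storage maps and are not part of the pallet; the branch order follows the pseudocode, and MAX_TEMPO takes the value listed in §4.2.

```rust
/// Safety-net horizon in blocks (MaxTempo from §4.2).
const MAX_TEMPO: u64 = 50_400;

/// Hypothetical snapshot of the per-subnet storage values the rule reads.
#[derive(Clone, Copy, Debug)]
struct SubnetSchedule {
    tempo: u16,                  // Tempo[netuid]
    last_epoch_block: u64,       // LastEpochBlock[netuid]
    pending_epoch_at: u64,       // PendingEpochAt[netuid]; 0 means no trigger pending
    blocks_since_last_step: u64, // BlocksSinceLastStep[netuid]
}

/// Which branch of the rule made the epoch slot due (illustrative only).
#[derive(Debug, PartialEq, Eq)]
enum EpochReason {
    ManualTrigger,
    SafetyNet,
    NormalCadence,
}

/// Returns the first matching branch, or None when no epoch slot is due.
fn epoch_reason(s: &SubnetSchedule, block: u64) -> Option<EpochReason> {
    // Legacy defensive short-circuit: Tempo == 0 never schedules an epoch.
    if s.tempo == 0 {
        return None;
    }
    let blocks_since = block.saturating_sub(s.last_epoch_block);
    if s.pending_epoch_at > 0 && block >= s.pending_epoch_at {
        Some(EpochReason::ManualTrigger)
    } else if s.blocks_since_last_step > MAX_TEMPO {
        Some(EpochReason::SafetyNet)
    } else if blocks_since >= u64::from(s.tempo) {
        Some(EpochReason::NormalCadence)
    } else {
        None
    }
}

/// The boolean predicate used by the scheduler.
fn should_run_epoch(s: &SubnetSchedule, block: u64) -> bool {
    epoch_reason(s, block).is_some()
}
```

With tempo = 360 and LastEpochBlock = 1_000, this model stays quiet through block 1_359 and reports the normal-cadence branch at block 1_360, so the period is exactly tempo blocks.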

3.2 Activity cutoff becomes a function of tempo

Today ActivityCutoff[netuid]: u16 is an absolute number of blocks of validator inactivity tolerated before a validator is excluded from the consensus computation. It does not scale with tempo — at tempo = 7 days the current default of 5000 blocks would mean validators must push weights every ~16 hours of a 7-day cycle, which is nonsensical.

We introduce a new storage map ActivityCutoffFactorMilli<NetUid, u32> representing tolerated inactivity in per-mille epochs (milli-units, 1/1000 granularity). Effective cutoff is computed at the use site:

cutoff_blocks = (ActivityCutoffFactorMilli[netuid] as u64 * Tempo[netuid] as u64) / 1000
cutoff_blocks = cutoff_blocks.max(1)

We use per-mille rather than an integer factor primarily to match legacy values precisely: the historical 5000-block cutoff at default tempo 360 corresponds to factor 13.889, not an integer. With per-mille, store 13_889 and compute 13_889 × 360 / 1000 = 5000 exactly (the integer division floors 5000.04).

Storage type u32 (rather than Substrate's Permill) because Permill caps at 1.0 and we need factors up to ~50. Storage name suffixed Milli to make units explicit at every call site.
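The arithmetic can be checked with a small standalone sketch. cutoff_blocks mirrors the formula above; factor_for_legacy_cutoff is a hypothetical helper (not named anywhere in this spec) that picks the smallest per-mille factor whose product reaches a given legacy block cutoff.

```rust
/// Effective activity cutoff in blocks: (factor_milli × tempo) / 1000, floored,
/// then clamped to at least 1.
fn cutoff_blocks(factor_milli: u32, tempo: u16) -> u64 {
    let product = u64::from(factor_milli).saturating_mul(u64::from(tempo));
    (product / 1000).max(1)
}

/// Hypothetical helper: smallest per-mille factor with
/// factor × tempo >= legacy_cutoff × 1000 (a ceiling division).
/// Returns None for tempo == 0 or when the factor does not fit in u32.
fn factor_for_legacy_cutoff(legacy_cutoff: u64, tempo: u16) -> Option<u32> {
    if tempo == 0 {
        return None;
    }
    let t = u64::from(tempo);
    let factor = legacy_cutoff.saturating_mul(1000).saturating_add(t - 1) / t;
    if factor > u64::from(u32::MAX) {
        None
    } else {
        Some(factor as u32)
    }
}
```

For the default tempo of 360, a legacy cutoff of 5000 blocks maps to 13_889 and one of 12 000 blocks maps to 33_334. Both map back to the original block counts through cutoff_blocks.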

The existing ActivityCutoff<T> storage and its sudo_set_activity_cutoff extrinsic are left untouched but become unread by the epoch logic. They will be removed in a follow-up cleanup PR (§15).


4. Storage changes

4.1 New

Name | Type | Default | Purpose
LastEpochBlock | StorageMap<NetUid, u64> | 0 | Block at which the last epoch slot was consumed for this subnet (§6.1). Also reset by set_tempo and sudo_set_tempo (§5.1, §6.3).
ActivityCutoffFactorMilli | StorageMap<NetUid, u32> | 13_889 | Tolerated inactivity in per-mille epochs (1/1000 granularity). Effective cutoff = (factor × tempo) / 1000.
PendingEpochAt | StorageMap<NetUid, u64> | 0 | Block at which a manually triggered epoch should fire. 0 means no trigger pending. Cleared when the epoch slot is consumed.
SubnetEpochIndex | StorageMap<NetUid, u64> | 0 | Monotonic epoch counter; +1 per consumed epoch slot. Canonical epoch index for commit-reveal (§7).

4.2 New runtime constants

Name | Value | Notes
MinTempo | 360 | Lower bound for owner-set tempo.
MaxTempo | 50_400 | Upper bound for owner-set tempo (≈ 7 days at 12 s/block).
MinActivityCutoffFactorMilli | 1_000 | = factor 1.0 (one full tempo). Sub-tempo cutoffs are disallowed by design (see §3.2). Also makes the cutoff_blocks ≥ 1 clamp in §6.2 unreachable in practice (1_000 × 360 / 1000 = 360 ≫ 1).
MaxActivityCutoffFactorMilli | 50_000 | = factor 50.0. Sized to accommodate every observed production cutoff exactly without clamping (the largest is 12 000 blocks at tempo=360 → factor 33 334, requiring MAX ≥ 33 334) and leave headroom for owner-initiated growth post-migration. At MaxTempo this gives 50 × 50 400 = 2 520 000 blocks (~350 days): extreme but bounded.
InitialActivityCutoffFactorMilli | 13_889 | = factor 13.889. Preserves the current default 5000-block cutoff at default tempo exactly (13 889 × 360 / 1000 = 5 000).
MaxEpochsPerBlock | 2 | Per-block cap on number of epochs that may execute in a single block_step (§6.1). When the cap is reached, remaining epochs scheduled for that block are deferred by 1 block. Bounds peak block weight contribution from epoch execution.

4.3 Unchanged (left alone, eventually removed)

  • ActivityCutoff<T> (storage) — no longer read by the runtime. Stays in storage for now to avoid migration risk.
  • sudo_set_activity_cutoff (admin-utils extrinsic) — continues to write to the dead storage. Effective no-op. Remove in cleanup PR.
  • MinActivityCutoff<T> (admin-utils storage) — no longer used. Remove in cleanup PR.

5. New extrinsics (pallet-subtensor)

All three are owner-only (ensure_subnet_owner). They are allowed regardless of CommitRevealWeightsEnabled — see §7.4.

5.1 set_tempo(netuid, tempo: u16)

  • Validates MinTempo <= tempo <= MaxTempo.
  • Rate limit: MinTempo blocks (= 360), fixed. Implemented via dedicated TransactionType::TempoUpdate whose rate_limit_on_subnet returns MinTempo.
  • Subject to AdminFreezeWindow.
  • On success: LastEpochBlock = current_block (cycle reset, both shrink and grow). Preserves G3. G1 is protected independently by the BlocksSinceLastStep-anchored safety-net branch (§3.1, §6.1) — owner cannot push the safety-net horizon forward by alternating set_tempo calls because that field only resets on a successful epoch.
  • Emits TempoSet { netuid, tempo }.
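The checks listed above can be sketched as a plain function over an in-memory subnet record. The Subnet struct, the SetTempoError enum, the order of the checks, and the inclusive rate-limit boundary (a call exactly MinTempo blocks after the previous one passes) are all assumptions made for illustration; the freeze-window result is passed in as a flag rather than computed.

```rust
const MIN_TEMPO: u16 = 360;
const MAX_TEMPO: u16 = 50_400;

/// Hypothetical error type; the real pallet errors may be named differently.
#[derive(Debug, PartialEq, Eq)]
enum SetTempoError {
    OutOfBounds,
    FreezeWindow,
    RateLimited,
}

/// Hypothetical in-memory view of the storage that set_tempo touches.
struct Subnet {
    tempo: u16,
    last_epoch_block: u64,
    last_tempo_update: Option<u64>, // block of the previous successful set_tempo
}

fn set_tempo(
    s: &mut Subnet,
    new_tempo: u16,
    now: u64,
    in_freeze_window: bool,
) -> Result<(), SetTempoError> {
    // Bounds: MinTempo <= tempo <= MaxTempo.
    if new_tempo < MIN_TEMPO || new_tempo > MAX_TEMPO {
        return Err(SetTempoError::OutOfBounds);
    }
    // Subject to AdminFreezeWindow.
    if in_freeze_window {
        return Err(SetTempoError::FreezeWindow);
    }
    // Fixed rate limit of MinTempo blocks, independent of the current tempo.
    if let Some(prev) = s.last_tempo_update {
        if now.saturating_sub(prev) < u64::from(MIN_TEMPO) {
            return Err(SetTempoError::RateLimited);
        }
    }
    // Cycle reset: the next automatic epoch lands new_tempo blocks from now.
    s.tempo = new_tempo;
    s.last_epoch_block = now;
    s.last_tempo_update = Some(now);
    Ok(())
}
```

An owner who sets tempo to 50_400 at block 1_000 by mistake can, in this model, correct it at block 1_360, and the cycle restarts from that block.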

5.2 set_activity_cutoff_factor(netuid, factor: u32)

  • Parameter in per-mille units (1/1000): factor = 13_890 represents 13.890 tempos. See §3.2.
  • Validates MinActivityCutoffFactorMilli <= factor <= MaxActivityCutoffFactorMilli.
  • Rate-limited via existing OwnerHyperparamUpdate(Hyperparameter::ActivityCutoffFactorMilli) (new enum variant, same OwnerHyperparamRateLimit × Tempo[netuid] cooldown).
  • Subject to AdminFreezeWindow.
  • Emits ActivityCutoffFactorMilliSet { netuid, factor_milli }.

5.3 trigger_epoch(netuid)

  • Rate-limited via existing OwnerHyperparamUpdate(Hyperparameter::TriggerEpoch) (new enum variant, same OwnerHyperparamRateLimit × Tempo[netuid] cooldown).
  • Not subject to AdminFreezeWindow itself — but its successful execution engages the freeze window for the subnet from the next block onward (see §6.4). This delivers G3 without requiring a separate check inside trigger_epoch.
  • Fails with EpochTriggerAlreadyPending if PendingEpochAt[netuid] != 0 at the time of the call. (The rate limit normally prevents this from occurring; the explicit check exists for the corner case where AdminFreezeWindow exceeds the rate-limit cooldown, which would otherwise allow re-trigger before the prior one fires.)
  • Sets PendingEpochAt[netuid] = current_block + AdminFreezeWindow. The epoch fires when block_step finds block >= PendingEpochAt and the per-block epoch cap (§6.1) is not exhausted, at which point the epoch runs, LastEpochBlock = block, and PendingEpochAt is cleared.
  • Concurrent triggers from many owners are protected by the per-block epoch cap (§6.1), not by a deterministic per-netuid stagger. If N owners trigger in the same block, all N PendingEpochAt values equal current_block + AdminFreezeWindow. At that fires_at block, block_step runs up to MaxEpochsPerBlock of them and defers the rest by emitting EpochDeferred and pushing their PendingEpochAt forward by 1 block. This handles the trigger storm with a single mechanism that also covers auto-epoch / safety-net firing collisions.
  • If an automatic or safety-net epoch fires before the triggered one ripens, the trigger is harmlessly absorbed: any epoch run clears PendingEpochAt, so no double-run occurs.
  • Emits EpochTriggered { netuid, by, fires_at } where fires_at is the earliest block at which the triggered epoch may execute. Actual execution may be deferred under the per-block cap; subscribe to EpochDeferred to track real fires_at.
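A minimal sketch of the arming step and of the latency estimate from R3 follows. It models PendingEpochAt as a bare u64 and leaves out the ownership and rate-limit checks; earliest_execution_block is a hypothetical helper that assumes a zero-based queue position and integer division.

```rust
#[derive(Debug, PartialEq, Eq)]
enum TriggerError {
    EpochTriggerAlreadyPending,
}

/// Arms a manual trigger and returns fires_at, the earliest block at which the
/// triggered epoch may execute.
fn trigger_epoch(
    pending_epoch_at: &mut u64,
    now: u64,
    admin_freeze_window: u64,
) -> Result<u64, TriggerError> {
    // 0 is the "no trigger pending" sentinel.
    if *pending_epoch_at != 0 {
        return Err(TriggerError::EpochTriggerAlreadyPending);
    }
    let fires_at = now.saturating_add(admin_freeze_window);
    *pending_epoch_at = fires_at;
    Ok(fires_at)
}

/// Hypothetical helper: block at which the slot at `queue_position` (zero-based)
/// executes when every slot ahead of it is also due at `fires_at`.
fn earliest_execution_block(fires_at: u64, queue_position: u64, max_epochs_per_block: u64) -> u64 {
    // max(1) guards against a zero cap in this sketch.
    fires_at.saturating_add(queue_position / max_epochs_per_block.max(1))
}
```

With AdminFreezeWindow = 10 and MaxEpochsPerBlock = 2, a trigger at block 1_000 gives fires_at = 1_010; the fifth owner in the queue (position 4) would execute at block 1_012.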

6. Modified logic

6.1 block_step / should_run_epoch

Replace the modulo computation. Pseudocode for the predicate:

fn should_run_epoch(netuid, block):
    let tempo = Tempo[netuid]
    if tempo == 0:
        return false                                  // legacy defensive short-circuit (§9)
    let blocks_since = block.saturating_sub(LastEpochBlock[netuid])
    let pending      = PendingEpochAt[netuid]
    return (pending > 0 && block >= pending)
        || BlocksSinceLastStep[netuid] > MaxTempo     // safety net (G1) — owner-immune anchor
        || blocks_since >= tempo                      // period is exactly tempo

The per-subnet block-step loop is restructured so that schedule advancement and epoch execution are decoupled:

let mut epochs_run_this_block: u32 = 0
const MAX_EPOCHS_PER_BLOCK: u32 = 2

// Iterate by ascending PendingEpochAt (FIFO for triggered), then by netuid (deterministic).
for netuid in subnets_sorted_by_pending_then_netuid:
    BlocksSinceLastStep[netuid] += 1

    if should_run_epoch(netuid, current_block):
        // 1) Per-block cap — defer if already at limit.
        if epochs_run_this_block >= MAX_EPOCHS_PER_BLOCK:
            // Push to next block; do NOT advance LastEpochBlock yet.
            // PendingEpochAt is updated so the deferred slot fires on the next block_step.
            PendingEpochAt[netuid] = current_block + 1
            deposit_event(EpochDeferred {
                netuid,
                from_block: current_block,
                to_block: current_block + 1,
            })
            continue

        // 2) Run the epoch only if input state is consistent.
        //    LastMechansimStepBlock is NOT yet advanced — bonds masking (§6.2.1)
        //    reads it from storage and must see the previous successful run.
        if is_epoch_input_state_consistent(netuid):
            BlocksSinceLastStep[netuid] = 0
            // drain pending emissions, distribute, run consensus, etc.
            epochs_run_this_block += 1
            LastMechansimStepBlock[netuid] = current_block  // success-only, post-distribute
        else:
            log::error!("Epoch skipped for {netuid}: inconsistent input state")
            deposit_event(EpochSkippedDueToInconsistentState { netuid, block: current_block })
            // Schedule still advances below; execution skipped. Does not count toward the
            // per-block cap (no Yuma compute, no block weight consumed by epoch execution).
            // PendingServerEmission / PendingValidatorEmission / PendingRootAlphaDivs /
            // PendingOwnerCut accumulate; drained by the next successful epoch.

        // 3) Advance schedule unconditionally — the slot is consumed.
        LastEpochBlock[netuid] = current_block
        PendingEpochAt[netuid] = 0

MAX_EPOCHS_PER_BLOCK is a runtime constant (initially 2). It bounds peak block weight contributed by epoch execution and Yuma consensus. The cap protects against:

  • Trigger storm. N owners calling trigger_epoch in the same block all set PendingEpochAt = current_block + AdminFreezeWindow. At fires_at, the cap runs MAX_EPOCHS_PER_BLOCK of them and defers the rest by 1 block, cascading until the queue drains.
  • Auto-cadence collision. Multiple subnets with the same tempo and aligned LastEpochBlock would otherwise fire epochs on the same block. Same cap, same cascade.
  • Safety-net synchrony. Subnets that exceeded MaxTempo simultaneously (e.g., after an extended root-paused period) get spread across consecutive blocks rather than overwhelming one.

Iteration order is (PendingEpochAt ASC, netuid ASC): triggered epochs ordered by when they were armed (FIFO for owners), with netuid as the tie-breaker for auto-cadence cases. This makes deferral fair — the longest-waiting triggered epoch always runs first when the cap binds, and within a single block of competing triggers, lower netuids run first deterministically.

A deferred epoch is rescheduled by 1 block at a time. Cascades resolve in ⌈N / MAX_EPOCHS_PER_BLOCK⌉ blocks for N competing slots. There is no infinite-defer loop: each block of the cascade consumes at least one slot from the queue (MAX_EPOCHS_PER_BLOCK slots while that many are still waiting).
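The cascade bound can be illustrated with a toy simulation. It assumes every competing slot is due at the same block and passes the consistency check, and it uses vector order as a stand-in for the (PendingEpochAt, netuid) ordering; drain_queue is a hypothetical name.

```rust
const MAX_EPOCHS_PER_BLOCK: usize = 2;

/// Simulates `ripe` subnets whose epoch slots all become due at `start_block`.
/// Returns, in queue order, the block on which each epoch executed.
fn drain_queue(ripe: usize, start_block: u64) -> Vec<u64> {
    let mut executed_at: Vec<Option<u64>> = vec![None; ripe];
    let mut block = start_block;
    while executed_at.iter().any(|slot| slot.is_none()) {
        let mut epochs_run_this_block = 0;
        // Visit only slots that are still waiting, in queue order.
        for slot in executed_at.iter_mut().filter(|slot| slot.is_none()) {
            if epochs_run_this_block >= MAX_EPOCHS_PER_BLOCK {
                break; // cap reached: the remaining slots are deferred by one block
            }
            *slot = Some(block);
            epochs_run_this_block += 1;
        }
        block += 1;
    }
    executed_at.into_iter().flatten().collect()
}
```

Five competing slots starting at block 100 execute at blocks 100, 100, 101, 101 and 102, which is ⌈5 / 2⌉ = 3 blocks in total.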

Why schedule advance is unconditional

Today (coinbase/run_coinbase.rs:314) the runtime guards epoch execution with a consistency check (is_epoch_input_state_consistent, epoch/run_epoch.rs:1591) that returns false if the subnet's Keys map contains duplicate hotkeys. Under the legacy modulo scheduler this is harmless — the next freeze-window position is computed purely from the current block, so a subnet stuck in inconsistent state still has admin operations available outside the modulo-defined window each cycle.

Under the new stateful scheduler, LastEpochBlock drives both the cadence and the freeze window (§6.4). If LastEpochBlock did not advance on consistency-skipped epochs, then for a broken subnet:

  • next_auto = LastEpochBlock + tempo would stop moving forward.
  • current_block keeps advancing past next_auto.
  • remaining = next_auto.saturating_sub(current_block) = 0 < AdminFreezeWindow.
  • is_in_admin_freeze_window returns true permanently.
  • All admin extrinsics on the subnet are rejected, including those an operator would need to repair the inconsistency.

The decoupling above eliminates that perma-lock: the schedule progresses, the freeze window stays correctly aligned, and the operator retains the ability to fix the underlying issue (e.g. cleaning up the duplicate Keys entry). Pending emissions are conserved — they accumulate across skipped slots and are released in full by the next successful epoch.
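The difference between the two policies can be made concrete with a small simulation of a subnet whose every epoch is skipped. This is a sketch under stated assumptions: AdminFreezeWindow = 10, no pending trigger, and hypothetical helper names (frozen, open_blocks). It counts the blocks on which admin extrinsics would be accepted.

```rust
const ADMIN_FREEZE_WINDOW: u64 = 10;

/// Freeze predicate reduced to the automatic-schedule branch.
fn frozen(last_epoch_block: u64, tempo: u64, now: u64) -> bool {
    let next_auto = last_epoch_block.saturating_add(tempo);
    next_auto.saturating_sub(now) < ADMIN_FREEZE_WINDOW
}

/// Counts blocks in [start, start + horizon) on which the freeze window is open
/// for a subnet whose epochs are always skipped as inconsistent.
fn open_blocks(advance_on_skip: bool, tempo: u64, start: u64, horizon: u64) -> u64 {
    let mut last_epoch_block = start;
    let mut open = 0;
    for now in start..start.saturating_add(horizon) {
        // block_step: the slot is reached, but the epoch itself is skipped.
        if now.saturating_sub(last_epoch_block) >= tempo && advance_on_skip {
            last_epoch_block = now;
        }
        // Extrinsics in this block then consult the freeze predicate.
        if !frozen(last_epoch_block, tempo, now) {
            open += 1;
        }
    }
    open
}
```

Over three tempos of 360 blocks, the advancing schedule leaves 351 open blocks per cycle (1_053 in total). Without the advance, only the first 351 blocks are open and the subnet stays frozen afterwards.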

Semantic split between LastEpochBlock and LastMechansimStepBlock

After this change the two fields mean different things:

  • LastEpochBlock[netuid] — block of the last epoch attempt (consumed slot). Drives scheduling and the freeze predicate. Always advanced when should_run_epoch returns true.
  • LastMechansimStepBlock[netuid] — block of the last successful epoch run. Existing semantics, used by emission accounting paths. Advanced only on the success branch.

Same applies to BlocksSinceLastStep[netuid] — preserves its existing "blocks since last successful step" meaning.

Other notes

LastEpochBlock is also written to current_block on a successful set_tempo (without running an epoch). This ensures the next automatic epoch lands new_tempo blocks later regardless of how far into the previous cycle the change occurred — see §5.1.

6.2 Epoch internals — activity cutoff sites

epoch/run_epoch.rs:172, 598 (both mechanisms — sparse and dense):

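// Effective cutoff in blocks = factor_milli × tempo / 1000, clamped to at least 1.
let factor_milli = ActivityCutoffFactorMilli::<T>::get(netuid) as u64;
let tempo        = get_tempo(netuid) as u64;
let activity_cutoff = factor_milli
    .saturating_mul(tempo)
    .checked_div(1000)
    .unwrap_or(0)
    .max(1);
// A validator is inactive once its last weight update is older than the cutoff.
let inactive: Vec<bool> = last_update.iter()
    .map(|u| u.saturating_add(activity_cutoff) < current_block)
    .collect();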

6.2.1 Bonds masking — switch from current_block - tempo to LastMechansimStepBlock + 1

Three sites in epoch/run_epoch.rs compute last_tempo for the recently_registered mask used in bonds preprocessing:

  • epoch_dense_mechanism: line 208
  • epoch_mechanism (Yuma 3.0 path): line 822
  • epoch_mechanism (Yuma classic path): line 862

All three use let last_tempo: u64 = current_block.saturating_sub(tempo); — a proxy that equals the previous-epoch block under modulo-based static tempo, but diverges under dynamic tempo (after set_tempo mid-cycle, trigger_epoch, or safety-net firing).

Replace with the previous-successful-epoch block read directly from storage, falling back to the legacy proxy on cold start (LastMechansimStepBlock == 0, no successful epoch ever fired):

let lms = LastMechansimStepBlock::<T>::get(netuid);
let last_tempo: u64 = if lms == 0 {
    current_block.saturating_sub(tempo)
} else {
    lms.saturating_add(1)
};

LastMechansimStepBlock is the right semantic source: it advances only on a successful epoch run, never on set_tempo resets or consistency-skipped slots. Bonds masking conceptually wants "previous successful epoch", and using LastEpochBlock (which now advances on set_tempo and on consistency-skip) would let a registered neuron be excluded from recently_registered after just one set_tempo call, bypassing the legacy registration-sniping protection.

The cold-start fallback exists for two reasons. (1) On a fresh subnet LastMechansimStepBlock is 0 until the first successful epoch fires, and LastMechansimStepBlock + 1 = 1 would mask only block-0 registrations, weaker than the legacy current_block - tempo proxy that masked the entire pre-tempo window. (2) Existing test fixtures call epoch() directly without going through run_coinbase, so LastMechansimStepBlock is never advanced by them; the fallback preserves their startup masking semantics with no per-test changes. The dynamic-tempo correctness argument is unaffected: any production subnet that has ever fired one successful epoch is on the spec branch from then on.

To make this read return the previous-epoch block (not the just-being-set current value) while inside the epoch function, the LastMechansimStepBlock write at coinbase/run_coinbase.rs:319 is moved after distribute_emissions_to_subnets completes, not before.

Mathematical equivalence under static tempo: current_block - tempo == LastMechansimStepBlock + 1 holds exactly on healthy subnets under the modulo-based legacy scheduler (since LastMechansimStepBlock was set at the previous firing block under modulo), so existing subnets see no behavioural change in bonds masking. On adversarial / consistency-broken subnets the new formula is strictly more correct.
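The equivalence claim can be checked numerically. The sketch below uses plain u64 values in place of storage reads and relies on the legacy grid spacing consecutive firings tempo + 1 blocks apart, as implied by the modulo (Tempo + 1) rule in §3.1; the function names are illustrative.

```rust
/// Legacy proxy for the previous-epoch block.
fn legacy_last_tempo(current_block: u64, tempo: u64) -> u64 {
    current_block.saturating_sub(tempo)
}

/// New rule: previous successful epoch + 1, with the cold-start fallback.
fn last_tempo(last_mechanism_step_block: u64, current_block: u64, tempo: u64) -> u64 {
    if last_mechanism_step_block == 0 {
        // Cold start: no successful epoch yet, so fall back to the legacy proxy.
        legacy_last_tempo(current_block, tempo)
    } else {
        last_mechanism_step_block.saturating_add(1)
    }
}
```

With the previous successful epoch at block 1_000 and tempo = 360, the next legacy firing is block 1_361 and both formulas give 1_001. If an epoch is instead triggered early at block 1_100, the new rule still gives 1_001 while the legacy proxy would give 740.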

Yuma 3.0 EMA logic itself is unaffected by dynamic tempo: compute_bonds, compute_liquid_alpha_values, and mat_ema_alpha apply alpha parameters per-epoch (one EMA step per successful epoch), not per-block. Mathematics remain correct under any cadence; only the wall-clock decay rate of bonds-EMA scales with tempo, which is a UX expectation shift for validators rather than a correctness issue.

No other epoch logic changes.

6.3 Existing sudo_set_tempo

Root retains full control: any u16 value accepted, no bounds clamp, no rate limit. Existing freeze-window check (ensure_admin_window_open, pallets/admin-utils/src/lib.rs:973) is preserved unchanged. One addition: sudo_set_tempo now also writes LastEpochBlock[netuid] = current_block, mirroring the owner-side set_tempo (§5.1). This makes the schedule restart from the call block on any root tempo write, avoiding the stale-LastEpochBlock failure mode that would otherwise fire an immediate safety-net epoch on the very first block after a transition out of Tempo == 0.

6.4 AdminFreezeWindow compatibility

The freeze window is a network-wide guarantee that admin operations (≈ 30 sudo extrinsics in pallet-admin-utils) cannot mutate epoch-relevant state in the last AdminFreezeWindow blocks before any epoch runs. This protects validator weight submissions from racing admin changes.

The current predicate (pallets/subtensor/src/utils/misc.rs:58) is hard-coupled to the legacy modulo scheduler:

fn is_in_admin_freeze_window(netuid, current_block):
    let tempo = get_tempo(netuid)
    if tempo == 0: return false
    let remaining = blocks_until_next_epoch(netuid, tempo, current_block)  // modulo
    remaining < AdminFreezeWindow

This breaks under the new scheduler (the tempo == 0 short-circuit is preserved as legacy defensive behaviour — see §9 — so the predicate's first line stays). We replace the predicate accordingly:

fn is_in_admin_freeze_window(netuid, current_block):
    let tempo = Tempo[netuid]
    if tempo == 0:
        return false                                 // legacy defensive short-circuit (§9)
    let pending = PendingEpochAt[netuid]
    if pending > 0 && pending > current_block:
        return true                                  // trigger-armed countdown
    let last  = LastEpochBlock[netuid]
    let next_auto = last + min(tempo, MaxTempo)      // period is tempo (§3.1)
    let remaining = next_auto.saturating_sub(current_block)
    remaining < AdminFreezeWindow

Two changes from the legacy predicate:

  1. Schedule source is LastEpochBlock, not modulo. The window is now positioned correctly relative to where epochs actually fire under §6.1.
  2. Pending manual trigger engages the window immediately. Once trigger_epoch succeeds, the subnet enters the freeze window for the entire countdown to PendingEpochAt. Subsequent admin operations on that subnet — whether by the same owner in the same block (after the trigger transaction) or by root in the following blocks — fail with AdminActionProhibitedDuringWeightsWindow. This satisfies G3 without needing a separate check inside trigger_epoch.

The tempo == 0 early return is preserved from the legacy predicate as defensive behaviour (§9): with Tempo == 0 the scheduler short-circuits and no epoch is ever scheduled, so there is no upcoming epoch to protect.
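A standalone Rust rendering of the replacement predicate, offered as a sketch: FreezeInputs is a hypothetical struct standing in for the three storage reads, and AdminFreezeWindow is passed as a parameter instead of being read from storage.

```rust
const MAX_TEMPO: u64 = 50_400;

/// Hypothetical snapshot of the storage values the predicate reads.
struct FreezeInputs {
    tempo: u16,            // Tempo[netuid]
    last_epoch_block: u64, // LastEpochBlock[netuid]
    pending_epoch_at: u64, // PendingEpochAt[netuid]; 0 means no trigger pending
}

fn is_in_admin_freeze_window(s: &FreezeInputs, now: u64, admin_freeze_window: u64) -> bool {
    // Legacy defensive short-circuit: no epoch is ever scheduled at Tempo == 0.
    if s.tempo == 0 {
        return false;
    }
    // Trigger-armed countdown: frozen until the pending block is reached.
    // (pending > now already implies pending > 0.)
    if s.pending_epoch_at > now {
        return true;
    }
    // Automatic schedule: frozen in the last admin_freeze_window blocks
    // before LastEpochBlock + tempo.
    let period = u64::from(s.tempo).min(MAX_TEMPO);
    let next_auto = s.last_epoch_block.saturating_add(period);
    next_auto.saturating_sub(now) < admin_freeze_window
}
```

With tempo = 360, LastEpochBlock = 1_000 and a window of 10, block 1_350 is still open (10 blocks remain) and block 1_351 is the first frozen block.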

Notes:

  • set_tempo and set_activity_cutoff_factor (§5.1, §5.2) call ensure_admin_window_open like other admin extrinsics. They mutate state that the upcoming epoch reads, and must not race weight submissions.
  • trigger_epoch itself does not call ensure_admin_window_open. The rationale: the freeze window protects the lead-in to a coming epoch, but the trigger is defining the next epoch. Gating the trigger by the window would prevent owners from triggering during the auto-epoch lead-in for no protective benefit (the auto epoch is already imminent and will fire under its own freeze).
  • Concurrent ordering within a single block: if an owner submits an admin extrinsic and trigger_epoch in the same block, the order in which the runtime processes them determines which one fails. If admin extrinsic comes first, it succeeds and trigger_epoch arms the window for future blocks. If trigger_epoch comes first, the admin extrinsic fails. The runtime does not enforce any ordering itself; this is consistent with how all extrinsic ordering works in Substrate.
  • A consequence of (3) is that AdminFreezeWindow and the trigger rate-limit (OwnerHyperparamRateLimit × Tempo, default 2 × Tempo ≥ 720 blocks) interact: as long as AdminFreezeWindow < cooldown (true by default: 10 vs ≥ 720), a triggered epoch always fires before the next trigger could even be attempted. If a future configuration sets AdminFreezeWindow ≥ cooldown, the explicit EpochTriggerAlreadyPending check in §5.3 prevents storage corruption.

7. Commit-Reveal interaction

Commit-Reveal is migrated off the legacy modulo grid onto the stateful epoch
counter. An epoch is an event — one increment of SubnetEpochIndex[netuid]
per consumed epoch slot — not a position on a block grid. This keeps "the block
an epoch fires" and "the CR epoch boundary" the same event under any scheduler
perturbation (deferral, set_tempo, trigger_epoch).

7.1 Why the modulo formula could not survive dynamic tempo

The legacy CR logic computed epoch index as (block + netuid + 1) / (tempo + 1)
— a stateless function of the block number, assuming epochs fire on a fixed
global grid. The state-based scheduler (§3.1, §6.1) breaks that invariant:

  1. set_tempo re-anchors LastEpochBlock; the modulo grid shifts and commits
    keyed under the old grid are orphaned.
  2. trigger_epoch fires off-grid entirely.
  3. Per-block defer (§6.1, MaxEpochsPerBlock) re-anchors LastEpochBlock to the
    late block — so even a subnet whose owner never touches set_tempo drifts off
    the modulo grid when other subnets' epochs collide with it.

Vector (3) is the decisive one: blocking owner extrinsics on CR-enabled subnets
(§7.4) does not protect them, because the shared per-block cap can shift a
CR subnet that did nothing.

7.2 The stateful model

  • SubnetEpochIndex[netuid]: u64 — monotonic counter, +1 per consumed epoch
    slot in run_coinbase.
  • get_epoch_index(netuid, _block) -> u64 returns the counter (the block
    argument is ignored, kept only for GetTempoInterface signature compat).
  • current_epoch_with_lookahead(netuid) — counter plus one if an epoch slot is
    due this block. The look-ahead is required because reveal_crv3_commits runs
    in block_step before run_coinbase increments the counter, and a commit on
    a deferred fire-block belongs to the next epoch.
  • CR-v4 (TimelockedWeightCommits) — already keyed by epoch index; commits
    are keyed by current_epoch_with_lookahead, the auto-reveal sweep reads the
    same.
  • CR-v2 (WeightCommits) — tuple is (hash, commit_epoch, commit_block, _).
    commit_epoch drives reveal-window timing (is_reveal_block_range,
    is_commit_expired are pure counter comparisons); commit_block is retained
    for the epoch's commit-reveal weight column-mask (run_epoch.rs), which still
    compares commit-time against block_at_registration in block units.
  • get_reveal_blocks / get_first_block_of_epoch (modulo block-range helpers)
    are removed from live logic — block ranges of future epochs are unpredictable
    under dynamic tempo. get_first_block_of_epoch is retained solely for the
    already-executed migrate_crv3_commits_add_block migration.
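The counter and its look-ahead can be sketched as a small type. EpochCounter and consume_slot are hypothetical names; whether a slot is due this block is passed in as a flag rather than derived from the scheduler.

```rust
/// Hypothetical in-memory stand-in for SubnetEpochIndex[netuid].
struct EpochCounter {
    index: u64,
}

impl EpochCounter {
    /// The block argument is ignored, mirroring the retained signature.
    fn get_epoch_index(&self, _block: u64) -> u64 {
        self.index
    }

    /// Counter plus one when an epoch slot is due in the current block, so a
    /// commit made before run_coinbase increments the counter is keyed to the
    /// epoch that is about to start.
    fn current_epoch_with_lookahead(&self, slot_due_this_block: bool) -> u64 {
        if slot_due_this_block {
            self.index.saturating_add(1)
        } else {
            self.index
        }
    }

    /// Called once per consumed epoch slot.
    fn consume_slot(&mut self) {
        self.index = self.index.saturating_add(1);
    }
}
```

A key taken with the look-ahead on a fire block equals the value the counter reports once the slot has been consumed, which keeps commits and the reveal sweep on the same epoch index.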

7.3 Deferral is non-destructive to CR

If a CR subnet's epoch is deferred from block N to N+1, the reveal sweep at
block N reveals commits for the about-to-fire epoch (look-ahead), takes them from
storage, and the later block(s) become no-ops. The only artifact is a reveal one
block early — harmless, the weights simply sit in storage one extra block.

7.4 No owner-side block

do_set_tempo and do_trigger_epoch are allowed on commit-reveal-enabled
subnets. The original block (DynamicTempoBlockedByCommitReveal) existed because
the modulo grid orphaned commits on a tempo change; with CR keyed off the
stateful SubnetEpochIndex counter that failure mode is gone — set_tempo does
not touch the counter, and a triggered epoch advances it like any automatic one,
so in-flight commits stay correctly keyed. The error variant and both ensure!
guards are removed.

Residual: set_tempo (decrease) and trigger_epoch shorten the wall-clock
duration of a reveal epoch, so a CR-v4 timelock commit's drand round may not yet
be on chain when its reveal epoch arrives. The reveal sweep retries every block
within the reveal epoch, so this is a soft, self-correcting delay rather than a
correctness break — and it only affects the owner's own subnet, which the owner
already controls.


8. Rate limit summary

Extrinsic | Rate limit | Mechanism
set_tempo | MinTempo blocks (= 360), fixed | New TransactionType::TempoUpdate.
set_activity_cutoff_factor | OwnerHyperparamRateLimit × Tempo[netuid] blocks | Existing OwnerHyperparamUpdate(Hyperparameter::ActivityCutoffFactorMilli) (new enum variant).
trigger_epoch | OwnerHyperparamRateLimit × Tempo[netuid] blocks | Existing OwnerHyperparamUpdate(Hyperparameter::TriggerEpoch) (new enum variant).

9. Pause semantics and the Tempo == 0 edge case

9.1 Owner-pause: Tempo = MaxTempo

There is no separate pause flag or extrinsic for owners. To "pause" automatic epochs, an owner sets Tempo = MaxTempo. The safety net (G1) still forces an epoch every MaxTempo blocks regardless. An owner who wants more frequent epochs while paused can call trigger_epoch.

This is mathematically and semantically equivalent to a notional Paused: bool flag, given that the safety net cannot be bypassed. We choose the encoding without a new flag because:

  • It is one fewer storage item, one fewer extrinsic, one fewer concept to document.
  • It cannot be misconfigured (paused == true while tempo is also set is undefined; here that state cannot exist).
  • The owner's intent ("rare automatic epochs, manual control") is fully captured by the tempo setting.

9.2 The Tempo == 0 edge case (legacy defensive behaviour, not a feature)

The legacy runtime contains two short-circuits keyed on Tempo == 0:

  • blocks_until_next_epoch (coinbase/run_coinbase.rs:996) returns u64::MAX when tempo == 0. The accompanying comment reads only Special case: tempo = 0, the network never runs. There is no "kill", "switch", "pause", or "disable" wording anywhere in the code or storage definition.
  • is_in_admin_freeze_window (utils/misc.rs:60) returns false when tempo == 0, as a consequence of the previous: with u64::MAX blocks remaining, freeze-window arithmetic would otherwise be undefined.

Together these mean: if Tempo[netuid] == 0, no epoch ever runs and the freeze window is never engaged. This is defensive behaviour against a degenerate parameter value, not a documented feature. No production subnet has Tempo == 0 and no extrinsic — current or new — sets it deliberately.
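
The two short-circuits can be sketched as free functions. This is a simplified stand-in rather than the runtime code: `netuid` is a plain `u16`, the freeze window is passed in as `window` instead of being read from AdminFreezeWindow, the modulo arithmetic follows the legacy condition `(block + netuid + 1) % (tempo + 1) == 0` quoted in §10, and the `remaining < window` comparison is an assumption.

```rust
/// Blocks until the next block that satisfies the legacy modulo condition.
/// Returns u64::MAX when tempo == 0: the network never runs.
pub fn blocks_until_next_epoch(netuid: u16, tempo: u16, block: u64) -> u64 {
    if tempo == 0 {
        return u64::MAX;
    }
    let period = u64::from(tempo) + 1;
    // Reduce both terms first so the sum cannot overflow.
    let rem = (block % period + (u64::from(netuid) + 1) % period) % period;
    if rem == 0 {
        0
    } else {
        period - rem
    }
}

/// Freeze predicate: never engaged when tempo == 0, otherwise engaged when
/// fewer than `window` blocks remain before the next epoch (assumed rule).
pub fn is_in_admin_freeze_window(netuid: u16, tempo: u16, block: u64, window: u64) -> bool {
    if tempo == 0 {
        return false;
    }
    blocks_until_next_epoch(netuid, tempo, block) < window
}
```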

This spec preserves the short-circuits in the new scheduler (§6.1) and freeze predicate (§6.4) for compatibility, but adds no new policy around the Tempo == 0 state:

  • Owner-side set_tempo validates the parameter against [MinTempo, MaxTempo], so the owner cannot create the state.
  • Owner-side set_tempo does not read the current Tempo[netuid] value as a precondition. If root has set Tempo = 0 (unprecedented on mainnet), the owner can call set_tempo(netuid, N) with N ∈ [MinTempo, MaxTempo] and the scheduler resumes normally on the next block. This is consistent with R1, which gives the owner authority over their subnet's cadence without reference to whatever value root last wrote.
  • Root retains full control via sudo_set_tempo (any u16, including 0), unchanged from today.
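
The owner-side rule above amounts to a bounds check that looks only at the requested value. A minimal sketch, assuming hypothetical names (`TempoError`, `validate_owner_tempo`) and taking both bounds as parameters because this section does not state the value of MaxTempo:

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum TempoError {
    TempoOutOfBounds,
}

/// Owner-side check: depends only on the requested value and the bounds,
/// never on the tempo currently stored, so a root-written 0 cannot block
/// the owner from restoring a valid cadence.
pub fn validate_owner_tempo(
    requested: u16,
    min_tempo: u16,
    max_tempo: u16,
) -> Result<u16, TempoError> {
    if (min_tempo..=max_tempo).contains(&requested) {
        Ok(requested)
    } else {
        Err(TempoError::TempoOutOfBounds)
    }
}
```

With MinTempo = 360 and an illustrative upper bound, `validate_owner_tempo(0, 360, 50_400)` is rejected while `validate_owner_tempo(360, 360, 50_400)` succeeds; the value 50_400 is a placeholder, not a figure from this spec.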

10. Migration

A single one-shot migration on runtime upgrade:

  1. Initialise LastEpochBlock for every existing subnet so the schedule continues without a perceived gap. For each netuid, compute the current "blocks until next epoch" under the old modulo formula, and back-fill LastEpochBlock = current_block - (Tempo[netuid] - blocks_until_next). The new period is Tempo (§3.1), next firing at LastEpochBlock + Tempo; this back-fill makes the new scheduler fire its first epoch on the same block the legacy modulo grid would have. The migration also seeds SubnetEpochIndex[netuid] with the legacy modulo epoch index so existing CR commit keys stay valid, and rewrites every CR-v2 WeightCommits entry to the (hash, commit_epoch, commit_block, _) layout (§7.2).

  2. Preserve existing Tempo[netuid] values as-is. Migration does not clamp tempo into [MinTempo, MaxTempo]. Two mainnet subnets currently run on non-standard tempos that root deliberately set; clamping would silently change their cadence and break operator expectations. Owner-side set_tempo continues to enforce the bounds for new updates (§5.1), and root-side sudo_set_tempo continues to accept any u16 (§6.3). Subnets with Tempo == 0 are left as-is — the legacy tempo == 0 short-circuit (§9.2) stays in place, so these subnets continue to receive no epochs after the upgrade, matching their pre-upgrade behaviour.

  3. Convert ActivityCutoff to ActivityCutoffFactorMilli per subnet, preserving block counts 1:1. For each netuid with non-zero Tempo, derive the per-mille factor from the existing absolute cutoff using ceiling division so the new computed cutoff lands at or above the legacy value:

    raw_factor = (ActivityCutoff[netuid] * 1000 + Tempo[netuid] - 1) / Tempo[netuid]
    factor     = clamp(raw_factor, MinActivityCutoffFactorMilli, MaxActivityCutoffFactorMilli)
    ActivityCutoffFactorMilli[netuid] = factor
    

    Ceiling rather than floor division ensures every observed production ActivityCutoff round-trips exactly to the same number of blocks under the new formula factor × tempo / 1000. With MaxActivityCutoffFactorMilli = 50_000 (raised from the originally proposed 20_000, see §4.2), every observed mainnet cutoff fits the bound without clamping, including the largest outlier (12 000 blocks at tempo=360 → factor 33 334). Subnets near the bound or beyond would shift to the nearest representable factor; this clause exists for defensive correctness only.

    Verified round-trip on production data: 5000→5000 (121 subnets), 6000→6000, 7200→7200, 12000→12000, 1000→1000, 360→360. The original migration sketch (no per-subnet conversion, default factor 14_000) was rejected because it would have silently shifted the 5000-block default to 5040 blocks for every subnet and clamped the 12000-block outlier to 7200 blocks.
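
The conversion in step 3 and its round-trip can be checked with a standalone sketch. The function names are hypothetical, the lower bound is passed as a parameter because this section does not give the value of MinActivityCutoffFactorMilli, and subnets with Tempo == 0 are modelled as skipped (returning `None`).

```rust
/// Upper bound stated in step 3.
pub const MAX_ACTIVITY_CUTOFF_FACTOR_MILLI: u64 = 50_000;

/// Ceiling-division conversion from an absolute cutoff (blocks) to a
/// per-mille factor. Returns None for tempo == 0.
/// `min_factor` must not exceed MAX_ACTIVITY_CUTOFF_FACTOR_MILLI.
pub fn cutoff_to_factor_milli(activity_cutoff: u16, tempo: u16, min_factor: u64) -> Option<u32> {
    if tempo == 0 {
        return None;
    }
    let t = u64::from(tempo);
    let raw = (u64::from(activity_cutoff) * 1000 + t - 1) / t;
    Some(raw.clamp(min_factor, MAX_ACTIVITY_CUTOFF_FACTOR_MILLI) as u32)
}

/// New runtime formula: factor × tempo / 1000 with integer division.
pub fn computed_cutoff(factor_milli: u32, tempo: u16) -> u64 {
    u64::from(factor_milli) * u64::from(tempo) / 1000
}
```

At tempo 360 this yields factor 13 889 for a 5000-block cutoff and 33 334 for 12 000 blocks, and `computed_cutoff` maps both back to the original block counts. By contrast, `computed_cutoff(14_000, 360)` gives 5040, which is the drift the rejected sketch would have introduced.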

Commit-Reveal: SubnetEpochIndex is seeded and CR-v2 WeightCommits entries are rewritten in the same migration (see §7.2). CR-v3/v4 TimelockedWeightCommits keys (epoch indices) stay valid because the counter is seeded continuously from the legacy modulo value.

In addition to the one-shot migration above, two non-migration code changes are needed alongside the storage introduction:

  1. New-subnet initialisation with stagger. init_new_network (pallets/subtensor/src/subnets/subnet.rs:300) is amended to explicitly write LastEpochBlock[netuid] immediately after Tempo::insert, using a per-netuid stagger:

    let stagger = u64::from(netuid) % ((default_tempo as u64).saturating_add(1));
    LastEpochBlock[netuid] = current_block.saturating_sub(stagger);
    

    PendingEpochAt and ActivityCutoffFactorMilli rely on their storage defaults (0 and 13_889 respectively), so no explicit init is required. The factor default 13_889 is chosen so that a freshly registered subnet at default tempo 360 gets the historical 5000-block cutoff exactly (13 889 × 360 / 1000 = 5 000.04, which integer division truncates to 5 000).

    Two purposes are served by this initialisation:

    (a) Avoid first-block safety-net firing. Without any explicit init, LastEpochBlock defaults to 0 via ValueQuery; on a live chain current_block − 0 ≫ MaxTempo would trigger the safety-net branch on the very first block_step after registration, running a pointless epoch on an empty subnet.

    (b) Preserve epoch-cadence stagger across new registrations. The legacy modulo formula (block + netuid + 1) % (tempo + 1) == 0 implicitly distributed subnets with the same tempo across consecutive blocks via the + netuid term. Under the new scheduler (period tempo + 1, §3.1), N subnets registered in the same block with no stagger would all fire their first epoch on the same future block (registration_block + tempo + 1), creating a single block of N concurrent heavy epochs and risking block weight overflow. The netuid % (default_tempo + 1) stagger reproduces the legacy distribution: consecutive netuids registered in the same block fire their first epochs on consecutive blocks, exactly as the legacy formula would have placed them.

    The boundary case netuid % (default_tempo + 1) == default_tempo produces a "first epoch on the next block" — bounded and harmless: the freshly registered subnet has no accumulated emissions, no validator weights, and an empty active set, so the early epoch is a no-op. After that no-op, LastEpochBlock is set to that block and the schedule continues normally.

    Stagger is not applied on subsequent LastEpochBlock rewrites (set_tempo cycle reset, sudo_set_tempo revive). Those events are owner-/root-driven and do not exhibit the mass-coordination pattern that registration does.

  2. Subnet dissolution cleanup. The dissolve path in coinbase/root.rs:280-308 (where Tempo, ActivityCutoff, Kappa, etc. are removed) is extended to remove the three new storage items: LastEpochBlock::<T>::remove(netuid), PendingEpochAt::<T>::remove(netuid), ActivityCutoffFactorMilli::<T>::remove(netuid). Prevents stale state from leaking into a future registration if the netuid is recycled.
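
The new-subnet stagger from item 1 can be sketched in isolation to see where first epochs land. This is an illustration only: `netuid` is a plain `u16`, the helper names are hypothetical, and the first-epoch block assumes the scheduler period of tempo + 1 blocks that item 1(b) uses.

```rust
/// LastEpochBlock written at registration, following the stagger rule.
pub fn initial_last_epoch_block(netuid: u16, default_tempo: u16, current_block: u64) -> u64 {
    let stagger = u64::from(netuid) % (u64::from(default_tempo) + 1);
    current_block.saturating_sub(stagger)
}

/// Block of the first automatic epoch, assuming a period of tempo + 1.
pub fn first_epoch_block(netuid: u16, default_tempo: u16, current_block: u64) -> u64 {
    initial_last_epoch_block(netuid, default_tempo, current_block)
        + u64::from(default_tempo)
        + 1
}
```

For subnets registered at block 10 000 with default tempo 360, netuids 0, 1, and 2 get first epochs at blocks 10 361, 10 360, and 10 359. The boundary case netuid 360 lands on block 10 001, the next block after registration.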


11. Breaking changes & compatibility

11.1 On-chain runtime behaviour: none observable

The post-upgrade scheduler is a strict superset of the legacy modulo behaviour for any subnet that does not call set_tempo / trigger_epoch:

  • Migration step 1 back-fills LastEpochBlock such that the first post-upgrade epoch fires on the same block as the legacy formula would have. No "missed" or "extra" epoch at the upgrade boundary.
  • Bonds masking (current_block - tempo → LastEpochBlock + 1) is byte-identical under static tempo (§6.2.1).
  • is_in_admin_freeze_window (§6.4) yields the same window placement under static tempo.
  • CR reveal timing is migrated to the stateful epoch counter (§7); the counter is seeded continuously so pre-upgrade TimelockedWeightCommits and WeightCommits keep revealing without a gap.
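
The continuity claim for the LastEpochBlock back-fill can be checked with a small model. The sketch below uses hypothetical helper names, a plain `u16` netuid, the legacy condition `(block + netuid + 1) % (tempo + 1) == 0`, and the back-fill and next-firing formulas exactly as written in migration step 1 (§10). Leaving Tempo == 0 subnets untouched is modelled as `None`.

```rust
/// Simplified legacy helper: blocks until the modulo condition next holds.
pub fn legacy_blocks_until_next(netuid: u16, tempo: u16, block: u64) -> u64 {
    let period = u64::from(tempo) + 1;
    let rem = (block % period + (u64::from(netuid) + 1) % period) % period;
    if rem == 0 {
        0
    } else {
        period - rem
    }
}

/// Back-fill from step 1: current_block - (Tempo - blocks_until_next).
pub fn backfill_last_epoch_block(netuid: u16, tempo: u16, current_block: u64) -> Option<u64> {
    if tempo == 0 {
        return None;
    }
    let until = legacy_blocks_until_next(netuid, tempo, current_block);
    // `until` is at most `tempo`, so the inner subtraction never underflows.
    Some(current_block.saturating_sub(u64::from(tempo).saturating_sub(until)))
}

/// Next firing under the rule in step 1: LastEpochBlock + Tempo.
pub fn next_epoch_block(last_epoch_block: u64, tempo: u16) -> u64 {
    last_epoch_block + u64::from(tempo)
}
```

For netuid 3 at tempo 360 and block 1 000 000, the legacy grid fires 327 blocks later; the back-filled value is 999 967, and 999 967 + 360 gives the same block, 1 000 327.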

11.2 Storage shape: additive only

New: LastEpochBlock, PendingEpochAt, ActivityCutoffFactorMilli. No existing storage removed or re-typed. ActivityCutoff<T> and MinActivityCutoff remain in storage (deprecated; removal is deferred, see §15).

11.3 Off-chain semantic shifts (no shape change)

These do not break decoders, but consumers reading these surfaces will see different values or no-op writes:

| Surface | Pre-upgrade behaviour | Post-upgrade behaviour |
|---|---|---|
| ActivityCutoff::<T> raw storage | Source of truth for inactivity cutoff. | Dead value: no longer read by the runtime. RPC consumers must migrate to ActivityCutoffFactorMilli (or use the computed activity_cutoff RPC field). |
| sudo_set_activity_cutoff extrinsic | Writes the active cutoff. | Silent no-op (writes to dead storage). Scripts relying on it have no effect. |
| RPC activity_cutoff field | Direct read of ActivityCutoff. | Computed (factor × tempo) / 1000. Migration converts each subnet's pre-upgrade ActivityCutoff into a factor via ceiling division, so post-upgrade activity_cutoff matches the pre-upgrade block count exactly for every observed production value (5000→5000, 6000→6000, 7200→7200, 12000→12000, 1000→1000, 360→360). |
| LastMechansimStepBlock | Single per-subnet "last activity" field; conflates "scheduler tick" and "successful run"; on a healthy subnet they coincide. | Field semantics unchanged (still success-only). But a sibling field LastEpochBlock now exists for "scheduler tick", and the two diverge on consistency-skipped subnets. Consumers must pick the meaning they need: LastEpochBlock for "when did the scheduler last fire here", LastMechansimStepBlock for "when did emission last distribute". |
| sudo_set_tempo | Writes Tempo. | Writes Tempo and LastEpochBlock = current_block. Behaviour change only when root uses it; not used regularly on mainnet. |

11.4 Validator-tooling

  • New events (TempoSet, ActivityCutoffFactorMilliSet, EpochTriggered, EpochDeferred, EpochSkippedDueToInconsistentState) are additive — existing decoders ignore unknown events.
  • New errors are additive.
  • New extrinsics are additive.
  • AdminFreezeWindow predicate now also engages on pending manual trigger (§6.4). Admin extrinsics that previously always succeeded outside the auto-cadence window can now fail with AdminActionProhibitedDuringWeightsWindow if a trigger_epoch is pending — but trigger_epoch is itself new, so this is a property of the new feature, not a regression of existing flow.

11.5 Migration safety

A faulty migration (specifically: failure to back-fill LastEpochBlock) would cause the normal-cadence branch to fire on every subnet on the first block_step post-upgrade (since default LastEpochBlock = 0 makes blocks_since exceed tempo immediately), producing an empty epoch on every subnet simultaneously. The safety-net branch is unaffected by this — it keys on BlocksSinceLastStep, which is already populated pre-upgrade — but the cadence concern alone makes try-runtime CI mandatory before release.
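
One way to express the failure mode as a post-upgrade check is to list the subnets whose normal-cadence branch would fire immediately. The sketch below is a plain function over in-memory data, not a try-runtime hook: `SubnetSchedule` and `subnets_firing_immediately` are hypothetical names, and "blocks_since exceeds tempo" is modelled as a strict `>` comparison.

```rust
/// Hypothetical snapshot of the scheduling state of one subnet.
pub struct SubnetSchedule {
    pub netuid: u16,
    pub tempo: u16,
    pub last_epoch_block: u64,
}

/// Returns the netuids whose blocks_since already exceeds their tempo at
/// `current_block`. Subnets with tempo == 0 are skipped, since they never
/// run an epoch. A correct back-fill should leave this list empty.
pub fn subnets_firing_immediately(subnets: &[SubnetSchedule], current_block: u64) -> Vec<u16> {
    subnets
        .iter()
        .filter(|s| s.tempo != 0)
        .filter(|s| current_block.saturating_sub(s.last_epoch_block) > u64::from(s.tempo))
        .map(|s| s.netuid)
        .collect()
}
```

A subnet left at the default LastEpochBlock = 0 on a live chain is flagged, while one whose last epoch was 100 blocks ago at tempo 360 is not.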


12. Events and errors

Events (in pallet-subtensor)

  • TempoSet { netuid: NetUid, tempo: u16 }
  • ActivityCutoffFactorMilliSet { netuid: NetUid, factor_milli: u32 }
  • EpochTriggered { netuid: NetUid, by: AccountId, fires_at: BlockNumber }: fires_at is the earliest block at which the triggered epoch may execute. Actual execution may be deferred under the per-block epoch cap (§6.1); see also EpochDeferred.
  • EpochDeferred { netuid: NetUid, from_block: BlockNumber, to_block: BlockNumber } — emitted when the per-block epoch cap is reached and a subnet's epoch (auto, safety-net, or triggered) is rescheduled to the next block. Off-chain tooling can use this to track actual vs. expected fires_at. See §6.1.
  • EpochSkippedDueToInconsistentState { netuid: NetUid, block: BlockNumber } — emitted whenever should_run_epoch returns true but is_epoch_input_state_consistent returns false. The schedule still advances; the actual epoch execution is skipped for this slot. See §6.1.
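
The relationship between the per-block epoch cap and EpochDeferred can be sketched as a pure function. This is a simplified model, not the scheduler from §6.1: the order of `due` stands in for queue position (an assumption), `netuid` is a plain `u16`, and `apply_epoch_cap` is a hypothetical name.

```rust
/// Mirror of the EpochDeferred event fields, with simplified types.
#[derive(Debug, PartialEq, Eq)]
pub struct EpochDeferred {
    pub netuid: u16,
    pub from_block: u64,
    pub to_block: u64,
}

/// Splits the subnets due at `block` into those that execute now (at most
/// `max_epochs_per_block`) and deferral events for the rest, which are
/// rescheduled to the next block.
pub fn apply_epoch_cap(
    due: &[u16],
    block: u64,
    max_epochs_per_block: usize,
) -> (Vec<u16>, Vec<EpochDeferred>) {
    let cut = due.len().min(max_epochs_per_block);
    let (run, rest) = due.split_at(cut);
    let deferred = rest
        .iter()
        .map(|&netuid| EpochDeferred {
            netuid,
            from_block: block,
            to_block: block + 1,
        })
        .collect();
    (run.to_vec(), deferred)
}
```

With five subnets due at block 100 and a cap of 2, the first two run and the remaining three each produce an EpochDeferred from block 100 to block 101. Re-applying the function on the next block drains the queue two at a time.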

Errors

  • TempoOutOfBounds
  • ActivityCutoffFactorMilliOutOfBounds
  • EpochTriggerAlreadyPending
  • RateLimitExceeded (existing pattern)
  • NotSubnetOwner (existing)
  • AdminActionProhibitedDuringWeightsWindow (existing — now applies to additional state shapes; see §6.4)

13. RPC

get_subnet_info / metagraph queries:

  • activity_cutoff — keep field in the response, but compute as (factor_milli × tempo) / 1000 rather than reading the deprecated storage. Existing consumers see the same shape with a value that now scales with tempo and supports fractional factors.
  • New fields:
    • activity_cutoff_factor_milli: u32 — for tooling that wants the source-of-truth value (per-mille units).
    • last_epoch_block: u64 — for explorers and validator dashboards.
    • pending_epoch_at: u64 — block at which a triggered epoch is scheduled to fire, or 0 if no trigger pending. Note: this is a lower bound — actual execution may be deferred by 1+ blocks under the per-block epoch cap (§6.1). Clients tracking precise execution time should subscribe to EpochDeferred and EpochTriggered events rather than computing forward from pending_epoch_at.
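
A client-side view of these fields might look like the sketch below. The struct and method names are hypothetical, the `u64` type for activity_cutoff is an assumption, and the real RPC response carries many more fields.

```rust
/// Hypothetical subset of the RPC response relevant to this spec.
#[derive(Debug)]
pub struct SubnetEpochInfo {
    pub activity_cutoff: u64,
    pub activity_cutoff_factor_milli: u32,
    pub last_epoch_block: u64,
    pub pending_epoch_at: u64,
}

impl SubnetEpochInfo {
    /// Builds the view from storage values; activity_cutoff is computed as
    /// (factor_milli × tempo) / 1000 instead of being read from storage.
    pub fn from_storage(
        factor_milli: u32,
        tempo: u16,
        last_epoch_block: u64,
        pending_epoch_at: u64,
    ) -> Self {
        Self {
            activity_cutoff: u64::from(factor_milli) * u64::from(tempo) / 1000,
            activity_cutoff_factor_milli: factor_milli,
            last_epoch_block,
            pending_epoch_at,
        }
    }

    /// Maps the 0 sentinel to None. A Some value is only a lower bound on
    /// the execution block, because the epoch may still be deferred.
    pub fn pending_epoch(&self) -> Option<u64> {
        (self.pending_epoch_at != 0).then_some(self.pending_epoch_at)
    }
}
```

With the default factor 13 889, the computed cutoff is 5000 blocks at tempo 360 and 10 000 blocks at tempo 720, which shows how the field now scales with tempo.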

No removed fields in this PR; ActivityCutoff raw value is no longer meaningful but the RPC field stays during the transition.


14. Test updates required

Existing tests use ActivityCutoff::<Test>::set(netuid, u16::MAX) to "make all stake active". After this change, that storage is no longer read. The pattern must be replaced with ActivityCutoffFactorMilli::<Test>::set(netuid, u32::MAX). Affected files (current state, may shift):

  • pallets/subtensor/src/tests/coinbase.rs:3046, 3241
  • pallets/subtensor/src/tests/staking.rs:2478
  • pallets/subtensor/src/tests/children.rs:2919
  • pallets/subtensor/src/tests/mechanism.rs:457, 1515
  • pallets/subtensor/src/tests/networks.rs:352
  • pallets/subtensor/src/tests/epoch_logs.rs:63

pallets/admin-utils/src/tests/mod.rs::test_sudo_set_activity_cutoff — exercises the deprecated path; can be left as-is or marked #[ignore] until cleanup.

New test surface — to be detailed during implementation. Coverage areas: each new extrinsic (happy path, bounds, rate limit, freeze-window, non-owner rejection), schedule/execution decoupling on consistency-broken subnets, new-subnet stagger, per-block epoch cap, trigger storm, safety-net invariant under owner attack (alternating grow/shrink across MaxTempo blocks must still force-fire via BlocksSinceLastStep > MaxTempo), migration continuity, CR-v3 coexistence, Tempo == 0 legacy edge case, bonds masking equivalence under static tempo and correctness under dynamic tempo, activity-cutoff per-mille at boundary factors.


15. Out of scope (future cleanup PR)

  • Removing the unused ActivityCutoff<T> storage map.
  • Removing sudo_set_activity_cutoff extrinsic and its weights / benchmarks.
  • Removing the MinActivityCutoff storage in admin-utils.
  • Removing the legacy get_activity_cutoff getter once all call sites use get_activity_cutoff_factor.
  • Removing the modulo-based blocks_until_next_epoch helper from run_coinbase.rs if no remaining caller depends on it.
  • Extracting pallet-epoch. All epoch-related state and logic (Yuma classic + 3.0, bonds, weights, scheduling, pending pools, the new tempo storage from this PR) should eventually move out of pallet-subtensor into a dedicated pallet. Doing this together with the dynamic-tempo feature would mix a large refactor with a behavioural change; we defer it to a follow-up PR that can move all epoch state atomically with a single migration.

Apart from the pallet-epoch extraction, these are mechanical removals with no behaviour change. All of them are deferred to keep this PR focused and reviewable.
