Skip to content

metrics: simplify and separate metrics logic from specs#667

Merged
tcoratger merged 9 commits intoleanEthereum:mainfrom
unnawut:metrics-observer
Apr 22, 2026
Merged

metrics: simplify and separate metrics logic from specs#667
tcoratger merged 9 commits intoleanEthereum:mainfrom
unnawut:metrics-observer

Conversation

@unnawut
Copy link
Copy Markdown
Collaborator

@unnawut unnawut commented Apr 22, 2026

🗒️ Description

The existing specs have metrics logic mixed with the core specs logic. The most visible example is the metrics code in the fork choice store. This PR attempts to minimize the metrics code in the specs for better readability and also separation of the specs from other objectives.

Changes

  • Introduce observability subspecs, the vendor-neutral version for reporting metrics. The specs now implement observability instead of prometheus-specific metrics subspecs.
    • Although, we still have MetricsRegistry containing all the metrics declaration. If this PR passes through I can move these into observability subspecs.
    • Once the metrics subspecs have only Prometheus-specific code, it can then be moved out of the subspecs whenever feat: add multi-fork architecture with ForkProtocol and SpecRunner #638 settles how we will separate the specs vs. client code.
  • Refactored fork choice store so instead of calling and calculating the metrics timing in the fork choice specs, the fork choice store only need to do a with statement e.g. with observe_on_block(): which is a lot easier for specs reader to ignore.
  • Also removed _record_metrics() from the fork choice specs which seems out of place. The metrics can be recorded by the caller (sync service) which is outside the core specs.
  • Removed BlockLookup methods that were only used by the metrics but became part of the specs. The methods are moved into their respective callers

Overall our core specs now need not aware of the metrics implementation. It only needs to register through with observe_on_attestation(): or with observe_on_block(): and metrics can be calculated and published by the abstracted observer instead. Then we can configure that spec tests can ignore metrics by setting to NullObserver while the python client could inject PrometheusObserver into the observability module.

🔗 Related Issues or PRs

✅ Checklist

  • Ran tox checks to avoid unnecessary CI fails:
    uvx tox
  • Considered adding appropriate tests for the changes.
  • Considered updating the online docs in the ./docs/ directory.

@unnawut unnawut requested review from ch4r10t33r and tcoratger April 22, 2026 07:36
@unnawut unnawut added the refactor Category: refactor label Apr 22, 2026
unnawut and others added 7 commits April 22, 2026 14:37
Store.on_block now accepts an observer parameter with a NullObserver
default. All metric calls are replaced with observer hook invocations.
The spec is free of any metrics import; the Prometheus coupling lives
in PrometheusForkChoiceObserver (metrics/forkchoice_observer.py).

The observer Protocol defines four hooks: state_transition_timed,
block_processed, head_advanced, head_reorged. This shape mirrors the
current Prometheus surface 1:1 so the translation is mechanical.

sync/service.py's default_block_processor constructs a module-level
PrometheusForkChoiceObserver and passes it in. Call sites that don't
need telemetry (tests, validator service) keep calling on_block with
no observer and get the null implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the four-method ForkChoiceObserver with a unified SpecObserver
that lives at subspecs/observability/. state_transition times itself
(the phenomenon being measured is intrinsic to that function, not to
any caller) and publishes through a module-level singleton. No
observer parameter threads through spec signatures.

Store.on_block reverts to its original signature. Block-processing
timing, head/safe-target gauges, and reorg detection move to
default_block_processor in sync/service.py, where they are derived
by wrapping the call and diffing pre/post stores.

PrometheusSpecObserver (metrics/spec_observer.py) is the single
vendor-specific adapter. Registered once at node startup via
set_observer. Spec code has zero Prometheus imports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Spec code no longer imports time or handles perf_counter arithmetic.
observe_state_transition() wraps the transition body and publishes the
elapsed duration through the observer singleton on clean exit. On
exception the event is suppressed, preserving prior behavior where a
failed state-root check did not emit a sample.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ntext managers

Extends the SpecObserver Protocol with on_block_timed and on_attestation_timed
and adds matching observe_on_block and observe_on_attestation context managers.
Store.on_block and Store.on_gossip_attestation now self-time through the
observer, mirroring State.state_transition.

default_block_processor and the gossip-attestation handler lose their
perf_counter stopwatches. They still own caller-side concerns: head /
justified / finalized gauges, reorg counter, and the valid/invalid
attestation counters whose source label is caller-defined.

PrometheusSpecObserver forwards the two new hooks into the existing
histograms (lean_fork_choice_block_processing_time_seconds and
lean_attestation_validation_time_seconds), so the exposed metric
surface is unchanged.

Tests parametrize the observer + context-manager + Prometheus-adapter
behavior across all three events to keep the suite DRY.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BlockLookup's only beyond-dict behavior was ancestors() and reorg_depth(),
and the single caller of either was the fork-choice reorg-depth histogram
in default_block_processor. Neither method participated in any spec rule.

Store.blocks becomes dict[Bytes32, Block]. The reorg-depth computation
moves inline to default_block_processor via a small _ancestor_set helper,
keeping the same histogram and counter values. No behavior change for
production paths; tests that previously touched BlockLookup now use
plain dicts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "Spec" qualifier was redundant. The class lives in the metrics
subpackage, implements the SpecObserver Protocol from observability,
and is referenced as the Prometheus-backed observer at startup. A
plain "PrometheusObserver" reads the same and matches how it's used
at wire-up in __main__.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@unnawut
Copy link
Copy Markdown
Collaborator Author

unnawut commented Apr 22, 2026

Btw. the with statement introduced quite a bit of indents. It's a lot easier to review by hiding whitespaces e.g. https://github.com/leanEthereum/leanSpec/pull/667/changes?w=1

unnawut and others added 2 commits April 22, 2026 14:54
BlockLookup returns as dict[Bytes32, "Block"], matching its original
shape before the metrics-driven class with ancestors() and reorg_depth()
was introduced. Store.blocks reuses the alias so the type intent stays
visible at the model field. The reorg-depth caller in default_block_processor
and the StoreChecks helper compute depth via a small _ancestor_set walk
plus a native set difference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tcoratger tcoratger merged commit f160625 into leanEthereum:main Apr 22, 2026
13 checks passed
@unnawut unnawut deleted the metrics-observer branch April 22, 2026 08:47
tcoratger added a commit to tcoratger/leanSpec that referenced this pull request Apr 22, 2026
Integrates the metrics→observability split (leanEthereum#667) with the multi-fork layout:

- fork code (forks/devnet4/store.py, forks/devnet4/containers/state/state.py)
  imports only the vendor-neutral observe_on_attestation / observe_on_block /
  observe_state_transition hooks; no Prometheus-specific metrics land inside
  forks/, keeping the fork folder consensus-critical by construction
- BlockLookup is now the plain dict[Bytes32, Block] alias from leanEthereum#667, so the
  former dict-subclass imports (Iterator, GetCoreSchemaHandler, ZERO_HASH)
  and the node's BlockLookup({...}) wrap are dropped
- reorg-depth telemetry moves to sync/service.py's default_block_processor
  and stays off the spec side
- consensus tests and fixture builders added in leanEthereum#663, leanEthereum#664, leanEthereum#665, leanEthereum#666 retarget
  lean_spec.subspecs.containers.* → lean_spec.forks.devnet4.containers.*; the
  verify_signatures tamper hook's in-function imports are pulled up to the top
- drops tests/lean_spec/subspecs/containers/block/test_block_lookup.py
  (BlockLookup no longer has ancestors / reorg_depth methods to test)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

refactor Category: refactor

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants