[Code Health] scope_stats heap-ring wrap: scope_high_water / scope_alloc miscompute when heap_top < heap_tail #996

@ChaoZheng109

Category

Robustness (potential edge-case failure)

Component

Other — DFX / scope_stats (heap ring usage accounting)

Description

scope_stats snapshots each ring's head/tail at every PTO2_SCOPE boundary and reports usage
as end - start deltas. This is correct for the task_window ring (task_head/task_tail
are monotonic sequence numbers, so head >= tail always), but not for the heap ring, whose
heap_top_/heap_tail_ are wrapping byte offsets in [0, heap_size). Two consequences:

  1. The on-device contract comment ("heap_end - heap_start is bytes in use") is wrong once the
    heap has wrapped (heap_end < heap_start), so any consumer that subtracts the raw fields gets
    a negative, meaningless value. The Python plotter compensates with a single-fold wrap
    correction, which is sufficient only for instantaneous occupancy, since that is always
    < capacity.

  2. The span/cumulative metrics are not recoverable from two wrapped snapshots. scope_alloc and
    scope_high_water are not bounded by capacity: a scope whose cumulative heap throughput
    exceeds heap_size wraps more than once, and the wrapped end/begin offsets can no longer
    reconstruct the true value. (Backpressure bounds instantaneous occupancy, not per-scope
    throughput.) Separately, scope_high_water = end.top - begin.tail is not a true peak even in
    the no-wrap case: it is the total address span touched, which is an upper bound on the peak,
    not the realized peak.
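
The two failure modes above can be reproduced with plain arithmetic. The following is a minimal sketch, not the plotter's actual code: HEAP_SIZE, raw_delta and single_fold are hypothetical names, and the single-fold correction is assumed to add heap_size once when the difference is negative.

```python
HEAP_SIZE = 1024  # hypothetical heap capacity in bytes


def raw_delta(start, end):
    """Subtraction described by the contract comment; valid only if end >= start."""
    return end - start


def single_fold(start, end, heap_size=HEAP_SIZE):
    """Assumed form of the single-fold correction: add heap_size once if negative."""
    delta = end - start
    return delta + heap_size if delta < 0 else delta


# Case 1: instantaneous occupancy after one wrap (heap_top < heap_tail).
heap_tail, heap_top = 900, 100
occupancy_raw = raw_delta(heap_tail, heap_top)        # -800, meaningless
occupancy_folded = single_fold(heap_tail, heap_top)   # 224, the true bytes in use

# Case 2: per-scope throughput larger than the heap (more than one wrap).
true_alloc = 2500                                     # bytes allocated in the scope
begin_top = 100
end_top = (begin_top + true_alloc) % HEAP_SIZE        # 552, wrapped offset at scope end
alloc_folded = single_fold(begin_top, end_top)        # 452, not 2500
```

Case 1 is recoverable because occupancy is below capacity, so one fold is always enough. Case 2 is not: the number of completed wraps is lost once the offsets are reduced modulo heap_size.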

Location

  • src/common/platform/include/common/scope_stats.h:72-89 — contract comment + raw heap fields
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_ring_buffer.h:183-206 — monotonic task
    head/tail vs wrapping heap top/tail
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp:443-448,466-471
    capture site
  • simpler_setup/tools/scope_stats_plot.py:65-67,88-94 — metrics + single-fold wrap correction
  • a5 mirror: src/a5/runtime/tensormap_and_ringbuffer/runtime/{pto_ring_buffer.h,pto_orchestrator.cpp}

Proposed Fix

Record monotonic (non-wrapping) heap accounting at the scope boundary instead of the raw
wrapping heap_top_/heap_tail_, so every metric becomes an exact subtraction and the Python
wrap correction can be removed. Relabel scope_high_water to reflect what it actually is (an
upper bound on occupancy, not an observed peak). The task_window ring needs no change. Keep a5
in sync and add a regression test for a scope that wraps the heap more than once.
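
One way to picture the proposed accounting is a pair of monotonic byte counters that are snapshotted at each scope boundary. This is a sketch under stated assumptions rather than the runtime's implementation: HeapCounters, scope_metrics and the field names are hypothetical, wrap padding in the real ring allocator is ignored, and on device the counters would presumably be 64-bit integers in C++.

```python
from dataclasses import dataclass


@dataclass
class HeapCounters:
    """Hypothetical monotonic accounting: totals only grow, they never wrap."""
    heap_size: int
    alloc_total: int = 0  # bytes ever allocated
    free_total: int = 0   # bytes ever released

    @property
    def in_use(self):
        return self.alloc_total - self.free_total

    def alloc(self, nbytes):
        # Backpressure keeps instantaneous occupancy within capacity.
        assert self.in_use + nbytes <= self.heap_size
        self.alloc_total += nbytes

    def free(self, nbytes):
        assert nbytes <= self.in_use
        self.free_total += nbytes

    def snapshot(self):
        return (self.alloc_total, self.free_total)


def scope_metrics(begin, end):
    """Every metric is an exact subtraction of monotonic snapshots."""
    begin_alloc, begin_free = begin
    end_alloc, end_free = end
    return {
        "scope_alloc": end_alloc - begin_alloc,
        "span_upper_bound": end_alloc - begin_free,  # what scope_high_water measures
        "in_use_at_end": end_alloc - end_free,
    }


# Regression-style scenario: the scope allocates about 3.9x the heap size.
heap = HeapCounters(heap_size=1024)
heap.alloc(300)
heap.free(200)                 # pre-scope traffic leaves 100 bytes in use
begin = heap.snapshot()
peak = heap.in_use
for _ in range(10):
    heap.alloc(400)
    peak = max(peak, heap.in_use)
    heap.free(400)
end = heap.snapshot()
metrics = scope_metrics(begin, end)

# What the wrapping offsets would have reported with a single fold.
begin_top = begin[0] % heap.heap_size
end_top = end[0] % heap.heap_size
wrapped_alloc = (end_top - begin_top) % heap.heap_size
```

Here scope_alloc comes out as 4000 bytes, while the wrapped offsets yield 928. The span value (4100) exceeds both the heap size and the observed peak of 500 bytes, which is why relabeling it as an upper bound is the safer description.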

Priority

Medium (minor risk, should fix in next few releases)


Tracked under #995. Distinct from #991 (HTML readability / dep_pool channel) and #902 (per-task
granularity).
