
CHANGELOG

v1.0.2 (2026-04-05)

Bug Fixes

  • decoration: Python 3.14 compat -- two-pass decoration + TypeError catch (#138, e6f0f9a)

Python 3.14 (PEP 649) evaluates annotations lazily. During decorate_all_once(), wrapping Tensor.bool before inspecting Tensor.dim_order caused inspect.signature() to resolve bool in bool | list[torch.memory_format] to the wrapper function instead of the builtin type, raising TypeError on first call only.

Three fixes:

  • Catch TypeError alongside ValueError in get_func_argnames (safety net)
  • Split decorate_all_once() into two passes: collect argnames from the pristine namespace first, then decorate (eliminates the root cause)
  • Replace the _orig_to_decorated idempotency guard with an _is_decorated flag so a partial decoration failure allows retry instead of locking in an incomplete state
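The two-pass shape can be sketched in isolation (illustrative names and bodies, not the torchlens implementation):

```python
import inspect

def decorate_all_once(namespace, wrapper_factory):
    # Pass 1: collect argnames while every annotation still resolves against
    # the pristine, un-wrapped namespace.
    argnames = {}
    for name, func in namespace.items():
        if not callable(func):
            continue
        try:
            argnames[name] = list(inspect.signature(func).parameters)
        except (ValueError, TypeError):
            # builtins with no signature, or lazy-annotation resolution failures
            argnames[name] = []
    # Pass 2: wrap. Nothing is inspected after this point, so a wrapper
    # shadowing a name like `bool` can no longer poison annotation lookup.
    for name in argnames:
        wrapped = wrapper_factory(namespace[name], argnames[name])
        wrapped._is_decorated = True  # flag allows retry after partial failure
        namespace[name] = wrapped
    return namespace
```

Because inspection and mutation never interleave, no signature can resolve a name to a wrapper installed earlier in the same loop.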

6 new tests, gotchas.md updated.

v1.0.1 (2026-03-23)

Bug Fixes

  • decoration: Clear stale sq_item C slot after wrapping Tensor.__getitem__ (b2c6085)

When __getitem__ is replaced on a C extension type with a Python function, CPython sets the sq_item slot in tp_as_sequence. This makes PySequence_Check(tensor) return True (it was False in clean PyTorch), causing torch.tensor([0-d_tensor, ...]) to iterate elements as sequences and call len() -- which raises TypeError for 0-d tensors. The slot is never cleared by restoring the original wrapper_descriptor or by delattr.

Fix: null sq_item via ctypes after every decoration/undecoration cycle (decorate_all_once, unwrap_torch, wrap_torch). Safe because tensor indexing uses mp_subscript (mapping protocol), not sq_item (sequence protocol). Verified via tp_name guard; fails silently on non-CPython.
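The sequence-protocol side effect can be demonstrated without torch or ctypes (plain Python, illustrative only):

```python
class CleanTensor:
    pass

class WrappedTensor:
    # Adding a Python-level __getitem__ fills the type's sq_item slot.
    def __getitem__(self, idx):
        raise IndexError  # behave like an empty "0-d" object

def looks_like_sequence(obj):
    # iter() falls back to the sequence (sq_item) protocol as soon as
    # __getitem__ exists, even with no __iter__ or __len__ defined.
    try:
        iter(obj)
        return True
    except TypeError:
        return False
```

A type with no __getitem__ is not iterable at all; the wrapped type suddenly is, which is the same category of protocol flip described above.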

Adds 9 regression tests covering all lifecycle paths.

Chores

  • Add secret detection pre-commit hooks (0e2889a)

Add detect-private-key (pre-commit-hooks) and detect-secrets (Yelp) to catch leaked keys, tokens, and high-entropy strings before they hit the repo.

v1.0.0 (2026-03-13)

Bug Fixes

  • decoration: Cast mode.device to str for mypy return-value check (45c0ff3)

CI mypy (stricter torch stubs) catches that mode.device returns torch.device, not str. Explicit str() cast satisfies the Optional[str] return type annotation.

Features

  • decoration: Lazy wrapping — import torchlens has no side effects (b5da8b8)

BREAKING CHANGE: torch functions are no longer wrapped at import time. Wrapping happens lazily on first log_forward_pass() call and persists.

Three changes:

  1. Lazy decoration: removed decorate_all_once() / patch_detached_references() calls from __init__.py. _ensure_model_prepared() triggers wrapping on first use via wrap_torch().

  2. Public wrap/unwrap API:
     - torchlens.wrap_torch() — install wrappers (idempotent)
     - torchlens.unwrap_torch() — restore original torch callables
     - torchlens.wrapped() — context manager (wrap on enter, unwrap on exit)
     - log_forward_pass(unwrap_when_done=True) — one-shot convenience

     Old names (undecorate_all_globally, redecorate_all_globally) are kept as internal aliases.

  3. torch.identity fix: decorated identity function now stored on _state._decorated_identity instead of monkey-patching torch.identity (which doesn't exist in PyTorch type stubs). Eliminates 2 mypy errors.
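A toy sketch of the wrap/unwrap lifecycle, using a plain dict in place of torch's namespace (the function names mirror the public API shape; the bodies are illustrative assumptions):

```python
import contextlib

_originals = {}

def wrap_torch(namespace):
    # Idempotent: names already recorded in _originals are left alone.
    for name, func in list(namespace.items()):
        if name in _originals or not callable(func):
            continue
        _originals[name] = func
        def make(f):
            def wrapper(*args, **kwargs):
                return f(*args, **kwargs)  # a logging hook would live here
            wrapper._is_decorated = True
            return wrapper
        namespace[name] = make(func)

def unwrap_torch(namespace):
    # Restore every original callable and forget the wrappers.
    for name, func in _originals.items():
        namespace[name] = func
    _originals.clear()

@contextlib.contextmanager
def wrapped(namespace):
    wrap_torch(namespace)
    try:
        yield
    finally:
        unwrap_torch(namespace)
```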

Tests updated: 75 pass including 12 new lifecycle tests.

Breaking Changes

  • decoration: Torch functions are no longer wrapped at import time. Wrapping happens lazily on first log_forward_pass() call and persists.

v0.22.0 (2026-03-13)

Chores

  • types: Remove stale typing noise (5ba099d)

Documentation

  • maintenance: Refresh maintainer notes (685a358)

  • maintenance: Split CLAUDE.md/AGENTS.md into architect vs implementation roles (18f8ae9)

Break the symlink mirroring convention: CLAUDE.md now holds architect-level context (what, why, how it connects) while AGENTS.md holds implementation-level context (conventions, gotchas, known bugs, test commands). Pure-implementation subdirs (.github, scripts, tests, utils) get AGENTS.md only. Also populates .project-context/ templates (architecture, conventions, gotchas, decisions).

Features

  • decoration: Add global undecorate override (b0bafeb)

  • viz: Add dagua torchlens integration (35d5bcd)

Refactoring

  • types: Finish package mypy cleanup (6f9a3fe)

v0.21.3 (2026-03-11)

Bug Fixes

  • tests: Make SIGALRM signal safety test deterministic (b3fc461)

Replace timer-based SIGALRM with direct os.kill() inside forward() so the signal always fires mid-logging. Eliminates flaky skips when the forward pass completes before the timer.
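The pattern reduces to a standalone sketch (Unix-only; names are illustrative, not the test's actual code):

```python
import os
import signal

fired = []

def handler(signum, frame):
    fired.append(signum)

signal.signal(signal.SIGALRM, handler)

def forward():
    # Raise the signal from inside the code under test: it is guaranteed
    # to arrive mid-call, with no race against an external timer.
    os.kill(os.getpid(), signal.SIGALRM)
    return "done"

result = forward()  # handler runs during the call, deterministically
```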

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.21.2 (2026-03-09)

Bug Fixes

  • vis: Avoid graphviz.Digraph memory bomb when ELK fails on large graphs (f5563ee)

When ELK layout fails (OOM/timeout) on 1M+ node graphs, the fallback path previously built a graphviz.Digraph in Python — nested subgraph body-list copies exploded memory and hung indefinitely. Now render_elk_direct handles the failure internally: reuses already-collected Phase 1 data to generate DOT text without positions and renders directly with sfdp, bypassing graphviz.Digraph entirely.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • vis: Bypass ELK for large graphs — use Python topological layout (37cce3a)

ELK's stress algorithm allocates TWO O(n²) distance matrices (n² × 16 bytes). At 100k nodes that's 160 GB, at 1M nodes it's 16 TB — the root cause of the std::bad_alloc. The old >150k stress switch could never work.

For graphs above 100k nodes, we now skip ELK entirely and compute a topological rank layout in Python (Kahn's algorithm, O(n+m)). Module bounding boxes are computed from node positions. The result feeds into the same neato -n rendering path, preserving cluster boxes.

If ELK fails for smaller graphs, the Python layout is also used as a fallback instead of the old sfdp path that built a graphviz.Digraph (which exploded on nested subgraph body-list copies).
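The rank-assignment step can be sketched as follows (illustrative, not the torchlens code): O(n + m), with no distance matrices at all.

```python
from collections import defaultdict, deque

def topological_ranks(edges, nodes):
    """Assign each node a rank = longest-path depth from the sources (Kahn)."""
    indeg = {n: 0 for n in nodes}
    children = defaultdict(list)
    for u, v in edges:
        children[u].append(v)
        indeg[v] += 1
    rank = {n: 0 for n in nodes}
    queue = deque(n for n in nodes if indeg[n] == 0)
    while queue:
        u = queue.popleft()
        for v in children[u]:
            rank[v] = max(rank[v], rank[u] + 1)  # push children below parents
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return rank  # rank gives the vertical tier; horizontal order is free
```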

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.21.1 (2026-03-09)

Bug Fixes

  • postprocess: Fix mypy type errors in _build_module_param_info (11ea006)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Chores

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Performance Improvements

  • postprocess: Optimize pipeline for large models (a211417)

  • Per-step verbose timing: unwrap grouped _vtimed blocks into individual step timing with a graph-stats summary, enabling users to identify which specific step is slow (O16)
  • Cache module_str by containing_modules tuple to avoid redundant string joins in Step 6 (O8)
  • Early-continue guards in _undecorate_all_saved_tensors to skip BFS on layers with empty captured_args/kwargs (O5)
  • Pre-compute buffer_layers_by_module dict in _build_module_logs, eliminating the O(modules × buffers) scan per module (O6)
  • Single-pass arglist rebuild in Step 11 rename, replacing the 3-pass enumerate + index set + filter pattern (O2)
  • Replace OrderedDict with dict in _trim_and_reorder (Python 3.7+ preserves insertion order) for lower allocation overhead (O4)
  • Reverse-index approach in _refine_iso_groups: O(members × neighbors) instead of O(members²) all-pairs combinations (O9)
  • Pre-compute param types per subgraph as a frozenset before the pair loop in _merge_iso_groups_to_layers (O10)
  • Set-based O(n) collision detection replacing O(n²) .count() calls in _find_isomorphic_matches (O12)
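As one concrete instance of these patterns, the set-based collision detection (O12) amounts to something like this illustrative sketch:

```python
from collections import Counter

def find_collisions(items):
    # One pass over a Counter instead of calling items.count(x) once per
    # element, turning O(n^2) into O(n).
    counts = Counter(items)
    return {x for x, c in counts.items() if c > 1}
```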

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.21.0 (2026-03-09)

Bug Fixes

  • capture: Fix mypy type errors in output_tensors field dict (d54e9a9)

Annotate fields_dict as Dict[str, Any] and extract param_shapes with proper type to satisfy mypy strict inference.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • vis: Pass heap limits to ELK Worker thread to prevent OOM on 1M nodes (23ef8d8)

The Node.js Worker running ELK layout had no explicit maxOldGenerationSizeMb in its resourceLimits — only stackSizeMb was set. The --max-old-space-size flag controls the main thread's V8 isolate, not the Worker's. This caused the Worker to OOM at ~16GB on 1M-node graphs despite the main thread being configured for up to 64GB.

  • Add maxOldGenerationSizeMb and maxYoungGenerationSizeMb to Worker resourceLimits, passed via the _TL_HEAP_MB env var
  • Add _available_memory_mb() to detect system RAM and cap heap allocation to (available - 4GB), preventing competition with the Python process
  • Include available system memory in OOM diagnostic messages

Also includes field/param renames from feat/grand-rename branch.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Documentation

  • Update all CLAUDE.md files with deepdive session 4 findings (b15c5bf)

Sync all project and subpackage documentation with the current codebase:

  • Updated line counts across all 36 modules
  • Added elk_layout.py documentation to visualization/
  • Added arg_positions.py and salient_args.py to capture/
  • Documented 13 new bugs (ELK-IF-THEN, BFLOAT16-TOL, etc.)
  • Updated test counts (1,004 tests across 16 files)
  • Added known bugs sections to validation/, utils/, decoration/
  • Updated data_classes/ with new fields and properties

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Features

  • Rename all data structure fields and function args for clarity (f0d7452)

Rename ~68 fields across all 8 data structures (ModelLog, LayerPassLog, LayerLog, ParamLog, ModuleLog, BufferLog, ModulePassLog, FuncCallLocation) plus user-facing function arguments. Key changes:

  • tensor_contents → activation, grad_contents → gradient
  • All _fsize → _memory (e.g. tensor_fsize → tensor_memory)
  • func_applied_name → func_name, gradfunc → grad_fn_name
  • is_bottom_level_submodule_output → is_leaf_module_output
  • containing_module_origin → containing_module
  • spouse_layers → co_parent_layers, orig_ancestors → root_ancestors
  • model_is_recurrent → is_recurrent, elapsed_time_* → time_*
  • vis_opt → vis_mode, save_only → vis_save_only
  • Fix typo: output_descendents → output_descendants

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.20.5 (2026-03-09)

Bug Fixes

  • vis: Prevent OOM kill on 1M-node ELK render (#128, d9a1525)

The 1M-node render was OOM-killed at ~74GB RSS because:

  1. Model params (~8-10GB) stayed alive during the ELK subprocess
  2. preexec_fn forced fork+exec, COW-doubling the 74GB process
  3. Heap/stack formulas produced absurd values (5.6TB heap, 15GB stack)
  4. No memory cleanup happened before subprocess launch

Changes:

  • render_large_graph.py: separate log_forward_pass from render_graph, free model/autograd before the ELK render
  • elk_layout.py: cap heap at 64GB, stack floor 4096MB/cap 8192MB, write JSON to a temp file (freeing the string before subprocess), gc.collect before subprocess, set RLIMIT_STACK at module level (removes preexec_fn and the forced fork+exec)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.20.4 (2026-03-09)

Bug Fixes

  • postprocess: Backward-only flood in conditional branch detection + THEN labeling (#88, d737828)

Bug #88: _mark_conditional_branches flooded bidirectionally (parents + children), causing non-conditional nodes' children to be falsely marked as in_cond_branch. Fix restricts flooding to parent_layers only.

Additionally adds THEN branch detection via AST analysis when save_source_context=True, with IF/THEN edge labels in visualization. Includes 8 new test models, 22 new tests, and fixes missing 'verbose' in MODEL_LOG_FIELD_ORDER.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • vis: Use Worker thread for ELK layout to fix stack overflow on large graphs (3fe6a84)

V8's --stack-size flag silently caps at values well below what's requested, causing "Maximum call stack size exceeded" on 1M+ node graphs. Switch to Node.js Worker threads with resourceLimits.stackSizeMb, which reliably delivers the requested stack size at the V8 isolate level.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.20.3 (2026-03-08)

Bug Fixes

  • vis: Increase ELK Node.js stack floor to 4GB for large graphs (29af94e)

128MB was insufficient for ELK's recursive layout on 500k+ node graphs.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • vis: Raise OS stack limit for ELK Node.js subprocess (da82c9d)

The OS soft stack limit (ulimit -s) was smaller than the --stack-size value passed to Node.js, causing a segfault on large graphs (500k+ nodes) instead of allowing V8 to use the requested stack. Uses preexec_fn to set RLIMIT_STACK to unlimited in the child process only.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Performance Improvements

  • decoration: Optimize model prep and move session attrs to ModelLog dicts (b63a4fa)

Five performance fixes for _prepare_model_session and related setup code:

  • PERF-38: Replace O(N²) list concat in _traverse_model_modules with a deque
  • PERF-37: Cache user_methods per class in _get_class_metadata; move the _pytorch_internal set to a module-level frozenset
  • PERF-36: Iterate module._parameters directly instead of rsplit on named_parameters addresses + a lookup dict
  • PERF-39: Skip patch_model_instance for already-prepared models
  • Move 4 session-scoped module attrs (tl_module_pass_num, tl_module_pass_labels, tl_tensors_entered_labels, tl_tensors_exited_labels) from nn.Module instances to ModelLog dicts keyed by id(module). Remove tl_source_model_log (dead code). Eliminates per-module cleanup iteration in _cleanup_model_session.

At 10K modules: ensure_prepared repeat calls drop from ~48ms to ~0.4ms (111x), session setup ~1.3x faster, cleanup ~1.4x faster.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.20.2 (2026-03-08)

Bug Fixes

  • vis: Increase ELK Node.js stack size to prevent overflow (b8edbc8)

Bump --stack-size floor from 64MB to 128MB and multiplier from 16x to 48x (matching heap scaling) to prevent "Maximum call stack size exceeded" in elkjs on large graphs.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Chores

  • scripts: Enable loop detection in render_large_graph (803e16f)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.20.1 (2026-03-08)

Chores

  • scripts: Use log_forward_pass vis_opt instead of separate render call (d2aea0f)

Let verbose mode handle all phase timing instead of manual timestamps. Use log_forward_pass's built-in vis_opt to render in one call.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Performance Improvements

  • model_prep: Optimize _prepare_model_session for large models (2892323)

  • Hoist set(dir(nn.Module)) to a module-level constant _NN_MODULE_ATTRS
  • Replace the dir(module) MRO walk with dict scans for attrs and methods
  • Pre-build an address→module dict to eliminate per-parameter tree walks
  • Use model.modules() with cached tl_module_address instead of a second DFS

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.20.0 (2026-03-08)

Chores

  • scripts: Unify large graph render scripts into single parameterized script (07a8186)

Replace run_250k.py and run_1M.py with render_large_graph.py that accepts any node count as a CLI argument, plus --format, --seed, and --outdir options.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Features

  • logging: Add verbose mode for timed progress messages (0603f10)

Add verbose: bool = False parameter to log_forward_pass, show_model_graph, and internal pipeline functions. When enabled, prints [torchlens]-prefixed progress at each major pipeline stage with timing. Also fixes _trim_and_reorder_model_history_fields to preserve all non-ordered attributes (not just private ones).

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.19.0 (2026-03-08)

Chores

  • Add large graph render scripts to scripts/ (d02233b)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Documentation

  • Update RESULTS.md and tests/CLAUDE.md with current counts (75fa346)

  • Total tests: 892 → 951, test files: 14 → 15, toy models: 249 → 250
  • Add test_large_graphs.py (51 tests) to the file tables
  • Add decoration overhead benchmark table to profiling baselines
  • Add large graph scaling section (100 to 1M nodes)
  • Update all per-file test counts to current values

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Features

  • capture: Add func_config field for lightweight hyperparameter extraction (7144d6d)

Adds a func_config dict to every LayerPassLog/LayerLog containing computation-defining hyperparameters (kernel_size, stride, in/out channels, dropout p, etc.) extracted at capture time with zero tensor cloning. Empty for source tensors and output nodes.

Also fixes pre-existing test failures in test_validation.py (read-only property assignments) and adds detect_loops to MODEL_LOG_FIELD_ORDER.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Testing

  • profiling: Add decoration overhead benchmark to profiling report (c8fbffd)

Measures per-call overhead of TorchLens's toggle-gated wrappers when logging is disabled. Benchmarks 11 functions from cheap (relu, add) to heavy (conv2d, SDPA) — confirms ~600ns overhead on cheap ops, <1% on real compute.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.18.0 (2026-03-08)

Features

  • data: Add MACs properties to LayerPassLog, LayerLog, ModelLog, ModuleLog (c60e7b9)

MACs (multiply-accumulate operations) = FLOPs / 2. Added:

  • LayerPassLog: macs_forward, macs_backward properties
  • LayerLog: macs_forward, macs_backward properties
  • ModelLog: total_macs_forward, total_macs_backward, total_macs, macs_by_type()
  • ModuleLog: flops_forward, flops_backward, flops, macs_forward, macs_backward, macs
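A minimal sketch of the derived-property relationship (toy class, not the real LayerPassLog):

```python
class LayerPassLog:
    """Toy stand-in showing the MACs = FLOPs / 2 derived properties."""

    def __init__(self, flops_forward, flops_backward):
        self.flops_forward = flops_forward
        self.flops_backward = flops_backward

    @property
    def macs_forward(self):
        return self.flops_forward / 2  # one MAC = one multiply + one add

    @property
    def macs_backward(self):
        return self.flops_backward / 2
```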

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Refactoring

  • data: Convert 23 stored fields to computed @properties across data classes (2c22208)

Replace redundant stored fields with computed @property methods that derive their values from existing data. Eliminates ~155 lines of write-site code across 13 files while preserving identical behavior and passing all invariants.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.17.0 (2026-03-08)

Bug Fixes

  • types: Add mypy annotations for defaultdict and deque in elk_layout (98478a3)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Features

  • vis: Scale ELK rendering to 250k+ nodes, add detect_loops option (4707631)

ELK's layered algorithm (Sugiyama) uses O(n²) memory for crossing minimization, causing std::bad_alloc at ~150k+ nodes. This adds:

  • Auto-switch to ELK's stress algorithm above 150k nodes (O(n) memory)
  • Topological position seeding for stress to preserve directional flow
  • Increased Node.js heap allocation (16GB floor, 48x JSON size)
  • Better error messages when Node.js OOM-kills (was silent empty stderr)
  • detect_loops parameter on log_forward_pass/show_model_graph to skip expensive isomorphic subgraph expansion, keeping only same-param grouping (Rule 1). Default True (existing behavior unchanged).
  • 8 loop comparison tests rendering with/without loop detection

Successfully renders 250k-node graphs in ~19 minutes (was impossible). 1M-node render in progress.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.16.4 (2026-03-08)

Bug Fixes

  • validation: Use call nesting depth not address depth in invariant Q, fix test inputs (#117, 98660ff)

v0.16.3 (2026-03-08)

Bug Fixes

  • tests: Rename model_kwargs to input_kwargs and fix test configs (8768339)

  • Rename model_kwargs= to input_kwargs= in 17 test call sites to match the API
  • Skip test_gpt_bigcode (JIT-compiled attention incompatible with TorchLens)
  • Fix test_deformable_detr backbone out_features to match the 2-layer ResNet
  • Add index, symint, checkkeypaddingmask to _UNARY_FUNCS ArgSpec entries

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • types: Resolve 14 pre-existing mypy errors across 4 files (7adcc36)

  • rng.py: annotate rng_dict as Dict[str, object] for mixed-type values
  • elk_layout.py: annotate out_shape as tuple to avoid index-out-of-range
  • model_prep.py: annotate meta as Dict[str, object], add type: ignore for dynamic tl_buffer_address attrs on Tensor
  • output_tensors.py: add type: ignore for ArgSpec arg-type and assignment

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.16.2 (2026-03-08)

Bug Fixes

  • validation: Exempt exponential_ from perturbation check (a6dce40)

In-place RNG op — output is determined by RNG state, not input values. Fixes test_gumbel_vq_model.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • validation: Exempt maximum/minimum from perturbation check (9d658bc)

torch.maximum/minimum with extreme-valued args (e.g. RWKV's negative infinity masks) are insensitive to perturbation — same as max/min.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • validation: Fix perturbation precision and add exemptions for C++ ops (7f86655)

  • Scale constant-tensor perturbation by ±10% of magnitude to ensure float32-distinguishable values while staying in a safe range
  • Add a posthoc magnitude-ratio exemption: when the non-perturbed parent's magnitude dwarfs the perturbed parent's (>100x), float32 arithmetic swallows the perturbation — exempt rather than fail
  • Add maximum/minimum to the posthoc exemption (element-wise binary ops insensitive to perturbation when one arg dominates)
  • Skip perturbation for _op (torchvision C++ PyCapsule ops like nms, roi_align) — perturbed coordinates segfault native extensions
  • Fix nystromformer test: seq_len must equal num_landmarks² (64)

Fixes: test_fcos_resnet50_train, test_retinanet_resnet50_train, test_nystromformer, test_maskrcnn_resnet50_train

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • validation: Scale perturbation range by magnitude for large constants (efcd876)

Constant tensors with large values (e.g. 1e8) were perturbed by ±1.0, which is indistinguishable in float32 precision. Now scales expansion by abs(value) * 1e-4 to ensure perturbation is always detectable.
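The precision failure is easy to reproduce in plain Python by rounding through float32 (demonstration only, no torch needed):

```python
import struct

def to_f32(x):
    # Round-trip a Python float through IEEE-754 float32.
    return struct.unpack("f", struct.pack("f", x))[0]

base = 1e8
# A fixed ±1.0 nudge is below the float32 ulp (~8.0) at this magnitude
# and rounds back to the original value:
fixed_nudge = to_f32(base + 1.0)
# A magnitude-scaled nudge (abs(value) * 1e-4 = 1e4 here) survives rounding:
scaled_nudge = to_f32(base + abs(base) * 1e-4)
```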

Fixes test_fcos_resnet50_train validation failure.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • validation: Use relative threshold for near-constant tensor perturbation (e0c82c7)

Near-constant float tensors (e.g. [2.6785714, 2.6785717]) had a value range smaller than float32 precision, so perturbation within [min, max] produced the same value after rounding. Changed exact lo == hi check to relative threshold hi - lo < max(1e-6, abs(lo) * 1e-6) to trigger range expansion for these cases.

Fixes test_ssd300_vgg16_train validation failure.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • vis: Probe nvm paths when node is not on PATH (4d3ae24)

Non-interactive shells (IDE test runners, cron, subprocesses) often lack nvm's PATH additions, causing elk_available() to return False even when elkjs is installed. Add _find_node_binary() to probe ~/.nvm/versions/node/ as a fallback, and inject the node binary's directory into the subprocess PATH so elkjs detection works regardless of shell configuration.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.16.1 (2026-03-07)

Bug Fixes

  • tests: Use relative import for example_models in test_large_graphs (e2d0ae4)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • vis: Harden ELK heap scaling and fix flaky signal safety test (41b9f89)

  • Bump ELK Node.js heap scaling from 8x to 16x JSON size to prevent OOM on 250k+ node graphs
  • Mark 100k node tests as @rare (too slow for regular runs)
  • Fix flaky TestSignalSafety: use setitimer(50ms) instead of alarm(1s), increase model iterations to 50k, skip if the alarm doesn't fire

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.16.0 (2026-03-07)

Bug Fixes

  • validation: Batch of bug fixes for edge-case models (dca5a7e)

  • Fix vmap/functorch compatibility: skip logging inside functorch transforms to avoid missing batching rules (torch_funcs.py)
  • Fix tensor_nanequal infinite recursion: wrap decorated tensor ops (.isinf, .resolve_conj, etc.) in pause_logging() (tensor_utils.py)
  • Fix perturbation for range-restricted functions: use uniform random within the original value range instead of scaled normal (core.py)
  • Fix atomic_bool_val crash inside vmap context (output_tensors.py)
  • Fix output node initialized_inside_model flag (graph_traversal.py)
  • Add scatter_ full-overwrite exemption (exemptions.py)
  • Add max/min indices exemption for integer dtype outputs
  • Add bernoulli scalar-p exemption and constant-output exemption
  • Add non-perturbed parent special-value check for nested args (einsum)
  • Fix buffer_xrefs invariant: accept ancestor module matches
  • Fix real-world model configs: CvT, CLAP, EnCodec, SpeechT5, Informer, Autoformer, MobileBERT kwarg names

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • vis: Fix ELK rendering and scale for large graphs (ea96a85)

  • Fix neato -n position format: use points (not inches) — fixes empty nodes, missing edges, and invisible labels in ELK output
  • Auto-scale Node.js heap and stack with graph JSON size
  • Auto-scale ELK and neato timeouts with node count
  • Use straight-line edges for graphs > 1k nodes (spline routing is O(n²))
  • Warn users to use SVG format for graphs > 25k nodes (PDF renders empty)
  • Add dot-vs-ELK aesthetic comparison tests at 15/100/500/1k/3k nodes
  • Add 1M node test (rare marker) for trophy-file rendering

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Features

  • vis: Add ELK layout engine for large graph visualization (5245872)

  • New elk_layout.py: ELK-based node placement via a Node.js/elkjs subprocess, with graceful fallback to sfdp when unavailable
  • New vis_node_placement parameter ("auto"/"dot"/"elk"/"sfdp") threaded through show_model_graph, log_forward_pass, and render_graph
  • Auto mode uses dot for <3500 nodes, ELK (or the sfdp fallback) for larger
  • RandomGraphModel in example_models.py: seeded random model generator with calibrated node counts (within ~5% of target up to 100k+)
  • 39 tests in test_large_graphs.py: node count accuracy, validation, engine selection, ELK utilities, rendering at 3k-100k scales, dot threshold benchmark
  • Increased Node.js stack size (--stack-size=65536) to handle graphs up to 250k+ nodes without stack overflow
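The auto-selection rule described above amounts to a few lines (thresholds taken from this entry; the function name is illustrative):

```python
def choose_engine(n_nodes, elk_available=True):
    # "auto" mode: dot below the threshold, ELK above, sfdp as fallback
    # when elkjs/Node.js is not installed.
    if n_nodes < 3500:
        return "dot"
    return "elk" if elk_available else "sfdp"
```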

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • vis: Add hierarchical module grouping to ELK layout (bcc1ad2)

  • New build_elk_graph_hierarchical(): builds nested ELK compound nodes from the module containment structure (containing_modules_origin_nested)
  • ELK's "INCLUDE_CHILDREN" hierarchy handling preserves module grouping in the layout — nodes within the same module cluster together
  • inject_elk_positions() now recurses into compound nodes, accumulating absolute positions from nested ELK coordinates
  • render_with_elk() passes entries_to_plot for hierarchical layout, falling back to flat DOT parsing when entries are not available
  • Tests for hierarchical graph building and nested position injection

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Testing

  • vis: Add rare marker for 250k-node tests, fill test coverage gaps (c749b94)

  • New pytest marker "rare": always excluded by default via addopts, run explicitly with pytest -m rare
  • Add 250k node count, validation, and ELK render tests (marked rare)
  • Fill gaps: validation tests at 5k/10k/20k/50k/100k, ELK render tests at 5k/20k, node count test at 20k

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.15.15 (2026-03-07)

Bug Fixes

  • gc: Release ParamLog._param_ref on cleanup, add GC test suite (#GC-1, #GC-12) (83e1bd2)

  • Add ParamLog.release_param_ref() to cache grad metadata then null _param_ref
  • cleanup() now nulls all _param_ref before clearing entries
  • Add ModelLog.release_param_refs() public API for early param release
  • Add _param_logs_by_module to cleanup's internal containers list
  • New test_gc.py with 10 tests covering ModelLog/param GC, memory growth, save_new_activations stability, and transient data clearing

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Documentation

  • Move RESULTS.md to repo root for visibility (2e814dd)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • tests: Add public test results summary (fd7b33f)

Committed tests/RESULTS.md with suite overview, model compatibility matrix (121 toy + 85 real-world), profiling baselines, and pointers to generated reports. Transparent scoreboard for the repo.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Testing

  • models: Add autoencoders, state space models, and architecture coverage (716cc9d)

Toy models (18 new in example_models.py):

  • Autoencoders: VanillaAutoencoder, ConvAutoencoder, SparseAutoencoder, DenoisingAutoencoder, VQVAE, BetaVAE, ConditionalVAE
  • State space: SimpleSSM, SelectiveSSM (Mamba-style), GatedSSMBlock, StackedSSM
  • Additional: SiameseNetwork, MLPMixer, SimpleGCN, SimpleGAT, SimpleDiffusion, SimpleNormalizingFlow, CapsuleNetwork

Real-world models (5 new in test_real_world_models.py):

  • SSMs: Mamba, Mamba-2, RWKV, Falcon-Mamba (via transformers)
  • Autoencoders: ViT-MAE ForPreTraining (via transformers)

All 22 new tests pass. Updated RESULTS.md to reflect 736 total tests, 139 toy models, 92 real-world models.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • models: Close remaining stress-test gaps — MAML, NeRF, RecurrentGemma, VOLO (a1f2254)

Toy models (+2):

  • MAMLInnerLoop: higher-order gradients (torch.autograd.grad inside forward)
  • TinyNeRF: differentiable volumetric rendering (ray marching + alpha compositing)

Real-world models (+2):

  • RecurrentGemma: Griffin architecture (linear recurrence + local attention hybrid)
  • VOLO: outlooker attention (distinct from standard self-attention)

Closes 37/38 stress-test patterns from taxonomy. Only remaining gap is test-time training (TTT layers) which requires gradient computation within inference — fundamentally incompatible with TorchLens forward-pass logging.

Total: 249 toy models, 185 real-world models, 892 tests.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • models: Exhaustive architecture coverage across 30+ categories (8411a72)

Add 32 new toy models (Groups M-R) covering distinct computational patterns:

  • Group M: Attention variants (MQA, GQA, RoPE, ALiBi, slot, cross-attention)
  • Group N: Gating & skip patterns (highway, SE, depthwise-sep, inverted-residual, FPN)
  • Group O: Generative & self-supervised (hierarchical VAE, gated conv, masked conv, SimCLR, stop-gradient/BYOL, AdaIN)
  • Group P: Exotic architectures (hypernetwork, DEQ, neural ODE, NTM memory, SwiGLU)
  • Group Q: Graph neural networks (GraphSAGE, GIN, EdgeConv, graph transformer)
  • Group R: Additional patterns (MoE, spatial transformer, dueling DQN, RMS norm, sparse pruning, Fourier mixing)

Add 37 new real-world model tests:

  • Decoder-only LLMs: LLaMA, Mistral, Phi, Gemma, Qwen2, Falcon, BLOOM, OPT
  • Encoder-only: ALBERT, DeBERTa-v2, XLM-RoBERTa
  • Encoder-decoder: Pegasus, LED
  • Efficient transformers: FNet, Nystromformer, BigBird
  • MoE: Mixtral, Switch Transformer
  • Vision transformers: DeiT, CvT, SegFormer
  • Detection: DETR, Mask R-CNN (train+eval)
  • Perceiver IO, PatchTST, Decision Transformer
  • timm: HRNet, EfficientNetV2, LeViT, CrossViT, PVT-v2, Twins-SVT, FocalNet
  • GNN (PyG): GraphSAGE, GIN, Graph Transformer

Total: 805 tests, 213 toy models, 129 real-world tests.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • models: Exhaustive coverage expansion — 20 toy + 33 real-world architectures (4f4e7ae)

Toy models (+20): GRU, NiN, ChannelShuffle, PixelShuffle, PartialConv, FiLM, CoordinateAttention, DifferentialAttention, RelativePositionAttention, EarlyExit, MultiScaleParallel, GumbelVQ, EndToEndMemoryNetwork, RBFNetwork, SIREN, MultiTask, WideAndDeep, ChebGCN, PrototypicalNetwork, ECA.

Real-world models (+33): GPT-J, GPTBigCode, GPT-NeoX, FunnelTransformer, CANINE, MobileBERT, mBART, ProphetNet, WavLM, Data2VecAudio, UniSpeech, ConvNeXt-v2, NFNet, DaViT, CoAtNet, RepVGG, ReXNet, PiT, Visformer, GC-ViT, EfficientFormer, FastViT, NesT, Sequencer2D, TResNet, SigLIP, BLIP-2, Deformable DETR, LayoutLM, TimeSeriesTransformer, ChebConv, SGConv, TAGConv.

Total: 241 toy models, 183 real-world models, 882 tests. RESULTS.md updated with all new entries and pattern coverage table.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • models: Final coverage pass — 6 novel computational patterns (e33c71a)

New toy models targeting genuinely missing graph patterns:
- LinearAttentionModel: kernel-based phi(Q)(phi(K)^T V), no softmax
- SimpleFNO: FFT -> learned spectral weights -> iFFT (Fourier Neural Operator)
- PerceiverModel: cross-attention to fixed learned latent bottleneck
- ASPPModel: multi-rate parallel dilated convolutions (DeepLab ASPP)
- ControlNetModel: parallel encoder copy + zero-conv injection
- SimpleEGNN: E(n) equivariant message passing with coordinate updates

Total: 247 toy models, 183 real-world models, 888 tests. RESULTS.md updated with new patterns and counts.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • models: Gap-fill 8 toy models + 21 real-world models for exhaustive coverage (7d5f879)

Toy models (8 new, 221 total):
- LeNet5, BiLSTM, Seq2SeqWithAttention, TripletNetwork
- BarlowTwinsModel, DeepCrossNetwork, AxialAttentionModel, CBAMBlock

Real-world models (21 new, 150 total):
- TorchVision: MobileNetV3, Keypoint R-CNN (train+eval)
- timm: Res2Net, gMLP, ResMLP, EVA-02
- HF decoder-only: OLMo
- HF vision: DINOv2
- HF efficient: Longformer, Reformer
- HF audio: AST, CLAP, EnCodec, SEW, SpeechT5, VITS
- HF time series: Informer, Autoformer
- PyG GNN: GATv2, R-GCN

834 total tests, 221 toy models, 150 real-world tests.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.15.14 (2026-03-07)

Performance Improvements

  • capture: Lazy _fsize_nice properties, remove _trim_and_reorder, batch pause_logging, drop deepcopy (986bd91)

Four low-risk optimizations targeting remaining allocation pressure and per-operation overhead in the instrumentation path:

  1. Lazy _fsize_nice properties — tensor_fsize_nice, grad_fsize_nice, parent_params_fsize_nice, total_params_fsize_nice, params_fsize_nice, and fsize_nice converted from eagerly computed strings to @property methods. Eliminates ~2700 human_readable_size() calls per Swin-T pass.

  2. Remove _trim_and_reorder from postprocess — the OrderedDict rebuild of every LayerPassLog's dict (685 calls, ~0.04s on Swin-T) is purely cosmetic. Python 3.7+ dicts maintain insertion order. Function definition kept for opt-in use.

  3. Batch pause_logging for tensor memory — inline nelement() * element_size() at the two hottest call sites (_build_param_fields, _log_output_tensor_info) inside a single pause_logging() context. Eliminates per-call context manager overhead (~1088 calls).

  4. Remove activation_postfunc deepcopy — copy.deepcopy() on a callable is unnecessary; callables are effectively immutable.
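The eager-to-lazy conversion in item 1 follows the standard `@property` pattern; a minimal sketch with hypothetical class and formatter names (not the actual torchlens internals):

```python
def human_readable_size(n: int) -> str:
    """Format a byte count, e.g. 2048 -> '2.0 KB'."""
    for unit in ("B", "KB", "MB", "GB"):
        if n < 1024:
            return f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} TB"

class LayerEntry:
    def __init__(self, tensor_fsize: int):
        # Store only the cheap integer eagerly...
        self.tensor_fsize = tensor_fsize

    @property
    def tensor_fsize_nice(self) -> str:
        # ...and format it only when somebody actually reads it,
        # instead of paying thousands of formatting calls per pass.
        return human_readable_size(self.tensor_fsize)
```

The access site is unchanged (`entry.tensor_fsize_nice`), which is why the conversion is low-risk.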

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.15.13 (2026-03-07)

Performance Improvements

  • capture: Speed-optimized defaults and remaining bottleneck elimination (#110, 7c9a00c)

Seven targeted optimizations that reduce Swin-T log_forward_pass from 5.91s to 1.55s (3.8x):

1. Unified save_source_context flag (was save_call_stacks) — controls both per-function call stacks AND module source/signature fetching. Default: False.
2. save_rng_states=False default — skips per-op RNG state capture. Auto-enabled by validate_forward_pass. Uses torch_only=True when enabled (skips Python/NumPy RNG).
3. Inline isinstance in wrapped_func — _collect_tensor_args() and _collect_output_tensors() replace BFS crawls for flat arg/output cases. Falls back to BFS only for nested containers.
4. __dict__ scan for buffer prep/cleanup — replaces iter_accessible_attributes (dir() + MRO walk) with direct __dict__ iteration. 10x faster for buffer tagging and tensor cleanup.
5. Hoisted warnings.catch_warnings() — moved from per-attribute (46K entries) to caller level.
6. Lazy module metadata — _get_class_metadata skips inspect.getsourcelines/inspect.signature when save_source_context=False. Only captures class name and docstrings.
7. Module-level import weakref — moved from per-call in _trim_and_reorder to module level.
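Optimization 5 is the classic move-invariant-setup-out-of-the-loop fix; an illustrative before/after with made-up attribute access, not the actual torchlens code:

```python
import warnings

def collect_attrs_per_item(objs):
    # Before: one catch_warnings() entry/exit per attribute access.
    out = []
    for obj in objs:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            out.append(getattr(obj, "value", None))
    return out

def collect_attrs_hoisted(objs):
    # After: enter the context once for the whole scan; the per-item
    # work is identical, only the context setup cost is amortized.
    out = []
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        for obj in objs:
            out.append(getattr(obj, "value", None))
    return out
```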

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.15.12 (2026-03-06)

Performance Improvements

  • capture: O(1) tensor/param extraction via per-function ArgSpec lookup table (b1d6c56)

Replace expensive 3-level BFS crawl (~1.44s, 39% self-time, ~1.9M getattr calls) with O(1) position-based lookups using a static ArgSpec table of 350+ entries. Three-tier strategy: static table for known torch functions, dynamic cache for user-defined modules, BFS fallback (fires at most once per unique class).

Also hoists warnings.catch_warnings() from per-attribute (~77K entries) to per-call level, and adds usage stats collection + coverage test infrastructure.

Benchmark: Swin-T log_forward_pass 5.91s → 4.41s (-25%).

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.15.11 (2026-03-06)

Bug Fixes

  • gc: Convert back-references to weakrefs and add optional call stack collection (2bed167)

Convert 5 circular back-references from strong to weakref.ref() so ModelLog and its children (LayerPassLog, LayerLog, ModuleLog, BufferAccessor, LayerAccessor) no longer prevent timely garbage collection. GPU tensors are now freed immediately when the last strong reference to ModelLog is dropped, instead of waiting for Python's gen-2 GC cycle.

Also add save_call_stacks parameter to log_forward_pass() (default True). When False, skips _get_func_call_stack() on every tensor operation, which is the main per-op overhead in production use. Call stacks remain on by default for pedagogical use.

Fixes: GC-2, GC-3, GC-4, PERF-19
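The back-reference pattern in miniature (class bodies are illustrative, not the real ModelLog/LayerPassLog fields):

```python
import weakref

class ModelLog:
    def __init__(self):
        self.layers = []

class LayerPassLog:
    def __init__(self, model_log):
        # Store a weakref.ref, not the ModelLog itself, so the child
        # never keeps the whole log (and its tensors) alive on its own.
        self._model_log_ref = weakref.ref(model_log)

    @property
    def model_log(self):
        # Calling the ref yields the object, or None once it's collected.
        return self._model_log_ref()
```

With strong back-references, a ModelLog and its children form a reference cycle that only the generational GC can reclaim; with weakrefs, dropping the last strong reference frees everything immediately under CPython's refcounting.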

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Documentation

  • Add folder-wise CLAUDE.md files and comprehensive inline documentation (deaec2d)

  • Add CLAUDE.md to every package directory (torchlens/, capture/, data_classes/, decoration/, postprocess/, utils/, validation/, visualization/, tests/, scripts/, .github/) with file maps, key concepts, gotchas, and cross-references
  • Add module-level docstrings, function/class docstrings, and inline comments across all 39 source files explaining non-obvious logic, ordering dependencies, design decisions, and invariants
  • Fix coverage HTML output directory in pyproject.toml to point to tests/test_outputs/reports/coverage_html (matching conftest.py)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.15.10 (2026-03-05)

Bug Fixes

  • tests: Move coverage HTML output to reports directory (916b2d1)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.15.9 (2026-03-05)

Bug Fixes

  • tests: Use render_graph return value instead of reading cleaned-up .gv file (7998776)

Commit 147c7b7 added cleanup=True to dot.render(), which deletes the intermediate .gv source file after rendering. The TestVisualizationParams tests were reading that source file and all 15 failed with FileNotFoundError.

render_graph now returns dot.source so tests can inspect the graphviz source without depending on the intermediate file.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.15.8 (2026-03-05)

Bug Fixes

  • tests: Generate HTML coverage report in sessionfinish hook (29095f9)

The pyproject.toml configured coverage_html output directory but the pytest_sessionfinish hook only generated the text report. Add cov.html_report() call so HTML reports are written alongside the text summary.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.15.7 (2026-03-04)

Bug Fixes

  • vis: Clean up intermediate .gv source files after rendering (147c7b7)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Chores

  • tests: Add @pytest.mark.smoke to 18 critical-path tests (d26f4ec)

18 fast tests across 9 files marked as smoke tests for quick validation during development. Run with pytest tests/ -m smoke (~6s).

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.15.6 (2026-03-04)

Bug Fixes

  • tests: Add explicit vis_outpath to render_graph calls (7d7926b)

Two tests in TestVisualizationBugfixes called render_graph() without vis_outpath, causing stray modelgraph/modelgraph.pdf files in whatever directory tests were run from.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Chores

  • tests: Rename test_outputs/graphs to test_outputs/visualizations (fbf1fe6)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.15.5 (2026-03-04)

Bug Fixes

  • loop-detection: Group param-sharing ops by func_applied_name + param_barcodes (f1e795b)

Previously, Pass 2 of _merge_iso_groups_to_layers grouped operations by parent_param_barcodes alone, causing different ops on the same params (e.g. __getitem__ vs add) to be incorrectly merged into one layer. It now groups by (func_applied_name, sorted(parent_param_barcodes)). Also updates docstrings to reflect the correct grouping rule.
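The corrected rule amounts to keying each group on both the operation and its parameters; a toy reduction with hypothetical op records, not the real merge pass:

```python
from collections import defaultdict

def group_param_sharing_ops(ops):
    """Group ops into layers by (function name, sorted param barcodes)."""
    groups = defaultdict(list)
    for op in ops:
        key = (op["func_applied_name"],
               tuple(sorted(op["parent_param_barcodes"])))
        groups[key].append(op["label"])
    return dict(groups)

ops = [
    {"label": "getitem_1", "func_applied_name": "getitem", "parent_param_barcodes": {"p1"}},
    {"label": "add_1",     "func_applied_name": "add",     "parent_param_barcodes": {"p1"}},
    {"label": "add_2",     "func_applied_name": "add",     "parent_param_barcodes": {"p1"}},
]
# Same params but different funcs stay separate; same func + params merge.
grouped = group_param_sharing_ops(ops)
```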

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Chores

  • tests: Tidy test_outputs structure and remove private bug numbers (03817a2)

  • Restructure test_outputs/ into reports/ and graphs/ subdirectories
  • Update all path references in conftest, test_output_aesthetics, test_profiling, and ui_sandbox notebook
  • Strip private bug number references (Bug #N, #N:) from comments, docstrings, and class names across 7 test files
  • Rename TestBug* classes to descriptive names

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.15.4 (2026-03-04)

Bug Fixes

  • validation: Resolve 14 test failures across validation, labeling, and warnings (ffc51a5)

  • Fix param sharing invariant: include func_applied_name in grouping key so different operations consuming the same parameter aren't incorrectly required to share labels (12 model failures: vit, beit, cait, coat, convit, poolformer, xcit, t5, clip, blip, etc.)
  • Fix unused input validation: guard against func_applied=None for unused model inputs like DistilBert's token_type_ids (1 failure)
  • Fix pass labeling consistency: inherit layer_type from first pass for pass>1 entries, preventing label mismatches within same_layer_operations groups (SSD300 train failure)
  • Fix numpy 2.0 deprecation: use cpu_data.detach().numpy() instead of np.array(cpu_data)
  • Suppress 6 external third-party warnings in pyproject.toml filterwarnings
  • Add 4 regression tests with 2 helper model classes

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.15.3 (2026-03-04)

Bug Fixes

  • types: Move type: ignore to correct lines after ruff reformat (8205ca4)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • types: Resolve all 181 mypy errors across 18 source files (546fd94)

  • Auto-fix implicit Optional with no_implicit_optional tool (24 errors)
  • Add type annotations for untyped variables (20 errors)
  • Add type: ignore[attr-defined] for dynamic tl_* attributes (24 errors)
  • Add type: ignore[assignment] for fields_dict and cross-type assignments (79 errors)
  • Add type: ignore[union-attr] for ModuleLog|ModulePassLog unions (19 errors)
  • Add type: ignore[arg-type] for acceptable type mismatches (19 errors)
  • Fix callable type annotation in exemptions.py (Callable[..., bool])
  • Fix found_ids parameter type in introspection.py (List -> Set)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • types: Suppress torch.device return-value mismatch in CI mypy (11922b3)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Chores

  • ci: Remove continue-on-error from mypy check — now a blocking gate (0e76db0)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.15.2 (2026-03-04)

Bug Fixes

  • core: Resolve remaining open bugs — FLOPs accuracy, PERF-21, mypy types (#19, #154-161) (da02dc5)

Bug #19: Verified fast-pass buffer orphan resolved by Bug #116 guard; added regression tests

FLOPs: Fix addbmm/baddbmm batch dimension, add einsum handler, fix pool kernel_size extraction

PERF-21: Cache get_ignored_functions()/get_testing_overrides() in _get_torch_overridable_functions

Bug #154: Type _SENTINEL as Any in func_call_location.py
Bug #155: Eliminate reused variable across str/bytes types in hashing.py
Bug #156: Accept float in human_readable_size signature
Bug #157: Use setattr/getattr for dynamic tl_tensor_label_raw in safe_copy
Bug #158: Add List[Any] annotation to AutocastRestore._contexts
Bug #160: Add _FlopsHandler type alias and typed SPECIALTY_HANDLERS dict
Bug #161: Add ModuleLog TYPE_CHECKING import and type annotations in invariants.py

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Chores

  • ci: Switch CI from pip to uv for faster installs (d619ef2)

Replace pip with uv pip in quality and lint workflows. uv resolves and installs packages 10-20x faster. Also adds explicit torch CPU index-url to dep-audit job (was missing, pulling full CUDA bundle).

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.15.1 (2026-03-04)

Bug Fixes

  • core: Resolve 52 bugs across 9 waves of correctness, safety, and polish fixes (d4ca75d)

Wave 1 — Critical Correctness:
- Rewrite safe_copy() to eliminate numpy round-trip, fix bfloat16 overflow, handle sparse/meta/quantized dtypes (#103, #137, #128, #139)
- Guard print_override numpy conversion (#140), meta/sparse tensor memory (#24)
- Fix validation crash on None parent_values (#150), silent exception pass (#151)
- Add buffer duplicate guard in fast path (#116)
- Fix postprocess_fast shared reference and unguarded parent_layers (#8, #152)
- Add empty graph guard (#153), zombie label cleanup (#75)
- Handle defaultdict in _safe_copy_arg (#127)

Wave 2 — Exception Safety:
- Make module_forward_decorator exception-safe (#122)
- Guard _handle_module_entry for untracked tensors (#117)
- Clean up output tensors on fast-pass exception (#110)
- Wrap activation_postfunc in pause_logging (#96)
- Guard validation with save_function_args=False (#131)
- Fix CUDA perturbation device consistency (#36)

Wave 3 — Fast-Pass State Reset:
- Reset timing and stale lookup caches in save_new_activations (#87, #97, #98)
- Pass raw label (not final) to gradient hook in fast path (#86)
- Raise descriptive error for unexpected missing parents (#111)

Wave 4 — Argument Handling:
- Deep-copy nested tuples in creation_args (#44)
- Slice before cloning in display helper (#73)
- Use logged shape in display (#45)

Wave 5 — Interface Polish:
- Key layer_num_passes by no-pass label (#53)
- Use rsplit for colon-split safety (#54)
- Add slice indexing support (#78)
- Guard to_pandas before pass finished (#124)
- List candidates for ambiguous substring (#125)
- Add ModuleLog string indexing (#120)
- Support int/short-name in ParamAccessor.__contains__ (#84)

Wave 6 — Decoration/Model Prep:
- Only add mapper entries if setattr succeeded (#31)
- Add hasattr guard for identity asymmetry (#39)
- Guard buffer setattr (#40)
- Store dunder argnames stripped (#82)
- Use inspect.Parameter.kind for varargs (#123)

Wave 7 — Visualization:
- Fix vis_nesting_depth=0 crash (#94)
- Use rsplit for colon-split (#104)
- Fix collapsed node variable (#100)
- Guard None tensor_shape (#118)
- Guard LayerLog nesting depth (#138)
- Use exact equality instead of substring match (#129)

Wave 8 — Control Flow/Loop Detection:
- Remove dead sibling_layers iteration (#2)
- Prevent shared neighbor double-add (#148)

Wave 9 — Cleanup + GC:
- Lazy import IPython (#72)
- Filter callables from _trim_and_reorder (#134)
- Complete cleanup() with internal containers (GC-5, GC-12)
- Use weakref in backward hook closure (GC-8)
- Clear ModulePassLog forward_args/kwargs after build (GC-11)

585 tests passing across 10 test files, 47 new regression tests.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • core: Resolve final remaining bugs #23, #83, #95, #99 (6302662)

  • #95: fix mixed LayerLog/LayerPassLog format in vis module containment check
  • #83: LayerLog.parent_layer_arg_locs returns strings (not sets) for consistency
  • #99: warn on tensor shape mismatch in fast-path source tensor logging
  • #23: add case-insensitive exact match and substring lookup for layers/modules
  • #85: confirmed not-a-bug (any special arg correctly explains output invariance)
  • #39, #41, #42: confirmed already fixed or correct as-is
  • Add 6 regression tests for the above fixes

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • core: Resolve remaining low-risk bugs #28, #107, #108, #147 (d8616a0)

  • #147: descriptive ValueError in log_source_tensor_fast on graph change
  • #108: document fast-path module log preservation (no rebuild needed)
  • #107: handle both tuple and string formats in module hierarchy info
  • #28: remove torch.Tensor from dead type-check list in introspection
  • Add 4 regression tests for the above fixes

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Testing

  • reorganize: Redistribute bugfix tests into domain-specific test files (e9c64cc)

Move all 57 regression tests from test_bugfixes.py into their natural homes:
- test_internals.py: safe_copy, tensor utils, exception safety, cleanup/GC
- test_validation.py: validation correctness, posthoc perturb check
- test_save_new_activations.py: fast-path state reset, graph consistency
- test_layer_log.py: interface polish, case-insensitive lookup, arg locs
- test_module_log.py: string indexing, tuple/string normalization
- test_param_log.py: ParamAccessor contains, param ref cleanup
- test_output_aesthetics.py: visualization smoke tests

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.15.0 (2026-03-04)

Bug Fixes

  • validation: Resolve 8 test failures and Bug #79 in metadata invariants (14b2e5b)

Three root-cause fixes for failures exposed by the full test suite:

  1. Module hierarchy aliasing (finalization.py): shared nn.Module instances (same Conv2d registered as both self.proj1 and self.conv_list[0]) had metadata lookup fail because _prepare_model_once overwrites tl_module_address to the last-visited alias while _capture_module_metadata stores under the first-visited alias. Built alias→meta map and address_children prefix rewriting so all aliases resolve correctly.

  2. Loop detection Bug #79 (loop_detection.py): _merge_iso_groups_to_layers only compared pairs within iso-groups, missing Rule 1 (param sharing → same layer) across iso-groups. Rewrote with union-find + path compression and added cross-iso-group param barcode merge pass. Also unify operation_equivalence_type for merged nodes whose module path suffixes differ.

  3. Adjacency invariant (invariants.py): removed incorrect operation-level neighbor check — subgraph-level adjacency is verified during BFS in loop_detection.py, not reconstructable post-hoc. Non-param multi-pass groups can be the only multi-pass group (param-free loops). Upgraded param sharing violation back from warning to error now that Bug #79 is fixed.
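Union-find with path compression, as used for the cross-iso-group merge in fix 2, in its minimal form (the merge criterion here, shared param barcodes, is reduced to a toy example):

```python
def find(parent, x):
    """Return x's root; compress the path so future finds are near O(1)."""
    root = x
    while parent[root] != root:
        root = parent[root]
    while parent[x] != root:            # path compression: repoint every
        parent[x], x = root, parent[x]  # node on the walk directly at root
    return root

def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra

# Merge ops that share a param barcode, even across iso-groups.
ops = {"conv_1": {"w"}, "conv_2": {"w"}, "relu_1": set()}
parent = {label: label for label in ops}
first_op_with_barcode = {}
for label, barcodes in ops.items():
    for bc in barcodes:
        if bc in first_op_with_barcode:
            union(parent, first_op_with_barcode[bc], label)
        else:
            first_op_with_barcode[bc] = label
```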

Also: boolean flag fixes in control_flow.py and graph_traversal.py for buffer/output layers; @pytest.mark.slow on vit and beit tests; profiling test streamlined.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Chores

  • ci: Add mypy type checking and pip-audit dependency auditing (2a0e0e7)

Add quality.yml workflow with two jobs:
- mypy (advisory, continue-on-error) — 199 existing errors to fix later
- pip-audit (blocking) — fails on known CVEs in dependencies

Mypy configured leniently in pyproject.toml; both tools added to dev deps.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • ci: Add pip caching to quality workflow (d7573b1)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • coverage: Add pytest-cov with HTML and text report generation (c1ac2c9)

Configure branch coverage for torchlens/ source. HTML report writes to tests/test_outputs/coverage_html/, text summary to coverage_report.txt. Reports auto-generate when running pytest --cov via a sessionfinish hook.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • dev: Add UI sandbox notebook and nbstripout for clean diffs (5247361)

Add tests/ui_sandbox.ipynb — interactive workbench for tinkering with torchlens during development: logging, accessors, reprs, visualization, validation. Set up nbstripout via .gitattributes to auto-strip cell outputs on commit.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • profiling: Expand profiling report to 12 models across architecture families (60e983a)

Added 9 models to the profiling test for illustrative coverage:
- SimpleBranching (diverging/merging flow)
- ResidualBlock (conv-BN-relu skip connection)
- MultiheadAttention (nn.MultiheadAttention)
- LSTM (recurrent + classifier)
- ResNet18, MobileNetV2, EfficientNet_B0 (real CNNs)
- Swin_T (shifted-window vision transformer, 671 layers)
- VGG16 (deep sequential CNN, 138M params)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Features

  • modules: Add shared module alias lookup and explicit alias fields (c2fb14e)

Shared nn.Module instances (same object registered under multiple addresses) can now be looked up by any alias:

ml.modules["shared_conv"]   # primary address
ml.modules["alias_list.0"]  # alias — returns same ModuleLog

New fields:
- ModuleLog.is_shared: bool, True when len(all_addresses) > 1
- ModulePassLog.all_module_addresses: list of all addresses for the module
- ModulePassLog.is_shared_module: bool, same as ModuleLog.is_shared
- ModuleAccessor builds alias→ModuleLog map for __getitem__/__contains__
- ModuleLog.__repr__ shows aliases when is_shared=True

Invariant checks (module_hierarchy) now account for shared modules where address_parent refers to the primary alias's parent path.
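Under the hood, alias lookup only needs an alias→object map; an illustrative reduction (not the actual ModuleAccessor, and the record shapes are made up):

```python
class ModuleAccessor:
    def __init__(self, module_logs):
        # module_logs: {primary_address: module_log}, where each log
        # carries all_addresses (primary address first, then aliases).
        self._by_alias = {}
        for log in module_logs.values():
            for address in log["all_addresses"]:
                self._by_alias[address] = log

    def __getitem__(self, address):
        return self._by_alias[address]

    def __contains__(self, address):
        return address in self._by_alias

logs = {"shared_conv": {"all_addresses": ["shared_conv", "alias_list.0"]}}
acc = ModuleAccessor(logs)
```

Because every alias maps to the same object, `acc["shared_conv"] is acc["alias_list.0"]` holds by construction.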

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • validation: Add complex semantic invariants M-R to check_metadata_invariants (b745db8)

Add 6 new invariant categories verifying loop detection, graph ordering, distance/reachability, connectivity, module containment, and lookup key consistency. Fix existing checks H and L for recurrent model compatibility. 17 new corruption tests (50 total in test_validation.py).

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • validation: Add validate_forward_pass with metadata invariant checks (8a25ede)

Rename validate_saved_activations → validate_forward_pass (old name kept as deprecated alias). Add comprehensive metadata invariant checker (check_metadata_invariants) covering 12 categories: ModelLog self-consistency, special layer lists, graph topology, LayerPassLog fields, recurrence, branching, LayerLog cross-refs, module-layer containment, module hierarchy, param cross-refs, buffer cross-refs, and equivalence symmetry.

validate_metadata=True by default, so every existing validation call now exercises the full invariant suite automatically.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Testing

  • profiling: Add automated profiling report for overhead measurement (a483011)

Profiles raw forward pass, log_forward_pass, save_new_activations, and validate_saved_activations across toy and real-world models. Generates tests/test_outputs/profiling_report.txt with absolute times and overhead ratios.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.14.0 (2026-03-04)

Features

  • core: Add buffer name/module_address properties to LayerLog (#92, e3a1ce2)

Ensures buffer_address, name, and module_address are accessible on both buffer LayerPassLogs (via BufferLog subclass) and buffer LayerLogs (via computed properties derived from buffer_address).

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • core: Add LayerAccessor and log.layers property (#92, b639e50)

Adds LayerAccessor (dict-like accessor for LayerLog objects) with support for label/index/pass-notation lookup and to_pandas(). Accessible via log.layers, mirroring log.modules for modules.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • core: Add LayerLog aggregate class with per-layer metadata (#92, 9c6472d)

Introduces LayerLog as the aggregate per-layer object grouping one or more LayerPassLog entries (one per invocation). For non-recurrent models every LayerLog has exactly one pass; for recurrent models, multiple passes are grouped under a single LayerLog with union-based aggregate graph properties.

Key changes:
- New LayerLog class with ~52 aggregate fields, single-pass delegation via @property and __getattr__, aggregate graph properties (child_layers, parent_layers unions)
- _build_layer_logs() postprocessing step wired into both exhaustive and fast pipelines
- ModelLog.layer_logs OrderedDict populated after postprocessing
- __getitem__ routing: log["label"] returns LayerLog for multi-pass layers
- parent_layer_log back-reference on LayerPassLog
- 27 new tests covering construction, delegation, multi-pass behavior, display, and integration

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Refactoring

  • core: ModuleLog.all_layers now stores unique no-pass labels (#92, 214e161)

Establishes aggregate↔aggregate symmetry: - ModuleLog.all_layers → no-pass labels → resolve to LayerLog - ModulePassLog.layers → pass-qualified labels → resolve to LayerPassLog

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • core: Remove buffer name/module_address from LayerLog (#92, f896dd4)

These properties are too generic on LayerLog (name returns "" for non-buffers). Keep them only on BufferLog(LayerPassLog); single-pass buffer LayerLogs still access them via __getattr__ delegation.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • core: Remove RolledTensorLog, use LayerLog for rolled visualization (#92, 118b3ec)

RolledTensorLog was a visualization-only aggregate that duplicated ~30 fields from LayerPassLog with its own merge logic. Now that LayerLog provides a proper aggregate abstraction, RolledTensorLog is unnecessary.

Key changes:
- Delete RolledTensorLog class (~175 lines) and _roll_graph() function
- Remove layer_list_rolled and layer_dict_rolled from ModelLog
- Visualization "rolled" mode now uses self.layer_logs (LayerLog objects)
- Add vis-specific computed properties to LayerLog: child_layers_per_pass, parent_layers_per_pass, child_passes_per_layer, parent_passes_per_layer, edges_vary_across_passes, bottom_level_submodule_passes_exited, parent_layer_arg_locs
- LayerLog.child_layers/parent_layers always return no-pass labels
- Merge multi-pass aggregate fields (has_input_ancestor, input_output_address, is_bottom_level_submodule_output) in _build_layer_logs
- Update tests and aesthetic report to use LayerLog

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • core: Rename TensorLog → LayerPassLog + nomenclature sweep (d328476)

Phase 1 of LayerLog hierarchy redesign. Mechanical rename of the class, file (tensor_log.py → layer_pass_log.py), constant (TENSOR_LOG_FIELD_ORDER → LAYER_PASS_LOG_FIELD_ORDER), and all imports. Also sweeps internal nomenclature: field names (_raw_tensor_dict → _raw_layer_dict, etc.), function names (_make_tensor_log_entry → _make_layer_log_entry, etc.), and local variables (tensor_entry → layer_entry, etc.) that referred to log entries rather than actual tensors. Backward-compat aliases preserved.
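The preserved aliases presumably follow the usual module-level-assignment pattern, where the old name is bound to the same class object; a generic sketch:

```python
class LayerPassLog:
    """New canonical name for a single layer invocation's log."""

# Backward compat: old code doing `from ... import TensorLog` keeps
# working, and isinstance checks against the old name still pass,
# because the alias IS the renamed class, not a copy or subclass.
TensorLog = LayerPassLog
```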

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • core: Replace 17 flat module_* dicts on ModelLog with transient _module_build_data (ef3f20e)

These intermediate fields (module_addresses, module_types, module_passes, etc.) were only used during postprocessing to build structured ModuleLog objects, but persisted on the public API. Now consolidated into a single _module_build_data dict that is populated during logging, consumed by _build_module_logs (step 17), and then re-initialized. Reduces ModelLog from ~143 to ~127 public fields.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.13.1 (2026-03-03)

Bug Fixes

  • core: Handle DataParallel, device mismatch, and embedding perturbation (#91, 1a0f8b8)

  • Unwrap nn.DataParallel in log_forward_pass, show_model_graph, validate_saved_activations
  • Auto-move inputs to model device (supports dict, UserDict, BatchEncoding)
  • Exempt embedding index arg from perturbation (prevents CUDA OOB)
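Auto-moving nested inputs is essentially a recursive `.to(device)` over containers; a minimal sketch, duck-typed on `.to()` so it needs no torch import (the real torchlens code is torch-specific):

```python
from collections.abc import Mapping

def move_to_device(obj, device):
    """Recursively move anything exposing .to() (e.g. tensors) onto
    `device`, descending through mappings, lists, and tuples.
    Mapping covers dict, UserDict, and dict-like wrappers such as
    HuggingFace's BatchEncoding."""
    if hasattr(obj, "to"):
        return obj.to(device)
    if isinstance(obj, Mapping):
        return type(obj)({k: move_to_device(v, device)
                          for k, v in obj.items()})
    if isinstance(obj, (list, tuple)):
        return type(obj)(move_to_device(v, device) for v in obj)
    return obj  # ints, floats, strings, etc. pass through untouched
```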

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Refactoring

  • batch1: Aesthetic cleanup — names, types, docstrings for small files (52efdc1)

Files: _state.py, buffer_log.py, cleanup.py, postprocess/__init__.py, param_log.py, func_call_location.py

  • Add docstrings to dunder methods across BufferAccessor, ParamAccessor, ParamLog, FuncCallLocation
  • Rename _UNSET -> _SENTINEL in func_call_location.py
  • Fix single-letter vars in cleanup.py (x->label, tup->edge, k/v->descriptive)
  • Add return type hints (-> None) to all cleanup functions
  • Add docstring and return type to postprocess_fast()
  • Add property docstrings and setter type hints to ParamLog

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • batch2: Aesthetic cleanup — names, types, docstrings for medium files (8610c46)

Files: trace_model.py, control_flow.py, model_log.py, module_log.py, graph_traversal.py, finalization.py, interface.py, user_funcs.py

  • Add docstrings to _get_input_arg_names, _find_output_ancestors, _update_node_distance_vals, _log_internally_terminated_tensor, _log_time_elapsed, _single_pass_or_error
  • Fix truncated docstring on _flood_graph_from_input_or_output_nodes
  • Add return type hints to all private functions
  • Rename loop vars: a->arg_idx, t->tensor, pn->pass_num, mpl->module_pass_log, cc->child_pass_label, sp->child_pass_label
  • Rename pretty_print_list_w_line_breaks -> _format_list_with_line_breaks
  • Rename _module_hierarchy_str_helper -> module_hierarchy_str_recursive
  • Mark run_model_and_save_specified_activations as private
  • Rename x->input_data in validate_batch_of_models_and_inputs
  • Add docstrings to all dunder methods on ModuleLog, ModulePassLog, ModuleAccessor

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • batch3: Aesthetic cleanup — names, types, docstrings for medium-large files (74c0850)

Files: decorate_torch.py, labeling.py, constants.py, validation/core.py, loop_detection.py

  • Rename open_ind/close_ind -> paren_start/paren_end in decorate_torch.py
  • Rename namespace_name_notorch -> namespace_key
  • Rename my_get_overridable_functions -> _get_torch_overridable_functions
  • Expand shadow set abbrevs: _ml_seen -> _module_labels_seen, etc.
  • Rename p_label -> perturbed_label in validation/core.py
  • Parameterize bare Dict type hints in loop_detection.py
  • Expand single-letter vars: m->module_index, n->pass_index, s->subgraph_key
  • Add docstrings to _merge_iso_groups_to_layers, _validate_layer_against_arg, _copy_validation_args, _get_torch_overridable_functions
  • Add return type hints to all private functions

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • batch4: Aesthetic cleanup — names, types, docstrings for large files (ce19825)

tensor_log.py: add docstrings to _str_during_pass, _str_after_pass, _tensor_family_str_helper
model_funcs.py: rename log_whether_exited_submodule_is_bottom_level → _is_bottom_level_submodule_exit, fwd → forward_func, mid → module_id
helper_funcs.py: rename make_var_iterable → ensure_iterable, tuple_tolerant_assign → assign_to_sequence_or_dict, remove_attributes_starting_with_str → remove_attributes_with_prefix
flops.py: add type hints to all 25 private functions, add Args sections
vis.py: rename check_whether_to_mark* → should_mark*, construct* → build*; fix _setup_subgraphs consistency
logging_funcs.py: rename _log_info_specific_to_single_function_output_tensor → _log_output_tensor_info, _get_parent_tensor_function_call_location → _locate_parent_tensors_in_args

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • design: Algorithmic review pass — efficiency, correctness, robustness (78924f8)

Efficiency (21 fixes):
- Hot path: eliminate all_args allocation via kwargs short-circuit
- Barcode gen: secrets.choice → random.choices with pre-computed alphabet
- Type scanning: any([...]) → any(...) generator, found_ids list → set
- Model prep: recursive get_all_submodules → list(model.modules())
- Source/output tensors: eliminate double get_tensor_memory_amount() calls
- Output tensors: copy only mutables instead of full 131-key dict copy
- Gradient hooks: O(N²) per-hook rebuild → _saved_gradients_set O(1) dedup
- Cleanup: dir() → list(__dict__), skip O(n²) reference removal in full teardown
- Labeling: dir(self) → self.__dict__, inds_to_remove list → set
- Finalization: pre-computed reverse mappings, _roll_graph idempotency guard
- Graph traversal/validation: min([a,b]) → min(a,b), list.pop(0) → deque.popleft()
- Visualization: any([...]) → any(...), min([...]) → min(...)
- TensorLog: frozenset for FIELD_ORDER, view-then-clone-slice for display
- Display: np.round → round(), removed numpy dependency
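The list.pop(0) → deque.popleft() change matters because popping the front of a Python list shifts every remaining element (O(n) per pop, O(n²) per traversal); a generic BFS sketch, not the torchlens traversal itself:

```python
from collections import deque

def bfs_order(adjacency, start):
    """Breadth-first node order; deque.popleft() is O(1),
    whereas list.pop(0) would be O(n) per dequeue."""
    queue = deque([start])
    seen = {start}
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order
```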

Correctness (4 FLOPs fixes):

  • SDPA: new _sdpa_flops handler (Q@K^T + softmax + attn@V)
  • addbmm/baddbmm: _matmul_flops → _addmm_flops (correct shape extraction)
  • MHA: return None to avoid double-counting with sub-op FLOPs
  • scatter/index ops: moved from ZERO_FLOPS to ELEMENTWISE_FLOPS

Robustness (4 fixes):

  • Buffer dedup: for...else pattern fixes incorrect inner-loop append
  • Parameter fingerprinting: reorder isinstance checks (Parameter before Tensor)
  • Duplicate siblings: added not-in guards before sibling append
  • _roll_graph: early-return guard if already populated
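The for...else buffer-dedup fix follows a standard Python idiom; a minimal sketch with hypothetical names and data:

```python
def dedup_append(buffers, candidate, same):
    """Append candidate only if no existing buffer matches.

    The buggy pattern appended inside the inner loop (once per
    non-matching element); for...else appends exactly once, and only
    when the loop completes without hitting `break`.
    """
    for existing in buffers:
        if same(existing, candidate):
            break  # duplicate found: do not append
    else:
        buffers.append(candidate)
    return buffers

eq = lambda x, y: x == y
bufs = dedup_append([], "a", eq)      # ["a"]
bufs = dedup_append(bufs, "a", eq)    # duplicate: still ["a"]
bufs = dedup_append(bufs, "b", eq)    # ["a", "b"]
```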

Test coverage: new tests/test_internals.py with 13 tests for FIELD_ORDER sync and constants

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • design: Mid-level design pass — dataclasses, decomposition, parameter bundles (0849edb)

Replace implicit parameter clusters with explicit dataclasses (FuncExecutionContext, VisualizationOverrides, IsomorphicExpansionState, ModuleParamInfo), decompose 100+ line functions into coordinator + helpers, and extract shared patterns (model traversal visitor, buffer tagging).

No public API changes. All 471 non-slow tests pass.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • qa: Address naive reader findings — docstrings, comments, variable names (22724af)

Fix 14 issues flagged by whole-codebase naive reader review:

  • make_random_barcode: fix misleading "integer hash" docstring
  • _getitem_after_pass: replace TODO-like docstring with proper docs
  • _is_bottom_level_submodule_exit: define "bottom-level" in docstring
  • search_stack_for_vars_of_type: rename tensor-specific accumulators to generic names
  • validate_parents_of_saved_layer: fill in empty param docs, note mutation
  • _append_arg_hash: explain operation_equivalence_type concept
  • nested_assign: add docstring and type hints
  • _capture_module_metadata: fix misleading wording
  • extend_search_stack_from_item: document missing address/address_full params
  • safe_copy: add bfloat16 round-trip comment
  • _roll_graph: define "rolled" in docstring
  • vis.py: add docstrings to _get_node_bg_color, _make_node_label, _get_max_nesting_depth
  • _give_user_feedback_about_lookup_key: document mode parameter
  • _get_torch_overridable_functions: comment the ignore flag
  • RolledTensorLog.update_data: replace vacuous docstring

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • structure: Reorganize into focused subpackages (2cfa90a)

Split grab-bag root modules into focused subpackages without changing any public API, class names, or core algorithmic logic.

  • Split helper_funcs.py into 7 utils/ modules (rng, tensor_utils, arg_handling, introspection, collections, hashing, display)
  • Moved cleanup.py + interface.py into data_classes/
  • Created decoration/ from decorate_torch.py + model_funcs.py
  • Split logging_funcs.py into capture/ (source_tensors, output_tensors, tensor_tracking); moved trace_model.py + flops.py into capture/
  • Created visualization/ from vis.py
  • Replaced hardcoded _TORCHLENS_SUFFIXES with directory-based stack filtering
  • Added module-level docstrings to all 27 files that lacked them
  • Decomposed 3 oversized functions into named helpers
  • 9 root files deleted, 22 new files created across subpackages

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.13.0 (2026-03-03)

Bug Fixes

  • logging: Prevent label-steal on no-op same-object returns (c8ec165)

Functions like to(same_dtype) and contiguous() on already-contiguous tensors return the same Python object without modifying it. TorchLens was treating these as in-place ops and propagating the new label back to the original tensor, breaking module thread cleanup in T5 and other encoder-decoder models with pre-norm residual patterns.

Fix 1 (decorate_torch.py): Split was_inplace into same_object_returned (always safe_copy) vs true in-place (func ends with _ or starts with __i) — only true in-place ops propagate the label back.

Fix 2 (model_funcs.py): Module post-hook thread cleanup now uses labels captured at pre-hook time rather than reading from live tensors that may have been overwritten by true in-place ops mid-forward.

Also removes the cycle-protection band-aid in vis.py (visited sets in _get_max_nesting_depth and _set_up_subgraphs) since the root cause of module_pass_children cycles is now fixed.
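The same-object vs true in-place split described in Fix 1 can be sketched as a small classifier; names and the exact heuristic are illustrative, not the actual torchlens internals:

```python
def should_propagate_label(func_name, out, first_arg):
    """Decide whether an op's result label should overwrite the input's.

    PyTorch convention: true in-place ops end in "_" (add_, zero_) or
    are augmented-assignment dunders (__iadd__, __imul__, ...). A no-op
    like contiguous() on an already-contiguous tensor returns the SAME
    object but must not steal the original tensor's label.
    """
    same_object_returned = out is first_arg
    true_inplace = func_name.endswith("_") or func_name.startswith("__i")
    return same_object_returned and true_inplace

x = object()                                   # stand-in tensor
assert should_propagate_label("add_", x, x)        # true in-place
assert not should_propagate_label("contiguous", x, x)  # no-op return
assert not should_propagate_label("to", object(), x)   # new object
```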

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • multi: Resolve 16 test failures across suite (6fcf72f)

  • Add soxr to test deps (transformers 5.2 unconditionally imports it)
  • Fix test_no_grad_block and test_model_calling_model shape mismatch (use inline torch.rand(2, 32) instead of small_input fixture)
  • Fix _get_input_arg_names for *args signatures (generate synthetic names)
  • Add empty tensor perturbation exemption in validation core
  • Add topk/sort integer indices posthoc exemption
  • Add type_as structural arg position exemption
  • Fix _append_arg_hash infinite recursion (tensor formatting triggers wrapped methods; use isinstance check + shape-only representation)
  • Fix TorchFunctionMode/DeviceContext bypass in decorated wrappers (Python wrappers skip C-level dispatch, so torch.device('meta') context wasn't injecting device kwargs into factory functions, causing corrupt buffers during transformers from_pretrained)
  • Rewrite test_autocast_mid_forward to skip validation (autocast context not captured during logging)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • validation: Exempt setitem when perturbing all-zeros/all-ones destination (a3a24ad)

BART creates position embeddings by new_zeros() then __setitem__ to fill. Perturbing the all-zeros destination has no effect since __setitem__ overwrites it. Add Case 3 to _check_setitem_exempt: when the perturbed tensor is the destination (args[0]) and it's a special value (all-zeros/all-ones), exempt.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • vis: Add cycle protection to _get_max_nesting_depth and subgraph setup (8779af9)

call_children can contain cycles in models like T5 where residual connections cross module boundaries (e.g. layer_norm -> next block). _get_max_nesting_depth looped infinitely, causing a 600s timeout.

Also:

  • Add visited set to _set_up_subgraphs while-loop for the same reason
  • Add min with multiple args to posthoc exemption (same as max)
  • Remove @pytest.mark.slow from test_t5_small (now 15s, was infinite)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Features

  • validation: Capture and restore autocast context during logging and replay (19bbcb6)

Autocast state (enabled/dtype per device) is now captured alongside RNG state when each function executes during logging. During validation replay, the saved autocast context is restored so operations run at the same precision as the original forward pass.

  • Add log_current_autocast_state() and AutocastRestore context manager
  • Thread func_autocast_state through exhaustive and fast logging paths
  • Wrap validation replay in AutocastRestore
  • Restore validate_saved_activations in test_autocast_mid_forward

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Refactoring

  • validation: Restructure into subpackage with data-driven exemption registries (ff85efd)

Replace the monolithic validation.py (814 lines) with a clean torchlens/validation/ subpackage. The 141-line elif chain and three separate exemption mechanisms are now consolidated into four declarative registries in exemptions.py (SKIP_VALIDATION_ENTIRELY, SKIP_PERTURBATION_ENTIRELY, STRUCTURAL_ARG_POSITIONS, CUSTOM_EXEMPTION_CHECKS), with posthoc checks kept as belt-and-suspenders.

Secondary fixes:

  • mean==0/mean==1 heuristic replaced with exact torch.all() checks
  • Bare except Exception now logs error details when verbose=True
  • _copy_validation_args uses recursive _deep_clone_tensors helper
  • While-loop perturbation bounded by MAX_PERTURB_ATTEMPTS=100

Adds 23 new tests covering imports, registry consistency, perturbation unit tests, deep clone helpers, and integration tests through specific exemption paths. All 399 existing tests pass with zero regressions.
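The registry consolidation described above trades an elif chain for data-driven dispatch; a hypothetical miniature (registry names are taken from the entry, but contents and the lookup function are illustrative):

```python
# Each exemption rule is data; the checker is one short lookup chain
# instead of a 141-line elif cascade.
SKIP_VALIDATION_ENTIRELY = {"rand", "randn", "bernoulli"}
STRUCTURAL_ARG_POSITIONS = {"reshape": {1}, "permute": {1, 2, 3}}
CUSTOM_EXEMPTION_CHECKS = {
    # e.g. exempt __setitem__ when the destination is all zeros
    "__setitem__": lambda args: bool(args) and args[0] == "all_zeros",
}

def is_exempt(func_name, perturbed_arg_pos, args):
    if func_name in SKIP_VALIDATION_ENTIRELY:
        return True
    if perturbed_arg_pos in STRUCTURAL_ARG_POSITIONS.get(func_name, set()):
        return True  # perturbing a structural arg (shape, dims) is meaningless
    check = CUSTOM_EXEMPTION_CHECKS.get(func_name)
    return bool(check and check(args))

assert is_exempt("rand", 0, [])
assert is_exempt("reshape", 1, [None, (2, 3)])
assert not is_exempt("matmul", 0, [None, None])
```

Adding a new exemption becomes a one-line registry entry rather than another branch in control flow.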

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Testing

  • suite: Exhaustive test suite expansion (~559 tests) (b489120)

Add ~76 new tests covering major gaps in architecture and edge-case coverage:

  • 64 new toy model tests (attention, transformers, containers, conditionals, normalization variants, residual/param sharing, in-place/type ops, scalar/broadcasting, exemption stress tests, architecture patterns, adversarial edge cases)
  • 12 new real-world model tests (DistilBERT, ELECTRA, MobileViT, MoE, T5, BART, RoBERTa, BLIP, Whisper, ViT-MAE, Conformer, SentenceTransformer)

Merge test_real_world_models_slow.py into test_real_world_models.py organized by architecture/modality, using @pytest.mark.slow for heavy tests. Register the slow marker in pyproject.toml.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.12.0 (2026-03-02)

Features

  • data_classes: Add BufferLog subclass and BufferAccessor for first-class buffer support (5a9b4c6)

Buffers now get their own dedicated class (BufferLog) that subclasses TensorLog, giving them focused repr, convenience properties (name, module_address), and ergonomic access via BufferAccessor on both ModelLog and ModuleLog.

  • BufferLog(TensorLog) subclass with clean repr and computed properties
  • BufferAccessor with indexing by address, short name, or ordinal position
  • Scoped mh.modules["addr"].buffers for per-module buffer access
  • Fix vis.py type(node) == TensorLog checks to use isinstance (supports subclasses)
  • Fix TensorLog.copy() to preserve subclass type via type(self)(fields_dict)
  • Fix cleanup.py property-before-callable check order to prevent AttributeError
  • Add buffer repr sections to aesthetic test reports (text + PDF)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.11.2 (2026-03-02)

Performance Improvements

  • postprocess: Optimize 6 remaining hot-path bottlenecks in log_forward_pass (ed62350)

Cache dir() per type, replace sorted deque with heapq in loop detection, use dict + empty-field skip in label renaming, two-phase stack capture, batch orphan removal, and shadow sets for module hierarchy membership checks. 8-33% wall time reduction across models.
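The sorted-deque → heapq swap mentioned above is a standard pattern: re-sorting a container on every insert is O(n log n) per operation, while a heap gives O(log n) push/pop. A minimal stdlib sketch (names hypothetical):

```python
import heapq

# Track candidates while always being able to pull the smallest next.
heap = []
for value in [7, 3, 9, 1, 4]:
    heapq.heappush(heap, value)  # O(log n) each

smallest = heapq.heappop(heap)  # O(log n): 1
next_smallest = heap[0]          # O(1) peek without popping: 3
```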

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.11.1 (2026-03-02)

Performance Improvements

  • logging: Optimize 5 hot-path bottlenecks in log_forward_pass (d4a5d3a)

  • Cache torch.cuda.is_available() (called per-op, ~2ms each)
  • Replace tensor CPU copy + numpy + getsizeof with nelement * element_size
  • Hoist warnings.catch_warnings() out of per-attribute loop
  • Cache class-level module metadata (inspect.getsourcelines, signature)
  • Replace inspect.stack()/getframeinfo() with sys._getframe() chain; defer source context + signature loading in FuncCallLocation to first property access via linecache
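The frame-chain technique is straightforward to sketch: inspect.stack() eagerly reads source for every frame, while walking sys._getframe() via f_back records only cheap attributes and defers source lookup to linecache. A hedged stdlib illustration (function names are hypothetical):

```python
import sys
import linecache

def capture_call_stack(skip=1):
    """Record (filename, lineno, funcname) per frame without
    touching source files, unlike inspect.stack()."""
    frames = []
    frame = sys._getframe(skip)  # start at our caller
    while frame is not None:
        frames.append((frame.f_code.co_filename,
                       frame.f_lineno,
                       frame.f_code.co_name))
        frame = frame.f_back
    return frames

def source_line(filename, lineno):
    # Deferred, cached source lookup: only pay I/O when asked.
    return linecache.getline(filename, lineno)

def inner():
    return capture_call_stack()

def outer():
    return inner()

stack = outer()  # stack[0] is inner's frame, stack[1] is outer's
```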

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Testing

  • decoration: Comprehensive tests for permanent decoration architecture (f6305c8)

61 tests across 14 test classes covering:

  • Toggle state (on/off/exception/KeyboardInterrupt recovery)
  • Passthrough behavior when logging disabled
  • Detached import patching (module-level, class attrs, lists, dicts, late imports)
  • Permanent model preparation (WeakSet caching, session cleanup)
  • pause_logging context manager (nesting, exception safety)
  • Wrapper transparency (functools.wraps, __name__, __doc__)
  • torch.identity installation and graph presence
  • JIT builtin table registration and shared-original wrapper reuse
  • Decoration consistency (bidirectional mapper, idempotency)
  • In-place ops, property descriptors, edge cases
  • Signal safety (SIGALRM during forward)
  • Session isolation (no cross-session leakage)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.11.0 (2026-03-02)

Features

  • core: Permanent toggle-gated decoration of torch functions (cbeaa83)

Replace the per-call decorate/undecorate cycle (~2000 torch functions) with one-time permanent decoration at import time. A single boolean toggle (_state._logging_enabled) gates whether wrappers log or pass through, eliminating repeated overhead and fixing detached import blindness (e.g. from torch import cos).

Key changes:

  • New _state.py: global toggle, active_logging/pause_logging context managers
  • decorate_all_once(): one-time decoration with JIT builtin table registration
  • patch_detached_references(): sys.modules crawl for detached imports
  • Split model prep: _prepare_model_once (WeakSet) + prepare_model_session
  • Delete clean_* escape hatches; safe_copy/safe_to use pause_logging()
  • Fix tensor_log.py activation_postfunc exception-safety bug (#89)
  • Fix GC issues: closures no longer capture ModelLog (GC-6, GC-7)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.10.1 (2026-03-02)

Bug Fixes

  • core: Fix one-liners and small guard additions across codebase (3d1591b)

  • Remove duplicate "contiguous" from ZERO_FLOPS_OPS (flops.py)
  • Fix strip("") → rstrip("") to avoid stripping leading chars (tensor_log.py)
  • Fix return type annotation to include str return (trace_model.py)
  • Replace isinstance(attr, Callable) with callable(attr) (model_funcs.py)
  • Remove dead pass_num=1 overwrite of parameterized value (logging_funcs.py)
  • Fix parent_params=None → [] for consistency (finalization.py)
  • Add int key guard in _getitem_after_pass (interface.py)
  • Add max(0, ...) guard for empty model module count (interface.py)
  • Add empty list guard in _get_lookup_help_str (interface.py)
  • Add explicit KeyError raise for failed lookups (interface.py)
  • Remove dead unreachable module address check (interface.py)
  • Add str() cast for integer layer keys (trace_model.py)
  • Add list() copy to prevent getfullargspec mutation (trace_model.py)
  • Reset has_saved_gradients and unlogged_layers in save_new_activations (logging_funcs.py)
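The strip → rstrip fix guards against an easy-to-miss str behavior: str.strip(chars) removes the given characters from BOTH ends, so using it to trim a trailing marker can also eat leading characters. A quick illustration with a made-up address string:

```python
s = ".features.3."
trimmed_both = s.strip(".")    # leading dot removed too: "features.3"
trimmed_right = s.rstrip(".")  # only trailing dot removed: ".features.3"
```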

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • misc: Fix decoration, param detection, vis forwarding, and guards (4269c4d)

  • Remove dead double-decoration guard in decorate_torch.py
  • Add hasattr guard for torch.identity installation
  • Add None guard for code_context in FuncCallLocation.__getitem__
  • Fix is_quantized to check actual qint dtypes, not all non-float
  • Forward show_buffer_layers to rolled edge check in vis.py

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • safety: Add try/finally cleanup and exception state resets (c9bdb7f)

  • Wrap validate_saved_activations in try/finally for cleanup (user_funcs.py)
  • Wrap show_model_graph render_graph in try/finally with cleanup (user_funcs.py)
  • Reset _track_tensors and _pause_logging in exception handler (trace_model.py)
  • Update test to expect [] instead of None for cleared parent_params

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • validation: Fix isinstance checks and misleading variable name (581f286)

  • Replace type(val) == torch.Tensor with isinstance() (12 occurrences)
  • Rename mean_output to output_std for accuracy

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Refactoring

  • postprocess: Split monolithic postprocess.py into thematic package (cd14d27)

Break 2,115-line postprocess.py into a postprocess/ package with 5 modules:

  • graph_traversal.py: Steps 1-4 (output nodes, ancestry, orphans, distances)
  • control_flow.py: Steps 5-7 (conditional branches, module fixes, buffers)
  • loop_detection.py: Step 8 (loop detection, isomorphic subgraph expansion)
  • labeling.py: Steps 9-12 (label mapping, final info, renaming, cleanup)
  • finalization.py: Steps 13-18 (undecoration, timing, params, modules, finish)

No behavioral changes — pure file reorganization.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Testing

  • aesthetic: Add aesthetic testing infrastructure for visual inspection (73cdf01)

Add regenerable human-inspectable outputs in tests/test_outputs/:

  • Comprehensive text report (aesthetic_report.txt) covering all user-facing reprs, accessors, DataFrames, error messages, and field dumps for every major data structure (ModelLog, TensorLog, RolledTensorLog, ModuleLog, ModulePassLog, ParamLog, ModuleAccessor, ParamAccessor)
  • 28 visualization PDFs exercising nesting depth, rolled/unrolled views, buffer visibility, graph direction, frozen params, and loop detection
  • 6 new aesthetic test models (AestheticDeepNested, AestheticSharedModule, AestheticBufferBranch, AestheticKitchenSink, AestheticFrozenMix)
  • Rename visualization_outputs/ → test_outputs/ for cleaner structure

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • aesthetic: Add gradient visualization coverage (085dcd9)

Add gradient backward arrows (blue edges) to aesthetic testing:

  • New _vis_gradient helper using log_forward_pass(save_gradients=True) + backward()
  • 5 gradient vis PDFs: deep nested, frozen mix, kitchen sink (various configs)
  • Gradient section (G) in text report: TensorLog/ParamLog grad fields, frozen contrast
  • GRADIENT_VIS_GALLERY integrated into LaTeX PDF report as Section 3

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • aesthetic: Add LaTeX PDF report with embedded visualizations (d101326)

Generate a comprehensive PDF report (aesthetic_report.pdf) alongside the text report, with:

  • tcolorbox-formatted sections for each model's outputs
  • Full field dumps for all data structures
  • All 28 visualization PDFs embedded inline with captions
  • Table of contents, figure numbering, proper typography
  • Graceful skip if pdflatex is not installed

Also saves the .tex source for customization.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.10.0 (2026-03-02)

Features

  • data: Add ModuleLog, ModulePassLog, and ModuleAccessor (5b0baa8)

Introduce structured per-module metadata classes following the ParamLog/ParamAccessor pattern. log.modules["features.3"] now returns a rich ModuleLog with class, params, layers, source info, hierarchy, hooks, forward signature, and nesting depth. Multi-pass modules support per-call access via passes dict and pass notation (e.g. "fc1:2").

  • ModulePassLog: per-(module, pass) lightweight container
  • ModuleLog: per-module-object user-facing class with delegating properties for single-pass modules
  • ModuleAccessor: dict-like accessor with summary()/to_pandas()
  • Metadata captured in prepare_model() before cleanup strips tl_* attrs
  • Forward args/kwargs captured per pass in module_forward_decorator()
  • build_module_logs() in postprocess Step 17 assembles everything
  • Old module_* dicts kept alive for vis.py backward compat
  • 44 new tests, 315 total passing

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Refactoring

  • data: Move ModelHistory & TensorLogEntry into data_classes/ (6ecc7e5)

Move model_history.py and tensor_log.py into torchlens/data_classes/ alongside the existing FuncCallLocation and ParamLog data classes. Update all relative imports across 12 consumer files. Also fixes a missing-dot bug in decorate_torch.py's TYPE_CHECKING import.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • data: Rename ModelHistory/TensorLogEntry to ModelLog/TensorLog (f36718e)

  • Rename classes: ModelHistory → ModelLog, TensorLogEntry → TensorLog, RolledTensorLogEntry → RolledTensorLog
  • Rename file: data_classes/model_history.py → data_classes/model_log.py
  • Rename variables: model_history → model_log, source_model_history → source_model_log
  • Rename constants: MODEL_HISTORY_FIELD_ORDER → MODEL_LOG_FIELD_ORDER, TENSOR_LOG_ENTRY_FIELD_ORDER → TENSOR_LOG_FIELD_ORDER
  • Fold test_meta_validation.py into test_metadata.py

  • vis: Migrate vis.py and interface.py from module_* dicts to ModuleAccessor API (8e64d43)

Replace all direct accesses to old module_types, module_nparams, module_num_passes, module_pass_layers, module_pass_children, module_children, module_layers, module_num_tensors, module_pass_num_tensors, top_level_module_passes, and top_level_modules dicts with the new ModuleAccessor API (self.modules[addr]).

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Testing

  • validation: Add meta-validation tests for corruption detection (11de9da)

Add 9 tests that deliberately corrupt saved activations and verify validate_saved_activations() catches each corruption: output/intermediate replacement, layer swap, zeroing, noise, scaling, wrong shape, and corrupted creation_args.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.9.0 (2026-03-01)

Features

  • data: Add ParamLog class for structured parameter metadata (f21f3e9)

Introduce ParamLog — a dedicated data class for parameter metadata — and ParamAccessor for convenient dict-like access by address, index, or short name. Parameters are now first-class objects with full metadata including trainable/frozen status, module info, linked params, gradient tracking, and optional optimizer tagging.

Key additions:

  • ParamLog class with lazy gradient detection via _param_ref
  • ParamAccessor on both ModelHistory (mh.params) and TensorLogEntry (entry.params)
  • Trainable/frozen tallies on MH, TLE, and per-module
  • Optional optimizer= param on log_forward_pass()
  • Visualization: param names with bracket convention, trainable/frozen color coding, gradient fills for mixed layers, caption breakdown
  • 68 new tests covering all functionality

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.8.0 (2026-03-01)

Features

  • data: Add FuncCallLocation class for structured call stack metadata (49ada0e)

Replace unstructured List[Dict] func_call_stack with List[FuncCallLocation] providing clean repr with source context arrows, __getitem__/__len__ support, and optional func_signature/func_docstring extraction. Introduces torchlens/data_classes/ package and threads a configurable num_context_lines parameter through the full call chain.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.7.2 (2026-03-01)

Bug Fixes

  • tests: Reduce r2plus1d_18 input size to prevent OOM (2c73fcd)

The test_video_r2plus1_18 test used a (16,3,16,112,112) input (~96M elements) which consistently got OOM-killed. Reduced to (1,3,1,112,112) which passes reliably while still exercising the full model.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.7.1 (2026-02-28)

Bug Fixes

  • logging: Handle complex-dtype tensors in tensor_nanequal (b425cbe)

torch.nan_to_num does not support complex tensors, which caused test_qml to fail when PennyLane quantum ops produced complex outputs. Use view_as_real/view_as_complex to handle NaN replacement for complex dtypes.
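The fix operates on the real and imaginary components separately (via torch.view_as_real/view_as_complex in the actual code). A pure-Python analog of the same idea, with hypothetical names:

```python
import math

def nan_to_num_complex(z, replacement=0.0):
    """Replace NaN in either component of a complex number.

    Mirrors the view_as_real trick: operate on the two float
    components individually, then reassemble the complex value.
    """
    real = replacement if math.isnan(z.real) else z.real
    imag = replacement if math.isnan(z.imag) else z.imag
    return complex(real, imag)

nan = float("nan")
assert nan_to_num_complex(complex(nan, 2.0)) == complex(0.0, 2.0)
assert nan_to_num_complex(complex(1.0, nan)) == complex(1.0, 0.0)
assert nan_to_num_complex(complex(1.0, 2.0)) == complex(1.0, 2.0)
```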

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.7.0 (2026-02-28)

Bug Fixes

  • logging: Capture/restore RNG state for two-pass stochastic models (#58, 271cef3)

Move RNG state capture/restore before pytorch decoration to prevent internal .clone() calls from being intercepted by torchlens' decorated torch functions. Also speed up test_stochastic_loop by using a higher starting value.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • logging: Ensure output layer parents are saved with layers_to_save (#46, b104094)

When layers_to_save is a subset, the fast pass now automatically includes parents of output layers in the save list. This ensures output layer tensor_contents is populated in postprocess_fast (which copies from parent).

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • logging: Replace copy.deepcopy with safe_copy to prevent infinite loops (#18, e1ad9ae)

copy.deepcopy hangs on complex tensor wrappers with circular references (e.g. ESCNN GeometricTensor). Replace with safe_copy_args/safe_copy_kwargs that clone tensors, recurse into standard containers, and leave other objects as references.
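The safe-copy strategy can be sketched generically: clone tensor-like leaves, recurse into standard containers, and keep everything else as a plain reference so circular wrappers never get traversed. This is a hedged pure-Python miniature (the leaf test and names are illustrative; the real code clones torch tensors):

```python
def safe_copy_args(obj, clone):
    """Selective deep copy: `clone` is the leaf-copy function
    (tensor.clone() in the real code)."""
    if isinstance(obj, (list, tuple)):
        return type(obj)([safe_copy_args(x, clone) for x in obj])
    if isinstance(obj, dict):
        return {k: safe_copy_args(v, clone) for k, v in obj.items()}
    if hasattr(obj, "clone"):       # tensor-like leaf: copy it
        return clone(obj)
    return obj                      # opaque object: keep the reference

class FakeTensor:                   # stand-in for torch.Tensor
    def clone(self):
        return FakeTensor()

t = FakeTensor()
opaque = object()                   # e.g. a wrapper with circular refs
copied = safe_copy_args([t, opaque, {"k": t}], lambda x: x.clone())
```

Because opaque objects are passed through by reference, a circular structure is never walked, which is what avoids the deepcopy hang.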

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • logging: Use argspec to disambiguate tuple/list input args (#43, a91e378)

When a model's forward() expects a single arg that IS a tuple/list of tensors, torchlens incorrectly unpacked it into multiple positional args. Now uses inspect.getfullargspec to detect single-arg models and wraps the tuple/list as a single arg. Also handles immutable tuples in _fetch_label_move_input_tensors device-move logic.
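The argspec-based disambiguation can be illustrated with a small sketch (function names hypothetical; the real code inspects a module's forward()):

```python
import inspect

def split_inputs(forward, input_args):
    """Decide whether a tuple/list input is one argument or several.

    If forward() declares exactly one non-self positional parameter,
    a tuple/list input is passed through as that single argument
    rather than being unpacked into multiple positional args.
    """
    spec = inspect.getfullargspec(forward)
    positional = [a for a in spec.args if a != "self"]
    if len(positional) == 1 and isinstance(input_args, (tuple, list)):
        return [input_args]       # single arg that IS a sequence
    return list(input_args)       # normal unpacking

def forward_one(pair):            # expects ONE tuple argument
    return pair

def forward_two(a, b):            # expects TWO arguments
    return a, b
```

Usage: `split_inputs(forward_one, (x, y))` yields one argument, `split_inputs(forward_two, (x, y))` yields two.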

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • vis: Functional ops at end of container modules rendered as ovals (#48, 7d1c68c)

_check_if_only_non_buffer_in_module was too broad — it returned True for functional ops (like torch.relu) at the end of container modules with child submodules, causing them to render as boxes. Added a leaf-module check: only apply box rendering for modules with no child submodules.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Features

  • metadata: Track module training mode per submodule (#52, e5197ca)

Capture module.training in module_forward_decorator and store in ModelHistory.module_training_modes dict (keyed by module address). This lets users check whether each submodule was in train or eval mode during the forward pass.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.6.2 (2026-02-28)

Bug Fixes

  • postprocess: Correct _merge_buffer_entries call convention (#47, 07df691)

_merge_buffer_entries is a module-level function, not a method on ModelHistory. Fixes AttributeError when processing recurrent models with duplicate buffer entries.

Based on gilmoright's contribution in PR #56.

Co-authored-by: gilmoright artem.dahaka@gmail.com

v0.6.1 (2026-02-28)

Bug Fixes

  • tensor_log: Use getattr default in TensorLogEntry.copy() (538288d)

Prevents AttributeError when copying entries that predate newly added fields (e.g., deserialized from an older version).

Co-Authored-By: whisperLiang whisperLiang@users.noreply.github.com

v0.6.0 (2026-02-28)

Features

  • flops: Add per-layer FLOPs computation for forward and backward passes (31b43e6)

Compute forward and backward FLOPs at logging time for every traced operation. Uses category-based dispatch: zero-cost ops (view, reshape, etc.), element-wise ops with per-element cost, and specialty handlers for matmul, conv, normalization, pooling, reductions, and loss functions. Unknown ops return None rather than guessing.

  • New torchlens/flops.py with compute_forward_flops / compute_backward_flops
  • ModelHistory gains total_flops_forward, total_flops_backward, total_flops properties and flops_by_type() method
  • TensorLogEntry and RolledTensorLogEntry gain flops_forward / flops_backward fields
  • 28 new tests in test_metadata.py (unit + integration)
  • scripts/check_flops_coverage.py dev utility for auditing op coverage
  • Move test_video_r2plus1_18 to slow test file
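The category-based dispatch described above can be sketched in miniature; everything here (table contents, handler, signature) is a hypothetical illustration of the scheme, not the torchlens implementation:

```python
# Zero-cost ops move no arithmetic; elementwise ops pay a fixed cost
# per element; specialty ops get dedicated handlers; anything unknown
# returns None rather than guessing.
ZERO_FLOPS_OPS = {"view", "reshape", "permute"}
ELEMENTWISE_COSTS = {"relu": 1, "sigmoid": 4}   # FLOPs per element

def matmul_flops(m, k, n):
    # (m, k) @ (k, n): each of m*n outputs needs k mults + k adds.
    return 2 * m * k * n

def compute_forward_flops(op, numel=None, shapes=None):
    if op in ZERO_FLOPS_OPS:
        return 0
    if op in ELEMENTWISE_COSTS:
        return ELEMENTWISE_COSTS[op] * numel
    if op == "matmul":
        return matmul_flops(*shapes)
    return None  # unknown op: refuse to guess

assert compute_forward_flops("view") == 0
assert compute_forward_flops("relu", numel=100) == 100
assert compute_forward_flops("matmul", shapes=(2, 3, 4)) == 48
assert compute_forward_flops("mystery_op") is None
```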

Based on whisperLiang's contribution in PR #53.

Co-Authored-By: whisperLiang whisperLiang@users.noreply.github.com

v0.5.0 (2026-02-28)

Bug Fixes

  • logging: Revert children_tensor_versions to proven simpler detection (ade9c39)

The refactor version applied device/postfunc transforms to the stored value in children_tensor_versions, but validation compares against creation_args which are always raw. This caused fasterrcnn validation to fail. Revert to the simpler approach that stores raw arg copies and was verified passing twice.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • postprocess: Gate output node variations on has_child_tensor_variations (4106be9)

Don't unconditionally store children_tensor_versions for output nodes. Gate on has_child_tensor_variations (set during exhaustive logging) to avoid false positives and preserve postfunc-applied tensor_contents.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • postprocess: Rebuild pass assignments after loop detection and fix output handler (847b2a7)

Two fixes:

  1. _rebuild_pass_assignments: Multiple rounds of _expand_isomorphic_subgraphs can reassign a node to a new group while leaving stale same_layer_operations in the old group's members. This caused multiple raw tensors to map to the same layer:pass label, producing validation failures (e.g. fasterrcnn). The cleanup step groups tensors by their authoritative layer_label_raw and rebuilds consistent pass numbers.

  2. Output node handler: Replaced the has_child_tensor_variations gate with a direct comparison of actual output (with device/postfunc transforms) against tensor_contents using tensor_nanequal. This correctly handles in-place mutations through views (e.g. InPlaceZeroTensor) while preserving postfunc values for unmodified outputs.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • validation: Handle bool and complex tensor perturbation properly (dfd2d7b)

  • Generate proper complex perturbations using torch.complex() instead of casting away imaginary part
  • Fix bool tensor crash by reordering .float().abs() (bool doesn't support abs, but float conversion handles it)
  • Add ContextUnet diffusion model to example_models.py for self-contained stable_diffusion test
  • Update test_stable_diffusion to use example_models.ContextUnet

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Features

  • logging: Generalize was_getitem_applied to has_child_tensor_variations (1177f60)

Replace the getitem-specific parent detection with runtime mismatch detection that catches any case where a parent's tensor_contents diverges from what children actually received (getitem slicing, view mutations through shared storage, in-place ops after logging, etc.).

Key changes:

  • Rename was_getitem_applied → has_child_tensor_variations
  • Detection now compares arg copies against parent tensor_contents at child-creation time, with transform-awareness (device + postfunc)
  • Output nodes now detect value changes vs parent tensor_contents
  • Use tensor_nanequal (not torch.equal) for dtype/NaN consistency
  • Fix fast-mode: clear stale state on re-run, prevent double-postfunc
  • Use clean_to and try/finally for _pause_logging safety
  • Add 6 view-mutation stress tests (unsqueeze, reshape, transpose, multiple, chained, false-positive control)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Refactoring

  • loops: Cherry-pick approved changes from feat/loop-detection-hardening (33e0f22)

  • Rename 10 loop detection functions to clearer names (e.g. _assign_corresponding_tensors_to_same_layer → _detect_and_label_loops, _fetch_and_process_next_isomorphic_nodes → _advance_bfs_frontier)
  • Rename 6 local variables for clarity (e.g. node_to_iso_group_dict → node_to_iso_leader, subgraphs_dict → subgraph_info)
  • Add SubgraphInfo dataclass replacing dict-based subgraph bookkeeping
  • Replace list.pop(0) with deque.popleft() in BFS traversals
  • Remove ungrouped sweep in _merge_iso_groups_to_layers
  • Remove safe_copy in postprocess_fast (direct reference suffices)
  • Rewrite _get_hash_from_args to preserve positional indices, kwarg names, and dict keys via recursive _append_arg_hash helper
  • Remove vestigial index_in_saved_log field from TensorLogEntry, constants.py, logging_funcs.py, and postprocess.py
  • Fix PEP8: type(x) ==/!= Y → type(x) is/is not Y in two files
  • Split test_real_world_models.py into fast and slow test files
  • Add 12 new edge-case loop detection test models and test functions
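A position-preserving argument hash of the kind described above can be sketched as follows; this is a hypothetical miniature (names borrowed from the entry, logic illustrative):

```python
import hashlib

def _append_arg_hash(parts, arg, position):
    """Recursively fold an argument into the hash input, tagging each
    leaf with its positional index / kwarg name / dict key so that
    swapping argument positions changes the hash."""
    if isinstance(arg, dict):
        for key, val in sorted(arg.items()):
            _append_arg_hash(parts, val, f"{position}.{key}")
    elif isinstance(arg, (list, tuple)):
        for i, val in enumerate(arg):
            _append_arg_hash(parts, val, f"{position}[{i}]")
    else:
        parts.append(f"{position}={arg!r}")

def get_hash_from_args(args, kwargs):
    parts = []
    for i, arg in enumerate(args):
        _append_arg_hash(parts, arg, str(i))
    for name, arg in sorted(kwargs.items()):
        _append_arg_hash(parts, arg, name)
    return hashlib.sha256("|".join(parts).encode()).hexdigest()

# Swapping positional args now changes the hash:
assert get_hash_from_args((1, 2), {}) != get_hash_from_args((2, 1), {})
```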

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.4.1 (2026-02-26)

Bug Fixes

  • loops: Refine iso groups to prevent false equivalence in loop detection (5aaad70)

When operations share the same equivalence type but occupy structurally different positions (e.g., sin(x) in a loop body vs sin(y) in a branch), the BFS expansion incorrectly groups them together. Add _refine_iso_groups to split such groups using direction-aware neighbor connectivity.

Also adds NestedParamFreeLoops test model and prefixes intentionally unused variables in test models with underscores to satisfy linting.

Code Style

v0.4.0 (2026-02-26)

Chores

  • tests: Move visualization_outputs into tests/ directory (22ed018)

Anchor vis_outpath to tests/ via VIS_OUTPUT_DIR constant in conftest.py so test outputs don't pollute the project root.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Features

  • core: Generalize in-place op handling, fix loop grouping, and harden validation (c72d6a6)

  • Generalize in-place op detection in decorate_torch.py: use was_inplace flag based on output identity instead of hardcoded function name list, and propagate tensor labels for all in-place ops

    • Fix ungrouped isomorphic nodes in postprocess.py: sweep for iso nodes left without a layer group after pairwise grouping (fixes last-iteration loop subgraphs with no params)
    • Deduplicate ground truth output tensors by address in user_funcs.py to match trace_model.py extraction behavior
    • Add validation exemptions: bernoulli_/full arg mismatches from in-place RNG ops, meshgrid/broadcast_tensors multi-output perturbation, *_like ops that depend only on shape/dtype/device
    • Gracefully handle invalid perturbed arguments (e.g. pack_padded_sequence) instead of raising
    • Guard empty arg_labels in vis.py edge label rendering
    • Fix test stability: reduce s3d batch size, add eval mode and bool mask dtype for StyleTTS
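The identity-based in-place detection can be sketched without torch at all: an op counts as in-place when the object it returns *is* one of its inputs, rather than matching a hardcoded name list. This is a minimal sketch of the idea; the real decorate_torch.py hook wraps torch ops and propagates tensor labels, which is omitted here.

```python
def track_inplace(fn):
    # Wrap fn and record whether its last call was in-place, judged by
    # output identity (out is one of the inputs) instead of by name.
    def wrapper(*args, **kwargs):
        out = fn(*args, **kwargs)
        wrapper.last_was_inplace = any(out is a for a in args)
        return out
    wrapper.last_was_inplace = None
    return wrapper

@track_inplace
def fill_(buf, value):
    # Mutates and returns its input, like a torch *_ op.
    for i in range(len(buf)):
        buf[i] = value
    return buf

@track_inplace
def doubled(buf):
    # Returns a fresh object -> not in-place.
    return [2 * v for v in buf]
```

Because the check is on the returned object's identity, any op that hands back one of its inputs is caught, including ops the old name list never mentioned.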

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.3.1 (2026-02-25)

Bug Fixes

  • core: Fix in-place op tracking, validation, and test stability (326b8a9)

  • Propagate tensor labels back to original tensor after in-place ops (__setitem__, zero_, __delitem__) so subsequent operations see updated labels
  • Add validation exemption for scalar __setitem__ assignments
  • Fix torch.Tensor → torch.tensor for correct special value detection
  • Remove xfail marker from test_varying_loop_noparam2 (now passes)
  • Add ruff lint ignores for pre-existing E721/F401 across codebase
  • Includes prior bug-blitz fixes across logging, postprocessing, cleanup, helper functions, visualization, and model tracing

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.3.0 (2026-02-25)

Chores

  • ci: Replace black with ruff auto-format on push (e0cb9e1)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • config: Add ruff config and replace black/isort with ruff in pre-commit (c27ced8)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • config: Set major_on_zero to true (d63451c)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Features

  • tests: Restructure test suite into focused files with metadata coverage (31b0353)

Split monolithic test_validation_and_visuals.py (3130 lines, 155 tests) into:
  • conftest.py: shared fixtures and deterministic seeding
  • test_toy_models.py: 78 tests (66 migrated + 12 new API coverage tests)
  • test_metadata.py: 44 comprehensive metadata field tests (7 test classes)
  • test_real_world_models.py: 75 tests with local imports and importorskip
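The deterministic-seeding piece of the new conftest.py might look like the following. The function name and exact scope are assumptions from the commit message; the real fixture presumably also seeds torch (and numpy, if installed) rather than only the stdlib RNG.

```python
import random

def seed_everything(seed: int = 0) -> None:
    # Reset every RNG the test suite touches so each test starts from a
    # known state. Sketch only: a real version would also call
    # torch.manual_seed(seed) and numpy's seeding when available.
    random.seed(seed)

seed_everything(0)
first = [random.random() for _ in range(3)]
seed_everything(0)
second = [random.random() for _ in range(3)]
assert first == second  # identical seed -> identical draws
```

In a pytest suite this would typically be invoked from an autouse fixture so no individual test has to remember to seed.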

New tests cover: log_forward_pass parameters (layers_to_save, save_function_args, activation_postfunc, mark_distances), get_model_metadata, ModelHistory access patterns, TensorLogEntry field validation, recurrent metadata, and GeluModel.

Removed 13 genuine size-duplicate tests (ResNet101/152, VGG19, etc.). All optional dependencies now use pytest.importorskip for graceful skipping.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.2.0 (2026-02-24)

Bug Fixes

  • ci: Enable verbose PyPI upload and disable attestations for debugging (579506d)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • ci: Fetch tags in release workflow so semantic-release finds v0.1.36 (0f0a3d0)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • ci: Pin semantic-release to v9 and add debug output (b68388c)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • ci: Prevent major version bump on 0.x releases (db4a31e)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • ci: Remove direct URL dependency rejected by PyPI and clean up workflow (500368b)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • ci: Use GitHub App token to bypass branch protection in release workflow (f2cf8ae)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Features

  • build: Migrate to pyproject.toml with semantic-release and GitHub Actions (f8e01c9)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

v0.1.36 (2025-09-26)