Releases: johnmarktaylor91/torchlens

v0.16.1

07 Mar 22:00


v0.16.1 (2026-03-07)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • tests: Use relative import for example_models in test_large_graphs (e2d0ae4)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • vis: Harden ELK heap scaling and fix flaky signal safety test (41b9f89)

  • Bump ELK Node.js heap scaling from 8x to 16x JSON size to prevent OOM
    on 250k+ node graphs

  • Mark 100k node tests as @rare (too slow for regular runs)

  • Fix flaky TestSignalSafety: use setitimer(50ms) instead of alarm(1s),
    increase model iterations to 50k, skip if alarm doesn't fire

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.16.0...v0.16.1

v0.16.0

07 Mar 19:59


v0.16.0 (2026-03-07)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • validation: Batch of bug fixes for edge-case models (dca5a7e)

  • Fix vmap/functorch compatibility: skip logging inside functorch transforms
    to avoid missing batching rules (torch_funcs.py)

  • Fix tensor_nanequal infinite recursion: wrap decorated tensor ops (.isinf,
    .resolve_conj, etc.) in pause_logging() (tensor_utils.py)

  • Fix perturbation for range-restricted functions: use uniform random within
    original value range instead of scaled normal (core.py)

  • Fix atomic_bool_val crash inside vmap context (output_tensors.py)

  • Fix output node initialized_inside_model flag (graph_traversal.py)

  • Add scatter_ full-overwrite exemption (exemptions.py)

  • Add max/min indices exemption for integer dtype outputs

  • Add bernoulli scalar-p exemption, constant-output exemption

  • Add non-perturbed parent special-value check for nested args (einsum)

  • Fix buffer_xrefs invariant: accept ancestor module matches

  • Fix real-world model configs: CvT, CLAP, EnCodec, SpeechT5, Informer,
    Autoformer, MobileBERT kwarg names

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • vis: Fix ELK rendering and scale for large graphs (ea96a85)

  • Fix neato -n position format: use points (not inches) — fixes empty
    nodes, missing edges, and invisible labels in ELK output

  • Auto-scale Node.js heap and stack with graph JSON size

  • Auto-scale ELK and neato timeouts with node count

  • Use straight-line edges for graphs > 1k nodes (spline routing is O(n^2))

  • Warn users to use SVG format for graphs > 25k nodes (PDF renders empty)

  • Add dot-vs-ELK aesthetic comparison tests at 15/100/500/1k/3k nodes

  • Add 1M node test (rare marker) for trophy-file rendering

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Features

  • vis: Add ELK layout engine for large graph visualization (5245872)

  • New elk_layout.py: ELK-based node placement via Node.js/elkjs subprocess,
    with graceful fallback to sfdp when unavailable

  • New vis_node_placement parameter ("auto"/"dot"/"elk"/"sfdp") threaded
    through show_model_graph, log_forward_pass, and render_graph

  • Auto mode uses dot for <3500 nodes, ELK (or sfdp fallback) for larger

  • RandomGraphModel in example_models.py: seeded random model generator
    with calibrated node counts (within ~5% of target up to 100k+)

  • 39 tests in test_large_graphs.py: node count accuracy, validation,
    engine selection, ELK utilities, rendering at 3k-100k scales,
    dot threshold benchmark

  • Increased Node.js stack size (--stack-size=65536) to handle graphs
    up to 250k+ nodes without stack overflow

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
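
The auto-mode rule described above can be sketched as a small selector. The function and constant names here are hypothetical (the real logic lives in torchlens' visualization code); the 3500-node threshold and fallback order come from these notes:

```python
DOT_NODE_THRESHOLD = 3500  # per the notes: dot below this, ELK/sfdp above

def choose_layout_engine(num_nodes, elk_available=True, requested="auto"):
    """Pick a layout engine following the vis_node_placement auto-mode rule."""
    if requested != "auto":
        return requested  # explicit "dot"/"elk"/"sfdp" wins
    if num_nodes < DOT_NODE_THRESHOLD:
        return "dot"
    # large graphs: prefer ELK, degrade gracefully to sfdp if Node.js/elkjs
    # is unavailable
    return "elk" if elk_available else "sfdp"
```

This mirrors the documented behavior: small graphs keep Graphviz dot's routing quality, while large ones switch to ELK before dot's runtime becomes prohibitive.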

  • vis: Add hierarchical module grouping to ELK layout (bcc1ad2)

  • New build_elk_graph_hierarchical(): builds nested ELK compound nodes
    from module containment structure (containing_modules_origin_nested)

  • ELK's "INCLUDE_CHILDREN" hierarchy handling preserves module grouping
    in the layout — nodes within the same module cluster together

  • inject_elk_positions() now recurses into compound nodes, accumulating
    absolute positions from nested ELK coordinates

  • render_with_elk() passes entries_to_plot for hierarchical layout,
    falls back to flat DOT parsing when entries not available

  • Tests for hierarchical graph building and nested position injection

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Testing

  • vis: Add rare marker for 250k-node tests, fill test coverage gaps (c749b94)

  • New pytest marker "rare": always excluded by default via addopts,
    run explicitly with pytest -m rare

  • Add 250k node count, validation, and ELK render tests (marked rare)

  • Fill gaps: validation tests at 5k/10k/20k/50k/100k,
    ELK render tests at 5k/20k, node count test at 20k

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.15.15...v0.16.0

v0.15.15

07 Mar 04:55


v0.15.15 (2026-03-07)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • gc: Release ParamLog._param_ref on cleanup, add GC test suite (#GC-1, #GC-12) (83e1bd2)

  • Add ParamLog.release_param_ref() to cache grad metadata then null _param_ref

  • cleanup() now nulls all _param_ref before clearing entries

  • Add ModelLog.release_param_refs() public API for early param release

  • Add _param_logs_by_module to cleanup's internal containers list

  • New test_gc.py with 10 tests covering ModelLog/param GC, memory growth,
    save_new_activations stability, and transient data clearing

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
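
The cache-then-release pattern behind release_param_ref() can be sketched with a toy class. Field and metadata names here are illustrative (the real ParamLog caches grad metadata before nulling _param_ref):

```python
class ParamLog:
    """Toy stand-in for the release-then-cache behavior described above."""

    def __init__(self, param):
        self._param_ref = param   # strong ref keeps the parameter alive
        self.param_shape = None   # metadata cached before release

    def release_param_ref(self):
        # Cache whatever metadata is still needed, *then* drop the strong
        # reference so the parameter tensor can be garbage-collected early.
        if self._param_ref is not None:
            self.param_shape = tuple(self._param_ref.shape)
        self._param_ref = None
```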

Documentation

  • Move RESULTS.md to repo root for visibility (2e814dd)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • tests: Add public test results summary (fd7b33f)

Committed tests/RESULTS.md with suite overview, model compatibility matrix (121 toy + 85 real-world), profiling baselines, and pointers to generated reports. Transparent scoreboard for the repo.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Testing

  • models: Add autoencoders, state space models, and architecture coverage (716cc9d)

Toy models (18 new in example_models.py):

  • Autoencoders: VanillaAutoencoder, ConvAutoencoder, SparseAutoencoder,
    DenoisingAutoencoder, VQVAE, BetaVAE, ConditionalVAE
  • State space: SimpleSSM, SelectiveSSM (Mamba-style), GatedSSMBlock, StackedSSM
  • Additional: SiameseNetwork, MLPMixer, SimpleGCN, SimpleGAT,
    SimpleDiffusion, SimpleNormalizingFlow, CapsuleNetwork

Real-world models (5 new in test_real_world_models.py):

  • SSMs: Mamba, Mamba-2, RWKV, Falcon-Mamba (via transformers)
  • Autoencoders: ViT-MAE ForPreTraining (via transformers)

All 22 new tests pass. Updated RESULTS.md to reflect 736 total tests, 139 toy models, 92 real-world models.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • models: Close remaining stress-test gaps — MAML, NeRF, RecurrentGemma, VOLO (a1f2254)

Toy models (+2):

  • MAMLInnerLoop: higher-order gradients (torch.autograd.grad inside forward)
  • TinyNeRF: differentiable volumetric rendering (ray marching + alpha compositing)

Real-world models (+2):

  • RecurrentGemma: Griffin architecture (linear recurrence + local attention hybrid)
  • VOLO: outlooker attention (distinct from standard self-attention)

Closes 37/38 stress-test patterns from the taxonomy. The only remaining gap is test-time training (TTT layers), which requires gradient computation within inference — fundamentally incompatible with TorchLens forward-pass logging.

Total: 249 toy models, 185 real-world models, 892 tests.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • models: Exhaustive architecture coverage across 30+ categories (8411a72)

Add 32 new toy models (Groups M-R) covering distinct computational patterns:

  • Group M: Attention variants (MQA, GQA, RoPE, ALiBi, slot, cross-attention)

  • Group N: Gating & skip patterns (highway, SE, depthwise-sep,
    inverted-residual, FPN)

  • Group O: Generative & self-supervised (hierarchical VAE, gated conv,
    masked conv, SimCLR, stop-gradient/BYOL, AdaIN)

  • Group P: Exotic architectures (hypernetwork, DEQ, neural ODE, NTM memory,
    SwiGLU)

  • Group Q: Graph neural networks (GraphSAGE, GIN, EdgeConv, graph transformer)

  • Group R: Additional patterns (MoE, spatial transformer, dueling DQN,
    RMS norm, sparse pruning, Fourier mixing)

Add 37 new real-world model tests:

  • Decoder-only LLMs: LLaMA, Mistral, Phi, Gemma, Qwen2, Falcon, BLOOM, OPT

  • Encoder-only: ALBERT, DeBERTa-v2, XLM-RoBERTa

  • Encoder-decoder: Pegasus, LED

  • Efficient transformers: FNet, Nystromformer, BigBird

  • MoE: Mixtral, Switch Transformer

  • Vision transformers: DeiT, CvT, SegFormer

  • Detection: DETR, Mask R-CNN (train+eval)

  • Perceiver IO, PatchTST, Decision Transformer

  • timm: HRNet, EfficientNetV2, LeViT, CrossViT, PVT-v2, Twins-SVT, FocalNet

  • GNN (PyG): GraphSAGE, GIN, Graph Transformer

Total: 805 tests, 213 toy models, 129 real-world tests.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • models: Exhaustive coverage expansion — 20 toy + 33 real-world architectures (4f4e7ae)

Toy models (+20): GRU, NiN, ChannelShuffle, PixelShuffle, PartialConv, FiLM, CoordinateAttention, DifferentialAttention, RelativePositionAttention, EarlyExit, MultiScaleParallel, GumbelVQ, EndToEndMemoryNetwork, RBFNetwork, SIREN, MultiTask, WideAndDeep, ChebGCN, PrototypicalNetwork, ECA.

Real-world models (+33): GPT-J, GPTBigCode, GPT-NeoX, FunnelTransformer, CANINE, MobileBERT, mBART, ProphetNet, WavLM, Data2VecAudio, UniSpeech, ConvNeXt-v2, NFNet, DaViT, CoAtNet, RepVGG, ReXNet, PiT, Visformer, GC-ViT, EfficientFormer, FastViT, NesT, Sequencer2D, TResNet, SigLIP, BLIP-2, Deformable DETR, LayoutLM, TimeSeriesTransformer, ChebConv, SGConv, TAGConv.

Total: 241 toy models, 183 real-world models, 882 tests. RESULTS.md updated with all new entries and pattern coverage table.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • models: Final coverage pass — 6 novel computational patterns (e33c71a)

New toy models targeting genuinely missing graph patterns:

  • LinearAttentionModel: kernel-based phi(Q)(phi(K)^T V), no softmax
  • SimpleFNO: FFT -> learned spectral weights -> iFFT (Fourier Neural Operator)
  • PerceiverModel: cross-attention to fixed learned latent bottleneck
  • ASPPModel: multi-rate parallel dilated convolutions (DeepLab ASPP)
  • ControlNetModel: parallel encoder copy + zero-conv injection
  • SimpleEGNN: E(n) equivariant message passing with coordinate updates

Total: 247 toy models, 183 real-world models, 888 tests. RESULTS.md updated with new patterns and counts.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • models: Gap-fill 8 toy models + 21 real-world models for exhaustive coverage (7d5f879)

Toy models (8 new, 221 total):

  • LeNet5, BiLSTM, Seq2SeqWithAttention, TripletNetwork
  • BarlowTwinsModel, DeepCrossNetwork, AxialAttentionModel, CBAMBlock

Real-world models (21 new, 150 total):

  • TorchVision: MobileNetV3, Keypoint R-CNN (train+eval)
  • timm: Res2Net, gMLP, ResMLP, EVA-02
  • HF decoder-only: OLMo
  • HF vision: DINOv2
  • HF efficient: Longformer, Reformer
  • HF audio: AST, CLAP, EnCodec, SEW, SpeechT5, VITS
  • HF time series: Informer, Autoformer
  • PyG GNN: GATv2, R-GCN

834 total tests, 221 toy models, 150 real-world tests.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.15.14...v0.15.15

v0.15.14

07 Mar 02:02


v0.15.14 (2026-03-07)

This release is published under the GPL-3.0-only License.

Performance Improvements

  • capture: Lazy _fsize_nice properties, remove _trim_and_reorder, batch pause_logging, drop deepcopy (986bd91)

Four low-risk optimizations targeting remaining allocation pressure and per-operation overhead in the instrumentation path:

  1. Lazy _fsize_nice properties — tensor_fsize_nice, grad_fsize_nice, parent_params_fsize_nice, total_params_fsize_nice, params_fsize_nice, and fsize_nice converted from eagerly computed strings to @property methods. Eliminates ~2700 human_readable_size() calls per Swin-T pass.

  2. Remove _trim_and_reorder from postprocess — the OrderedDict rebuild of every LayerPassLog's dict (685 calls, ~0.04s on Swin-T) is purely cosmetic. Python 3.7+ dicts maintain insertion order. Function definition kept for opt-in use.

  3. Batch pause_logging for tensor memory — inline nelement() * element_size() at the two hottest call sites (_build_param_fields, _log_output_tensor_info) inside a single pause_logging() context. Eliminates per-call context manager overhead (~1088 calls).

  4. Remove activation_postfunc deepcopy — copy.deepcopy() on a callable is unnecessary; callables are effectively immutable.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
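
Item 1's eager-to-lazy conversion is the standard @property pattern; a minimal sketch with a stand-in formatter (names are illustrative, not torchlens' actual code):

```python
def human_readable_size(n):
    """Minimal stand-in for a byte-count formatter."""
    for unit in ("B", "KB", "MB", "GB"):
        if n < 1024:
            return f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} TB"

class LayerPassLog:
    def __init__(self, tensor_fsize):
        self.tensor_fsize = tensor_fsize
        # Before: self.tensor_fsize_nice = human_readable_size(tensor_fsize)
        # was computed eagerly for every logged operation.

    @property
    def tensor_fsize_nice(self):
        # After: the string is formatted only when somebody reads it,
        # so thousands of formatter calls vanish from the hot path.
        return human_readable_size(self.tensor_fsize)
```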


Detailed Changes: v0.15.13...v0.15.14

v0.15.13

07 Mar 01:28


v0.15.13 (2026-03-07)

This release is published under the GPL-3.0-only License.

Performance Improvements

  • capture: Speed-optimized defaults and remaining bottleneck elimination (#110, 7c9a00c)

Seven targeted optimizations that reduce Swin-T log_forward_pass from 5.91s to 1.55s (3.8x):

  1. Unified save_source_context flag (was save_call_stacks) — controls both
     per-function call stacks AND module source/signature fetching.
     Default: False.

  2. save_rng_states=False default — skips per-op RNG state capture.
     Auto-enabled by validate_forward_pass. Uses torch_only=True when enabled
     (skips Python/NumPy RNG).

  3. Inline isinstance in wrapped_func — _collect_tensor_args() and
     _collect_output_tensors() replace BFS crawls for flat arg/output cases.
     Falls back to BFS only for nested containers.

  4. __dict__ scan for buffer prep/cleanup — replaces iter_accessible_attributes
     (dir() + MRO walk) with direct __dict__ iteration. 10x faster for buffer
     tagging and tensor cleanup.

  5. Hoisted warnings.catch_warnings() — moved from per-attribute (46K entries)
     to caller level.

  6. Lazy module metadata — _get_class_metadata skips
     inspect.getsourcelines/inspect.signature when save_source_context=False.
     Only captures class name and docstrings.

  7. Module-level import weakref — moved from per-call in _trim_and_reorder
     to module level.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
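
Item 3's flat fast path with BFS fallback can be sketched as follows (helper names are hypothetical; the real versions live in torchlens' capture code):

```python
def collect_tensors_fast(args, is_tensor):
    """Gather tensor arguments, avoiding a full crawl for flat calls."""
    # Fast path: no containers among the top-level args — one isinstance
    # check per argument is enough.
    if not any(isinstance(a, (list, tuple, dict, set)) for a in args):
        return [a for a in args if is_tensor(a)]
    return _bfs_collect(args, is_tensor)  # nested containers: full crawl

def _bfs_collect(args, is_tensor):
    """Breadth-first crawl through nested containers (the slow fallback)."""
    found, queue = [], list(args)
    while queue:
        item = queue.pop(0)
        if is_tensor(item):
            found.append(item)
        elif isinstance(item, dict):
            queue.extend(item.values())
        elif isinstance(item, (list, tuple, set)):
            queue.extend(item)
    return found
```

Since the overwhelming majority of torch calls take flat tensor arguments, the fast path fires almost always and the BFS cost is paid only for genuinely nested inputs.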


Detailed Changes: v0.15.12...v0.15.13

v0.15.12

06 Mar 23:49


v0.15.12 (2026-03-06)

This release is published under the GPL-3.0-only License.

Performance Improvements

  • capture: O(1) tensor/param extraction via per-function ArgSpec lookup table (b1d6c56)

Replace expensive 3-level BFS crawl (~1.44s, 39% self-time, ~1.9M getattr calls) with O(1) position-based lookups using a static ArgSpec table of 350+ entries. Three-tier strategy: static table for known torch functions, dynamic cache for user-defined modules, BFS fallback (fires at most once per unique class).

Also hoists warnings.catch_warnings() from per-attribute (~77K entries) to per-call level, and adds usage stats collection + coverage test infrastructure.

Benchmark: Swin-T log_forward_pass 5.91s → 4.41s (-25%).

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
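
The three-tier lookup strategy can be sketched as follows (the table contents and function names here are illustrative, not the real 350-entry ArgSpec table):

```python
# Tier 1: static table of tensor argument positions for known functions.
TENSOR_ARG_POSITIONS = {
    "add": (0, 1),
    "matmul": (0, 1),
    "relu": (0,),
}

# Tier 2: cache populated on the fly for user-defined callables.
_dynamic_cache = {}

def tensor_arg_positions(func_name, args, is_tensor):
    """O(1) lookup of which argument slots hold tensors."""
    spec = TENSOR_ARG_POSITIONS.get(func_name)
    if spec is not None:
        return spec                       # static hit: no scan at all
    if func_name in _dynamic_cache:
        return _dynamic_cache[func_name]  # cached hit: no scan either
    # Tier 3: fall back to scanning once (stands in for the BFS crawl),
    # then cache so this fires at most once per unique function.
    spec = tuple(i for i, a in enumerate(args) if is_tensor(a))
    _dynamic_cache[func_name] = spec
    return spec
```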


Detailed Changes: v0.15.11...v0.15.12

v0.15.11

06 Mar 05:48


v0.15.11 (2026-03-06)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • gc: Convert back-references to weakrefs and add optional call stack collection (2bed167)

Convert 5 circular back-references from strong to weakref.ref() so ModelLog and its children (LayerPassLog, LayerLog, ModuleLog, BufferAccessor, LayerAccessor) no longer prevent timely garbage collection. GPU tensors are now freed immediately when the last strong reference to ModelLog is dropped, instead of waiting for Python's gen-2 GC cycle.

Also add save_call_stacks parameter to log_forward_pass() (default True). When False, skips _get_func_call_stack() on every tensor operation, which is the main per-op overhead in production use. Call stacks remain on by default for pedagogical use.

Fixes: GC-2, GC-3, GC-4, PERF-19

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Documentation

  • Add folder-wise CLAUDE.md files and comprehensive inline documentation (deaec2d)

  • Add CLAUDE.md to every package directory (torchlens/, capture/, data_classes/,
    decoration/, postprocess/, utils/, validation/, visualization/, tests/, scripts/,
    .github/) with file maps, key concepts, gotchas, and cross-references

  • Add module-level docstrings, function/class docstrings, and inline comments
    across all 39 source files explaining non-obvious logic, ordering dependencies,
    design decisions, and invariants

  • Fix coverage HTML output directory in pyproject.toml to point to
    tests/test_outputs/reports/coverage_html (matching conftest.py)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.15.10...v0.15.11

v0.15.10

05 Mar 20:40


v0.15.10 (2026-03-05)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • tests: Move coverage HTML output to reports directory (916b2d1)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.15.9...v0.15.10

v0.15.9

05 Mar 14:03


v0.15.9 (2026-03-05)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • tests: Use render_graph return value instead of reading cleaned-up .gv file (7998776)

Commit 147c7b7 added cleanup=True to dot.render(), which deletes the intermediate .gv source file after rendering. The TestVisualizationParams tests were reading that source file and all 15 failed with FileNotFoundError.

render_graph now returns dot.source so tests can inspect the graphviz source without depending on the intermediate file.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
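
The fix can be sketched with a stand-in for graphviz.Digraph (a hypothetical minimal class; real graphviz objects do expose .source and .render(cleanup=...)):

```python
class FakeDigraph:
    """Minimal stand-in for graphviz.Digraph."""

    def __init__(self, source):
        self.source = source

    def render(self, cleanup=False):
        # Real graphviz writes the .gv source file, renders it, and
        # deletes the source file when cleanup=True.
        pass

def render_graph(dot):
    dot.render(cleanup=True)  # the intermediate .gv file is gone after this
    return dot.source         # so hand tests the graphviz source directly
```

Tests then assert on the returned string instead of re-reading a file that cleanup=True has already deleted.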


Detailed Changes: v0.15.8...v0.15.9

v0.15.8

05 Mar 01:12


v0.15.8 (2026-03-05)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • tests: Generate HTML coverage report in sessionfinish hook (29095f9)

The pyproject.toml configured coverage_html output directory but the pytest_sessionfinish hook only generated the text report. Add cov.html_report() call so HTML reports are written alongside the text summary.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
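
A sketch of the hook shape: the pytest_sessionfinish hook name is real, but the way the Coverage object is obtained here (a _cov_object attribute on the config) is purely hypothetical — the real retrieval lives in tests/conftest.py and depends on how pytest-cov is wired up:

```python
def pytest_sessionfinish(session, exitstatus):
    """End-of-session hook: emit text AND HTML coverage reports."""
    cov = getattr(session.config, "_cov_object", None)  # hypothetical handle
    if cov is not None:
        cov.report()       # existing text summary
        cov.html_report()  # the fix: also write the HTML report
```

coverage.py's Coverage objects provide both report() and html_report(); the bug was simply that only the former was called.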


Detailed Changes: v0.15.7...v0.15.8