Releases: johnmarktaylor91/torchlens

v0.16.1

07 Mar 22:00


v0.16.1 (2026-03-07)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • tests: Use relative import for example_models in test_large_graphs (e2d0ae4)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • vis: Harden ELK heap scaling and fix flaky signal safety test (41b9f89)

  • Bump ELK Node.js heap scaling from 8x to 16x JSON size to prevent OOM
    on 250k+ node graphs

  • Mark 100k node tests as @rare (too slow for regular runs)

  • Fix flaky TestSignalSafety: use setitimer(50ms) instead of alarm(1s),
    increase model iterations to 50k, skip if alarm doesn't fire

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.16.0...v0.16.1

v0.16.0

07 Mar 19:59


v0.16.0 (2026-03-07)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • validation: Batch of bug fixes for edge-case models (dca5a7e)

  • Fix vmap/functorch compatibility: skip logging inside functorch transforms
    to avoid missing batching rules (torch_funcs.py)

  • Fix tensor_nanequal infinite recursion: wrap decorated tensor ops (.isinf,
    .resolve_conj, etc.) in pause_logging() (tensor_utils.py)

  • Fix perturbation for range-restricted functions: use uniform random within
    original value range instead of scaled normal (core.py)

  • Fix atomic_bool_val crash inside vmap context (output_tensors.py)

  • Fix output node initialized_inside_model flag (graph_traversal.py)

  • Add scatter_ full-overwrite exemption (exemptions.py)

  • Add max/min indices exemption for integer dtype outputs

  • Add bernoulli scalar-p exemption, constant-output exemption

  • Add non-perturbed parent special-value check for nested args (einsum)

  • Fix buffer_xrefs invariant: accept ancestor module matches

  • Fix real-world model configs: CvT, CLAP, EnCodec, SpeechT5, Informer,
    Autoformer, MobileBERT kwarg names

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • vis: Fix ELK rendering and scale for large graphs (ea96a85)

  • Fix neato -n position format: use points (not inches) — fixes empty
    nodes, missing edges, and invisible labels in ELK output

  • Auto-scale Node.js heap and stack with graph JSON size

  • Auto-scale ELK and neato timeouts with node count

  • Use straight-line edges for graphs > 1k nodes (spline routing is O(n^2))

  • Warn users to use SVG format for graphs > 25k nodes (PDF renders empty)

  • Add dot-vs-ELK aesthetic comparison tests at 15/100/500/1k/3k nodes

  • Add 1M node test (rare marker) for trophy-file rendering

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Features

  • vis: Add ELK layout engine for large graph visualization (5245872)

  • New elk_layout.py: ELK-based node placement via Node.js/elkjs subprocess,
    with graceful fallback to sfdp when unavailable

  • New vis_node_placement parameter ("auto"/"dot"/"elk"/"sfdp") threaded
    through show_model_graph, log_forward_pass, and render_graph

  • Auto mode uses dot for <3500 nodes, ELK (or sfdp fallback) for larger

  • RandomGraphModel in example_models.py: seeded random model generator
    with calibrated node counts (within ~5% of target up to 100k+)

  • 39 tests in test_large_graphs.py: node count accuracy, validation,
    engine selection, ELK utilities, rendering at 3k-100k scales,
    dot threshold benchmark

  • Increased Node.js stack size (--stack-size=65536) to handle graphs
    up to 250k+ nodes without stack overflow

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
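
The auto-mode rule described above can be sketched as a small selector. The function and constant names here are hypothetical (the real logic lives in torchlens' visualization code); the 3500-node threshold and fallback order come from these notes:

```python
DOT_NODE_THRESHOLD = 3500  # per the notes: dot below this, ELK/sfdp above

def choose_layout_engine(num_nodes, elk_available=True, requested="auto"):
    """Pick a layout engine following the vis_node_placement auto-mode rule."""
    if requested != "auto":
        return requested  # explicit "dot"/"elk"/"sfdp" wins
    if num_nodes < DOT_NODE_THRESHOLD:
        return "dot"
    # large graphs: prefer ELK, degrade gracefully to sfdp if Node.js/elkjs
    # is unavailable
    return "elk" if elk_available else "sfdp"
```

This mirrors the documented behavior: small graphs keep Graphviz dot's routing quality, while large ones switch to ELK before dot's runtime becomes prohibitive.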

  • vis: Add hierarchical module grouping to ELK layout (bcc1ad2)

  • New build_elk_graph_hierarchical(): builds nested ELK compound nodes
    from module containment structure (containing_modules_origin_nested)

  • ELK's "INCLUDE_CHILDREN" hierarchy handling preserves module grouping
    in the layout — nodes within the same module cluster together

  • inject_elk_positions() now recurses into compound nodes, accumulating
    absolute positions from nested ELK coordinates

  • render_with_elk() passes entries_to_plot for hierarchical layout,
    falls back to flat DOT parsing when entries not available

  • Tests for hierarchical graph building and nested position injection

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Testing

  • vis: Add rare marker for 250k-node tests, fill test coverage gaps (c749b94)

  • New pytest marker "rare": always excluded by default via addopts,
    run explicitly with pytest -m rare

  • Add 250k node count, validation, and ELK render tests (marked rare)

  • Fill gaps: validation tests at 5k/10k/20k/50k/100k,
    ELK render tests at 5k/20k, node count test at 20k

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.15.15...v0.16.0

v0.15.15

07 Mar 04:55


v0.15.15 (2026-03-07)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • gc: Release ParamLog._param_ref on cleanup, add GC test suite (#GC-1, #GC-12) (83e1bd2)

  • Add ParamLog.release_param_ref() to cache grad metadata then null _param_ref

  • cleanup() now nulls all _param_ref before clearing entries

  • Add ModelLog.release_param_refs() public API for early param release

  • Add _param_logs_by_module to cleanup's internal containers list

  • New test_gc.py with 10 tests covering ModelLog/param GC, memory growth,
    save_new_activations stability, and transient data clearing

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
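
The cache-then-release pattern behind release_param_ref() can be sketched with a toy class. Field and metadata names here are illustrative (the real ParamLog caches grad metadata before nulling _param_ref):

```python
class ParamLog:
    """Toy stand-in for the release-then-cache behavior described above."""

    def __init__(self, param):
        self._param_ref = param   # strong ref keeps the parameter alive
        self.param_shape = None   # metadata cached before release

    def release_param_ref(self):
        # Cache whatever metadata is still needed, *then* drop the strong
        # reference so the parameter tensor can be garbage-collected early.
        if self._param_ref is not None:
            self.param_shape = tuple(self._param_ref.shape)
        self._param_ref = None
```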

Documentation

  • Move RESULTS.md to repo root for visibility (2e814dd)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • tests: Add public test results summary (fd7b33f)

Committed tests/RESULTS.md with suite overview, model compatibility matrix (121 toy + 85 real-world), profiling baselines, and pointers to generated reports. Transparent scoreboard for the repo.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Testing

  • models: Add autoencoders, state space models, and architecture coverage (716cc9d)

Toy models (18 new in example_models.py):

  • Autoencoders: VanillaAutoencoder, ConvAutoencoder, SparseAutoencoder,
    DenoisingAutoencoder, VQVAE, BetaVAE, ConditionalVAE
  • State space: SimpleSSM, SelectiveSSM (Mamba-style), GatedSSMBlock, StackedSSM
  • Additional: SiameseNetwork, MLPMixer, SimpleGCN, SimpleGAT,
    SimpleDiffusion, SimpleNormalizingFlow, CapsuleNetwork

Real-world models (5 new in test_real_world_models.py):

  • SSMs: Mamba, Mamba-2, RWKV, Falcon-Mamba (via transformers)
  • Autoencoders: ViT-MAE ForPreTraining (via transformers)

All 22 new tests pass. Updated RESULTS.md to reflect 736 total tests, 139 toy models, 92 real-world models.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • models: Close remaining stress-test gaps — MAML, NeRF, RecurrentGemma, VOLO (a1f2254)

Toy models (+2):

  • MAMLInnerLoop: higher-order gradients (torch.autograd.grad inside forward)
  • TinyNeRF: differentiable volumetric rendering (ray marching + alpha compositing)

Real-world models (+2):

  • RecurrentGemma: Griffin architecture (linear recurrence + local attention hybrid)
  • VOLO: outlooker attention (distinct from standard self-attention)

Closes 37/38 stress-test patterns from the taxonomy. The only remaining gap is test-time training (TTT layers), which requires gradient computation within inference — fundamentally incompatible with TorchLens forward-pass logging.

Total: 249 toy models, 185 real-world models, 892 tests.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • models: Exhaustive architecture coverage across 30+ categories (8411a72)

Add 32 new toy models (Groups M-R) covering distinct computational patterns:

  • Group M: Attention variants (MQA, GQA, RoPE, ALiBi, slot, cross-attention)

  • Group N: Gating & skip patterns (highway, SE, depthwise-sep,
    inverted-residual, FPN)

  • Group O: Generative & self-supervised (hierarchical VAE, gated conv,
    masked conv, SimCLR, stop-gradient/BYOL, AdaIN)

  • Group P: Exotic architectures (hypernetwork, DEQ, neural ODE, NTM memory,
    SwiGLU)

  • Group Q: Graph neural networks (GraphSAGE, GIN, EdgeConv, graph transformer)

  • Group R: Additional patterns (MoE, spatial transformer, dueling DQN,
    RMS norm, sparse pruning, Fourier mixing)

Add 37 new real-world model tests:

  • Decoder-only LLMs: LLaMA, Mistral, Phi, Gemma, Qwen2, Falcon, BLOOM, OPT

  • Encoder-only: ALBERT, DeBERTa-v2, XLM-RoBERTa

  • Encoder-decoder: Pegasus, LED

  • Efficient transformers: FNet, Nystromformer, BigBird

  • MoE: Mixtral, Switch Transformer

  • Vision transformers: DeiT, CvT, SegFormer

  • Detection: DETR, Mask R-CNN (train+eval)

  • Perceiver IO, PatchTST, Decision Transformer

  • timm: HRNet, EfficientNetV2, LeViT, CrossViT, PVT-v2, Twins-SVT, FocalNet

  • GNN (PyG): GraphSAGE, GIN, Graph Transformer

Total: 805 tests, 213 toy models, 129 real-world tests.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • models: Exhaustive coverage expansion — 20 toy + 33 real-world architectures (4f4e7ae)

Toy models (+20): GRU, NiN, ChannelShuffle, PixelShuffle, PartialConv, FiLM, CoordinateAttention, DifferentialAttention, RelativePositionAttention, EarlyExit, MultiScaleParallel, GumbelVQ, EndToEndMemoryNetwork, RBFNetwork, SIREN, MultiTask, WideAndDeep, ChebGCN, PrototypicalNetwork, ECA.

Real-world models (+33): GPT-J, GPTBigCode, GPT-NeoX, FunnelTransformer, CANINE, MobileBERT, mBART, ProphetNet, WavLM, Data2VecAudio, UniSpeech, ConvNeXt-v2, NFNet, DaViT, CoAtNet, RepVGG, ReXNet, PiT, Visformer, GC-ViT, EfficientFormer, FastViT, NesT, Sequencer2D, TResNet, SigLIP, BLIP-2, Deformable DETR, LayoutLM, TimeSeriesTransformer, ChebConv, SGConv, TAGConv.

Total: 241 toy models, 183 real-world models, 882 tests. RESULTS.md updated with all new entries and pattern coverage table.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • models: Final coverage pass — 6 novel computational patterns (e33c71a)

New toy models targeting genuinely missing graph patterns:

  • LinearAttentionModel: kernel-based phi(Q)(phi(K)^T V), no softmax
  • SimpleFNO: FFT -> learned spectral weights -> iFFT (Fourier Neural Operator)
  • PerceiverModel: cross-attention to fixed learned latent bottleneck
  • ASPPModel: multi-rate parallel dilated convolutions (DeepLab ASPP)
  • ControlNetModel: parallel encoder copy + zero-conv injection
  • SimpleEGNN: E(n) equivariant message passing with coordinate updates

Total: 247 toy models, 183 real-world models, 888 tests. RESULTS.md updated with new patterns and counts.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • models: Gap-fill 8 toy models + 21 real-world models for exhaustive coverage (7d5f879)

Toy models (8 new, 221 total):

  • LeNet5, BiLSTM, Seq2SeqWithAttention, TripletNetwork
  • BarlowTwinsModel, DeepCrossNetwork, AxialAttentionModel, CBAMBlock

Real-world models (21 new, 150 total):

  • TorchVision: MobileNetV3, Keypoint R-CNN (train+eval)
  • timm: Res2Net, gMLP, ResMLP, EVA-02
  • HF decoder-only: OLMo
  • HF vision: DINOv2
  • HF efficient: Longformer, Reformer
  • HF audio: AST, CLAP, EnCodec, SEW, SpeechT5, VITS
  • HF time series: Informer, Autoformer
  • PyG GNN: GATv2, R-GCN

834 total tests, 221 toy models, 150 real-world tests.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.15.14...v0.15.15

v0.15.14

07 Mar 02:02


v0.15.14 (2026-03-07)

This release is published under the GPL-3.0-only License.

Performance Improvements

  • capture: Lazy _fsize_nice properties, remove _trim_and_reorder, batch pause_logging, drop deepcopy (986bd91)

Four low-risk optimizations targeting remaining allocation pressure and per-operation overhead in the instrumentation path:

  1. Lazy _fsize_nice properties — tensor_fsize_nice, grad_fsize_nice, parent_params_fsize_nice, total_params_fsize_nice, params_fsize_nice, and fsize_nice converted from eagerly computed strings to @property methods. Eliminates ~2700 human_readable_size() calls per Swin-T pass.

  2. Remove _trim_and_reorder from postprocess — the OrderedDict rebuild of every LayerPassLog's dict (685 calls, ~0.04s on Swin-T) is purely cosmetic. Python 3.7+ dicts maintain insertion order. Function definition kept for opt-in use.

  3. Batch pause_logging for tensor memory — inline nelement() * element_size() at the two hottest call sites (_build_param_fields, _log_output_tensor_info) inside a single pause_logging() context. Eliminates per-call context manager overhead (~1088 calls).

  4. Remove activation_postfunc deepcopy — copy.deepcopy() on a callable is unnecessary; callables are effectively immutable.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
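
Item 1's eager-to-lazy conversion is the standard @property pattern; a minimal sketch with a stand-in formatter (names are illustrative, not torchlens' actual code):

```python
def human_readable_size(n):
    """Minimal stand-in for a byte-count formatter."""
    for unit in ("B", "KB", "MB", "GB"):
        if n < 1024:
            return f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} TB"

class LayerPassLog:
    def __init__(self, tensor_fsize):
        self.tensor_fsize = tensor_fsize
        # Before: self.tensor_fsize_nice = human_readable_size(tensor_fsize)
        # was computed eagerly for every logged operation.

    @property
    def tensor_fsize_nice(self):
        # After: the string is formatted only when somebody reads it,
        # so thousands of formatter calls vanish from the hot path.
        return human_readable_size(self.tensor_fsize)
```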


Detailed Changes: v0.15.13...v0.15.14

v0.15.13

07 Mar 01:28


v0.15.13 (2026-03-07)

This release is published under the GPL-3.0-only License.

Performance Improvements

  • capture: Speed-optimized defaults and remaining bottleneck elimination (#110, 7c9a00c)

Seven targeted optimizations that reduce Swin-T log_forward_pass from 5.91s to 1.55s (3.8x):

  1. Unified save_source_context flag (was save_call_stacks) — controls both
     per-function call stacks AND module source/signature fetching.
     Default: False.

  2. save_rng_states=False default — skips per-op RNG state capture.
     Auto-enabled by validate_forward_pass. Uses torch_only=True when enabled
     (skips Python/NumPy RNG).

  3. Inline isinstance in wrapped_func — _collect_tensor_args() and
     _collect_output_tensors() replace BFS crawls for flat arg/output cases.
     Falls back to BFS only for nested containers.

  4. __dict__ scan for buffer prep/cleanup — replaces iter_accessible_attributes
     (dir() + MRO walk) with direct __dict__ iteration. 10x faster for buffer
     tagging and tensor cleanup.

  5. Hoisted warnings.catch_warnings() — moved from per-attribute (46K entries)
     to caller level.

  6. Lazy module metadata — _get_class_metadata skips
     inspect.getsourcelines/inspect.signature when save_source_context=False.
     Only captures class name and docstrings.

  7. Module-level import weakref — moved from per-call in _trim_and_reorder
     to module level.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
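
Item 3's flat fast path with BFS fallback can be sketched as follows (helper names are hypothetical; the real versions live in torchlens' capture code):

```python
def collect_tensors_fast(args, is_tensor):
    """Gather tensor arguments, avoiding a full crawl for flat calls."""
    # Fast path: no containers among the top-level args — one isinstance
    # check per argument is enough.
    if not any(isinstance(a, (list, tuple, dict, set)) for a in args):
        return [a for a in args if is_tensor(a)]
    return _bfs_collect(args, is_tensor)  # nested containers: full crawl

def _bfs_collect(args, is_tensor):
    """Breadth-first crawl through nested containers (the slow fallback)."""
    found, queue = [], list(args)
    while queue:
        item = queue.pop(0)
        if is_tensor(item):
            found.append(item)
        elif isinstance(item, dict):
            queue.extend(item.values())
        elif isinstance(item, (list, tuple, set)):
            queue.extend(item)
    return found
```

Since the overwhelming majority of torch calls take flat tensor arguments, the fast path fires almost always and the BFS cost is paid only for genuinely nested inputs.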


Detailed Changes: v0.15.12...v0.15.13

v0.15.12

06 Mar 23:49


v0.15.12 (2026-03-06)

This release is published under the GPL-3.0-only License.

Performance Improvements

  • capture: O(1) tensor/param extraction via per-function ArgSpec lookup table (b1d6c56)

Replace expensive 3-level BFS crawl (~1.44s, 39% self-time, ~1.9M getattr calls) with O(1) position-based lookups using a static ArgSpec table of 350+ entries. Three-tier strategy: static table for known torch functions, dynamic cache for user-defined modules, BFS fallback (fires at most once per unique class).

Also hoists warnings.catch_warnings() from per-attribute (~77K entries) to per-call level, and adds usage stats collection + coverage test infrastructure.

Benchmark: Swin-T log_forward_pass 5.91s → 4.41s (-25%).

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
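
The three-tier lookup strategy can be sketched as follows (the table contents and function names here are illustrative, not the real 350-entry ArgSpec table):

```python
# Tier 1: static table of tensor argument positions for known functions.
TENSOR_ARG_POSITIONS = {
    "add": (0, 1),
    "matmul": (0, 1),
    "relu": (0,),
}

# Tier 2: cache populated on the fly for user-defined callables.
_dynamic_cache = {}

def tensor_arg_positions(func_name, args, is_tensor):
    """O(1) lookup of which argument slots hold tensors."""
    spec = TENSOR_ARG_POSITIONS.get(func_name)
    if spec is not None:
        return spec                       # static hit: no scan at all
    if func_name in _dynamic_cache:
        return _dynamic_cache[func_name]  # cached hit: no scan either
    # Tier 3: fall back to scanning once (stands in for the BFS crawl),
    # then cache so this fires at most once per unique function.
    spec = tuple(i for i, a in enumerate(args) if is_tensor(a))
    _dynamic_cache[func_name] = spec
    return spec
```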


Detailed Changes: v0.15.11...v0.15.12

v0.15.11

06 Mar 05:48


v0.15.11 (2026-03-06)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • gc: Convert back-references to weakrefs and add optional call stack collection (2bed167)

Convert 5 circular back-references from strong to weakref.ref() so ModelLog and its children (LayerPassLog, LayerLog, ModuleLog, BufferAccessor, LayerAccessor) no longer prevent timely garbage collection. GPU tensors are now freed immediately when the last strong reference to ModelLog is dropped, instead of waiting for Python's gen-2 GC cycle.

Also add save_call_stacks parameter to log_forward_pass() (default True). When False, skips _get_func_call_stack() on every tensor operation, which is the main per-op overhead in production use. Call stacks remain on by default for pedagogical use.

Fixes: GC-2, GC-3, GC-4, PERF-19

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Documentation

  • Add folder-wise CLAUDE.md files and comprehensive inline documentation (deaec2d)

  • Add CLAUDE.md to every package directory (torchlens/, capture/, data_classes/,
    decoration/, postprocess/, utils/, validation/, visualization/, tests/, scripts/,
    .github/) with file maps, key concepts, gotchas, and cross-references

  • Add module-level docstrings, function/class docstrings, and inline comments
    across all 39 source files explaining non-obvious logic, ordering dependencies,
    design decisions, and invariants

  • Fix coverage HTML output directory in pyproject.toml to point to
    tests/test_outputs/reports/coverage_html (matching conftest.py)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.15.10...v0.15.11

v0.15.10

05 Mar 20:40


v0.15.10 (2026-03-05)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • tests: Move coverage HTML output to reports directory (916b2d1)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.15.9...v0.15.10

v0.15.9

05 Mar 14:03


v0.15.9 (2026-03-05)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • tests: Use render_graph return value instead of reading cleaned-up .gv file (7998776)

Commit 147c7b7 added cleanup=True to dot.render(), which deletes the intermediate .gv source file after rendering. The TestVisualizationParams tests were reading that source file and all 15 failed with FileNotFoundError.

render_graph now returns dot.source so tests can inspect the graphviz source without depending on the intermediate file.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
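
The fix can be sketched with a stand-in for graphviz.Digraph (a hypothetical minimal class; real graphviz objects do expose .source and .render(cleanup=...)):

```python
class FakeDigraph:
    """Minimal stand-in for graphviz.Digraph."""

    def __init__(self, source):
        self.source = source

    def render(self, cleanup=False):
        # Real graphviz writes the .gv source file, renders it, and
        # deletes the source file when cleanup=True.
        pass

def render_graph(dot):
    dot.render(cleanup=True)  # the intermediate .gv file is gone after this
    return dot.source         # so hand tests the graphviz source directly
```

Tests then assert on the returned string instead of re-reading a file that cleanup=True has already deleted.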


Detailed Changes: v0.15.8...v0.15.9

v0.15.8

05 Mar 01:12


v0.15.8 (2026-03-05)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • tests: Generate HTML coverage report in sessionfinish hook (29095f9)

The pyproject.toml configured coverage_html output directory but the pytest_sessionfinish hook only generated the text report. Add cov.html_report() call so HTML reports are written alongside the text summary.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
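
A sketch of the hook shape: the pytest_sessionfinish hook name is real, but the way the Coverage object is obtained here (a _cov_object attribute on the config) is purely hypothetical — the real retrieval lives in tests/conftest.py and depends on how pytest-cov is wired up:

```python
def pytest_sessionfinish(session, exitstatus):
    """End-of-session hook: emit text AND HTML coverage reports."""
    cov = getattr(session.config, "_cov_object", None)  # hypothetical handle
    if cov is not None:
        cov.report()       # existing text summary
        cov.html_report()  # the fix: also write the HTML report
```

coverage.py's Coverage objects provide both report() and html_report(); the bug was simply that only the former was called.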


Detailed Changes: v0.15.7...v0.15.8