Releases: johnmarktaylor91/torchlens
v1.0.2
v1.0.2 (2026-04-05)
This release is published under the GPL-3.0-only License.
Bug Fixes
Python 3.14 (PEP 649) evaluates annotations lazily. During decorate_all_once(), wrapping Tensor.bool before inspecting Tensor.dim_order caused inspect.signature() to resolve bool in bool | list[torch.memory_format] to the wrapper function instead of the builtin type, raising TypeError on first call only.
Three fixes:
- Catch TypeError alongside ValueError in get_func_argnames (safety net)
- Split decorate_all_once() into two passes: collect argnames from the pristine namespace first, then decorate (eliminates the root cause)
- Replace the _orig_to_decorated idempotency guard with an _is_decorated flag, so a partial decoration failure allows a retry instead of locking in an incomplete state
Adds 6 new tests; gotchas.md updated.
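The two-pass shape of the fix can be sketched with a toy namespace (`API`, `make_wrapper`, and this `decorate_all_once` are illustrative stand-ins, not torchlens internals): signatures are inspected while every name still points at its original object, and only then are the wrappers installed.

```python
import inspect

class API:
    # Hypothetical stand-in for the torch namespace being decorated.
    @staticmethod
    def scale(x, factor=2):
        return x * factor

    @staticmethod
    def shift(x, offset=0):
        return x + offset

def decorate_all_once(namespace, names, make_wrapper):
    # Pass 1: collect argnames from the pristine namespace, before any
    # wrapper shadows a name that a lazily-evaluated annotation (PEP 649)
    # might resolve against.
    argnames = {}
    for name in names:
        try:
            argnames[name] = list(inspect.signature(getattr(namespace, name)).parameters)
        except (ValueError, TypeError):  # some builtins are not introspectable
            argnames[name] = []
    # Pass 2: decorate, now that all signature inspection is done.
    for name in names:
        setattr(namespace, name, make_wrapper(getattr(namespace, name)))
    return argnames
```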
Detailed Changes: v1.0.1...v1.0.2
v1.0.1
v1.0.1 (2026-03-23)
This release is published under the GPL-3.0-only License.
Bug Fixes
- decoration: Clear stale sq_item C slot after wrapping Tensor.__getitem__ (b2c6085)
When __getitem__ is replaced on a C extension type with a Python function, CPython sets the sq_item slot in tp_as_sequence. This makes PySequence_Check(tensor) return True (it was False in clean PyTorch), causing torch.tensor([0-d_tensor, ...]) to iterate elements as sequences and call len() -- which raises TypeError for 0-d tensors. The slot is never cleared by restoring the original wrapper_descriptor or by delattr.
Fix: null sq_item via ctypes after every decoration/undecoration cycle (decorate_all_once, unwrap_torch, wrap_torch). Safe because tensor indexing uses mp_subscript (mapping protocol), not sq_item (sequence protocol). Verified via tp_name guard; fails silently on non-CPython.
Adds 9 regression tests covering all lifecycle paths.
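The slot mechanics can be observed from pure Python via the C API (a minimal demonstration of the mechanism, not the torchlens fix itself): defining `__getitem__` on a class fills sq_item, which is exactly what flips PySequence_Check.

```python
import ctypes

# PySequence_Check reports whether a type's sq_item slot is filled
# (CPython-only; this mirrors the check torch.tensor() relies on).
PySequence_Check = ctypes.pythonapi.PySequence_Check
PySequence_Check.argtypes = [ctypes.py_object]
PySequence_Check.restype = ctypes.c_int

class Plain:
    pass

class Subscriptable:
    def __getitem__(self, i):
        return i

# Plain has no sq_item slot; Subscriptable's __getitem__ filled one,
# so CPython now treats its instances as sequences.
```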
Chores
- Add secret detection pre-commit hooks (0e2889a)
Add detect-private-key (pre-commit-hooks) and detect-secrets (Yelp) to catch leaked keys, tokens, and high-entropy strings before they hit the repo.
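A typical configuration for these two hooks looks like the following (the revs shown are illustrative; pin whatever versions the repo actually uses):

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: detect-private-key
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.5.0
    hooks:
      - id: detect-secrets
```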
Detailed Changes: v1.0.0...v1.0.1
v1.0.0
v1.0.0 (2026-03-13)
This release is published under the GPL-3.0-only License.
Bug Fixes
- decoration: Cast mode.device to str for mypy return-value check (45c0ff3)
CI mypy (stricter torch stubs) catches that mode.device returns torch.device, not str. Explicit str() cast satisfies the Optional[str] return type annotation.
Features
- decoration: Lazy wrapping — import torchlens has no side effects (b5da8b8)
BREAKING CHANGE: torch functions are no longer wrapped at import time. Wrapping happens lazily on first log_forward_pass() call and persists.
Three changes:
- Lazy decoration: removed decorate_all_once() / patch_detached_references() calls from __init__.py. _ensure_model_prepared() triggers wrapping on first use via wrap_torch().
- Public wrap/unwrap API:
  - torchlens.wrap_torch() — install wrappers (idempotent)
  - torchlens.unwrap_torch() — restore original torch callables
  - torchlens.wrapped() — context manager (wrap on enter, unwrap on exit)
  - log_forward_pass(unwrap_when_done=True) — one-shot convenience

  Old names (undecorate_all_globally, redecorate_all_globally) kept as internal aliases.
- torch.identity fix: the decorated identity function is now stored on _state._decorated_identity instead of monkey-patching torch.identity (which doesn't exist in PyTorch type stubs). Eliminates 2 mypy errors.
Tests updated: 75 pass including 12 new lifecycle tests.
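The lifecycle the new API implies can be sketched generically (the names below are illustrative, not the torchlens implementation): an idempotent wrap, a restoring unwrap, and a context manager composing the two.

```python
from contextlib import contextmanager

class _State:
    # Module-level bookkeeping: original callables plus a wrapped flag.
    is_wrapped = False
    originals = {}

def wrap(namespace, names, make_wrapper, state=_State):
    if state.is_wrapped:  # idempotent: a second call is a no-op
        return
    for name in names:
        state.originals[name] = getattr(namespace, name)
        setattr(namespace, name, make_wrapper(state.originals[name]))
    state.is_wrapped = True

def unwrap(namespace, state=_State):
    if not state.is_wrapped:
        return
    for name, orig in state.originals.items():
        setattr(namespace, name, orig)
    state.originals.clear()
    state.is_wrapped = False

@contextmanager
def wrapped(namespace, names, make_wrapper, state=_State):
    # Wrap on enter, unwrap on exit -- even if the body raises.
    wrap(namespace, names, make_wrapper, state)
    try:
        yield
    finally:
        unwrap(namespace, state)
```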
Breaking Changes
- decoration: Torch functions are no longer wrapped at import time. Wrapping happens lazily on first log_forward_pass() call and persists.
Detailed Changes: v0.22.0...v1.0.0
v0.22.0
v0.22.0 (2026-03-13)
This release is published under the GPL-3.0-only License.
Chores
- types: Remove stale typing noise (5ba099d)
Documentation
- maintenance: Refresh maintainer notes (685a358)
- maintenance: Split CLAUDE.md/AGENTS.md into architect vs implementation roles (18f8ae9)
Break the symlink mirroring convention: CLAUDE.md now holds architect-level context (what, why, how it connects) while AGENTS.md holds implementation-level context (conventions, gotchas, known bugs, test commands). Pure-implementation subdirs (.github, scripts, tests, utils) get AGENTS.md only. Also populates .project-context/ templates (architecture, conventions, gotchas, decisions).
Refactoring
- types: Finish package mypy cleanup (6f9a3fe)
Detailed Changes: v0.21.3...v0.22.0
v0.21.3
v0.21.3 (2026-03-11)
This release is published under the GPL-3.0-only License.
Bug Fixes
- tests: Make SIGALRM signal safety test deterministic (b3fc461)
Replace timer-based SIGALRM with direct os.kill() inside forward() so the signal always fires mid-logging. Eliminates flaky skips when the forward pass completes before the timer.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
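The pattern (a minimal POSIX-only sketch, not the torchlens test itself) is to raise the signal synchronously from inside the function under test:

```python
import os
import signal

fired = []

def handler(signum, frame):
    fired.append(signum)

signal.signal(signal.SIGALRM, handler)

def forward():
    # Deterministic: the signal is raised mid-call, not by a racing timer.
    os.kill(os.getpid(), signal.SIGALRM)
    return "done"
```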
Detailed Changes: v0.21.2...v0.21.3
v0.21.2
v0.21.2 (2026-03-09)
This release is published under the GPL-3.0-only License.
Bug Fixes
- vis: Avoid graphviz.Digraph memory bomb when ELK fails on large graphs (f5563ee)
When ELK layout fails (OOM/timeout) on 1M+ node graphs, the fallback path previously built a graphviz.Digraph in Python — nested subgraph body-list copies exploded memory and hung indefinitely. Now render_elk_direct handles the failure internally: reuses already-collected Phase 1 data to generate DOT text without positions and renders directly with sfdp, bypassing graphviz.Digraph entirely.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
- vis: Bypass ELK for large graphs — use Python topological layout (37cce3a)
ELK's stress algorithm allocates TWO O(n²) distance matrices (n² × 16 bytes). At 100k nodes that's 160 GB, at 1M nodes it's 16 TB — the root cause of the std::bad_alloc. The old >150k stress switch could never work.
For graphs above 100k nodes, we now skip ELK entirely and compute a topological rank layout in Python (Kahn's algorithm, O(n+m)). Module bounding boxes are computed from node positions. The result feeds into the same neato -n rendering path, preserving cluster boxes.
If ELK fails for smaller graphs, the Python layout is also used as a fallback instead of the old sfdp path that built a graphviz.Digraph (which exploded on nested subgraph body-list copies).
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
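A minimal version of the rank computation (illustrative, not the torchlens code) using Kahn's algorithm, where a node's rank is its longest-path depth from any source:

```python
from collections import deque

def topological_ranks(nodes, edges):
    # Kahn's algorithm, O(n + m): process nodes in topological order,
    # pushing each child one rank below its deepest parent.
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1
    rank = {n: 0 for n in nodes}
    queue = deque(n for n in nodes if indegree[n] == 0)
    while queue:
        u = queue.popleft()
        for v in children[u]:
            rank[v] = max(rank[v], rank[u] + 1)
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return rank
```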
Detailed Changes: v0.21.1...v0.21.2
v0.21.1
v0.21.1 (2026-03-09)
This release is published under the GPL-3.0-only License.
Bug Fixes
- postprocess: Fix mypy type errors in _build_module_param_info (11ea006)
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
Chores
- Trigger CI (99f4102)
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
Performance Improvements
- postprocess: Optimize pipeline for large models (a211417)
  - Per-step verbose timing: unwrap grouped _vtimed blocks into individual step timing with a graph-stats summary, enabling users to identify which specific step is slow (O16)
  - Cache module_str by containing_modules tuple to avoid redundant string joins in Step 6 (O8)
  - Early-continue guards in _undecorate_all_saved_tensors to skip BFS on layers with empty captured_args/kwargs (O5)
  - Pre-compute a buffer_layers_by_module dict in _build_module_logs, eliminating the O(modules × buffers) scan per module (O6)
  - Single-pass arglist rebuild in the Step 11 rename, replacing the 3-pass enumerate + index set + filter pattern (O2)
  - Replace OrderedDict with dict in _trim_and_reorder (Python 3.7+ preserves insertion order) for lower allocation overhead (O4)
  - Reverse-index approach in _refine_iso_groups: O(members × neighbors) instead of O(members²) all-pairs combinations (O9)
  - Pre-compute param types per subgraph as a frozenset before the pair loop in _merge_iso_groups_to_layers (O10)
  - Set-based O(n) collision detection replacing O(n²) .count() calls in _find_isomorphic_matches (O12)
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
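As one example, the O12 change replaces per-element `.count()` scans with a single counting pass (a generic sketch, not the torchlens function):

```python
from collections import Counter

def find_collisions(items):
    # One O(n) counting pass instead of calling items.count(x) for each
    # element, which is O(n) per call and O(n^2) overall.
    counts = Counter(items)
    return [x for x in items if counts[x] > 1]
```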
Detailed Changes: v0.21.0...v0.21.1
v0.21.0
v0.21.0 (2026-03-09)
This release is published under the GPL-3.0-only License.
Bug Fixes
- capture: Fix mypy type errors in output_tensors field dict (d54e9a9)
Annotate fields_dict as Dict[str, Any] and extract param_shapes with proper type to satisfy mypy strict inference.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
- vis: Pass heap limits to ELK Worker thread to prevent OOM on 1M nodes (23ef8d8)
The Node.js Worker running ELK layout had no explicit maxOldGenerationSizeMb in its resourceLimits — only stackSizeMb was set. The --max-old-space-size flag controls the main thread's V8 isolate, not the Worker's. This caused the Worker to OOM at ~16GB on 1M-node graphs despite the main thread being configured for up to 64GB.
- Add maxOldGenerationSizeMb and maxYoungGenerationSizeMb to Worker resourceLimits, passed via the _TL_HEAP_MB env var
- Add _available_memory_mb() to detect system RAM and cap heap allocation to (available - 4GB), preventing competition with the Python process
- Include available system memory in OOM diagnostic messages
Also includes field/param renames from feat/grand-rename branch.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
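A hedged sketch of the RAM-capping idea (Linux-only, reading /proc/meminfo; the actual _available_memory_mb may differ):

```python
def available_memory_mb(reserve_mb=4096):
    # Read MemAvailable (Linux) and keep reserve_mb of headroom so the
    # worker heap cannot starve the Python process.
    try:
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemAvailable:"):
                    kb = int(line.split()[1])
                    return max(kb // 1024 - reserve_mb, 0)
    except OSError:
        pass
    return 0  # unknown platform: let the caller fall back to a default
```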
Documentation
- Update all CLAUDE.md files with deepdive session 4 findings (b15c5bf)
Sync all project and subpackage documentation with current codebase:
- Updated line counts across all 36 modules
- Added elk_layout.py documentation to visualization/
- Added arg_positions.py and salient_args.py to capture/
- Documented 13 new bugs (ELK-IF-THEN, BFLOAT16-TOL, etc.)
- Updated test counts (1,004 tests across 16 files)
- Added known bugs sections to validation/, utils/, decoration/
- Updated data_classes/ with new fields and properties
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
Features
- Rename all data structure fields and function args for clarity (f0d7452)
Rename ~68 fields across all 8 data structures (ModelLog, LayerPassLog, LayerLog, ParamLog, ModuleLog, BufferLog, ModulePassLog, FuncCallLocation) plus user-facing function arguments. Key changes:
- tensor_contents → activation, grad_contents → gradient
- All _fsize → _memory (e.g. tensor_fsize → tensor_memory)
- func_applied_name → func_name, gradfunc → grad_fn_name
- is_bottom_level_submodule_output → is_leaf_module_output
- containing_module_origin → containing_module
- spouse_layers → co_parent_layers, orig_ancestors → root_ancestors
- model_is_recurrent → is_recurrent, elapsed_time_* → time_*
- vis_opt → vis_mode, save_only → vis_save_only
- Fix typo: output_descendents → output_descendants
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
Detailed Changes: v0.20.5...v0.21.0
v0.20.5
v0.20.5 (2026-03-09)
This release is published under the GPL-3.0-only License.
Bug Fixes
The 1M-node render was OOM-killed at ~74GB RSS because:
1. Model params (~8-10GB) stayed alive during the ELK subprocess
2. preexec_fn forced fork+exec, COW-doubling the 74GB process
3. Heap/stack formulas produced absurd values (5.6TB heap, 15GB stack)
4. No memory cleanup happened before subprocess launch
Changes:
- render_large_graph.py: separate log_forward_pass from render_graph; free the model and autograd state before the ELK render
- elk_layout.py: cap heap at 64GB; floor stack at 4096MB and cap at 8192MB; write JSON to a temp file (freeing the string before the subprocess); gc.collect before the subprocess; set RLIMIT_STACK at module level (removing preexec_fn and the forced fork+exec)
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
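The launch hygiene can be sketched as follows (generic, with a configurable command so it is not tied to the Node layout script):

```python
import gc
import json
import subprocess
import tempfile

def launch_with_temp_json(payload, cmd):
    # Serialize to a temp file instead of piping a giant string, drop the
    # local reference, and collect before forking so the child does not
    # inherit (and COW-duplicate) a bloated heap.
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(payload, f)
        path = f.name
    del payload
    gc.collect()
    return subprocess.run(cmd + [path], capture_output=True, text=True)
```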
Detailed Changes: v0.20.4...v0.20.5
v0.20.4
v0.20.4 (2026-03-09)
This release is published under the GPL-3.0-only License.
Bug Fixes
Bug #88: _mark_conditional_branches flooded bidirectionally (through both parents and children), causing children of non-conditional nodes to be falsely marked as in_cond_branch. The fix restricts flooding to parent_layers only.
Additionally adds THEN branch detection via AST analysis when save_source_context=True, with IF/THEN edge labels in visualization. Includes 8 new test models, 22 new tests, and fixes missing 'verbose' in MODEL_LOG_FIELD_ORDER.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
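The corrected flood direction can be sketched as follows (illustrative names; the real traversal lives in _mark_conditional_branches):

```python
from collections import deque

def mark_cond_branches(seed_layers, parent_layers):
    # Flood only through parents (the bug #88 fix): children of
    # non-conditional nodes are never reached, so they cannot be
    # falsely marked.
    marked = set(seed_layers)
    queue = deque(seed_layers)
    while queue:
        layer = queue.popleft()
        for parent in parent_layers.get(layer, ()):
            if parent not in marked:
                marked.add(parent)
                queue.append(parent)
    return marked
```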
- vis: Use Worker thread for ELK layout to fix stack overflow on large graphs (3fe6a84)
V8's --stack-size flag silently caps at values well below what's requested, causing "Maximum call stack size exceeded" on 1M+ node graphs. Switch to Node.js Worker threads with resourceLimits.stackSizeMb, which reliably delivers the requested stack size at the V8 isolate level.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
Detailed Changes: v0.20.3...v0.20.4