Releases: johnmarktaylor91/torchlens

v0.7.1 (2026-02-28)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • logging: Handle complex-dtype tensors in tensor_nanequal (fe58f25)

torch.nan_to_num does not support complex tensors, which caused test_qml to fail when PennyLane quantum ops produced complex outputs. Use view_as_real/view_as_complex to handle NaN replacement for complex dtypes.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
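The workaround can be sketched as follows; `nan_to_num_any_dtype` is a hypothetical helper name illustrating the view_as_real/view_as_complex round-trip, not the exact code in fe58f25:

```python
import torch

def nan_to_num_any_dtype(t: torch.Tensor) -> torch.Tensor:
    # torch.nan_to_num rejects complex tensors, so route complex dtypes
    # through view_as_real -> nan_to_num -> view_as_complex instead.
    if t.is_complex():
        return torch.view_as_complex(torch.nan_to_num(torch.view_as_real(t)))
    return torch.nan_to_num(t)
```

view_as_real exposes the real/imaginary planes as a trailing size-2 float dimension, so the standard nan_to_num applies; nan_to_num returns a fresh contiguous tensor, which view_as_complex accepts.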


Detailed Changes: v0.7.0...v0.7.1

v0.7.0 (2026-02-28)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • logging: Capture/restore RNG state for two-pass stochastic models (#58, 2b3079f)

Move RNG state capture/restore before pytorch decoration to prevent internal .clone() calls from being intercepted by torchlens' decorated torch functions. Also speed up test_stochastic_loop by using a higher starting value.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
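The two-pass idea can be sketched as below (a simplified stand-in for torchlens' internal flow; the point is that the RNG state is captured before torch gets decorated, so the `.clone()` inside the state calls is never intercepted):

```python
import torch

def two_pass_outputs(model_fn, x):
    # Capture the RNG state BEFORE any torch decoration happens, so the
    # .clone() calls inside get/set_rng_state are not traced as ops.
    state = torch.random.get_rng_state()
    first = model_fn(x)             # pass 1 (e.g. exhaustive logging)
    torch.random.set_rng_state(state)
    second = model_fn(x)            # pass 2 replays identical randomness
    return first, second
```

With the state restored between passes, a stochastic model produces bit-identical outputs on both passes.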

  • logging: Ensure output layer parents are saved with layers_to_save (#46, a3c74fa)

When layers_to_save is a subset, the fast pass now automatically includes parents of output layers in the save list. This ensures output layer tensor_contents is populated in postprocess_fast (which copies from parent).

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • logging: Replace copy.deepcopy with safe_copy to prevent infinite loops (#18, aa841fe)

copy.deepcopy hangs on complex tensor wrappers with circular references (e.g. ESCNN GeometricTensor). Replace with safe_copy_args/safe_copy_kwargs that clone tensors, recurse into standard containers, and leave other objects as references.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
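A minimal sketch of the safe-copy strategy (the real safe_copy_args/safe_copy_kwargs live in torchlens; this mirrors the description above, not their exact code):

```python
import torch

def safe_copy(obj):
    # Clone tensors, recurse into standard containers, and leave anything
    # else (e.g. wrappers with circular references) as a shared reference,
    # sidestepping copy.deepcopy's infinite-loop hazard.
    if isinstance(obj, torch.Tensor):
        return obj.clone()
    if isinstance(obj, (list, tuple)):
        return type(obj)(safe_copy(o) for o in obj)
    if isinstance(obj, dict):
        return {k: safe_copy(v) for k, v in obj.items()}
    return obj

def safe_copy_args(args, kwargs):
    # Convenience wrapper for a full (args, kwargs) call record.
    return tuple(safe_copy(a) for a in args), {k: safe_copy(v) for k, v in kwargs.items()}
```

Exotic wrappers like ESCNN's GeometricTensor hit the final `return obj` branch and are kept by reference rather than traversed.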

  • logging: Use argspec to disambiguate tuple/list input args (#43, 34f6687)

When a model's forward() expects a single arg that IS a tuple/list of tensors, torchlens incorrectly unpacked it into multiple positional args. Now uses inspect.getfullargspec to detect single-arg models and wraps the tuple/list as a single arg. Also handles immutable tuples in _fetch_label_move_input_tensors device-move logic.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
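The disambiguation logic can be sketched like this; `pack_positional_args` is an illustrative helper name, not torchlens' actual function:

```python
import inspect

import torch

def pack_positional_args(model: torch.nn.Module, input_args):
    # If forward() declares exactly one positional parameter (besides
    # 'self') and the caller supplied a tuple/list, keep it as a SINGLE
    # argument instead of unpacking it into multiple positional args.
    spec = inspect.getfullargspec(model.forward)
    n_params = len(spec.args) - 1  # drop bound 'self'
    if isinstance(input_args, (tuple, list)) and n_params == 1 and not spec.varargs:
        return (input_args,)
    return tuple(input_args)
```

A model whose `forward(self, pair)` takes one tuple-valued argument gets the whole tuple as one arg; a `forward(self, a, b)` model still receives two.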

  • vis: Functional ops at end of container modules rendered as ovals (#48, f886b36)

_check_if_only_non_buffer_in_module was too broad — it returned True for functional ops (like torch.relu) at the end of container modules with child submodules, causing them to render as boxes. Added a leaf-module check: only apply box rendering for modules with no child submodules.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
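The leaf-module check amounts to a one-liner (illustrative helper name, not the internal `_check_if_only_non_buffer_in_module` itself):

```python
import torch.nn as nn

def is_leaf_module(module: nn.Module) -> bool:
    # Box rendering should apply only to leaf modules; a functional op at
    # the tail of a container with child submodules keeps its oval.
    return next(module.children(), None) is None
```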

Features

  • metadata: Track module training mode per submodule (#52, 6ede005)

Capture module.training in module_forward_decorator and store in ModelHistory.module_training_modes dict (keyed by module address). This lets users check whether each submodule was in train or eval mode during the forward pass.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
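A standalone sketch of the same bookkeeping (the real capture happens inside module_forward_decorator; this just shows the address-keyed dict shape):

```python
import torch.nn as nn

def collect_training_modes(model: nn.Module) -> dict:
    # Snapshot each submodule's train/eval flag keyed by its attribute
    # address, mirroring ModelHistory.module_training_modes.
    return {name or "<root>": m.training for name, m in model.named_modules()}
```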


Detailed Changes: v0.6.2...v0.7.0

v0.6.2 (2026-02-28)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • postprocess: Correct _merge_buffer_entries call convention (#47, 0e539c4)

_merge_buffer_entries is a module-level function, not a method on ModelHistory. Fixes AttributeError when processing recurrent models with duplicate buffer entries.

Based on gilmoright's contribution in PR #56.


Detailed Changes: v0.6.1...v0.6.2

v0.6.1 (2026-02-28)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • tensor_log: Use getattr default in TensorLogEntry.copy() (538288d)

Prevents AttributeError when copying entries that predate newly added fields (e.g., deserialized from an older version).

Co-Authored-By: whisperLiang whisperLiang@users.noreply.github.com
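The failure mode and fix can be illustrated with a stub (field names here are hypothetical, standing in for TensorLogEntry's real field list):

```python
class TensorLogEntryStub:
    # Fields added in newer torchlens versions may be absent on entries
    # deserialized from older pickles.
    FIELDS = ("tensor_contents", "flops_forward")  # hypothetical field list

    def copy(self):
        new = TensorLogEntryStub()
        for field in self.FIELDS:
            # getattr with a default tolerates missing (older) fields
            # instead of raising AttributeError.
            setattr(new, field, getattr(self, field, None))
        return new
```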


Detailed Changes: v0.6.0...v0.6.1

v0.6.0 (2026-02-28)

This release is published under the GPL-3.0-only License.

Features

  • flops: Add per-layer FLOPs computation for forward and backward passes (31b43e6)

Compute forward and backward FLOPs at logging time for every traced operation. Uses category-based dispatch: zero-cost ops (view, reshape, etc.), element-wise ops with per-element cost, and specialty handlers for matmul, conv, normalization, pooling, reductions, and loss functions. Unknown ops return None rather than guessing.

  • New torchlens/flops.py with compute_forward_flops / compute_backward_flops
  • ModelHistory gains total_flops_forward, total_flops_backward, and total_flops properties plus a flops_by_type() method
  • TensorLogEntry and RolledTensorLogEntry gain flops_forward / flops_backward fields
  • 28 new tests in test_metadata.py (unit + integration)
  • scripts/check_flops_coverage.py dev utility for auditing op coverage
  • Move test_video_r2plus1_18 to the slow test file

Based on whisperLiang's contribution in PR #53.

Co-Authored-By: whisperLiang whisperLiang@users.noreply.github.com
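The category-based dispatch can be sketched as below. The cost tables and handler set are illustrative only, not torchlens' actual coefficients:

```python
ZERO_COST = {"view", "reshape", "permute", "transpose"}            # metadata-only ops
ELEMENTWISE_COST = {"relu": 1, "add": 1, "mul": 1, "sigmoid": 4}   # FLOPs per output element

def forward_flops(op_name, out_numel, matmul_dims=None):
    # Dispatch by category: zero-cost ops, per-element ops, a specialty
    # handler (matmul shown here), and None for unknown ops rather than
    # guessing.
    if op_name in ZERO_COST:
        return 0
    if op_name in ELEMENTWISE_COST:
        return ELEMENTWISE_COST[op_name] * out_numel
    if op_name == "matmul":
        m, k, n = matmul_dims  # (m, k) @ (k, n): k mults + (k - 1) adds per output
        return 2 * m * k * n
    return None
```

Returning None for unrecognized ops keeps the totals honest: a missing handler shows up as uncounted rather than silently miscounted.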


Detailed Changes: v0.5.0...v0.6.0

v0.5.0 (2026-02-28)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • logging: Revert children_tensor_versions to proven simpler detection (ade9c39)

The refactor version applied device/postfunc transforms to the stored value in children_tensor_versions, but validation compares against creation_args which are always raw. This caused fasterrcnn validation to fail. Revert to the simpler approach that stores raw arg copies and was verified passing twice.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • postprocess: Gate output node variations on has_child_tensor_variations (4106be9)

Don't unconditionally store children_tensor_versions for output nodes. Gate on has_child_tensor_variations (set during exhaustive logging) to avoid false positives and preserve postfunc-applied tensor_contents.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • postprocess: Rebuild pass assignments after loop detection and fix output handler (847b2a7)

Two fixes:

  1. _rebuild_pass_assignments: Multiple rounds of _expand_isomorphic_subgraphs can reassign a node to a new group while leaving stale same_layer_operations in the old group's members. This caused multiple raw tensors to map to the same layer:pass label, producing validation failures (e.g. fasterrcnn). The cleanup step groups tensors by their authoritative layer_label_raw and rebuilds consistent pass numbers.

  2. Output node handler: Replaced the has_child_tensor_variations gate with a direct comparison of actual output (with device/postfunc transforms) against tensor_contents using tensor_nanequal. This correctly handles in-place mutations through views (e.g. InPlaceZeroTensor) while preserving postfunc values for unmodified outputs.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
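A NaN-aware comparison of the kind tensor_nanequal provides can be sketched as follows (an approximation of the internal helper, not its exact code):

```python
import torch

def tensor_nanequal(t1: torch.Tensor, t2: torch.Tensor) -> bool:
    # torch.equal treats NaN != NaN, so two identical tensors containing
    # NaNs compare unequal; match NaN positions explicitly instead.
    if t1.shape != t2.shape or t1.dtype != t2.dtype:
        return False
    if t1.is_floating_point():
        both_nan = torch.isnan(t1) & torch.isnan(t2)
        return bool(((t1 == t2) | both_nan).all())
    return torch.equal(t1, t2)
```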

  • validation: Handle bool and complex tensor perturbation properly (dfd2d7b)

  • Generate proper complex perturbations using torch.complex() instead of
    casting away imaginary part

  • Fix bool tensor crash by reordering .float().abs() (bool doesn't
    support abs, but float conversion handles it)

  • Add ContextUnet diffusion model to example_models.py for self-contained
    stable_diffusion test

  • Update test_stable_diffusion to use example_models.ContextUnet

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
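A dtype-aware perturbation along these lines can be sketched as (hypothetical helper name; the thresholds and magnitudes are illustrative):

```python
import torch

def perturb_like(t: torch.Tensor) -> torch.Tensor:
    # Build a perturbation matching the input's dtype.
    if t.is_complex():
        # torch.complex keeps a genuine imaginary part instead of
        # casting it away.
        return torch.complex(torch.rand_like(t.real), torch.rand_like(t.imag))
    if t.dtype == torch.bool:
        # bool doesn't support .abs(); convert to float FIRST, then
        # threshold back to bool.
        return (t.float().abs() + torch.rand_like(t.float())) > 0.5
    return torch.rand_like(t)
```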

Features

  • logging: Generalize was_getitem_applied to has_child_tensor_variations (1177f60)

Replace the getitem-specific parent detection with runtime mismatch detection that catches any case where a parent's tensor_contents diverges from what children actually received (getitem slicing, view mutations through shared storage, in-place ops after logging, etc.).

Key changes:

  • Rename was_getitem_applied → has_child_tensor_variations
  • Detection now compares arg copies against parent tensor_contents at child-creation time, with transform awareness (device + postfunc)
  • Output nodes now detect value changes vs parent tensor_contents
  • Use tensor_nanequal (not torch.equal) for dtype/NaN consistency
  • Fix fast mode: clear stale state on re-run, prevent double-postfunc
  • Use clean_to and try/finally for _pause_logging safety
  • Add 6 view-mutation stress tests (unsqueeze, reshape, transpose, multiple, chained, false-positive control)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
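The view-mutation case this detection targets is easy to reproduce: a view shares storage with its parent, so an in-place write through the view silently changes the parent after its contents were logged.

```python
import torch

parent = torch.zeros(4)
logged_contents = parent.clone()   # what the logger stored at creation time
view = parent.view(2, 2)           # shares storage with parent
view[0, 0] = 1.0                   # in-place mutation through the view

# The parent's live value now diverges from its logged snapshot:
diverged = not torch.equal(parent, logged_contents)
```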

Refactoring

  • loops: Cherry-pick approved changes from feat/loop-detection-hardening (33e0f22)

  • Rename 10 loop detection functions to clearer names (e.g. _assign_corresponding_tensors_to_same_layer → _detect_and_label_loops, _fetch_and_process_next_isomorphic_nodes → _advance_bfs_frontier)
  • Rename 6 local variables for clarity (e.g. node_to_iso_group_dict → node_to_iso_leader, subgraphs_dict → subgraph_info)
  • Add SubgraphInfo dataclass replacing dict-based subgraph bookkeeping
  • Replace list.pop(0) with deque.popleft() in BFS traversals
  • Remove ungrouped sweep in _merge_iso_groups_to_layers
  • Remove safe_copy in postprocess_fast (a direct reference suffices)
  • Rewrite _get_hash_from_args to preserve positional indices, kwarg names, and dict keys via a recursive _append_arg_hash helper
  • Remove vestigial index_in_saved_log field from TensorLogEntry, constants.py, logging_funcs.py, and postprocess.py
  • Fix PEP8: type(x) ==/!= Y → type(x) is/is not Y in two files
  • Split test_real_world_models.py into fast and slow test files
  • Add 12 new edge-case loop detection test models and test functions

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
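The deque swap is a standard BFS micro-optimization, sketched here on a generic adjacency dict: list.pop(0) shifts every remaining element (O(n) per dequeue), while deque.popleft() is O(1).

```python
from collections import deque

def bfs_order(adjacency, root):
    # Breadth-first traversal using an O(1)-per-dequeue frontier.
    visited, order = {root}, []
    frontier = deque([root])
    while frontier:
        node = frontier.popleft()   # list.pop(0) would be O(n) here
        order.append(node)
        for nxt in adjacency.get(node, ()):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(nxt)
    return order
```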


Detailed Changes: v0.4.1...v0.5.0

v0.4.1 (2026-02-26)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • loops: Refine iso groups to prevent false equivalence in loop detection (5aaad70)

When operations share the same equivalence type but occupy structurally different positions (e.g., sin(x) in a loop body vs sin(y) in a branch), the BFS expansion incorrectly groups them together. Add _refine_iso_groups to split such groups using direction-aware neighbor connectivity.

Also adds NestedParamFreeLoops test model and prefixes intentionally unused variables in test models with underscores to satisfy linting.
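One way to realize direction-aware refinement is to split each group by a connectivity signature, as sketched below. This is an assumed reconstruction of the idea, not _refine_iso_groups' actual implementation:

```python
from collections import defaultdict

def refine_iso_groups(groups, in_edges, out_edges):
    # Members of a group stay together only if their incoming and outgoing
    # neighbors belong to the same groups, so sin(x) inside a loop body and
    # sin(y) on a branch end up in different groups.
    node_to_group = {n: g for g, members in groups.items() for n in members}
    refined = defaultdict(list)
    for g, members in groups.items():
        for n in members:
            signature = (
                frozenset(node_to_group.get(p) for p in in_edges.get(n, ())),
                frozenset(node_to_group.get(c) for c in out_edges.get(n, ())),
            )
            refined[(g, signature)].append(n)
    return sorted(refined.values())
```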


Detailed Changes: v0.4.0...v0.4.1

v0.4.0 (2026-02-26)

This release is published under the GPL-3.0-only License.

Chores

  • tests: Move visualization_outputs into tests/ directory (22ed018)

Anchor vis_outpath to tests/ via VIS_OUTPUT_DIR constant in conftest.py so test outputs don't pollute the project root.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
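The anchoring idea can be sketched as below; only the VIS_OUTPUT_DIR name comes from the note above, the helper shape is an assumption:

```python
from pathlib import Path

def make_vis_outpath(tests_dir: Path, name: str) -> Path:
    # Anchor visualization outputs under tests/ so they never land in the
    # project root. In conftest.py, tests_dir would be Path(__file__).parent.
    out_dir = tests_dir / "visualization_outputs"
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / name
```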

Features

  • core: Generalize in-place op handling, fix loop grouping, and harden validation (c72d6a6)

  • Generalize in-place op detection in decorate_torch.py: use a was_inplace flag based on output identity instead of a hardcoded function name list, and propagate tensor labels for all in-place ops
  • Fix ungrouped isomorphic nodes in postprocess.py: sweep for iso nodes left without a layer group after pairwise grouping (fixes last-iteration loop subgraphs with no params)
  • Deduplicate ground-truth output tensors by address in user_funcs.py to match trace_model.py extraction behavior
  • Add validation exemptions: bernoulli_/full arg mismatches from in-place RNG ops, meshgrid/broadcast_tensors multi-output perturbation, *_like ops that depend only on shape/dtype/device
  • Gracefully handle invalid perturbed arguments (e.g. pack_padded_sequence) instead of raising
  • Guard empty arg_labels in vis.py edge label rendering
  • Fix test stability: reduce s3d batch size, add eval mode and bool mask dtype for StyleTTS

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
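Output-identity detection can be sketched as below (an illustrative wrapper, not the decorator in decorate_torch.py): an in-place PyTorch op returns the very tensor it mutated, so identity against the inputs is a name-list-free test.

```python
import torch

def call_and_detect_inplace(fn, *args, **kwargs):
    # Flag in-place ops by output identity: the result IS one of the
    # input tensors, rather than matching a hardcoded name list.
    out = fn(*args, **kwargs)
    was_inplace = isinstance(out, torch.Tensor) and any(
        out is a for a in args if isinstance(a, torch.Tensor)
    )
    return out, was_inplace
```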


Detailed Changes: v0.3.1...v0.4.0

v0.3.1 (2026-02-25)

This release is published under the GPL-3.0-only License.

Bug Fixes

  • core: Fix in-place op tracking, validation, and test stability (326b8a9)

  • Propagate tensor labels back to original tensor after in-place ops
    (setitem, zero_, delitem) so subsequent operations see updated labels

  • Add validation exemption for scalar setitem assignments

  • Fix torch.Tensor → torch.tensor for correct special value detection

  • Remove xfail marker from test_varying_loop_noparam2 (now passes)

  • Add ruff lint ignores for pre-existing E721/F401 across codebase

  • Includes prior bug-blitz fixes across logging, postprocessing, cleanup,
    helper functions, visualization, and model tracing

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.3.0...v0.3.1

v0.3.0 (2026-02-25)

This release is published under the GPL-3.0-only License.

Chores

  • ci: Replace black with ruff auto-format on push (e0cb9e1)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • config: Add ruff config and replace black/isort with ruff in pre-commit (c27ced8)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • config: Set major_on_zero to true (d63451c)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Features

  • tests: Restructure test suite into focused files with metadata coverage (31b0353)

Split monolithic test_validation_and_visuals.py (3130 lines, 155 tests) into:

  • conftest.py: shared fixtures and deterministic seeding
  • test_toy_models.py: 78 tests (66 migrated + 12 new API coverage tests)
  • test_metadata.py: 44 comprehensive metadata field tests (7 test classes)
  • test_real_world_models.py: 75 tests with local imports and importorskip

New tests cover: log_forward_pass parameters (layers_to_save, save_function_args, activation_postfunc, mark_distances), get_model_metadata, ModelHistory access patterns, TensorLogEntry field validation, recurrent metadata, and GeluModel.

Removed 13 genuine size-duplicate tests (ResNet101/152, VGG19, etc.). All optional dependencies now use pytest.importorskip for graceful skipping.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.2.0...v0.3.0