Releases: johnmarktaylor91/torchlens
v0.7.1 (2026-02-28)
This release is published under the GPL-3.0-only License.
Bug Fixes
- logging: Handle complex-dtype tensors in tensor_nanequal (fe58f25)
torch.nan_to_num does not support complex tensors, which caused test_qml to fail when PennyLane quantum ops produced complex outputs. Use view_as_real/view_as_complex to handle NaN replacement for complex dtypes.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
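The fix above can be sketched as follows. This is an illustrative helper, not torchlens' actual code; the function names `nan_to_num_complex_safe` and `tensors_nanequal` are hypothetical:

```python
import torch

def nan_to_num_complex_safe(t: torch.Tensor, value: float = 0.0) -> torch.Tensor:
    """Replace NaNs, handling complex dtypes that torch.nan_to_num rejects."""
    if t.is_complex():
        # View the complex tensor as (..., 2) real pairs, replace NaNs
        # componentwise, then view the result back as complex.
        real_view = torch.view_as_real(t)
        return torch.view_as_complex(torch.nan_to_num(real_view, nan=value))
    return torch.nan_to_num(t, nan=value)

def tensors_nanequal(a: torch.Tensor, b: torch.Tensor) -> bool:
    """Equality that treats NaN == NaN, via NaN replacement."""
    if a.shape != b.shape or a.dtype != b.dtype:
        return False
    return torch.equal(nan_to_num_complex_safe(a), nan_to_num_complex_safe(b))
```

With this, two complex tensors that differ only in NaN positions compare equal instead of crashing inside `torch.nan_to_num`.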
Detailed Changes: v0.7.0...v0.7.1
v0.7.0 (2026-02-28)
This release is published under the GPL-3.0-only License.
Bug Fixes
Move RNG state capture/restore before PyTorch decoration to prevent internal .clone() calls from being intercepted by torchlens' decorated torch functions. Also speed up test_stochastic_loop by using a higher starting value.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
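A minimal sketch of the ordering fix, with no-op stand-ins for torchlens' patching (`decorate_torch`/`undecorate_torch` and `run_traced` are hypothetical names, not the real API):

```python
import torch

def decorate_torch():
    """No-op stand-in for torchlens' torch-function patching (hypothetical)."""

def undecorate_torch():
    """No-op stand-in for removing the patches (hypothetical)."""

def run_traced(fn, *args):
    # Capture RNG state BEFORE patching: get_rng_state() internally clones
    # a tensor, and that clone must not hit the decorated torch functions.
    state = torch.get_rng_state()
    decorate_torch()
    try:
        result = fn(*args)
    finally:
        undecorate_torch()
        torch.set_rng_state(state)  # restore so reruns are reproducible
    return result
```

Because the state is restored after undecorating, two consecutive traced runs of the same stochastic function produce identical outputs.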
When layers_to_save is a subset, the fast pass now automatically includes parents of output layers in the save list. This ensures output layer tensor_contents is populated in postprocess_fast (which copies from parent).
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
copy.deepcopy hangs on complex tensor wrappers with circular references (e.g. ESCNN GeometricTensor). Replace with safe_copy_args/safe_copy_kwargs that clone tensors, recurse into standard containers, and leave other objects as references.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
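The idea behind the deepcopy replacement can be sketched like this (illustrative only; `safe_copy_args` here is a simplification of the real helper, and plain tuples are assumed, not namedtuples):

```python
import torch

def safe_copy_args(args):
    """Copy call arguments without copy.deepcopy: clone tensors, recurse
    into standard containers, and keep references to everything else,
    avoiding hangs on wrappers with circular references."""
    if isinstance(args, torch.Tensor):
        return args.clone()
    if isinstance(args, (list, tuple)):
        # Assumes plain list/tuple; namedtuples would need special handling.
        return type(args)(safe_copy_args(a) for a in args)
    if isinstance(args, dict):
        return {k: safe_copy_args(v) for k, v in args.items()}
    return args  # other objects are shared by reference
```

Unlike deepcopy, this never follows arbitrary object graphs, so a wrapper holding a reference cycle is simply passed through by reference.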
When a model's forward() expects a single arg that is itself a tuple/list of tensors, torchlens incorrectly unpacked it into multiple positional args. It now uses inspect.getfullargspec to detect single-arg models and wraps the tuple/list as a single arg. Also handles immutable tuples in the _fetch_label_move_input_tensors device-move logic.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
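A sketch of the detection, assuming a hypothetical `prepare_input_args` helper (not torchlens' exact code):

```python
import inspect
import torch
import torch.nn as nn

def prepare_input_args(model: nn.Module, input_args):
    """If forward() takes exactly one argument (besides self) and the user
    passed a tuple/list, wrap it so it is treated as one composite argument
    instead of being unpacked into multiple positional args."""
    spec = inspect.getfullargspec(model.forward)
    num_args = len(spec.args) - 1  # drop 'self'
    if not isinstance(input_args, (tuple, list)):
        return (input_args,)
    if num_args == 1:
        return (input_args,)  # single composite argument, don't unpack
    return tuple(input_args)
```

For a model whose forward takes one `pair` argument, `prepare_input_args(m, (a, b))` yields `((a, b),)` rather than `(a, b)`, so `m(*args)` receives the tuple intact.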
_check_if_only_non_buffer_in_module was too broad: it returned True for functional ops (like torch.relu) at the end of container modules with child submodules, causing them to render as boxes. Added a leaf-module check: box rendering now applies only to modules with no child submodules.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
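The leaf-module check itself is simple; a sketch (the name `is_leaf_module` is illustrative):

```python
import torch.nn as nn

def is_leaf_module(module: nn.Module) -> bool:
    """True for modules with no child submodules. The box-rendering fix
    applies only to these, so functional ops at the end of container
    modules are not mistaken for whole-module outputs."""
    return len(list(module.children())) == 0
```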
Features
Capture module.training in module_forward_decorator and store in ModelHistory.module_training_modes dict (keyed by module address). This lets users check whether each submodule was in train or eval mode during the forward pass.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
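The same information can be gathered with plain forward pre-hooks; this sketch mimics the module_training_modes idea outside torchlens (the helper name and keying scheme are illustrative):

```python
import torch
import torch.nn as nn

def record_training_modes(model: nn.Module, x: torch.Tensor) -> dict:
    """Run one forward pass and record whether each submodule was in
    train or eval mode, keyed by module address ('' for the root)."""
    modes = {}
    handles = []
    for name, module in model.named_modules():
        def hook(mod, inputs, _name=name):
            modes[_name] = mod.training  # capture mode at call time
        handles.append(module.register_forward_pre_hook(hook))
    try:
        model(x)
    finally:
        for h in handles:
            h.remove()
    return modes
```

After `model.eval()`, every recorded mode is False; after `model.train()`, submodules report True, letting you confirm e.g. that Dropout was actually active.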
Detailed Changes: v0.6.2...v0.7.0
v0.6.2 (2026-02-28)
This release is published under the GPL-3.0-only License.
Bug Fixes
_merge_buffer_entries is a module-level function, not a method on ModelHistory. Fixes AttributeError when processing recurrent models with duplicate buffer entries.
Based on gilmoright's contribution in PR #56.
Detailed Changes: v0.6.1...v0.6.2
v0.6.1 (2026-02-28)
This release is published under the GPL-3.0-only License.
Bug Fixes
- tensor_log: Use getattr default in TensorLogEntry.copy() (538288d)
Prevents AttributeError when copying entries that predate newly added fields (e.g., deserialized from an older version).
Co-Authored-By: whisperLiang whisperLiang@users.noreply.github.com
Detailed Changes: v0.6.0...v0.6.1
v0.6.0 (2026-02-28)
This release is published under the GPL-3.0-only License.
Features
- flops: Add per-layer FLOPs computation for forward and backward passes (31b43e6)
Compute forward and backward FLOPs at logging time for every traced operation. Uses category-based dispatch: zero-cost ops (view, reshape, etc.), element-wise ops with per-element cost, and specialty handlers for matmul, conv, normalization, pooling, reductions, and loss functions. Unknown ops return None rather than guessing.
- New torchlens/flops.py with compute_forward_flops / compute_backward_flops
- ModelHistory gains total_flops_forward, total_flops_backward, and total_flops properties and a flops_by_type() method
- TensorLogEntry and RolledTensorLogEntry gain flops_forward / flops_backward fields
- 28 new tests in test_metadata.py (unit + integration)
- scripts/check_flops_coverage.py dev utility for auditing op coverage
- Move test_video_r2plus1_18 to the slow test file
Based on whisperLiang's contribution in PR #53.
Co-Authored-By: whisperLiang whisperLiang@users.noreply.github.com
Detailed Changes: v0.5.0...v0.6.0
v0.5.0 (2026-02-28)
This release is published under the GPL-3.0-only License.
Bug Fixes
- logging: Revert children_tensor_versions to proven simpler detection (ade9c39)
The refactor version applied device/postfunc transforms to the stored value in children_tensor_versions, but validation compares against creation_args which are always raw. This caused fasterrcnn validation to fail. Revert to the simpler approach that stores raw arg copies and was verified passing twice.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
- postprocess: Gate output node variations on has_child_tensor_variations (4106be9)
Don't unconditionally store children_tensor_versions for output nodes. Gate on has_child_tensor_variations (set during exhaustive logging) to avoid false positives and preserve postfunc-applied tensor_contents.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
- postprocess: Rebuild pass assignments after loop detection and fix output handler (847b2a7)
Two fixes:
- _rebuild_pass_assignments: Multiple rounds of _expand_isomorphic_subgraphs can reassign a node to a new group while leaving stale same_layer_operations in the old group's members. This caused multiple raw tensors to map to the same layer:pass label, producing validation failures (e.g. fasterrcnn). The cleanup step groups tensors by their authoritative layer_label_raw and rebuilds consistent pass numbers.
- Output node handler: Replaced the has_child_tensor_variations gate with a direct comparison of actual output (with device/postfunc transforms) against tensor_contents using tensor_nanequal. This correctly handles in-place mutations through views (e.g. InPlaceZeroTensor) while preserving postfunc values for unmodified outputs.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
- validation: Handle bool and complex tensor perturbation properly (dfd2d7b)
  - Generate proper complex perturbations using torch.complex() instead of casting away the imaginary part
  - Fix bool tensor crash by reordering to .float().abs() (bool doesn't support abs, but float conversion handles it)
  - Add ContextUnet diffusion model to example_models.py for a self-contained stable_diffusion test
  - Update test_stable_diffusion to use example_models.ContextUnet
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
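Both dtype edge cases can be sketched together (illustrative helpers, not the real validation code; `perturb_like`/`magnitude` are hypothetical names):

```python
import torch

def perturb_like(t: torch.Tensor) -> torch.Tensor:
    """Random perturbation matching a tensor's dtype, covering the bool
    and complex edge cases."""
    if t.is_complex():
        # Build a genuinely complex perturbation rather than casting a
        # real tensor, which would leave the imaginary part zero.
        real = torch.rand(t.shape)
        imag = torch.rand(t.shape)
        return torch.complex(real, imag).to(t.dtype)
    if t.dtype == torch.bool:
        return torch.rand(t.shape) > 0.5
    return torch.rand(t.shape).to(t.dtype)

def magnitude(t: torch.Tensor) -> torch.Tensor:
    # Order matters: bool tensors don't support .abs(), but converting
    # to float first makes the chain safe.
    return t.float().abs()
```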
Features
- logging: Generalize was_getitem_applied to has_child_tensor_variations (1177f60)
Replace the getitem-specific parent detection with runtime mismatch detection that catches any case where a parent's tensor_contents diverges from what children actually received (getitem slicing, view mutations through shared storage, in-place ops after logging, etc.).
Key changes:
- Rename was_getitem_applied → has_child_tensor_variations
- Detection now compares arg copies against parent tensor_contents at child-creation time, with transform-awareness (device + postfunc)
- Output nodes now detect value changes vs parent tensor_contents
- Use tensor_nanequal (not torch.equal) for dtype/NaN consistency
- Fix fast-mode: clear stale state on re-run, prevent double-postfunc
- Use clean_to and try/finally for _pause_logging safety
- Add 6 view-mutation stress tests (unsqueeze, reshape, transpose, multiple, chained, false-positive control)
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
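The failure mode this detection guards against is easy to reproduce: an in-place write through a view mutates the parent's storage, so a copy saved at logging time no longer matches the live tensor. A minimal demonstration:

```python
import torch

parent = torch.arange(4.0)
saved_copy = parent.clone()   # what a logger would have stored

view = parent.reshape(2, 2)   # shares storage with parent
view[0, 0] = -1.0             # in-place mutation through the view

# The parent changed under us even though it was never written directly.
assert not torch.equal(parent, saved_copy)
```

This is why the logger compares saved copies against what children actually received, rather than trusting the parent's tensor_contents to stay fixed.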
Refactoring
- loops: Cherry-pick approved changes from feat/loop-detection-hardening (33e0f22)
  - Rename 10 loop detection functions to clearer names (e.g. _assign_corresponding_tensors_to_same_layer → _detect_and_label_loops, _fetch_and_process_next_isomorphic_nodes → _advance_bfs_frontier)
  - Rename 6 local variables for clarity (e.g. node_to_iso_group_dict → node_to_iso_leader, subgraphs_dict → subgraph_info)
  - Add SubgraphInfo dataclass replacing dict-based subgraph bookkeeping
  - Replace list.pop(0) with deque.popleft() in BFS traversals
  - Remove ungrouped sweep in _merge_iso_groups_to_layers
  - Remove safe_copy in postprocess_fast (direct reference suffices)
  - Rewrite _get_hash_from_args to preserve positional indices, kwarg names, and dict keys via recursive _append_arg_hash helper
  - Remove vestigial index_in_saved_log field from TensorLogEntry, constants.py, logging_funcs.py, and postprocess.py
  - Fix PEP8: type(x) ==/!= Y → type(x) is/is not Y in two files
  - Split test_real_world_models.py into fast and slow test files
  - Add 12 new edge-case loop detection test models and test functions
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
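The _get_hash_from_args rewrite mentioned above is about making argument position, keyword names, and dict keys all contribute to the hash. A self-contained sketch of the idea (the names `args_fingerprint` and its path scheme are illustrative, not torchlens' code):

```python
import hashlib

def args_fingerprint(args, kwargs) -> str:
    """Hash call arguments so that f(1, 2) and f(2, 1) hash differently,
    as do f(a=1) and f(b=1): each leaf is fed to the hash together with
    its structural path."""
    h = hashlib.sha256()

    def append(path: str, value):
        if isinstance(value, (list, tuple)):
            for i, v in enumerate(value):
                append(f"{path}[{i}]", v)       # keep positional index
        elif isinstance(value, dict):
            for k in sorted(value, key=repr):
                append(f"{path}[{k!r}]", value[k])  # keep dict key
        else:
            h.update(f"{path}={value!r};".encode())

    for i, a in enumerate(args):
        append(f"arg{i}", a)
    for name in sorted(kwargs):
        append(f"kw:{name}", kwargs[name])      # keep kwarg name
    return h.hexdigest()
```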
Detailed Changes: v0.4.1...v0.5.0
v0.4.1 (2026-02-26)
This release is published under the GPL-3.0-only License.
Bug Fixes
- loops: Refine iso groups to prevent false equivalence in loop detection (5aaad70)
When operations share the same equivalence type but occupy structurally different positions (e.g., sin(x) in a loop body vs sin(y) in a branch), the BFS expansion incorrectly groups them together. Add _refine_iso_groups to split such groups using direction-aware neighbor connectivity.
Also adds NestedParamFreeLoops test model and prefixes intentionally unused variables in test models with underscores to satisfy linting.
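One refinement round of the idea can be sketched as follows: two nodes stay grouped only if the group labels of their parents and children match. This is an illustration of the direction-aware connectivity check, not the real _refine_iso_groups:

```python
from collections import defaultdict

def refine_groups(groups, parents, children):
    """Split equivalence groups whose members have structurally different
    neighborhoods (one refinement pass)."""
    label = {n: i for i, grp in enumerate(groups) for n in grp}

    def signature(node):
        # Direction-aware: parents and children contribute separately.
        up = sorted(label[p] for p in parents.get(node, []) if p in label)
        down = sorted(label[c] for c in children.get(node, []) if c in label)
        return (tuple(up), tuple(down))

    refined = []
    for grp in groups:
        buckets = defaultdict(list)
        for node in grp:
            buckets[signature(node)].append(node)
        refined.extend(buckets.values())
    return refined
```

In the sin(x)-in-a-loop vs sin(y)-in-a-branch scenario, the two sin nodes share an equivalence type but have parents in different groups, so one pass splits them apart.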
Code Style
- Auto-format with ruff (b9413f1)
Detailed Changes: v0.4.0...v0.4.1
v0.4.0 (2026-02-26)
This release is published under the GPL-3.0-only License.
Chores
- tests: Move visualization_outputs into tests/ directory (22ed018)
Anchor vis_outpath to tests/ via VIS_OUTPUT_DIR constant in conftest.py so test outputs don't pollute the project root.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
Features
- core: Generalize in-place op handling, fix loop grouping, and harden validation (c72d6a6)
  - Generalize in-place op detection in decorate_torch.py: use a was_inplace flag based on output identity instead of a hardcoded function name list, and propagate tensor labels for all in-place ops
  - Fix ungrouped isomorphic nodes in postprocess.py: sweep for iso nodes left without a layer group after pairwise grouping (fixes last-iteration loop subgraphs with no params)
  - Deduplicate ground truth output tensors by address in user_funcs.py to match trace_model.py extraction behavior
  - Add validation exemptions: bernoulli_/full arg mismatches from in-place RNG ops, meshgrid/broadcast_tensors multi-output perturbation, and *_like ops that depend only on shape/dtype/device
  - Gracefully handle invalid perturbed arguments (e.g. pack_padded_sequence) instead of raising
  - Guard empty arg_labels in vis.py edge label rendering
  - Fix test stability: reduce s3d batch size; add eval mode and bool mask dtype for StyleTTS
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
Detailed Changes: v0.3.1...v0.4.0
v0.3.1 (2026-02-25)
This release is published under the GPL-3.0-only License.
Bug Fixes
- core: Fix in-place op tracking, validation, and test stability (326b8a9)
  - Propagate tensor labels back to the original tensor after in-place ops (setitem, zero_, delitem) so subsequent operations see updated labels
  - Add validation exemption for scalar setitem assignments
  - Fix torch.Tensor → torch.tensor for correct special value detection
  - Remove xfail marker from test_varying_loop_noparam2 (now passes)
  - Add ruff lint ignores for pre-existing E721/F401 across the codebase
  - Includes prior bug-blitz fixes across logging, postprocessing, cleanup, helper functions, visualization, and model tracing
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
Detailed Changes: v0.3.0...v0.3.1
v0.3.0 (2026-02-25)
This release is published under the GPL-3.0-only License.
Chores
- ci: Replace black with ruff auto-format on push (e0cb9e1)
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
- config: Add ruff config and replace black/isort with ruff in pre-commit (c27ced8)
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
- config: Set major_on_zero to true (d63451c)
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
Features
- tests: Restructure test suite into focused files with metadata coverage (31b0353)
Split monolithic test_validation_and_visuals.py (3130 lines, 155 tests) into:
- conftest.py: shared fixtures and deterministic seeding
- test_toy_models.py: 78 tests (66 migrated + 12 new API coverage tests)
- test_metadata.py: 44 comprehensive metadata field tests (7 test classes)
- test_real_world_models.py: 75 tests with local imports and importorskip
New tests cover: log_forward_pass parameters (layers_to_save, save_function_args, activation_postfunc, mark_distances), get_model_metadata, ModelHistory access patterns, TensorLogEntry field validation, recurrent metadata, and GeluModel.
Removed 13 genuine size-duplicate tests (ResNet101/152, VGG19, etc.). All optional dependencies now use pytest.importorskip for graceful skipping.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
Detailed Changes: v0.2.0...v0.3.0