Commit 18f8ae9

docs(maintenance): split CLAUDE.md/AGENTS.md into architect vs implementation roles
Break the symlink mirroring convention: CLAUDE.md now holds architect-level context (what, why, how it connects) while AGENTS.md holds implementation-level context (conventions, gotchas, known bugs, test commands). Pure-implementation subdirs (.github, scripts, tests, utils) get AGENTS.md only. Also populates .project-context/ templates (architecture, conventions, gotchas, decisions).
1 parent b0bafeb commit 18f8ae9


30 files changed (+1131, -711 lines)

.github/AGENTS.md

Lines changed: 0 additions & 1 deletion
This file was deleted (previously a one-line symlink mirroring CLAUDE.md; replaced by the standalone file below).

.github/AGENTS.md

Lines changed: 21 additions & 0 deletions
# .github/ — CI/CD Configuration

## Workflows

| File | Trigger | What It Does |
|------|---------|-------------|
| `workflows/lint.yml` | Push/PR | Auto-linting with ruff (`ruff format` + `ruff check --fix`), auto-commits fixes via GitHub App |
| `workflows/quality.yml` | Push/PR | Two jobs: (1) mypy type-checking on Python 3.11, (2) pip-audit dependency audit. Both use CPU torch. |
| `workflows/release.yml` | Push to main | Semantic-release v9 (conventional commits). Publishes to PyPI via trusted OIDC + GitHub Releases. |

## Release Pipeline Details
- Semantic-release v9 (pinned `>=9,<10`), `major_on_zero = true`
- `fetch-tags: true` in checkout step for proper version calculation
- PyPI trusted publishing via OIDC (no API tokens)
- GitHub App (`torchlens-release`) for auth
- Branch protection via rulesets

## Conventions
- Conventional commits required: `fix(scope):`, `feat(scope):`, `chore(scope):`
- `fix:` → patch bump, `feat:` → minor bump, `feat!:` → major bump
- `chore:`, `docs:`, `ci:`, `test:` → no release
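As a sketch, the bump rules above amount to a small lookup. `next_bump` is a hypothetical helper written only for illustration; semantic-release implements this logic itself:

```python
from typing import Optional

def next_bump(subject: str) -> Optional[str]:
    """Map a conventional-commit subject to a version bump (rules listed above)."""
    head = subject.split(":", 1)[0]            # e.g. "feat(vis)" or "feat!"
    ctype = head.rstrip("!").split("(", 1)[0]  # strip "!" and "(scope)"
    if ctype == "feat":
        return "major" if head.endswith("!") else "minor"
    if ctype == "fix":
        return "patch"
    return None                                # chore/docs/ci/test: no release

assert next_bump("fix(postprocess): orphan removal") == "patch"
assert next_bump("feat(vis): dagua renderer") == "minor"
assert next_bump("feat!: breaking change") == "major"
assert next_bump("docs(maintenance): split agent docs") is None
```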

.github/CLAUDE.md

Lines changed: 0 additions & 21 deletions
This file was deleted.

.gitignore

Lines changed: 13 additions & 0 deletions
Added after the existing `/tests/test_outputs/` entry:

```
# Architect-Worker system: ephemeral task state
.project-context/tasks/

# Python
__pycache__/
*.pyc
.mypy_cache/
.ruff_cache/
.pytest_cache/
*.egg-info/
dist/
build/
```

.project-context/architecture.md

Lines changed: 164 additions & 0 deletions
# TorchLens Architecture

## Module Map

### `torchlens/_state.py` (~208 lines)
Global toggle, session state, context managers. Single source of truth for the `_logging_enabled` bool checked by every decorated wrapper. Also stores pre-computed lookup tables, a WeakSet of prepared models, and the active ModelLog reference. **Must never import other torchlens modules** (prevents circular deps).

### `torchlens/user_funcs.py` (~664 lines)
Public API: `log_forward_pass()`, `show_model_graph()`, `validate_forward_pass()`, `get_model_metadata()`, `validate_batch_of_models_and_inputs()`. Orchestrates the two-pass strategy when selective layers are requested.

### `torchlens/constants.py` (~645 lines)
7 FIELD_ORDER tuples (canonical field sets for LayerPassLog, ModelLog, etc.) and function discovery sets (~90 IGNORED_FUNCS; ORIG_TORCH_FUNCS listing ~2000 functions to decorate).

### `torchlens/decoration/` (2 files, ~1,710 lines)
- `torch_funcs.py` — One-time decoration of ~2000 torch functions. Core interceptor with barcode nesting detection, in-place detection, DeviceContext bypass.
- `model_prep.py` — Two-phase model preparation (permanent `_prepare_model_once` + per-session `_prepare_model_session`). Module forward decorator with exhaustive/fast-path split.

### `torchlens/capture/` (7 files, ~4,960 lines)
Real-time tensor operation logging during the forward pass.
- `trace.py` — Forward-pass orchestration, session setup/cleanup
- `output_tensors.py` — Core logging: builds LayerPassLog entries, exhaustive/fast dispatch
- `source_tensors.py` — Logs input and buffer tensors as source nodes
- `tensor_tracking.py` — Barcode system, parent-child links, backward hooks
- `arg_positions.py` — O(1) tensor extraction via 3-tier lookup (639 static entries)
- `salient_args.py` — Extracts significant function args for metadata
- `flops.py` — Per-operation FLOPs computation (~290 ops)

### `torchlens/postprocess/` (6 files, ~3,179 lines)
18-step pipeline. Order is critical — many steps depend on prior output.
- `graph_traversal.py` — Steps 1-4: output layers, ancestor marking, orphan removal, distance flood
- `control_flow.py` — Steps 5-7: conditional branches (backward-only flood + AST THEN detection), module fixing, buffer cleanup
- `loop_detection.py` — Step 8: isomorphic subgraph expansion, layer assignment
- `labeling.py` — Steps 9-12: label generation, rename, trim/reorder, lookup keys
- `finalization.py` — Steps 13-18: undecorate, ParamLog, ModuleLog, LayerLog, mark complete

### `torchlens/data_classes/` (10 files, ~3,821 lines)
- `model_log.py` — ModelLog: top-level container, 70+ attrs
- `layer_pass_log.py` — LayerPassLog: per-pass entry (~85+ fields)
- `layer_log.py` — LayerLog: aggregate class grouping passes
- `buffer_log.py` — BufferLog(LayerPassLog): buffer-specific computed properties
- `module_log.py` — ModuleLog, ModulePassLog, ModuleAccessor
- `param_log.py` — ParamLog (lazy grad via `_param_ref`)
- `func_call_location.py` — Structured call stack frame with lazy properties
- `internal_types.py` — FuncExecutionContext, VisualizationOverrides
- `interface.py` — ModelLog query methods: `__getitem__`, `to_pandas()`, 7-step lookup cascade
- `cleanup.py` — Post-session teardown, cycle breaking

### `torchlens/validation/` (3 files, ~2,795 lines)
- `core.py` — BFS orchestration, forward replay, perturbation checks
- `exemptions.py` — 4 data-driven exemption registries + 16 posthoc checks
- `invariants.py` — 18 metadata invariant categories (A-R): structural + semantic

### `torchlens/visualization/` (3 files, ~2,777+ lines)
- `rendering.py` — Graphviz rendering: nodes, edges, module subgraphs, IF/THEN labels, override system
- `elk_layout.py` — ELK-based layout for large graphs, Worker thread, sfdp fallback
- `dagua_bridge.py` — ModelLog → DaguaGraph conversion for the dagua renderer

### `torchlens/utils/` (7 files, ~950 lines)
Stateless helpers: arg handling, tensor ops (safe_copy, tensor_nanequal), RNG capture/restore, barcode hashing, object introspection, display formatting, collection manipulation.
## Data Flow

```
import torchlens
  → decorate_all_once()            # wraps ~2000 torch functions permanently
  → patch_detached_references()    # patches `from torch import cos` style refs

log_forward_pass(model, input)
  → _prepare_model_once(model)     # permanent: tl_module_address, forward wrappers
  → _prepare_model_session(model)  # per-call: requires_grad, buffers, session attrs
  → active_logging(model_log)      # enables _logging_enabled toggle
  → model(input)                   # forward pass — each torch op hits decorated wrapper
      → torch_func_decorator       # barcode nesting → bottom-level ops logged
          → log_function_output_tensors_exhaustive()  # builds LayerPassLog entry
          → OR log_function_output_tensors_fast()     # reuses prior graph structure
  → postprocess(model_log)         # 18-step pipeline
      → Steps 1-4: graph cleanup (outputs, ancestors, orphans, distances)
      → Steps 5-7: control flow (conditionals, module fixing, buffer dedup)
      → Step 8: loop detection (isomorphic subgraph expansion)
      → Steps 9-12: labeling (raw→final labels, rename, reorder, lookup keys)
      → Steps 13-18: finalization (undecorate, ParamLog, ModuleLog, LayerLog)
  → return ModelLog
```

Key types flowing between modules:
- `Dict[str, Dict]` — raw tensor dict during capture (`_raw_tensor_dict` on ModelLog)
- `LayerPassLog` — per-pass tensor operation entry (~85+ fields)
- `LayerLog` — aggregate grouping passes of the same layer
- `ModuleLog` / `ModulePassLog` — per-module metadata
- `ParamLog` — per-parameter metadata with lazy gradient access
## Key Abstractions

### Toggle Architecture
A single `_logging_enabled` bool in `_state.py`. Wrappers check it on every call — when False, the cost is one branch check, negligible overhead. No re-wrapping/un-wrapping per forward pass.
### Two-Pass Strategy
When the user requests specific layers (not "all"/"none"), Pass 1 runs exhaustive to discover the full graph structure, and Pass 2 runs fast, saving only the requested activations. Counter alignment between passes is maintained via identical increment logic.
### Barcode Nesting Detection
Random 8-char barcodes distinguish bottom-level functions from wrapper functions. A barcode is set on the tensor before the call; if it is unchanged afterward, no nested torch calls ran, so the call is logged. If it changed, a nested call already logged it.
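A toy mimic of the mechanism (a plain object stands in for a tensor; the real torchlens implementation differs):

```python
# Toy mimic of barcode nesting detection: stamp a fresh barcode on the
# argument before calling; if a nested wrapped call restamped it, the
# outer call knows it was not bottom-level and skips logging.
import secrets

logged = []

class Box:
    """Stands in for a tensor that can carry a barcode attribute."""
    def __init__(self, value):
        self.value = value
        self.barcode = None

def make_wrapped(func):
    def wrapper(box):
        probe = secrets.token_hex(4)      # random 8-char barcode
        box.barcode = probe
        out = func(box)
        if box.barcode == probe:          # unchanged: no nested wrapped call ran
            logged.append(func.__name__)  # bottom-level op: log it
        return out
    return wrapper

@make_wrapped
def double(box):        # "bottom-level" op: calls no wrapped functions
    return Box(box.value * 2)

@make_wrapped
def quadruple(box):     # "wrapper" op: internally calls the wrapped double()
    return double(double(box))

result = quadruple(Box(3))
print(result.value, logged)   # 12 ['double', 'double'] — quadruple itself not logged
```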
### Operation Equivalence Types
Structural fingerprint: `{func_name}_{arg_hash}[_outindex{i}][_module{origin}]`. Used by loop detection (Step 8) to group operations into layers.
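An illustrative construction of this key format (hypothetical helper; the real arg hash inside torchlens is computed differently):

```python
# Hypothetical illustration of the equivalence-key format above.
import hashlib

def equivalence_key(func_name, args_repr, out_index=None, module_origin=None):
    """Build `{func_name}_{arg_hash}[_outindex{i}][_module{origin}]`."""
    arg_hash = hashlib.md5(args_repr.encode()).hexdigest()[:8]
    key = f"{func_name}_{arg_hash}"
    if out_index is not None:
        key += f"_outindex{out_index}"
    if module_origin is not None:
        key += f"_module{module_origin}"
    return key

a = equivalence_key("conv2d", "shape=(3,64),stride=1", module_origin="features.0")
b = equivalence_key("conv2d", "shape=(3,64),stride=1", module_origin="features.0")
c = equivalence_key("conv2d", "shape=(64,128),stride=2", module_origin="features.3")
assert a == b    # structurally identical ops land in the same group
assert a != c    # different structure (or module origin) -> different group
```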
### LayerLog Delegation
Single-pass layers: `__getattr__` delegates to `passes[1]`. Accessing a per-pass field on a multi-pass layer raises **ValueError** (not AttributeError, to avoid Python's property/__getattr__ trap).
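A sketch of the delegation pattern with hypothetical, simplified classes (the real LayerLog has far more fields):

```python
# Sketch of LayerLog delegation: single-pass access forwards to passes[1];
# multi-pass access to a per-pass field raises ValueError, not AttributeError,
# so it cannot be confused with a missing attribute by __getattr__ machinery.
class LayerLogSketch:
    def __init__(self, passes):
        self.passes = passes                     # dict: pass number -> per-pass log

    def __getattr__(self, name):
        if len(self.passes) == 1:
            return getattr(self.passes[1], name)
        raise ValueError(
            f"{name!r} is per-pass; pick a pass, e.g. layer.passes[2].{name}"
        )

class PassLogSketch:
    def __init__(self, shape):
        self.tensor_shape = shape

single = LayerLogSketch({1: PassLogSketch((2, 64))})
print(single.tensor_shape)                       # delegated to passes[1]

multi = LayerLogSketch({1: PassLogSketch((2, 64)), 2: PassLogSketch((2, 64))})
try:
    multi.tensor_shape
    caught = None
except ValueError as e:
    caught = type(e).__name__
print(caught)                                    # ValueError
```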
## Dependency Graph
```
_state.py     ← imported by everything (no outgoing torchlens imports)
constants.py  ← imported by capture/, postprocess/, data_classes/
utils/        ← imported by capture/, postprocess/, data_classes/, validation/
decoration/   → calls capture/ (via decorated wrappers)
              → reads _state.py
capture/      → creates data_classes/ entries (LayerPassLog)
              → reads _state.py, constants.py
postprocess/  → mutates data_classes/ entries
              → reads constants.py
data_classes/ → references _state.py (TYPE_CHECKING only)
validation/   → reads data_classes/, calls original torch funcs
visualization/ → reads data_classes/ (LayerLog, ModelLog)
user_funcs.py → orchestrates decoration/, capture/, postprocess/, validation/, visualization/
```
## Known Complexity

### Loop Detection (postprocess/loop_detection.py)
The most complex single module. BFS expansion of isomorphic subgraphs, iso-group refinement with direction-aware neighbor connectivity, adjacency union-find for layer assignment. Step 6's module suffix mutation makes `_rebuild_pass_assignments` necessary (not defensive). ~826 lines.

### Exhaustive/Fast-Path Split (capture/output_tensors.py)
Two parallel code paths that must maintain counter alignment. The fast path skips most metadata but must match the exhaustive path's operation ordering exactly.

### ELK Layout (visualization/elk_layout.py)
Node.js subprocess with V8 heap sizing, a Worker thread to prevent stack overflow, a stress algorithm with O(n^2) memory (NEVER use for >100k nodes), and Kahn's topological sort for seeding.

### Circular References (data_classes/)
ModelLog ↔ LayerPassLog ↔ ModelLog cycles. ModuleLog ↔ ModelLog cycles. ParamLog pins nn.Parameter. All rely on Python's cyclic GC; an explicit `cleanup()` is available.

### DeviceContext Bypass (decoration/torch_funcs.py)
Python wrappers bypass C-level TorchFunctionMode dispatch. Factory functions need manual device kwarg injection when a `torch.device('meta')` context is active (HuggingFace use case).

.project-context/conventions.md

Lines changed: 116 additions & 0 deletions
# TorchLens Conventions

## Naming

### Files & Modules
- Snake_case for all Python files
- Subpackage CLAUDE.md files document each package

### Variables & Attributes
- `tl_` prefix on tensor/module attributes during logging
- Permanent attrs (survive sessions): `tl_module_address`, `tl_module_type`
- Session attrs (cleaned per-call): `tl_source_model_log`, `tl_module_pass_num`, etc.
- `_raw_` prefix for pre-postprocessing state (e.g., `tl_tensor_label_raw`)
- `_final_` prefix for post-processed state
- `_orig_` prefix for original (pre-decoration) references
- `clean_` prefix for pre-decoration torch function imports (e.g., `clean_clone = torch.clone`)

### Labels
- Source tensors: `{type}_{num}_raw` during capture (e.g., `input_0_raw`, `buffer_1_raw`)
- Function outputs: `{type}_{num}_{counter}_raw` during capture
- Final labels: human-readable after postprocess/labeling.py (e.g., `conv2d_1_5`)
- Pass-qualified: `{label}:{pass_num}` (e.g., `conv2d_1_5:2`)
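A tiny parser for the pass-qualified form (hypothetical helper, not a torchlens API):

```python
# Illustrative parser for the label grammar above: "conv2d_1_5:2" splits
# into the base label and an integer pass number; plain labels get None.
def split_pass_qualified(label):
    base, sep, pass_num = label.partition(":")
    return (base, int(pass_num)) if sep else (base, None)

print(split_pass_qualified("conv2d_1_5"))    # ('conv2d_1_5', None)
print(split_pass_qualified("conv2d_1_5:2"))  # ('conv2d_1_5', 2)
```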
### Classes
- PascalCase: `ModelLog`, `LayerPassLog`, `LayerLog`, `BufferLog`, `ModuleLog`, `ParamLog`
- Accessors: `LayerAccessor`, `ModuleAccessor`, `ParamAccessor`, `BufferAccessor`
- Internal: `FuncExecutionContext`, `VisualizationOverrides`, `FuncCallLocation`

### Constants
- UPPER_SNAKE_CASE: `FIELD_ORDER`, `ORIG_TORCH_FUNCS`, `IGNORED_FUNCS`
- `_DEVICE_CONSTRUCTOR_NAMES`, `_ATTR_SKIP_SET` for internal sets

## Error Handling
- Validation errors: `MetadataInvariantError(check_name, message)` — named checks A through R
- LayerLog multi-pass access: raises **ValueError** (not AttributeError) to avoid Python's property/__getattr__ trap
- `salient_args.py` extractors: try-except returns `{}` on any error (failure-safe)
- Validation replay: exceptions caught and returned as None (Bug #151 — known silent pass)
- `FuncCallLocation`: lazy properties loaded via `linecache` on first access, not at construction

## Testing Patterns

### Fixtures (tests/conftest.py)
- `default_input1` through `default_input4`: `(6,3,224,224)` standard image tensors
- `zeros_input`, `ones_input`: edge-case inputs
- `vector_input` `(5,)`, `input_2d` `(5,5)`, `input_complex` `(3,3)` complex
- `small_input` `(2,3,32,32)`: fast metadata tests
- Deterministic seeding: `torch.manual_seed(0)`, `torch.use_deterministic_algorithms(True)`

### Markers
- `@pytest.mark.slow` — real-world model tests taking >5 min
- `@pytest.mark.smoke` — 18 critical-path tests for fast validation (~6s total)
- `@pytest.mark.rare` — always excluded unless `-m rare` specified

### Test Categories
- **Toy models** (`test_toy_models.py`): `validate_saved_activations()` + `show_model_graph()` for every test
- **Real-world** (`test_real_world_models.py`): `pytest.importorskip()` for optional deps
- **Metadata** (`test_metadata.py`): `log_forward_pass()` directly, assert field properties
- **Aesthetic** (`test_output_aesthetics.py`): generates PDFs for human visual inspection

### Model Definitions
All test models live in `tests/example_models.py` (~5,400 lines). New models go here.

### Output
All test outputs → `tests/test_outputs/` (gitignored):
- `reports/` — coverage, aesthetic report, profiling
- `visualizations/` — PDFs organized by model family subdirectories

## Import Order
stdlib → third-party → local (enforced by ruff)

```python
import os
from typing import Dict, List, Optional

import torch
from torch import nn

from .utils.tensor_utils import safe_copy
from ._state import _logging_enabled
```

## Documentation
- Docstring format: NumPy style
- Type hints on all functions (including internal)
- Top-level file comments on `.py` files where purpose isn't obvious
- Each subpackage has a `CLAUDE.md` with file table, key functions, gotchas, known bugs

## Git

### Commit Messages
Conventional commits for semantic-release:
```
<type>(<scope>): <description> (#<issue>)
```

Types: `fix`, `feat`, `chore`, `docs`, `ci`, `refactor`, `test`, `style`

Scopes (common): `logging`, `vis`, `postprocess`, `capture`, `validation`, `decoration`, `data`, `state`, `utils`, `ci`, `release`, `types`

### Branch Naming
- Feature branches: `codex/<task-id>` (kebab-case task IDs)
- One branch at a time besides main

### CI/CD
- `lint.yml`: ruff format + check on push/PR, auto-commits fixes
- `quality.yml`: mypy + pip-audit on push/PR
- `release.yml`: semantic-release v9 on push to main, PyPI via OIDC

## Field Management
FIELD_ORDER tuples in `constants.py` define complete field sets. When adding a new field:
1. Add to class definition (LayerPassLog, ModelLog, etc.)
2. Add to corresponding FIELD_ORDER in `constants.py`
3. Add test in `test_metadata.py`
4. Update `to_pandas()` if user-facing
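This checklist lends itself to a guard test. A miniature sketch with hypothetical names (the real tuples and classes live in `constants.py` and `data_classes/`):

```python
# Hypothetical miniature of a FIELD_ORDER sync check: every field in the
# order tuple must exist on the class, and vice versa. Names are stand-ins.
from dataclasses import dataclass, fields

LAYER_PASS_FIELD_ORDER = ("layer_label", "tensor_shape", "pass_num")

@dataclass
class LayerPassLogSketch:
    layer_label: str
    tensor_shape: tuple
    pass_num: int

def test_field_order_in_sync():
    # Fails if a field is added to the class but not the tuple, or vice versa.
    assert {f.name for f in fields(LayerPassLogSketch)} == set(LAYER_PASS_FIELD_ORDER)

test_field_order_in_sync()
print("FIELD_ORDER and class definition agree")
```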
.project-context/decisions.md

Lines changed: 44 additions & 0 deletions

# TorchLens Architectural Decisions

## 2024 — Toggle Architecture (Permanent Decoration)
Context: Originally TorchLens re-wrapped/un-wrapped torch functions on every `log_forward_pass` call.
Decision: Wrap all ~2000 torch functions once at `import torchlens` time, gate with a single `_logging_enabled` bool.
Rationale: Eliminates per-call decoration overhead (~200ms) and makes wrappers stateless. A single bool check when disabled is negligible.
Alternatives considered: Context-manager-based decoration (too slow), monkey-patching per call (fragile).

## 2024 — Global State in _state.py
Context: Decorated wrappers need access to session state (active ModelLog, toggle, etc.).
Decision: A single `_state.py` module holds all mutable state. No imports from other torchlens modules.
Rationale: Prevents circular imports. Wrappers only need to import `_state`, not heavy torchlens modules.
Alternatives considered: Thread-local state (too complex), class-based state (no benefit over module globals).

## 2025 — LayerLog Class Hierarchy (PR #92)
Context: TensorLog was both per-pass and aggregate. RolledTensorLog was a separate class for rolled views.
Decision: Split into LayerPassLog (per-pass) and LayerLog (aggregate). Eliminate RolledTensorLog.
Rationale: Clean separation of concerns. LayerLog delegates to the single-pass LayerPassLog via `__getattr__`.
Alternatives considered: Keep RolledTensorLog (too much duplication).

## 2025 — BufferLog Stays as Subclass
Context: BufferLog has `name`/`module_address` fields that don't apply to generic LayerLog.
Decision: BufferLog(LayerPassLog) keeps buffer-specific properties. Single-pass LayerLogs access them via delegation.
Rationale: LayerLog is too generic for buffer metadata. Delegation handles the single-pass common case.

## 2025 — Backward-Only Conditional Flood (Bug #88, PR #127)
Context: A bidirectional flood from terminal booleans falsely marked non-conditional children.
Decision: `_mark_conditional_branches` floods backward-only (parent_layers). AST-based THEN detection when `save_source_context=True`.
Rationale: A forward flood follows data flow, not control flow. Backward-only correctly marks ancestors of the branch decision.

## 2026 — ELK Stress Bypass for >100k Nodes (PR #132)
Context: ELK stress allocates two n^2 × 8-byte distance matrices. 100k nodes = 160GB.
Decision: >100k nodes bypass ELK entirely → Python topological layout (Kahn's algorithm, O(n+m)).
Rationale: No size guard is possible in elkjs. The old >150k stress switch was fundamentally broken.
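The 160 GB figure in the ELK entry is direct arithmetic:

```python
# Two dense n^2 distance matrices of 8-byte doubles at n = 100,000 nodes.
n = 100_000
total_bytes = 2 * n**2 * 8
print(total_bytes / 10**9, "GB")   # 160.0 GB
```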
## 2026 — Dagua Integration (Opt-In)
Context: Graphviz rendering has limitations for large graphs and interactive exploration.
Decision: Add dagua as an optional renderer (`vis_renderer="dagua"`). Graphviz remains the default.
Rationale: Dagua provides GPU-accelerated layout and a richer interaction model, but its visual semantics are still under iteration. Keep the stable default.

## 2026 — Global Undecorate Override (PR latest)
Context: Advanced users need a clean PyTorch environment for benchmarking, profiling, or debugging decorator interactions.
Decision: Expose `undecorate_all_globally()` / `redecorate_all_globally()` as an explicit user API.
Rationale: Permanent decoration is the right default, but an escape hatch is needed for power users.
