@@ -10,11 +10,11 @@ Public summary of TorchLens test suite outcomes. Updated after each release.
 
 | Metric | Value |
 | --------| -------|
-| Total tests | 834 |
+| Total tests | 882 |
 | Smoke tests (`-m smoke`) | 18 |
 | Test files | 14 |
-| Example models (toy) | 221 |
-| Real-world models | 150 |
+| Example models (toy) | 241 |
+| Real-world models | 183 |
 
 **Run the suite:**
 ```bash
@@ -30,8 +30,8 @@ pytest tests/test_profiling.py -vs # profiling report
 
 | File | Tests | What it covers |
 | ------| ------:| ----------------|
-| test_toy_models.py | 222 | API coverage on 221 example models (log, validate, visualize, metadata) |
-| test_real_world_models.py | 150 | Real-world architectures: validation + visualization |
+| test_toy_models.py | 242 | API coverage on 241 example models (log, validate, visualize, metadata) |
+| test_real_world_models.py | 183 | Real-world architectures: validation + visualization |
 | test_metadata.py | 107 | Field invariants, FLOPs, timing, RNG, func_call_location, corruption detection |
 | test_param_log.py | 70 | ParamLog, ParamAccessor, shared params, grad metadata |
 | test_decoration.py | 61 | Toggle state, detached imports, pause_logging, JIT compat, signal safety |
@@ -49,43 +49,49 @@ pytest tests/test_profiling.py -vs # profiling report
 
 ## Model Compatibility
 
-### Toy Models (221 architectures)
+### Toy Models (241 architectures)
 
-All 221 example models in `tests/example_models.py` pass `validate_forward_pass`.
+All 241 example models in `tests/example_models.py` pass `validate_forward_pass`.
 
 **Core patterns:** simple feedforward (incl. LeNet-5), branching, conditionals,
 48 loop/recurrence variants, in-place ops, view mutations, edge cases.
 
 **Attention variants:** multi-head, multi-query (MQA), grouped-query (GQA), RoPE,
 ALiBi, slot attention, cross-attention (Perceiver-style), axial attention,
 CBAM (channel+spatial), scaled dot-product, transformer encoder/decoder,
-embedding+positional.
+embedding+positional, differential attention (noise cancellation),
+relative position bias (T5-style), coordinate attention (factorized H/W),
+efficient channel attention (ECA, 1D conv).
 
 **Gating & skip patterns:** highway network, squeeze-and-excitation, depthwise
 separable conv, inverted residual (MobileNetV2), feature pyramid network (FPN),
-residual blocks, shared-param branching.
+residual blocks, shared-param branching, channel shuffle (ShuffleNet-style).
 
 **Generative & self-supervised:** VAE, hierarchical VAE, VQ-VAE, beta-VAE, CVAE,
-GAN (generator + discriminator), diffusion, normalizing flow, WaveNet-style gated
-convolutions, PixelCNN masked convolutions, SimCLR contrastive, BYOL-style
-stop-gradient, Barlow Twins (cross-correlation), adaptive instance normalization
-(AdaIN).
+Gumbel-Softmax VQ, GAN (generator + discriminator), diffusion, normalizing flow,
+WaveNet-style gated convolutions, PixelCNN masked convolutions, SimCLR contrastive,
+BYOL-style stop-gradient, Barlow Twins (cross-correlation), adaptive instance
+normalization (AdaIN).
 
-**Sequence models:** BiLSTM (bidirectional), seq2seq with Bahdanau attention.
+**Sequence models:** GRU, BiLSTM (bidirectional), seq2seq with Bahdanau attention.
 
 **Exotic architectures:** hypernetwork (weight generation), deep equilibrium model
 (DEQ, fixed-point iteration), neural ODE (Euler integration), NTM-style memory
-augmented network, SwiGLU FFN, Fourier mixing (FNet-style), spatial transformer
-network.
+augmented network, end-to-end memory network (multi-hop), SwiGLU FFN, Fourier
+mixing (FNet-style), spatial transformer network, SIREN (sinusoidal activations),
+radial basis function network (RBF).
 
 **Graph neural networks:** GCN, GAT, GraphSAGE, GIN, EdgeConv (DGCNN), graph
-transformer.
+transformer, Chebyshev spectral GCN.
 
 **Architecture patterns:** MLP-Mixer, Siamese, triplet network (metric learning),
-capsule network, U-Net, TCN (temporal conv net), super-resolution (PixelShuffle),
-PointNet, actor-critic, two-tower recommender, deep & cross network (recommender),
-depth estimator, dueling DQN, mixture of experts (MoE), RMS normalization,
-sparse/pruned networks.
+prototypical network (few-shot), capsule network, U-Net, TCN (temporal conv net),
+super-resolution (PixelShuffle), PointNet, actor-critic, two-tower recommender,
+deep & cross network (recommender), wide & deep (recommender), depth estimator,
+dueling DQN, mixture of experts (MoE), RMS normalization, sparse/pruned networks,
+early exit (multi-head), multi-scale parallel streams (HRNet-style), multi-task
+(shared trunk + task heads), FiLM conditioning, partial convolution (inpainting),
+Network in Network (1x1 conv + GAP), pixel shuffle upsampling.
 
 **Autoencoders:** vanilla, convolutional, sparse, denoising, VQ-VAE, beta-VAE, CVAE.
 
@@ -99,30 +105,33 @@ sparse/pruned networks.
 | **CORnet** | Z, S, R, RT | 4/4 pass |
 | **timm (original)** | BEiT, GluonResNeXt, ECAResNet, MobileViT, ADV-Inception, CaiT, CoAT, ConViT, DarkNet, GhostNet, MixNet, PoolFormer, ResNeSt, EdgeNeXt, HardCoreNAS, SEMNASNet, XCiT, SEResNet | 18/18 pass |
 | **timm (additional)** | HRNet, EfficientNetV2, LeViT, CrossViT, PVT-v2, Twins-SVT, FocalNet, Res2Net, gMLP, ResMLP, EVA-02 | 11/11 pass |
+| **timm (set 3)** | ConvNeXt-v2, NFNet, DaViT, CoAtNet, RepVGG, ReXNet, PiT, Visformer, GC-ViT, EfficientFormer, FastViT, NesT, Sequencer2D, TResNet | 14/14 pass |
 | **Audio (original)** | Conv-TasNet, Wav2Letter, HuBERT, Wav2Vec2, DeepSpeech, Conformer, Whisper-tiny | 7/7 pass |
 | **Audio (additional)** | AST, CLAP, EnCodec, SEW, SpeechT5, VITS | 6/6 pass |
+| **Audio (set 3)** | WavLM, Data2VecAudio, UniSpeech | 3/3 pass |
 | **Language (original)** | LSTM, RNN, GPT-2, BERT, DistilBERT, ELECTRA, T5-small, BART, RoBERTa, Sentence-BERT | 10/10 pass |
-| **Decoder-Only LLMs** | LLaMA, Mistral, Phi, Gemma, Qwen2, Falcon, BLOOM, OPT, OLMo | 9/9 pass |
-| **Encoder-Only (additional)** | ALBERT, DeBERTa-v2, XLM-RoBERTa | 3/3 pass |
-| **Encoder-Decoder (additional)** | Pegasus, LED | 2/2 pass |
+| **Decoder-Only LLMs** | LLaMA, Mistral, Phi, Gemma, Qwen2, Falcon, BLOOM, OPT, OLMo, GPT-J, GPTBigCode, GPT-NeoX | 12/12 pass |
+| **Encoder-Only (additional)** | ALBERT, DeBERTa-v2, XLM-RoBERTa, Funnel Transformer, CANINE, MobileBERT | 6/6 pass |
+| **Encoder-Decoder (additional)** | Pegasus, LED, mBART, ProphetNet | 4/4 pass |
 | **Efficient Transformers** | FNet, Nystromformer, BigBird, Longformer, Reformer | 5/5 pass |
 | **State Space Models** | Mamba, Mamba-2, RWKV, Falcon-Mamba | 4/4 pass |
 | **Mixture of Experts** | Mixtral, Switch Transformer, MoE (toy) | 3/3 pass |
 | **Autoencoders** | ViT-MAE (ForPreTraining) | 1/1 pass |
-| **Multimodal / Special** | Stable Diffusion (UNet), StyleTTS, QML, Lightning, CLIP, BLIP, ViT-MAE | 7/7 pass |
+| **Multimodal / Special** | Stable Diffusion (UNet), StyleTTS, QML, Lightning, CLIP, BLIP, ViT-MAE, SigLIP, BLIP-2 | 9/9 pass |
 | **Vision Transformers (HF)** | DeiT, CvT, SegFormer, DINOv2 | 4/4 pass |
 | **Perceiver** | Perceiver IO | 1/1 pass |
 | **Segmentation** | DeepLab-v3 (ResNet50), DeepLab-v3 (MobileNet), LRASPP, FCN-ResNet50 | 4/4 pass |
 | **Detection (original)** | Faster R-CNN (train+eval), FCOS (train+eval), RetinaNet (train+eval), SSD300 (train+eval) | 8/8 pass |
-| **Detection (additional)** | DETR, Mask R-CNN (train+eval), Keypoint R-CNN (train+eval) | 5/5 pass |
+| **Detection (additional)** | DETR, Mask R-CNN (train+eval), Keypoint R-CNN (train+eval), Deformable DETR | 6/6 pass |
 | **Quantized** | ResNet50 (quantized) | 1/1 pass |
 | **Video** | R(2+1)D-18, MC3-18, MViT-v2-S, R3D-18, S3D | 5/5 pass |
 | **Optical Flow** | RAFT-Small, RAFT-Large | 2/2 pass |
-| **Time Series** | PatchTST, Informer, Autoformer | 3/3 pass |
+| **Time Series** | PatchTST, Informer, Autoformer, TimeSeriesTransformer | 4/4 pass |
 | **Reinforcement Learning** | Decision Transformer | 1/1 pass |
-| **Graph Neural Networks** | DimeNet, GraphSAGE (PyG), GIN (PyG), Graph Transformer (PyG), GATv2 (PyG), R-GCN (PyG) | 6/6 pass |
+| **Graph Neural Networks** | DimeNet, GraphSAGE (PyG), GIN (PyG), Graph Transformer (PyG), GATv2 (PyG), R-GCN (PyG), ChebConv (PyG), SGConv (PyG), TAGConv (PyG) | 9/9 pass |
+| **Document Understanding** | LayoutLM | 1/1 pass |
 | **Other** | Taskonomy | 1/1 pass |
-| | **Total** | **150/150 pass** |
+| | **Total** | **183/183 pass** |
 
 *Tests requiring optional packages (torch_geometric, taskonomy) may show as SKIPPED.*
 
@@ -189,6 +198,43 @@ The test suite explicitly covers these distinct computational motifs:
 | Keypoint detection | — | Keypoint R-CNN |
 | Time series decomposition | — | Autoformer, Informer |
 | Self-supervised ViT | — | DINOv2 |
+| GRU (gated recurrent) | GRUModel | — |
+| Network in Network (1×1 conv + GAP) | NiNModel | — |
+| Channel shuffle (group conv) | ChannelShuffleModel | ShuffleNet |
+| Pixel shuffle upsampling | PixelShuffleModel | — |
+| Partial convolution (inpainting) | PartialConvModel | — |
+| FiLM conditioning (affine modulation) | FiLMModel | — |
+| Coordinate attention (factorized H/W) | CoordinateAttentionModel | — |
+| Differential attention (noise cancel) | DifferentialAttentionModel | — |
+| Relative position bias (T5-style) | RelativePositionAttentionModel | — |
+| Early exit (multi-head) | EarlyExitModel | — |
+| Multi-scale parallel streams (HRNet) | MultiScaleParallelModel | HRNet |
+| Gumbel-Softmax VQ | GumbelVQModel | — |
+| Multi-hop memory network | EndToEndMemoryNetwork | — |
+| Radial basis function (RBF) | RBFNetwork | — |
+| Sinusoidal activations (SIREN) | SIRENModel | — |
+| Multi-task (shared trunk + heads) | MultiTaskModel | — |
+| Wide & deep (recommender) | WideAndDeepModel | — |
+| Chebyshev spectral GCN | ChebGCN | ChebConv (PyG) |
+| Prototypical network (few-shot) | PrototypicalNetwork | — |
+| Efficient channel attention (ECA) | ECAModel | — |
+| Parallel attention + FFN | — | GPT-J |
+| Sequence reduction (funnel) | — | Funnel Transformer |
+| Character-level tokenization | — | CANINE |
+| Bottleneck BERT | — | MobileBERT |
+| N-gram prediction | — | ProphetNet |
+| Deformable attention | — | Deformable DETR |
+| Document layout embeddings | — | LayoutLM |
+| Normalizer-free training | — | NFNet |
+| Reparameterizable convolution | — | RepVGG |
+| Sigmoid contrastive loss | — | SigLIP |
+| Q-Former (querying transformer) | — | BLIP-2 |
+| Spectral graph convolution (Chebyshev) | — | ChebConv (PyG) |
+| Simple graph convolution (SGC) | — | SGConv (PyG) |
+| Topology adaptive graph conv | — | TAGConv (PyG) |
+| LSTM spatial mixing | — | Sequencer2D |
+| Pooling-based ViT | — | PiT |
+| Dual attention (spatial+channel) | — | DaViT |
 
 ---
 