HydraLM: Reproducible Results

Generated by scripts/reproduce_claims.py at 2026-04-21T23:51:44+00:00 (budget: paper, runtime: 169.0s).

Environment: Python 3.13.11, PyTorch 2.11.0+cu130, device cpu.

Overall status: ALL PASS

Claim summary

#	Claim	Status	Key measurement
1	Linear Complexity O(N)	PASS	HydraLM slope = 1.000, Transformer slope = 1.983
2	Lossless Accuracy on MQAR	PASS	HydraLM acc = 0.996, Transformer acc = 0.539 (HydraLM/Transformer ratio = 1.85)
3	1M-10M Token Streaming (Constant State)	PASS	distinct state-size values across N = 1K..100M: 1 (expected 1)
4	90% Cost Reduction at Long Context	PASS	FLOP save = 99.8%, mem save = 99.98% (at N = 131,072)
5	Drop-in Transformer Replacement	PASS	param ratio (Transformer/HydraLM) = 0.954, HF generate ok = True
6	Zero-Gradient Test-Time Learning	PASS	argmax = 100.0%, overwrite margin = 0.81, state = 16.00 MiB (3,750x smaller than KV cache)

Per-claim detail

1. Linear Complexity O(N) — PASS

Notes: 13 log-spaced sample points from 2^10 to 2^22

Thresholds

{
  "hydra_slope_in": [
    0.95,
    1.05
  ],
  "transformer_slope_gt": 1.5,
  "sampled_N": [
    1024,
    4194304
  ]
}

Measured

{
  "hydra_slope": 0.9999999999999999,
  "transformer_slope": 1.9832198600707687
}
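The report does not say how the slopes are estimated. This sketch assumes a common approach: an ordinary least-squares fit of log(cost) against log(N) over the sampled lengths.

It applies that fit to two hypothetical cost models, evaluated at the same 13 points (2^10 to 2^22):

- a per-token linear cost;
- a quadratic attention cost plus a linear projection term.

The model width d = 512 is an assumption. This is an illustration of the thresholds above, not the code in scripts/reproduce_claims.py.

```python
import math


def loglog_slope(ns, costs):
    """Ordinary least-squares slope of log(cost) against log(N)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(c) for c in costs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# 13 log-spaced sample points, 2^10 .. 2^22, matching the Notes above.
ns = [2 ** k for k in range(10, 23)]

# Hypothetical cost models for illustration only (d is an assumed width).
d = 512
linear_cost = [d * d * n for n in ns]                   # O(N): fixed work per token
attention_cost = [n * n * d + d * d * n for n in ns]    # O(N^2) attention + O(N) projections

hydra_slope = loglog_slope(ns, linear_cost)
transformer_slope = loglog_slope(ns, attention_cost)

print(f"linear-model slope    = {hydra_slope:.3f}")
print(f"quadratic-model slope = {transformer_slope:.3f}")
# Same pass rule as the Thresholds block: linear slope in [0.95, 1.05], quadratic slope > 1.5.
print("pass:", 0.95 <= hydra_slope <= 1.05 and transformer_slope > 1.5)
```

The quadratic model also carries an O(N) projection term, so its fitted slope lands a little below 2. That is the same qualitative pattern as the measured Transformer slope of 1.983.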

2. Lossless Accuracy on MQAR — PASS

Notes: steps=2000, bs=32, seq_len=32, D=2, Q=1, vocab=64

Thresholds

{
  "transformer_floor": 0.5,
  "ratio_ge": 0.9
}

Measured

{
  "hydra_accuracy": 0.99609375,
  "transformer_accuracy": 0.5390625,
  "ratio": 1.8478260869565217
}
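The ratio in the Measured block is HydraLM accuracy divided by Transformer accuracy: 0.99609375 / 0.5390625 ≈ 1.848.

This sketch assumes the thresholds mean two conditions:

- the Transformer baseline must reach at least transformer_floor;
- the ratio must be at least ratio_ge.

mqar_check is a hypothetical helper, not a function from the repository.

```python
def mqar_check(hydra_acc, transformer_acc, transformer_floor=0.5, ratio_ge=0.9):
    """Hypothetical reading of the claim-2 thresholds (not the repository's code)."""
    ratio = hydra_acc / transformer_acc  # HydraLM accuracy relative to the Transformer
    passed = transformer_acc >= transformer_floor and ratio >= ratio_ge
    return ratio, passed


# Values copied from the Measured block above.
ratio, passed = mqar_check(0.99609375, 0.5390625)
print(f"ratio = {ratio:.4f}, pass = {passed}")
```

Both reported accuracies are exact multiples of 1/256 (255/256 and 138/256).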

3. 1M-10M Token Streaming (Constant State) — PASS

Notes: runtime probe prefills 4096 tokens, then runs 512 single-token steps

Thresholds

{
  "state_bytes_constant": true,
  "runtime_finite_step": true,
  "N_probed": [
    1024,
    1048576,
    10485760,
    104857600
  ]
}

Measured

{
  "state_bytes_at_sizes": {
    "1024": 24576.0,
    "1048576": 24576.0,
    "10485760": 24576.0,
    "104857600": 24576.0
  },
  "state_bytes_unique_values": 1,
  "runtime_ok": true
}
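A constant-size state lets a recurrent model stream long inputs without growing memory. Each token updates a fixed buffer instead of appending to a cache.

The toy below illustrates this property with an assumed linear-attention-style recurrence: S += k v^T on a d × d float32 matrix, with d = 8. It is not HydraLM's actual state layout, which the report summarizes only as 24,576 bytes.

The Notes say only 4,096 + 512 tokens are actually run. In the same spirit, the sketch streams a few thousand tokens and checks that the state size does not depend on how many have been consumed.

```python
import random
from array import array


def stream_state_bytes(n_tokens, d=8, seed=0):
    """Stream n_tokens through a toy linear-attention recurrence; return the state size in bytes."""
    rng = random.Random(seed)
    state = array("f", [0.0] * (d * d))  # fixed d x d float32 state, allocated once
    for _ in range(n_tokens):
        k = [rng.gauss(0.0, 1.0) for _ in range(d)]
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for i in range(d):
            for j in range(d):
                state[i * d + j] += k[i] * v[j]  # outer-product accumulate: S += k v^T
    return state.itemsize * len(state)


sizes = {n: stream_state_bytes(n) for n in (1, 100, 2_000)}
print(sizes)
print("unique values:", len(set(sizes.values())))
```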

4. 90% Cost Reduction at Long Context — PASS

Notes: N=131,072, small=16,384, large=1,048,576

Thresholds

{
  "flop_save_ge_at_N": [
    0.9,
    131072
  ],
  "mem_save_ge_at_N": [
    0.9,
    131072
  ],
  "monotone_in_N": [
    16384,
    1048576
  ]
}

Measured

{
  "flop_save": 0.9979082047116166,
  "mem_save": 0.99981689453125,
  "flop_save_at_small": 0.9834724005134788,
  "flop_save_at_large": 0.9997381160628929,
  "dollar_save": 0.9979082047116166
}
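A fractional saving s converts to a cost multiple of 1 / (1 - s). This makes the three reported FLOP savings easier to compare: 99.79% at N = 131,072 is roughly 478x fewer FLOPs.

The sketch below recomputes these multiples from the Measured values and applies the thresholds. It assumes monotone_in_N means "savings do not decrease between N = 16,384 and N = 1,048,576". Note that dollar_save is reported as exactly equal to flop_save.

```python
# Reported FLOP savings from the Measured block above, keyed by context length N.
flop_save = {
    16_384: 0.9834724005134788,
    131_072: 0.9979082047116166,
    1_048_576: 0.9997381160628929,
}
mem_save_at_131072 = 0.99981689453125


def cost_multiple(save):
    """Convert a fractional saving s into a 'k times cheaper' factor, k = 1 / (1 - s)."""
    return 1.0 / (1.0 - save)


ns = sorted(flop_save)
# Assumed meaning of monotone_in_N: savings never decrease as N grows.
monotone = all(flop_save[a] <= flop_save[b] for a, b in zip(ns, ns[1:]))
passed = flop_save[131_072] >= 0.9 and mem_save_at_131072 >= 0.9 and monotone

for n in ns:
    print(f"N={n:>9,}: save={flop_save[n]:.4%}, ~{cost_multiple(flop_save[n]):,.0f}x fewer FLOPs")
print("pass:", passed)
```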

5. Drop-in Transformer Replacement — PASS

Notes: batch=2, seq=16, new_tokens=8

Thresholds

{
  "api_parity": true,
  "param_ratio_in": [
    0.9,
    1.1
  ],
  "hf_generate_works": true
}

Measured

{
  "api_ok": true,
  "param_hydra": 241168,
  "param_transformer": 230016,
  "param_ratio": 0.9537583759039342,
  "hf_generate_ok": true
}
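The reported param_ratio is Transformer parameters divided by HydraLM parameters: 230,016 / 241,168 ≈ 0.954. HydraLM is therefore the larger model, by about 4.8%.

The check below recomputes the ratio in both directions. Either direction falls inside the [0.9, 1.1] band, so the PASS does not depend on which way the ratio is taken.

```python
# Parameter counts copied from the Measured block above.
param_hydra = 241_168
param_transformer = 230_016

ratio_reported = param_transformer / param_hydra  # the direction used by param_ratio
ratio_inverse = param_hydra / param_transformer   # HydraLM relative to the Transformer

print(f"transformer / hydra = {ratio_reported:.4f}")
print(f"hydra / transformer = {ratio_inverse:.4f}")
print("both within [0.9, 1.1]:", all(0.9 <= r <= 1.1 for r in (ratio_reported, ratio_inverse)))
```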

6. Zero-Gradient Test-Time Learning — PASS

Notes: N=1000 facts @ d=1024, H=4; KV baseline: 10,000 facts across 12 layers x 16 tokens/fact

Thresholds

{
  "argmax_ge_at_N": [
    0.99,
    1000
  ],
  "overwrite_margin_ge": 0.5,
  "no_grad_writes": true,
  "kv_ratio_ge": [
    100.0,
    10000
  ]
}

Measured

{
  "argmax_accuracy": 1.0,
  "cosine": 0.6808836460113525,
  "cosine_min": 0.36485254764556885,
  "factbank_bytes": 16777216,
  "overwrite_margin": 0.8084572851657867,
  "cos_to_new": 0.8899003267288208,
  "cos_to_old": 0.08144304156303406,
  "no_grad_writes": true,
  "transformer_kv_bytes": 62914560000,
  "kv_ratio": 3750.0
}
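The report does not describe how HydraLM writes facts without gradients. One well-known gradient-free mechanism, assumed here purely for illustration, is the delta rule on an associative matrix: M <- M + beta (v - M k) k^T. The read/write helpers below are hypothetical.

The sketch also checks two relationships in the Measured block:

- overwrite_margin equals cos_to_new minus cos_to_old (0.8899 - 0.0814 ≈ 0.8085);
- kv_ratio equals transformer_kv_bytes / factbank_bytes (62,914,560,000 / 16,777,216 = 3750).

The product H × d × d × 4 bytes = 4 × 1024 × 1024 × 4 reproduces factbank_bytes. That is consistent with, but not confirmed as, one fp32 d × d matrix per head.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def read(mem, k):
    """Retrieve a value: v_hat = M k."""
    return [sum(row[j] * k[j] for j in range(len(k))) for row in mem]


def write(mem, k, v, beta=1.0):
    """Hypothetical gradient-free delta-rule write: M <- M + beta * (v - M k) k^T."""
    err = [vi - ri for vi, ri in zip(v, read(mem, k))]
    for i in range(len(mem)):
        for j in range(len(k)):
            mem[i][j] += beta * err[i] * k[j]


d = 4
mem = [[0.0] * d for _ in range(d)]
key = [1.0, 0.0, 0.0, 0.0]  # unit-norm key
v_old = [1.0, 1.0, 0.0, 0.0]
v_new = [0.0, 0.0, 1.0, 1.0]

write(mem, key, v_old)
write(mem, key, v_new)  # overwrite the fact stored under the same key
out = read(mem, key)

cos_to_new, cos_to_old = cosine(out, v_new), cosine(out, v_old)
margin = cos_to_new - cos_to_old
print(f"toy: cos_to_new={cos_to_new:.3f}, cos_to_old={cos_to_old:.3f}, margin={margin:.3f}")

# Arithmetic on the reported Measured values.
reported_margin = 0.8899003267288208 - 0.08144304156303406  # cos_to_new - cos_to_old
factbank_bytes = 4 * 1024 * 1024 * 4  # H * d * d * 4 bytes (assumed fp32 layout)
kv_ratio = 62_914_560_000 / factbank_bytes
print(f"reported margin={reported_margin:.4f}, factbank={factbank_bytes:,} B, kv_ratio={kv_ratio}")
```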

Reproducing this file: python scripts/reproduce_claims.py --budget paper --out RESULTS.md
