Skip to content

paper: fix protocol self-contradiction + bit-accounting inconsistencies#41

Merged
FluffyAIcode merged 1 commit into
mainfrom
AgentMemory/paper-protocol-bits-fixes-c478
Apr 24, 2026
Merged

paper: fix protocol self-contradiction + bit-accounting inconsistencies#41
FluffyAIcode merged 1 commit into
mainfrom
AgentMemory/paper-protocol-bits-fixes-c478

Conversation

@FluffyAIcode
Copy link
Copy Markdown
Owner

Review findings addressed

Two internal-credibility issues flagged in review:

(1) Protocol boundaries were self-contradictory

  • §5 D4 headline results run snapshot-mode, n=4 passages, boundary k=2.
  • §6 E8 results run in-forward, no-boundary, n=32, 95% CI.
  • But §7.2 caveat used to declare "All numbers in this paper are snapshot-mode" — directly contradicting §6.

Fix:

  • §1 adds a new paragraph "Strict-GPU, live-vLLM; two measurement protocols" that declares both protocols up-front and maps each to the question it answers.
  • §1 paper-structure paragraph rewritten to explicitly mark §5 as snapshot-protocol and §6 as in-forward-rigorous-protocol.
  • §4 gains a new §4.4 "In-forward rigorous evaluation" subsection defining in-forward application, boundary policy (k=0 default, k=2 only where a codec is structurally undeployable at k=0), sample size (n=32, mean ± 95% CI via Student-t), and harness (benchmarks/rigorous_eval.py + frozen sha256 parity gate).
  • §4.3 "Boundary-layer protection" explicitly tagged (snapshot protocol) and notes that the rigorous protocol uses k=0 by default.
  • Every results-table caption now cites the protocol it was produced under: Tables 2, 3, 4, 5 tagged "snapshot-mode"; Tables 7, 8, 9 already tagged "in-forward rigorous"; Table 6 (latency) unchanged because it is a codec-isolation benchmark.
  • §5 title renamed to "Main results: D4 variant, snapshot protocol" and the overview paragraph explicitly states that all tables in §5 use snapshot.
  • §6 preamble explicitly links every result in §6 to "in-forward rigorous evaluation" and explains why (the E8 upgrade's practical benefit manifests under cross-layer accumulation).
  • §7.2 caveat rewritten to map each protocol to the question it answers instead of falsely claiming global snapshot.

(2) Bit-accounting was inconsistent

D4 side:

  • eq:bits-d4 prose said q=152 → 1056 bits.
  • Table 2 reported 1088 bits at q=152.
  • Table 4 footnoted "1056 exact, reported rounded to 1088".
  • Appendix C reported 1088 as canonical.
  • §8.6 admitted KakeyaLattice pays 3–11% more bits, but §5/§6 still called the comparison "matched bits".

E8 side:

  • Abstract said E8 costs "fixed +32 bits per vector" vs D4.
  • Table 7 showed +32 at q≤10 but +16 at q≥38.
  • Appendix C agreed with Table 7.

Fix: distinguish exact (information-theoretic) from packed (integer-aligned deployment) bit rates everywhere.

  • §3 "Bit budget" paragraph rewritten to cover both: 4·log₂(2q+1) − 1 is the per-block exact rate; ⌈·⌉ is the packed value; at q=152 this is 1056.3 exact → 1088 packed.
  • §3.2 "Matched bit budget" lever updated: q_range=152 → 1056.3 bits exact, 1088 slot-packed.
  • Table 4 row previously Bit cost (exact) | 1056 | 1056 (reported rounded to 1088) split into two clean rows: Bit cost (exact, Eq. (5)) | 1056.3 | 1056.3 and Bit cost (slot-packed, deployment) | — | 1088 (paired against TQ b=8 at 1056).
  • §8.4 renamed "Positioning: near-matched-rate fidelity improvement" — the comparison is explicitly declared near-matched-rate, not iso-rate.
  • §8.6 renamed "Near-matched rate: the 3–11% premium" with a quantitative per-q-tier premium table in prose: +11% at q=10, +4% at q=38, +3% at q=152.
  • §5.1 preamble and Pareto observation Kakeya KV cache: bug fix, Gemma 4 benchmark standard, cross-model reports (4 models) #1 now explicitly say "near-matched rate, not iso-rate".

E8 cost abstract + Table 7 caption + §6 design-facts bullet + "overhead differential shrinks" parenthetical all rewritten to say:

In exact (information-theoretic) rate the E8 variant pays +D/4 = +32 bits/vector at every q_range. In packed (integer-aligned deployment) rate the gap is +32 at low q_range (parity saving still visible above the rounding floor) and +16 at high q_range (parity saving absorbed into ceiling arithmetic).

The ceiling arithmetic is shown explicitly:

  • ⌈4·log₂(2q+1) − 1⌉ = 9, 13, 17, 21, 25, 29, 33 at q = 2, 5, 10, 19, 38, 76, 152 for D4.
  • ⌈8·log₂(2q+1)⌉ = 19, 26, 36, 51, 67 at q = 2, 4, 10, 38, 152 for E8.

Per-vector packed cost (D=128 head, 32 bits fp16 overhead) matches Table 7 / Appendix C exactly: D4 = {320, 448, 576, 704, 832, 960, 1088} and E8 = {336, 448, 608, 848, 1104}.

Build

cd reports/paper
rm -f *.aux *.log *.out *.toc *.fls *.fdb_latexmk *.pdf
latexmk -pdf -interaction=nonstopmode -halt-on-error kakeyalattice.tex

Result: 27 pages, 433 KB (+1 page vs previous commit, from the expanded protocol + bit-accounting prose). Zero undefined references, zero LaTeX errors. Only cosmetic hyperref "Token not allowed in a PDF string (Unicode)" warnings from math-mode $D_4$ / $E_8$ in PDF-outline bookmark strings; the printed document itself is unaffected.

Section outline after fix

§1  Introduction + two-protocols paragraph + paper-structure paragraph
§2  Design philosophy: Kakeya–Brascamp–Lieb–Tropp chain
§3  The KakeyaLattice codec  (bit-budget now distinguishes exact / packed)
§4  Experimental methodology
      §4.1  Models and environment
      §4.2  Snapshot-mode capture and replace
      §4.3  Boundary-layer protection (snapshot protocol)
      §4.4  In-forward rigorous evaluation           ← NEW
      §4.5  Quality metrics
§5  Main results: D4 variant, snapshot protocol     ← title renamed
§6  Measured performance of the E8 nested lattice variant  (in-forward rigorous)
§7  Caveats  (§7.2 no longer falsely claims global snapshot)
§8  Comparison with TurboQuant
      §8.4  Positioning: near-matched-rate fidelity improvement    ← renamed
      §8.6  Near-matched rate: the 3–11% premium                   ← renamed

Files

  • reports/paper/kakeyalattice.tex (329 insertions, 123 deletions)
  • reports/paper/kakeyalattice.pdf (rebuilt)

No code / hook / codec / benchmark / release-artifact changes in this PR.

…cies

Two internal-credibility issues flagged by review:

(1) Protocol boundaries between sections were self-contradictory.
    \u00a75 D4 headline results run snapshot-mode, n=4, boundary k=2.
    \u00a76 E8 results run in-forward, no-boundary, n=32, 95% CI.
    But \u00a77.2 caveat declared 'All numbers in this paper are
    snapshot-mode', directly contradicting \u00a76.

    Fix: explicitly declare both protocols up-front (\u00a71
    'Strict-GPU, live-vLLM; two measurement protocols' paragraph;
    \u00a71 paper-structure paragraph; new \u00a74.4 'In-forward
    rigorous evaluation' subsection). Every results-table caption
    now cites the protocol it was produced under (tab:pareto-qwen3,
    tab:multimodel, tab:theory-vs-exp, tab:128k each tagged
    'snapshot-mode'; tab:v15-* already tagged 'in-forward rigorous').
    \u00a75 title renamed to 'Main results: D4 variant, snapshot
    protocol'; \u00a76 preamble explicitly links every result to
    'in-forward rigorous evaluation'. \u00a77.2 caveat rewritten to
    map each protocol to the question it answers instead of
    falsely claiming global snapshot.

(2) Bit-accounting was inconsistent.
    - eq:bits-d4 prose said q=152 -> 1056 bits; Table 2 reported
      1088 bits; Table 4 footnoted '1056 exact, rounded to 1088';
      Appx C reported 1088. \u00a78.6 admitted KakeyaLattice pays
      3-11% more bits but \u00a75/6 still called it 'matched bits'.
    - Abstract said E8 costs 'fixed +32 bits per vector' vs D4;
      Table 7 showed +32 at q<=10 but +16 at q>=38; Appx C agreed.

    Fix: distinguish exact (information-theoretic,
    4*log2(2q+1)-1 per D4 block) from packed (integer-aligned
    deployment value) everywhere. \u00a73 bit-budget paragraph
    rewritten to cover both. \u00a73.2 'Matched bit budget' lever
    updated: q=152 -> 1056.3 bits exact, 1088 slot-packed. Table 4
    row split into 'Bit cost (exact, Eq. 5)' = 1056.3 and 'Bit cost
    (slot-packed, deployment)' = 1088 (paired against TQ b=8 at
    1056). \u00a78.6 renamed to 'Near-matched rate: the 3-11%
    premium' with a quantitative per-q-tier premium table in the
    prose. \u00a78.4 renamed 'Positioning: near-matched-rate
    fidelity improvement'. \u00a75.1 preamble and Pareto
    observation #1 now explicitly say 'near-matched rate, not
    iso-rate'.

    Abstract E8 cost sentence rewritten: '+32 bits per vector at
    low q_range (where D4's parity-saving still clears the
    rounding floor) and +16 bits per vector at high q_range
    (where the parity saving is absorbed into integer alignment)'.
    tab:v15-bits caption updated to call out packed vs. exact and
    disclose that the gap is not fixed. The 'overhead differential
    shrinks' parenthetical in \u00a76 rewritten to explain the
    ceiling arithmetic precisely (ceil(4*log2(2q+1)-1) = 9,13,17,
    21,25,29,33 at q = 2,5,10,19,38,76,152 for D4; ceil(8*log2
    (2q+1)) = 19,26,36,51,67 at q = 2,4,10,38,152 for E8). The
    exact (unpacked) E8-D4 gap is confirmed constant at +D/4 = +32
    bits/vector; the packed gap varies as described.

Build: rm -rf aux/log + latexmk -pdf clean -> 27 pages (+1 vs
previous), 433 KB, zero undefined references, zero LaTeX errors.
Only cosmetic hyperref 'Token not allowed in a PDF string
(Unicode)' warnings on D_4/E_8 PDF-outline bookmarks; printed
document unaffected.
@FluffyAIcode FluffyAIcode marked this pull request as ready for review April 24, 2026 07:26
@FluffyAIcode FluffyAIcode merged commit 19718d1 into main Apr 24, 2026
FluffyAIcode pushed a commit that referenced this pull request Apr 24, 2026
…or, surface caveats

Addresses three reviewer comments on the v1.5 paper (15-page post-trim
version at main HEAD).  Full evaluation of each feedback before
editing — parts of feedback 2 were ALREADY fixed in PR #41 (bit
accounting) and this PR surfaces the remaining overclaims.

FEEDBACK 2 — bit accounting / 'matched bits'

Reviewer flag: 'matched bit points' language survives even after
Table 4 acknowledges exact=1056 vs packed=1088 (+3%-11% rate
premium).  Abstract and conclusion still imply strict iso-rate.

Fixed:
  - Abstract paragraph 3: rewrote to open with 'three near-matched
    bit tiers (the D4 variant sits 3-11% above its paired
    TurboQuant-b rate, not strictly iso-rate, because
    4*log2(2q+1)-1 rounds up to the next integer; see §8)'.
    Near-matched qualifier is now visible from the abstract
    itself, not just from §8.6.
  - Conclusion paragraph 1: rewrote to close with 'at a 3-11%
    packed-rate premium over the paired TurboQuant operating
    points (the comparison is near-matched rate, not strictly
    iso-rate ... §8.6)'.
  - §8 TurboQuant comparison opening: removed 'a drop-in change'
    phrasing at the end of the levers-in-common paragraph;
    replaced with 'the swap is structurally drop-in (same slot
    layout, same overhead scalars, same boundary layer policy)
    at a modest rate premium: see §8.6 for the quantitative
    3-11% packed-rate premium discussion'.  The 'drop-in'
    engineering claim stands (slot-compatible); the iso-rate
    implication is removed.
  - Conclusion paragraph 4 positioning: changed from 'additive
    near-matched-rate drop-in PR' to 'near-matched-rate,
    deployment-realistic fidelity improvement ... not as a
    strictly iso-rate or unconditional upgrade'.  Explicit about
    what the framing is NOT.

The '≈8% improvement' wording was already removed in an earlier
trim (PR #42); the conclusion now says '12/12 K-MSE wins ... at
a 3-11% packed-rate premium' instead.

FEEDBACK 3a — Prop 2.2 proof is too short / framing is too strong

Reviewer flag: Proposition 2.2 with a one-line proof sketch claims
a full Voronoi -> covering radius -> restricted Kakeya cover ->
empirical 0.919 ratio chain.  Conclusion and §8 treat it as
'unconditional rate-distortion boundary', overweighting a textbook
lattice result wrapped in Kakeya framing.

Fixed:
  - Renamed Proposition 2.2 to Fact 2.2 ('D4 / E8 second-moment
    shaping gain, classical').  Registered \newtheorem{fact}
    sharing the theorem counter.
  - Dropped the covering-radius inequality (which was not proven
    and not used downstream); Fact now states only the
    second-moment ratios G(D4)/G(Z4) = 0.919 and G(E8)/G(Z8) =
    0.860 with citations to Zamir-Feder, Conway-Sloane,
    Eyuboglu-Forney, Erez-Zamir — all textbook references, no
    novelty claimed.
  - Replaced the one-line proof sketch with an explicit
    acknowledgement: 'Fact 2.2 is a textbook consequence of
    lattice source coding and is stated here only so that §5's
    measured ≈ 0.919 K-MSE ratio has an explicit reference.  We
    use it as the engineering target the five-lever preprocessing
    pipeline is designed to hit, not as a new theoretical
    contribution.'
  - Downgraded the Kakeya chain claim: 'The connection to the
    Kakeya chain ... is literature-traceable framing: the codec
    gains nothing formal from the ancestry, and loses nothing if
    a reader prefers to treat Fact 2.2 as a pure lattice-coding
    statement.'  Retention justified on naming (direction-cover
    structure) and on the prior-work contrast (v1.3 used a
    conditional Kakeya lower bound; present codec replaces it).
  - Conclusion paragraph 1: 'is unconditionally bounded by
    0.919 via Prop 2.2' -> 'shaping-gain target is the textbook
    second-moment ratio G(Z4)/G(D4) ≈ 0.919 (Fact 2.2); the
    Kakeya framing (§2) names the codec's direction-cover
    structure and situates it inside the ... ancestry of
    lattice-second-moment bounds but does not provide new formal
    results at deployment dimension'.  Ancestry retained as
    intellectual honesty, not as load-bearing theory.

FEEDBACK 3b — citation error on [carbery-valdimarsson]

Reviewer flag: [17] (carbery-valdimarsson) is cited at §2 chain
paragraph as part of the 'matrix concentration (Tropp's matrix
Chernoff via Lieb concavity)' bracket, but its actual title is
'The endpoint multilinear Kakeya theorem via the Borsuk-Ulam
theorem' — a proof of multilinear Kakeya, not a
matrix-concentration result.

Fixed:
  - §2 chain: moved \cite{carbery-valdimarsson} out of the
    matrix-concentration bracket and into the 'multilinear Kakeya
    endpoint' bracket alongside \cite{guth-endpoint}, which is
    where it actually belongs.  (Carbery-Valdimarsson gives an
    alternate Borsuk-Ulam proof of the same endpoint theorem.)
    Matrix-concentration bracket now correctly cites only
    \cite{tropp-matrix-chernoff}.
  - §9.1 Related work 'Kakeya problem and harmonic analysis'
    paragraph: moved \cite{carbery-valdimarsson} from the
    Brascamp-Lieb bracket to an inline parenthetical next to
    \cite{guth-endpoint} — 'Guth's multilinear Kakeya endpoint
    (with the Borsuk-Ulam-based proof of [17])'.

FEEDBACK 4 — D4 main-results downstream quality evidence

Reviewer flag: the consolidated-verdict '9/12 Δppl wins' row in
§5.2 looks strong at a glance; the accompanying prose acknowledges
the three losses are 'inside the bf16 floor at n=4', but readers
who only scan the table see '9/12 wins, 3 baseline wins' and
miss the caveat.  Publication-grade standards require n ≥ 32.

Fixed:
  - §5.2 consolidated-verdict table: the 'Δppl ratio (< 1.0)'
    row now shows '9/12 wins | (3 sub-floor) | 0' instead of
    '9/12 wins | 0 | 3 baseline wins'.  The sub-floor ties are
    visually grouped with the ties column, not the baseline-wins
    column.
  - Added a table-footnote with asterisk marker directly beneath
    the table: 'At n=4 passages the FlashAttention bf16
    reproducibility floor is ~1% |Δppl| (§7); the three baseline
    wins rows on |Δppl| are all sub-floor near-lossless ties ...
    Deployment-realistic n=32 |Δppl| numbers with 95% CI are
    in §6'.  Reader cannot miss the caveat.
  - §5.2 closing paragraph: rewrote to say 'Publication-grade
    |Δppl| tightening requires n ≥ 32 (§7 caveat 1); the
    E8-vs-D4 comparison in §6 uses n=32 with Student-t 95% CI
    to eliminate this confound'.
  - Abstract paragraph 3: expanded the 9/12 win statement to
    'On |Δppl| at n=4 passages the D4 variant wins 9/12 pairs;
    the three losses are all < 1% absolute on both sides,
    inside the FlashAttention bf16 reproducibility floor at
    this sample size (§7), and the deployment-operating-point
    win at qrange=10 is 4/4 at 1.8-5.3x improvement'.  The
    qualifier propagates to the abstract.
  - Conclusion paragraph 2: same qualifier propagated.

BUILD

latexmk -pdf clean rebuild: 15 pages (unchanged from PR #42
trim), 455 KB (+3 KB due to new table footnote + expanded
conclusion + Fact 2.2 explanation paragraph), zero undefined
references, zero LaTeX errors.  The \newtheorem{fact} declaration
shares the theorem counter with proposition / theorem / lemma /
remark so Fact 2.2 / Proposition 2.1 / Proposition 2.2 numbering
is internally consistent.

This PR is DRAFT per user instruction pattern; do not auto-merge.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants