Skip to content

row-spine: per-column arrangement compression (alpha, default off)#37111

Open
frankmcsherry wants to merge 10 commits into
MaterializeInc:mainfrom
frankmcsherry:pr-column-codecs
Open

row-spine: per-column arrangement compression (alpha, default off)#37111
frankmcsherry wants to merge 10 commits into
MaterializeInc:mainfrom
frankmcsherry:pr-column-codecs

Conversation

@frankmcsherry

Copy link
Copy Markdown
Contributor

Motivation

Row-spine arrangements currently compress at the whole-row level via the
just-shipped dictionary codec (enable_arrangement_dictionary_compression_alpha).
That works well when whole rows repeat, but leaves a lot on the table for wide
rows where individual columns have exploitable structure (low-cardinality enums,
clustered integers, repeated strings) but the rows as a whole are nearly unique.

This PR adds a per-column compression framework: each column in a row gets
its own codec, chosen independently from a menu, based on batch-wide statistics
collected at seal time.

What's here

A new mz_row_spine::codec module:

  • ColumnCodec — a self-delimiting per-column codec enum: Raw,
    Constant, Huffman (boxed canonical Huffman, entropy), For
    (Frame-of-Reference = subtract-min then bitpack to a fixed byte width), and
    Dictionary. Because each codec is self-delimiting, they compose freely down
    a row.
  • RowCodec — encodes/decodes a whole row by dispatching column-by-column.
  • RowStats / ColumnStats — accumulate per-column statistics over a batch
    and pick the smallest-estimated codec per column, charging the per-codec table
    storage (dictionary maps, Huffman tables) against the estimate so fixed
    overhead can't lose us bytes on small or high-cardinality columns.
  • Canonical Huffman byte codec (huffman.rs) with forward-streaming decode.

Decode is deferred to the operator that needs datums: ReadItem carries the
compressed bytes plus a reference to the codec, and decode happens into a
thread-local scratch for comparisons or into the caller's RowArena for
extend_datums / to_row.

Relationship to dictionary compression

This is purely additive and orthogonal to the existing dictionary codec:

  • It makes zero changes to the old dictionary path (ColumnsCodec /
    dictionary builder / row_codec module).
  • It uses a separate container field (column_codec, alongside the existing
    codec) and a separate flag
    (enable_arrangement_column_compression_alpha, default off).
  • The two are mutually exclusive at runtime: build_column_codec returns
    None whenever a dictionary codec is installed.

So the recently-shipped dictionary feature is untouched and independently
controllable; either flag can be flipped or walked back with no effect on the
other.

Flag wiring

enable_arrangement_column_compression_alpha mirrors the dictionary flag end to
end: captured once at replica creation, threaded through
InstanceConfig/ReplicaConfig, and applied as a one-shot store into the
process-global COLUMN_COMPRESSION flag at instance creation (deliberately not
re-applied on config update, so flipping it never retroactively rewrites a live
replica's arrangements).

Testing

Unit tests in codec.rs / huffman.rs cover round-trip encode/decode for each
codec and the selector. Manual validation against a TPCH load generator showed
per-column selection reducing arrangement footprint substantially on wide tables
(e.g. ~-56% on lineitem, ~-43% across a mixed TPCH set) versus uncompressed,
competitive with or better than whole-row dictionary while decoding more cheaply.

This release will add an alpha enable_arrangement_column_compression_alpha
feature flag (default off) for per-column arrangement compression.

🤖 Generated with Claude Code

@frankmcsherry frankmcsherry requested a review from a team as a code owner June 17, 2026 19:58
@frankmcsherry frankmcsherry requested a review from a team as a code owner June 17, 2026 20:22
frankmcsherry added a commit that referenced this pull request Jun 18, 2026
…37113)

## What

`ArrangementFlavor::flat_map` and `flat_map_ok` constructed a fresh
`RowArena::new()` *inside* the per-row `logic` closure, so an arena was
allocated and dropped on every row processed. This hoists the arena to
the
enclosing scope and `clear()`s it once per row instead, reusing its
allocation
spine across invocations.

## Why

Addresses post-merge review feedback from @antiguru on #37110 (the
`ExtendDatums` refactor): the arena is the caller-provided decode target
a
compressed row representation decodes into. With per-column arrangement
codecs
(follow-up #37111) each row actually populates the arena, so
reallocating it per
row would churn; clearing-and-reusing keeps the spine.

Note that `RowArena::clear()` today drops the inner per-value `Vec<u8>`
buffers
(it clears the outer `Vec<Vec<u8>>`), so this reuses the outer spine but
not the
individual buffers — recycling those is a separate, larger `RowArena`
change.

## Tests

No behavior change; existing `compute` tests cover these paths.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread src/compute-types/src/dyncfgs.rs Outdated
frankmcsherry added a commit that referenced this pull request Jun 18, 2026
## What

Reworks `RowArena` from "one `Vec<u8>` allocation per `push_bytes`, all
freed on `clear`" into a bump allocator over a stack of byte regions:

- `push_bytes` **copies** bytes into the active (last) region. When that
region lacks spare capacity it allocates a **new, larger** region
(doubling) rather than growing the current one — so a region that
already holds data is never reallocated and references returned earlier
stay valid. The outer `Vec` may reallocate as regions are added, but
that only moves the `Vec<u8>` headers, not the heap buffers they own.
- `push_bytes` now accepts any `B: Deref<Target = [u8]>` (e.g.
`Vec<u8>`, `&[u8]`). Existing `Vec<u8>` callers are unchanged; borrowed
sources no longer need a throwaway allocation just to hand bytes over.
- `clear` retains only the **single largest region** (emptied) and drops
the rest, right-sizing the arena. An arena reused across `clear` cycles
— e.g. decoding arrangement rows one at a time — becomes allocation-free
in steady state.
- `reserve` now reserves *bytes* in the active region (was: slots in the
outer vector); `with_capacity` sizes the initial region. Neither has
load-bearing callers (the `reserve` users in `expr` pass size hints).

## Why

Follows up the `ExtendDatums` arena work (#37110) and the per-row arena
hoist (#37113). Once the arena is the decode target for compressed
arrangement rows (per-column codecs, #37111), a fresh allocation per
value per row is the dominant cost; a reused doubling region removes it.
The pattern mirrors columnation's region allocator.

The trade-off is the standard bump-allocator one: a single very large
row leaves the arena holding a large region until the next `clear`. For
the per-row-cleared decode/eval paths the high-water mark is just the
largest single row, so it stays bounded.

## Tests

Adds `mz_ore::test`s in `repr::row` covering:
- references from `push_bytes` remaining valid after later pushes force
new regions,
- `push_unary_row` reading a row back from a non-zero region offset
(confirms decoding is position/alignment independent),
- reuse across `clear`.

No behavior change for callers; no release note.

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
frankmcsherry added a commit to frankmcsherry/materialize that referenced this pull request Jun 18, 2026
Addresses review feedback on MaterializeInc#37111: scope
`enable_arrangement_column_compression_alpha` to the replica
(`ParameterScope::Replica`) so it can carry per-replica / per-size-family
overrides, like the other replica-local physical flags (lgalloc, persist pager,
column-paged batcher).

This also replaces the manual capture chain the flag previously borrowed from
dictionary compression — InstanceConfig/ReplicaConfig fields, the controller's
per-replica capture, and the `handle_create_instance` store — with the
replica-scoped idiom: the per-replica-resolved value arrives in the replica's
`worker_config`, and `apply_worker_config` mirrors it into the process-global
`mz_row_spine::COLUMN_COMPRESSION` flag (next to the lgalloc-region store).
The controller-level capture read only the environment value and would have
missed per-replica overrides. Re-applying on each config tick is safe: every
batch records its own codec, so a flip only affects batches sealed afterwards.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
frankmcsherry added a commit to frankmcsherry/materialize that referenced this pull request Jun 18, 2026
Addresses review feedback on MaterializeInc#37111: scope
`enable_arrangement_column_compression_alpha` to the replica
(`ParameterScope::Replica`) so it can carry per-replica / per-size-family
overrides, like the other replica-local physical flags (lgalloc, persist pager,
column-paged batcher).

This also replaces the manual capture chain the flag previously borrowed from
dictionary compression — InstanceConfig/ReplicaConfig fields, the controller's
per-replica capture, and the `handle_create_instance` store — with the
replica-scoped idiom: the per-replica-resolved value arrives in the replica's
`worker_config`, and `apply_worker_config` mirrors it into the process-global
`mz_row_spine::COLUMN_COMPRESSION` flag (next to the lgalloc-region store).
The controller-level capture read only the environment value and would have
missed per-replica overrides. Re-applying on each config tick is safe: every
batch records its own codec, so a flip only affects batches sealed afterwards.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

antiguru commented Jun 18, 2026

Copy link
Copy Markdown
Member

Edit: I @antiguru actually requested the review, but Claude didn't know, and got confused.


Review — row-spine per-column arrangement compression

This review was written by Claude (Claude Code), at the request of @frankmcsherry / the PR author.

TL;DR

The feature code (codec.rs, huffman.rs, the lib.rs wiring) is genuinely good: clean, well-documented, correctly self-delimiting codecs, sound mutual-exclusion with the dictionary path, and a default-off alpha flag wired exactly like its predecessor. But the PR as it stands is not mergeable, for one mechanical reason and a couple of correctness/robustness gaps worth closing before this goes in even behind a flag.


❌ Blocker 1 — 134,008 lines of unrelated junk

The final commit 6d74d99 ("compute: make per-column compression flag replica-scoped") swept in 80 unrelated working files alongside the legitimate one-line flag-scoping change:

  • misc/build-time/* (incl. a 55,905-line cargo-timing.html, llvm-lines dumps, logs)
  • misc/decoupling-scans/* (~14k lines of scan output)
  • play/optimizer-rebuild/*, play/mz-bridge-recipe/package-lock.json
  • misc/demo-data/*, .agents/skills/mz-demo-data, FEATURE_FLAG_AUDIT.md

None are referenced by the feature. The real diff is just:

src/compute-types/src/dyncfgs.rs |  13 +
src/compute/src/compute_state.rs |  12 +
src/row-spine/src/codec.rs       | 592 +
src/row-spine/src/huffman.rs     | 410 +
src/row-spine/src/lib.rs         | 320 +

plus the two Python flag-registration files (mzcompose/__init__.py, parallel_workload/action.py), which are legitimate (bin/lint-test-flags requires them).

Action: rebuild the last commit so it carries only the flag-scoping change (git reset the junk paths out + git commit --amend, or interactive-rebase 6d74d99). Until then the diff is unreviewable in a normal view and will trip bin/lint on a pile of these files.


Correctness

2. Variable / mixed arity is silent corruption in release (robustness gap).
RowStats::observe grows its columns vec to the max arity ever seen, and choose() uses each column's own count. But RowCodec::encode indexes self.columns[index] per row column — a row with more columns than the codec models panics (index OOB); a row with fewer leaves count != self.columns.len(), caught only by a debug_assert_eq! (codec.rs:218) that's gone in release. And decode_into always walks all self.columns, so a short record makes For/Dictionary::decode slice cursor[..width] off the end → panic even in release, or Raw reads into the next record → corruption.

build_column_codec already hints the author saw this (it skips empty rows when gathering stats), but that's a half-measure: if a container ever holds both an empty (arity-0) row and non-empty rows, the empty row still goes through push_intoencode emits a 0-length record, and decoding it walks N columns off the end. In practice a single key/val container has uniform arity, so I believe this is not reachable today — but it's an unenforced invariant guarded only by a debug assert. Recommendation: have RowStats track the first row's arity and build() return None if any observed row's arity differs (and don't silently skip empties — they're an arity signal). Turns a latent release-mode panic into a clean "no codec."

3. HuffmanCode::from_frequencies doc/behavior mismatch. The doc says it returns None "when fewer than two distinct bytes appear," but code_lengths returns Some for the single-symbol case (length 1). So a single-distinct-byte column does build a code (1 bit/byte). Correct and round-trips, but the comment is wrong and no test exercises single-symbol selection through ColumnCodec::Huffman. Fix the comment, or actually return None for present.len() == 1 if exclusion was intended.

4. FoR / be_uint — verified correct ✓. Checked against the row encoding (Tag::NonNegativeInt64_24 = tag byte + N significant bytes). FoR is pure big-endian byte arithmetic over the fixed-length pattern including the tag, so it round-trips exactly regardless of tag/sign, correctly gated on fixed_len && len <= 8 && max_int > min_int, and width from max-min means it can never lose to Raw. round_trip_for_numeric is valid.

5. Dictionary codes are insertion-ordered, not sorted. observe assigns code = distinct.len() on first sight. Fine for correctness (decode is a lookup), but worth a one-line comment so a future reader doesn't assume order-preservation — and it forecloses the compare optimization below.


Performance (flag is off, so non-blocking — but this is whether it can ship)

6. The comparison path fully decodes both operands. PartialEq/PartialOrd on encoded items decode both whole rows into thread-local scratch (Huffman bit-by-bit, dict lookups) then compare raw bytes. Differential merge/consolidate/seek is comparison-bound, so with the flag on this is O(full row) per comparison instead of O(distinguishing prefix). This is the single biggest risk to viability and deserves a dedicated benchmark (merge throughput, flag on vs off) before leaving alpha. The two-buffer thread-local design itself is sound. Possible win: decode column-by-column, stop at first differing column (Constant compares free); making Dictionary/For order-preserving would let those compare encoded, but Huffman isn't order-preserving so a general fast path needs the lazy column-wise decode.

7. Per-row allocations on the read/scan path. to_row does Vec::new() per call; extend_datums (the documented hot scan path) does Vec::new() and then arena.push_bytes(buf) — an allocation plus a copy per row, vs. the no-codec path's allocation-free tight loop. Both should reuse a thread-local scratch like materialized already does.

8. Huffman::decode_one is bit-at-a-time — O(max_len) branches + bounds-checked read_bit per symbol. A length-limited (MAX_BITS=24) table-driven decode would be much faster, worth it if Huffman is actually selected often (see test gap below).

9. Merge re-encodes everything (inherent, fine). merge_capacity correctly drops the codec and push_into decodes each incoming item to a Row then re-encodes under a fresh codec — unavoidable since codecs aren't reusable across merges; just another full decode + to_row alloc per row during merges.


Test coverage gaps

  1. Huffman selection end-to-end. No test asserts ColumnCodec::Huffman is ever chosenround_trip_text_heavy and round_trip_dictionary_lowcard land on Dictionary, round_trip_for_numeric on For. So the Huffman arm through RowCodec and the model-cost accounting in choose() are never exercised. Add a case engineered to make Huffman win (high-cardinality, skewed bytes, >65,536 distinct so Dictionary is disqualified).
  2. The merge path (push_into with item.column_codec.is_some()) — never tested; the container test only installs a codec and pushes fresh rows.
  3. Variable/empty-row arity (finding #2) — once the guard is added.
  4. MAX_BITS decline path in code_lengths — uncovered.
  5. DICT_MAX_CARDINALITY boundary — exactly 65,536 / 65,537 distinct to confirm 2-byte width and fall-off to None.

Minor nits

  • codec.rs:388 let _ = best_bytes; — the final best_bytes write in the Dictionary arm is dead; the discard silences the warning but reads oddly. A short comment or dropping the last assignment would help.
  • CMP_SCRATCH_A/B thread-locals never shrink (retain the largest row seen for the thread's life) — negligible, worth a comment.
  • self_code (codec.rs:192) is a one-line indirection over index.get(...).expect(...).
  • compute_state.rs "Safe to re-apply on every tick" is the right call and well-justified. 👍

Suggested benchmarks

  1. Merge/consolidate throughput, flag on vs off on a wide low-cardinality arrangement (e.g. TPCH lineitem) — exercises the full-row-decode comparison path (#6). This is the gating measurement.
  2. Arrangement scan latency (extend_datums) on/off — captures the per-row alloc regression (#7).
  3. Memory footprint on/off across the TPCH set — reproduce the "-56% lineitem / -43% mixed" claim in a repeatable harness (Feature Benchmark / mz-benchmark entry so CI tracks it).
  4. Seal cost on high-cardinality columns — the per-batch stats pass builds a BTreeMap up to 65k entries per column and then discards it; confirm seal stays cheap.

Net: solid, well-isolated feature behind an off-by-default flag. Strip the junk commit (blocker), add the arity guard, and fill the Huffman-selection + merge-path test gaps; treat the comparison-path decode cost as the key thing to measure before promoting past alpha.


Generated by Claude Code

frankmcsherry and others added 9 commits June 18, 2026 10:50
First piece of the entropy-coded row work. A self-contained, tested canonical
Huffman coder over the byte alphabet, seeded from batch-wide frequencies. Not
yet wired into the containers (allowed unused for now); integration into the
codec slot follows.

- `HuffmanCode` is described entirely by its 256 per-byte code lengths; canonical
  codes and decode tables derive from the lengths, so a stored model costs only
  the lengths. Built from frequencies via standard Huffman; declines (returns
  `None`) when <2 symbols appear or the optimal code would exceed a 24-bit bound
  (caller then leaves the data uncompressed).
- Decoding is forward-streaming (read bits MSB-first, emit one symbol at a time
  with O(1) state) so that comparisons can decode operands without materializing
  them — the property that lets `ReadItem::Ord` avoid a per-item allocation.
- `FrequencyCounter` accumulates the model; `BitWriter`/`BitReader` do MSB-first
  bit packing.

Tests round-trip empty, single/two-symbol, skewed, uniform (verifying exactly
8-bit codes, i.e. no expansion), and pseudo-random inputs, and check that a code
rebuilt from its lengths matches the original.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Foundation for per-column compression of arrangements: each column is compressed
independently with a codec chosen from batch-wide statistics, instead of one
scheme for the whole row. Standalone + tested; container wiring follows.

- `ColumnCodec` (enum, one variant per scheme): `Raw`, `Constant` (column is one
  value batch-wide -> zero bytes/row), `Huffman`. Codecs are self-delimiting in
  the encoded stream, so they compose: `RowCodec` decodes a row by walking the
  column codecs in order, each consuming exactly its bytes — no central
  length-prefix block. New schemes (FoR = subtract-min then bit-pack, dictionary)
  slot in as a variant plus a size estimate.
- `RowStats` selects per column the smallest-estimated codec (including `Raw`),
  and `build()` returns `None` when every column is best left `Raw`. This
  structurally avoids the overhead trap the experiments exposed (uniform
  per-column Huffman and dictionary both blew up small many-column arrangements):
  cheap columns stay `Raw`/`Constant`, expensive tables are installed only where
  they pay.

Built on the ExtendDatums/arena iterator cleanup; reuses the canonical Huffman
coder. Tests cover mixed/constant/text-heavy round-trips and that an
all-raw batch yields no codec.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wires the per-column codec framework end-to-end behind a default-off
COLUMN_COMPRESSION flag, reusing the decode-on-read pattern: a container holds an
optional RowCodec, records are stored RowCodec-encoded, and DatumSeq carries the
compressed bytes + codec ref and decodes on demand.

- DatumContainer gains `column_codec: Option<RowCodec>` + a staging buffer,
  threaded through with_capacity/merge_capacity/clear/heap_size; merge drops the
  codec (rebuilt at next seal). push_into encodes the raw row columns per-column
  (implies no dictionary codec); index defers decode.
- DatumSeq gains `column_codec: Option<&RowCodec>`. to_row/into_owned/clone_onto
  decode the record to raw row bytes; cmp/eq decode each operand into a
  thread-local scratch (two, non-aliasing); extend_datums decodes into the arena;
  next/bytes_iter keep entropy.is_none()-style tripwires.
- seal() builds + installs a RowCodec per key/value container via RowStats over
  the raw columns (all four builders); column compression layers over raw, so it
  is skipped when a dictionary codec is present.

Flag off => no codec built => unchanged behavior. New container round-trip test
covers encode/decode, compare/seek, and extend_datums for codec-encoded items.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lection

Fixes the per-column overhead the catalog benchmark exposed: `ColumnCodec` was an
enum sized to its largest variant (`Huffman`'s ~1.3 KB inline tables), so every
column — even `Raw`/`Constant` — cost ~1.3 KB in the per-column `Vec`, blowing up
small many-column arrangements (e.g. a 7-row catalog index 750 B -> ~9 KB).

- `Huffman(Box<HuffmanCode>)`: the enum is now pointer-sized, so `Raw`/`Constant`
  columns cost only a discriminant + pointer; the table is heap-allocated only
  for columns that actually use Huffman.
- The selector now charges the model's storage (its inline tables) against the
  Huffman estimate, so Huffman is adopted only when per-row savings outweigh the
  one-time table — keeping it off small columns where the table would dominate.

`heap_size` accounts the boxed table. Test now uses enough rows that Huffman is
selected (exercising its decode path); small batches correctly fall back to Raw.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fills out the per-column codec menu so the selector can fit numeric and
low-cardinality columns, not just text.

- `For` (frame-of-reference): for fixed-width (<= 8 byte) columns, interpret the
  bytes as a big-endian integer and store `value - min` in the fewest bytes that
  hold `max - min`. It is pure arithmetic on the byte pattern (no datum-type
  knowledge), so it always reconstructs exactly; it simply isn't *chosen* when the
  byte order leaves residuals large, so it's safe to offer everywhere.
- `Dictionary` (per-column): map each distinct value to a fixed-width code into a
  value table, for columns up to 2^16 distinct values.

`ColumnStats` now also tracks fixed length, min/max integer, and distinct values
(bounded); `choose` estimates FoR and dictionary sizes (table storage charged)
alongside Huffman/Constant/Raw and picks the smallest. New round-trip tests cover
FoR-numeric and dictionary-low-cardinality selection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…_compression_alpha

Add an `enable_arrangement_column_compression_alpha` dyncfg (default off) that
controls per-column arrangement compression, mirroring the existing
`enable_arrangement_dictionary_compression_alpha` wiring end to end: captured
once at replica creation, threaded through InstanceConfig/ReplicaConfig, and
applied as a one-shot store into the process-global COLUMN_COMPRESSION flag at
instance creation. The two flags are independent and the codecs are mutually
exclusive at runtime.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… lists

bin/lint-test-flags requires every system flag to be known to
parallel-workload's FlipFlagsAction and to the mzcompose system-parameter
lists. Mirror the dictionary-compression flag's entries for the new
per-column compression flag.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Addresses review feedback on MaterializeInc#37111: scope
`enable_arrangement_column_compression_alpha` to the replica
(`ParameterScope::Replica`) so it can carry per-replica / per-size-family
overrides, like the other replica-local physical flags (lgalloc, persist pager,
column-paged batcher).

This also replaces the manual capture chain the flag previously borrowed from
dictionary compression — InstanceConfig/ReplicaConfig fields, the controller's
per-replica capture, and the `handle_create_instance` store — with the
replica-scoped idiom: the per-replica-resolved value arrives in the replica's
`worker_config`, and `apply_worker_config` mirrors it into the process-global
`mz_row_spine::COLUMN_COMPRESSION` flag (next to the lgalloc-region store).
The controller-level capture read only the environment value and would have
missed per-replica overrides. Re-applying on each config tick is safe: every
batch records its own codec, so a flip only affects batches sealed afterwards.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Review follow-ups on MaterializeInc#37111:

- Correctness: `RowCodec::decode_into` walks a fixed column count, so a row whose
  arity differs from the codec's would read past its record (a panic, or silent
  corruption in release, caught before only by a debug assert). `RowStats` now
  tracks the first observed row's arity and `build()` declines on any mismatch.
  `build_column_codec` no longer skips empty rows — an empty (arity-0) row is a
  real arity signal, so a mixed-arity container correctly gets no codec.
- Docs/comments: fix `HuffmanCode::from_frequencies` doc (a single distinct byte
  yields a 1-bit code; `None` only for empty input or codes exceeding MAX_BITS);
  note dictionary codes are insertion-ordered (not order-preserving); explain the
  intentionally-unread final `best_bytes` write and that CMP_SCRATCH buffers don't
  shrink.
- Tests: Huffman selected end-to-end (high-cardinality skewed column), the merge
  re-encode path (push an encoded item into another codec'd container), mixed /
  empty-row arity declines, `code_lengths` MAX_BITS decline, and the
  `DICT_MAX_CARDINALITY` retain/drop boundary.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread src/row-spine/src/lib.rs
#[allow(dead_code)]
mod codec;
#[allow(dead_code)]
mod huffman;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not actually dead anymore I think?

Comment thread src/row-spine/src/lib.rs
Comment on lines +1559 to +1564
if let Some(codec) = self.column_codec {
// `iter.data` is RowCodec-encoded; decode to raw row bytes.
let mut buf = Vec::new();
codec.decode_into(self.iter.data, &mut buf);
return unsafe { Row::from_bytes_unchecked(&buf) };
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better to decode into an arena here if one is available than allocate per row

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. We don't have that until #37115 lands, I think.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should go read that then :)

let mut best = ColumnCodec::Raw;
let mut best_bytes = self.raw_bytes;

if let Some(code) = self.freq.build() {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably pretty minor in the grand scheme of things but we do the expensive check first. Calling build on the huffman code calls code_lengths which is slightly expensive (BinaryHeap over >= 256 symbols plus allocs). One could optimize this by running the cheap codecs first to lower best_bytes and then gate the huffman build behind a cheap lower bound like let huffman_floor = self.count.saturating_mul(2).saturating_add(size_of::<HuffmanCode>() + 256);

Again, pretty minor and might need some stats/profile to prove if its really a problem.

@antiguru antiguru left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good, but I'd like to see a test/nightly run with the flag enabled. At the moment it seems mostly untested delta the unit tests (thanks for adding!)

I'll approve, don't see a blocker here.

Comment on lines +502 to +503
// NB: per-column compression is replica-scoped and applied in `apply_worker_config`
// (called just above), not captured on `InstanceConfig`.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Comment reads like a note-to-self.

// off the end of an empty record at decode time.
match self.arity {
None => self.arity = Some(count),
Some(arity) if arity != count => self.mixed_arity = true,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what situation would we expected mixed arity rows?

"true" if version >= MzVersion.parse_mz("v0.132.0-dev") else "false"
),
"enable_alter_swap": "true",
"enable_arrangement_column_compression_alpha": "false",

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see a test/nightly run with compression enabled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants