Fix dictionary/output buffer aliasing corruption #1

Merged: handlerug merged 1 commit into handlerug:push-qxkytkvukoqs from Shane98c:fix/dictionary-output-buffer-aliasing on Jun 18, 2026

@Shane98c

Builds on the dictionary-decompression support added in this branch (101arrowz#18).

The bug

When a dictionary is loaded, rdic() replaces st.w with the dictionary history:

st.w = dic.slice();
st.e = dic.length;

In decompress() (no output buffer provided), the fast path then reuses st.w directly as the output buffer when its length equals the frame's decompressed size:

if (st.w.length == st.u) {
  bufs.push(buf = st.w);   // <-- st.w is now the dictionary history
  ol += st.u;
}

So output is written into the same buffer that holds the dictionary history, while sequence back-references read from it. As soon as a back-reference points into the dictionary region that has already been overwritten by output, the result is silently corrupted (no error thrown). It only triggers when the dictionary-content length happens to equal the decompressed size, so it slips through typical tests.
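The hazard can be shown with a toy model. This is a simplified sketch, not fzstd's actual decoder: the `replay` function, the `Match` type, and the 8-byte "dictionary" are all made up for illustration, and the sequences differ from the frame in the reproduction below. It only assumes that a back-reference with an offset larger than the current output position reads from the history buffer.

```typescript
// Toy model of the aliasing hazard. Simplified; not fzstd's actual decoder.
type Match = { offset: number; length: number };

// Replays back-references against the logical stream "history, then output".
function replay(history: Uint8Array, out: Uint8Array, matches: Match[]): Uint8Array {
  let pos = 0; // next write position in `out`
  for (const { offset, length } of matches) {
    for (let i = 0; i < length; i++) {
      const src = pos - offset; // negative means the match reaches into history
      out[pos] = src < 0 ? history[history.length + src] : out[src];
      pos++;
    }
  }
  return out;
}

const text = (b: Uint8Array): string =>
  Array.from(b).map((c) => String.fromCharCode(c)).join("");

const dict = () => Uint8Array.from([65, 66, 67, 68, 69, 70, 71, 72]); // "ABCDEFGH"
const matches: Match[] = [
  { offset: 4, length: 4 },  // copies "EFGH" from the tail of the history
  { offset: 12, length: 4 }, // copies "ABCD" from the head of the history
];

// Separate output buffer: the history stays intact.
const separate = text(replay(dict(), new Uint8Array(8), matches)); // "EFGHABCD"

// Aliased: the first match overwrites "ABCD" before the second match reads it.
const shared = dict();
const aliased = text(replay(shared, shared, matches)); // "EFGHEFGH"
```

In the aliased run nothing throws; the second match simply reads bytes that the first match already replaced, which is the silent corruption described above.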

The fix

Skip the fast path when a dictionary is present, so a separate output buffer is used and the dictionary history stays read-only:

if (!dic && st.w.length == st.u) { ... }
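As a rough sketch of what the guard buys, here is the buffer choice pulled out into a standalone helper. The name `pickOutput` and the fresh-allocation fallback are assumptions for illustration; the actual change is only the added `!dic` condition inside decompress().

```typescript
// Hypothetical helper mirroring the guarded condition; not part of fzstd's API.
// w: current window buffer, u: frame's decompressed size, dic: optional dictionary.
function pickOutput(w: Uint8Array, u: number, dic?: Uint8Array): Uint8Array {
  // Reuse the window as output only when no dictionary has been loaded into it.
  if (!dic && w.length == u) return w;
  // Assumed fallback: a separate buffer, so the dictionary history is never written.
  return new Uint8Array(u);
}

const windowBuf = new Uint8Array(4);
const withoutDict = pickOutput(windowBuf, 4);                 // same object as windowBuf
const withDict = pickOutput(windowBuf, 4, new Uint8Array(4)); // a distinct buffer
```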

Reproduction

# 200-byte raw-content dictionary; payload identical to it so it compresses to
# a single back-reference into the dictionary; decompressed size (200) == dict
# content length (200) -> triggers the fast path.
python3 -c 'b=bytes((i*7+13)%256 for i in range(200)); open("d","wb").write(b); open("p","wb").write(b)'
zstd -19 -D d p -o p.zst

zstd -d -D d p.zst -o out      # reference zstd: correct
# fzstd.decompress(p.zst, undefined, d) BEFORE this fix: corrupted (differs from `out`)
#                                       AFTER this fix:  matches `out`

A regression test covering exactly this case is included in tests/simple_cases_test.ts (generates the dictionary, embeds the 22-byte compressed frame, asserts the decode equals the dictionary content).

Context

Found while integrating this branch into a read-only Icechunk reader (icechunk-js), which uses zstd dictionary compression for virtual chunk locations. Verified the fix against the reference zstd CLI and confirmed it does not regress non-dictionary decompression.

@handlerug

Owner

Thanks!
