Fix dictionary/output buffer aliasing corruption by Shane98c · Pull Request #1 · handlerug/fzstd

Shane98c · 2026-06-18T17:37:36Z

Builds on the dictionary-decompression support added in this branch (101arrowz/fzstd#18).

The bug

When a dictionary is loaded, rdic() replaces st.w with the dictionary history:

st.w = dic.slice();
st.e = dic.length;

When decompress() is called without an output buffer, its fast path then reuses st.w directly as the output buffer if st.w.length equals the frame's decompressed size (st.u):

if (st.w.length == st.u) {
  bufs.push(buf = st.w);   // <-- st.w is now the dictionary history
  ol += st.u;
}

Output is therefore written into the same buffer that holds the dictionary history while sequence back-references are still reading from it. As soon as a back-reference points into a part of the dictionary region that output has already overwritten, the result is silently corrupted: no error is thrown. The bug only triggers when the dictionary-content length happens to equal the decompressed size, so it slips through typical tests.
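To make the aliasing hazard concrete, here is a minimal toy model. It is not fzstd's decoder: the `Seq` shape and `decodeToy` are hypothetical names invented for illustration, and the sequence (two literals followed by a match into the dictionary) is chosen to show the effect in four bytes rather than to mirror the reproduction case below.

```typescript
// Toy model, NOT fzstd's real decoder: one sequence is some literal bytes
// followed by a match copied out of the dictionary history.
interface Seq {
  literals: number[];  // bytes emitted verbatim
  dictStart: number;   // where the match begins inside the dictionary
  matchLength: number; // how many bytes to copy from the dictionary
}

// Decodes `seq` into `out`, reading match bytes from `history`.
function decodeToy(history: Uint8Array, out: Uint8Array, seq: Seq): Uint8Array {
  let pos = 0;
  for (const byte of seq.literals) out[pos++] = byte;
  for (let i = 0; i < seq.matchLength; i++) out[pos++] = history[seq.dictStart + i];
  return out;
}

const dict = Uint8Array.from([1, 2, 3, 4]);
const seq: Seq = { literals: [9, 9], dictStart: 0, matchLength: 2 };

// Separate output buffer: the history stays read-only.
const safe = decodeToy(dict.slice(), new Uint8Array(4), seq);

// Aliased: the same array serves as both history and output, so the
// literals overwrite the dictionary bytes before the match reads them.
const shared = dict.slice();
const aliased = decodeToy(shared, shared, seq);

console.log(Array.from(safe));    // [9, 9, 1, 2]
console.log(Array.from(aliased)); // [9, 9, 9, 9]
```

Both calls complete without an error, which is why this kind of corruption is silent: the only symptom is wrong bytes in the output.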

The fix

Skip the fast path when a dictionary is present, so a separate output buffer is used and the dictionary history stays read-only:

if (!dic && st.w.length == st.u) { ... }
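The effect of the extra `!dic` guard can be sketched in isolation. `pickOutputBuffer` below is a hypothetical helper, not a function in fzstd; `win` and `contentSize` stand in for st.w and st.u, and the fallback simply allocates one fresh buffer, which may differ from how the real slow path obtains its output.

```typescript
// Hypothetical helper, not part of fzstd: decides which buffer receives output.
function pickOutputBuffer(
  win: Uint8Array,      // plays the role of st.w
  contentSize: number,  // plays the role of st.u
  dic?: Uint8Array      // dictionary, if one was supplied
): Uint8Array {
  // Fast path: only safe when the window holds no dictionary history.
  if (!dic && win.length == contentSize) return win;
  // Otherwise write into a fresh buffer so the window stays read-only.
  return new Uint8Array(contentSize);
}

const dic = new Uint8Array(200);
const windowWithDict = dic.slice();
const plainWindow = new Uint8Array(200);

// With a dictionary, output goes to a separate array even though the sizes match.
const withDict = pickOutputBuffer(windowWithDict, 200, dic);
// Without one, the window is reused exactly as before.
const withoutDict = pickOutputBuffer(plainWindow, 200);

console.log(withDict === windowWithDict); // false
console.log(withoutDict === plainWindow); // true
```

The cost is that dictionary frames lose the buffer-reuse optimization, while non-dictionary frames keep it unchanged.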

Reproduction

# 200-byte raw-content dictionary; payload identical to it so it compresses to
# a single back-reference into the dictionary; decompressed size (200) == dict
# content length (200) -> triggers the fast path.
python3 -c 'b=bytes((i*7+13)%256 for i in range(200)); open("d","wb").write(b); open("p","wb").write(b)'
zstd -19 -D d p -o p.zst

zstd -d -D d p.zst -o out      # reference zstd: correct
# fzstd.decompress(p.zst, undefined, d) BEFORE this fix: corrupted (differs from `out`)
#                                       AFTER this fix:  matches `out`

A regression test covering exactly this case is included in tests/simple_cases_test.ts (generates the dictionary, embeds the 22-byte compressed frame, asserts the decode equals the dictionary content).
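For reference, the dictionary bytes produced by the Python one-liner above can be generated in TypeScript as follows. `makeDictionary` is a name chosen here for illustration and is not taken from the test file; the 22-byte compressed frame is not reproduced in this description, so it is left out.

```typescript
// Same byte pattern as the Python one-liner: (i * 7 + 13) % 256 for i in 0..199.
// `makeDictionary` is an illustrative name, not taken from the actual test file.
function makeDictionary(length = 200): Uint8Array {
  const out = new Uint8Array(length);
  for (let i = 0; i < length; i++) out[i] = (i * 7 + 13) % 256;
  return out;
}

const dictContent = makeDictionary();
console.log(dictContent.length, dictContent[0], dictContent[1]); // 200 13 20
```

A test along these lines would pass `dictContent` as the dictionary argument, as in the `fzstd.decompress(p.zst, undefined, d)` call shown above, and assert that the decoded bytes equal `dictContent`.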

Context

Found while integrating this branch into a read-only Icechunk reader (icechunk-js), which uses zstd dictionary compression for virtual chunk locations. Verified the fix against the reference zstd CLI and confirmed it does not regress non-dictionary decompression.

handlerug · 2026-06-18T18:40:28Z

Thanks!

Fix dictionary/output buffer aliasing corruption

54c0b06

This was referenced Jun 18, 2026

Support loading a dictionary for decompression 101arrowz/fzstd#18

Open

Support dictionary-compressed virtual chunk locations EarthyScience/icechunk-js#22

Merged

handlerug merged commit 8b0734c into handlerug:push-qxkytkvukoqs Jun 18, 2026

handlerug merged 1 commit into handlerug:push-qxkytkvukoqs from Shane98c:fix/dictionary-output-buffer-aliasing
