
Commit 3dfd5b8

dfa1 and claude authored
feat: context recycling (reset) + dictionaries on contexts (load/ref) (#26)
* feat: add ZSTD_CCtx_reset / ZSTD_DCtx_reset context recycling

Bind ZSTD_CCtx_reset and ZSTD_DCtx_reset behind a new ZstdResetDirective enum (SESSION_ONLY, PARAMETERS, SESSION_AND_PARAMETERS), exposed as ZstdCompressCtx.reset(...) / ZstdDecompressCtx.reset(...). Lets a pooled or long-lived context recycle its native state between frames without freeing and recreating it.

A parameter reset clears the context back to defaults, so the compress context drops its cached level back to Zstd.defaultCompressionLevel() to stay in sync with the native state.

Document the recipe in docs/how-to.md, flip both symbols to bound in docs/supported.md (advanced-parameter count 8 -> 10, total 55 -> 57), and add the 0.5 changelog entry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat: load/ref dictionaries on contexts (loadDictionary / refDictionary)

Bind ZSTD_CCtx/DCtx_loadDictionary on the one-shot contexts (previously only on streams) and the new ZSTD_CCtx_refCDict / ZSTD_DCtx_refDDict. A sticky dictionary on a context lets compression combine a dictionary with the advanced parameters (checksum, window log, long-distance matching) via the compress2 path — impossible through the per-call compress(src, dict) overloads, which route the legacy dictionary path.

refDictionary attaches a pre-digested CDict/DDict by reference (no copy, no per-call digest), the pooled-context hot path that pairs with reset. A parameter reset clears either.

loadDictionary takes a ZstdDictionary or a native MemorySegment (zero-copy); refDictionary borrows the digest, so the caller keeps it alive.

Tests cover dict+checksum combine, ref-across-session-reset, reset/null clearing, segment load, and zstd-jni interop on both frames.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
1 parent e9b3d5e commit 3dfd5b8

11 files changed

Lines changed: 606 additions & 8 deletions


CHANGELOG.md

Lines changed: 33 additions & 0 deletions
@@ -4,6 +4,39 @@ All notable changes to this project are documented here. Format loosely follows
 [Keep a Changelog](https://keepachangelog.com/); versions are released as `v*`
 git tags, which trigger publication to Maven Central.
 
+## [0.5]
+
+### Added
+- `ZstdCompressCtx.reset(ZstdResetDirective)` / `ZstdDecompressCtx.reset(...)`
+recycle a context's native state between frames without freeing and recreating
+it. `SESSION_ONLY` keeps the level, parameters, and dictionary; `PARAMETERS` /
+`SESSION_AND_PARAMETERS` restore the defaults. Binds `ZSTD_CCtx_reset` /
+`ZSTD_DCtx_reset`.
+- `ZstdCompressCtx.loadDictionary(...)` / `ZstdDecompressCtx.loadDictionary(...)`
+(a `ZstdDictionary` or a native `MemorySegment`) and `refDictionary(...)` (a
+pre-digested `ZstdCompressDict` / `ZstdDecompressDict`, attached by reference,
+no copy). A sticky dictionary on the context lets compression combine a
+dictionary with the advanced parameters (checksum, window log, long-distance
+matching) — impossible through the per-call `compress(src, dict)` overloads,
+which route the legacy dictionary path. A parameter `reset(...)` clears it.
+Binds `ZSTD_CCtx_loadDictionary` / `ZSTD_DCtx_loadDictionary` (now on contexts,
+not just streams), `ZSTD_CCtx_refCDict`, `ZSTD_DCtx_refDDict`.
+
+### Changed
+- `NativeLibrary.classifier()` now throws a clear `UnsatisfiedLinkError` naming
+the unsupported CPU arch instead of silently mapping it to x86_64 (which
+deferred failure to a cryptic `dlopen` error). Added an explicit `amd64`
+branch so Linux JVMs (which report `os.arch=amd64`) still resolve x86_64.
+([ea1ac84](https://github.com/dfa1/zstd-java/commit/ea1ac84))
+
+### Fixed
+- Native JARs are much smaller. The ELF shared library is now stripped at link
+time (`-s`), dropping debug info (`libzstd.so` 4.0M -> ~650K), and the
+multi-MB `.pdb` debug database and `.lib` import library that lld emits next
+to the Windows `.dll` are no longer bundled (neither is needed at runtime).
+Net: linux-x86_64 native jar 1.2M -> 285K, windows-x86_64 1.2M -> 372K.
+([ea1ac84](https://github.com/dfa1/zstd-java/commit/ea1ac84))
+
 ## [0.4]
 
 ### Added

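The Changed entry says `NativeLibrary.classifier()` now fails fast on an unknown CPU arch and maps `amd64` explicitly. A hypothetical sketch of such a mapping follows; it is not the library's source. `ClassifierSketch` and its methods are invented for illustration, the `<os>-<arch>` format is inferred from the `linux-x86_64` / `windows-x86_64` jar names in the Fixed entry, and the `aarch64`/`arm64` and `osx` branches are assumptions.

```java
import java.util.Locale;

/** Hypothetical os/arch -> classifier mapping; not the library's NativeLibrary source. */
final class ClassifierSketch {

    private ClassifierSketch() {
    }

    /** Normalises os.arch, failing loudly instead of guessing x86_64. */
    static String arch(String osArch) {
        switch (osArch.toLowerCase(Locale.ROOT)) {
            case "x86_64":
            case "amd64": // Linux JVMs report amd64 for x86_64
                return "x86_64";
            case "aarch64":
            case "arm64": // assumption: ARM support and its classifier name
                return "aarch64";
            default:
                // Naming the arch here beats a cryptic dlopen failure later.
                throw new UnsatisfiedLinkError("unsupported CPU architecture: " + osArch);
        }
    }

    /** Normalises os.name; the "osx" name is an assumption. */
    static String os(String osName) {
        String n = osName.toLowerCase(Locale.ROOT);
        if (n.startsWith("linux")) {
            return "linux";
        }
        if (n.startsWith("windows")) {
            return "windows";
        }
        if (n.startsWith("mac")) {
            return "osx";
        }
        throw new UnsatisfiedLinkError("unsupported OS: " + osName);
    }

    static String classifier(String osName, String osArch) {
        return os(osName) + "-" + arch(osArch);
    }

    public static void main(String[] args) {
        // The case the explicit amd64 branch exists for.
        System.out.println(classifier("Linux", "amd64"));
    }
}
```

Throwing inside `arch` keeps the error next to its cause, which is the point of the changelog entry.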
docs/how-to.md

Lines changed: 58 additions & 0 deletions
@@ -18,6 +18,64 @@ try (ZstdCompressCtx cctx = new ZstdCompressCtx().level(19);
 Pick the level explicitly with `Zstd.maxCompressionLevel()` /
 `minCompressionLevel()` when you need the extreme ends.
 
+## Reset a context to recycle it
+
+A context is already reusable across whole `compress` / `decompress` calls. Reset
+goes further: it recycles the *native state* of one context — for pooled contexts,
+or to abort a half-written frame and start clean — without freeing and recreating
+it. Pick what to clear with `ZstdResetDirective`:
+
+```java
+try (ZstdCompressCtx cctx = new ZstdCompressCtx().level(19)) {
+    byte[] a = cctx.compress(first);
+
+    // Cheap: drop any unflushed frame state, keep the level and parameters.
+    cctx.reset(ZstdResetDirective.SESSION_ONLY);
+    byte[] b = cctx.compress(second);
+
+    // Full wipe: parameters back to default, dictionary cleared, level reset to
+    // Zstd.defaultCompressionLevel(). Only valid between frames, not mid-frame.
+    cctx.reset(ZstdResetDirective.SESSION_AND_PARAMETERS);
+}
+```
+
+`ZstdDecompressCtx.reset(...)` works the same way. Reuse alone amortises
+allocation; reset lets a long-lived or pooled context return to a known state
+without churning native memory.
+
+## Compress with a dictionary *and* advanced parameters
+
+The per-call `compress(src, dict)` overloads take the legacy dictionary path,
+which ignores the advanced parameters (checksum, window log, long-distance
+matching) set on the context. To combine the two, make the dictionary *sticky*
+with `loadDictionary` — then the normal `compress` path honours both:
+
+```java
+try (ZstdCompressCtx cctx = new ZstdCompressCtx().level(19).checksum(true)) {
+    cctx.loadDictionary(dict);             // ZstdDictionary, or a native MemorySegment
+    byte[] frame = cctx.compress(record);  // dictionary + checksum, together
+}
+```
+
+For a dictionary reused across a pool of contexts, digest it once and attach it
+by reference — no per-call digesting, no copy. It pairs with `reset` for a
+pooled, recycled context:
+
+```java
+try (ZstdCompressDict cdict = new ZstdCompressDict(dict, 19)) {
+    // one cctx per pooled worker, all sharing the one digested dictionary
+    try (ZstdCompressCtx cctx = new ZstdCompressCtx()) {
+        cctx.refDictionary(cdict);                    // borrowed; cdict must outlive cctx
+        byte[] a = cctx.compress(first);
+        cctx.reset(ZstdResetDirective.SESSION_ONLY);  // recycle, keep the dictionary
+        byte[] b = cctx.compress(second);
+    }
+}
+```
+
+A loaded or referenced dictionary stays until replaced, cleared with `null`, or
+dropped by a parameter `reset`. `ZstdDecompressCtx` mirrors all of this.
+
 ## Compress many small payloads with a dictionary
 
 For many small, similar payloads (log lines, JSON records, protobufs), a

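The reset rules in the how-to diff amount to a small state machine. The following plain-Java model makes it explicit under hypothetical names (`ResetModel`, `beginFrame`, `endFrame`); it is not the library's `ZstdCompressCtx` and touches no native memory. It encodes the behaviour the diff states (`SESSION_ONLY` keeps level, parameters and dictionary; a parameter reset restores defaults and drops the sticky dictionary; `null` clears it) plus one rule from `zstd.h`: a `PARAMETERS`-only reset is rejected while a frame is in progress (zstd reports `stage_wrong`), whereas `SESSION_AND_PARAMETERS` resets the session first and so succeeds at any point. The default level 3 mirrors `ZSTD_CLEVEL_DEFAULT`.

```java
import java.util.Objects;

/**
 * Hypothetical plain-Java model of the reset rules; not the library's ZstdCompressCtx.
 * It tracks only which settings each directive keeps or clears.
 */
final class ResetModel {

    enum Directive { SESSION_ONLY, PARAMETERS, SESSION_AND_PARAMETERS }

    /** Mirrors ZSTD_CLEVEL_DEFAULT (3). */
    static final int DEFAULT_LEVEL = 3;

    int level = DEFAULT_LEVEL;
    boolean checksum;          // stands in for every advanced parameter
    Object dictionary;         // sticky dictionary, or null when none is attached
    boolean frameInProgress;   // true while a streamed frame is only partly written

    /** Attaches a sticky dictionary; null clears it, like a NULL pointer in C. */
    ResetModel loadDictionary(Object dict) {
        this.dictionary = dict;
        return this;
    }

    void beginFrame() { frameInProgress = true; }

    void endFrame() { frameInProgress = false; }

    void reset(Directive directive) {
        Objects.requireNonNull(directive, "directive");
        switch (directive) {
            case SESSION_ONLY:
                // Drops unflushed frame state; level, parameters and dictionary survive.
                frameInProgress = false;
                break;
            case PARAMETERS:
                // zstd rejects a parameters-only reset mid-frame (stage_wrong).
                if (frameInProgress) {
                    throw new IllegalStateException("parameters can only be reset between frames");
                }
                clearParameters();
                break;
            case SESSION_AND_PARAMETERS:
                // The session is reset first, so the parameter reset cannot fail.
                frameInProgress = false;
                clearParameters();
                break;
        }
    }

    private void clearParameters() {
        level = DEFAULT_LEVEL;
        checksum = false;
        dictionary = null; // a parameter reset also drops the sticky dictionary
    }

    public static void main(String[] args) {
        ResetModel ctx = new ResetModel();
        ctx.level = 19;
        ctx.loadDictionary("dict");
        ctx.reset(Directive.SESSION_ONLY);
        System.out.println(ctx.level + " " + ctx.dictionary);
        ctx.reset(Directive.SESSION_AND_PARAMETERS);
        System.out.println(ctx.level + " " + ctx.dictionary);
    }
}
```

In this model only `PARAMETERS` can fail, and only mid-frame; one-shot `compress` calls always finish their frame, so the distinction matters mainly for streamed output.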
docs/supported.md

Lines changed: 10 additions & 8 deletions
@@ -33,7 +33,7 @@ rather than the deprecated `ZSTD_getDecompressedSize`.
 | Dictionary training (ZDICT) | 8 / 12 | trainFromBuffer, cover/fastCover optimizers, finalizeDictionary, getDictHeaderSize |
 | Streaming — compress | 3 / 22 | `ZstdOutputStream` (compressStream2 + buffer sizes) |
 | Streaming — decompress | 3 / 15 | `ZstdInputStream` (decompressStream + buffer sizes) |
-| Advanced parameters | 8 / 38 | all `ZSTD_cParameter` + `ZSTD_dParameter` via `ZstdCompressParameter`/`ZstdDecompressParameter`; `compress2`, `C/DCtx_setParameter`, `loadDictionary`, `c/dParam_getBounds`; MT inert on single-thread build |
+| Advanced parameters | 12 / 38 | all `ZSTD_cParameter` + `ZSTD_dParameter` via `ZstdCompressParameter`/`ZstdDecompressParameter`; `compress2`, `C/DCtx_setParameter`, `C/DCtx_reset`, `C/DCtx_loadDictionary`, `CCtx_refCDict`/`DCtx_refDDict`, `c/dParam_getBounds`; MT inert on single-thread build |
 | Frame inspection | 10 / 13 | `ZstdFrame` + getFrameProgression; `_advanced` not bound |
 | Memory sizing | 8 / 14 | sizeof_C/DCtx, sizeof_C/DDict, estimate C/DCtx + C/DDict size |
 | Low-level block | 0 / 12 | expert block/continue API not bound |
@@ -63,10 +63,12 @@ rather than the deprecated `ZSTD_getDecompressedSize`.
 | `ZSTD_compress2`, `ZSTD_CCtx_setParameter` | `ZstdCompressCtx.parameter` / `checksum` / `longDistanceMatching` / `windowLog` (all of `ZstdCompressParameter`) |
 | `ZSTD_DCtx_setParameter` | `ZstdDecompressCtx.parameter` / `windowLogMax` (`ZstdDecompressParameter`) |
 | `ZSTD_CCtx_setPledgedSrcSize` | `ZstdOutputStream.withPledgedSize` |
+| `ZSTD_CCtx_reset`, `ZSTD_DCtx_reset` | `ZstdCompressCtx.reset` / `ZstdDecompressCtx.reset` (`ZstdResetDirective`) |
 | `ZSTD_getDictID_fromCDict`, `ZSTD_getDictID_fromDDict` | `ZstdCompressDict.id()` / `ZstdDecompressDict.id()` |
 | `ZSTD_getErrorString` | `ZstdErrorCode.description()` |
 | `ZSTD_cParam_getBounds`, `ZSTD_dParam_getBounds` | `ZstdCompressParameter.bounds()` / `ZstdDecompressParameter.bounds()` (`ZstdBounds`) |
-| `ZSTD_CCtx_loadDictionary`, `ZSTD_DCtx_loadDictionary` | `ZstdOutputStream` / `ZstdInputStream` dictionary constructors |
+| `ZSTD_CCtx_loadDictionary`, `ZSTD_DCtx_loadDictionary` | `ZstdCompressCtx.loadDictionary` / `ZstdDecompressCtx.loadDictionary`; `ZstdOutputStream` / `ZstdInputStream` dictionary constructors |
+| `ZSTD_CCtx_refCDict`, `ZSTD_DCtx_refDDict` | `ZstdCompressCtx.refDictionary` / `ZstdDecompressCtx.refDictionary` |
 | `ZSTD_isFrame`, `ZSTD_findFrameCompressedSize`, `ZSTD_decompressBound`, `ZSTD_getDictID_fromFrame`, `ZSTD_getFrameHeader`, `ZSTD_isSkippableFrame`, `ZSTD_writeSkippableFrame`, `ZSTD_readSkippableFrame` | `ZstdFrame` (+ `ZstdFrameHeader`, `ZstdFrameType`, `ZstdSkippableContent`) |
 | `ZSTD_getErrorCode` | `ZstdException.code()` (+ `ZstdErrorCode`) |
 | `ZSTD_getFrameProgression` | `ZstdCompressStream.progress()` (`ZstdFrameProgression`) |
@@ -90,7 +92,7 @@ zstd-jni's JNI sources (v1.5.7-11, `src/main/native/*.c`). The latter is
 symbol-exact, not functional equivalence: zstd-jni may expose an operation through
 a different symbol than this library — e.g. it routes one-shot compression through
 `ZSTD_compress2`, so `ZSTD_compress` reads `` for it even though `Zstd.compress`
-works. zstd-jni references 53 of these symbols; this library binds 55. They
+works. zstd-jni references 53 of these symbols; this library binds 59. They
 overlap on the modern context/streaming API and diverge mainly on zstd-jni's
 sequence-producer hooks vs this library's frame-inspection and typed-error surface.
 
@@ -231,7 +233,7 @@ sequence-producer hooks vs this library's frame-inspection and typed-error surfa
 | `ZSTD_resetDStream` | — ᵈ ||
 | `ZSTD_sizeof_DStream` |||
 
-### Advanced parameters (8/38)
+### Advanced parameters (12/38)
 
 | Symbol | Bound | zstd-jni |
 |---|:---:|:---:|
@@ -245,11 +247,11 @@ sequence-producer hooks vs this library's frame-inspection and typed-error surfa
 | `ZSTD_CCtx_loadDictionary` |||
 | `ZSTD_CCtx_loadDictionary_advanced` |||
 | `ZSTD_CCtx_loadDictionary_byReference` |||
-| `ZSTD_CCtx_refCDict` | ||
+| `ZSTD_CCtx_refCDict` | ||
 | `ZSTD_CCtx_refPrefix` |||
 | `ZSTD_CCtx_refPrefix_advanced` |||
 | `ZSTD_CCtx_refThreadPool` |||
-| `ZSTD_CCtx_reset` | ||
+| `ZSTD_CCtx_reset` | ||
 | `ZSTD_CCtx_setCParams` |||
 | `ZSTD_CCtx_setFParams` |||
 | `ZSTD_CCtx_setParameter` |||
@@ -260,10 +262,10 @@ sequence-producer hooks vs this library's frame-inspection and typed-error surfa
 | `ZSTD_DCtx_loadDictionary` |||
 | `ZSTD_DCtx_loadDictionary_advanced` |||
 | `ZSTD_DCtx_loadDictionary_byReference` |||
-| `ZSTD_DCtx_refDDict` | ||
+| `ZSTD_DCtx_refDDict` | ||
 | `ZSTD_DCtx_refPrefix` |||
 | `ZSTD_DCtx_refPrefix_advanced` |||
-| `ZSTD_DCtx_reset` | ||
+| `ZSTD_DCtx_reset` | ||
 | `ZSTD_DCtx_setFormat` | — ᵈ ||
 | `ZSTD_DCtx_setMaxWindowSize` |||
 | `ZSTD_DCtx_setParameter` |||

integration-tests/src/test/java/io/github/dfa1/zstd/it/ZstdJniInteropTest.java

Lines changed: 34 additions & 0 deletions
@@ -4,6 +4,7 @@
 import com.github.luben.zstd.ZstdDictDecompress;
 import io.github.dfa1.zstd.Zstd;
 import io.github.dfa1.zstd.ZstdCompressCtx;
+import io.github.dfa1.zstd.ZstdCompressDict;
 import io.github.dfa1.zstd.ZstdDecompressCtx;
 import io.github.dfa1.zstd.ZstdDictionary;
 import io.github.dfa1.zstd.ZstdInputStream;
@@ -124,6 +125,39 @@ void jniDictCompressJavaDictDecompress() {
         assertThat(restored).isEqualTo(record);
     }
 
+    @Test
+    void javaLoadedDictWithChecksumJniDictDecompress() {
+        // A sticky loaded dictionary combined with an advanced parameter
+        // (checksum) — the COMPRESS2 path — must still produce a frame zstd-jni
+        // decodes against the same dictionary.
+        ZstdDictionary dict = trainDict();
+        byte[] record = record(33);
+
+        byte[] frame;
+        try (ZstdCompressCtx ctx = new ZstdCompressCtx().checksum(true)) {
+            ctx.loadDictionary(dict);
+            frame = ctx.compress(record);
+        }
+        ZstdDictDecompress jniDict = new ZstdDictDecompress(dict.toByteArray());
+        assertThat(com.github.luben.zstd.Zstd.decompress(frame, jniDict, record.length)).isEqualTo(record);
+    }
+
+    @Test
+    void javaReferencedDigestedDictJniDictDecompress() {
+        // A frame from a context referencing a digested CDict must decode in zstd-jni.
+        ZstdDictionary dict = trainDict();
+        byte[] record = record(44);
+
+        byte[] frame;
+        try (ZstdCompressDict cdict = new ZstdCompressDict(dict, Zstd.defaultCompressionLevel());
+             ZstdCompressCtx ctx = new ZstdCompressCtx()) {
+            ctx.refDictionary(cdict);
+            frame = ctx.compress(record);
+        }
+        ZstdDictDecompress jniDict = new ZstdDictDecompress(dict.toByteArray());
+        assertThat(com.github.luben.zstd.Zstd.decompress(frame, jniDict, record.length)).isEqualTo(record);
+    }
+
     private ZstdDictionary trainDict() {
         List<byte[]> samples = new ArrayList<>();
         for (int i = 0; i < 3000; i++) {

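Both new interop tests declare the digested dictionary before the context in one try-with-resources header. Java closes resources in reverse declaration order, so the context is released before the `ZstdCompressDict` it borrows, which is the lifetime rule `refDictionary` imposes. The sketch below makes that ordering observable and adds the pooled-worker pattern from the how-to, recycling each returned context with a session-only reset. `Digest`, `PooledCtx` and `CtxPool` are hypothetical stand-ins, not library types, and the pool is single-threaded for brevity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-ins showing borrowed-dictionary lifetimes; not library types. */
final class PoolSketch {

    /** Close events in the order they happen. */
    static final List<String> EVENTS = new ArrayList<>();

    /** Stands in for a digested dictionary (ZstdCompressDict) shared by reference. */
    static final class Digest implements AutoCloseable {
        boolean closed;

        @Override
        public void close() {
            closed = true;
            EVENTS.add("digest closed");
        }
    }

    /** Stands in for a context that borrows, but does not own, the digest. */
    static final class PooledCtx implements AutoCloseable {
        private final Digest digest;
        int resets;

        PooledCtx(Digest digest) {
            this.digest = digest;
        }

        String compress(String src) {
            // A borrowed digest must still be open whenever the context uses it.
            if (digest.closed) {
                throw new IllegalStateException("digest closed while still referenced");
            }
            return "frame(" + src + ")";
        }

        /** Models reset(SESSION_ONLY): recycles the context and keeps the dictionary. */
        void resetSession() {
            resets++;
        }

        @Override
        public void close() {
            EVENTS.add("ctx closed"); // never closes the borrowed digest
        }
    }

    /** Single-threaded pool: contexts are recycled on release, never recreated. */
    static final class CtxPool implements AutoCloseable {
        private final Deque<PooledCtx> idle = new ArrayDeque<>();

        CtxPool(Digest digest, int size) {
            for (int i = 0; i < size; i++) {
                idle.push(new PooledCtx(digest));
            }
        }

        PooledCtx acquire() {
            return idle.pop();
        }

        void release(PooledCtx ctx) {
            ctx.resetSession();
            idle.push(ctx);
        }

        /** Closes idle contexts; anything still checked out is the caller's to close. */
        @Override
        public void close() {
            while (!idle.isEmpty()) {
                idle.pop().close();
            }
        }
    }

    public static void main(String[] args) {
        // Digest declared first, pool second: the pool (and its contexts) closes first.
        try (Digest digest = new Digest(); CtxPool pool = new CtxPool(digest, 2)) {
            PooledCtx ctx = pool.acquire();
            System.out.println(ctx.compress("first"));
            pool.release(ctx);
        }
        System.out.println(EVENTS);
    }
}
```

Swapping the two declarations would close the digest while the pooled contexts still point at it, which is exactly the misuse the `cdict must outlive cctx` comment in the how-to warns about.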
zstd/src/main/java/io/github/dfa1/zstd/Bindings.java

Lines changed: 18 additions & 0 deletions
@@ -138,6 +138,11 @@ final class Bindings {
             NativeLibrary.lookup("ZSTD_CCtx_setParameter",
                     FunctionDescriptor.of(JAVA_LONG, ADDRESS, JAVA_INT, JAVA_INT));
 
+    // size_t ZSTD_CCtx_reset(ZSTD_CCtx*, ZSTD_ResetDirective)
+    static final MethodHandle CCTX_RESET =
+            NativeLibrary.lookup("ZSTD_CCtx_reset",
+                    FunctionDescriptor.of(JAVA_LONG, ADDRESS, JAVA_INT));
+
     // size_t ZSTD_compress2(ZSTD_CCtx*, void* dst, size_t dstCap, const void* src, size_t srcSize)
     // Uses the advanced parameters set on the context (unlike ZSTD_compressCCtx).
     static final MethodHandle COMPRESS2 =
@@ -149,6 +154,11 @@ final class Bindings {
             NativeLibrary.lookup("ZSTD_DCtx_setParameter",
                     FunctionDescriptor.of(JAVA_LONG, ADDRESS, JAVA_INT, JAVA_INT));
 
+    // size_t ZSTD_DCtx_reset(ZSTD_DCtx*, ZSTD_ResetDirective)
+    static final MethodHandle DCTX_RESET =
+            NativeLibrary.lookup("ZSTD_DCtx_reset",
+                    FunctionDescriptor.of(JAVA_LONG, ADDRESS, JAVA_INT));
+
     // size_t ZSTD_CCtx_setPledgedSrcSize(ZSTD_CCtx*, unsigned long long pledgedSrcSize)
     static final MethodHandle CCTX_SET_PLEDGED_SRC_SIZE =
             NativeLibrary.lookup("ZSTD_CCtx_setPledgedSrcSize",
@@ -238,6 +248,10 @@ final class Bindings {
     static final MethodHandle COMPRESS_USING_CDICT =
             NativeLibrary.lookup("ZSTD_compress_usingCDict",
                     FunctionDescriptor.of(JAVA_LONG, ADDRESS, ADDRESS, JAVA_LONG, ADDRESS, JAVA_LONG, ADDRESS));
+    // size_t ZSTD_CCtx_refCDict(ZSTD_CCtx*, const ZSTD_CDict*)
+    static final MethodHandle CCTX_REF_CDICT =
+            NativeLibrary.lookup("ZSTD_CCtx_refCDict",
+                    FunctionDescriptor.of(JAVA_LONG, ADDRESS, ADDRESS));
 
     // ZSTD_DDict* ZSTD_createDDict(const void* dict, size_t dictSize)
     static final MethodHandle CREATE_DDICT =
@@ -250,6 +264,10 @@ final class Bindings {
     static final MethodHandle DECOMPRESS_USING_DDICT =
             NativeLibrary.lookup("ZSTD_decompress_usingDDict",
                     FunctionDescriptor.of(JAVA_LONG, ADDRESS, ADDRESS, JAVA_LONG, ADDRESS, JAVA_LONG, ADDRESS));
+    // size_t ZSTD_DCtx_refDDict(ZSTD_DCtx*, const ZSTD_DDict*)
+    static final MethodHandle DCTX_REF_DDICT =
+            NativeLibrary.lookup("ZSTD_DCtx_refDDict",
+                    FunctionDescriptor.of(JAVA_LONG, ADDRESS, ADDRESS));
 
     // --- dictionary training (ZDICT, from dictBuilder) ---
 

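Each new handle passes the `ZSTD_ResetDirective` enum as a `JAVA_INT` and returns zstd's `size_t` as a `JAVA_LONG`. The sketch below shows both conversions under stated assumptions: the directive values 1, 2 and 3 come from `zstd.h`; the error test mirrors zstd's `ZSTD_isError` (a result is an error when, read as unsigned, it exceeds `(size_t)-ZSTD_error_maxCode`, with `maxCode = 120` in zstd 1.5.x) and `ZSTD_getErrorCode` (the code is the negated result). `ZstdAbiSketch` is a hypothetical name; the library's own `NativeCall` may instead call zstd's error functions, which is safer because zstd warns that `maxCode` can change between versions.

```java
/** Hypothetical helpers for the size_t / enum conventions of the new handles. */
final class ZstdAbiSketch {

    /** ZSTD_ResetDirective values from zstd.h, passed to the handles as JAVA_INT. */
    enum ResetDirective {
        SESSION_ONLY(1), PARAMETERS(2), SESSION_AND_PARAMETERS(3);

        final int nativeValue;

        ResetDirective(int nativeValue) {
            this.nativeValue = nativeValue;
        }
    }

    /**
     * ZSTD_error_maxCode in zstd 1.5.x headers (assumption: pinned to that version).
     * zstd says this may change, so real code should call ZSTD_isError instead.
     */
    static final long MAX_CODE = 120;

    /** A size_t result arrives as a signed long; errors are the top of the unsigned range. */
    static boolean isError(long sizeT) {
        return Long.compareUnsigned(sizeT, -MAX_CODE) > 0;
    }

    /** Mirrors ZSTD_getErrorCode: 0 on success, otherwise the negated result. */
    static int errorCode(long sizeT) {
        return isError(sizeT) ? (int) -sizeT : 0;
    }

    public static void main(String[] args) {
        long midFrameParameterReset = -60L; // a stage_wrong failure, seen as a Java long
        System.out.println(isError(midFrameParameterReset) + " " + errorCode(midFrameParameterReset));
        System.out.println(isError(0L) + " " + ResetDirective.SESSION_AND_PARAMETERS.nativeValue);
    }
}
```

A mid-frame `PARAMETERS` reset would surface here as `-60`, which is zstd's `stage_wrong` code; a successful reset returns `0`.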
zstd/src/main/java/io/github/dfa1/zstd/NativeCall.java

Lines changed: 7 additions & 0 deletions
@@ -57,6 +57,13 @@ private static String errorName(long code) {
         }
     }
 
+    /// Whether `seg` denotes "no segment": either a Java `null` reference or the
+    /// [MemorySegment#NULL] zero-address sentinel. Both map to a null pointer in C,
+    /// which the dictionary entry points read as "clear".
+    static boolean isNull(MemorySegment seg) {
+        return seg == null || MemorySegment.NULL.equals(seg);
+    }
+
     /// Guards a zero-copy entry point: the segment handed to zstd must be backed
     /// by native (off-heap) memory, since its address is dereferenced in C. Fails
     /// fast with a clear message instead of the FFM linker's cryptic error.

0 commit comments
