dfa1/zstd-java

zstd-java

zstd-java is an FFM-based alternative to the excellent zstd-jni for early adopters on JDK 25+. It wraps Zstandard through the Foreign Function & Memory (FFM) API: no JNI, no sun.misc.Unsafe, no hand-written C. JDK 25 is the first LTS release with a stable java.lang.foreign.

It leans into two things FFM makes natural:

  • Dictionary compression, trained straight from your own data — the big win on small, repetitive records (logs, market-data ticks, JSON/Avro rows, FIX messages).
  • A zero-copy MemorySegment API — compress/decompress off-heap buffers (an mmap'd slice in, an arena buffer out) with no heap copy and no per-call allocation.

Quickstart

One-shot round-trip with byte[] — the convenient path:

import io.github.dfa1.zstd.Zstd;

byte[] data = ...;
byte[] frame = Zstd.compress(data);        // or Zstd.compress(data, level)
byte[] back  = Zstd.decompress(frame);     // size read from the frame header
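
The "size read from the frame header" comment refers to the Frame_Content_Size field that the Zstandard frame format (RFC 8878) can carry. As a rough illustration of where that number lives, here is a JDK-only sketch that parses it by hand. It is not how zstd-java obtains the size (the library presumably asks libzstd), and `FrameHeaderSketch` and `contentSize` are hypothetical names used only for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.OptionalLong;

final class FrameHeaderSketch {
    // Zstandard magic number; on the wire it appears as bytes 28 B5 2F FD.
    private static final int MAGIC = 0xFD2FB528;

    /** Returns the declared decompressed size, or empty if the frame does not record it. */
    static OptionalLong contentSize(byte[] frame) {
        ByteBuffer buf = ByteBuffer.wrap(frame).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.remaining() < 5 || buf.getInt() != MAGIC) {
            throw new IllegalArgumentException("not a Zstandard frame");
        }
        int descriptor = buf.get() & 0xFF;               // Frame_Header_Descriptor
        int fcsFlag = descriptor >>> 6;                   // bits 7-6: Frame_Content_Size_flag
        boolean singleSegment = (descriptor & 0x20) != 0; // bit 5: Single_Segment_flag
        int dictIdFlag = descriptor & 0x03;               // bits 1-0: Dictionary_ID_flag

        if (!singleSegment) {
            buf.get();                                    // skip Window_Descriptor
        }
        int dictIdBytes = dictIdFlag == 3 ? 4 : dictIdFlag; // field is 0, 1, 2 or 4 bytes
        buf.position(buf.position() + dictIdBytes);       // skip Dictionary_ID

        return switch (fcsFlag) {
            case 0 -> singleSegment
                    ? OptionalLong.of(buf.get() & 0xFF)   // 1-byte size
                    : OptionalLong.empty();               // size not recorded
            case 1 -> OptionalLong.of((buf.getShort() & 0xFFFF) + 256); // 2 bytes, offset by 256
            case 2 -> OptionalLong.of(buf.getInt() & 0xFFFFFFFFL);      // 4 bytes, unsigned
            default -> OptionalLong.of(buf.getLong());    // 8 bytes (values above 2^63 - 1 not handled)
        };
    }
}
```

A frame can legitimately omit the field, for example when it was produced by a streaming writer that did not know the input length up front; in that case the sketch returns an empty result and a caller has to supply or discover the size some other way.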

Dictionary — train on a sample of your records, then compress each one against the dictionary (huge ratio gains on small, similar messages):

import io.github.dfa1.zstd.*;
import java.util.List;

List<byte[]> samples = ...;                       // representative records
ZstdDictionary dict = ZstdDictionary.train(samples, 8 * 1024);

byte[] message = ...;
try (ZstdCompressContext cctx = new ZstdCompressContext();
     ZstdDecompressContext dctx = new ZstdDecompressContext()) {
    byte[] frame = cctx.compress(message, dict);
    byte[] back  = dctx.decompress(frame, message.length, dict);
}
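
To see why a dictionary helps on small, similar messages without needing the native library, the principle can be reproduced with the JDK's own DEFLATE codec, which also accepts a preset dictionary. This is a stand-in technique, not zstd and not this library's API: `PresetDictionarySketch` is a hypothetical helper, and the absolute sizes will differ from what zstd produces.

```java
import java.util.zip.Deflater;

final class PresetDictionarySketch {
    /**
     * Compresses input with DEFLATE and returns the compressed size in bytes.
     * Pass a non-null dictionary to prime the compressor with shared content.
     */
    static int deflatedSize(byte[] input, byte[] dictionary) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            if (dictionary != null) {
                deflater.setDictionary(dictionary); // must be set before any input is compressed
            }
            deflater.setInput(input);
            deflater.finish();
            byte[] out = new byte[input.length + 64]; // scratch buffer; only the size is kept
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(out);
            }
            return total;
        } finally {
            deflater.end(); // release the native zlib state
        }
    }
}
```

A short record has too little internal repetition for a compressor to exploit on its own. Priming it with a few earlier records lets almost the whole message be encoded as back-references into the dictionary, which is the same effect the trained zstd dictionary aims for.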

Zero-copy — off-heap in, off-heap out, no byte[], no per-call allocation:

import io.github.dfa1.zstd.*;
import java.lang.foreign.*;

try (Arena arena = Arena.ofConfined();
     ZstdCompressContext cctx = new ZstdCompressContext();
     ZstdDecompressContext dctx = new ZstdDecompressContext()) {

    MemorySegment src     = ...;                       // e.g. an mmap'd file slice
    MemorySegment frame   = cctx.compress(arena, src); // off-heap → off-heap
    MemorySegment restored = dctx.decompress(arena, frame);
}
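
The `src` placeholder above is described as an mmap'd file slice. One way to obtain such a segment with the JDK alone is `FileChannel.map` with an `Arena`, sketched below. `MappedSliceSketch` and `mapSlice` are hypothetical names, and nothing here touches the zstd-java API; the result is simply a `MemorySegment` that could be handed to a call like `cctx.compress(arena, src)`.

```java
import java.io.IOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedSliceSketch {
    /**
     * Maps length bytes of the file, starting at offset, as a read-only off-heap segment.
     * The mapping stays valid until the arena is closed, even though the channel is
     * closed when this method returns.
     */
    static MemorySegment mapSlice(Path file, long offset, long length, Arena arena)
            throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, offset, length, arena);
        }
    }
}
```

Tying the mapping to the same arena that holds the output buffers means one `close()` releases both, which fits the try-with-resources shape of the snippet above.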

Run with --enable-native-access=ALL-UNNAMED. Full walkthrough in the tutorial; hot-path and dictionary recipes in the how-to guides.

Performance

Microbenchmarks against the common JVM zstd options (JMH; Apple M5, JDK 25, all linking the same zstd 1.5.7). Full methodology and tables in docs/benchmarks.md — including the honest ties.

Best vs best — our zero-copy MemorySegment path vs zstd-jni's own zero-copy direct-ByteBuffer path (golden-corpus fixtures, publication-grade run):

| operation (payload) | zstd-java MemorySegment | zstd-jni ByteBuffer | edge |
|---|---|---|---|
| compress http (1.2 KiB) | 353.6 | 322.1 | +9.8% |
| decompress http | 922.7 | 750.8 | +22.9% |
| decompress large-literal (200 KiB) | 56.1 | 55.6 | tie |

(throughput, ops/ms, higher is better; allocation is ~0 B/op on both — both genuinely zero-copy)

The edge is FFM's lower per-call overhead — largest on small payloads, converging to a tie when codec/bandwidth dominates. Against the convenient byte[] / JNI APIs (which allocate the output every call), the segment path is additionally allocation-free: flat ~0 B/op at any size vs MB/op that scales with the payload — no GC pressure on the hot path.
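
The edge column appears to be the relative throughput gain of the MemorySegment path over the ByteBuffer path; that reading is an assumption, but it reproduces the published figures. A minimal sketch of the arithmetic, with `EdgeSketch` as a hypothetical helper:

```java
import java.util.Locale;

final class EdgeSketch {
    /** Relative throughput gain of ours over theirs, in percent. */
    static double edgePercent(double ours, double theirs) {
        return (ours / theirs - 1.0) * 100.0;
    }

    /** Formats a percentage with an explicit sign and one decimal place. */
    static String format(double percent) {
        return String.format(Locale.ROOT, "%+.1f%%", percent);
    }

    public static void main(String[] args) {
        // Throughput values (ops/ms) taken from the table above.
        System.out.println(format(edgePercent(353.6, 322.1))); // compress http
        System.out.println(format(edgePercent(922.7, 750.8))); // decompress http
        System.out.println(format(edgePercent(56.1, 55.6)));   // decompress large-literal
    }
}
```

The large-literal row works out to under one percent, which is why the table reports it as a tie rather than a win.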

Install

The zstd jar is pure Java and ships no libzstd — you always pair it with a native artifact. Two ways:

1. Everything, all supported platforms — one dependency on zstd-platform, an empty jar that transitively pulls the bindings plus all six natives (~3.8 MB). Zero choices; the build runs on any supported OS/arch.

<dependency>
  <groupId>io.github.dfa1.zstd</groupId>
  <artifactId>zstd-platform</artifactId>
  <version>0.7</version>
</dependency>

2. Leaner, one platform — import zstd-bom to pin versions, then take zstd plus only the zstd-native-<classifier> you target.

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>io.github.dfa1.zstd</groupId>
      <artifactId>zstd-bom</artifactId>
      <version>0.7</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>io.github.dfa1.zstd</groupId>
    <artifactId>zstd</artifactId>
  </dependency>
  <dependency>
    <groupId>io.github.dfa1.zstd</groupId>
    <artifactId>zstd-native-osx-aarch64</artifactId>
    <scope>runtime</scope>
  </dependency>
</dependencies>

Classifiers: osx-aarch64, osx-x86_64, linux-x86_64, linux-aarch64, windows-x86_64, windows-aarch64 — each verified on real hardware by the release smoke matrix. Gradle and more detail in the tutorial. Requires JDK 25+ and --enable-native-access=ALL-UNNAMED at runtime. Building from source is for contributors — see the reference.
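
When choosing a single native artifact, the classifier has to match the JVM's operating system and CPU architecture. The sketch below derives one of the six classifier strings listed above from the standard `os.name` and `os.arch` system properties. `ClassifierSketch` is a hypothetical helper for build scripts or diagnostics, and it makes no claim about how zstd-java itself locates its native library.

```java
import java.util.Locale;

final class ClassifierSketch {
    /** Maps os.name / os.arch style values to a classifier such as "linux-x86_64". */
    static String classifierFor(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);

        String osPart;
        if (os.contains("mac") || os.contains("darwin")) { // checked first: "darwin" contains "win"
            osPart = "osx";
        } else if (os.contains("win")) {
            osPart = "windows";
        } else if (os.contains("linux")) {
            osPart = "linux";
        } else {
            throw new IllegalArgumentException("unsupported OS: " + osName);
        }

        String archPart;
        if (arch.equals("aarch64") || arch.equals("arm64")) {
            archPart = "aarch64";
        } else if (arch.equals("amd64") || arch.equals("x86_64")) {
            archPart = "x86_64";
        } else {
            throw new IllegalArgumentException("unsupported architecture: " + osArch);
        }
        return osPart + "-" + archPart;
    }

    public static void main(String[] args) {
        // Prints the classifier for the JVM this runs on.
        System.out.println(classifierFor(
                System.getProperty("os.name"), System.getProperty("os.arch")));
    }
}
```

The resulting string would be appended to `zstd-native-` to form the artifactId, as in the `zstd-native-osx-aarch64` dependency shown earlier.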

Documentation

The docs follow the Diátaxis framework:

| Section | Purpose | Start here |
|---|---|---|
| Tutorial | Learning by doing | Clean checkout → first round-trip |
| How-to guides | Solving a specific task | Hot paths, dictionaries, zero-copy, self-built lib |
| Reference | Looking up facts | Platforms, API surface, symbol coverage, build |
| Explanation | Understanding the why | Why FFM + Zig, when zero-copy pays, benchmarks |

Architecture decisions are recorded as ADRs (MADR 3.0) — the foundational choices and their trade-offs, one file per decision.

License

BSD 3-Clause — the same primary license as zstd, which is bundled under its BSD terms (zstd is dual BSD / GPLv2, © Meta Platforms, Inc.).


AI-assisted development: This project uses Claude Code for implementation — C header mapping, test generation, docs. Architecture, API design, and all decisions are human-driven.
