diff --git a/.claude/skills/improve-performance.md b/.claude/skills/improve-performance.md
index ac7f599a6..f48a9ffa8 100644
--- a/.claude/skills/improve-performance.md
+++ b/.claude/skills/improve-performance.md
@@ -17,7 +17,7 @@ performance improvements in Java/Maven projects.
 ## Workflow Overview
 ```
-setup → benchmark → profile → analyse → change → repeat
+setup → benchmark → profile → analyze → change → repeat
 ```
 ---
@@ -63,7 +63,7 @@ Store the baseline score. Compare every subsequent run against it.
 ---
-## Step 4 — Analyse results
+## Step 4 — Analyze results
 Key things to look for:
@@ -128,7 +128,7 @@ Common optimizations to consider (in order of typical impact):
    constant-folds the stride / alignment / order. Inline `ValueLayout.JAVA_LONG_UNALIGNED` on each call defeats this.
 4. **Use `getAtIndex` / `setAtIndex`** in tight loops over a `MemorySegment` — stride is implicit,
-   bounds check hoists, and the auto-vectoriser reads the shape cleanly.
+   bounds check hoists, and the auto-vectorizer reads the shape cleanly.
 5. **Aligned arena allocation** — `arena.allocate(n, 64)` keeps SIMD-friendly addresses.
 6. **Improve data locality** — colocate fields accessed together, prefer flat arrays / segments over linked structures.
diff --git a/CHANGELOG.md b/CHANGELOG.md
index d26dc33bc..b5b71ff9f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -52,7 +52,7 @@ A **`vortex.zstd` overhaul**: compression now runs through FFM bindings to the n
 - `vortex.zstd` segments compressed with a shared (trained) dictionary now decode, via the native `libzstd` dictionary support, instead of being rejected. The upstream `zstd.vortex` compatibility fixture is read end-to-end and matches the Rust reference. ([#104](https://github.com/dfa1/vortex-java/issues/104))
 - Writing a nullable `Utf8`/`Binary` column no longer throws `NullPointerException` (or silently drops nulls): nullable string columns now carry their validity like nullable primitives and round-trip through `vortex.masked`. As a result they decode as `MaskedArray` (validity + values child) rather than a bare `VarBinArray`. ([#168](https://github.com/dfa1/vortex-java/pull/168))
 - CSV export now handles nullable columns (`MaskedArray`): null rows export as an empty field instead of failing with "unsupported array type for CSV export". ([#168](https://github.com/dfa1/vortex-java/pull/168))
-- Zone-map pruning now compares filter values in the *column's* type domain rather than by the boxed value's type. A predicate whose value is boxed at a different width (e.g. `Integer` on an `I64` column) — or any value on a `U64` column — previously pruned nothing and silently degraded to a full scan; it now prunes correctly (unsigned columns by unsigned order). As part of this, a filter value genuinely incomparable to its column (e.g. a `String` against a numeric column) now raises `VortexException` during the scan instead of silently disabling pruning — a behaviour change for callers that relied on the previous silent full scan. ([#159](https://github.com/dfa1/vortex-java/issues/159))
+- Zone-map pruning now compares filter values in the *column's* type domain rather than by the boxed value's type. A predicate whose value is boxed at a different width (e.g. `Integer` on an `I64` column) — or any value on a `U64` column — previously pruned nothing and silently degraded to a full scan; it now prunes correctly (unsigned columns by unsigned order). As part of this, a filter value genuinely incomparable to its column (e.g. a `String` against a numeric column) now raises `VortexException` during the scan instead of silently disabling pruning — a behavior change for callers that relied on the previous silent full scan. ([#159](https://github.com/dfa1/vortex-java/issues/159))
 ## [0.9.0] — 2026-06-24
@@ -73,11 +73,11 @@ Two import-only breaking changes — the `vortex-core` types moved under `io.git
 ## [0.8.3] — 2026-06-23
-A **Sonar-driven refactoring** release: no new file-format capability, but a focused pass using SonarCloud findings to drive cleanups — dead code removed, duplication factored out, and one hot-loop micro-optimisation. Each finding was triaged (lead, not verdict) so the changes preserve behaviour and the JIT vectorisation of the hot decode loops. The interpretation framework behind this is now documented in `docs/testing.md`.
+A **Sonar-driven refactoring** release: no new file-format capability, but a focused pass using SonarCloud findings to drive cleanups — dead code removed, duplication factored out, and one hot-loop micro-optimization. Each finding was triaged (lead, not verdict) so the changes preserve behavior and the JIT vectorization of the hot decode loops. The interpretation framework behind this is now documented in `docs/testing.md`.
 ### Performance
-- `FastLanes.transposeIndex` / `iterateIndex`: replaced the per-element `%`/`/` + `ORDER[]` indirection with permutation tables built once in a static initialiser. Faster address generation keeps more outstanding scatter misses in flight; measured 1.4×–3.4× on the transpose/undelta kernels (Apple M5, L1→DRAM working sets). The per-element decode loops stay specialised per width to preserve C2 superword vectorisation. ([089b6e36](https://github.com/dfa1/vortex-java/commit/089b6e36), [e683a634](https://github.com/dfa1/vortex-java/commit/e683a634))
+- `FastLanes.transposeIndex` / `iterateIndex`: replaced the per-element `%`/`/` + `ORDER[]` indirection with permutation tables built once in a static initializer. Faster address generation keeps more outstanding scatter misses in flight; measured 1.4×–3.4× on the transpose/undelta kernels (Apple M5, L1→DRAM working sets). The per-element decode loops stay specialized per width to preserve C2 superword vectorization. ([089b6e36](https://github.com/dfa1/vortex-java/commit/089b6e36), [e683a634](https://github.com/dfa1/vortex-java/commit/e683a634))
 ### Removed
@@ -131,7 +131,7 @@ The headline is **writer-side zone-map statistics**: the writer now emits `vorte
 ## [0.8.1] — 2026-06-20
-A hardening release: no new file-format capability, but a large step up in verification rigour. Mutation testing (PIT) now guards the security-critical bounds/parse paths in core, reader, and writer at 99–100% kill rate; the build fails on any javac warning (`-Xlint:all -Werror`); and property-based round-trips exercise every lossless encoding plus the full cascade-selection pipeline against seeded-random inputs. The one functional addition is boxed-nullable array input on the map `writeChunk` path.
+A hardening release: no new file-format capability, but a large step up in verification rigor. Mutation testing (PIT) now guards the security-critical bounds/parse paths in core, reader, and writer at 99–100% kill rate; the build fails on any javac warning (`-Xlint:all -Werror`); and property-based round-trips exercise every lossless encoding plus the full cascade-selection pipeline against seeded-random inputs. The one functional addition is boxed-nullable array input on the map `writeChunk` path.
 ### Added
@@ -169,7 +169,7 @@ Read and write Vortex Variant (semi-structured, JSON-shaped) columns from Java.
 ### Added
 - Writer: `vortex.variant` encoder. Encodes a variant column as the canonical `vortex.variant` container over `core_storage` — an all-equal column becomes a single `vortex.constant`, a row-varying column a `vortex.chunked` of per-run constants — with an optional row-aligned typed `shredded` child recorded in `VariantMetadata.shredded_dtype`. Input is `VariantData(List)` with `.constant(n, v)` / `.shredded(...)` factories. Java↔Rust (JNI) round-trip verified for constant, row-varying, and shredded columns. Scalar values only — arbitrary nested objects need `vortex.parquet.variant` (deferred, [ADR 0014](docs/adr/0014-variant-encoding-strategy.md)). ([35da529d](https://github.com/dfa1/vortex-java/commit/35da529d), [e4e44980](https://github.com/dfa1/vortex-java/commit/e4e44980), [4566dca0](https://github.com/dfa1/vortex-java/commit/4566dca0))
-- Reader: variant columns now decode Java-side. `ConstantEncodingDecoder` and `ChunkedEncodingDecoder` handle `DType.Variant` (materialising the inner-typed array); `VariantEncodingDecoder` wraps the result as `VariantArray`, exposing `coreStorage()` and `shredded()`. ([76e4c741](https://github.com/dfa1/vortex-java/commit/76e4c741), [4566dca0](https://github.com/dfa1/vortex-java/commit/4566dca0))
+- Reader: variant columns now decode Java-side. `ConstantEncodingDecoder` and `ChunkedEncodingDecoder` handle `DType.Variant` (materializing the inner-typed array); `VariantEncodingDecoder` wraps the result as `VariantArray`, exposing `coreStorage()` and `shredded()`. ([76e4c741](https://github.com/dfa1/vortex-java/commit/76e4c741), [4566dca0](https://github.com/dfa1/vortex-java/commit/4566dca0))
 ### Security
@@ -236,7 +236,7 @@ CLI usability + reader robustness on real-world files (NYC Yellow Taxi).
 ### Added
 - CLI `view ` — scrollable Excel-like grid TUI. Streams rows on demand via a new `LazyGridSource` (one live chunk at a time, format only the visible window). Title bar shows `chunk K/N`. Default writes to alt-screen; quit with `q` / `Esc`. ([1c0311fb](https://github.com/dfa1/vortex-java/commit/1c0311fb), [b7f6b6c1](https://github.com/dfa1/vortex-java/commit/b7f6b6c1), [94e5bff8](https://github.com/dfa1/vortex-java/commit/94e5bff8), [6a8ddd3a](https://github.com/dfa1/vortex-java/commit/6a8ddd3a))
-- CLI `export` writes to a derived `.csv` next to the input by default, with a stderr progress bar mirroring the import flow. `export -` keeps the old stdout streaming behaviour. ([2b26da9a](https://github.com/dfa1/vortex-java/commit/2b26da9a))
+- CLI `export` writes to a derived `.csv` next to the input by default, with a stderr progress bar mirroring the import flow. `export -` keeps the old stdout streaming behavior. ([2b26da9a](https://github.com/dfa1/vortex-java/commit/2b26da9a))
 - Reader: `ScanIterator.chunkRowCounts()` — returns per-chunk row counts by walking the layout tree, no value decode. Used by the `view` TUI to plan navigation. ([b7f6b6c1](https://github.com/dfa1/vortex-java/commit/b7f6b6c1))
 - Reader: lazy `vortex.decimal` decode — new `LazyDecimalArray` record holds a zero-copy mmap slice and produces `BigDecimal` per `getDecimal(i)`. Replaces the `GenericArray` wrapper, no buffers / children indirection. ([6bc955d2](https://github.com/dfa1/vortex-java/commit/6bc955d2))
 - Reader: 7 `Offset*Array` records (Long / Int / Short / Byte / Double / Float / Bool) + `VarBinArray.SlicedMode` for offset-based slicing of pre-decoded shared arrays. ([5df3d9a9](https://github.com/dfa1/vortex-java/commit/5df3d9a9))
@@ -244,7 +244,7 @@ CLI usability + reader robustness on real-world files (NYC Yellow Taxi).
 ### Fixed
 - Reader: per-column chunking alignment — files where one column has 1 mega-flat and another has N small flats (e.g. NYC Yellow Taxi 2024-01 has a 2.96M-row VendorID flat next to 23 × 131072-row datetime flats) now decode the wide column once into a `sharedArena` and slice it per chunk via `Offset*Array`. Previously the scan iterator emitted a single chunk whose datetime columns were the first 131072 rows only — silently dropping 95.6 % of the file. ([5df3d9a9](https://github.com/dfa1/vortex-java/commit/5df3d9a9))
-- Reader: `FrameOfReferenceEncodingDecoder` now takes the arena variant of `ArraySegments.of`, so lazy children (e.g. `LazyRunEndLongArray`) materialise instead of throwing "no primary segment". ([5df3d9a9](https://github.com/dfa1/vortex-java/commit/5df3d9a9))
+- Reader: `FrameOfReferenceEncodingDecoder` now takes the arena variant of `ArraySegments.of`, so lazy children (e.g. `LazyRunEndLongArray`) materialize instead of throwing "no primary segment". ([5df3d9a9](https://github.com/dfa1/vortex-java/commit/5df3d9a9))
 ### Docs
@@ -282,7 +282,7 @@ Cleanup release on top of 0.7.0 — one more lazy encoding, a Windows TUI usabil
 **pco encoder** (Classic + Consecutive delta + IntMult mode, 4-way tANS, multi-chunk, all 8 ptypes),
 **writer compression** (~93% Rust JNI parity on NYC Yellow Taxi: 47.0 MB → 43.4 MB; stratified sampling, stats-driven cascade, sparse-cascade idx/val children, patched bitpacking),
-**lazy / zero-copy decode** (ADR 0010 + ADR 0012: ALP / FoR / ZigZag / Chunked / Dict / RunEnd / RLE / Sparse / ALP-RD / VarBinView / DateTimeParts / DecimalByteParts now defer transform / materialisation until access),
+**lazy / zero-copy decode** (ADR 0010 + ADR 0012: ALP / FoR / ZigZag / Chunked / Dict / RunEnd / RLE / Sparse / ALP-RD / VarBinView / DateTimeParts / DecimalByteParts now defer transform / materialization until access),
 **write API ergonomics** (`DType` static factories, `structBuilder`, typed `writeChunk(Consumer)` — ADR 0009),
 **Sonar pass** (Codecov → SonarCloud, Javadoc HTML → Markdown, full `S6218 / S7474 / S2184 / S3776` sweep).
@@ -293,7 +293,7 @@ Cleanup release on top of 0.7.0 — one more lazy encoding, a Windows TUI usabil
 - ADR 0009 — write API ergonomics: `DType` static factories + `asNullable()` ([0e9d6703](https://github.com/dfa1/vortex-java/commit/0e9d6703)), `DType.structBuilder()` ([63d66eef](https://github.com/dfa1/vortex-java/commit/63d66eef)), typed `writeChunk(Consumer)` builder ([ddb3e21a](https://github.com/dfa1/vortex-java/commit/ddb3e21a)); design doc ([d9c4b99](https://github.com/dfa1/vortex-java/commit/d9c4b99), [a57ea70](https://github.com/dfa1/vortex-java/commit/a57ea70)); `MemorySegment` zero-copy overload split to ADR 0011 ([6367eb37](https://github.com/dfa1/vortex-java/commit/6367eb37))
 - ADR 0010 — lazy decode for 1:1 transform encodings: `LazyAlpFloatArray`, lazy `FoR` / `ZigZag` arrays defer the transform until first element access ([cff3acb5](https://github.com/dfa1/vortex-java/commit/cff3acb5), [c47c055c](https://github.com/dfa1/vortex-java/commit/c47c055c), [c3ca6951](https://github.com/dfa1/vortex-java/commit/c3ca6951), [68186f8f](https://github.com/dfa1/vortex-java/commit/68186f8f))
 - ADR 0012 — zero-copy decode for compound encodings. `ChunkedXxxArray` wraps instead of concatenating ([dfe7aa34](https://github.com/dfa1/vortex-java/commit/dfe7aa34), [c557b8fb](https://github.com/dfa1/vortex-java/commit/c557b8fb), [e2db153d](https://github.com/dfa1/vortex-java/commit/e2db153d)); `DictXxxArray` lazy reads ([9b97a1a5](https://github.com/dfa1/vortex-java/commit/9b97a1a5)); lazy `RunEnd` ([210449b5](https://github.com/dfa1/vortex-java/commit/210449b5)), `RLE` ([f35f9a96](https://github.com/dfa1/vortex-java/commit/f35f9a96)), `Sparse` ([b604f21c](https://github.com/dfa1/vortex-java/commit/b604f21c)), `ALP-RD` ([937ade36](https://github.com/dfa1/vortex-java/commit/937ade36)); `VarBinArray.ChunkedMode` ([b3696f5a](https://github.com/dfa1/vortex-java/commit/b3696f5a)) + `ViewMode` for VarBinView ([0eea0405](https://github.com/dfa1/vortex-java/commit/0eea0405)); `LazyDateTimePartsLongArray` ([8ab9ec70](https://github.com/dfa1/vortex-java/commit/8ab9ec70)); `LazyDecimalBytePartsArray` ([22887cb2](https://github.com/dfa1/vortex-java/commit/22887cb2)); design doc ([f6a19c47](https://github.com/dfa1/vortex-java/commit/f6a19c47), [2578f892](https://github.com/dfa1/vortex-java/commit/2578f892), [1c7f5950](https://github.com/dfa1/vortex-java/commit/1c7f5950))
-- ADR 0013 — compute primitives (masks, kernels, no-materialise) design doc ([400e5b03](https://github.com/dfa1/vortex-java/commit/400e5b03))
+- ADR 0013 — compute primitives (masks, kernels, no-materialize) design doc ([400e5b03](https://github.com/dfa1/vortex-java/commit/400e5b03))
 - `forEach*` / `fold` default methods on Short / Byte / Bool array interfaces; chunked overrides iterate children directly ([7dc6567e](https://github.com/dfa1/vortex-java/commit/7dc6567e), [f500afe3](https://github.com/dfa1/vortex-java/commit/f500afe3))
 - `truncateArray` preserves zero-copy on `ChunkedXxxArray` ([6f4eaa96](https://github.com/dfa1/vortex-java/commit/6f4eaa96))
 - ALP size-based exponent search ported from Rust, two-step decode ([f9bb7373](https://github.com/dfa1/vortex-java/commit/f9bb7373))
@@ -338,7 +338,7 @@ Cleanup release on top of 0.7.0 — one more lazy encoding, a Windows TUI usabil
 ### Notes
 - Pco encode FloatMult / FloatQuant modes deferred — marginal gain over existing Classic+ALP cascade.
-- Remaining 0.6 MB (1.4%) writer gap vs Rust JNI on the taxi benchmark is structural — concentrated in `trip_distance` (+540 KB, per-chunk ALP encoding) and `PULocationID` (+250 KB, dict-codes layout shape). Closing it needs `vortex.stats` outer-layer support or dtype-specialised dict schemes.
+- Remaining 0.6 MB (1.4%) writer gap vs Rust JNI on the taxi benchmark is structural — concentrated in `trip_distance` (+540 KB, per-chunk ALP encoding) and `PULocationID` (+250 KB, dict-codes layout shape). Closing it needs `vortex.stats` outer-layer support or dtype-specialized dict schemes.
 [0.7.0]: https://github.com/dfa1/vortex-java/compare/v0.6.0...v0.7.0
@@ -476,7 +476,7 @@ replaces the silent `hasNext()` arena-closing footgun with closeable `Chunk` obj
   `EncodingRegistry.builder().registerServiceLoaded().register(myEncoding).build()`. ([64ffbaa](https://github.com/dfa1/vortex-java/commit/64ffbaa))
 - **Breaking — `inspect` split into `inspect` (text) + `tui` (interactive).**
-  Previous `inspect ` behaviour stays on `inspect`; interactive use is now
+  Previous `inspect ` behavior stays on `inspect`; interactive use is now
   on the dedicated `tui` subcommand. ([e8db30a](https://github.com/dfa1/vortex-java/commit/e8db30a))
 - **`Extension` sealed hierarchy** replaces the prior `Extensions` utility class.
@@ -585,7 +585,7 @@ FlatBuffer/Protobuf runtime exceptions). Regression suite lives under
   blob offsets and `Layout.encoding` index are bounds-checked at parse time. ([f8f89fe](https://github.com/dfa1/vortex-java/commit/f8f89fe))
 - **Footer `segmentSpecs` bounds** — every spec is validated against `fileSize` the moment
-  the footer is materialised, eliminating later `IndexOutOfBoundsException` on
+  the footer is materialized, eliminating later `IndexOutOfBoundsException` on
   `MemorySegment.asSlice`. ([03845ac](https://github.com/dfa1/vortex-java/commit/03845ac))
 - **PType ordinal bounds-check** — `PType.fromOrdinal(int)` replaces all 22 `PType.values()[idx]`
diff --git a/CLAUDE.md b/CLAUDE.md
index ebeab79d7..21ed1d34b 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -204,6 +204,9 @@ in the Rust source for the exact schema, then implement from spec.
 ## Code style
 - 4-space indent, **zero SonarQube bugs/smells**, no `sun.misc.Unsafe` or internal JDK APIs.
+- **American English everywhere** (javadoc, comments, identifiers):
+  `recognize`/`optimize`/`finalize`/`serialize`/`normalize`/`behavior`/`color` — never
+  `-ise`/`-isation`/`-our`. Matches the JDK (`Object.finalize`, `Serializable`).
 - Prefer explicit over clever; fail fast on unhandled cases.
 - Idiomatic modern Java: reuse the JDK (override `Iterator.forEachRemaining`, don't invent `forEachChunk`; use `Optional`, records, sealed types, pattern switches, virtual threads, FFM).
diff --git a/SECURITY.md b/SECURITY.md
index 597e804ab..e4fb8637a 100644
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -15,7 +15,7 @@ Use GitHub's private vulnerability reporting:
 1. Open .
 2. Fill in the form. Include a minimal reproduction (a `.vortex` file or the bytes that trigger the issue) where possible.
-3. You'll receive an acknowledgement within **3 business days**.
+3. You'll receive an acknowledgment within **3 business days**.
 If GitHub's reporting flow is unavailable, email the maintainer at the address on the project's Maven Central metadata.
@@ -42,7 +42,7 @@ In scope:
 - Any malformed `.vortex` input that causes silent data corruption — wrong row count, wrong values, or a misaligned column with a successful return.
 - Any vulnerability in `VortexWriter` that produces files which would later trigger the
-  above behaviours when read.
+  above behaviors when read.
 Out of scope:
diff --git a/TODO.md b/TODO.md
index ea4baf650..81c28e0cc 100644
--- a/TODO.md
+++ b/TODO.md
@@ -62,7 +62,7 @@ Per-encoding gotchas:
   and `vortex-jni`; assert both throw or both return identical row count + values. Reuse `RustWritesJavaReadsIntegrationTest` harness.
 - [ ] **OSS-Fuzz submission** — Jazzer is a first-class OSS-Fuzz engine; submit the project
-  once the corpus + targets stabilise. Free continuous fuzzing.
+  once the corpus + targets stabilize. Free continuous fuzzing.
 ## Build
@@ -85,8 +85,8 @@ Per-encoding gotchas:
 ## Compute
-- [ ] **Compute primitives — masks, kernels, no-materialise** — pushdown filter/compare/aggregate
-  kernels operating on Lazy arrays without materialising. See [ADR-0013](docs/adr/0013-compute-primitives.md)
+- [ ] **Compute primitives — masks, kernels, no-materialize** — pushdown filter/compare/aggregate
+  kernels operating on Lazy arrays without materializing. See [ADR-0013](docs/adr/0013-compute-primitives.md)
   (Proposed). Gate: a concrete downstream consumer (e.g. the vortex-arrow bridge or filter pushdown). Done: §6 read-side surface — `ScanIterator.columnZoneStats(col)` exposes per-zone min/max/sum/null count, decoding sum from the `vortex.stats` zone-map table (matches files from
diff --git a/calcite/src/main/java/io/github/dfa1/vortex/calcite/VortexTable.java b/calcite/src/main/java/io/github/dfa1/vortex/calcite/VortexTable.java
index 7e7f1b777..d2c72c393 100644
--- a/calcite/src/main/java/io/github/dfa1/vortex/calcite/VortexTable.java
+++ b/calcite/src/main/java/io/github/dfa1/vortex/calcite/VortexTable.java
@@ -57,7 +57,7 @@
 /// A single Vortex file exposed to Calcite as a flat SQL table with column projection and
 /// zone-map filter push-down.
 ///
-/// Projection (`projects`) is honoured exactly — only the requested columns are decoded and
+/// Projection (`projects`) is honored exactly — only the requested columns are decoded and
 /// returned. Filters (`filters`) that translate to a [RowFilter] are pushed into the scan for
 /// *chunk skipping* via zone-map statistics, but are **left in Calcite's list** rather than
 /// consumed: zone-map pruning is approximate (it drops whole chunks that cannot match, not
@@ -521,7 +521,7 @@ private static Match classify(RowFilter filter, int zone,
     /// Classifies one zone against a column-bound [Predicate] from the zone's statistics `s`. The
     /// comparison ops carry the same three-valued-logic semantics as before the [RowFilter] /
-    /// [Predicate] unification: an unrecognised stat shape or a partially-overlapping zone is
+    /// [Predicate] unification: an unrecognized stat shape or a partially-overlapping zone is
     /// [Match#BOUNDARY], a zone provably outside the predicate is [Match#OUT], and a zone every row
     /// of which matches (which, for a value comparison, also requires the zone to carry no nulls) is
     /// [Match#IN]. The composite and range predicates ([Predicate.Between] / [Predicate.And] /
@@ -736,7 +736,7 @@ public Enumerator enumerator() {
     }
     /// Streaming [Enumerator] over a Vortex scan: advances chunk by chunk, decoding each requested
-    /// column once per chunk and materialising one `Object[]` row per [#moveNext()]. Rows are not
+    /// column once per chunk and materializing one `Object[]` row per [#moveNext()]. Rows are not
     /// retained, so the working set stays at one chunk rather than the whole result.
     private final class VortexEnumerator implements Enumerator {
@@ -899,7 +899,7 @@ private static void collectColumns(RowFilter filter, java.util.Set out)
         }
     }
-    /// Lenient translation for the scan path ([#toRowFilter]): an unrecognised node (or `AND`
+    /// Lenient translation for the scan path ([#toRowFilter]): an unrecognized node (or `AND`
     /// conjunct) is simply dropped, since the scan re-checks every row so a partially captured filter
     /// is still correct, just less selective for zone-map pruning. Delegates to the shared
     /// [#comparison] dispatch with `strict = false`.
@@ -908,7 +908,7 @@ private static Optional toComparison(RexNode node, List names
     }
     /// Strict counterpart of [#toComparison]: the same column-vs-literal / `AND` grammar, but a
-    /// single unrecognised node (or one `AND` conjunct) collapses the whole result to empty rather
+    /// single unrecognized node (or one `AND` conjunct) collapses the whole result to empty rather
     /// than being dropped, and bare `IS NULL` / `IS NOT NULL` are also translatable. Used by
     /// [#translatePushedFilters] so aggregate push-down answers from stats only when the [RowFilter]
     /// captures the predicate in full. Delegates to the shared [#comparison] dispatch with
diff --git a/calcite/src/test/java/io/github/dfa1/vortex/calcite/OhlcSqlDemoTest.java b/calcite/src/test/java/io/github/dfa1/vortex/calcite/OhlcSqlDemoTest.java
index cf02852e4..507fd504e 100644
--- a/calcite/src/test/java/io/github/dfa1/vortex/calcite/OhlcSqlDemoTest.java
+++ b/calcite/src/test/java/io/github/dfa1/vortex/calcite/OhlcSqlDemoTest.java
@@ -228,7 +228,7 @@ private static Pushdown runPushdown(Path file) throws Exception {
         }
     }
-    /// Runs a query, prints every row as a labelled table, and returns the row count.
+    /// Runs a query, prints every row as a labeled table, and returns the row count.
     private static long printAndCount(Connection conn, String title, String sql) throws Exception {
         System.out.printf("%n[%s]%n", title);
         long rows = 0;
diff --git a/calcite/src/test/java/io/github/dfa1/vortex/calcite/VortexAdapterCoverageTest.java b/calcite/src/test/java/io/github/dfa1/vortex/calcite/VortexAdapterCoverageTest.java
index 51d090af1..29caba89a 100644
--- a/calcite/src/test/java/io/github/dfa1/vortex/calcite/VortexAdapterCoverageTest.java
+++ b/calcite/src/test/java/io/github/dfa1/vortex/calcite/VortexAdapterCoverageTest.java
@@ -26,7 +26,7 @@
 import static org.assertj.core.api.Assertions.assertThatThrownBy;
 /// Coverage for the adapter surface across every column type: SQL type mapping
-/// ([VortexTable#getRowType]), row materialisation ([VortexTable] scan + enumerator),
+/// ([VortexTable#getRowType]), row materialization ([VortexTable] scan + enumerator),
 /// [VortexSchema] lookup, and [VortexAggregates].
 class VortexAdapterCoverageTest {
@@ -94,7 +94,7 @@ void getRowType_mapsEveryColumnToItsSqlType() {
     }
     @Test
-    void scan_materialisesEveryColumnToItsJavaType() {
+    void scan_materializesEveryColumnToItsJavaType() {
         // Given
         VortexTable table = new VortexTable(file);
diff --git a/cli/src/main/java/io/github/dfa1/vortex/cli/tui/LazyGridSource.java b/cli/src/main/java/io/github/dfa1/vortex/cli/tui/LazyGridSource.java
index e0584f519..b8c98f5cc 100644
--- a/cli/src/main/java/io/github/dfa1/vortex/cli/tui/LazyGridSource.java
+++ b/cli/src/main/java/io/github/dfa1/vortex/cli/tui/LazyGridSource.java
@@ -45,7 +45,7 @@ public final class LazyGridSource implements AutoCloseable {
     ///
     /// @param handle open Vortex file handle owned by `worker`
     /// @param worker I/O dispatcher for the handle's confined thread
-    /// @return initialised source
+    /// @return initialized source
     /// @throws InterruptedException if the calling thread is interrupted while
     /// waiting for the worker
     public static LazyGridSource open(VortexHandle handle, IoWorker worker)
diff --git a/cli/src/main/java/io/github/dfa1/vortex/cli/tui/term/Ansi.java b/cli/src/main/java/io/github/dfa1/vortex/cli/tui/term/Ansi.java
index e0b6763a3..12bbe4d53 100644
--- a/cli/src/main/java/io/github/dfa1/vortex/cli/tui/term/Ansi.java
+++ b/cli/src/main/java/io/github/dfa1/vortex/cli/tui/term/Ansi.java
@@ -48,17 +48,17 @@ public static String moveTo(int row, int col) {
         return CSI + row + ";" + col + "H";
     }
-    /// Standard SGR foreground colour (codes 30-37 normal, 90-97 bright).
+    /// Standard SGR foreground color (codes 30-37 normal, 90-97 bright).
     ///
-    /// @param code SGR colour code
+    /// @param code SGR color code
     /// @return CSI sequence
     public static String fg(int code) {
         return CSI + code + "m";
     }
-    /// Standard SGR background colour (codes 40-47 normal, 100-107 bright).
+    /// Standard SGR background color (codes 40-47 normal, 100-107 bright).
     ///
     ///
-    /// @param code SGR colour code
+    /// @param code SGR color code
     /// @return CSI sequence
     public static String bg(int code) {
         return CSI + code + "m";
diff --git a/cli/src/main/java/io/github/dfa1/vortex/cli/tui/term/KeyDecoder.java b/cli/src/main/java/io/github/dfa1/vortex/cli/tui/term/KeyDecoder.java
index 1de4d8efe..876934578 100644
--- a/cli/src/main/java/io/github/dfa1/vortex/cli/tui/term/KeyDecoder.java
+++ b/cli/src/main/java/io/github/dfa1/vortex/cli/tui/term/KeyDecoder.java
@@ -7,9 +7,9 @@
 /// Translates raw stdin bytes into [Key] events.
 ///
-/// Recognises common CSI sequences emitted by xterm-compatible terminals:
+/// Recognizes common CSI sequences emitted by xterm-compatible terminals:
 /// `ESC [ A/B/C/D` for arrows, `ESC [ 5~ / 6~` for PgUp/PgDn,
-/// `ESC [ H / F` and `ESC [ 1~ / 4~` for Home/End. Any unrecognised
+/// `ESC [ H / F` and `ESC [ 1~ / 4~` for Home/End. Any unrecognized
 /// escape sequence is dropped and decoding continues with the next byte.
 ///
 /// Stateless across reads - call [#next(InputStream)] for each event.
diff --git a/cli/src/test/java/io/github/dfa1/vortex/cli/FilterCommandTest.java b/cli/src/test/java/io/github/dfa1/vortex/cli/FilterCommandTest.java
index 011a85e84..4aa697b49 100644
--- a/cli/src/test/java/io/github/dfa1/vortex/cli/FilterCommandTest.java
+++ b/cli/src/test/java/io/github/dfa1/vortex/cli/FilterCommandTest.java
@@ -107,7 +107,7 @@ void doubleValueAgainstLongColumn_returnsOk() {
     @Test
     void unknownOperator_returnsUsageError() {
-        // Given — a lone '!' is a recognised operator char but not a valid operator
+        // Given — a lone '!' is a recognized operator char but not a valid operator
         // When
         CliTestSupport.Captured result = capture(() -> FilterCommand.run(new String[]{"filter", file.toString(), "id", "!", "1"}));
diff --git a/cli/src/test/java/io/github/dfa1/vortex/cli/tui/term/KeyDecoderTest.java b/cli/src/test/java/io/github/dfa1/vortex/cli/tui/term/KeyDecoderTest.java
index 764fa7096..3a1a25745 100644
--- a/cli/src/test/java/io/github/dfa1/vortex/cli/tui/term/KeyDecoderTest.java
+++ b/cli/src/test/java/io/github/dfa1/vortex/cli/tui/term/KeyDecoderTest.java
@@ -96,7 +96,7 @@ void next_eof_returnsEof() throws IOException {
     @Test
     void next_unknownCsiLetter_yieldsEscape() throws IOException {
-        // Given — ESC [ Z is xterm reverse-tab; we don't recognise it
+        // Given — ESC [ Z is xterm reverse-tab; we don't recognize it
         ByteArrayInputStream in = bytes(0x1B, '[', 'Z');
         // When
@@ -135,7 +135,7 @@ void next_ss3SequenceVariant_decodesArrows() throws IOException {
     @Test
     void next_unknownEscapePrefix_yieldsEscape() throws IOException {
-        // Given — `ESC X` (X is neither '[' nor 'O') is not a recognised
+        // Given — `ESC X` (X is neither '[' nor 'O') is not a recognized
         // CSI or SS3 sequence. Must return Escape rather than try to decode further.
         ByteArrayInputStream in = bytes(0x1B, 'X', 'A');
diff --git a/core/src/main/java/io/github/dfa1/vortex/core/model/EncodingId.java b/core/src/main/java/io/github/dfa1/vortex/core/model/EncodingId.java
index 10271c5aa..d56c8f332 100644
--- a/core/src/main/java/io/github/dfa1/vortex/core/model/EncodingId.java
+++ b/core/src/main/java/io/github/dfa1/vortex/core/model/EncodingId.java
@@ -95,7 +95,7 @@ public enum EncodingId {
     /// callers that demand a known id chain `.orElseThrow(...)`.
     ///
     /// @param id raw encoding id string (e.g. `"vortex.primitive"`)
-    /// @return matching constant, or empty if not recognised
+    /// @return matching constant, or empty if not recognized
     public static Optional parse(String id) {
         return Optional.ofNullable(LOOKUP.get(id));
     }
diff --git a/core/src/main/java/io/github/dfa1/vortex/core/proto/ProtoWriter.java b/core/src/main/java/io/github/dfa1/vortex/core/proto/ProtoWriter.java
index 43d8a04cc..5ad6d2d43 100644
--- a/core/src/main/java/io/github/dfa1/vortex/core/proto/ProtoWriter.java
+++ b/core/src/main/java/io/github/dfa1/vortex/core/proto/ProtoWriter.java
@@ -104,7 +104,7 @@ int beginLenDelim() {
         return mark;
     }
-    /// Finalises a length-delimited region opened by [#beginLenDelim()].
+    /// Finalizes a length-delimited region opened by [#beginLenDelim()].
     /// Writes the payload length as a varint at the reserved offset and shifts the payload
     /// left if the varint is shorter than 5 bytes.
     void endLenDelim(int mark) {
diff --git a/core/src/test/java/io/github/dfa1/vortex/core/model/ExtensionIdTest.java b/core/src/test/java/io/github/dfa1/vortex/core/model/ExtensionIdTest.java
index cabb214c4..390c58901 100644
--- a/core/src/test/java/io/github/dfa1/vortex/core/model/ExtensionIdTest.java
+++ b/core/src/test/java/io/github/dfa1/vortex/core/model/ExtensionIdTest.java
@@ -23,7 +23,7 @@ void parse_knownIds_returnEnumConstant(String wire, ExtensionId expected) {
     @Test
     void parse_unknownId_returnsEmpty() {
-        // Given — open-world extension id; library doesn't recognise it
+        // Given — open-world extension id; library doesn't recognize it
         // When / Then — non-throwing miss so the registry can route to passthrough
         assertThat(ExtensionId.parse("acme.geopoint")).isEmpty();
     }
diff --git a/docs/adr/0001-split-read-and-write-runtimes.md b/docs/adr/0001-split-read-and-write-runtimes.md
index 41ab48398..ef1b915ad 100644
--- a/docs/adr/0001-split-read-and-write-runtimes.md
+++ b/docs/adr/0001-split-read-and-write-runtimes.md
@@ -302,7 +302,7 @@ of CI / integration-test fallout, plus reviewer time. Not a weekend.
 - **Side-by-side period drift.** Phases 1–3 leave both the old `Registry` and the new `ReadRegistry`/`WriteRegistry` registered for each encoding
-  during transition. Risk: divergent behaviour if a bug fix lands on one
+  during transition. Risk: divergent behavior if a bug fix lands on one
   side and not the other. Mitigation: integration tests run against both paths during the transition; the old `Registry` becomes a thin forwarder early in Phase 1.
diff --git a/docs/adr/0002-pluggable-dtype-layout-compute.md b/docs/adr/0002-pluggable-dtype-layout-compute.md
index f5030da8a..064f786a9 100644
--- a/docs/adr/0002-pluggable-dtype-layout-compute.md
+++ b/docs/adr/0002-pluggable-dtype-layout-compute.md
@@ -165,7 +165,7 @@ all of the following, in writing:
 1. **A named downstream consumer.** Not a hypothetical "someone might want X." A concrete project / team with a name and a use case.
-2. **A spec for the new variant.** Wire format, serialisation,
+2. **A spec for the new variant.** Wire format, serialization,
    round-trip semantics. Not just "register a custom type" in the abstract.
 3. **Confirmation the existing `Extension` mechanism does not fit.**
diff --git a/docs/adr/0006-benchmark-publishing.md b/docs/adr/0006-benchmark-publishing.md
index d12e658d9..80b773ab3 100644
--- a/docs/adr/0006-benchmark-publishing.md
+++ b/docs/adr/0006-benchmark-publishing.md
@@ -17,7 +17,7 @@ results to `gh-pages/dev/bench` via `benchmark-action/github-action-benchmark`.
 GitHub Actions shared runners share physical hosts with other tenants. JMH benchmarks are sensitive to CPU frequency scaling, SMT contention, and OS scheduler noise. Typical variance on shared runners is **20–40%** per run —
-larger than the signal for a 10–15% decode optimisation. A number published
+larger than the signal for a 10–15% decode optimization. A number published
 from a shared runner cannot be cited, compared across commits, or used to claim a performance target is met.
@@ -25,7 +25,7 @@ The current workflow does carry a regression threshold (`alert-threshold: 130%`) with `comment-on-alert: true`. That is a **2.3 σ** guard relative to the noise floor — it catches catastrophic regressions (5–10×) but misses 10–30% regressions, which are the ones that actually matter during encoder -optimisation work. +optimization work. ### The alternative: local-run publish @@ -95,7 +95,7 @@ cd - Without the CI workflow, regressions are caught by: -1. **Running `bench-publish` before and after an optimisation PR.** The +1. **Running `bench-publish` before and after an optimization PR.** The commit SHAs in the filenames make A/B comparison mechanical. 2. **Adding a JMH regression test** (`@BenchmarkMode(Throughput)` with an `assert` or a baseline comparison in the performance module) — not @@ -141,13 +141,13 @@ longer updated. - `bench-publish` requires local Java + Maven build; not runnable from a mobile device / tablet. - Numbers accumulate only when the developer actively publishes. Long - gaps between optimisation cycles leave stale README tables. + gaps between optimization cycles leave stale README tables. ### Risk - If a second contributor joins and cannot reproduce numbers on different hardware, the single-machine baseline becomes a coordination problem. Mitigation: `benchmark-meta.json` documents the reference hardware; - normalise by throughput ratio (new hardware / reference) rather than + normalize by throughput ratio (new hardware / reference) rather than absolute scores. ## References diff --git a/docs/adr/0010-lazy-decode.md b/docs/adr/0010-lazy-decode.md index f2247c392..063980af3 100644 --- a/docs/adr/0010-lazy-decode.md +++ b/docs/adr/0010-lazy-decode.md @@ -144,7 +144,7 @@ going through an interface to reach the lazy variant. 
**PoC measurement rejected this gate.** Lazy decode is *strictly faster* than eager on full fold (+9.5% on `javaReadClose`) because -the materialisation write/read intermediate buffer disappears — the +the materialization write/read intermediate buffer disappears — the lazy variant returns the encoded segment directly and applies the transform on access; the fused variant unpacks bitpacked → double in one pass. Net halving of memory traffic on the OHLC chain. The gate @@ -155,7 +155,7 @@ is dropped; lazy is the default whenever the chain pattern matches. Today each primitive Array is a `public final class` with a `MemorySegment` buffer field. Lazy variants cannot extend it. Convert every numeric and bool Array to a **non-sealed interface** and move -the current behaviour into a **public** `MaterializedXxxArray` record. +the current behavior into a **public** `MaterializedXxxArray` record. No static factory on the interface — encoders construct `new MaterializedXxxArray(...)` directly, keeping the interface a pure contract: @@ -283,14 +283,14 @@ back to `MaterializedDoubleArray` (see the fused-chain subsection below). Checking a patch bitmap or a sorted patch-index on every access would add a per-row branch inside `getDouble`, which the codebase's hot-loop rule (see CLAUDE.md) bans because it kills -auto-vectorisation in the million-row inner loops. Patched chunks -are the majority of the OHLC dataset today, so eager materialisation +auto-vectorization in the million-row inner loops. Patched chunks +are the majority of the OHLC dataset today, so eager materialization for them stays the dominant cost; "Extended fusion" in the Future section is the path to recover it without paying the per-row branch. 
**No filter gate.** The PoC measurement showed lazy is *strictly faster* than the eager path even on full-fold workloads (+9.5% on -`javaReadClose`, OHLC chain), because the materialisation write/read +`javaReadClose`, OHLC chain), because the materialization write/read intermediate buffer is skipped entirely. The earlier "gate lazy behind `hasFilter()`" idea is dropped in the final design — lazy is the default whenever the chain pattern matches: @@ -312,10 +312,10 @@ intermediate FoR/ALP buffers entirely and returns a `(bitWidth, offset, ref, scale)`. Public surface is `compareGt`: unpack each row, apply `+ref`, compare against the threshold in the encoded int domain, emit a bit into the result `BoolArray`. The full -double materialisation is deferred to `sumMasked` (or any other +double materialization is deferred to `sumMasked` (or any other generic reduction) which only touches the matching rows. The full-fold path (`getDouble`/`fold`/`forEachDouble`) lazily -materialises through one pass that writes doubles directly from the +materializes through one pass that writes doubles directly from the bitpacked unpack — halving the memory traffic vs the old eager chain (one decode pass instead of bitpacked→FoR→ALP). The class keeps a package-private fused `sumWhereGt` for the hot @@ -397,7 +397,7 @@ decoded just enough to test and are not delivered to the consumer. where possible. This is the half of compute pushdown that Rust calls the "filter kernel" (`vortex-array/src/arrays/filter/kernel.rs`); keeping it separate from `compareXxx` matches the Rust shape and -lets the framework decide how to materialise. +lets the framework decide how to materialize. The Rust experience to reuse: most encodings can answer **without reading buffers** (metadata-only). Frame the operator so they @@ -467,8 +467,8 @@ encoding. 
No interface change — each new variant is just another - `ForLongArray`, `ForIntArray` — Frame-of-Reference, lazy - `ZigZagLongArray`, `ZigZagIntArray` — XOR/shift on access (order not preserved, so no pushdown — but lazy still skips the - materialisation pass) -- Additional fused classes for other common chains, modelled on + materialization pass) +- Additional fused classes for other common chains, modeled on `FusedAlpForBitpackedDoubleArray` from the Phase 2 PoC - Extended fusion: handle bitpacked patches inside the fused kernel, closing the patched-chunk path that today falls back to @@ -505,7 +505,7 @@ encoding. No interface change — each new variant is just another - **Patched chunks lose the lazy win.** Lazy `AlpDoubleArray` does not carry a patch index — adding one would force a per-row branch inside `getDouble`, which the codebase's hot-loop rule bans for - vectorisation reasons. So patched chunks fall back to + vectorization reasons. So patched chunks fall back to `MaterializedDoubleArray` and pay the full eager decode. Patched chunks dominate the OHLC dataset today; closing this gap needs the "Extended fusion" follow-up that handles patches inside the @@ -617,7 +617,7 @@ Phases 0/1/2 (lazy decode for the 1:1 transforms) shipped over 2026-06-14 — 20 - **Phase 2 — Lazy ALP / FoR / ZigZag.** `LazyAlpDoubleArray`, `LazyAlpFloatArray`, `LazyForLongArray`, `LazyForIntArray`, `LazyZigZagLongArray`, `LazyZigZagIntArray` shipped. Each holds the encoded child + transform parameters; per-row dispatch applies - the transform on demand. `ArraySegments.of(arr, arena)` materialises into a fresh segment + the transform on demand. `ArraySegments.of(arr, arena)` materializes into a fresh segment only when a downstream caller demands a contiguous buffer. The lazy-storage pattern generalised beyond the 1:1 transform scope of this ADR. 
Adjacent @@ -625,7 +625,7 @@ encodings adopted the same top-level record shape: - [ADR 0012 — Lazy Chunked / Dict / VarBin layouts](0012-zero-copy-layout-decoding.md). - `vortex.runend`, `vortex.sparse`, `fastlanes.rle` (lazy lookup tables; not 1:1 transforms - but the same lazy-record + `ArraySegments` materialise pattern; see + but the same lazy-record + `ArraySegments` materialize pattern; see `LazyRunEndXxxArray`, `LazySparseXxxArray`, `LazyRleXxxArray` in `reader.array`). `docs/compatibility.md` Decode shape table tracks per-encoding status. @@ -634,7 +634,7 @@ encodings adopted the same top-level record shape: The compute-pushdown phase (per-encoding `compareXxx` / `take` / `filter` operating on the encoded form) is **superseded by [ADR 0013 — Compute primitives: masks, kernels, -no-materialise](0013-compute-primitives.md)**. ADR 0013 lands the masks-and-kernels +no-materialize](0013-compute-primitives.md)**. ADR 0013 lands the masks-and-kernels framework that Phase 3 sketched, plus the wider design needed to make pushdown compose across the lazy storage types from ADRs 0010 and 0012. diff --git a/docs/adr/0012-zero-copy-layout-decoding.md b/docs/adr/0012-zero-copy-layout-decoding.md index 6ecc91946..a1e27efd2 100644 --- a/docs/adr/0012-zero-copy-layout-decoding.md +++ b/docs/adr/0012-zero-copy-layout-decoding.md @@ -67,8 +67,8 @@ both Chunked and Dict as first-class lazy storage types: `DictSlots { codes: ArrayRef, values: ArrayRef }`. Never expands. - **Compute kernels per encoding** — `vortex-array/src/arrays/chunked/compute/take.rs` sorts indices by chunk, - takes from each child separately, assembles. No canonical materialisation. -- **Materialisation is opt-in** — `vortex-array/src/canonical.rs` provides + takes from each child separately, assembles. No canonical materialization. +- **Materialization is opt-in** — `vortex-array/src/canonical.rs` provides `to_canonical()` for Arrow handoff. Default reads stay in encoded form. 
Java today inverts this: lazy is the exception (just the six transform @@ -93,7 +93,7 @@ public record ChunkedDoubleArray( } // fold / forEachDouble: default-method inherited from interface, // but override here to iterate children sequentially — each child - // loop stays tight and the JIT vectorises per child. + // loop stays tight and the JIT vectorizes per child. @Override public double fold(double identity, DoubleBinaryOperator op) { double result = identity; for (DoubleArray c : children) { @@ -118,23 +118,23 @@ Scope: real workload demands.) - `DictDoubleArray`, `DictLongArray`, `DictIntArray`, `DictVarBinArray`. -### Materialisation fallback +### Materialization fallback `ArraySegments.of(arr, arena)` already handles lazy variants. Add cases: ```java -case ChunkedDoubleArray a -> materialise(a, arena); -case DictDoubleArray a -> materialise(a, arena); +case ChunkedDoubleArray a -> materialize(a, arena); +case DictDoubleArray a -> materialize(a, arena); // … etc. ``` -Each materialiser allocates `length * elemBytes` and walks +Each materializer allocates `length * elemBytes` and walks children/codes — **same cost as today's eager path.** This fires only when a parent decoder demands a flat segment via `decodeChildSegment` (rare for Chunked at the outer layer; common for Dict codes flowing into a Bitpacked sibling). -When nothing forces materialisation — projection-only reads, fold/forEach +When nothing forces materialization — projection-only reads, fold/forEach on the user-facing array — no allocation happens. ### Decoder wiring @@ -161,11 +161,11 @@ regression by a comfortable margin. ### Positive -- **Zero-copy honoured on multi-chunk files.** A scan over an 8-chunk +- **Zero-copy honored on multi-chunk files.** A scan over an 8-chunk 10M-row F64 column avoids 80 MB of arena alloc + 80 MB of memcpy per scan. Projection-only scans pay zero copy cost for skipped columns. 
- **Zero-copy on dict-encoded columns.** Dictionary columns (common for - low-cardinality categorical data) stop materialising n elements every + low-cardinality categorical data) stop materializing n elements every scan. - **Java aligns with Rust.** Every encoding (per ADR 0010) and every layout (this ADR) is permanent storage. The mental model becomes @@ -193,7 +193,7 @@ regression by a comfortable margin. - **`ArraySegments.of(arr, arena)` fallback has the same memcpy cost as today's eager path.** Net win only when nobody asks for a flat segment. Decoders that route through `decodeChildSegment` still pay the - materialisation cost — they just pay it through a different code path. + materialization cost — they just pay it through a different code path. ### Risks to manage @@ -226,10 +226,10 @@ regression by a comfortable margin. - **`find_chunk_idx` is a per-row hot-loop branch.** Binary search over `offsets` runs on every `getDouble(i)` call. The CLAUDE.md hot-loop rule bans per-element modulo/division because it kills C2 superword - vectorisation. Binary search is conditional control flow with a - variable-target branch — also bad for vectorisation. Mitigations: + vectorization. Binary search is conditional control flow with a + variable-target branch — also bad for vectorization. Mitigations: use the per-child `fold` path (no `find_chunk_idx` calls); for random - access workloads, accept the cost — random access is non-vectorisable + access workloads, accept the cost — random access is non-vectorizable by nature. ## Alternatives considered @@ -252,13 +252,13 @@ decision (lazy layouts vs lazy transforms), different risk profile. Mixing them muddles the audit trail — future readers chasing "why is Chunked lazy" land in a doc about ALP. Rejected; cross-link instead. 
-### C — Strict zero-copy contract (refuse `ArraySegments.of` materialisation) +### C — Strict zero-copy contract (refuse `ArraySegments.of` materialization) -`ArraySegments.of(ChunkedXxxArray, arena)` throws instead of materialising. +`ArraySegments.of(ChunkedXxxArray, arena)` throws instead of materializing. Force all decoders to use typed access (`getXxx(i)` / `fold`). Pros: strictest possible contract; impossible to accidentally pay for -materialisation. Cons: breaks chains where a parent decoder genuinely +materialization. Cons: breaks chains where a parent decoder genuinely needs a flat segment — most importantly, the dict-codes-into-bitpacked path where `codesSeg` flows into a tight bit-unpack loop. Forcing those sites to use `getInt(i)` per code would megamorphic-dispatch on every @@ -270,7 +270,7 @@ Land Chunked first; Dict comes later. Pros: smaller blast radius (4 impls per interface instead of 5 for Long/Int arrays). Lower risk of inlining regression. Faster ship. Cons: defers -half the zero-copy win. Dict workloads continue to pay full materialisation +half the zero-copy win. Dict workloads continue to pay full materialization cost. This is a viable shipping order, not a rejection of Dict — recorded as an @@ -285,7 +285,7 @@ Int. Pros: drops LongArray and IntArray back to 4 impls each (within reach of the cap with one more mitigation). Cons: adds a per-row enum branch in the inner loop. The earlier ADR 0010 work explicitly avoided this in -favour of distinct types for Phase 3 compute pushdown. +favor of distinct types for Phase 3 compute pushdown. Recorded as a mitigation, not a rejection — implementation PR decides. @@ -298,10 +298,10 @@ The implementation PR resolves these; recorded here so the trail is clear: refactor `LazyFor`+`LazyZigZag` into a generic transform class, or restrict the new types to pattern-match dispatch (no interface exposure)? -3. **Fallback policy:** relaxed (`ArraySegments.of(_, arena)` materialises) +3. 
**Fallback policy:** relaxed (`ArraySegments.of(_, arena)` materializes) or strict (throws)? 4. **Sequencing vs ADR 0010 §Phase 2 fused chain:** which lands first? - Both target OHLC; fused chain fixes ALP(FoR(Bitpacked)) materialisation + Both target OHLC; fused chain fixes ALP(FoR(Bitpacked)) materialization inside one chunk, Chunked fixes the cross-chunk concat. They compose but order matters for measurement. @@ -314,17 +314,17 @@ Shipped across three PRs against `main`: pushed those interfaces from 1 to 2 impls, well under the JIT inline budget). `ScanIterator.decodeConcatPrimitive` renamed to `decodeChunkedLayout` and rewritten to construct `ChunkedXxxArray` directly; the alloc + memcpy loop deleted. `ChunkedEncodingDecoder.decode` rewritten the same way (`wrap`/`wrapPrimitive`/`wrapStruct` replaces - `concat`/`concatPrimitive`/`concatStruct`). `ArraySegments.of(arr, arena)` gained the chunked materialise cases - per §"Materialisation fallback". Bench gate passed: `JavaVsJniReadBenchmark` showed no statistically significant + `concat`/`concatPrimitive`/`concatStruct`). `ArraySegments.of(arr, arena)` gained the chunked materialize cases + per §"Materialization fallback". Bench gate passed: `JavaVsJniReadBenchmark` showed no statistically significant delta vs the previously-considered sticky-cache class shape; record shape chosen on architecture grounds (immutable, thread-safe, idiomatic Java). `forEach*` overrides iterate children directly so sequential scans bypass the per-row binary search. - **PR #39 — Dict half.** `DictLongArray`/`IntArray`/`DoubleArray`/`FloatArray` records as proposed. - Codes ptype variance (U8/U16/U32/U64 = Byte/Short/Int/Long Array) handled via centralised + Codes ptype variance (U8/U16/U32/U64 = Byte/Short/Int/Long Array) handled via centralized `DictArrays.readCode(codes, i)` plus per-method codes-type switches hoisted outside the inner loops per the CLAUDE.md hot-loop rule. 
`ScanIterator.expandDictPrimitive` deleted; the primitive dict-layout branch - returns the matching `DictXxxArray`. `ArraySegments.of(arr, arena)` gained the four dict materialise cases. + returns the matching `DictXxxArray`. `ArraySegments.of(arr, arena)` gained the four dict materialize cases. `truncateArray` got Dict cases before the per-interface catch-all so LIMIT keeps the dictionary and just slices codes. @@ -343,7 +343,7 @@ Resolutions to the open questions: LazyZigZag + Chunked + Dict). `JavaVsJniReadBenchmark` between PRs showed no measurable regression because sequential reads use `forEach*` (single impl per call site) and the polymorphic `getXxx(i)` site isn't the benchmark's hot path. Re-evaluate if a real workload surfaces the cost. -3. **Fallback policy:** relaxed — `ArraySegments.of(arr, arena)` materialises Chunked and Dict variants on +3. **Fallback policy:** relaxed — `ArraySegments.of(arr, arena)` materializes Chunked and Dict variants on demand. Used internally by `decodeDictLayout` for string dict expansion (the codes side) and reserved for future decoders that genuinely need a contiguous segment. 4. **Sequencing vs ADR 0010 §Phase 2:** Chunked + Dict lazy decoding landed first. 
Phase 2 (fused chain) is @@ -374,9 +374,9 @@ What was **not** shipped (intentional): - `vortex-array/src/arrays/chunked/array.rs` — ChunkedArray storage and `find_chunk_idx` - `vortex-array/src/arrays/dict/array.rs` — DictArray storage - `vortex-array/src/arrays/chunked/compute/take.rs` — per-chunk compute kernel - - `vortex-array/src/canonical.rs` — `to_canonical()` opt-in materialisation + - `vortex-array/src/canonical.rs` — `to_canonical()` opt-in materialization - Local code: - `ScanIterator.java:447-474` — `decodeConcatPrimitive` - `ScanIterator.java:156-218` — `expandDictPrimitive`, `expandDictStrings` - `ChunkedEncodingDecoder.java:90-94` — duplicate concat path - - `ArraySegments.java` — two-arg `of(arr, arena)` overload (the materialisation hook) + - `ArraySegments.java` — two-arg `of(arr, arena)` overload (the materialization hook) diff --git a/docs/adr/0013-compute-primitives.md b/docs/adr/0013-compute-primitives.md index 1f51dfc90..0a81d1881 100644 --- a/docs/adr/0013-compute-primitives.md +++ b/docs/adr/0013-compute-primitives.md @@ -1,12 +1,12 @@ -# ADR 0013: Compute primitives — masks, kernels, no-materialise contract +# ADR 0013: Compute primitives — masks, kernels, no-materialize contract - **Status:** Accepted — §1 (Mask), §4 (Predicate), §2/§3 (filter/reduce kernels, a generic - streaming baseline plus a type-specialised boxing-free fast lane, behind a minimal `Compute` + streaming baseline plus a type-specialized boxing-free fast lane, behind a minimal `Compute` entry point), §5 (`RowFilter` unified over `Predicate` — a `RowFilter.Column` binds a column to a shared public `Predicate`; the same predicate is compiled against zone-map stats for pruning and against the decoded array for the boundary fold), and §6 (zone-map aggregate push-down, both the whole-zone and boundary-zone tiers) are implemented in `reader.compute`. 
Deferred: encoded-domain - kernel specialisation (perf escalation — pushing the predicate into the ALP/FoR/Dict integer + kernel specialization (perf escalation — pushing the predicate into the ALP/FoR/Dict integer domain without decoding), and the ergonomic façade (a columnar transducer — its own ADR). `MapKernel` is unbuilt (no consumer yet). - **Date:** 2026-06-15 @@ -30,11 +30,11 @@ The lazy infrastructure stops at the `Array` boundary. Once a caller pulls a chunk out of the scan, the natural next step is filter / project / reduce. Today the only way to do that is element-by-element via the per-type accessor (`DoubleArray.getDouble`, `LongArray.getLong`, …) or by forcing -materialisation through `ArraySegments.of(Array)`. Neither composes: +materialization through `ArraySegments.of(Array)`. Neither composes: - Per-element accessors fight loop fusion — JIT cannot see across user code boundaries, so `filter` then `sum` decode twice. -- Forced materialisation discards the lazy gain. A filter that retains 1% +- Forced materialization discards the lazy gain. A filter that retains 1% of rows still pays the full decode cost the moment a downstream stage asks for a buffer. @@ -77,9 +77,9 @@ public sealed interface Mask permits AllTrue, AllFalse, RangeMask, BitmapMask { `compare` / `between` / `is_null` kernels. A `Chunk` returned by the scan carries an optional `Mask`. Successive -filter kernels intersect masks in place; downstream kernels honour the +filter kernels intersect masks in place; downstream kernels honor the mask (skip excluded positions during reduce, emit a smaller result for -`take`). Nothing materialises until a sink demands it. +`take`). Nothing materializes until a sink demands it. ### 2. Kernel signatures @@ -101,12 +101,12 @@ public interface ReduceKernel { selection mask, and operator-specific parameters. - Implementations dispatch on the concrete `Array` subtype via pattern switch (e.g. 
`LazyAlpDoubleArray` → encoded-domain compare, fallback - arrays → materialised path). + arrays → materialized path). - Output is an `Array` or `Mask` of the **same length** as the input — positional alignment is preserved through the pipeline so masks remain meaningful across stages. -### 3. No-materialise contract +### 3. No-materialize contract A kernel that operates on a lazy `Array` variant **must not** call `ArraySegments.of(array)` unless every fallback path has been exhausted. @@ -117,7 +117,7 @@ The contract: encodes the scalar, compares longs.) 2. Streaming per-element implementation using the accessor if (1) is not possible. Allocates only the result. -3. Forced materialisation only as a last resort, gated by a debug log / +3. Forced materialization only as a last resort, gated by a debug log / counter so regressions are visible. This is the rule that prevents the lazy gain from leaking. A new kernel @@ -191,7 +191,7 @@ the same per-zone rows to the kernel rather than (only) to the inspector. ### Positive -- Filter / project / aggregate compose without materialising intermediates. +- Filter / project / aggregate compose without materializing intermediates. A `filter(close > 100).sum(volume)` pipeline touches the close column's encoded i64s once and the volume column's encoded i64s once. - The lazy decoders introduced by ADR 0010 / 0012 become useful for more @@ -217,8 +217,8 @@ the same per-zone rows to the kernel rather than (only) to the inspector. ### Risks to manage - **Kernel matrix explosion.** Mitigate by writing one generic streaming - path that works for any `Array` via accessors, then specialising only - the hot encodings (ALP, FoR, BitPacked, Dict). Specialisation is a + path that works for any `Array` via accessors, then specializing only + the hot encodings (ALP, FoR, BitPacked, Dict). Specialization is a performance escalation, not a correctness requirement. 
- **User-facing API churn.** Without a decided façade (transducer vs Stream vs builder), early callers of `vortex-compute` end up depending @@ -235,7 +235,7 @@ the same per-zone rows to the kernel rather than (only) to the inspector. ### A. Push everything through the Stream API Reuse `Stream` etc. with custom Spliterators that respect masks. -Rejected: Stream forces autoboxing on primitive specialisations (no +Rejected: Stream forces autoboxing on primitive specializations (no `DoubleStream.filter(DoublePredicate)` that emits a packed mask), and the internal Spliterator state isn't a natural place to carry encoded-domain short-circuits. Worth offering as a *convenience* sink on top of the @@ -255,7 +255,7 @@ a syntax layer; they cannot replace the kernels. ADR 0002 is marked Deferred and covers compute *pluggability*. This ADR covers the **primitives** that pluggable compute would plug into. We can ship the primitives now without committing to user-installable kernels. -Pluggability is a later question — the no-materialise contract and +Pluggability is a later question — the no-materialize contract and predicate vocabulary stand on their own. ## References diff --git a/docs/adr/0015-drop-materialized-fallbacks.md b/docs/adr/0015-drop-materialized-fallbacks.md index 9bf00086c..ebc84af63 100644 --- a/docs/adr/0015-drop-materialized-fallbacks.md +++ b/docs/adr/0015-drop-materialized-fallbacks.md @@ -21,10 +21,10 @@ The cost compounds: duplicated across unit + integration suites. - **Decoder churn.** `ZigZag`, `FoR`, `ALP`, `RunEnd`, `Sparse`, `RLE`, `Dict`, `Chunked`, `VarBinView`, `DateTimeParts`, `DecimalByteParts`, - `Decimal`, `Constant` each have a non-trivial materialised branch that + `Decimal`, `Constant` each have a non-trivial materialized branch that must keep up with format changes — without ever firing in production. 
- **Surprise factor for contributors.** New decoders default to the - pattern they read in neighbouring files; if the neighbouring file still + pattern they read in neighboring files; if the neighboring file still has a Materialized branch, the new encoding inherits the obsolete shape by mimicry. - **Real-file fragility.** The per-column chunking misalignment fix in @@ -65,7 +65,7 @@ A Materialized branch may be deleted when **all** of these hold: instance-of assertions, byte-level segment comparisons that only the eager path produces). - Helper methods that exist only to support the eager path (e.g. - `applyReference`, per-ptype materialisers in + `applyReference`, per-ptype materializers in `FrameOfReferenceEncodingDecoder`). ### What stays @@ -74,7 +74,7 @@ A Materialized branch may be deleted when **all** of these hold: `delta`, `patched` — keep their Materialized output. ADR 0010 § "Decompression encodings stay eager" applies; ADR 0015 does not change it. -- Materialisation fallbacks inside `ArraySegments.of(arr, arena)` stay — +- Materialization fallbacks inside `ArraySegments.of(arr, arena)` stay — they exist for callers that explicitly request a `MemorySegment` from a Lazy array, not for default decode. @@ -90,10 +90,10 @@ A Materialized branch may be deleted when **all** of these hold: ### Negative - Loss of an in-tree A/B comparison point. Mitigation: keep the - Materialised path live in benchmarks only, never in production decode. -- Encoding-specific micro-optimisations that the eager path enabled + Materialized path live in benchmarks only, never in production decode. +- Encoding-specific micro-optimizations that the eager path enabled (e.g. SIMD-friendly `applyReference` in FoR) need to migrate to the - Lazy path or to a `materialiseXxx` helper in `ArraySegments`. + Lazy path or to a `materializeXxx` helper in `ArraySegments`. 
## Rollout diff --git a/docs/adr/0016-vortex-arrow-bridge.md b/docs/adr/0016-vortex-arrow-bridge.md index 3cf9af382..e20c21be5 100644 --- a/docs/adr/0016-vortex-arrow-bridge.md +++ b/docs/adr/0016-vortex-arrow-bridge.md @@ -112,8 +112,8 @@ Three pieces of work beyond just handing over the existing mmap slices: length rounded up to 8) plus a `null_count`. Our `MaskedArray.validity()` is a per-element `BoolArray`, so it must be packed; `validity == null` emits a null buffer pointer with `null_count = 0`. -2. **Lazy materialisation.** Lazy arrays (ZigZag/FoR/ALP/Dict/RLE) store the - *encoded* form, which is not the Arrow values layout, so they must be materialised +2. **Lazy materialization.** Lazy arrays (ZigZag/FoR/ALP/Dict/RLE) store the + *encoded* form, which is not the Arrow values layout, so they must be materialized into a contiguous LE segment first. This is exactly the producer step that `Array.materialize(arena)` performs (see below), so it feeds the `values` buffer directly. Primitive values, VarBin data+offsets, and StringView are already @@ -128,12 +128,12 @@ Three pieces of work beyond just handing over the existing mmap slices: ### Relationship to the `Array.materialize` seam (shipped) -The bulk-materialisation seam Option B builds on now exists: +The bulk-materialization seam Option B builds on now exists: `Array.materialize(SegmentAllocator)` — a pure abstract method (mirroring the existing `Array.limited(...)` polymorphism) that turns any array, lazy or eager, into a contiguous LE primitive segment. Each type owns its path: segment-backed arrays return their buffer zero-copy, the `Lazy*` variants apply their inlined decode formula (ZigZag/FoR/ALP) in a -vectorisable loop next to their per-element accessor, chunked/dict arrays concat/gather, +vectorizable loop next to their per-element accessor, chunked/dict arrays concat/gather, and the families with no primary segment (struct, list, variant, byte-parts decimal, null, unknown) throw. 
@@ -141,7 +141,7 @@ This is **not** an Arrow feature — but it is the natural producer of the Arrow buffer, so Option B builds on it. The contiguous LE segment it yields already matches Arrow's primitive values-buffer layout. Two gaps remain to a full Arrow array, both per the table above: validity + offsets + children; and the broadcast edge — a constant column -materialises to a single-element buffer (`length != elementCount`), which `materialize()` +materializes to a single-element buffer (`length != elementCount`), which `materialize()` returns as-is, so the Arrow producer must expand it to `length` values. `materialize` is intentionally part of the public `Array` contract (not a package-private @@ -165,7 +165,7 @@ Ship nothing; point users at the typed views and let them copy into Arrow themse - Arrow interop has a recorded home and a recommended shape when the need arrives. ### Negative -- Until built, Arrow consumers must hand-roll conversion (Option C behaviour). +- Until built, Arrow consumers must hand-roll conversion (Option C behavior). ### Risks to manage - Arrow's Java buffer layout and `BufferAllocator` lifetime model must be mapped diff --git a/docs/adr/0018-calcite-sql-adapter.md b/docs/adr/0018-calcite-sql-adapter.md index a5601528a..a7442eef7 100644 --- a/docs/adr/0018-calcite-sql-adapter.md +++ b/docs/adr/0018-calcite-sql-adapter.md @@ -20,7 +20,7 @@ GROUP BY …` rather than hand-written scan loops. Two ways to answer that: -1. **Build a SQL engine.** Parser, type system, logical plan, cost-based optimiser, join +1. **Build a SQL engine.** Parser, type system, logical plan, cost-based optimizer, join algorithms, aggregation/spilling, NULL semantics. Person-years, none of it Vortex-specific — it is solved, generic, and not where this project's advantage lies. 2. 
**Adapt to an existing engine.** Plug the Vortex scan into a mature SQL front-end and let @@ -29,12 +29,12 @@ Two ways to answer that: The project's advantage is the **scan**, not the engine. An external engine only ever sees *decoded* values, so it has already paid the cost Vortex could have skipped — chunk-skipping -via zone maps, encoded-domain comparison (`compare(ALPArray, scalar)` without materialising +via zone maps, encoded-domain comparison (`compare(ALPArray, scalar)` without materializing doubles), and answering `MIN`/`MAX`/`COUNT`/`SUM` straight from the stats table. That asymmetry only exists *inside* Vortex. So the goal is to expose the scan + push-down to an engine, not to reimplement the engine. -Apache Calcite is the natural JVM host: a pure-Java SQL parser + relational optimiser + +Apache Calcite is the natural JVM host: a pure-Java SQL parser + relational optimizer + pluggable adapter framework (the substrate under Drill, Flink-SQL, Beam-SQL). It does the generic work; the adapter contributes a table that knows how to push work down. @@ -169,7 +169,7 @@ floating-point filter column (NaN ordering), a non-numeric `SUM`, or a missing/m **Two doors, chosen by query shape.** Calcite is the right tool for *reducing* queries (filter / aggregate / group-by), where push-down shrinks the result and the `Object[]` boundary -amortises to near-nothing. It is the wrong tool for *bulk columnar extract* (`SELECT *` over a +amortizes to near-nothing. It is the wrong tool for *bulk columnar extract* (`SELECT *` over a large file, weak/no filter): every row boxes. That shape should bypass Calcite via the Arrow C-Data export of ADR 0016 (Option B), columnar and zero-boxing. Calcite is never Vortex's *execution* engine — the moment many rows flow *through* it, the wrong door was used. The @@ -180,7 +180,7 @@ by construction. ### Positive -- SQL over Vortex with no engine to build or maintain; Calcite owns parse/plan/optimise/join. 
+- SQL over Vortex with no engine to build or maintain; Calcite owns parse/plan/optimize/join. - The push-down surface is exactly the ADR 0013 primitive set — the adapter is the first real consumer of that vocabulary, validating it against a real planner. - Dependency isolation: only consumers who add `vortex-calcite` pay the Calcite/Janino/Guava @@ -193,11 +193,11 @@ by construction. ### Negative - Calcite's Enumerable execution is row-at-a-time `Object[]` with autoboxing. Acceptable for a - *source* (push-down does the heavy work before rows materialise), but it caps throughput on + *source* (push-down does the heavy work before rows materialize), but it caps throughput on any query that emits many rows — which is why bulk extract is pushed to the Arrow door, not Calcite. - A second public surface (SQL) with its own semantics gotchas (reserved words, `AVG` integer - division, three-valued logic) the adapter must honour exactly to stay consistent with the + division, three-valued logic) the adapter must honor exactly to stay consistent with the push-down path. - The `RelOptRule` for aggregate push-down is non-trivial Calcite-internal work; the kernel matrix (Array variant × Predicate variant) from ADR 0013 still has to exist underneath. @@ -210,7 +210,7 @@ by construction. does. - **Calcite version churn.** Calcite's adapter SPI and Avatica move between releases; pin the version and keep the smoke test as the JDK-compatibility tripwire (re-run on JDK upgrades). -- **Temptation to make Calcite columnar.** Bridging Vortex into a vectorised execution engine is +- **Temptation to make Calcite columnar.** Bridging Vortex into a vectorized execution engine is the rabbit hole; that is DuckDB/DataFusion territory (native — the `vortex-jni` world). Resist it. Keep Calcite as front-end + planner only. - **Lifetime.** `Object[]` rows hold decoded values, but any zero-copy path (future) must keep @@ -223,7 +223,7 @@ by construction. 
A hand-rolled parser + executor over the ADR 0013 kernels (single table, `WHERE`, simple aggregates, no joins). Attractive as a *demo* of push-down, but it dies at the first join or -optimiser requirement and re-implements solved, generic machinery. Rejected as a product +optimizer requirement and re-implements solved, generic machinery. Rejected as a product direction; the prototype's value is the adapter + push-down helper, not a query language. ### B. Hand off entirely to DuckDB / DataFusion via Arrow @@ -237,8 +237,8 @@ replacement for JVM SQL with push-down. ### C. Apache Calcite with full custom physical convention -Implement a complete `Convention` with vectorised Vortex operators so execution stays columnar -end-to-end. Maximum performance, but it means building a vectorised execution engine inside +Implement a complete `Convention` with vectorized Vortex operators so execution stays columnar +end-to-end. Maximum performance, but it means building a vectorized execution engine inside Calcite — most of option A's cost plus Calcite-internal complexity. Deferred indefinitely; the `Object[]` Enumerable path plus aggressive push-down is sufficient for the source role, and bulk throughput belongs to the Arrow door. diff --git a/docs/adr/ADR.md b/docs/adr/ADR.md index bbc292a65..844165a01 100644 --- a/docs/adr/ADR.md +++ b/docs/adr/ADR.md @@ -27,7 +27,7 @@ the decision shipped in (blank = not yet shipped). 
| 0010 | Lazy decode | Accepted | 0.7.0 | | 0011 | Writer zero-copy MemorySegment overload | Deferred | | | 0012 | Zero-copy layout decoding: lazy Chunked/Dict | Accepted | 0.7.0 | -| 0013 | Compute primitives: masks, kernels, no-materialise | Proposed | | +| 0013 | Compute primitives: masks, kernels, no-materialize | Proposed | | | 0014 | Variant encoding: chunked constants now, parquet.variant later | Accepted | 0.8.0 | | 0015 | Drop Materialized fallbacks once Lazy has shipped | Accepted | 0.8.0 | | 0016 | vortex-arrow bridge module for Arrow interop | Proposed | | diff --git a/docs/compatibility.md b/docs/compatibility.md index ea40d27d2..f989fb765 100644 --- a/docs/compatibility.md +++ b/docs/compatibility.md @@ -84,7 +84,7 @@ decoder falls into one of three shapes: No arena allocation, no per-element copy. - **Lazy** — output is a `LazyXxxArray` / `ChunkedXxxArray` record that holds the encoded child plus the transform parameters. Per-row `getXxx(i)` applies the transform on demand. No - output buffer is allocated unless a caller explicitly materialises via + output buffer is allocated unless a caller explicitly materializes via `ArraySegments.of(arr, arena)`. - **Materialized** — output is a buffer allocated from `ctx.arena()` populated during `decode()`. Required for decompression-style encodings (Bitpacked, Pco, Zstd, etc.) 
where reading element @@ -115,7 +115,7 @@ decoder falls into one of three shapes: | `vortex.fixed_size_list` | Lazy | Lazy | `FixedSizeListArray` wraps flat elements child; no per-row alloc | | `vortex.zstd` | Materialized | Materialized | block decompression | | `vortex.masked` | Zero-copy | Zero-copy | wraps inner + validity | -| `vortex.decimal` | Lazy | Lazy | `LazyDecimalArray` — `BigDecimal` materialised per row on `getDecimal(i)` | +| `vortex.decimal` | Lazy | Lazy | `LazyDecimalArray` — `BigDecimal` materialized per row on `getDecimal(i)` | | `vortex.decimal_byte_parts` | Lazy | Lazy | `LazyDecimalBytePartsArray` — reassembles byte parts on access | | `vortex.datetimeparts` | Lazy | Lazy | `LazyDateTimePartsLongArray` — reassembles parts on access | | `vortex.pco` | Materialized | Materialized | range-encoded decompression | @@ -134,7 +134,7 @@ Bitpacked produces `LazyAlp(MaterializedXxx)`). ### Unknown encodings -Files containing unrecognised encoding IDs throw `VortexException` by default. Opt in to +Files containing unrecognized encoding IDs throw `VortexException` by default. Opt in to passthrough mode to read such files without failing: ```java @@ -150,7 +150,7 @@ try (VortexReader vf = VortexReader.open(path, registry)) { ## Extension types Extension dtypes wrap a primitive storage array with a logical-id tag plus optional -metadata. The Rust catalogue lives in +metadata. The Rust catalog lives in [`vortex-array/src/extension/`](https://github.com/vortex-data/vortex/tree/develop/vortex-array/src/extension); each subdir below names a canonical extension id and its on-disk shape. diff --git a/docs/explanation.md b/docs/explanation.md index f46f9b9e7..3bb580ee8 100644 --- a/docs/explanation.md +++ b/docs/explanation.md @@ -133,9 +133,9 @@ but it is not a candidate in the numeric cascade. 
The JNI path pays three costs per batch: (1) a JNI boundary crossing to call into native code, (2) the Arrow C Data Interface handshake to pass decoded buffers back to the JVM as -`ArrowArray`/`ArrowSchema` structs, and (3) materialising the result into Apache Arrow +`ArrowArray`/`ArrowSchema` structs, and (3) materializing the result into Apache Arrow `VectorSchemaRoot` objects before the application can read a single value. The JIT cannot -inline or optimise across the JNI boundary. +inline or optimize across the JNI boundary. `vortex-java` eliminates all of that. The FFM API (`MemorySegment`) gives Java code a typed, bounds-checked view directly into the OS mmap region. Decoding reads bytes directly @@ -155,7 +155,7 @@ is tiny and the scan is over a handful of entries, not individual rows. **1. mmap zero-copy.** Vortex reads directly from the mmap'd `MemorySegment` — the file bytes _are_ the decode -input, no intermediate copies. Hardwood reads into internal page buffers and materialises +input, no intermediate copies. Hardwood reads into internal page buffers and materializes values before batch hand-off. Parquet also pays per-page framing overhead: RLE-encoded definition/repetition levels, page header parsing, optional dictionary decode. Vortex's layout is a flat array of encoded values with no per-row framing. @@ -409,7 +409,7 @@ At decode time the registry maps the ID string from the Layout node to the right `Encoding` instance and calls `decode(DecodeContext)`. Custom encodings can be added at build time: `Registry.builder().register(myEncoding).build()`. -Files with unrecognised IDs throw `VortexException` unless the builder enabled `allowUnknown()`. +Files with unrecognized IDs throw `VortexException` unless the builder enabled `allowUnknown()`. 
## Testing strategy diff --git a/docs/how-to.md b/docs/how-to.md index f07830aef..72437d799 100644 --- a/docs/how-to.md +++ b/docs/how-to.md @@ -295,7 +295,7 @@ java -jar cli/target/vortex-cli-*-all.jar filter data.vortex "price >= 100" > ou ## Read files with unknown encodings -By default, a file containing an unrecognised encoding ID throws `VortexException`. +By default, a file containing an unrecognized encoding ID throws `VortexException`. Use `allowUnknown()` to read the file anyway — columns with unknown encodings are returned as `UnknownArray` (opaque, not decodable, but the rest of the file is readable): diff --git a/docs/testing.md b/docs/testing.md index f3dd95c71..0e8fb9e5d 100644 --- a/docs/testing.md +++ b/docs/testing.md @@ -25,7 +25,7 @@ caught at the lowest layer that can see them. | Layer | Runner | ~Executions | Scope | |-------|--------|-------------|-------| -| Unit | surefire | ~2,690 | One class/behaviour, in-memory, no I/O | +| Unit | surefire | ~2,690 | One class/behavior, in-memory, no I/O | | Property-based | surefire | (subset of unit) | Seeded-random sweeps over encode/decode | | Integration | failsafe | ~271 | Java↔Rust interop + real files + CLI end-to-end | | Mutation | PIT (opt-in) | — | Adequacy of tests for bounds/parse classes | @@ -42,7 +42,7 @@ I/O, no network, no sleep — mock or use in-memory `MemorySegment`s. Each test What they cover, by module: -- **core** (256) — `DType`/`PType` modelling, `IoBounds` guards, `PTypeIO` little-endian +- **core** (256) — `DType`/`PType` modeling, `IoBounds` guards, `PTypeIO` little-endian segment reads/writes, proto record encode/decode. - **reader** (780) — every `EncodingDecoder` and `Array` subtype, the file-structure parsers (`Footer`, `Trailer`, `PostscriptParser`, `Layout`), and the lazy/chunked/dict @@ -69,7 +69,7 @@ Seeds are fixed so any failure reproduces. 
Current property suites: - `CascadingCompressorTest.RoundTripProperty` — the full encoder-selection + nesting pipeline, every codec at cascade depth 0–3. - `PcoEncodingEncoderTest` / `PcoEncodingDecoderTest` — Pco mode pickers (delta, IntMult), - bin optimiser, and ANS/patch paths over mixed distributions. + bin optimizer, and ANS/patch paths over mixed distributions. ## Integration tests (`./mvnw verify -pl integration -am`) @@ -134,13 +134,13 @@ explicit `ClassName.methodName` filter. Coverage (JaCoCo, aggregated across surefire + failsafe) is ~81% and is reported to SonarCloud daily. Generated `fbs/`/`proto/` sources and the `performance/` benchmark module -are excluded — they have no hand-written behaviour worth covering. The quality gate requires +are excluded — they have no hand-written behavior worth covering. The quality gate requires zero bugs and zero vulnerabilities; the build itself fails on any javac warning (`-Xlint:all -Werror`), zero Checkstyle violations, and zero Javadoc warnings. ## Reading the signals: Sonar and PIT as data, not verdicts -SonarCloud and PIT both report facts, not judgements. A Sonar finding ("this line is +SonarCloud and PIT both report facts, not judgments. A Sonar finding ("this line is uncovered", "these blocks are duplicated") is a pointer to look, not a defect by itself — the interpretation is the engineering work. Two patterns recur often enough to be worth naming. @@ -150,7 +150,7 @@ naming. When Sonar flags a line as not covered, it is exactly one of: 1. **Missing test** — reachable by valid input, just never exercised. Add the test. -2. **Dead code** — unreachable by any input. Delete it; a test would only pin behaviour +2. **Dead code** — unreachable by any input. Delete it; a test would only pin behavior that can never run. 3. 
**Defensive-by-contract** — reachable only if an invariant is already broken: the `default -> throw new VortexException(...)` arms, the `catch (IOException)` on metadata @@ -173,11 +173,11 @@ Sonar's duplication metric is also a pointer, not an order. Most flagged duplica and should be factored out — e.g. the four `unpackLoop8/16/32/64` methods in `BitpackedEncodingDecoder` each rebuilt an identical per-row schedule, now hoisted into one `schedule(typeBits, bitWidth)` helper. But some duplication is the price of a hard -constraint: the per-element inner unpack loops in those same methods stay specialised per +constraint: the per-element inner unpack loops in those same methods stay specialized per width on purpose, because a generic `ValueLayout`/accessor would stop C2 from constant-folding -the typed access and block superword vectorisation (the hot-loop rule). When duplication and a +the typed access and block superword vectorization (the hot-loop rule). When duplication and a performance or safety invariant conflict, the invariant wins — factor out the cold, -run-once part and leave the hot, specialised part alone, with a comment saying why. +run-once part and leave the hot, specialized part alone, with a comment saying why. The throughline: let the tools point at the data, then decide with the context they do not have. 
diff --git a/integration/src/test/java/io/github/dfa1/vortex/integration/AllowUnknownIntegrationTest.java b/integration/src/test/java/io/github/dfa1/vortex/integration/AllowUnknownIntegrationTest.java index 489215e3d..4ef3aaec6 100644 --- a/integration/src/test/java/io/github/dfa1/vortex/integration/AllowUnknownIntegrationTest.java +++ b/integration/src/test/java/io/github/dfa1/vortex/integration/AllowUnknownIntegrationTest.java @@ -34,7 +34,7 @@ import static org.assertj.core.api.Assertions.assertThat; import static org.assertj.core.api.Assertions.assertThatThrownBy; -/// Verifies Registry.allowUnknown() passthrough behaviour end-to-end +/// Verifies Registry.allowUnknown() passthrough behavior end-to-end /// using a JNI-written file whose encodings are deliberately absent from the registry. class AllowUnknownIntegrationTest { diff --git a/integration/src/test/java/io/github/dfa1/vortex/integration/JavaWritesRustReadsIntegrationTest.java b/integration/src/test/java/io/github/dfa1/vortex/integration/JavaWritesRustReadsIntegrationTest.java index 881114612..14390e28d 100644 --- a/integration/src/test/java/io/github/dfa1/vortex/integration/JavaWritesRustReadsIntegrationTest.java +++ b/integration/src/test/java/io/github/dfa1/vortex/integration/JavaWritesRustReadsIntegrationTest.java @@ -927,7 +927,7 @@ void javaWriter_rustReader_zigzag_i64(@TempDir Path tmp) throws IOException { @Test void javaWriter_rustReader_runEnd_i32(@TempDir Path tmp) throws IOException { - // Given — RunEnd: runs of same value exercise the run-length proto serialisation + // Given — RunEnd: runs of same value exercise the run-length proto serialization Path file = tmp.resolve("java_runend_i32.vtx"); int[] data = {10, 10, 10, 20, 20, 30, 30, 30, 30}; try (var ch = FileChannel.open(file, StandardOpenOption.CREATE, StandardOpenOption.WRITE); @@ -995,7 +995,7 @@ void javaWriter_rustReader_rle_i64(@TempDir Path tmp) throws IOException { @Test void javaWriter_rustReader_constant_i32(@TempDir Path 
tmp) throws IOException { - // Given — Constant: scalar proto serialisation and row count metadata + // Given — Constant: scalar proto serialization and row count metadata Path file = tmp.resolve("java_constant_i32.vtx"); int[] data = {42, 42, 42, 42, 42}; try (var ch = FileChannel.open(file, StandardOpenOption.CREATE, StandardOpenOption.WRITE); @@ -1334,7 +1334,7 @@ void javaWriter_rustReader_listView_i64(@TempDir Path tmp) throws IOException { sut.writeChunk(Map.of("items", data)); } - // Then — Rust normalises ListView to List on read; verify flattened elements + // Then — Rust normalizes ListView to List on read; verify flattened elements long[] flatElements = readListLongColumn(file, "items"); assertThat(flatElements).containsExactly(elements); } diff --git a/integration/src/test/java/io/github/dfa1/vortex/integration/RustJavaReaderComparisonIntegrationTest.java b/integration/src/test/java/io/github/dfa1/vortex/integration/RustJavaReaderComparisonIntegrationTest.java index c8194170f..77249bc8c 100644 --- a/integration/src/test/java/io/github/dfa1/vortex/integration/RustJavaReaderComparisonIntegrationTest.java +++ b/integration/src/test/java/io/github/dfa1/vortex/integration/RustJavaReaderComparisonIntegrationTest.java @@ -217,7 +217,7 @@ private static Stats javaStats(Path file) throws Exception { var iter = reader.scan(io.github.dfa1.vortex.reader.ScanOptions.all())) { // Skip extension columns: Rust's stats path reports them under their logical // type (timestamp etc.), so summing their storage longs would diverge from - // Rust's per-column report. Match Rust's behaviour by ignoring them. + // Rust's per-column report. Match Rust's behavior by ignoring them. 
java.util.Set extensionCols = new java.util.HashSet<>(); if (reader.dtype() instanceof DType.Struct schema) { for (int i = 0; i < schema.fieldNames().size(); i++) { diff --git a/integration/src/test/java/io/github/dfa1/vortex/integration/RustWritesJavaReadsIntegrationTest.java b/integration/src/test/java/io/github/dfa1/vortex/integration/RustWritesJavaReadsIntegrationTest.java index 40fc34e4a..f2db6d779 100644 --- a/integration/src/test/java/io/github/dfa1/vortex/integration/RustWritesJavaReadsIntegrationTest.java +++ b/integration/src/test/java/io/github/dfa1/vortex/integration/RustWritesJavaReadsIntegrationTest.java @@ -189,7 +189,7 @@ private static long[] readJniLongColumn(Path file, String column) throws IOExcep } /// Scans `file` through the JNI Arrow reader and hands every loaded - /// [VectorSchemaRoot] batch to `batch`. Centralises the + /// [VectorSchemaRoot] batch to `batch`. Centralizes the /// open → scan → partition → loadNextBatch boilerplate shared by the /// JNI-reader assertions. private static void forEachArrowBatch(Path file, ScanOptions opts, Consumer batch) diff --git a/integration/src/test/java/io/github/dfa1/vortex/integration/TaxiParquetOracleVsJavaIntegrationTest.java b/integration/src/test/java/io/github/dfa1/vortex/integration/TaxiParquetOracleVsJavaIntegrationTest.java index 7012ede5b..7faff59e0 100644 --- a/integration/src/test/java/io/github/dfa1/vortex/integration/TaxiParquetOracleVsJavaIntegrationTest.java +++ b/integration/src/test/java/io/github/dfa1/vortex/integration/TaxiParquetOracleVsJavaIntegrationTest.java @@ -104,7 +104,7 @@ private static void writeOracleCsv(Path parquet, Path csv) throws IOException { /// /// Null values map to the same defaults as `ParquetImporter.fillRow`: /// 0 for numeric types, `false` for booleans, and `""` for strings. - /// `CsvExporter` then serialises these defaults as their string equivalents. + /// `CsvExporter` then serializes these defaults as their string equivalents. 
/// /// @param col the column schema /// @param reader the row reader positioned at the current row diff --git a/performance/src/main/java/io/github/dfa1/vortex/performance/ComputeKernelBenchmark.java b/performance/src/main/java/io/github/dfa1/vortex/performance/ComputeKernelBenchmark.java index a0e53664f..710c420da 100644 --- a/performance/src/main/java/io/github/dfa1/vortex/performance/ComputeKernelBenchmark.java +++ b/performance/src/main/java/io/github/dfa1/vortex/performance/ComputeKernelBenchmark.java @@ -42,16 +42,16 @@ import org.openjdk.jmh.annotations.TearDown; import org.openjdk.jmh.annotations.Warmup; -/// Baseline for the encoded-domain compute-kernel specialisation of ADR 0013. +/// Baseline for the encoded-domain compute-kernel specialization of ADR 0013. /// /// The compute kernels ([Compute#filter(Array, Predicate, Arena)] and /// [Compute#sum(Array, Mask)]) today decode every element through the typed accessor: the -/// generic streaming filter path and the type-specialised, boxing-free reduce lane both read +/// generic streaming filter path and the type-specialized, boxing-free reduce lane both read /// `getLong(i)` / `getDouble(i)` per row, so an ALP or Frame-of-Reference column is fully /// reconstructed into the value domain before a single comparison or addition runs. The future /// work compares and reduces directly in the encoded integer domain (ALP residuals, FoR offsets) /// without decoding. This benchmark pins the CURRENT decode-via-accessor cost so that win is -/// provable: the same `@Benchmark` methods will show the speedup once the specialised kernels land. +/// provable: the same `@Benchmark` methods will show the speedup once the specialized kernels land. 
/// /// One hundred million rows are written as `TOTAL_ROWS / CHUNK_ROWS` chunks of `CHUNK_ROWS` each /// with `WriteOptions.cascading(3)`, so the writer picks real encodings and the four columns decode @@ -85,7 +85,7 @@ /// - `forLoopX` — the naive decode-per-element loop, the developer's baseline. /// - `filterX` — the current kernel, which still decodes through the accessor; the `forLoopX`→ /// `filterX` gap is the kernel's overhead (or benefit) today. -/// - the future encoded-domain specialisation — measured against `forLoopX`, which it must beat by +/// - the future encoded-domain specialization — measured against `forLoopX`, which it must beat by /// comparing and reducing in the integer domain instead of decoding every element. /// /// Run: java -jar performance/target/benchmarks.jar ComputeKernelBenchmark @@ -252,7 +252,7 @@ public long filterDict() { } /// Control: filters the plain (non-encoded) `plain` column with `plain > 0` across every chunk, - /// reading each long straight from the materialised segment. Shows the cost without an encoding + /// reading each long straight from the materialized segment. Shows the cost without an encoding /// to unwind. A per-chunk confined arena holds the mask and is freed each chunk. /// /// @return the number of selected rows over the whole dataset @@ -357,7 +357,7 @@ public long forLoopDict() { } /// Naive baseline for [#filterPlainControl()]: the hand-written `plain > 0` count loop over the - /// materialised accessor across every chunk, reading each long straight from the segment per + /// materialized accessor across every chunk, reading each long straight from the segment per /// element. 
/// /// @return the number of rows with `plain > 0` over the whole dataset diff --git a/performance/src/main/java/io/github/dfa1/vortex/performance/JavaVsJniFilterBenchmark.java b/performance/src/main/java/io/github/dfa1/vortex/performance/JavaVsJniFilterBenchmark.java index 49c102782..12e64cfba 100644 --- a/performance/src/main/java/io/github/dfa1/vortex/performance/JavaVsJniFilterBenchmark.java +++ b/performance/src/main/java/io/github/dfa1/vortex/performance/JavaVsJniFilterBenchmark.java @@ -75,7 +75,7 @@ public class JavaVsJniFilterBenchmark { /// Target fraction of rows that should satisfy `close > threshold`. /// The threshold is computed in [#setup()] from a sample of the - /// "close" column; the realised selectivity is reported in the setup log + /// "close" column; the realized selectivity is reported in the setup log /// and may differ slightly from this value on small samples. @Param({"0.001", "0.01", "0.1", "1.0"}) public double selectivity; diff --git a/performance/src/main/java/io/github/dfa1/vortex/performance/JavaVsJniReadBenchmark.java b/performance/src/main/java/io/github/dfa1/vortex/performance/JavaVsJniReadBenchmark.java index 70d71990c..4cd7694d8 100644 --- a/performance/src/main/java/io/github/dfa1/vortex/performance/JavaVsJniReadBenchmark.java +++ b/performance/src/main/java/io/github/dfa1/vortex/performance/JavaVsJniReadBenchmark.java @@ -65,7 +65,7 @@ /// file with whatever encodings the Rust implementation chooses. /// /// Both benchmarks project onto the "close" column (F64) and sum all values so -/// the JVM can't optimise away the decode work. +/// the JVM can't optimize away the decode work. 
/// /// Run: java -jar performance/target/benchmarks.jar JavaVsJniReadBenchmark /// @@ -346,7 +346,7 @@ public long javaReadSymbol() throws IOException { return sum[0]; } - // ── Top-N reads: amortise open + footer/layout decode over N rows ──────── + // ── Top-N reads: amortize open + footer/layout decode over N rows ──────── /// Java read: project on "volume", consume only the first 10 rows. @Benchmark diff --git a/pom.xml b/pom.xml index f30ffdd37..3c103a1df 100644 --- a/pom.xml +++ b/pom.xml @@ -79,7 +79,7 @@ Update this when bumping mockito.version (run: mvn dependency:tree -pl reader | grep byte-buddy-agent). --> 1.18.10 diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/PostscriptParser.java b/reader/src/main/java/io/github/dfa1/vortex/reader/PostscriptParser.java index 044f631df..aaba93e28 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/PostscriptParser.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/PostscriptParser.java @@ -30,7 +30,7 @@ final class PostscriptParser { /// Hard cap on per-layout metadata size. The FlatBuffer runtime returns an unbounded slice /// from `metadataAsSegment()`; a crafted file can claim a multi-gigabyte metadata - /// blob and force later allocators into pathological behaviour. 4 MiB is well above any + /// blob and force later allocators into pathological behavior. 4 MiB is well above any /// real encoding's metadata footprint (the largest is FSST's symbol table at ~32 KiB). 
static final int MAX_LAYOUT_METADATA_BYTES = 4 * 1024 * 1024; diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/ScanIterator.java b/reader/src/main/java/io/github/dfa1/vortex/reader/ScanIterator.java index 846243953..e7332543e 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/ScanIterator.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/ScanIterator.java @@ -387,7 +387,7 @@ private Map projectedDtypeMap() { /// Returns the row count of every chunk in scan order, without decoding values. /// - /// Walks the file's layout tree (initialising internal state on first call) and + /// Walks the file's layout tree (initializing internal state on first call) and /// returns one element per chunk that the iterator would yield, in the same /// order. Useful for tooling that needs to navigate by absolute row index /// (e.g. an interactive grid viewer) before deciding which chunks to actually @@ -859,7 +859,7 @@ private Array decodeDictLayout(Layout dictLayout, DType dtype, SegmentAllocator // Zip-bomb guard (lazy path): the codes Array has already been decoded above; // its length() reflects the claimed rowCount but its backing buffer may be // mmap-bounded. Validate by inspecting the underlying segment without forcing - // materialisation of non-segment-backed codes (lazy variants). + // materialization of non-segment-backed codes (lazy variants). 
validateDictCodesCapacity(codes, codesPType, n); return buildLazyDictPrimitive(pDtype, n, values, codes); } diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/ZonedStatsSchema.java b/reader/src/main/java/io/github/dfa1/vortex/reader/ZonedStatsSchema.java index 2f9ad720c..f0847d722 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/ZonedStatsSchema.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/ZonedStatsSchema.java @@ -93,7 +93,7 @@ public static long zoneLength(MemorySegment metadata) { /// /// Unknown bits (set at an index past [Stat#values()]'s length, which /// would mean a newer Vortex writer) are silently skipped — matching the Rust - /// reader's forward-compatibility behaviour. + /// reader's forward-compatibility behavior. /// /// @param metadata raw `vortex.stats` layout metadata, possibly `null` /// @return present stats in ascending ordinal order; empty if metadata carries no bitset diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/AbstractMaterializedArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/AbstractMaterializedArray.java index e3d47e923..83f34659c 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/AbstractMaterializedArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/AbstractMaterializedArray.java @@ -12,7 +12,7 @@ /// /// Subclasses keep their own typed element access and the branch-split hot loops /// (`getX` / `fold` / `forEach`): those must stay monomorphic in the leaf class so the -/// JIT can vectorise them, and so are deliberately not hoisted here. This base only +/// JIT can vectorize them, and so are deliberately not hoisted here. This base only /// holds the cold boilerplate. 
/// /// Not `implements Array`: that interface is sealed to the typed element families diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/Array.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/Array.java index 5ffc4e293..94f0baba8 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/Array.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/Array.java @@ -6,7 +6,7 @@ import java.lang.foreign.SegmentAllocator; import java.util.Optional; -/// Decoded columnar data. Concrete subtypes specialise element access for the JIT; +/// Decoded columnar data. Concrete subtypes specialize element access for the JIT; /// each covers a specific dtype family. /// /// Buffers are `MemorySegment` slices backed by the memory-mapped file; lifetime @@ -45,21 +45,21 @@ public sealed interface Array /// @return an array of length `rows` Array limited(long rows); - /// Materialises this array into its primary backing [MemorySegment], + /// Materializes this array into its primary backing [MemorySegment], /// allocating from `arena` for lazy variants. /// /// Segment-backed arrays (the `Materialized*` records, `VarBinArray`, /// `GenericArray`, `LazyDecimalArray`) return their existing buffer with no /// copy. Lazy primitive arrays decode element-by-element, the `Lazy*` /// frame-of-reference / zigzag / ALP variants apply their inlined formula in a - /// vectorisable loop, and composite arrays (chunked, dict) concatenate or gather - /// their children. This is the single materialisation contract behind + /// vectorizable loop, and composite arrays (chunked, dict) concatenate or gather + /// their children. This is the single materialization contract behind /// [io.github.dfa1.vortex.reader.decode.DecodeContext#materialize(Array)]. /// /// Array families with no row-addressable primary segment (struct, list, variant, /// the byte-parts decimal layout) throw [io.github.dfa1.vortex.core.error.VortexException]. 
/// - /// @param arena allocator used to materialise lazy variants + /// @param arena allocator used to materialize lazy variants /// @return the primary [MemorySegment] MemorySegment materialize(SegmentAllocator arena); @@ -83,7 +83,7 @@ static Array limited(Array arr, long rows) { /// otherwise empty — a non-allocating probe. /// /// Unlike [#materialize(java.lang.foreign.SegmentAllocator)], this never allocates or - /// decodes: lazy and composite arrays return empty rather than being materialised. The + /// decodes: lazy and composite arrays return empty rather than being materialized. The /// default is empty; segment-backed types (the `Materialized*` records, `VarBinArray`, /// `GenericArray`, `LazyDecimalArray`) override to return their existing buffer, and /// [MaskedArray] delegates to its inner data. The scan layer's dictionary zip-bomb guard diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/BoolArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/BoolArray.java index 8ea89263d..6c30dac7a 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/BoolArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/BoolArray.java @@ -8,7 +8,7 @@ /// [Array] for bit-packed boolean columns (LSB-first, one byte per 8 elements). /// /// The default impl is [MaterializedBoolArray], a buffer-backed record -/// returned when an encoding decoder either materialises values eagerly or +/// returned when an encoding decoder either materializes values eagerly or /// has no lazy variant of its own. 
public non-sealed interface BoolArray extends Array { diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/ChunkedByteArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/ChunkedByteArray.java index 0ab3c2780..07bef9c9e 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/ChunkedByteArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/ChunkedByteArray.java @@ -90,8 +90,8 @@ public Array limited(long rows) { return ChunkedByteArray.of(dtype, rows, ChunkedArrays.limitedChildren(children, offsets, rows)); } - /// Materialises by concatenating each child's segment into one contiguous - /// byte buffer, each child materialised through its own + /// Materializes by concatenating each child's segment into one contiguous + /// byte buffer, each child materialized through its own /// [ByteArray#materialize(SegmentAllocator)]. /// /// @param arena allocator for the output segment diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/ChunkedDoubleArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/ChunkedDoubleArray.java index 448b4f2bb..b5bb2b7db 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/ChunkedDoubleArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/ChunkedDoubleArray.java @@ -87,8 +87,8 @@ public Array limited(long rows) { return ChunkedDoubleArray.of(dtype, rows, ChunkedArrays.limitedChildren(children, offsets, rows)); } - /// Materialises by concatenating each child's segment into one contiguous - /// little-endian `f64` buffer, each child materialised through its own + /// Materializes by concatenating each child's segment into one contiguous + /// little-endian `f64` buffer, each child materialized through its own /// [DoubleArray#materialize(SegmentAllocator)]. 
/// /// @param arena allocator for the output segment diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/ChunkedFloatArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/ChunkedFloatArray.java index 67e39cdca..26c61fe2a 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/ChunkedFloatArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/ChunkedFloatArray.java @@ -77,8 +77,8 @@ public Array limited(long rows) { return ChunkedFloatArray.of(dtype, rows, ChunkedArrays.limitedChildren(children, offsets, rows)); } - /// Materialises by concatenating each child's segment into one contiguous - /// little-endian `f32` buffer, each child materialised through its own + /// Materializes by concatenating each child's segment into one contiguous + /// little-endian `f32` buffer, each child materialized through its own /// [FloatArray#materialize(SegmentAllocator)]. /// /// @param arena allocator for the output segment diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/ChunkedIntArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/ChunkedIntArray.java index 0a96c795a..b83feb89f 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/ChunkedIntArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/ChunkedIntArray.java @@ -85,8 +85,8 @@ public Array limited(long rows) { return ChunkedIntArray.of(dtype, rows, ChunkedArrays.limitedChildren(children, offsets, rows)); } - /// Materialises by concatenating each child's segment into one contiguous - /// little-endian `i32` buffer, each child materialised through its own + /// Materializes by concatenating each child's segment into one contiguous + /// little-endian `i32` buffer, each child materialized through its own /// [IntArray#materialize(SegmentAllocator)]. 
/// /// @param arena allocator for the output segment diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/ChunkedLongArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/ChunkedLongArray.java index d6af7e00b..ac14bd923 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/ChunkedLongArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/ChunkedLongArray.java @@ -31,7 +31,7 @@ public record ChunkedLongArray(DType dtype, long length, LongArray[] children, l /// Builds a [ChunkedLongArray] from a list of chunk arrays. Nested /// chunked arrays are flattened; [MaskedArray] chunks are unwrapped - /// to their inner data (validity dropped — matches prior concat behaviour). + /// to their inner data (validity dropped — matches prior concat behavior). /// /// @param dtype logical element type /// @param totalRows expected total row count across all chunks @@ -110,8 +110,8 @@ public Array limited(long rows) { return ChunkedLongArray.of(dtype, rows, ChunkedArrays.limitedChildren(children, offsets, rows)); } - /// Materialises by concatenating each child's segment into one contiguous - /// little-endian `i64` buffer. Each child is materialised through its own + /// Materializes by concatenating each child's segment into one contiguous + /// little-endian `i64` buffer. Each child is materialized through its own /// [LongArray#materialize(SegmentAllocator)], so lazy children decode straight /// into the shared destination via a bulk copy. 
/// diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/ChunkedShortArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/ChunkedShortArray.java index 0fa370900..b4c5d6d94 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/ChunkedShortArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/ChunkedShortArray.java @@ -91,8 +91,8 @@ public Array limited(long rows) { return ChunkedShortArray.of(dtype, rows, ChunkedArrays.limitedChildren(children, offsets, rows)); } - /// Materialises by concatenating each child's segment into one contiguous - /// little-endian `i16` buffer, each child materialised through its own + /// Materializes by concatenating each child's segment into one contiguous + /// little-endian `i16` buffer, each child materialized through its own /// [ShortArray#materialize(SegmentAllocator)]. /// /// @param arena allocator for the output segment diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/DateTimePartsArrays.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/DateTimePartsArrays.java index 8d2e009f3..b1ffb9ab3 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/DateTimePartsArrays.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/DateTimePartsArrays.java @@ -6,7 +6,7 @@ /// /// `days`, `seconds` and `subseconds` children can each be one of /// the four signed-integer typed array interfaces; the writer picks the narrowest -/// ptype that fits. [#readLong(Array, long)] centralises the per-row read so +/// ptype that fits. [#readLong(Array, long)] centralizes the per-row read so /// the record itself stays compact. 
final class DateTimePartsArrays { diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/DecimalBytePartsArrays.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/DecimalBytePartsArrays.java index 5c19f5ebe..bd4dc8b65 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/DecimalBytePartsArrays.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/DecimalBytePartsArrays.java @@ -10,7 +10,7 @@ /// decimal mantissa as a single signed-integer child column whose ptype the /// encoder picks (one of `i8 / i16 / i32 / i64`). The child may be wrapped /// in [MaskedArray] for nullable columns. [#readMantissa(Array, long)] -/// centralises the per-row dispatch so the record itself stays compact. +/// centralizes the per-row dispatch so the record itself stays compact. final class DecimalBytePartsArrays { private DecimalBytePartsArrays() { diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/DictArrays.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/DictArrays.java index 67091c3f7..a4f347477 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/DictArrays.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/DictArrays.java @@ -4,7 +4,7 @@ /// Package-private helpers shared by the `DictXxxArray` records. /// -/// Centralises the codes-type validation (so the four record `of` +/// Centralizes the codes-type validation (so the four record `of` /// factories agree on what counts as a valid codes array) and the scalar /// `readCode` dispatch (so a single update fixes all four records). 
final class DictArrays { diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/DictDoubleArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/DictDoubleArray.java index a428427bf..24db578de 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/DictDoubleArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/DictDoubleArray.java @@ -48,7 +48,7 @@ public double getDouble(long i) { return values.getDouble(DictArrays.readCode(codes, i)); } - /// Materialises by gathering one dictionary value per code into a fresh + /// Materializes by gathering one dictionary value per code into a fresh /// little-endian `f64` segment. The codes switch is hoisted outside the loop so /// each branch is a uniform gather over a single code width. /// diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/DictFloatArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/DictFloatArray.java index 18d91fd92..a675c1003 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/DictFloatArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/DictFloatArray.java @@ -47,7 +47,7 @@ public float getFloat(long i) { return values.getFloat(DictArrays.readCode(codes, i)); } - /// Materialises by gathering one dictionary value per code into a fresh + /// Materializes by gathering one dictionary value per code into a fresh /// little-endian `f32` segment. The codes switch is hoisted outside the loop so /// each branch is a uniform gather over a single code width. 
/// diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/DictIntArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/DictIntArray.java index 55cc11117..472f7a111 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/DictIntArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/DictIntArray.java @@ -48,7 +48,7 @@ public int getInt(long i) { return values.getInt(DictArrays.readCode(codes, i)); } - /// Materialises by gathering one dictionary value per code into a fresh + /// Materializes by gathering one dictionary value per code into a fresh /// little-endian `i32` segment. The codes switch is hoisted outside the loop so /// each branch is a uniform gather over a single code width. /// diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/DictLongArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/DictLongArray.java index 4f24fda30..40deab7dc 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/DictLongArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/DictLongArray.java @@ -51,7 +51,7 @@ public long getLong(long i) { return values.getLong(DictArrays.readCode(codes, i)); } - /// Materialises by gathering one dictionary value per code into a fresh + /// Materializes by gathering one dictionary value per code into a fresh /// little-endian `i64` segment. The codes switch is hoisted outside the loop so /// each branch is a uniform gather over a single code width. /// diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/DoubleArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/DoubleArray.java index 5d4b95a60..2291ae90c 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/DoubleArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/DoubleArray.java @@ -11,7 +11,7 @@ /// [Array] for F64 primitive columns. 
/// /// The default impl is [MaterializedDoubleArray], a buffer-backed record -/// returned when an encoding decoder either materialises values eagerly or +/// returned when an encoding decoder either materializes values eagerly or /// has no lazy variant of its own. public non-sealed interface DoubleArray extends Array { @@ -58,7 +58,7 @@ default Array limited(long rows) { /// Scalar fallback: decodes every element through [#getDouble(long)] into a fresh /// little-endian segment. Buffer-backed ([MaterializedDoubleArray]) and lazy /// formula-based variants ([LazyAlpDoubleArray], …) override with a zero-copy or - /// vectorised path. + /// vectorized path. /// /// @param arena allocator for the output segment /// @return a little-endian `f64` segment of `length()` elements diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/FixedSizeListArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/FixedSizeListArray.java index d293139be..68bba19b6 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/FixedSizeListArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/FixedSizeListArray.java @@ -69,7 +69,7 @@ public Array limited(long rows) { } /// Always throws: a fixed-size list wraps a flat elements child, not a single - /// primary segment of its own. Materialise [#elements()] instead. + /// primary segment of its own. Materialize [#elements()] instead. /// /// @param arena unused /// @return never returns diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/Float16Array.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/Float16Array.java index f3a79901a..7a9f22d93 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/Float16Array.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/Float16Array.java @@ -6,7 +6,7 @@ /// Wire format: little-endian shorts (2 bytes/element). Element access widens /// to `float` via [Float#float16ToFloat]. 
The default impl is /// [MaterializedFloat16Array], a buffer-backed record returned when an -/// encoding decoder either materialises values eagerly or has no lazy +/// encoding decoder either materializes values eagerly or has no lazy /// variant of its own. public non-sealed interface Float16Array extends Array { diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/FloatArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/FloatArray.java index a53f892db..1b19e96ee 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/FloatArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/FloatArray.java @@ -10,7 +10,7 @@ /// [Array] for F32 primitive columns. /// /// The default impl is [MaterializedFloatArray], a buffer-backed record -/// returned when an encoding decoder either materialises values eagerly or +/// returned when an encoding decoder either materializes values eagerly or /// has no lazy variant of its own. public non-sealed interface FloatArray extends Array { @@ -47,7 +47,7 @@ default Array limited(long rows) { /// Scalar fallback: decodes every element through [#getFloat(long)] into a fresh /// little-endian segment. Buffer-backed ([MaterializedFloatArray]) and lazy /// formula-based variants ([LazyAlpFloatArray], …) override with a zero-copy or - /// vectorised path. + /// vectorized path. /// /// @param arena allocator for the output segment /// @return a little-endian `f32` segment of `length()` elements diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/GenericArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/GenericArray.java index 488b48dc8..2c22ff8fe 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/GenericArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/GenericArray.java @@ -58,7 +58,7 @@ public long length() { /// Returns a view of this array clamped to the first `rows` logical rows. 
/// Buffers and children are reused as-is; callers are expected to respect - /// [#length()] when reading. Used by the scan iterator to honour + /// [#length()] when reading. Used by the scan iterator to honor /// `ScanOptions.limit` for dtypes that don't have a typed array. /// /// @param rows desired logical length; must be `<= length()` @@ -85,7 +85,7 @@ public MemorySegment materialize(SegmentAllocator arena) { return buffers[0]; } - /// Returns the primary (index 0) raw buffer — already materialised, no allocation. + /// Returns the primary (index 0) raw buffer — already materialized, no allocation. /// /// @return the first backing [MemorySegment] @Override @@ -155,7 +155,7 @@ private static BigInteger readSignedLe(MemorySegment buf, long offset, int width private static BigInteger readSigned128Le(MemorySegment buf, long offset) { // Two's-complement i128 on disk in little-endian; BigInteger ingests big-endian. - // No SIMD intrinsic for 16-byte signed integer, so we materialise into a heap + // No SIMD intrinsic for 16-byte signed integer, so we materialize into a heap // buffer here. Only fires for decimal(>18, _) — narrow-precision fast paths above // stay allocation-free. byte[] be = new byte[16]; diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/IntArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/IntArray.java index f5a4bd797..9561bf58b 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/IntArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/IntArray.java @@ -11,7 +11,7 @@ /// [Array] for I32/U32 primitive columns. /// /// The default impl is [MaterializedIntArray], a buffer-backed record -/// returned when an encoding decoder either materialises values eagerly or +/// returned when an encoding decoder either materializes values eagerly or /// has no lazy variant of its own. 
public non-sealed interface IntArray extends Array { @@ -58,7 +58,7 @@ default Array limited(long rows) { /// Scalar fallback: decodes every element through [#getInt(long)] into a fresh /// little-endian segment. Buffer-backed ([MaterializedIntArray]) and lazy /// formula-based variants ([LazyForIntArray], [LazyZigZagIntArray], …) override - /// with a zero-copy or vectorised path. + /// with a zero-copy or vectorized path. /// /// @param arena allocator for the output segment /// @return a little-endian `i32` segment of `length()` elements diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyAlpDoubleArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyAlpDoubleArray.java index 5806210b2..910eaa67d 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyAlpDoubleArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyAlpDoubleArray.java @@ -35,7 +35,7 @@ public double getDouble(long i) { /// The decode formula (including the two-step factor application that preserves IEEE /// rounding) lives only in [#getDouble(long)]; this override exists solely to give the /// JIT a monomorphic, inlinable call site (the shared [DoubleArray] default is - /// megamorphic across every implementation and will not inline or auto-vectorise). + /// megamorphic across every implementation and will not inline or auto-vectorize). 
/// /// @param arena allocator for the output segment /// @return a little-endian `f64` segment of decoded values diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyAlpFloatArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyAlpFloatArray.java index b4f9f37d6..3e01eef5d 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyAlpFloatArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyAlpFloatArray.java @@ -30,7 +30,7 @@ public float getFloat(long i) { /// The decode formula (including the two-step factor application that preserves IEEE /// rounding) lives only in [#getFloat(long)]; this override exists solely to give the /// JIT a monomorphic, inlinable call site (the shared [FloatArray] default is - /// megamorphic across every implementation and will not inline or auto-vectorise). + /// megamorphic across every implementation and will not inline or auto-vectorize). /// /// @param arena allocator for the output segment /// @return a little-endian `f32` segment of decoded values diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyConstantDecimalArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyConstantDecimalArray.java index 40dd68ba3..f1ffadf8a 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyConstantDecimalArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyConstantDecimalArray.java @@ -37,7 +37,7 @@ public Array limited(long rows) { return new LazyConstantDecimalArray(dtype, rows, value, byteWidth); } - /// Materialises by writing the single constant value, in little-endian + /// Materializes by writing the single constant value, in little-endian /// two's-complement, `length` times into a fresh `byteWidth`-per-row segment. 
/// /// @param arena allocator for the output segment diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyDateTimePartsLongArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyDateTimePartsLongArray.java index ef44a71ba..eae011ca1 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyDateTimePartsLongArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyDateTimePartsLongArray.java @@ -19,7 +19,7 @@ /// `unitsPerDay` = `86_400 * unitsPerSecond`. The reassembled long carries the /// same epoch count the downstream extension decoder /// (`TimestampExtensionDecoder`, `DateExtensionDecoder`, etc.) expects; -/// no buffer materialisation occurs at construction time. +/// no buffer materialization occurs at construction time. /// /// The record's [#dtype()] is the parent Extension dtype (e.g. /// `vortex.timestamp`) so it slots transparently into the extension-decode diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyDecimalArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyDecimalArray.java index 678259efc..832a050a2 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyDecimalArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyDecimalArray.java @@ -83,7 +83,7 @@ public MemorySegment materialize(SegmentAllocator arena) { return buf; } - /// Returns the backing buffer directly — already materialised, no allocation. + /// Returns the backing buffer directly — already materialized, no allocation. 
/// /// @return the backing little-endian two's-complement segment @Override diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyDecimalBytePartsArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyDecimalBytePartsArray.java index 92e38310d..f5e7f278e 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyDecimalBytePartsArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyDecimalBytePartsArray.java @@ -16,7 +16,7 @@ /// scale. [#getDecimal(long)] reads one cell from the child via /// [DecimalBytePartsArrays#readMantissa(Array, long)] and combines it with /// the dtype scale to produce a [BigDecimal] on demand — no buffer -/// materialisation occurs at construction time. +/// materialization occurs at construction time. /// /// @param dtype the parent [DType.Decimal] dtype (precision + scale + nullable) /// @param length total logical row count diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyForIntArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyForIntArray.java index 3fe6b4fa2..d1bff4103 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyForIntArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyForIntArray.java @@ -27,7 +27,7 @@ public int getInt(long i) { /// Bulk-decodes through [#getInt(long)] into a fresh little-endian `i32` segment. /// The decode formula lives only in [#getInt(long)]; this override exists solely to /// give the JIT a monomorphic, inlinable call site (the shared [IntArray] default is - /// megamorphic across every implementation and will not inline or auto-vectorise). + /// megamorphic across every implementation and will not inline or auto-vectorize). 
/// /// @param arena allocator for the output segment /// @return a little-endian `i32` segment of decoded values diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyForLongArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyForLongArray.java index 05658f0e8..f8b53eabb 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyForLongArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyForLongArray.java @@ -27,7 +27,7 @@ public long getLong(long i) { /// Bulk-decodes through [#getLong(long)] into a fresh little-endian `i64` segment. /// The decode formula lives only in [#getLong(long)]; this override exists solely to /// give the JIT a monomorphic, inlinable call site (the shared [LongArray] default is - /// megamorphic across every implementation and will not inline or auto-vectorise). + /// megamorphic across every implementation and will not inline or auto-vectorize). /// /// @param arena allocator for the output segment /// @return a little-endian `i64` segment of decoded values diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyZigZagIntArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyZigZagIntArray.java index 2ee45fbe1..e1d5cb07e 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyZigZagIntArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyZigZagIntArray.java @@ -27,7 +27,7 @@ public int getInt(long i) { /// Bulk-decodes through [#getInt(long)] into a fresh little-endian `i32` segment. /// The decode formula lives only in [#getInt(long)]; this override exists solely to /// give the JIT a monomorphic, inlinable call site (the shared [IntArray] default is - /// megamorphic across every implementation and will not inline or auto-vectorise). + /// megamorphic across every implementation and will not inline or auto-vectorize). 
/// /// @param arena allocator for the output segment /// @return a little-endian `i32` segment of decoded values diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyZigZagLongArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyZigZagLongArray.java index ec5a7ab56..d6bae36ad 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyZigZagLongArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/LazyZigZagLongArray.java @@ -27,7 +27,7 @@ public long getLong(long i) { /// Bulk-decodes through [#getLong(long)] into a fresh little-endian `i64` segment. /// The decode formula lives only in [#getLong(long)]; this override exists solely to /// give the JIT a monomorphic, inlinable call site (the shared [LongArray] default is - /// megamorphic across every implementation and will not inline or auto-vectorise). + /// megamorphic across every implementation and will not inline or auto-vectorize). /// /// @param arena allocator for the output segment /// @return a little-endian `i64` segment of decoded values diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/ListArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/ListArray.java index 99383a486..afa05ffdb 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/ListArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/ListArray.java @@ -74,7 +74,7 @@ public Array limited(long rows) { } /// Always throws: a list array is offsets plus a flat elements child, not a - /// single primary segment. Materialise [#elements()] and [#offsets()] separately. + /// single primary segment. Materialize [#elements()] and [#offsets()] separately. 
/// /// @param arena unused /// @return never returns diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/ListViewArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/ListViewArray.java index 97e28230e..8f9b5a944 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/ListViewArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/ListViewArray.java @@ -87,7 +87,7 @@ public Array limited(long rows) { } /// Always throws: a list-view array is offsets, sizes, and a flat elements child, - /// not a single primary segment. Materialise [#elements()], [#offsets()], and + /// not a single primary segment. Materialize [#elements()], [#offsets()], and /// [#sizes()] separately. /// /// @param arena unused diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/LongArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/LongArray.java index b612aff88..dc197ea7d 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/LongArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/LongArray.java @@ -11,7 +11,7 @@ /// [Array] for I64/U64 primitive columns. /// /// The default impl is [MaterializedLongArray], a buffer-backed record -/// returned when an encoding decoder either materialises values eagerly or +/// returned when an encoding decoder either materializes values eagerly or /// has no lazy variant of its own. public non-sealed interface LongArray extends Array { @@ -58,7 +58,7 @@ default Array limited(long rows) { /// Scalar fallback: decodes every element through [#getLong(long)] into a fresh /// little-endian segment. Buffer-backed ([MaterializedLongArray]) and lazy /// formula-based variants ([LazyForLongArray], [LazyZigZagLongArray], …) - /// override with a zero-copy or vectorised path. + /// override with a zero-copy or vectorized path. 
/// /// @param arena allocator for the output segment /// @return a little-endian `i64` segment of `length()` elements diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaskedArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaskedArray.java index 4f6036bf3..5e8d99734 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaskedArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaskedArray.java @@ -66,11 +66,11 @@ public Array limited(long rows) { return new MaskedArray(truncChild, truncValidity); } - /// Materialises the inner (data) payload, ignoring the validity mask — the + /// Materializes the inner (data) payload, ignoring the validity mask — the /// segment returned is the data buffer only. Unwraps to the inner array's own - /// materialisation; callers that need validity must read [#validity()] separately. + /// materialization; callers that need validity must read [#validity()] separately. /// - /// @param arena allocator used to materialise lazy inner variants + /// @param arena allocator used to materialize lazy inner variants /// @return the inner payload's primary [MemorySegment] @Override public MemorySegment materialize(SegmentAllocator arena) { diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedBoolArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedBoolArray.java index 647c2fe7c..0f1cb7918 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedBoolArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedBoolArray.java @@ -7,7 +7,7 @@ import java.lang.foreign.ValueLayout; /// Buffer-backed [BoolArray] — the fallback used when an encoding decoder -/// either materialises the output eagerly or has no lazy variant of its own. +/// either materializes the output eagerly or has no lazy variant of its own. 
public final class MaterializedBoolArray extends AbstractMaterializedArray implements BoolArray { /// Constructs a `MaterializedBoolArray` backed by the given bit-packed buffer. diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedByteArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedByteArray.java index ef44be37a..6b553d2c6 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedByteArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedByteArray.java @@ -9,7 +9,7 @@ import java.util.function.LongBinaryOperator; /// Buffer-backed [ByteArray] — the fallback used when an encoding decoder -/// either materialises the output eagerly or has no lazy variant of its own. +/// either materializes the output eagerly or has no lazy variant of its own. public final class MaterializedByteArray extends AbstractMaterializedArray implements ByteArray { /// Constructs a `MaterializedByteArray` backed by the given buffer. diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedDoubleArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedDoubleArray.java index 90d5c3ce4..e404a5d1f 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedDoubleArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedDoubleArray.java @@ -8,7 +8,7 @@ import java.util.function.DoubleConsumer; /// Buffer-backed [DoubleArray] — the fallback used when an encoding decoder -/// either materialises the output eagerly or has no lazy variant of its own. +/// either materializes the output eagerly or has no lazy variant of its own. public final class MaterializedDoubleArray extends AbstractMaterializedArray implements DoubleArray { /// Constructs a `MaterializedDoubleArray` backed by the given buffer. 
diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedFloat16Array.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedFloat16Array.java index 57cc3a904..d2f11bb98 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedFloat16Array.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedFloat16Array.java @@ -7,7 +7,7 @@ import java.lang.foreign.MemorySegment; /// Buffer-backed [Float16Array] — the fallback used when an encoding decoder -/// either materialises the output eagerly or has no lazy variant of its own. +/// either materializes the output eagerly or has no lazy variant of its own. public final class MaterializedFloat16Array extends AbstractMaterializedArray implements Float16Array { /// Creates a new `MaterializedFloat16Array` backed by the given memory segment. diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedFloatArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedFloatArray.java index 22485a32c..436fc0d40 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedFloatArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedFloatArray.java @@ -8,7 +8,7 @@ import java.util.function.DoubleBinaryOperator; /// Buffer-backed [FloatArray] — the fallback used when an encoding decoder -/// either materialises the output eagerly or has no lazy variant of its own. +/// either materializes the output eagerly or has no lazy variant of its own. public final class MaterializedFloatArray extends AbstractMaterializedArray implements FloatArray { /// Creates a new `MaterializedFloatArray` backed by the given memory segment. 
diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedIntArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedIntArray.java index d7827b9a9..07219bb8d 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedIntArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedIntArray.java @@ -9,7 +9,7 @@ import java.util.function.IntConsumer; /// Buffer-backed [IntArray] — the fallback used when an encoding decoder -/// either materialises the output eagerly or has no lazy variant of its own. +/// either materializes the output eagerly or has no lazy variant of its own. public final class MaterializedIntArray extends AbstractMaterializedArray implements IntArray { /// Creates a new `MaterializedIntArray` backed by the given memory segment. diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedLongArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedLongArray.java index 0eae8c0d5..c79820ca9 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedLongArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedLongArray.java @@ -9,7 +9,7 @@ import java.util.function.LongConsumer; /// Buffer-backed [LongArray] — the fallback used when an encoding decoder -/// either materialises the output eagerly or has no lazy variant of its own. +/// either materializes the output eagerly or has no lazy variant of its own. public final class MaterializedLongArray extends AbstractMaterializedArray implements LongArray { /// Creates a new `MaterializedLongArray` backed by the given memory segment. 
diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedShortArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedShortArray.java index 7e04037a7..29859cc6b 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedShortArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/MaterializedShortArray.java @@ -9,7 +9,7 @@ import java.util.function.LongBinaryOperator; /// Buffer-backed [ShortArray] — the fallback used when an encoding decoder -/// either materialises the output eagerly or has no lazy variant of its own. +/// either materializes the output eagerly or has no lazy variant of its own. public final class MaterializedShortArray extends AbstractMaterializedArray implements ShortArray { /// Creates a new `MaterializedShortArray` backed by the given memory segment. diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/RleArrays.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/RleArrays.java index 27c23e381..d8cb7d75b 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/RleArrays.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/RleArrays.java @@ -57,7 +57,7 @@ interface ChunkVisitor { /// Walks the logical range `[offset, offset + length)` chunk by chunk, /// calling `visitor` once per chunk with the local `[rowInChunk, end)` span. /// - /// Centralises the FastLanes chunk-boundary arithmetic shared by every + /// Centralizes the FastLanes chunk-boundary arithmetic shared by every /// `LazyRleXxxArray` record's `forEach` / `fold`. 
The visitor — not this /// method — owns the per-row loop, so the typed `values[...]` read stays a /// direct, monomorphic array access; this call fires once per chunk diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/RunEndArrays.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/RunEndArrays.java index 9676ee355..32a8a077e 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/RunEndArrays.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/RunEndArrays.java @@ -7,7 +7,7 @@ /// Package-private helpers shared by the `LazyRunEndXxxArray` records. /// -/// Centralises: +/// Centralizes: /// - the run-ends array-type switch in [#readRunEnd(Array, long)] so all four records /// agree on supported run-ends Array types (U8/U16/U32/U64 backed by /// [ByteArray]/[ShortArray]/[IntArray]/[LongArray]); diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/SparseArrays.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/SparseArrays.java index 9a533e3c4..a9d591a71 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/SparseArrays.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/SparseArrays.java @@ -8,7 +8,7 @@ /// Package-private helpers shared by the `LazySparseXxxArray` records. 
/// -/// Centralises: +/// Centralizes: /// - the patch-indices array-type switch in [#readPatchIdx(Array, long)] so all six /// records agree on supported patch-index Array types (U8/U16/U32/U64 backed by /// [ByteArray]/[ShortArray]/[IntArray]/[LongArray]); diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/StructArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/StructArray.java index 9762207e7..ba5ada12e 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/StructArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/StructArray.java @@ -73,7 +73,7 @@ public Array limited(long rows) { } /// Always throws: a struct has one segment per field, not a single primary - /// segment. Materialise each [#field(int)] separately. + /// segment. Materialize each [#field(int)] separately. /// /// @param arena unused /// @return never returns diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/UnknownArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/UnknownArray.java index 21c79ddae..dad18a171 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/UnknownArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/UnknownArray.java @@ -15,7 +15,7 @@ /// Constructed by `Registry` when `allowUnknown()` is set and an encoding id is not /// in the registry. Data access beyond `buffer(i)` and `child(i)` is not supported. 
/// -/// @param encodingId the unrecognised encoding id string +/// @param encodingId the unrecognized encoding id string /// @param dtype logical type of the array /// @param length number of logical rows /// @param metadata raw encoding metadata bytes, or `null` diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/VarBinArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/VarBinArray.java index 6352f00a7..c659b123d 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/VarBinArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/VarBinArray.java @@ -89,7 +89,7 @@ public VarBinArray limited(long rows) { MemorySegment bytesSegment(); /// Returns the concatenated raw bytes segment directly — the primary data - /// buffer is already materialised, so no copy or allocation is needed. + /// buffer is already materialized, so no copy or allocation is needed. /// Note this is the data buffer only; the per-row offsets are exposed /// separately by [OffsetMode#offsetsSegment()]. /// @@ -100,7 +100,7 @@ default MemorySegment materialize(SegmentAllocator arena) { return bytesSegment(); } - /// Returns the concatenated raw bytes segment — already materialised, no allocation. + /// Returns the concatenated raw bytes segment — already materialized, no allocation. /// /// @return the bytes [MemorySegment] @Override @@ -137,7 +137,7 @@ default Optional segmentIfPresent() { /// @return a `VarBinArray` containing the first `rows` elements VarBinArray limited(long rows); - /// Materialises any `VarBinArray` into a flat [OffsetMode]. The fast path + /// Materializes any `VarBinArray` into a flat [OffsetMode]. The fast path /// returns `src` unchanged when it is already an [OffsetMode]. 
Other modes /// (ViewMode in particular) walk every row through the typed accessors, copy the bytes /// into a fresh contiguous segment allocated from `arena`, and build an I64 @@ -145,7 +145,7 @@ default Optional segmentIfPresent() { /// depends on the bytes-plus-offsets shape. /// /// @param src any VarBinArray - /// @param arena allocator for the materialised bytes and offsets segments + /// @param arena allocator for the materialized bytes and offsets segments /// @return an OffsetMode view over the same logical content static OffsetMode toOffsetMode(VarBinArray src, SegmentAllocator arena) { if (src instanceof OffsetMode om) { @@ -341,7 +341,7 @@ private long dictReadOff(long i) { /// /// [#bytesSegment()] is the [MemorySegment#NULL] sentinel — chunked /// arrays have no single contiguous bytes segment. Callers that need contiguous - /// bytes must materialise via the chunked children. + /// bytes must materialize via the chunked children. /// /// @param dtype logical element type (Utf8 or Binary) /// @param length total logical row count @@ -463,10 +463,10 @@ public VarBinArray limited(long rows) { /// bytes 4-7 hold a 4-byte prefix (ignored on read), bytes 8-11 the u32 buffer /// index into `dataBufs`, and bytes 12-15 the u32 offset within that /// buffer. Per-row accessors resolve the view on demand — no concat or - /// materialisation at construction time. + /// materialization at construction time. /// /// [#bytesSegment()] returns [MemorySegment#NULL] because there is - /// no single contiguous bytes segment; callers needing one must materialise via + /// no single contiguous bytes segment; callers needing one must materialize via /// the typed accessors. 
/// /// @param dtype logical element type (Utf8 or Binary) diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/array/VariantArray.java b/reader/src/main/java/io/github/dfa1/vortex/reader/array/VariantArray.java index 4a89f334b..003607d21 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/array/VariantArray.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/array/VariantArray.java @@ -64,7 +64,7 @@ public Array limited(long rows) { } /// Always throws: a variant array is core-storage plus optional shredded children, - /// not a single primary segment. Materialise [#coreStorage()] / [#shredded()] + /// not a single primary segment. Materialize [#coreStorage()] / [#shredded()] /// separately. /// /// @param arena unused diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/compute/FilterKernel.java b/reader/src/main/java/io/github/dfa1/vortex/reader/compute/FilterKernel.java index c266678e3..1e2829405 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/compute/FilterKernel.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/compute/FilterKernel.java @@ -27,7 +27,7 @@ interface FilterKernel { /// Evaluates `predicate` over `array`, keeping only positions `current` already selects. 
/// - /// @param array the input array, possibly a lazy variant; never materialised whole + /// @param array the input array, possibly a lazy variant; never materialized whole /// @param current the incoming selection mask, must have the same length as `array` /// @param predicate the predicate to evaluate per position /// @param arena the arena for the output bitmap; its [Arena#allocate(long)] zero-fills, which diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/compute/Mask.java b/reader/src/main/java/io/github/dfa1/vortex/reader/compute/Mask.java index 2f6d878f7..355962267 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/compute/Mask.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/compute/Mask.java @@ -10,9 +10,9 @@ /// sketched in ADR 0013 §1. /// /// A mask answers, for each logical position in `[0, length())`, whether that row is selected. -/// Kernels intersect masks across pipeline stages and downstream stages honour them (skip +/// Kernels intersect masks across pipeline stages and downstream stages honor them (skip /// excluded positions during a reduce, emit a smaller result for a take) so that nothing -/// materialises until a sink demands it. +/// materializes until a sink demands it. /// /// The four variants trade representation for the producer that creates them: /// - [Mask.AllTrue] / [Mask.AllFalse] — every position selected / rejected, allocation-free. 
diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/compute/PrimitiveFilter.java b/reader/src/main/java/io/github/dfa1/vortex/reader/compute/PrimitiveFilter.java index ff07e1037..821e6c48b 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/compute/PrimitiveFilter.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/compute/PrimitiveFilter.java @@ -15,17 +15,17 @@ import java.lang.foreign.MemorySegment; import java.lang.foreign.ValueLayout; -/// The type-specialised fast lane of [StreamingFilterKernel]: for a primitive numeric column it runs +/// The type-specialized fast lane of [StreamingFilterKernel]: for a primitive numeric column it runs /// a monomorphic, boxing-free loop with the comparison value unboxed once outside the loop, instead /// of the generic [Values#valueAt(Array, long)] / [Compare#values(Object, Object, DType)] path that /// boxes every element to an [Object]. /// -/// The ADR 0013 §3 tier-2 contract still holds — nothing is materialised, the only allocation is the +/// The ADR 0013 §3 tier-2 contract still holds — nothing is materialized, the only allocation is the /// result bitmap from the caller's [Arena] — but the per-element decode is the typed accessor /// (`getLong`, `getInt`, `getDouble`, …) read straight into a primitive. All type, signedness, and /// predicate-operator decisions are hoisted out of the per-row loop (the hot-loop rule's -/// branch-split idiom); only types this class recognises take the fast lane, everything else returns -/// `null` so the caller falls back to the generic kernel with no behavioural drift. +/// branch-split idiom); only types this class recognizes take the fast lane, everything else returns +/// `null` so the caller falls back to the generic kernel with no behavioral drift. 
/// /// Two value domains cover the numerics: /// - long domain — [LongArray] / [IntArray] / [ShortArray] / [ByteArray], read and widened to a @@ -40,15 +40,15 @@ final class PrimitiveFilter { private PrimitiveFilter() { } - /// Filters `array` by `predicate` through the specialised primitive loops, or returns `null` if - /// the array or predicate is not one this class specialises (the caller then uses the generic + /// Filters `array` by `predicate` through the specialized primitive loops, or returns `null` if + /// the array or predicate is not one this class specializes (the caller then uses the generic /// kernel). /// /// @param array the array to filter, possibly a [MaskedArray] over a primitive child /// @param current the incoming selection mask, applied once over the whole predicate result /// @param predicate the predicate to evaluate /// @param arena the arena for the result bitmap; its zero-fill seeds the unselected bits to 0 - /// @return the selection mask, or `null` if the input is not specialisable + /// @return the selection mask, or `null` if the input is not specializable static Mask tryFilter(Array array, Mask current, Predicate predicate, Arena arena) { Array data; BoolArray validity; @@ -59,7 +59,7 @@ static Mask tryFilter(Array array, Mask current, Predicate predicate, Arena aren data = array; validity = null; } - if (!canSpecialise(data, predicate)) { + if (!canSpecialize(data, predicate)) { return null; } long n = array.length(); @@ -70,14 +70,14 @@ static Mask tryFilter(Array array, Mask current, Predicate predicate, Arena aren return new Mask.BitmapMask(bits, n); } - /// Reports whether the concrete array and the full predicate tree can take the specialised path: + /// Reports whether the concrete array and the full predicate tree can take the specialized path: /// the array must be a primitive long- or double-domain column and every comparison leaf must /// carry a [Number] value (so it unboxes to a primitive exactly as the generic path 
would). /// /// @param data the unwrapped (non-masked) array /// @param predicate the predicate to inspect - /// @return `true` if both the array and the predicate are specialisable - private static boolean canSpecialise(Array data, Predicate predicate) { + /// @return `true` if both the array and the predicate are specializable + private static boolean canSpecialize(Array data, Predicate predicate) { if (!(data.dtype() instanceof DType.Primitive)) { return false; } @@ -109,7 +109,7 @@ private static boolean predicateOk(Predicate predicate) { /// Evaluates the predicate tree into a fresh bitmap that encodes null semantics (a null position /// is excluded from a value leaf) but not the incoming mask. Composites combine child bitmaps - /// word-wise; leaves run the specialised value or validity loops. + /// word-wise; leaves run the specialized value or validity loops. /// /// @param data the unwrapped array /// @param validity the validity bitmap, or `null` when every position is valid @@ -137,7 +137,7 @@ private static MemorySegment eval(Array data, BoolArray validity, Predicate pred }; } - /// Runs a value comparison leaf: the specialised match loop fills the bitmap, then any null + /// Runs a value comparison leaf: the specialized match loop fills the bitmap, then any null /// positions are cleared (three-valued logic — a null never satisfies a value predicate). /// /// @param data the unwrapped array diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/compute/ReduceKernel.java b/reader/src/main/java/io/github/dfa1/vortex/reader/compute/ReduceKernel.java index 9845a8387..bcfd35a4e 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/compute/ReduceKernel.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/compute/ReduceKernel.java @@ -5,7 +5,7 @@ /// A kernel that folds the selected positions of an [Array] into a single result, the reduce /// signature sketched in ADR 0013 §2. 
/// -/// The kernel honours the current mask — only positions it selects contribute — and skips null +/// The kernel honors the current mask — only positions it selects contribute — and skips null /// values, mirroring the Rust reduction semantics: a sum or count over zero selected non-null rows /// is the identity, a min or max over zero selected non-null rows is absent. /// @@ -18,7 +18,7 @@ interface ReduceKernel { /// Folds the positions of `array` that `current` selects into a single result. /// - /// @param array the input array, possibly a lazy variant; never materialised whole + /// @param array the input array, possibly a lazy variant; never materialized whole /// @param current the selection mask, must have the same length as `array` /// @return the reduction result R apply(Array array, Mask current); diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/compute/Reductions.java b/reader/src/main/java/io/github/dfa1/vortex/reader/compute/Reductions.java index 01aafb1ed..4232b9480 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/compute/Reductions.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/compute/Reductions.java @@ -14,9 +14,9 @@ /// The concrete streaming [ReduceKernel] instances — `SUM`, `COUNT`, `MIN`, `MAX` — of ADR 0013 §3. /// -/// Each reduction has a type-specialised fast lane (a boxing-free primitive accumulation loop) and a +/// Each reduction has a type-specialized fast lane (a boxing-free primitive accumulation loop) and a /// generic boxing fallback (`*Generic`) for the types the fast lane does not cover (Decimal, Utf8, -/// `f16`, …). The specialised lane unwraps a [MaskedArray] once, hoists the column's signedness and +/// `f16`, …). 
The specialized lane unwraps a [MaskedArray] once, hoists the column's signedness and /// the validity / all-true-mask decisions out of the per-row loop (the hot-loop rule's branch-split /// idiom), and reads each element through the concrete typed accessor — no [Object] box, no /// [Comparable#compareTo(Object)], no autobox. The result is boxed once at the end. @@ -45,7 +45,7 @@ final class Reductions { private Reductions() { } - /// Sums the selected non-null values, taking the specialised primitive lane when possible. + /// Sums the selected non-null values, taking the specialized primitive lane when possible. /// /// @param array the array to fold /// @param current the selection mask @@ -62,7 +62,7 @@ private static Number sum(Array array, Mask current) { if (isLongDomain(data)) { return sumLongDomain(data, data.dtype().isUnsigned(), validity, current, n); } - // f16 and any other numeric primitive without a specialised accessor stay on the boxing path. + // f16 and any other numeric primitive without a specialized accessor stay on the boxing path. return sumGeneric(array, current); } @@ -89,7 +89,7 @@ private static Long count(Array array, Mask current) { return count; } - /// Finds the extreme selected non-null value, taking the specialised primitive lane when possible. + /// Finds the extreme selected non-null value, taking the specialized primitive lane when possible. /// /// @param array the array to fold /// @param current the selection mask @@ -109,7 +109,7 @@ private static Object extremum(Array array, Mask current, boolean min) { } // ---------------------------------------------------------------------------------------------- - // Specialised long-domain lanes. + // Specialized long-domain lanes. // ---------------------------------------------------------------------------------------------- /// Sums a long-domain column. 
The hot path (no nulls, all-true mask) is a tight monomorphic loop; @@ -210,7 +210,7 @@ private static long widenLong(Array data, boolean unsigned, long i) { } // ---------------------------------------------------------------------------------------------- - // Specialised double-domain lanes. + // Specialized double-domain lanes. // ---------------------------------------------------------------------------------------------- /// Sums a double-domain column into a [Double], skipping nulls and unselected positions. @@ -305,7 +305,7 @@ private static boolean included(boolean noNull, BoolArray validity, boolean allT // ---------------------------------------------------------------------------------------------- /// Folds the selected non-null values into a sum through the boxing accessor — the fallback for - /// types without a specialised lane and the oracle the fast lane is tested against. + /// types without a specialized lane and the oracle the fast lane is tested against. /// /// @param array the array to fold /// @param current the selection mask @@ -349,7 +349,7 @@ static Long countGeneric(Array array, Mask current) { } /// Finds the extreme selected non-null value through the boxing accessor — the fallback for types - /// without a specialised lane and the oracle the fast lane is tested against. + /// without a specialized lane and the oracle the fast lane is tested against. /// /// @param array the array to fold /// @param current the selection mask @@ -385,7 +385,7 @@ static Object extremumGeneric(Array array, Mask current, boolean min) { /// /// @param array the array to unwrap /// @return the underlying value array - private static Array unwrap(Array array) { + static Array unwrap(Array array) { return array instanceof MaskedArray masked ? 
masked.inner() : array; } @@ -393,15 +393,15 @@ private static Array unwrap(Array array) { /// /// @param array the array to inspect /// @return the validity bitmap, or `null` - private static BoolArray validityOf(Array array) { + static BoolArray validityOf(Array array) { return array instanceof MaskedArray masked ? masked.validity() : null; } - /// Reports whether the unwrapped array is a specialised long-domain primitive. + /// Reports whether the unwrapped array is a specialized long-domain primitive. /// /// @param data the unwrapped array /// @return `true` for [LongArray] / [IntArray] / [ShortArray] / [ByteArray] - private static boolean isLongDomain(Array data) { + static boolean isLongDomain(Array data) { return data instanceof LongArray || data instanceof IntArray || data instanceof ShortArray || data instanceof ByteArray; } @@ -415,7 +415,7 @@ private static boolean isLongDomain(Array data) { /// /// @param dtype the column dtype to validate /// @throws VortexException if `dtype` is not a numeric primitive column - private static void requireNumeric(DType dtype) { + static void requireNumeric(DType dtype) { if (!(dtype instanceof DType.Primitive)) { throw new VortexException("compute: SUM is not supported on a non-numeric column of dtype " + dtype); @@ -426,7 +426,7 @@ private static void requireNumeric(DType dtype) { /// /// @param array the array to inspect /// @return `true` if the column is a floating-point primitive - private static boolean isFloating(Array array) { + static boolean isFloating(Array array) { return array.dtype() instanceof DType.Primitive prim && prim.ptype().isFloating(); } diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/compute/StreamingFilterKernel.java b/reader/src/main/java/io/github/dfa1/vortex/reader/compute/StreamingFilterKernel.java index 8d7a3c55c..26747a7bf 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/compute/StreamingFilterKernel.java +++ 
b/reader/src/main/java/io/github/dfa1/vortex/reader/compute/StreamingFilterKernel.java @@ -9,12 +9,12 @@ import java.util.Objects; /// The streaming [FilterKernel] of ADR 0013 §3. A primitive numeric column takes the boxing-free, -/// type-specialised fast lane in [PrimitiveFilter]; every other [Array] falls back to the generic +/// type-specialized fast lane in [PrimitiveFilter]; every other [Array] falls back to the generic /// tier-2 path here, which works for any array through the per-element boxing accessor (no -/// encoded-domain specialisation — that is the deferred performance escalation). The two paths are -/// behaviourally identical, so the fast lane is a pure performance choice gated by a type check. +/// encoded-domain specialization — that is the deferred performance escalation). The two paths are +/// behaviorally identical, so the fast lane is a pure performance choice gated by a type check. /// -/// The kernel honours the incoming mask (an excluded position is never evaluated) and mirrors the +/// The kernel honors the incoming mask (an excluded position is never evaluated) and mirrors the /// Rust three-valued-logic filter semantics: a null value makes every value predicate false, so the /// row is excluded; the dedicated null tests select on validity directly. The output is a /// [Mask.BitmapMask] of `array.length()` bits, allocated off-heap from the caller's [Arena] — the @@ -36,18 +36,18 @@ public Mask apply(Array array, Mask current, Predicate predicate, Arena arena) { // Nothing can be selected — return an allocation-free all-false of the same length. return Mask.allFalse(n); } - // Fast lane: a primitive numeric column runs the boxing-free, type-specialised loops. A null - // return means the input is not specialisable (Decimal, Utf8, an unhandled type, or a + // Fast lane: a primitive numeric column runs the boxing-free, type-specialized loops. 
A null + // return means the input is not specializable (Decimal, Utf8, an unhandled type, or a // non-numeric predicate value), so the generic per-element path below takes over. - Mask specialised = PrimitiveFilter.tryFilter(array, current, predicate, arena); - if (specialised != null) { - return specialised; + Mask specialized = PrimitiveFilter.tryFilter(array, current, predicate, arena); + if (specialized != null) { + return specialized; } return applyGeneric(array, current, predicate, arena, n); } - /// Runs the generic boxing baseline directly, the oracle the specialised fast lane must match. - /// Package-private so an equivalence test can assert `specialised == generic` against it. + /// Runs the generic boxing baseline directly, the oracle the specialized fast lane must match. + /// Package-private so an equivalence test can assert `specialized == generic` against it. /// /// @param array the array to filter /// @param current the incoming selection mask, already validated to match `array`'s length @@ -85,7 +85,7 @@ private Mask applyGeneric(Array array, Mask current, Predicate predicate, Arena /// @param i the zero-based position /// @param predicate the predicate to evaluate /// @return `true` if the value at `i` satisfies `predicate` - private static boolean evaluate(Array array, long i, Predicate predicate) { + static boolean evaluate(Array array, long i, Predicate predicate) { return switch (predicate) { case Predicate.Eq eq -> !Values.isNullAt(array, i) && Compare.values(Values.valueAt(array, i), eq.value(), array.dtype()) == 0; diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/compute/Values.java b/reader/src/main/java/io/github/dfa1/vortex/reader/compute/Values.java index 0756548b0..030527961 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/compute/Values.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/compute/Values.java @@ -16,11 +16,11 @@ /// The single generic value-access path shared by every streaming 
kernel: one element read and one /// null test that work for *any* [Array] via its typed accessor, the matrix-explosion mitigation -/// ADR 0013 prescribes (one generic path over the encodings, specialise the hot ones later). +/// ADR 0013 prescribes (one generic path over the encodings, specialize the hot ones later). /// /// Reads decode through the per-type accessor a single element at a time — the tier-2 streaming /// contract of ADR 0013 §3. No method here ever calls [Array#materialize(java.lang.foreign.SegmentAllocator)]: -/// a full-buffer materialise is the forbidden tier-3 last resort and the kernels do not need it. +/// a full-buffer materialize is the forbidden tier-3 last resort and the kernels do not need it. final class Values { private Values() { @@ -54,7 +54,7 @@ static Object valueAt(Array array, long i) { Array data = array instanceof MaskedArray masked ? masked.inner() : array; // NOTE: this boxing path is the correctness baseline of ADR 0013 §3 tier 2 — one generic // accessor read that works for every encoding. It deliberately violates the hot-loop - // "no per-element boxing" ideal; encoded-domain specialisation of the hot encodings (ALP, + // "no per-element boxing" ideal; encoded-domain specialization of the hot encodings (ALP, // FoR, BitPacked, Dict) is the deferred performance escalation, not part of this step. 
return switch (data) { case LongArray la -> la.getLong(i); diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/decode/BitpackedEncodingDecoder.java b/reader/src/main/java/io/github/dfa1/vortex/reader/decode/BitpackedEncodingDecoder.java index 10cb18e06..130d01476 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/decode/BitpackedEncodingDecoder.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/decode/BitpackedEncodingDecoder.java @@ -33,7 +33,7 @@ public EncodingId encodingId() { @Override public Array decode(DecodeContext ctx) { MemorySegment rawMeta = ctx.metadata(); - // proto3 elides default-valued fields, so ProtoBitPackedMetadata(0, 0, null) serialises + // proto3 elides default-valued fields, so ProtoBitPackedMetadata(0, 0, null) serializes // to a 0-byte payload and the writer skips the empty vector. Treat absent metadata // as all-defaults rather than rejecting — happens when bit_width=0 (constant // residuals nested under FoR / RLE). @@ -90,8 +90,8 @@ private static void fastlanesUnpackToSeg( /// Per-row unpack schedule for one FastLanes block, precomputed once per decode call. Every /// array is indexed by `row` in `[0, typeBits)`. This setup is identical for the 8/16/32/64-bit - /// unpackers, so it lives here; the per-element unpack loops stay specialised per width because - /// their typed `ValueLayout` access must constant-fold for the JIT to vectorise them. + /// unpackers, so it lives here; the per-element unpack loops stay specialized per width because + /// their typed `ValueLayout` access must constant-fold for the JIT to vectorize them. 
/// /// @param shifts low-bit shift to apply to the current word, per row /// @param remainingBits bits spilling into the next word (0 when the value fits one word) diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/decode/ChunkedEncodingDecoder.java b/reader/src/main/java/io/github/dfa1/vortex/reader/decode/ChunkedEncodingDecoder.java index 471716289..defa94e75 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/decode/ChunkedEncodingDecoder.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/decode/ChunkedEncodingDecoder.java @@ -67,7 +67,7 @@ private static long[] readOffsets(DecodeContext ctx, int nchunks) { /// Wraps the decoded chunk children in a zero-copy view: a `ChunkedXxxArray` /// for primitives and Bool, a [StructArray] of per-field chunked views for - /// [DType.Struct]. No concat / no per-row materialise. + /// [DType.Struct]. No concat / no per-row materialize. private static Array wrap(List chunks, DType dtype, long totalRows) { if (dtype instanceof DType.Primitive pt) { return wrapPrimitive(chunks, pt, dtype, totalRows); @@ -79,7 +79,7 @@ private static Array wrap(List chunks, DType dtype, long totalRows) { return wrapStruct(chunks, struct, totalRows); } if (dtype instanceof DType.Variant) { - // Each chunk decoded as Variant materialises to its inner-typed constant array + // Each chunk decoded as Variant materializes to its inner-typed constant array // (see ConstantEncodingDecoder). Wrap the chunks under that inner dtype; the // VariantArray container re-applies the logical Variant dtype. 
if (chunks.isEmpty()) { diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/decode/ConstantEncodingDecoder.java b/reader/src/main/java/io/github/dfa1/vortex/reader/decode/ConstantEncodingDecoder.java index bfdc911c4..f071e7413 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/decode/ConstantEncodingDecoder.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/decode/ConstantEncodingDecoder.java @@ -79,7 +79,7 @@ private static Array arrayFromScalar(DecodeContext ctx, ProtoScalarValue scalar, throw new VortexException(EncodingId.VORTEX_CONSTANT, "constant extension storage must be primitive, got " + ext.storageDType()); } - // Build the constant storage directly under the Extension dtype. No materialisation: + // Build the constant storage directly under the Extension dtype. No materialization: // extension consumers read storage through its family typed-getter (see // ExtensionStorage.epochInteger), so a lazy constant array works and stays O(1). return constantPrimitive(dtype, sp.ptype(), scalar, n); @@ -91,7 +91,7 @@ private static Array arrayFromScalar(DecodeContext ctx, ProtoScalarValue scalar, } /// Builds a metadata-only constant primitive array carrying `outDtype` — used for both - /// bare primitive constants and extension constants (whose primitive storage is relabelled + /// bare primitive constants and extension constants (whose primitive storage is relabeled /// with the extension's logical dtype). O(1): no buffer is allocated. 
/// /// @param outDtype logical dtype the returned array reports diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/decode/DateTimePartsEncodingDecoder.java b/reader/src/main/java/io/github/dfa1/vortex/reader/decode/DateTimePartsEncodingDecoder.java index 7cba397e6..6cad44a1b 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/decode/DateTimePartsEncodingDecoder.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/decode/DateTimePartsEncodingDecoder.java @@ -16,7 +16,7 @@ /// /// Reassembles the three children (days, seconds, subseconds) into a /// [LazyDateTimePartsLongArray] of epoch counts in the extension's -/// [TimeUnit]. No per-row materialisation happens at decode time — +/// [TimeUnit]. No per-row materialization happens at decode time — /// the downstream extension decoder reads the reassembled long via the /// lazy `getLong` accessor. public final class DateTimePartsEncodingDecoder implements EncodingDecoder { diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/decode/DecodeContext.java b/reader/src/main/java/io/github/dfa1/vortex/reader/decode/DecodeContext.java index 6adf12346..dbcd482ec 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/decode/DecodeContext.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/decode/DecodeContext.java @@ -77,14 +77,14 @@ public MemorySegment decodeChildSegment(int i, DType dtype, long rowCount) { return registry.decodeAsSegment(childCtx); } - /// Materialises an already-decoded array into a flat primary segment, allocating lazy + /// Materializes an already-decoded array into a flat primary segment, allocating lazy /// variants from this context's arena. /// /// Use when a decoder already holds a decoded child — e.g. after unwrapping a /// [io.github.dfa1.vortex.reader.array.MaskedArray] for its validity — and needs the /// raw buffer for a bulk read, rather than re-decoding via [#decodeChildSegment(int)]. 
/// - /// @param arr the decoded array to materialise + /// @param arr the decoded array to materialize /// @return the array's primary [MemorySegment] public MemorySegment materialize(Array arr) { return arr.materialize(arena); diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/decode/DictEncodingDecoder.java b/reader/src/main/java/io/github/dfa1/vortex/reader/decode/DictEncodingDecoder.java index b78c84f46..43ac689e4 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/decode/DictEncodingDecoder.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/decode/DictEncodingDecoder.java @@ -25,7 +25,7 @@ /// eagerly expand `codes` and `values` into a contiguous output segment /// via `expandU8/U16/U32` — these mirror the broadcast-aware scatter loop /// with `SegmentBroadcast.capacity` (ConstantEncoding fan-out), so the -/// output is materialised at decode time. ADR 0012's lazy-dict scope is the +/// output is materialized at decode time. ADR 0012's lazy-dict scope is the /// layout-level path in `ScanIterator.decodeDictLayout`, which is now lazy /// via `DictXxxArray`; this encoding-level path runs only when a parent /// decoder explicitly calls `decodeChild` on a `vortex.dict` segment, diff --git a/reader/src/main/java/io/github/dfa1/vortex/reader/decode/UnknownArrayNode.java b/reader/src/main/java/io/github/dfa1/vortex/reader/decode/UnknownArrayNode.java index f1c6f4476..2804d04c4 100644 --- a/reader/src/main/java/io/github/dfa1/vortex/reader/decode/UnknownArrayNode.java +++ b/reader/src/main/java/io/github/dfa1/vortex/reader/decode/UnknownArrayNode.java @@ -2,7 +2,7 @@ import java.lang.foreign.MemorySegment; -/// Array node whose encoding id is not a recognised [io.github.dfa1.vortex.core.model.EncodingId]. +/// Array node whose encoding id is not a recognized [io.github.dfa1.vortex.core.model.EncodingId]. /// Produced when a file uses an encoding this build does not know about. 
Decoded as /// [io.github.dfa1.vortex.reader.array.UnknownArray] when /// [io.github.dfa1.vortex.reader.ReadRegistry#isAllowUnknown()] is set; otherwise the decode call throws. diff --git a/reader/src/test/java/io/github/dfa1/vortex/reader/FlatSegmentDecoderDecodeTest.java b/reader/src/test/java/io/github/dfa1/vortex/reader/FlatSegmentDecoderDecodeTest.java index 126f79f28..1aa40de1f 100644 --- a/reader/src/test/java/io/github/dfa1/vortex/reader/FlatSegmentDecoderDecodeTest.java +++ b/reader/src/test/java/io/github/dfa1/vortex/reader/FlatSegmentDecoderDecodeTest.java @@ -32,7 +32,7 @@ void decode_unknownEncodingWithBufferPadding_returnsUnknownArray() { try (Arena arena = Arena.ofConfined()) { // Given — a flat segment whose single buffer carries 8 bytes of leading padding and - // an unrecognised encoding id. The 8 pad bytes are the entire buffer-data region + // an unrecognized encoding id. The 8 pad bytes are the entire buffer-data region // (buffer length 0), so the decoder must advance dataOffset by +8 to slice it; a // subtraction would slice at offset -8. int padding = 8; diff --git a/reader/src/test/java/io/github/dfa1/vortex/reader/LayoutDepthBombSecurityTest.java b/reader/src/test/java/io/github/dfa1/vortex/reader/LayoutDepthBombSecurityTest.java index 1773c453d..f36b14bb0 100644 --- a/reader/src/test/java/io/github/dfa1/vortex/reader/LayoutDepthBombSecurityTest.java +++ b/reader/src/test/java/io/github/dfa1/vortex/reader/LayoutDepthBombSecurityTest.java @@ -24,7 +24,7 @@ * Adversarial tests for the layout-tree recursion in * [PostscriptParser]'s `convertLayout`. * - *

The reader walks the layout tree recursively when materialising a file's + *

The reader walks the layout tree recursively when materializing a file's * `FbsLayout` object. Without a depth cap a crafted file with thousands of * nested children produces a [StackOverflowError] during `VortexReader.open`, * breaking the contract that every malformed input must surface as a [VortexException]. diff --git a/reader/src/test/java/io/github/dfa1/vortex/reader/LayoutKindTest.java b/reader/src/test/java/io/github/dfa1/vortex/reader/LayoutKindTest.java index 82901d75a..a144d68fd 100644 --- a/reader/src/test/java/io/github/dfa1/vortex/reader/LayoutKindTest.java +++ b/reader/src/test/java/io/github/dfa1/vortex/reader/LayoutKindTest.java @@ -16,7 +16,7 @@ /// which fixes each method's return to its `encodingId` rather than a constant. class LayoutKindTest { - /// Each layout kind paired with its encoding id and the predicate that should recognise it. + /// Each layout kind paired with its encoding id and the predicate that should recognize it. private enum Kind { FLAT(Layout.FLAT, Layout::isFlat), CHUNKED(Layout.CHUNKED, Layout::isChunked), @@ -39,7 +39,7 @@ private static Layout layout(String encodingId) { @ParameterizedTest @EnumSource(Kind.class) - void predicate_recognisesOnlyItsOwnEncodingId(Kind kind) { + void predicate_recognizesOnlyItsOwnEncodingId(Kind kind) { // Given — a layout carrying this kind's encoding id Layout sut = layout(kind.encodingId); diff --git a/reader/src/test/java/io/github/dfa1/vortex/reader/MalformedFiles.java b/reader/src/test/java/io/github/dfa1/vortex/reader/MalformedFiles.java index b39e65465..4e44b0fef 100644 --- a/reader/src/test/java/io/github/dfa1/vortex/reader/MalformedFiles.java +++ b/reader/src/test/java/io/github/dfa1/vortex/reader/MalformedFiles.java @@ -20,7 +20,7 @@ /// Shared FlatBuffer builders for hand-constructing (well-formed and malformed) /// Vortex metadata blobs in the reader security tests. 
Each returns a sliced /// little-endian [ByteBuffer] positioned at the finished root, ready to splice -/// into a file body. Centralised here so the bounds/depth/zip-bomb suites build +/// into a file body. Centralized here so the bounds/depth/zip-bomb suites build /// their fixtures the same way instead of each copy-pasting the kit. final class MalformedFiles { diff --git a/reader/src/test/java/io/github/dfa1/vortex/reader/array/DictRecordSmokeTest.java b/reader/src/test/java/io/github/dfa1/vortex/reader/array/DictRecordSmokeTest.java index 1756871e0..5164ff5d2 100644 --- a/reader/src/test/java/io/github/dfa1/vortex/reader/array/DictRecordSmokeTest.java +++ b/reader/src/test/java/io/github/dfa1/vortex/reader/array/DictRecordSmokeTest.java @@ -130,7 +130,7 @@ void u64CodesDispatch() { } @Test - void forEachLongMaterialisesInOrder() { + void forEachLongMaterializesInOrder() { try (Arena arena = Arena.ofConfined()) { LongArray values = longArray(arena, 100L, 200L); ByteArray codes = byteArray(arena, (byte) 1, (byte) 0, (byte) 1); @@ -226,7 +226,7 @@ void u8CodesDispatch() { } @Test - void forEachIntMaterialisesInOrder() { + void forEachIntMaterializesInOrder() { try (Arena arena = Arena.ofConfined()) { IntArray values = intArray(arena, I32, 10, 20); ByteArray codes = byteArray(arena, (byte) 1, (byte) 0, (byte) 1); @@ -314,7 +314,7 @@ void u8CodesDispatch() { } @Test - void forEachDoubleMaterialisesInOrder() { + void forEachDoubleMaterializesInOrder() { try (Arena arena = Arena.ofConfined()) { DoubleArray values = doubleArray(arena, 1.5, 2.5); ByteArray codes = byteArray(arena, (byte) 0, (byte) 1, (byte) 0); diff --git a/reader/src/test/java/io/github/dfa1/vortex/reader/array/LazyDecimalBytePartsArrayTest.java b/reader/src/test/java/io/github/dfa1/vortex/reader/array/LazyDecimalBytePartsArrayTest.java index f87636815..efe06382c 100644 --- a/reader/src/test/java/io/github/dfa1/vortex/reader/array/LazyDecimalBytePartsArrayTest.java +++ 
b/reader/src/test/java/io/github/dfa1/vortex/reader/array/LazyDecimalBytePartsArrayTest.java @@ -60,7 +60,7 @@ void getDecimal_widensFromNarrowerChildPtype() { @Test void getDecimal_nullCellInMaskedChild_throws() { // Given — nullable decimal columns wrap the mantissa child in MaskedArray. - // Without honouring the validity bitmap, getDecimal would happily return + // Without honoring the validity bitmap, getDecimal would happily return // a garbage BigDecimal for a row whose mantissa bytes are undefined. try (Arena arena = Arena.ofConfined()) { MemorySegment mspBuf = arena.allocate(16); diff --git a/reader/src/test/java/io/github/dfa1/vortex/reader/array/TestArrays.java b/reader/src/test/java/io/github/dfa1/vortex/reader/array/TestArrays.java index c05b9753c..1b616355e 100644 --- a/reader/src/test/java/io/github/dfa1/vortex/reader/array/TestArrays.java +++ b/reader/src/test/java/io/github/dfa1/vortex/reader/array/TestArrays.java @@ -12,7 +12,7 @@ /// allocates a little-endian, read-only segment from an auto [Arena] (GC-managed — /// test data is short-lived, so callers need no try-with-resources) with the /// default signed dtype (from [DTypes]) for its width. Tests that need an unsigned -/// dtype (U8/U16) for widening behaviour build the array inline instead. +/// dtype (U8/U16) for widening behavior build the array inline instead. 
public final class TestArrays { private TestArrays() { diff --git a/reader/src/test/java/io/github/dfa1/vortex/reader/compute/ComputeEquivalenceTest.java b/reader/src/test/java/io/github/dfa1/vortex/reader/compute/ComputeEquivalenceTest.java index 3874fb1e7..38288a877 100644 --- a/reader/src/test/java/io/github/dfa1/vortex/reader/compute/ComputeEquivalenceTest.java +++ b/reader/src/test/java/io/github/dfa1/vortex/reader/compute/ComputeEquivalenceTest.java @@ -24,12 +24,12 @@ import static org.assertj.core.api.Assertions.assertThat; -/// Randomised oracle test: the type-specialised fast lane of [StreamingFilterKernel] / +/// Randomized oracle test: the type-specialized fast lane of [StreamingFilterKernel] / /// [Reductions] must be bit-identical to the generic boxing path for every primitive input. For each /// seeded scenario it builds a random primitive column (each integer width signed and unsigned, /// `f32` / `f64`, with and without nulls), a random incoming mask, and a random predicate, then -/// asserts the specialised `filter` / `sum` / `count` / `min` / `max` results equal what the generic -/// path produces. Catching any divergence here guards against behavioural drift, the one thing the +/// asserts the specialized `filter` / `sum` / `count` / `min` / `max` results equal what the generic +/// path produces. Catching any divergence here guards against behavioral drift, the one thing the /// fast lane must never introduce. 
class ComputeEquivalenceTest { @@ -48,14 +48,14 @@ private static Stream seeds() { @ParameterizedTest @MethodSource("seeds") - void specialisedMatchesGeneric(int seed) { + void specializedMatchesGeneric(int seed) { // Given a randomly shaped primitive column, incoming mask and predicate for this seed Random random = new Random(seed); Array array = randomColumn(random); Mask incoming = randomMask(random, array.length()); Predicate predicate = randomPredicate(random, array, 0); - // When the specialised fast lane and the generic oracle both run + // When the specialized fast lane and the generic oracle both run // Then filter and every reduction agree, position by position and value for value assertFilterAgrees(array, incoming, predicate); assertReductionsAgree(array, incoming); @@ -98,15 +98,15 @@ void nanAndInfinitiesMatchGeneric() { } private void assertFilterAgrees(Array array, Mask incoming, Predicate predicate) { - Mask specialised = filterKernel.apply(array, incoming, predicate, ARENA); + Mask specialized = filterKernel.apply(array, incoming, predicate, ARENA); Mask generic = filterKernel.applyGeneric(array, incoming, predicate, ARENA); long n = array.length(); for (long i = 0; i < n; i++) { - assertThat(specialised.get(i)) + assertThat(specialized.get(i)) .as("position %d for predicate %s on %s", i, predicate, array.dtype()) .isEqualTo(generic.get(i)); } - assertThat(specialised.trueCount()).isEqualTo(generic.trueCount()); + assertThat(specialized.trueCount()).isEqualTo(generic.trueCount()); } private void assertReductionsAgree(Array array, Mask mask) { diff --git a/reader/src/test/java/io/github/dfa1/vortex/reader/compute/PredicateTest.java b/reader/src/test/java/io/github/dfa1/vortex/reader/compute/PredicateTest.java index f61043265..3840e9056 100644 --- a/reader/src/test/java/io/github/dfa1/vortex/reader/compute/PredicateTest.java +++ b/reader/src/test/java/io/github/dfa1/vortex/reader/compute/PredicateTest.java @@ -342,7 +342,7 @@ void 
exhaustivePatternSwitchCoversEveryVariant() { new Predicate.Or(new Predicate.IsNull(), new Predicate.IsNotNull()) }; - // When labelling each variant through an exhaustive, default-free pattern switch + // When labeling each variant through an exhaustive, default-free pattern switch String result = label(all[0]); // Then every variant resolves to a distinct label diff --git a/reader/src/test/java/io/github/dfa1/vortex/reader/compute/ReduceKernelTest.java b/reader/src/test/java/io/github/dfa1/vortex/reader/compute/ReduceKernelTest.java index a3164f251..a393101f2 100644 --- a/reader/src/test/java/io/github/dfa1/vortex/reader/compute/ReduceKernelTest.java +++ b/reader/src/test/java/io/github/dfa1/vortex/reader/compute/ReduceKernelTest.java @@ -51,7 +51,7 @@ void minAndMaxReturnExtremes() { } @Test - void reductionsHonourMaskedSubset() { + void reductionsHonorMaskedSubset() { // Given a long array and a range mask selecting only indices 1 and 2 Array array = ComputeArrays.longArray(ARENA, 10, 20, 30, 40); Mask subset = new Mask.RangeMask(1, 3, array.length()); @@ -143,7 +143,7 @@ void unsignedColumnUsesUnsignedSemantics() { Object min = Reductions.MIN.apply(array, all); Object max = Reductions.MAX.apply(array, all); - // Then MIN/MAX honour unsigned order and SUM wraps in two's complement like ZoneReducer + // Then MIN/MAX honor unsigned order and SUM wraps in two's complement like ZoneReducer assertThat(min).isEqualTo(1L); assertThat(max).isEqualTo(highBit); assertThat(sum).isEqualTo(1L + highBit + 5L); diff --git a/reader/src/test/java/io/github/dfa1/vortex/reader/compute/StreamingFilterKernelTest.java b/reader/src/test/java/io/github/dfa1/vortex/reader/compute/StreamingFilterKernelTest.java index 4b873d43c..41d490ecc 100644 --- a/reader/src/test/java/io/github/dfa1/vortex/reader/compute/StreamingFilterKernelTest.java +++ b/reader/src/test/java/io/github/dfa1/vortex/reader/compute/StreamingFilterKernelTest.java @@ -203,33 +203,33 @@ void noPositionsSelected() { } 
@Test - void dictEncodingAlignsWithMaterialisedEquivalent() { - // Given a dict array [10,20,30,10,20] and its materialised twin — the ADR risk-note check + void dictEncodingAlignsWithMaterializedEquivalent() { + // Given a dict array [10,20,30,10,20] and its materialized twin — the ADR risk-note check // that a mask stays positionally aligned under a cascaded encoding Array dict = ComputeArrays.dictLongArray(ARENA, new long[]{10, 20, 30}, new int[]{0, 1, 2, 0, 1}); - Array materialised = ComputeArrays.longArray(ARENA, 10, 20, 30, 10, 20); + Array materialized = ComputeArrays.longArray(ARENA, 10, 20, 30, 10, 20); // When the same predicate runs over both representations Mask dictMask = filter(dict, new Predicate.Gt(15L)); - Mask materialisedMask = filter(materialised, new Predicate.Gt(15L)); + Mask materializedMask = filter(materialized, new Predicate.Gt(15L)); // Then the masks are identical - assertThat(dictMask).isEqualTo(materialisedMask); + assertThat(dictMask).isEqualTo(materializedMask); assertThat(selected(dictMask)).containsExactly(1L, 2L, 4L); } @Test - void chunkedEncodingAlignsWithMaterialisedEquivalent() { - // Given a chunked array [[1,2],[3,4,5]] and its materialised twin + void chunkedEncodingAlignsWithMaterializedEquivalent() { + // Given a chunked array [[1,2],[3,4,5]] and its materialized twin Array chunked = ComputeArrays.chunkedLongArray(ARENA, new long[]{1, 2}, new long[]{3, 4, 5}); - Array materialised = ComputeArrays.longArray(ARENA, 1, 2, 3, 4, 5); + Array materialized = ComputeArrays.longArray(ARENA, 1, 2, 3, 4, 5); // When the same predicate runs over both representations Mask chunkedMask = filter(chunked, new Predicate.Gt(2L)); - Mask materialisedMask = filter(materialised, new Predicate.Gt(2L)); + Mask materializedMask = filter(materialized, new Predicate.Gt(2L)); // Then the masks are identical across the chunk boundary - assertThat(chunkedMask).isEqualTo(materialisedMask); + assertThat(chunkedMask).isEqualTo(materializedMask); 
assertThat(selected(chunkedMask)).containsExactly(2L, 3L, 4L); } diff --git a/writer/src/main/java/io/github/dfa1/vortex/writer/VortexWriter.java b/writer/src/main/java/io/github/dfa1/vortex/writer/VortexWriter.java index 0348cbc2d..3e05ff684 100644 --- a/writer/src/main/java/io/github/dfa1/vortex/writer/VortexWriter.java +++ b/writer/src/main/java/io/github/dfa1/vortex/writer/VortexWriter.java @@ -412,7 +412,7 @@ public void writeChunk(Map columns) throws IOException { } // Pre-validate row counts so a length mismatch is rejected with a clear error - // before any data is serialised. Without this check, the writer would produce a + // before any data is serialized. Without this check, the writer would produce a // file whose column chunks claim different row counts — readable but logically // inconsistent. long expectedLen = -1L; @@ -649,7 +649,7 @@ private ByteBuffer buildArrayFlatBuffer(EncodeResult result, long nullCount) { ? io.github.dfa1.vortex.core.fbs.FbsArrayStats.createMinVector(fbb, result.statsMin()) : 0; int maxVec = result.hasStats() ? io.github.dfa1.vortex.core.fbs.FbsArrayStats.createMaxVector(fbb, result.statsMax()) : 0; - // forceDefaults only while building ArrayStats, so null_count = 0 is serialised (flatbuffers + // forceDefaults only while building ArrayStats, so null_count = 0 is serialized (flatbuffers // omits a scalar equal to its default otherwise) — matching the Rust writer and letting the // reader prune IS NULL on zero-null chunks. Reset immediately so the Array/ArrayNode tables // keep their normal (offset-default-omitting) layout. @@ -760,7 +760,7 @@ private DType columnDtype(String colName) { /// Writes one `vortex.stats` zone-map for `colName`: one zone per chunk, with NULL_COUNT always, /// MAX/MIN (plus always-false `_is_truncated` flags) when `minMaxDtype` is non-null, and SUM when - /// `sumDtype` is non-null. `minBytes`/`maxBytes`/`sumBytes` hold each zone's serialised scalar — + /// `sumDtype` is non-null. 
`minBytes`/`maxBytes`/`sumBytes` hold each zone's serialized scalar — /// read only when the matching dtype is set; a `null` `sumBytes` entry marks an overflowed zone /// (recorded as a null sum). Field/bit order follows ZonedStatsSchema: MAX(3), MIN(4), SUM(5), /// NULL_COUNT(6). @@ -875,7 +875,7 @@ private static DType zoneSumDtype(DType dtype) { }; } - /// The serialised per-chunk SUM scalar for `data` of logical type `dtype`, or `null` when the + /// The serialized per-chunk SUM scalar for `data` of logical type `dtype`, or `null` when the /// column is not summable (non-primitive) or the sum overflowed. Validity placeholders are zero /// and therefore sum-neutral, so a nullable carrier sums correctly via its values. private static byte[] columnSum(DType dtype, Object data) { @@ -887,7 +887,7 @@ private static byte[] columnSum(DType dtype, Object data) { } /// Builds the per-zone min (or max) values array for the resolved min/max `dtype`, decoding each - /// zone's serialised [ProtoScalarValue] stat into the array shape its encoder expects. + /// zone's serialized [ProtoScalarValue] stat into the array shape its encoder expects. private static Object zoneStatValues(DType minMaxDtype, List statBytes) throws IOException { return switch (minMaxDtype) { case DType.Primitive p -> statColumn(p.ptype(), statBytes); @@ -897,7 +897,7 @@ private static Object zoneStatValues(DType minMaxDtype, List statBytes) } /// Builds the per-zone SUM array for `sumDtype` (i64/u64 → `long[]`, f64 → `double[]`), decoding - /// each zone's serialised scalar. Zones whose sum overflowed carry a `null` entry in `sumBytes`; + /// each zone's serialized scalar. Zones whose sum overflowed carry a `null` entry in `sumBytes`; /// `valid[i]` is set accordingly so the stat field reports them as null. 
private static Object sumColumn(DType sumDtype, List sumBytes, boolean[] valid) throws IOException { PType ptype = ((DType.Primitive) sumDtype).ptype(); @@ -918,7 +918,7 @@ private static Object sumColumn(DType sumDtype, List sumBytes, boolean[] return a; } - /// Builds the per-zone string array by decoding each zone's serialised string [ProtoScalarValue] + /// Builds the per-zone string array by decoding each zone's serialized string [ProtoScalarValue] /// stat. Used for Utf8 columns whose `vortex.varbin` encoder records full string min/max scalars. private static String[] statStringColumn(List statBytes) throws IOException { String[] out = new String[statBytes.size()]; @@ -929,7 +929,7 @@ private static String[] statStringColumn(List statBytes) throws IOExcept } /// Builds the per-zone values array in the storage shape the primitive encoder expects, decoding - /// each zone's serialised [ProtoScalarValue] stat. + /// each zone's serialized [ProtoScalarValue] stat. private static Object statColumn(PType ptype, List statBytes) throws IOException { int n = statBytes.size(); return switch (ptype) { @@ -976,7 +976,7 @@ private static Object statColumn(PType ptype, List statBytes) throws IOE yield a; } case F16 -> { - // F16 min/max are serialised as f32 scalars; re-pack to float16 storage. + // F16 min/max are serialized as f32 scalars; re-pack to float16 storage. short[] a = new short[n]; for (int i = 0; i < n; i++) { a[i] = Float.floatToFloat16((float) scalarDouble(statBytes.get(i))); @@ -987,13 +987,13 @@ private static Object statColumn(PType ptype, List statBytes) throws IOE } private static long scalarLong(byte[] bytes) throws IOException { - // Integer columns serialise min/max as int64 (signed) or uint64 (unsigned). + // Integer columns serialize min/max as int64 (signed) or uint64 (unsigned). ProtoScalarValue sv = decodeScalar(bytes); return sv.int64_value() != null ? 
sv.int64_value() : sv.uint64_value(); } private static double scalarDouble(byte[] bytes) throws IOException { - // Float columns serialise min/max as f64 (F64) or f32 (F32). Branch rather than use a + // Float columns serialize min/max as f64 (F64) or f32 (F32). Branch rather than use a // ternary so the F32 path widens Float -> double explicitly instead of mixing boxed types. ProtoScalarValue sv = decodeScalar(bytes); if (sv.f64_value() != null) { diff --git a/writer/src/main/java/io/github/dfa1/vortex/writer/WriteOptions.java b/writer/src/main/java/io/github/dfa1/vortex/writer/WriteOptions.java index c49ca8049..4d73a3b54 100644 --- a/writer/src/main/java/io/github/dfa1/vortex/writer/WriteOptions.java +++ b/writer/src/main/java/io/github/dfa1/vortex/writer/WriteOptions.java @@ -29,7 +29,7 @@ public static WriteOptions defaults() { } /// Enable cascading compression with up to `depth` recursive levels. - /// Depth 0 preserves current first-match behaviour. + /// Depth 0 preserves current first-match behavior. /// /// @param depth maximum cascade depth /// @return `WriteOptions` with cascading enabled at the given depth diff --git a/writer/src/main/java/io/github/dfa1/vortex/writer/encode/AlpEncodingEncoder.java b/writer/src/main/java/io/github/dfa1/vortex/writer/encode/AlpEncodingEncoder.java index d535b63b8..d929646ea 100644 --- a/writer/src/main/java/io/github/dfa1/vortex/writer/encode/AlpEncodingEncoder.java +++ b/writer/src/main/java/io/github/dfa1/vortex/writer/encode/AlpEncodingEncoder.java @@ -63,12 +63,12 @@ public CascadeStep encodeCascade(DType dtype, Object data, EncodeContext ctx) { return CascadeStep.terminal(encode(dtype, data, ctx)); } - /// Picks `(expE, expF)` by minimising the estimated post-cascade byte size + /// Picks `(expE, expF)` by minimizing the estimated post-cascade byte size /// (FoR + bitpack on the encoded integers, plus per-exception patch overhead) on a - /// stratified sample, breaking ties in favour of the smaller `e - f` gap. 
+ /// stratified sample, breaking ties in favor of the smaller `e - f` gap. /// Mirrors Rust's `ALPFloat::find_best_exponents`. /// - /// The previous heuristic (minimise exception count) picked combinations like + /// The previous heuristic (minimize exception count) picked combinations like /// `(e=14, f=0)` that produced few exceptions but huge encoded mantissas, forcing /// the cascade into Dict+FoR+BitPacked instead of a clean ALP→BitPacked chain. private static int[] findExponentsF64(double[] values) { diff --git a/writer/src/main/java/io/github/dfa1/vortex/writer/encode/BitpackedEncodingEncoder.java b/writer/src/main/java/io/github/dfa1/vortex/writer/encode/BitpackedEncodingEncoder.java index 35fcbe356..e184b7f71 100644 --- a/writer/src/main/java/io/github/dfa1/vortex/writer/encode/BitpackedEncodingEncoder.java +++ b/writer/src/main/java/io/github/dfa1/vortex/writer/encode/BitpackedEncodingEncoder.java @@ -139,7 +139,7 @@ public EncodeResult encode(DType dtype, Object data, EncodeContext ctx) { return new EncodeResult(root, List.of(packed, idxBuf, valBuf), statsMin, statsMax); } - /// Picks the bit-width that minimises `packed_bytes + exceptions_bytes`. + /// Picks the bit-width that minimizes `packed_bytes + exceptions_bytes`. /// Mirrors `vortex-fastlanes::bitpack_compress::best_bit_width`. 
     private static int bestBitWidth(int[] bitWidthFreq, int bytesPerException, int n) {
         if (n == 0) {
diff --git a/writer/src/main/java/io/github/dfa1/vortex/writer/encode/CascadeStep.java b/writer/src/main/java/io/github/dfa1/vortex/writer/encode/CascadeStep.java
index f6648fba9..fea4a1d53 100644
--- a/writer/src/main/java/io/github/dfa1/vortex/writer/encode/CascadeStep.java
+++ b/writer/src/main/java/io/github/dfa1/vortex/writer/encode/CascadeStep.java
@@ -18,8 +18,8 @@
 /// @param partialRoot partially-assembled root encode node (may be `null` when not applicable)
 /// @param ownedBuffers buffers owned directly by the root node, before child buffers are appended
 /// @param openChildren child slots to be filled recursively by the cascading compressor
-/// @param statsMin serialised minimum stat bytes, or `null`
-/// @param statsMax serialised maximum stat bytes, or `null`
+/// @param statsMin serialized minimum stat bytes, or `null`
+/// @param statsMax serialized maximum stat bytes, or `null`
 /// @param applicable `false` if this encoding cannot handle the input data
 @SuppressWarnings("java:S6218") // internal data carrier; record components are arrays of immutable primitives or refs that flow through pipelines without ever being compared.
 public record CascadeStep(
diff --git a/writer/src/main/java/io/github/dfa1/vortex/writer/encode/CascadingCompressor.java b/writer/src/main/java/io/github/dfa1/vortex/writer/encode/CascadingCompressor.java
index a7b493c0d..4e04f5500 100644
--- a/writer/src/main/java/io/github/dfa1/vortex/writer/encode/CascadingCompressor.java
+++ b/writer/src/main/java/io/github/dfa1/vortex/writer/encode/CascadingCompressor.java
@@ -153,7 +153,7 @@ private EncodeResult encodeWithCtx(DType dtype, Object data, EncodeContext ctx)
         }
         // Non-primitives (extension types): find the accepting encoding and splice
         // through it so its cascaded children (e.g.
datetimeparts → days/seconds/subseconds)
-        // are recursively compressed rather than stored as raw primitives. Honour the
+        // are recursively compressed rather than stored as raw primitives. Honor the
         // excluded set so spliceResult's notApplicable retry can rotate to the next
         // accepting encoding (e.g. DateTimePartsEncoding → ExtEncoding when the input
         // is raw storage rather than DateTimePartsData).
@@ -316,7 +316,7 @@ private EncodingEncoder findPrimitiveEncoding(DType dtype, Set exclu
                 return enc;
             }
         }
-        // Fall through to any accepting encoding (still honouring exclusions so that
+        // Fall through to any accepting encoding (still honoring exclusions so that
         // spliceResult's notApplicable retry rotates to the next candidate).
         for (EncodingEncoder enc : encodings) {
             if (excluded.contains(enc.encodingId())) {
diff --git a/writer/src/main/java/io/github/dfa1/vortex/writer/encode/EncodeResult.java b/writer/src/main/java/io/github/dfa1/vortex/writer/encode/EncodeResult.java
index 5949508ca..e0c8ab624 100644
--- a/writer/src/main/java/io/github/dfa1/vortex/writer/encode/EncodeResult.java
+++ b/writer/src/main/java/io/github/dfa1/vortex/writer/encode/EncodeResult.java
@@ -9,8 +9,8 @@
 ///
 /// @param rootNode the root encode node describing the encoding tree structure
 /// @param buffers flat list of data buffers in the order referenced by `rootNode`
-/// @param statsMin serialised minimum value bytes for zone-map pruning, or `null`
-/// @param statsMax serialised maximum value bytes for zone-map pruning, or `null`
+/// @param statsMin serialized minimum value bytes for zone-map pruning, or `null`
+/// @param statsMax serialized maximum value bytes for zone-map pruning, or `null`
 @SuppressWarnings("java:S6218") // internal data carrier; record components are arrays of immutable primitives or refs that flow through pipelines without ever being compared.
 public record EncodeResult(
         EncodeNode rootNode,
@@ -22,8 +22,8 @@ public record EncodeResult(
     ///
     /// @param encodingId the encoding identifier for the leaf node
     /// @param data the single data buffer
-    /// @param min serialised minimum stat bytes, or `null`
-    /// @param max serialised maximum stat bytes, or `null`
+    /// @param min serialized minimum stat bytes, or `null`
+    /// @param max serialized maximum stat bytes, or `null`
     /// @return an [EncodeResult] backed by a single-buffer leaf node
     public static EncodeResult simple(EncodingId encodingId, MemorySegment data, byte[] min, byte[] max) {
         return new EncodeResult(EncodeNode.leaf(encodingId, 0), List.of(data), min, max);
diff --git a/writer/src/main/java/io/github/dfa1/vortex/writer/encode/NullableData.java b/writer/src/main/java/io/github/dfa1/vortex/writer/encode/NullableData.java
index 5709f338b..839b63f84 100644
--- a/writer/src/main/java/io/github/dfa1/vortex/writer/encode/NullableData.java
+++ b/writer/src/main/java/io/github/dfa1/vortex/writer/encode/NullableData.java
@@ -9,7 +9,7 @@
 /// alongside real data. `validity` has the same logical length: `true`
 /// for valid rows, `false` for nulls.
 ///
-/// The writer recognises this shape and emits the `vortex.masked`
+/// The writer recognizes this shape and emits the `vortex.masked`
 /// wire layout: a non-nullable child (the storage) plus an optional Bool
 /// validity child. Readers reconstruct a [io.github.dfa1.vortex.reader.array.MaskedArray].
 ///
diff --git a/writer/src/main/java/io/github/dfa1/vortex/writer/encode/PcoIntMultDetector.java b/writer/src/main/java/io/github/dfa1/vortex/writer/encode/PcoIntMultDetector.java
index 9e965df13..59d76ca13 100644
--- a/writer/src/main/java/io/github/dfa1/vortex/writer/encode/PcoIntMultDetector.java
+++ b/writer/src/main/java/io/github/dfa1/vortex/writer/encode/PcoIntMultDetector.java
@@ -32,7 +32,7 @@ private PcoIntMultDetector() {
     ///
     /// @param latents unsigned-ordered latent values
     /// @param dtypeSize physical width in bits (16/32/64) — restricts GCD search
-    /// @return the chosen base if IntMult is favourable, otherwise empty
+    /// @return the chosen base if IntMult is favorable, otherwise empty
     static OptionalLong choose(long[] latents, int dtypeSize) {
         long[] sample = sample(latents);
         if (sample == null) {
diff --git a/writer/src/main/java/io/github/dfa1/vortex/writer/encode/PrimitiveEncodingEncoder.java b/writer/src/main/java/io/github/dfa1/vortex/writer/encode/PrimitiveEncodingEncoder.java
index 55d7abd73..094f19aa0 100644
--- a/writer/src/main/java/io/github/dfa1/vortex/writer/encode/PrimitiveEncodingEncoder.java
+++ b/writer/src/main/java/io/github/dfa1/vortex/writer/encode/PrimitiveEncodingEncoder.java
@@ -86,7 +86,7 @@ private static MemorySegment encodePrimitive(PType ptype, Object data, Arena are
         };
     }

-    /// Computes the serialised min/max [io.github.dfa1.vortex.core.proto.ProtoScalarValue] pair for a raw
+    /// Computes the serialized min/max [io.github.dfa1.vortex.core.proto.ProtoScalarValue] pair for a raw
     /// primitive array, in the same signed/unsigned/float shape the per-segment stats use. Returns
     /// `null` for an empty array. Shared so the dictionary zone-map path computes per-chunk min/max
     /// identically to the flat path.
@@ -290,7 +290,7 @@ public static byte[][] minMaxStats(PType ptype, Object data) {
         };
     }

-    /// Computes the serialised SUM [io.github.dfa1.vortex.core.proto.ProtoScalarValue] for a raw primitive
+    /// Computes the serialized SUM [io.github.dfa1.vortex.core.proto.ProtoScalarValue] for a raw primitive
     /// array, in the widened shape Rust uses for zone-map sums: signed ints → `i64`, unsigned ints
     /// → `u64`, floats → `f64`. Returns `null` on integer overflow (Rust drops the zone's sum) and
     /// for an empty array. Floats never overflow to `null` (they saturate to infinity).
diff --git a/writer/src/main/java/io/github/dfa1/vortex/writer/encode/RunEndEncodingEncoder.java b/writer/src/main/java/io/github/dfa1/vortex/writer/encode/RunEndEncodingEncoder.java
index 0d1c7b6c2..e56b612e1 100644
--- a/writer/src/main/java/io/github/dfa1/vortex/writer/encode/RunEndEncodingEncoder.java
+++ b/writer/src/main/java/io/github/dfa1/vortex/writer/encode/RunEndEncodingEncoder.java
@@ -45,7 +45,7 @@ public Estimate expectedRatio(DType dtype, Object data, ArrayStats stats) {
         }
         // Skip rule: if every value is distinct, each row is its own run — pure overhead.
         // Defer to the sample-encoded path otherwise; RunEnd's actual compression depends
-        // on run-length distribution which is not summarised by distinct count alone.
+        // on run-length distribution which is not summarized by distinct count alone.
         if (distinct >= n) {
             return Estimate.SKIP;
         }
diff --git a/writer/src/main/java/io/github/dfa1/vortex/writer/encode/VarBinEncodingEncoder.java b/writer/src/main/java/io/github/dfa1/vortex/writer/encode/VarBinEncodingEncoder.java
index 6ee60b069..527ee0218 100644
--- a/writer/src/main/java/io/github/dfa1/vortex/writer/encode/VarBinEncodingEncoder.java
+++ b/writer/src/main/java/io/github/dfa1/vortex/writer/encode/VarBinEncodingEncoder.java
@@ -69,7 +69,7 @@ public EncodeResult encode(DType dtype, Object data, EncodeContext ctx) {
         return new EncodeResult(root, List.of(bytesBuf, offsetsBuf), statsMin, statsMax);
     }

-    /// Computes the serialised min/max string [ProtoScalarValue] pair for a string array, skipping
+    /// Computes the serialized min/max string [ProtoScalarValue] pair for a string array, skipping
     /// `null` entries (lexicographic by [String#compareTo]). Returns `null` when every entry is
     /// `null`. Shared so the dictionary zone-map path computes per-chunk string min/max identically
     /// to the flat path.
diff --git a/writer/src/main/java/io/github/dfa1/vortex/writer/encode/ZstdEncodingEncoder.java b/writer/src/main/java/io/github/dfa1/vortex/writer/encode/ZstdEncodingEncoder.java
index aef2996d9..6d36f9b3a 100644
--- a/writer/src/main/java/io/github/dfa1/vortex/writer/encode/ZstdEncodingEncoder.java
+++ b/writer/src/main/java/io/github/dfa1/vortex/writer/encode/ZstdEncodingEncoder.java
@@ -100,7 +100,7 @@ private EncodeResult encodeVarBin(String[] strings, Arena arena) {

     private EncodeResult buildResult(MemorySegment raw, FrameLayout layout, Arena arena) {
         // Zero-copy: each frame is an arena-native slice of raw, compressed straight into another
-        // arena segment. A single-value-per-array config yields one frame (the prior behaviour).
+        // arena segment. A single-value-per-array config yields one frame (the prior behavior).
         Frames frames = compressFrames(raw, layout, arena);
         EncodeNode root = new EncodeNode(EncodingId.VORTEX_ZSTD, MemorySegment.ofArray(frames.metadata()),
                 new EncodeNode[0], frameBufferIndices(frames.compressed().size(), 0));
diff --git a/writer/src/test/java/io/github/dfa1/vortex/writer/VortexReads.java b/writer/src/test/java/io/github/dfa1/vortex/writer/VortexReads.java
index ab19ae67e..da47ccaba 100644
--- a/writer/src/test/java/io/github/dfa1/vortex/writer/VortexReads.java
+++ b/writer/src/test/java/io/github/dfa1/vortex/writer/VortexReads.java
@@ -10,7 +10,7 @@
 import java.util.ArrayList;
 import java.util.List;

-/// Shared scan-and-collect helpers for writer round-trip tests. Materialise each
+/// Shared scan-and-collect helpers for writer round-trip tests. Materialize each
 /// chunk's values into a heap container before the chunk's arena closes; the
 /// returned arrays/lists outlive the scan lifecycle.
 final class VortexReads {
diff --git a/writer/src/test/java/io/github/dfa1/vortex/writer/WriterZoneMapTest.java b/writer/src/test/java/io/github/dfa1/vortex/writer/WriterZoneMapTest.java
index df7648c05..95d71957d 100644
--- a/writer/src/test/java/io/github/dfa1/vortex/writer/WriterZoneMapTest.java
+++ b/writer/src/test/java/io/github/dfa1/vortex/writer/WriterZoneMapTest.java
@@ -457,7 +457,7 @@ void zoneMaps_perTypeStatsDecodePerZoneMinMax(PType ptype, @TempDir Path tmp) th
         }
     }

-    /// Reads the per-zone stat at `idx` from a materialised min/max segment, widened to double for
+    /// Reads the per-zone stat at `idx` from a materialized min/max segment, widened to double for
     /// uniform assertion across the fixed-width primitive types.
     private static double readStat(MemorySegment seg, PType ptype, int idx) {
         return switch (ptype) {