Merged
6 changes: 3 additions & 3 deletions .claude/skills/improve-performance.md
@@ -17,7 +17,7 @@ performance improvements in Java/Maven projects.
## Workflow Overview

```
-setup → benchmark → profile → analyse → change → repeat
+setup → benchmark → profile → analyze → change → repeat
```

---
@@ -63,7 +63,7 @@ Store the baseline score. Compare every subsequent run against it.

---

-## Step 4 — Analyse results
+## Step 4 — Analyze results

Key things to look for:

@@ -128,7 +128,7 @@ Common optimizations to consider (in order of typical impact):
constant-folds the stride / alignment / order. Inline `ValueLayout.JAVA_LONG_UNALIGNED` on each
call defeats this.
4. **Use `getAtIndex` / `setAtIndex`** in tight loops over a `MemorySegment` — stride is implicit,
-bounds check hoists, and the auto-vectoriser reads the shape cleanly.
+bounds check hoists, and the auto-vectorizer reads the shape cleanly.
5. **Aligned arena allocation** — `arena.allocate(n, 64)` keeps SIMD-friendly addresses.
6. **Improve data locality** — colocate fields accessed together, prefer flat arrays / segments
over linked structures.
24 changes: 12 additions & 12 deletions CHANGELOG.md

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions CLAUDE.md
@@ -204,6 +204,9 @@ in the Rust source for the exact schema, then implement from spec.
## Code style

- 4-space indent, **zero SonarQube bugs/smells**, no `sun.misc.Unsafe` or internal JDK APIs.
+- **American English everywhere** (javadoc, comments, identifiers):
+`recognize`/`optimize`/`finalize`/`serialize`/`normalize`/`behavior`/`color` — never
+`-ise`/`-isation`/`-our`. Matches the JDK (`Object.finalize`, `Serializable`).
- Prefer explicit over clever; fail fast on unhandled cases.
- Idiomatic modern Java: reuse the JDK (override `Iterator.forEachRemaining`, don't invent
`forEachChunk`; use `Optional`, records, sealed types, pattern switches, virtual threads, FFM).
4 changes: 2 additions & 2 deletions SECURITY.md
@@ -15,7 +15,7 @@ Use GitHub's private vulnerability reporting:
1. Open <https://github.com/dfa1/vortex-java/security/advisories/new>.
2. Fill in the form. Include a minimal reproduction (a `.vortex` file or the bytes that
trigger the issue) where possible.
-3. You'll receive an acknowledgement within **3 business days**.
+3. You'll receive an acknowledgment within **3 business days**.

If GitHub's reporting flow is unavailable, email the maintainer at the address on the project's
Maven Central metadata.
@@ -42,7 +42,7 @@ In scope:
- Any malformed `.vortex` input that causes silent data corruption — wrong row count,
wrong values, or a misaligned column with a successful return.
- Any vulnerability in `VortexWriter` that produces files which would later trigger the
-above behaviours when read.
+above behaviors when read.

Out of scope:

6 changes: 3 additions & 3 deletions TODO.md
@@ -62,7 +62,7 @@ Per-encoding gotchas:
and `vortex-jni`; assert both throw or both return identical row count + values. Reuse
`RustWritesJavaReadsIntegrationTest` harness.
- [ ] **OSS-Fuzz submission** — Jazzer is a first-class OSS-Fuzz engine; submit the project
-once the corpus + targets stabilise. Free continuous fuzzing.
+once the corpus + targets stabilize. Free continuous fuzzing.

## Build

@@ -85,8 +85,8 @@ Per-encoding gotchas:

## Compute

-- [ ] **Compute primitives — masks, kernels, no-materialise** — pushdown filter/compare/aggregate
-kernels operating on Lazy arrays without materialising. See [ADR-0013](docs/adr/0013-compute-primitives.md)
+- [ ] **Compute primitives — masks, kernels, no-materialize** — pushdown filter/compare/aggregate
+kernels operating on Lazy arrays without materializing. See [ADR-0013](docs/adr/0013-compute-primitives.md)
(Proposed). Gate: a concrete downstream consumer (e.g. the vortex-arrow bridge or filter pushdown).
Done: §6 read-side surface — `ScanIterator.columnZoneStats(col)` exposes per-zone
min/max/sum/null count, decoding sum from the `vortex.stats` zone-map table (matches files from
@@ -57,7 +57,7 @@
/// A single Vortex file exposed to Calcite as a flat SQL table with column projection and
/// zone-map filter push-down.
///
-/// Projection (`projects`) is honoured exactly — only the requested columns are decoded and
+/// Projection (`projects`) is honored exactly — only the requested columns are decoded and
/// returned. Filters (`filters`) that translate to a [RowFilter] are pushed into the scan for
/// *chunk skipping* via zone-map statistics, but are **left in Calcite's list** rather than
/// consumed: zone-map pruning is approximate (it drops whole chunks that cannot match, not
@@ -521,7 +521,7 @@ private static Match classify(RowFilter filter, int zone,

/// Classifies one zone against a column-bound [Predicate] from the zone's statistics `s`. The
/// comparison ops carry the same three-valued-logic semantics as before the [RowFilter] /
-/// [Predicate] unification: an unrecognised stat shape or a partially-overlapping zone is
+/// [Predicate] unification: an unrecognized stat shape or a partially-overlapping zone is
/// [Match#BOUNDARY], a zone provably outside the predicate is [Match#OUT], and a zone every row
/// of which matches (which, for a value comparison, also requires the zone to carry no nulls) is
/// [Match#IN]. The composite and range predicates ([Predicate.Between] / [Predicate.And] /
@@ -736,7 +736,7 @@ public Enumerator<Object[]> enumerator() {
}

/// Streaming [Enumerator] over a Vortex scan: advances chunk by chunk, decoding each requested
-/// column once per chunk and materialising one `Object[]` row per [#moveNext()]. Rows are not
+/// column once per chunk and materializing one `Object[]` row per [#moveNext()]. Rows are not
/// retained, so the working set stays at one chunk rather than the whole result.
private final class VortexEnumerator implements Enumerator<Object[]> {

@@ -899,7 +899,7 @@ private static void collectColumns(RowFilter filter, java.util.Set<String> out)
}
}

-/// Lenient translation for the scan path ([#toRowFilter]): an unrecognised node (or `AND`
+/// Lenient translation for the scan path ([#toRowFilter]): an unrecognized node (or `AND`
/// conjunct) is simply dropped, since the scan re-checks every row so a partially captured filter
/// is still correct, just less selective for zone-map pruning. Delegates to the shared
/// [#comparison] dispatch with `strict = false`.
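
Reviewer note: the `strict` flag mentioned in this javadoc can be illustrated with a toy model. The sketch below is not the adapter's code: `ConjunctTranslationSketch`, `translateOne`, `translateAnd`, and the string-based "grammar" are all hypothetical stand-ins, used only to show how a lenient pass drops an untranslatable `AND` conjunct while a strict pass gives up on the whole filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Toy model of lenient vs. strict AND translation; every name here is hypothetical. */
final class ConjunctTranslationSketch {

    /** Toy grammar (assumption): "<column> <op> <integer>" is translatable, anything else is not. */
    static Optional<String> translateOne(String conjunct) {
        return conjunct.matches("\\w+ (<=|>=|=|<|>) -?\\d+")
                ? Optional.of(conjunct)
                : Optional.empty();
    }

    /**
     * Lenient (strict = false): untranslatable conjuncts are dropped and the rest are kept.
     * Strict (strict = true): a single untranslatable conjunct collapses the result to empty.
     */
    static Optional<List<String>> translateAnd(List<String> conjuncts, boolean strict) {
        List<String> kept = new ArrayList<>();
        for (String conjunct : conjuncts) {
            Optional<String> translated = translateOne(conjunct);
            if (translated.isPresent()) {
                kept.add(translated.get());
            } else if (strict) {
                return Optional.empty();
            }
            // Lenient mode falls through here: the conjunct is simply skipped.
        }
        return kept.isEmpty() ? Optional.empty() : Optional.of(List.copyOf(kept));
    }

    public static void main(String[] args) {
        List<String> conjuncts = List.of("id > 5", "UPPER(name) = 'X'");
        System.out.println(translateAnd(conjuncts, false));
        System.out.println(translateAnd(conjuncts, true));
    }
}
```

Dropping a conjunct is only safe on a path that re-checks every row afterwards, which is the reason the javadoc gives for the lenient mode; a path that answers from statistics alone needs the strict variant.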
@@ -908,7 +908,7 @@ private static Optional<RowFilter> toComparison(RexNode node, List<String> names
}

/// Strict counterpart of [#toComparison]: the same column-vs-literal / `AND` grammar, but a
-/// single unrecognised node (or one `AND` conjunct) collapses the whole result to empty rather
+/// single unrecognized node (or one `AND` conjunct) collapses the whole result to empty rather
/// than being dropped, and bare `IS NULL` / `IS NOT NULL` are also translatable. Used by
/// [#translatePushedFilters] so aggregate push-down answers from stats only when the [RowFilter]
/// captures the predicate in full. Delegates to the shared [#comparison] dispatch with
@@ -228,7 +228,7 @@ private static Pushdown runPushdown(Path file) throws Exception {
}
}

-/// Runs a query, prints every row as a labelled table, and returns the row count.
+/// Runs a query, prints every row as a labeled table, and returns the row count.
private static long printAndCount(Connection conn, String title, String sql) throws Exception {
System.out.printf("%n[%s]%n", title);
long rows = 0;
@@ -26,7 +26,7 @@
import static org.assertj.core.api.Assertions.assertThatThrownBy;

/// Coverage for the adapter surface across every column type: SQL type mapping
-/// ([VortexTable#getRowType]), row materialisation ([VortexTable] scan + enumerator),
+/// ([VortexTable#getRowType]), row materialization ([VortexTable] scan + enumerator),
/// [VortexSchema] lookup, and [VortexAggregates].
class VortexAdapterCoverageTest {

@@ -94,7 +94,7 @@ void getRowType_mapsEveryColumnToItsSqlType() {
}

@Test
-void scan_materialisesEveryColumnToItsJavaType() {
+void scan_materializesEveryColumnToItsJavaType() {
// Given
VortexTable table = new VortexTable(file);

@@ -45,7 +45,7 @@ public final class LazyGridSource implements AutoCloseable {
///
/// @param handle open Vortex file handle owned by `worker`
/// @param worker I/O dispatcher for the handle's confined thread
-/// @return initialised source
+/// @return initialized source
/// @throws InterruptedException if the calling thread is interrupted while
/// waiting for the worker
public static LazyGridSource open(VortexHandle handle, IoWorker worker)
@@ -48,17 +48,17 @@ public static String moveTo(int row, int col) {
return CSI + row + ";" + col + "H";
}

-/// Standard SGR foreground colour (codes 30-37 normal, 90-97 bright).
+/// Standard SGR foreground color (codes 30-37 normal, 90-97 bright).
///
-/// @param code SGR colour code
+/// @param code SGR color code
/// @return CSI sequence
public static String fg(int code) {
return CSI + code + "m";
}

-/// Standard SGR background colour (codes 40-47 normal, 100-107 bright).
+/// Standard SGR background color (codes 40-47 normal, 100-107 bright).
///
-/// @param code SGR colour code
+/// @param code SGR color code
/// @return CSI sequence
public static String bg(int code) {
return CSI + code + "m";
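
Reviewer note: for readers unfamiliar with SGR sequences, here is a minimal sketch of how `fg` and `bg` helpers like the ones in this hunk are typically combined. It assumes `CSI` is the two-character sequence `ESC [`, which the diff does not show; `AnsiSketch` and `RESET` are illustrative names that do not appear in the PR.

```java
/** Minimal sketch of SGR color sequences; AnsiSketch and RESET are illustrative names. */
final class AnsiSketch {

    /** Assumption: CSI is ESC followed by '[', as on xterm-compatible terminals. */
    private static final String CSI = "\u001B[";

    /** SGR 0 resets all attributes. */
    static final String RESET = CSI + "0m";

    /** Foreground color: codes 30-37 normal, 90-97 bright. */
    static String fg(int code) {
        return CSI + code + "m";
    }

    /** Background color: codes 40-47 normal, 100-107 bright. */
    static String bg(int code) {
        return CSI + code + "m";
    }

    public static void main(String[] args) {
        // Red text (31) on a white background (47), then reset.
        System.out.println(fg(31) + bg(47) + "red on white" + RESET);
    }
}
```

The two methods emit the same byte pattern; foreground and background are distinguished only by the numeric range of `code`, which is why both bodies in the hunk are identical.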
@@ -7,9 +7,9 @@

/// Translates raw stdin bytes into [Key] events.
///
-/// Recognises common CSI sequences emitted by xterm-compatible terminals:
+/// Recognizes common CSI sequences emitted by xterm-compatible terminals:
/// `ESC [ A/B/C/D` for arrows, `ESC [ 5~ / 6~` for PgUp/PgDn,
-/// `ESC [ H / F` and `ESC [ 1~ / 4~` for Home/End. Any unrecognised
+/// `ESC [ H / F` and `ESC [ 1~ / 4~` for Home/End. Any unrecognized
/// escape sequence is dropped and decoding continues with the next byte.
///
/// Stateless across reads - call [#next(InputStream)] for each event.
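
Reviewer note: the CSI table in this javadoc can be sketched as a small stateless decoder. This is a hypothetical reconstruction, not the class under review: `KeyDecoderSketch`, its `Key` enum, and the `decode` helper are invented for illustration, SS3 (`ESC O`) handling is omitted, and returning `ESCAPE` for an unknown sequence is an assumption taken from the test names elsewhere in this PR.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for the decoder in the diff; handles only the CSI sequences listed above. */
final class KeyDecoderSketch {

    enum Key { UP, DOWN, RIGHT, LEFT, HOME, END, PAGE_UP, PAGE_DOWN, ESCAPE, CHAR, EOF }

    private static final int ESC = 0x1B;

    /** Reads one key event; keeps no state between calls. */
    static Key next(InputStream in) throws IOException {
        int first = in.read();
        if (first < 0) {
            return Key.EOF;
        }
        if (first != ESC) {
            return Key.CHAR; // printable or control byte; not modeled further here
        }
        if (in.read() != '[') {
            return Key.ESCAPE; // not a CSI introducer (SS3 is left out of this sketch)
        }
        int code = in.read();
        return switch (code) {
            case 'A' -> Key.UP;
            case 'B' -> Key.DOWN;
            case 'C' -> Key.RIGHT;
            case 'D' -> Key.LEFT;
            case 'H' -> Key.HOME;
            case 'F' -> Key.END;
            case '1' -> expectTilde(in, Key.HOME);
            case '4' -> expectTilde(in, Key.END);
            case '5' -> expectTilde(in, Key.PAGE_UP);
            case '6' -> expectTilde(in, Key.PAGE_DOWN);
            default -> Key.ESCAPE; // unknown final byte
        };
    }

    /** Numeric CSI sequences end in '~'; anything else is treated as unknown. */
    private static Key expectTilde(InputStream in, Key key) throws IOException {
        return in.read() == '~' ? key : Key.ESCAPE;
    }

    /** Convenience wrapper for in-memory input. */
    static Key decode(int... bytes) {
        byte[] raw = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            raw[i] = (byte) bytes[i];
        }
        try {
            return next(new ByteArrayInputStream(raw));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(ESC, '[', 'A'));
        System.out.println(decode(ESC, '[', '5', '~'));
    }
}
```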
@@ -107,7 +107,7 @@ void doubleValueAgainstLongColumn_returnsOk() {

@Test
void unknownOperator_returnsUsageError() {
-// Given — a lone '!' is a recognised operator char but not a valid operator
+// Given — a lone '!' is a recognized operator char but not a valid operator
// When
CliTestSupport.Captured result = capture(() ->
FilterCommand.run(new String[]{"filter", file.toString(), "id", "!", "1"}));
@@ -96,7 +96,7 @@ void next_eof_returnsEof() throws IOException {

@Test
void next_unknownCsiLetter_yieldsEscape() throws IOException {
-// Given — ESC [ Z is xterm reverse-tab; we don't recognise it
+// Given — ESC [ Z is xterm reverse-tab; we don't recognize it
ByteArrayInputStream in = bytes(0x1B, '[', 'Z');

// When
@@ -135,7 +135,7 @@ void next_ss3SequenceVariant_decodesArrows() throws IOException {

@Test
void next_unknownEscapePrefix_yieldsEscape() throws IOException {
-// Given — `ESC X` (X is neither '[' nor 'O') is not a recognised
+// Given — `ESC X` (X is neither '[' nor 'O') is not a recognized
// CSI or SS3 sequence. Must return Escape rather than try to decode further.
ByteArrayInputStream in = bytes(0x1B, 'X', 'A');

@@ -95,7 +95,7 @@ public enum EncodingId {
/// callers that demand a known id chain `.orElseThrow(...)`.
///
/// @param id raw encoding id string (e.g. `"vortex.primitive"`)
-/// @return matching constant, or empty if not recognised
+/// @return matching constant, or empty if not recognized
public static Optional<EncodingId> parse(String id) {
return Optional.ofNullable(LOOKUP.get(id));
}
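
Reviewer note: the non-throwing `parse` contract above, together with the `.orElseThrow(...)` chaining the javadoc mentions, might look like this from the caller's side. `EncodingIdSketch` is a hypothetical miniature of the enum; only `"vortex.primitive"` comes from the diff, and the second constant and the `wireId` field name are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical miniature of the enum in the diff; only "vortex.primitive" is taken from the source. */
enum EncodingIdSketch {
    PRIMITIVE("vortex.primitive"),
    EXAMPLE("example.encoding"); // made-up id, present only so the table has two entries

    private static final Map<String, EncodingIdSketch> LOOKUP = new HashMap<>();

    static {
        for (EncodingIdSketch id : values()) {
            LOOKUP.put(id.wireId, id);
        }
    }

    private final String wireId;

    EncodingIdSketch(String wireId) {
        this.wireId = wireId;
    }

    /** Non-throwing lookup: an unknown id yields an empty Optional. */
    static Optional<EncodingIdSketch> parse(String id) {
        return Optional.ofNullable(LOOKUP.get(id));
    }

    public static void main(String[] args) {
        // Open-world caller: tolerate unknown ids.
        System.out.println(parse("acme.unknown").isPresent());

        // Closed-world caller: demand a known id and fail fast otherwise.
        EncodingIdSketch known = parse("vortex.primitive")
                .orElseThrow(() -> new IllegalArgumentException("unknown encoding id"));
        System.out.println(known);
    }
}
```

Returning `Optional` keeps the choice between tolerating and rejecting unknown ids at the call site instead of baking it into the lookup.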
@@ -104,7 +104,7 @@ int beginLenDelim() {
return mark;
}

-/// Finalises a length-delimited region opened by [#beginLenDelim()].
+/// Finalizes a length-delimited region opened by [#beginLenDelim()].
/// Writes the payload length as a varint at the reserved offset and shifts the payload
/// left if the varint is shorter than 5 bytes.
void endLenDelim(int mark) {
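
Reviewer note: the reserve-then-backpatch scheme this javadoc describes can be sketched as follows. This is an illustration under assumptions, not the writer in the PR: `LenDelimSketch`, its growable `byte[]` buffer, and the helpers `writeByte`, `toByteArray`, and `varintSize` are hypothetical; only the `beginLenDelim` / `endLenDelim` names and the 5-byte reservation come from the hunk.

```java
import java.util.Arrays;

/** Sketch of reserve-then-backpatch length prefixes; everything except the two method names is hypothetical. */
final class LenDelimSketch {

    /** A 32-bit length needs at most 5 varint bytes, so that is what gets reserved. */
    private static final int MAX_VARINT32 = 5;

    private byte[] buf = new byte[64];
    private int pos;

    /** Reserves room for the length prefix and returns the offset where it will be written. */
    int beginLenDelim() {
        int mark = pos;
        ensure(MAX_VARINT32);
        pos += MAX_VARINT32;
        return mark;
    }

    /** Writes the payload length at {@code mark} and closes the gap if the varint is shorter than 5 bytes. */
    void endLenDelim(int mark) {
        int payloadStart = mark + MAX_VARINT32;
        int length = pos - payloadStart;
        int prefixSize = varintSize(length);

        // Shift the payload left so it directly follows the (possibly shorter) prefix.
        // System.arraycopy copies overlapping ranges correctly.
        System.arraycopy(buf, payloadStart, buf, mark + prefixSize, length);

        // Backpatch the length as a little-endian base-128 varint.
        int value = length;
        int at = mark;
        while ((value & ~0x7F) != 0) {
            buf[at++] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        buf[at] = (byte) value;

        pos = mark + prefixSize + length;
    }

    void writeByte(int b) {
        ensure(1);
        buf[pos++] = (byte) b;
    }

    byte[] toByteArray() {
        return Arrays.copyOf(buf, pos);
    }

    /** Number of bytes the varint encoding of a non-negative value occupies. */
    static int varintSize(int value) {
        int size = 1;
        while ((value & ~0x7F) != 0) {
            size++;
            value >>>= 7;
        }
        return size;
    }

    private void ensure(int extra) {
        if (pos + extra > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, pos + extra));
        }
    }
}
```

Reserving the worst case up front avoids measuring the payload before writing it; the cost is one `arraycopy` of the payload whenever the prefix turns out shorter than 5 bytes.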
@@ -23,7 +23,7 @@ void parse_knownIds_returnEnumConstant(String wire, ExtensionId expected) {

@Test
void parse_unknownId_returnsEmpty() {
-// Given — open-world extension id; library doesn't recognise it
+// Given — open-world extension id; library doesn't recognize it
// When / Then — non-throwing miss so the registry can route to passthrough
assertThat(ExtensionId.parse("acme.geopoint")).isEmpty();
}
2 changes: 1 addition & 1 deletion docs/adr/0001-split-read-and-write-runtimes.md
@@ -302,7 +302,7 @@ of CI / integration-test fallout, plus reviewer time. Not a weekend.

- **Side-by-side period drift.** Phases 1–3 leave both the old `Registry`
and the new `ReadRegistry`/`WriteRegistry` registered for each encoding
-during transition. Risk: divergent behaviour if a bug fix lands on one
+during transition. Risk: divergent behavior if a bug fix lands on one
side and not the other. Mitigation: integration tests run against both
paths during the transition; the old `Registry` becomes a thin forwarder
early in Phase 1.
2 changes: 1 addition & 1 deletion docs/adr/0002-pluggable-dtype-layout-compute.md
@@ -165,7 +165,7 @@ all of the following, in writing:

1. **A named downstream consumer.** Not a hypothetical "someone might
want X." A concrete project / team with a name and a use case.
-2. **A spec for the new variant.** Wire format, serialisation,
+2. **A spec for the new variant.** Wire format, serialization,
round-trip semantics. Not just "register a custom type" in the
abstract.
3. **Confirmation the existing `Extension` mechanism does not fit.**
10 changes: 5 additions & 5 deletions docs/adr/0006-benchmark-publishing.md
@@ -17,15 +17,15 @@ results to `gh-pages/dev/bench` via `benchmark-action/github-action-benchmark`.
GitHub Actions shared runners share physical hosts with other tenants. JMH
benchmarks are sensitive to CPU frequency scaling, SMT contention, and OS
scheduler noise. Typical variance on shared runners is **20–40%** per run —
-larger than the signal for a 10–15% decode optimisation. A number published
+larger than the signal for a 10–15% decode optimization. A number published
from a shared runner cannot be cited, compared across commits, or used to
claim a performance target is met.

The current workflow does carry a regression threshold (`alert-threshold: 130%`)
with `comment-on-alert: true`. That is a **2.3 σ** guard relative to the noise
floor — it catches catastrophic regressions (5–10×) but misses 10–30%
regressions, which are the ones that actually matter during encoder
-optimisation work.
+optimization work.

### The alternative: local-run publish

@@ -95,7 +95,7 @@ cd -

Without the CI workflow, regressions are caught by:

-1. **Running `bench-publish` before and after an optimisation PR.** The
+1. **Running `bench-publish` before and after an optimization PR.** The
commit SHAs in the filenames make A/B comparison mechanical.
2. **Adding a JMH regression test** (`@BenchmarkMode(Throughput)` with an
`assert` or a baseline comparison in the performance module) — not
@@ -141,13 +141,13 @@ longer updated.
- `bench-publish` requires local Java + Maven build; not runnable from a
mobile device / tablet.
- Numbers accumulate only when the developer actively publishes. Long
-gaps between optimisation cycles leave stale README tables.
+gaps between optimization cycles leave stale README tables.

### Risk
- If a second contributor joins and cannot reproduce numbers on different
hardware, the single-machine baseline becomes a coordination problem.
Mitigation: `benchmark-meta.json` documents the reference hardware;
-normalise by throughput ratio (new hardware / reference) rather than
+normalize by throughput ratio (new hardware / reference) rather than
absolute scores.
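
Reviewer note: one possible reading of "normalize by throughput ratio" is sketched below. The ADR does not spell out the procedure, so this is an assumption: the same calibration benchmark is run on both machines, the ratio of the two scores becomes a hardware factor, and scores from the new machine are divided by it. `BenchNormalizeSketch` and its method names are hypothetical.

```java
/** One possible reading of the ADR's throughput-ratio normalization; all names are hypothetical. */
final class BenchNormalizeSketch {

    /** Hardware factor: calibration throughput on the new machine over the reference machine. */
    static double hardwareRatio(double calibrationOnNew, double calibrationOnReference) {
        return calibrationOnNew / calibrationOnReference;
    }

    /** Rescales a throughput measured on the new machine to the reference machine's scale. */
    static double toReferenceScale(double scoreOnNew, double ratio) {
        return scoreOnNew / ratio;
    }

    public static void main(String[] args) {
        // Made-up numbers: the new machine runs the calibration benchmark 1.5x faster.
        double ratio = hardwareRatio(1500.0, 1000.0);
        System.out.println(toReferenceScale(3000.0, ratio));
    }
}
```

A single scalar factor assumes both machines speed up every benchmark by the same proportion, which rarely holds exactly, so normalized scores are better suited to spotting large shifts than to claiming precise targets.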

## References