[BWARE] Add sort support to compressed column groups #2507

Merged
Baunsgaard merged 7 commits into apache:main from Baunsgaard:split/compressedSort on Jun 25, 2026
Conversation

@Baunsgaard

Contributor

Implement single-column sort for compressed matrices via a new AColGroup.sort that reorders the dictionary and remaps indexes. Add CLALibSort driver, IDictionary/Dictionary sort with shared index permutation, and per-column-group sort implementations.
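
To illustrate the idea of reordering the dictionary and remapping indexes, here is a minimal sketch for a dense dictionary encoding (one dictionary index per row). The class and method names are hypothetical and are not the AColGroup or IDictionary API from this PR; the sketch only assumes that the dictionary sort yields a permutation that the index structure can reuse.

```java
import java.util.Arrays;

// Hypothetical sketch: names are illustrative, not SystemDS APIs.
final class DictionarySortSketch {

    // perm[i] is the old dictionary position of the i-th smallest value.
    static int[] sortPermutation(double[] dict) {
        Integer[] order = new Integer[dict.length];
        for (int i = 0; i < dict.length; i++)
            order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(dict[a], dict[b]));
        int[] perm = new int[dict.length];
        for (int i = 0; i < perm.length; i++)
            perm[i] = order[i];
        return perm;
    }

    // Dictionary values rearranged into ascending order.
    static double[] applyToDictionary(double[] dict, int[] perm) {
        double[] sorted = new double[dict.length];
        for (int i = 0; i < perm.length; i++)
            sorted[i] = dict[perm[i]];
        return sorted;
    }

    // Row indexes of the sorted column: one run per dictionary entry,
    // with run lengths taken from the per-value counts of the input.
    static int[] remapIndexes(int[] rowToDict, int[] perm) {
        int[] counts = new int[perm.length];
        for (int idx : rowToDict)
            counts[idx]++;
        int[] out = new int[rowToDict.length];
        int pos = 0;
        for (int newIdx = 0; newIdx < perm.length; newIdx++) {
            int n = counts[perm[newIdx]];
            Arrays.fill(out, pos, pos + n, newIdx);
            pos += n;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] dict = {3.0, 1.0, 2.0};
        int[] rows = {0, 1, 1, 2, 0}; // column values: 3, 1, 1, 2, 3
        int[] perm = sortPermutation(dict);
        System.out.println(Arrays.toString(applyToDictionary(dict, perm)));
        System.out.println(Arrays.toString(remapIndexes(rows, perm)));
    }
}
```

Only the distinct values are compared; the row-level work is a counting pass followed by run fills, which is why the column can stay compressed.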

@Baunsgaard force-pushed the split/compressedSort branch from 3bd84e0 to 61ac368 on June 24, 2026 13:46
@Baunsgaard changed the title from "Add sort support to compressed column groups" to "[BWARE] Add sort support to compressed column groups" on Jun 24, 2026
@codecov

codecov Bot commented Jun 24, 2026


Codecov Report

❌ Patch coverage is 76.36364% with 52 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.55%. Comparing base (e295b40) to head (539e3d4).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ysds/runtime/compress/colgroup/ColGroupSDCFOR.java | 0.00% | 24 Missing ⚠️ |
| .../apache/sysds/runtime/compress/lib/CLALibSort.java | 91.48% | 1 Missing and 3 partials ⚠️ |
| ...e/sysds/runtime/compress/colgroup/ColGroupSDC.java | 88.00% | 1 Missing and 2 partials ⚠️ |
| ...time/compress/colgroup/ColGroupSDCSingleZeros.java | 72.72% | 1 Missing and 2 partials ⚠️ |
| ...ysds/runtime/compress/colgroup/ColGroupDDCLZW.java | 0.00% | 2 Missing ⚠️ |
| ...s/runtime/compress/colgroup/ColGroupSDCSingle.java | 81.81% | 1 Missing and 1 partial ⚠️ |
| ...ds/runtime/compress/colgroup/ColGroupSDCZeros.java | 91.66% | 1 Missing and 1 partial ⚠️ |
| ...untime/compress/colgroup/ColGroupUncompressed.java | 50.00% | 1 Missing and 1 partial ⚠️ |
| ...ess/colgroup/dictionary/MatrixBlockDictionary.java | 50.00% | 1 Missing and 1 partial ⚠️ |
| ...sysds/runtime/compress/colgroup/ColGroupEmpty.java | 0.00% | 1 Missing ⚠️ |

... and 7 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2507      +/-   ##
============================================
- Coverage     71.56%   71.55%   -0.02%     
- Complexity    49052    49113      +61     
============================================
  Files          1574     1575       +1     
  Lines        189565   189784     +219     
  Branches      37188    37232      +44     
============================================
+ Hits         135658   135791     +133     
- Misses        43422    43496      +74     
- Partials      10485    10497      +12     


Add CompressedSortTest covering the single-column sort of compressed
column groups (DDC, SDC, SDCSingle, SDCSingleZeros, CONST) by comparing
the decompressed result against an ascending reference sort.

Fix ColGroupSDCSingleZeros.sort and ColGroupSDCSingle.sort, which both
indexed the per-value counts array beyond the dictionary sort length
(throwing ArrayIndexOutOfBoundsException) and never advanced the offset
cursor. These encodings hold a single non-default value, so the sorted
column is one contiguous block of that value placed before or after the
default values, depending on its sign or ordering relative to the default.
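
A minimal sketch of the contiguous-block placement described above, assuming an ascending sort and a representation where the non-default value is stored as a list of row offsets. The names are hypothetical and this is not the ColGroupSDCSingle implementation; for the zeros variant the default would be 0, so the sign of the value decides the side.

```java
import java.util.Arrays;

// Hypothetical sketch: names and layout are illustrative, not the SystemDS code.
final class SdcSingleSortSketch {

    // Row offsets of the single non-default value after an ascending sort.
    static int[] sortedOffsets(double value, double defaultValue, int count, int nRows) {
        // Smaller than the default: the block goes first; otherwise it goes last.
        int start = value < defaultValue ? 0 : nRows - count;
        int[] offsets = new int[count];
        for (int i = 0; i < count; i++)
            offsets[i] = start + i;
        return offsets;
    }

    // Expands the encoding back to a dense column, for checking only.
    static double[] decompress(double value, double defaultValue, int[] offsets, int nRows) {
        double[] col = new double[nRows];
        Arrays.fill(col, defaultValue);
        for (int off : offsets)
            col[off] = value;
        return col;
    }

    public static void main(String[] args) {
        int[] before = sortedOffsets(-1.0, 0.0, 2, 5);
        int[] after = sortedOffsets(4.0, 0.0, 2, 5);
        System.out.println(Arrays.toString(decompress(-1.0, 0.0, before, 5)));
        System.out.println(Arrays.toString(decompress(4.0, 0.0, after, 5)));
    }
}
```

Because the offsets are generated directly from the count, no per-value counts array is indexed and no offset cursor has to be advanced.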

Route the order (SortIndex) reorg through CLALibSort so a single column
held in a single column group is sorted ascending while staying
compressed. Multiple columns, multiple groups, descending order, index
return, or encodings without a sort implementation fall back to a
decompressed reorg via a shared fallback in CLALibReorg.

Rewrite CLALibSort to expose a SortIndex-based entry that returns null
when the compressed fast-path does not apply, instead of the previous
unused, semantically inconsistent sortOperations-style helper.
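
The "return null when the fast path does not apply" contract can be sketched as below. This is an illustration under assumptions: the method names, the flat parameter list, and the use of a Supplier for the fallback are hypothetical and do not mirror the CLALibSort or CLALibReorg signatures; the conditions are the ones listed in the commit message above.

```java
import java.util.Arrays;
import java.util.function.Supplier;

// Hypothetical sketch: names and signatures are illustrative, not SystemDS APIs.
final class SortDispatchSketch {

    // Conditions under which the column can be sorted while staying compressed.
    static boolean fastPathApplies(int nCols, int nGroups, boolean descending,
            boolean indexReturn, boolean encodingHasSort) {
        return nCols == 1 && nGroups == 1 && !descending && !indexReturn && encodingHasSort;
    }

    // Stand-in for the compressed entry point: a result, or null meaning "not handled here".
    static double[] tryCompressedSort(double[] column, int nCols, int nGroups,
            boolean descending, boolean indexReturn, boolean encodingHasSort) {
        if (!fastPathApplies(nCols, nGroups, descending, indexReturn, encodingHasSort))
            return null;
        double[] out = column.clone();
        Arrays.sort(out); // placeholder for the column-group sort
        return out;
    }

    // Caller side: use the fast result if present, otherwise run the shared fallback.
    static double[] orElseFallback(double[] fast, Supplier<double[]> fallback) {
        return fast != null ? fast : fallback.get();
    }

    public static void main(String[] args) {
        double[] col = {2.0, 1.0, 3.0};
        double[] fast = tryCompressedSort(col, 1, 1, true, false, true); // descending: not handled
        double[] res = orElseFallback(fast, () -> new double[] {3.0, 2.0, 1.0});
        System.out.println(Arrays.toString(res));
    }
}
```

Returning null keeps the decision in one place: the caller does not need to duplicate the eligibility checks before choosing the decompressed path.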

Fix ColGroupUncompressed.sort, which built a quantile value/weight table
via sortOperations instead of ordering the column; it now reorders the
rows ascending.

Expand CompressedSortTest to drive the order operation end to end through
reorgOperations, covering compressed sorting (DDC, SDC variants, CONST,
uncompressed column group) and the decompression fallbacks (descending
and multi-column).
… table

Restore the CompressedMatrixBlock.sortOperations(weights, result, k)
override so the qsort/median/quantile path (SortKeys lop) runs through
CLALibSort instead of always decompressing.

For the unweighted single-column single-group case, CLALibSort now sorts
the few distinct values via the column-group sort and builds the exact
(1 + nnz) x 2 value/weight table that MatrixBlock.sortOperations
produces: one row per non-zero value (weight 1) plus a single collapsed
zero row, ordered ascending. This keeps downstream pickValue/median/IQM
results bit-identical (their averaging logic depends on the per-element
table layout) while avoiding a full-length sort. Weighted, multi-column,
multi-group, or unsupported encodings fall back to a decompressed sort.
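
As a rough sketch of the table layout described above: given the distinct values in ascending order and their counts, each non-zero occurrence becomes its own row with weight 1 and all zeros collapse into a single row. The names are hypothetical, and the assumption that the zero row carries the zero count as its weight (and is present even when that count is 0) is mine, inferred from the (1 + nnz) x 2 shape; it is not taken from the CLALibSort source.

```java
import java.util.Arrays;

// Hypothetical sketch: names are illustrative, not the CLALibSort implementation.
final class ValueWeightTableSketch {

    // sortedDistinct is ascending; counts[i] is the number of rows holding sortedDistinct[i].
    static double[][] build(double[] sortedDistinct, int[] counts) {
        int nnz = 0;
        int zeroCount = 0;
        for (int i = 0; i < sortedDistinct.length; i++) {
            if (sortedDistinct[i] == 0)
                zeroCount += counts[i];
            else
                nnz += counts[i];
        }
        double[][] table = new double[1 + nnz][2]; // column 0: value, column 1: weight
        int row = 0;
        boolean zeroWritten = false;
        for (int i = 0; i < sortedDistinct.length; i++) {
            double v = sortedDistinct[i];
            if (v == 0)
                continue; // zeros are collapsed into one row
            if (v > 0 && !zeroWritten) {
                table[row++][1] = zeroCount; // value cell is already 0
                zeroWritten = true;
            }
            for (int c = 0; c < counts[i]; c++) {
                table[row][0] = v;
                table[row][1] = 1;
                row++;
            }
        }
        if (!zeroWritten)
            table[row][1] = zeroCount; // no positive values: the zero row is last
        return table;
    }

    public static void main(String[] args) {
        double[][] t = build(new double[] {-2.0, 0.0, 3.0}, new int[] {1, 2, 2});
        System.out.println(Arrays.deepToString(t));
    }
}
```

Only the distinct values need sorting; the per-element rows are then emitted in order, which avoids sorting the full-length column.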

Add median/quantile coverage to CompressedSortTest comparing the
compressed value/weight table and the resulting median/quantile picks
against the uncompressed reference.

Emit the compressed single-column quantile/median value-weight table through
the same reorg used by MatrixBlock.sortOperations instead of building it and
calling recomputeNonZeros. The uncompressed reference leaves the result's
non-zero count unmaintained (0); recomputing it on the compressed side made
the two paths asymmetric and broke CompressedVectorTest.testSortOperations,
which relies on both sides reporting the same empty/non-empty state. Routing
through reorg makes the produced table bit-for-bit identical to the
uncompressed path, including its metadata.

Add a testSort case mirroring testSortOperations so the new compressed
order() reorg path (CLALibSort) is exercised across the full single-column
parameter matrix (sparsity, value type, value range, DDC/SDC/UNCOMPRESSED),
comparing the compressed result against the uncompressed reference. This
cheaply hits many more encoding variations than the dedicated CompressedSortTest.

- Add a single-column sort() test to DictionaryTests covering
  MatrixBlockDictionary.sort() and Dictionary.sort(), validating the
  returned permutation yields a non-decreasing sequence.
- Add CLALibSort fallback tests: index-return order, unsupported (OLE)
  encoding for both the order and quantile paths, and a dense
  all-negative quantile table.
- Simplify CLALibSort.sortTableSingleColumn: the value/weight table is
  re-sorted by the reorg used for metadata parity, so the explicit
  negative/zero/positive ordering and zeroWritten tracking were
  redundant. Emit one weight-1 row per non-zero plus one collapsed zero
  row in any order, matching MatrixBlock.sortOperations.
@Baunsgaard merged commit abddc1d into apache:main on Jun 25, 2026
50 checks passed
@github-project-automation (bot) moved this from In Progress to Done in SystemDS PR Queue on Jun 25, 2026