[BWARE] Tighten MatrixBlock quantile/sort API surface #2513

Merged: Baunsgaard merged 5 commits into apache:main from Baunsgaard:split/vectorApi on Jun 26, 2026

Baunsgaard (Contributor) commented Jun 25, 2026

Tightens the MatrixBlock quantile/sort API surface so the entry points are sealed against accidental overrides and the weighted/unweighted quantile logic is clearly separated.

codecov Bot commented Jun 25, 2026

Codecov Report

❌ Patch coverage is 93.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 71.57%. Comparing base (b51bde1) to head (3309b94).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../sysds/runtime/compress/CompressedMatrixBlock.java | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2513      +/-   ##
============================================
+ Coverage     71.55%   71.57%   +0.02%     
- Complexity    49103    49127      +24     
============================================
  Files          1575     1575              
  Lines        189784   189793       +9     
  Branches      37232    37235       +3     
============================================
+ Hits         135799   135846      +47     
+ Misses        43493    43463      -30     
+ Partials      10492    10484       -8     

Baunsgaard changed the title from "[BWARE] Tighten MatrixBlock API and extract aggregateUnary specialization" to "[BWARE] Tighten MatrixBlock quantile/sort API surface" on Jun 26, 2026

Refactors the MatrixBlock surface so quantile/sort entry points are
sealed against accidental overrides, sparseToDense is fluent, and the
unary-aggregate dispatch lives in its own helper.

- MatrixBlock:
    - sparseToDense() / sparseToDense(int) now return MatrixBlock so
      calls can be chained at the call site instead of needing a
      separate statement on the same block
    - sortOperations, pickValues, pickValue marked final (subclasses
      like CompressedMatrixBlock must route through these instead of
      replacing them)
    - pickValue now dispatches to a new pickUnweightedValue /
      pickWeightedValue helper pair and documents the sorted-input
      precondition
    - isShallowSerialize: drop the two SparseBlockMCSR opt-in branches
      that depended on size estimates; the simpler condition is
      enough and the heuristic was unsafe under MCSR-to-CSR
      conversion
- LibAggregateUnarySpecialization (new): owns the
  aggregateUnary(MatrixBlock, AggregateUnaryOperator, ...) dispatch
  with separate sparse and dense helpers
- LibMatrixMult: in two LMM helpers, allocate the dense result block
  when ret.getDenseBlock() returned null, instead of NPE'ing
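
The fluent return and the `final` entry points described in the list above can be illustrated with a minimal sketch. The classes `BlockSketch` and `CompressedBlockSketch` and the method `sortedValues()` are hypothetical stand-ins, not the SystemDS `MatrixBlock` API; the sketch only assumes the general Java rules that a method returning `this` can be chained and that a `final` method cannot be overridden.

```java
import java.util.Arrays;

// Hypothetical stand-in for a block type; not the real MatrixBlock.
class BlockSketch {
    protected double[] values;
    protected boolean sparse = true;

    BlockSketch(double... values) {
        this.values = values.clone();
    }

    /** Fluent conversion: returns this block so the call can be chained. */
    BlockSketch sparseToDense() {
        sparse = false; // placeholder for an actual format conversion
        return this;
    }

    boolean isSparse() {
        return sparse;
    }

    /** Sealed entry point: subclasses must route through it, not replace it. */
    final double[] sortedValues() {
        double[] out = values.clone();
        Arrays.sort(out);
        return out;
    }
}

// Hypothetical subclass playing the role of a compressed block.
class CompressedBlockSketch extends BlockSketch {
    CompressedBlockSketch(double... values) {
        super(values);
    }

    // Overriding a non-final method is still allowed (covariant return type).
    @Override
    CompressedBlockSketch sparseToDense() {
        super.sparseToDense();
        return this;
    }

    // Uncommenting the next line would not compile, because sortedValues() is final:
    // @Override double[] sortedValues() { return values; }
}
```

With these assumed classes, `new CompressedBlockSketch(3, -1, 0, 2).sparseToDense().sortedValues()` chains the conversion and the sealed call in one expression.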

Resolve issues found while reviewing the quantile/sort refactor:

- Remove dead duplicate LibAggregateUnarySpecialization: it was a
  byte-for-byte copy of the already-wired LibMatrixAggUnarySpecialization
  with no callers, so it added a second source of truth that would drift.
- Fix the median() Javadoc that referenced a non-existent @param quantile,
  which broke the Javadoc documentation build.
- Restore the original SparseBlockMCSR branches in isShallowSerialize:
  dropping them left inclConvert dead and made toShallowSerializeBlock an
  unconditional no-op; this unrelated serialization change is reverted.
- Allocate the dense WDivMM result block once in the single-threaded driver
  before dispatching MatrixMultWDivTask workers, instead of a per-thread
  null-check-then-allocate guard that raced on the shared result block.
- Add QuantilePickTest covering the single-column unweighted pickValue path.
- Drop the accidental Jackson dependency change from pom.xml that leaked in
  from an unrelated branch.
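
The allocation item above contrasts a per-thread null-check-then-allocate guard with a single allocation in the driver. The following is a generic sketch of that second pattern, not the `LibMatrixMult` code: `AllocateOnceSketch`, `Result`, and `rowSumsParallel` are hypothetical names, and row sums stand in for the actual WDivMM computation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class AllocateOnceSketch {
    // Hypothetical shared result holder; the real code uses a result block.
    static final class Result {
        double[] dense;
    }

    static double[] rowSumsParallel(double[][] in, int threads) {
        Result ret = new Result();
        // Allocate exactly once, on the driver thread, before any worker starts.
        ret.dense = new double[in.length];

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int start = t;
                tasks.add(pool.submit(() -> {
                    // Workers never test "ret.dense == null" and never allocate;
                    // each one writes a disjoint set of rows.
                    for (int r = start; r < in.length; r += threads) {
                        double sum = 0;
                        for (double v : in[r])
                            sum += v;
                        ret.dense[r] = sum;
                    }
                }));
            }
            for (Future<?> f : tasks)
                f.get(); // wait for completion and surface worker failures
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return ret.dense;
    }
}
```

If each worker instead checked for null and allocated on demand, two workers could both observe null and one could replace the array the other had already written to, which is the kind of race the item describes.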

- Revert LibMatrixMult to upstream: the WDivMM dense-output allocation hoist
  was unrelated to the MatrixBlock API tightening and forced a full dense
  allocation for the basic+sparse-weight result that is meant to stay sparse
  (a memory regression on large sparse weights); upstream needs no such guard.
- Fix median() Javadoc: it locates the median via the weight column and does
  not support a single-column input, so drop the inaccurate unweighted claim.
- Align pickUnweightedValue with its sibling by reading getNumRows().
- Revert an unrelated CTABLE Javadoc edit and stray whitespace hunks.
- Clarify the CompressedMatrixBlock.sparseToDense override comment and tidy
  the new quantile Javadocs.

Quantile picking normally runs on the uncompressed value/weight table from
sortOperations, so the inherited pickValue path is never reached on a compressed
block through that flow. Add tests that sort a single column while keeping it
compressed and then call pickValue directly on the CompressedMatrixBlock,
asserting the result matches the uncompressed sorted column across DDC and SDC
(with zeros and negatives). This locks in the behavior of the now-inherited
pickValue after the redundant overrides were removed.

- median() now dispatches on column count like pickValue: a single-column
  (unweighted) matrix is picked over its one column instead of reading a
  non-existent weight column, which previously threw on single-column input.
- Align pickUnweightedValue with the weighted convention (ceil-based rank with
  an implicit weight of 1 per value) so a single column yields the same
  quantile/median as the equivalent two-column (value, weight) representation;
  the prior round-based position was off by one for medians.
- Expand QuantilePickTest to assert the canonical values with messages, cover
  even/odd averaging, the single-element edge case, and single-column median().
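
The ceil-based rank convention described above can be sketched as follows. This is an illustration under stated assumptions, not the SystemDS implementation: `QuantilePickSketch` and its methods are hypothetical, inputs are assumed non-empty and sorted by value, weights are assumed to be positive integers, and the even/odd averaging mentioned in the test item is left out.

```java
final class QuantilePickSketch {
    /** Unweighted pick over one sorted column; each value has implicit weight 1. */
    static double pickUnweighted(double[] sorted, double quantile) {
        int n = sorted.length;
        // 1-based rank, clamped into [1, n].
        int rank = Math.min(n, Math.max(1, (int) Math.ceil(quantile * n)));
        return sorted[rank - 1];
    }

    /** Weighted pick over rows of {value, weight}, sorted by value. */
    static double pickWeighted(double[][] table, double quantile) {
        double sumWeights = 0;
        for (double[] row : table)
            sumWeights += row[1];
        long rank = Math.max(1, (long) Math.ceil(quantile * sumWeights));
        long seen = 0;
        for (double[] row : table) {
            seen += (long) row[1];
            if (seen >= rank)
                return row[0];
        }
        // Fallback if the rank exceeds the accumulated weights.
        return table[table.length - 1][0];
    }

    /** Median that dispatches on column count, mirroring the description above. */
    static double median(double[][] sortedRows) {
        if (sortedRows[0].length == 1) {
            double[] column = new double[sortedRows.length];
            for (int i = 0; i < column.length; i++)
                column[i] = sortedRows[i][0];
            return pickUnweighted(column, 0.5);
        }
        return pickWeighted(sortedRows, 0.5);
    }
}
```

For the sorted values 1 to 5 and quantile 0.5, the rank is ceil(2.5) = 3, so both the single-column form and the equivalent (value, 1) table return 3.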

Baunsgaard merged commit 5b891a2 into apache:main on Jun 26, 2026 (50 checks passed)
github-project-automation Bot moved this from In Progress to Done in SystemDS PR Queue on Jun 26, 2026