[BWARE] Tighten MatrixBlock quantile/sort API surface#2513
Merged
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2513 +/- ##
============================================
+ Coverage 71.55% 71.57% +0.02%
- Complexity 49103 49127 +24
============================================
Files 1575 1575
Lines 189784 189793 +9
Branches 37232 37235 +3
============================================
+ Hits 135799 135846 +47
+ Misses 43493 43463 -30
+ Partials 10492 10484 -8 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Refactors the MatrixBlock surface so quantile/sort entry points are
sealed against accidental overrides, sparseToDense is fluent, and the
unary-aggregate dispatch lives in its own helper.
- MatrixBlock:
- sparseToDense() / sparseToDense(int) now return MatrixBlock so
they chain at the call site instead of forcing a separate
lookup of `this`
- sortOperations, pickValues, pickValue marked final (subclasses
like CompressedMatrixBlock must route through these instead of
replacing them)
- pickValue now dispatches to a new pickUnweightedValue /
pickWeightedValue helper pair and documents the sorted-input
precondition
- isShallowSerialize: drop the two SparseBlockMCSR opt-in branches
that depended on size estimates; the simpler condition is
enough and the heuristic was unsafe under MCSR-to-CSR
conversion
- LibAggregateUnarySpecialization (new): owns the
aggregateUnary(MatrixBlock, AggregateUnaryOperator, ...) dispatch
with separate sparse and dense helpers
- LibMatrixMult: in two LMM helpers, allocate the dense result block
when ret.getDenseBlock() returned null, instead of NPE'ing
Resolve issues found while reviewing the quantile/sort refactor: - Remove dead duplicate LibAggregateUnarySpecialization: it was a byte-for-byte copy of the already-wired LibMatrixAggUnarySpecialization with no callers, so it added a second source of truth that would drift. - Fix the median() Javadoc that referenced a non-existent @param quantile, which broke the Javadoc documentation build. - Restore the original SparseBlockMCSR branches in isShallowSerialize: dropping them left inclConvert dead and made toShallowSerializeBlock an unconditional no-op; this unrelated serialization change is reverted. - Allocate the dense WDivMM result block once in the single-threaded driver before dispatching MatrixMultWDivTask workers, instead of a per-thread null-check-then-allocate guard that raced on the shared result block. - Add QuantilePickTest covering the single-column unweighted pickValue path. - Drop the accidental Jackson dependency change from pom.xml that leaked in from an unrelated branch.
- Revert LibMatrixMult to upstream: the WDivMM dense-output allocation hoist was unrelated to the MatrixBlock API tightening and forced a full dense allocation for the basic+sparse-weight result that is meant to stay sparse (a memory regression on large sparse weights); upstream needs no such guard. - Fix median() Javadoc: it locates the median via the weight column and does not support a single-column input, so drop the inaccurate unweighted claim. - Align pickUnweightedValue with its sibling by reading getNumRows(). - Revert an unrelated CTABLE Javadoc edit and stray whitespace hunks. - Clarify the CompressedMatrixBlock.sparseToDense override comment and tidy the new quantile Javadocs.
Quantile picking normally runs on the uncompressed value/weight table from sortOperations, so the inherited pickValue path is never reached on a compressed block through that flow. Add tests that sort a single column while keeping it compressed and then call pickValue directly on the CompressedMatrixBlock, asserting the result matches the uncompressed sorted column across DDC and SDC (with zeros and negatives). This locks in the behavior of the now-inherited pickValue after the redundant overrides were removed.
- median() now dispatches on column count like pickValue: a single-column (unweighted) matrix is picked over its one column instead of reading a non-existent weight column, which previously threw on single-column input. - Align pickUnweightedValue with the weighted convention (ceil-based rank with an implicit weight of 1 per value) so a single column yields the same quantile/median as the equivalent two-column (value, weight) representation; the prior round-based position was off by one for medians. - Expand QuantilePickTest to assert the canonical values with messages, cover even/odd averaging, the single-element edge case, and single-column median().
6b9a18e to
3309b94
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Tightens the
MatrixBlockquantile/sort API surface so the entry points are sealed against accidental overrides and the weighted/unweighted quantile logic is clearly separated.