Skip to content

Adapt Materialize to use the columnated merge batcher#23121

Merged
antiguru merged 2 commits into
MaterializeInc:mainfrom
antiguru:columnated_merge_batcher
Nov 20, 2023
Merged

Adapt Materialize to use the columnated merge batcher#23121
antiguru merged 2 commits into
MaterializeInc:mainfrom
antiguru:columnated_merge_batcher

Conversation

@antiguru

@antiguru antiguru commented Nov 10, 2023

Copy link
Copy Markdown
Member

cc @frankmcsherry

Update our dependency on Differential to use the columnated merge batcher. This allows us to move some of the allocations during hydration to lgalloc.

As it's not obvious: The ColKeyBatch in differential changes such that its merge batcher will use columnated data instead of vector-allocated data. This doesn't show up in this PR because we still depend on the same type definition. For more details, see TimelyDataflow/differential-dataflow#418

Motivation

Part of MaterializeInc/database-issues#6848.

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

@antiguru antiguru force-pushed the columnated_merge_batcher branch 7 times, most recently from dcc6f97 to d21baad Compare November 17, 2023 20:41
@antiguru antiguru force-pushed the columnated_merge_batcher branch from d21baad to 7a99184 Compare November 20, 2023 14:27
@antiguru antiguru marked this pull request as ready for review November 20, 2023 14:28
@antiguru antiguru requested review from a team, frankmcsherry, umanwizard and vmarcos November 20, 2023 14:28

@vmarcos vmarcos left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine to me; I included a few questions on details below.

Comment thread src/compute/src/render/reduce.rs Outdated
Comment on lines +137 to +138
T: Timestamp + Lattice + Columnation,
S::Timestamp: Lattice + Refines<T> + Columnation,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not completely clear to me whether these trait bounds are absolutely required here? We are forcing CollectionBundle to require Columnation on its timestamp types, but we could require these bounds only in the implementation including ensure_collections?

Another way to phrase the question more conceptually: Do we want to ensure Columnation for timestamp types in connection with arrangement usage only or across all contexts (arrangements, dataflow edges) in this PR? The difference is not large now in that the timestamps used in arrangements and the ones in dataflow edges are today the same, but changing the requirements expresses an opinion that they should at least share some more behavior than we've required so far.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, we need the bound on CollectionBundle because ArrangementFlavor needs it. We use both T and S::Timestamp in the handles, so the Columnation bound propagates everywhere.

Comment on lines +636 to +637
T: Timestamp + Lattice + Columnation,
S::Timestamp: Lattice + Refines<T> + Columnation,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT, even if we'd like Columnation in the definition of CollectionBundle, then we'd need to require the bound for the scope timestamp type S::Timestamp here, but not for the anchor timestamp type T?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also needed:

error[E0277]: the trait bound `T: Columnation` is not satisfied
   --> src/compute/src/render/context.rs:372:13
    |
372 |     RowUnit(KeyValArrangementImport<S, Row, (), T>),
    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `Columnation` is not implemented for `T`
    |
    = note: required for `TStack<((mz_repr::Row, ()), T, i64), Overflowing<u32>>` to implement `differential_dataflow::trace::implementations::Layout`
    = note: required for `Spine<Rc<OrdValBatch<TStack<((mz_repr::Row, ()), T, i64), Overflowing<u32>>, TimelyStack<((mz_repr::Row, ()), T, i64)>>>>` to implement `TraceReader`
help: consider further restricting this bound
    |
369 |     T: Timestamp + Lattice + timely::container::columnation::Columnation,
    |                            +++++++++++++++++++++++++++++++++++++++++++++

Signed-off-by: Moritz Hoffmann <mh@materialize.com>
Signed-off-by: Moritz Hoffmann <mh@materialize.com>
@antiguru antiguru force-pushed the columnated_merge_batcher branch from 7a99184 to 91e092d Compare November 20, 2023 20:43
@antiguru antiguru enabled auto-merge November 20, 2023 20:44
@antiguru

Copy link
Copy Markdown
Member Author

Thanks for the reviews!

@antiguru antiguru merged commit c44fb2f into MaterializeInc:main Nov 20, 2023
@antiguru antiguru deleted the columnated_merge_batcher branch November 20, 2023 21:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants