Skip to content

repr: switch Diff type from isize -> i64#10777

Merged
petrosagg merged 1 commit into
MaterializeInc:mainfrom
petrosagg:diff-i64
Feb 22, 2022
Merged

repr: switch Diff type from isize -> i64#10777
petrosagg merged 1 commit into
MaterializeInc:mainfrom
petrosagg:diff-i64

Conversation

@petrosagg

@petrosagg petrosagg commented Feb 20, 2022

Copy link
Copy Markdown
Contributor

Motivation

Even though we currently only run materialized in 64bit machines it's
good to have a platform independent type, especially since we persist it.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR adds a release note for any user-facing behavior changes.

@petrosagg petrosagg requested a review from benesch February 20, 2022 18:02

@benesch benesch left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 woohoo, thanks!

@petrosagg

Copy link
Copy Markdown
Contributor Author

@danhhz could you take a look at the changes in persist? The types are still hardcoded but persist now imports mz_repr and uses mz_repr::Diff instead of copying the definition of the type

@danhhz danhhz left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to switch isize to i64 in persist, but please don't introduce a dep on repr. src/persist is meant to be a strong abstraction boundary of something that could be independently useful by other differential dataflow users. The featureset is obviously entirely driven by mz's needs and I have no plans to actually extract it, but strong abstractions are critical in building maintainable complex systems, and a dep on repr fails the litmus test.

It's absolutely appropriate to use mz_repr::Diff in STORAGE and then have persist be i64. If we want that type to be able to vary, we also could abstract out ts and diff with something like Codec (though ideally more restrictive than Codec because there are some nice perf things we get in columnar storage from knowing it's an 8-byte integral type).

Happy to hop on a call to hash this out if there's some other requirements I'm not seeing here.

@petrosagg

Copy link
Copy Markdown
Contributor Author

don't introduce a dep on repr. src/persist is meant to be a strong abstraction boundary of something that could be independently useful by other differential dataflow users.

But is that really the case? The fact that this PR had to go in persist and change the type means that the abstraction boundary doesn't actually exist. If tomorrow we make mz_repr::Diff be a i128 we'd have to change persist again, and if mz_repr::Diff switched from a type alias to stuct Diff(i64) or something of that sort persist would be forced to import that crate.

From my point of view adding the mz_repr crate dependency only highlights the reality that is already there and perhaps makes it clearer what needs to be done for the dependency to be removed (i.e make Diff a generic parameter).

If we want that type to be able to vary, we also could abstract out ts and diff with something like Codec

That would be amazing, I asked about this in #persist too. IMO only when this is done we can claim that persist is an independent crate that could be useful for DD users.

@danhhz

danhhz commented Feb 22, 2022

Copy link
Copy Markdown
Contributor

Yeah, I saw your question in #persist after I wrote this comment, but Slack is only working on mobile for me ATM, so I haven't responded to it yet.

I think it's important to distinguish between "the abstraction boundary doesn't actually exist" (your words) and "the featureset is obviously entirely driven by mz's needs" (mine). All we've needed so far is u64 for ts and isize for diff and so that's what we have. We have a ton of work left to do in productionizing persist and so I'm cutting corners where they can be cut. Doing the refactor to make ts and diff swappable has so far been one of those.

In the short term, the way to land this PR is to use i64 in src/persist. It sounds like you'd like these to be abstract in persist and I'm happy to do the work. I have two followup questions:

  • What's the timing on your need for this? Is it a "drop everything it's blocking platform" sort of thing or more just an eventual cleanup?
  • What sort of flexibility do we need? It's pretty easy to make it abstract over any 8-byte integral type. It's also possible to literally just use Codec, but we'd lose some nice performance optimizations in the way we do columnar storage, so I'd like to avoid the full generalization if possible. E.g. if we also want to support 16-byte integral types (your i128 example), then persist would probably want to do something like specialize 2 different columnar batch schemas so both are optimized. But that adds a lot of complexity, so I'd like for us to only do it if we're actually going to switch to i128, not just for the interest of "generality is good".

Even though we currently only run materialized in 64bit machines it's
good to have a platform indepenent type, especially since we persist it.

Signed-off-by: Petros Angelatos <petrosagg@gmail.com>
@petrosagg

Copy link
Copy Markdown
Contributor Author

If we viewed persist only as materialize's way of persisting data then depending on mz-repr sounds like the right thing to do. As per its description:

This module contains the types for representing data in Materialize that all layers of the stack can understand. Think of it as the lingua franca: individual layers may use different representations internally, but they all agree to use this representation at their boundaries.

What's the timing on your need for this? Is it a "drop everything it's blocking platform" sort of thing or more just an eventual cleanup?

It's only a wish (not even a need) if making persist useful for other DD users is a goal. There is no constraint coming from platform that requires this so no need to rethink priorities. Also this PR is probably not the right place to challenge that goal.

In the short term, the way to land this PR is to use i64 in src/persist.

I just did this change, there is now no mention of mz_repr in persist.

@petrosagg petrosagg merged commit 1dd9add into MaterializeInc:main Feb 22, 2022
@petrosagg petrosagg deleted the diff-i64 branch February 22, 2022 18:33
@danhhz

danhhz commented Feb 22, 2022

Copy link
Copy Markdown
Contributor

Thank you! Happy to continue to discuss making Time and Diff pluggable if you like!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants