Skip to content
This repository was archived by the owner on Sep 12, 2018. It is now read-only.

Parse transactions with rust-peg rather than combine#681

Merged
ncalexan merged 5 commits into
mozilla:masterfrom
ncalexan:rust-peg-tx-parsing
May 10, 2018
Merged

Parse transactions with rust-peg rather than combine#681
ncalexan merged 5 commits into
mozilla:masterfrom
ncalexan:rust-peg-tx-parsing

Conversation

@ncalexan

@ncalexan ncalexan commented May 7, 2018

Copy link
Copy Markdown
Member

OK, I'm done with combine -- it's just too much work to evolve things forward. See the notes in the commit message for fad0b6d. I'm happy to add more details if needed.

This builds on top of my other work in progress (#673), so this review is for 5e11c0a..2e9919c only.

@ncalexan ncalexan requested review from fluffyemily and rnewman May 7, 2018 22:45

@rnewman rnewman left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Literally the only thing I don't like about this parser change is that it puts types into edn. We will at some point need to publish that edn crate on crates.io, I expect.

Is there a reasonable way to encapsulate/reuse those bits of grammar? I know we have more than one entry point, but even so…

Comment thread db/src/internal_types.rs

/// The transactor is tied to `edn::ValueAndSpan` right now, but in the future we'd like to support
/// `TypedValue` directly for programmatic use. `TransactableValue` encapsulates the interface
/// value types (i.e., values in the value place) need to support to be transacted.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is something like PatternNonValuePlace versus PatternValuePlace, right?

In the query engine we need to handle some kinds of ambiguity — 123 can be a ref or a Long, :foo/bar can be a keyword or an attribute, etc. — and the ambiguities differ for value and non-value places.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The Entity type started out pretty simple (like [e a v]) and rapidly grew cases, which were literally represented (like EntidOrTempIdOrLookupRef). That's not scaling, so I'm looking at calling those EntityPlace/AttributePlace/ValuePlace. That's not quite the same as what you're suggesting below -- naming the branches -- but it leads to it eventually.

Comment thread db/src/internal_types.rs
}

impl TransactableValue for ValueAndSpan {
fn into_typed_value(self, schema: &Schema, value_type: ValueType) -> Result<TypedValue> {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking for a while that we really have a trait, Interpret or WithRespectTo, which is like this:

fn interpret<T>(self, schema: &Schema) -> Result<T>;

— we do this in a bunch of places, like interpreting an entity ID against a schema to get an attribute, interpreting a keyword against a schema to get an entid, and so on.

This seems like an extension of that, with the obvious complication that you need some way of refining the expected type…

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aye. This is context-sensitive parsing: in your trait the context is fixed to be &Schema, but in the transactor we really care about a context that is (&Schema, ValueType). I don't see a clear path to this specific Interpret trait right now, but it's definitely the direction the transactor and the query engine should be heading.

Comment thread db/src/tx.rs
match v.as_tempid() {
Some(tempid) => Ok(Either::Right(LookupRefOrTempId::TempId(self.intern_temp_id(tempid)))),
None => {
if let TypedValue::Ref(entid) = v.into_typed_value(&self.schema, ValueType::Ref)? {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block really makes me think that PatternNonValue::Ref, PatternNonValue::Tempid or whatever is neater than this set of switches and Eithers.

@ncalexan

Copy link
Copy Markdown
Member Author

Literally the only thing I don't like about this parser change is that it puts types into edn. We will at some point need to publish that edn crate on crates.io, I expect.

I don't really care about publishing the edn crate, but it's a shame to cut off the option. I wondered about keeping the type definitions in the tx crate, but parts of the transaction entity type uses edn types (keywords and symbols). That means that we can't really avoid gluing the crates together.

What I might be able to do is push the transaction types into a sub-module and put all of that stuff behind a feature flag, so that a consumer that didn't want Mentat-specific tx and query parsing bits could opt-out of them. I think that's follow-up, though.

Is there a reasonable way to encapsulate/reuse those bits of grammar? I know we have more than one entry point, but even so…

I asked the author of rust-peg about a very limited form of re-use in kevinmehall/rust-peg#191 but haven't heard back. I think there are lots of ways to achieve this (it might just be preprocessing!) but it's not a priority for me.

ncalexan added 5 commits May 10, 2018 10:19
This is a stepping stone to transacting entities that are not based on
`edn::ValueAndSpan`.  We need to turn some value places (general) into
entity places (restricted), and those restrictions are captured in
tx-parser right now.  But for `TypedValue` value places, those
restrictions are encoded in the type itself.  This lays the track to
accept other value types in value places, which is good for
programmatic builder interfaces.
This was always a choice, but we've outgrown it: now we want to accept
value types that don't come from EDN and/or tx-parser.
There are few reasons to do this:

- it's difficult to add symbol interning to combine-based parsers like
  tx-parser -- literally every type changes to reflect the interner,
  and that means every convenience macro we've built needs to chagne.
  It's trivial to add interning to rust-peg-based parsers.

- combine has rolled forward to 3.2, and I spent a similar amount of
  time investigating how to upgrade tx-parser (to take advantage of
  the new parser! macros in combine that I think are necessary for
  adapting to changing types) as I did just converting to rust-peg.

- it's easy to improve the error messages in rust-peg, where-as I have
  tried twice to improve the nested error messages in combine and am
  stumped.

- it's roughly 4x faster to parse strings directly as opposed to
  edn::ValueAndSpan, and it'll be even better when we intern directly.
@ncalexan ncalexan force-pushed the rust-peg-tx-parsing branch from 2e9919c to 9a4bd0d Compare May 10, 2018 17:34
@ncalexan ncalexan merged commit 9a4bd0d into mozilla:master May 10, 2018
ncalexan added a commit to ncalexan/mentat that referenced this pull request May 31, 2018
This is a pre-requisite for moving the existing `combine`-based parser
to use `rust-peg` -- part of the push to use `rust-peg` for all parsing
started in mozilla#681.  We need the
types for the parsed structure "very early", and the `edn` crate is
the earliest such crate.

This is an unfortunate destruction of boundaries between parts of the
system, but it's the best way we have to achieve this right now.
ncalexan added a commit to ncalexan/mentat that referenced this pull request May 31, 2018
This was left over from mozilla#681.
ncalexan added a commit to ncalexan/mentat that referenced this pull request Jun 1, 2018
This was left over from mozilla#681.
ncalexan added a commit to ncalexan/mentat that referenced this pull request Jun 4, 2018
This is a pre-requisite for moving the existing `combine`-based parser
to use `rust-peg` -- part of the push to use `rust-peg` for all parsing
started in mozilla#681.  We need the
types for the parsed structure "very early", and the `edn` crate is
the earliest such crate.

This is an unfortunate destruction of boundaries between parts of the
system, but it's the best way we have to achieve this right now.
ncalexan added a commit to ncalexan/mentat that referenced this pull request Jun 4, 2018
This was left over from mozilla#681.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants