Parse transactions with rust-peg rather than combine#681
Conversation
rnewman
left a comment
There was a problem hiding this comment.
Literally the only thing I don't like about this parser change is that it puts types into edn. We will at some point need to publish that edn crate on crates.io, I expect.
Is there a reasonable way to encapsulate/reuse those bits of grammar? I know we have more than one entry point, but even so…
|
|
||
| /// The transactor is tied to `edn::ValueAndSpan` right now, but in the future we'd like to support | ||
| /// `TypedValue` directly for programmatic use. `TransactableValue` encapsulates the interface | ||
| /// value types (i.e., values in the value place) need to support to be transacted. |
There was a problem hiding this comment.
So this is something like PatternNonValuePlace versus PatternValuePlace, right?
In the query engine we need to handle some kinds of ambiguity — 123 can be a ref or a Long, :foo/bar can be a keyword or an attribute, etc. — and the ambiguities differ for value and non-value places.
There was a problem hiding this comment.
Yes. The Entity type started out pretty simple (like [e a v]) and rapidly grew cases, which were literally represented (like EntidOrTempIdOrLookupRef). That's not scaling, so I'm looking at calling those EntityPlace/AttributePlace/ValuePlace. That's not quite the same as what you're suggesting below -- naming the branches -- but it leads to it eventually.
| } | ||
|
|
||
| impl TransactableValue for ValueAndSpan { | ||
| fn into_typed_value(self, schema: &Schema, value_type: ValueType) -> Result<TypedValue> { |
There was a problem hiding this comment.
I've been thinking for a while that we really have a trait, Interpret or WithRespectTo, which is like this:
fn interpret<T>(self, schema: &Schema) -> Result<T>;
— we do this in a bunch of places, like interpreting an entity ID against a schema to get an attribute, interpreting a keyword against a schema to get an entid, and so on.
This seems like an extension of that, with the obvious complication that you need some way of refining the expected type…
There was a problem hiding this comment.
Aye. This is context-sensitive parsing: in your trait the context is fixed to be &Schema, but in the transactor we really care about a context that is (&Schema, ValueType). I don't see a clear path to this specific Interpret trait right now, but it's definitely the direction the transactor and the query engine should be heading.
| match v.as_tempid() { | ||
| Some(tempid) => Ok(Either::Right(LookupRefOrTempId::TempId(self.intern_temp_id(tempid)))), | ||
| None => { | ||
| if let TypedValue::Ref(entid) = v.into_typed_value(&self.schema, ValueType::Ref)? { |
There was a problem hiding this comment.
This block really makes me think that PatternNonValue::Ref, PatternNonValue::Tempid or whatever is neater than this set of switches and Eithers.
I don't really care about publishing the What I might be able to do is push the transaction types into a sub-module and put all of that stuff behind a feature flag, so that a consumer that didn't want Mentat-specific
I asked the author of |
This is a stepping stone to transacting entities that are not based on `edn::ValueAndSpan`. We need to turn some value places (general) into entity places (restricted), and those restrictions are captured in tx-parser right now. But for `TypedValue` value places, those restrictions are encoded in the type itself. This lays the track to accept other value types in value places, which is good for programmatic builder interfaces.
This was always a choice, but we've outgrown it: now we want to accept value types that don't come from EDN and/or tx-parser.
There are few reasons to do this: - it's difficult to add symbol interning to combine-based parsers like tx-parser -- literally every type changes to reflect the interner, and that means every convenience macro we've built needs to chagne. It's trivial to add interning to rust-peg-based parsers. - combine has rolled forward to 3.2, and I spent a similar amount of time investigating how to upgrade tx-parser (to take advantage of the new parser! macros in combine that I think are necessary for adapting to changing types) as I did just converting to rust-peg. - it's easy to improve the error messages in rust-peg, where-as I have tried twice to improve the nested error messages in combine and am stumped. - it's roughly 4x faster to parse strings directly as opposed to edn::ValueAndSpan, and it'll be even better when we intern directly.
2e9919c to
9a4bd0d
Compare
This is a pre-requisite for moving the existing `combine`-based parser to use `rust-peg` -- part of the push to use `rust-peg` for all parsing started in mozilla#681. We need the types for the parsed structure "very early", and the `edn` crate is the earliest such crate. This is an unfortunate destruction of boundaries between parts of the system, but it's the best way we have to achieve this right now.
This was left over from mozilla#681.
This was left over from mozilla#681.
This is a pre-requisite for moving the existing `combine`-based parser to use `rust-peg` -- part of the push to use `rust-peg` for all parsing started in mozilla#681. We need the types for the parsed structure "very early", and the `edn` crate is the earliest such crate. This is an unfortunate destruction of boundaries between parts of the system, but it's the best way we have to achieve this right now.
This was left over from mozilla#681.
OK, I'm done with
combine-- it's just too much work to evolve things forward. See the notes in the commit message for fad0b6d. I'm happy to add more details if needed.This builds on top of my other work in progress (#673), so this review is for 5e11c0a..2e9919c only.