Implement Liquid::C::Expression to optimize expression evaluation#60
Conversation
df377cf to
afae259
Compare
macournoyer
left a comment
There was a problem hiding this comment.
Looking good! Just some questions.
| break; | ||
| case OP_RENDER_VARIABLE_RESCUE: | ||
| args->node_line_number = (unsigned int)*const_ptr++; | ||
| args->ip = ip; |
There was a problem hiding this comment.
ip and const_ptr are saved because next instruction is assumed to be calling a Ruby function?
There was a problem hiding this comment.
This PR just moves this case to keep the rendering instruction handling together, since they shouldn't be used in the Liquid::C::Expression code.
This was added in the last PR (#59) and described as follows
Another design choice I made for this PR is to do error handling for the whole block body render, rather than for each variable render. This way we can reduce the state saving cost of
rb_rescuefrom liquid code that doesn't encounter variable render errors. In order to recover from the exception, we just restore the stack size to what it was at the start of the block body render and iterate the instructions to jump just past the variable write.
In order to iterate the instructions to the end of the variable render on an exception, we need to save the instruction position at the start of variable rendering.
I'll add some comments to the code to make this clearer.
There was a problem hiding this comment.
I'll also move code around in the last PR so that the diff is easier to read in this PR.
d7d9ee0 to
900c227
Compare
ea00f8b to
5a5b34a
Compare
900c227 to
23b8e12
Compare
5a5b34a to
9d8a3d7
Compare
fe51bc8 to
5dc1ca1
Compare
9d8a3d7 to
239b210
Compare
5dc1ca1 to
5cee268
Compare
239b210 to
eeae3a1
Compare
5cee268 to
56096af
Compare
eeae3a1 to
a2c628d
Compare
a2c628d to
25be6e4
Compare
56096af to
1fef4af
Compare
818b8a0 to
6f561d6
Compare
1fef4af to
2855e4a
Compare
| @@ -25,12 +26,23 @@ void vm_assembler_gc_mark(vm_assembler_t *code) | |||
| switch (*ip++) { | |||
There was a problem hiding this comment.
Can we mark the ops that need to be marked and call liquid_vm_next_instruction in vm.c to move to the next instruction?
There was a problem hiding this comment.
I want to explicitly handle all cases here to make sure we actually mark all ruby constants. Otherwise, it would be easy to accidentally miss a new instruction that has a ruby constant argument, where failing to GC mark that constant could end up being a subtle and hard to debug bug.
Perhaps we could use an assertion to ensure that liquid_vm_next_instruction moves the instruction and constant pointers consistently with this function. However, that brings the annoyance #ifndef NDEBUG specific state.
Instead, what I want to do in the near future is get rid of all non-ruby constants from the constants buffer. That will simplify GC marking, serialization and deserialization. However, that would add copying overhead for OP_WRITE_RAW, which makes sense for serialization & deserialization but is unnecessary overhead without serialization & deserialization.
There was a problem hiding this comment.
get rid of all non-ruby constants from the constants buffer
Where do you plan to put them? It looks like probably interleaved into the code buffer given the implementation of PUSH_INT*, which sounds good to me overall. It does make it more difficult to include medium size non-ruby constants inline. I mentioned in another comment that perhaps we could skip the conversion through RString in some cases, but I also wouldn't want to have strings in the code buffer. (in the OP_LOOKUP_COMMAND we can just use an enum, but I can imagine potentially wanting to skip RString for some drop methods and filter names).
There was a problem hiding this comment.
Yeah, I wanted to add them to the instructions buffer if they can be used in their serialized form.
I was planning on using an immediate string argument for OP_WRITE_RAW, since it seems rare that we would re-use the string and we won't want the deserialization overhead.
For filter, variable and key names, it is actually quite likely that we will re-use the object, so I think a table lookup would make sense.
Longer term, I'm more interested in what is most efficient, where we can re-use code to make things easier.
2e17c30 to
14e0e9a
Compare
| if (rstring_eq(key, "size") || rstring_eq(key, "first") || rstring_eq(key, "last")) | ||
| vm_assembler_add_lookup_command(code, key); | ||
| else | ||
| vm_assembler_add_lookup_const_key(code, key); |
There was a problem hiding this comment.
Feels like we could eventually avoid RString here entirely, but it's pretty minor.
| result = empty_string; | ||
| break; | ||
| } | ||
| break; |
There was a problem hiding this comment.
Very ugly and fast approach 💯
| rb_enc_raise(utf8_encoding, cLiquidSyntaxError, "[:%s] is not a valid expression", symbol_names[p.cur.type]); | ||
|
|
||
| return expr; | ||
| vm_assembler_add_push_nil(code); |
There was a problem hiding this comment.
Are there cases that could hit this?
There was a problem hiding this comment.
The above rb_enc_raise calls are NO_RETURN, so I guess this is just dead code. I'll remove this to avoid confusion.
| } | ||
| if (is_command) { | ||
| Check_Type(key, T_STRING); | ||
| ID intern_key = rb_intern(RSTRING_PTR(key)); |
There was a problem hiding this comment.
Yeah it seems like we could avoid the conversion through RString here as mentioned in another comment. Definitely not a blocker, tiny optimization.
There was a problem hiding this comment.
We could optimize commands for this case, but we would still need to pass a string as the key to rb_funcall(object, id_aref, 1, key) if the object is a hash.
| Check_Type(key, T_STRING); | ||
| ID intern_key = rb_intern(RSTRING_PTR(key)); | ||
| if (rb_respond_to(object, intern_key)) { | ||
| VALUE next_object = rb_funcall(object, rb_intern(RSTRING_PTR(key)), 0); |
There was a problem hiding this comment.
Duplicate rb_intern here (was in the original code, probably my fault)
| @@ -25,12 +26,23 @@ void vm_assembler_gc_mark(vm_assembler_t *code) | |||
| switch (*ip++) { | |||
There was a problem hiding this comment.
get rid of all non-ruby constants from the constants buffer
Where do you plan to put them? It looks like probably interleaved into the code buffer given the implementation of PUSH_INT*, which sounds good to me overall. It does make it more difficult to include medium size non-ruby constants inline. I mentioned in another comment that perhaps we could skip the conversion through RString in some cases, but I also wouldn't want to have strings in the code buffer. (in the OP_LOOKUP_COMMAND we can just use an enum, but I can imagine potentially wanting to skip RString for some drop methods and filter names).
|
|
||
| def test_find_dynamic_variable | ||
| context = Liquid::Context.new({"x" => "y", "y" => 42}) | ||
| expr = Liquid::C::Expression.strict_parse('[x]') |
There was a problem hiding this comment.
I thought [x] on its own was supposed to be a syntax error, and that it required an identifier in front. It looks like this will make {{ [x] }} possible, which is currently disallowed
Liquid::Template.parse("{{ [x] }}", "x" => "y", "y" => 42, error_mode: :strict).render
...
# => Liquid::SyntaxError (Liquid syntax error: [:open_square, "["] is not a valid expression in "{{ [x] }}")I'm not necessarily opposed to making it allowed, but we should get consensus from @Shopify/guardians-of-the-liquid and modify this in Ruby too.
There was a problem hiding this comment.
Oh, I was looking at what was possible with Liquid::Expression.parse("[x]"). I didn't realize we considered a dynamic variable lookup to be a strict parse error. I change it back to being a strict parse error, since I didn't intend to change it here.
There was a problem hiding this comment.
I’ll also note that if you try to parse and render it in lax mode, it returns an empty string.
There was a problem hiding this comment.
~/src/liquid(master)$ bundle exec irb -Ilib -rliquid
irb> Liquid::Template.parse('{{ [x] }}').render({ 'x' => 'y', 'y' => 42 })
=> "42"
irb> context = Liquid::Context.new({ 'x' => 'y', 'y' => 42 })
irb> expression = Liquid::Expression.parse('[x]')
irb> context.evaluate(expression)
=> 42
There was a problem hiding this comment.
Oh, it was returning an empty string for you since you were passing the variable values as an argument to parse, when they are an argument to render
There was a problem hiding this comment.
Ah, obviously 🤦
Well, maybe we should make this work on the strict parser then. Because it works in prod with the lax fallback.
There was a problem hiding this comment.
Actually, it looks like this has been the behaviour strict parse behaviour for variables in liquid-c for a long time now. I think since variable parsing was added to liquid-c (see how TOKEN_OPEN_SQUARE is allowed at the start of parse_variable in the PR that introduced the function https://github.com/Shopify/liquid-c/pull/13/files#diff-307861feddcb5da5f0af73741d9fdea2R93)
So I'm hesitant to change it. Especially since it could be a breaking change for strictly parsed code. I'm also unsure if it was really the intention of the strict parser to remove any features, so we could consider whether this should be considered a strict parser bug in the liquid gem.
There was a problem hiding this comment.
I agree, in the end it seems we should add support for this to the Ruby strict parser.
| vm_assembler_add_push_const(code, keyword_args); | ||
|
|
||
| if (const_keyword_args != Qnil) { | ||
| rb_hash_freeze(const_keyword_args); |
There was a problem hiding this comment.
I think we are going to have to always freeze or always dup the hash. Otherwise, it is too easy to test a filter using dynamic keyword arguments and then get errors in production when a frozen hash is passed in for constant keyword arguments.
There was a problem hiding this comment.
I've opened #88 to remove the constant keyword arguments optimization on master. We can always add it back later.
6e7d0d2 to
6525790
Compare
6525790 to
e5c3c4a
Compare
Depends on #59 and Shopify/liquid#1300Problem
In #59, expressions were compiled to ruby constants or lookup ruby objects (Liquid::VariableLookup & Liquid::RangeLookup) that are evaluated with
context_evaluate. This was done in that PR for simplicity, but they should really be compiled to more granular VM instructions for rendering performance and for serializability (for future caching of VM compiled templates).Solution
I replaced the previous code for strict parsing expressions and variable lookups to instead emit VM code, which is now compiled directly into Liquid::C::BlockBody objects when compiling variables. Expressions are also often parsed and evaluated independent of Liquid::Variable in tags, so I introduced a
Liquid::C::Expressionobject that gets returned fromLiquid::Expression.parseif the expression includes a variable lookup and strict parsing of it succeeds. I ended up doing both of these in the same commit, since I also leverage theLiquid::C::Expressionobject for filter keyword argument parsing to handle automatic cleanup on a parse exception.Note that a couple of direct uses of
Liquid::Variable.new(in assign and echo tags) no longer leverage liquid-c for parsing, which slightly impacts parse speed. I'll embed those tags in the liquid VM code in a follow-up PR rather than introducing a Liquid::C::Variable class. This PR significantly speeds up rendering, as shown in the benchmarks, which more than makes up for this parsing regression.Since
Liquid::Variable.newis no longer monkey patched andLiquid::Expression.parseno longer turns a comparable object, I had to update the liquid-c variable unit tests so they are tested through normal template parsing and rendering like an integration test. I also stopped testing against the liquid gem's variable unit tests for the same reasons.To help with detecting and debugging VM stack underflows or overflows, I added some assertions that are enabled by default for CI and gem development but are disabled by default for gem installation.
Benchmarks
Before this PR (on the #59 branch)
after