Compile variable nodes into the Liquid::C::BlockBody VM code by dylanahsmith · Pull Request #59 · Shopify/liquid-c

dylanahsmith · 2020-09-16T17:18:40Z

~~Depends on #58 and Shopify/liquid#1294~~

This is a continuation of #58 that compiles the strict parseable liquid variables directly in the block body VM code for performance.

For simplicity, variables and expressions in tags do not yet leverage this VM code and the variable VM code still relies on the existing expression evaluation code (e.g. context_evaluate of Liquid::VariableLookup and Liquid::RangeLookup). These will be optimized in following PR(s).

All the commits except the (currently) last one (Compile variable nodes into the Liquid::C::BlockBody VM code) are refactors:

I've moved the VM instructions and constants out of block_body_t into a vm_assembler_t struct, since we will want to embed the VM code in a future Liquid::C::Variable. In the future, I intend to wrap the assembler in a Liquid::C::Assembler as a safe API for compiling tags from ruby code (edit: extracted to Extract a vm_assembler_t struct from block_body_t #69)
I changed the c_buffer struct to store pointers for the end of the data and capacity, instead of storing the sizes and capacity directly. This is because we are mostly getting and updating the data end pointer, which is especially true in order to use this for the VM stack where we reserve space for the block and don't have to check the capacity for each write/push. (edit: extracted to Refactor c_buffer to make it suitable to re-use for a stack #70)
~~I've also added support to c_buffer to be initialized with size 0 (without an allocation) which is the one case where we can't just double the capacity until we have the requested extra capacity.~~ (edit: extracted to Refactor c_buffer to make it suitable to re-use for a stack #70)
I introduced a Liquid::C::VM that is attached to the Liquid::Context through an hidden instance variable, so the stack can be shared with nested blocks
Leveraging Refactor to support liquid-c VM compilation of variables liquid#1294 to only run liquid integration tests to avoid having to be fully compatible with the deprecated Liquid::BlockBody#nodelist
I've added a liquid_vm_next_instruction function to reduce duplication for code iterating instructions by using it for both Liquid::C::BlockBody#remove_blank_strings and Liquid::C::BlockBody#nodelist. This way I can further leverage it for variable error handling. (edit: extracted to Add liquid_vm_next_instruction to reduce duplication #72)
Drop support for ruby 2.4 which is no longer supported upstream

The VM stack is one of the significant additions from this PR. I chose to re-use the stack for nested blocks, which means that it needs to be expandable, hence the use of c_buffer_t for its memory allocation. We can't globally determine the maximum stack size, since this stack could even be re-used in a template partial, but this PR does calculate the maximum stack size needed for a block body so that we can reserve stack space at the start of block body render and not have to check for sufficient capacity throughout the block body render.

Another design choice I made for this PR is to do error handling for the whole block body render, rather than for each variable render. This way we can reduce the state saving cost of rb_rescue from liquid code that doesn't encounter variable render errors. In order to recover from the exception, we just restore the stack size to what it was at the start of the block body render and iterate the instructions to jump just past the variable write.

Since the variable lookup and expression evaluation is still happen as it did before, the most significant change for performance is the filter invocation, which can happen without allocating an arguments array by leveraging the VM stack calling the filter function directly using rb_funcallv. The filter invokable? is still being done at runtime (we need to pass filters as a parse option to do it at parse time) but is still optimized by doing the check directly from C using a hash with symbol keys.

Benchmark

Before this PR (on the #58 branch)

              parse:    146.972  (± 2.0%) i/s -      1.470k in  10.007250s
             render:    162.266  (± 2.5%) i/s -      1.635k in  10.081595s
     parse & render:     70.513  (± 2.8%) i/s -    708.000  in  10.047458s

after

              parse:    163.206  (± 1.8%) i/s -      1.632k in  10.002316s
             render:    172.302  (± 4.1%) i/s -      1.734k in  10.080641s
     parse & render:     78.371  (± 1.3%) i/s -    784.000  in  10.006170s

macournoyer

All comments are refactoring & optimization ideas. This is shaping up nicely 👏 Performance improvements are adding up fast!

Great decision on going the stackless route (re-using the stack for nested blocks) 👍

However, I'm still doubtful about the constant pointer approach.

macournoyer · 2020-09-17T14:29:53Z

+        if (*ip == OP_WRITE_RAW) {
+            size_t *size_ptr = &const_ptr[1];
+            if (*size_ptr) {
+                *size_ptr = 0; // effectively a no-op


Could you instead introduce a OP_NOP instruction, and replace the whole instruction?

Yes, although it would then have an unused constant arguments. I imagine that a OP_NOP would be faster, but I'm not concerned about further optimizing this edge case. Primarily these blank strings are handled in this way to minimize impact on code that isn't relying on this feature while keeping backwards compatibility, so this seemed simpler than a no-op instruction.

keeping backwards compatibility

For #nodelist? I guess this discussion will answer #58 (comment) too

Regarding backwards compatibility, I was referring to removing/ignoring blank strings from blank blocks. I think our only remaining dependency on nodelist is for tags.

macournoyer · 2020-09-17T14:51:01Z

+    }
+}
+
+void liquid_vm_next_instruction(const uint8_t **ip_ptr, const size_t **const_ptr_ptr)


This got me confused at first. I think it should be renamed to liquid_vm_next_instruction_and_const.

But I still don't get the advantage of this approach over passing the constant index via an operand.

And I think it is painting us in a corner for many optimizations (block_body_remove_blank_strings is an example).

These constants are pointer sized operands to the instruction, similar to one the byte sized ones.

In hindsight, the instruction pointer should be been a struct with a pointer to both to provide a bit of a zero-cost abstraction around them. We are also missing an abstraction for accessing those operands from the block body code. However, this PR is at least a step in the right directly by moving the vm assembler out of block.c.

However, all of this is still internal to liquid-c, so we can definitely change the VM instructions.

It's not really an operand if it's not in the bytecode, in an instruction. So it's a bit like having two sets of instructions. The problem we'll have is keeping them in sync all the time if we modify the bytecode.

Not sure what you mean RE the struct with a pointer.

But all good, you're right, step in the right direction!

I think the idea is that rather than using a separate ip_ptr and const_ptr_ptr the code would use an ip struct that has equivalent ip_ptr and const_ptr_ptr members. When calling any instruction-related functions you would always pass the ip struct around.

In the end the semantics of "operand" are somewhat arbitrary, at the moment the constants buffer just has pointers/small bits of metadata in it and could be interleaved into the instructions buffer and everything would be quite equivalent, with slightly different performance for some tasks.

@dylanahsmith what are the reasons you ended up going with the separate constants buffer?

Yes, that is what I meant for the instruction pointer struct.

I want the constants separate, because they will require serialization and deserialization, unlike the byte code which is already serialized and doesn't require deserialization for execution. Ruby's in-memory VM code interleaves the pointers immediately into the instruction, which ended up leading to pointer size instructions (to avoid alignment issues) and requires RubyVM::InstructionSequence.load_from_binary to deserialize the instructions.

Of course, we can also have a constant table, which is ideal for constants that get re-used (e.g. filter names, variable names and object keys are common to re-use) and allows only those ruby constants to be deserialized. I want to switch to that after adding serialization support.

macournoyer · 2020-09-17T16:46:46Z

+
+    if (new_size > capacity) {
+        do {
+            capacity *= 2;


Doubling seems excessive. Max Fixnum value can be stored in 10 bytes. Lets say the output buffer is 10KB, doubling for the sake of adding 10 bytes?

Maybe this could rewritten to use rb_str_buf_cat_ascii, and let Ruby handle the resizing? Writing the fixnum to a char number_str[11] on the stack.

Or replace the whole thing w/ rb_str_buf_append(output, rb_fix2str(fixnum));?

This is unlikely to be the last 10 bytes, so doubling is meant to amortize the cost of resizing the output string.

Lets say the output buffer is 10KB, doubling for the sake of adding 10 bytes?

Maybe this could rewritten to use rb_str_buf_cat_ascii, and let Ruby handle the resizing?

You would actually have approximately the same behaviour. I created https://github.com/dylanahsmith/ruby-string-buffer a long time ago to make it easier to inspect the capacity.

irb> str = "1" * (10 * 1024); irb> StringBuffer.capacity(str) => 10240 irb> str << "2"; irb> StringBuffer.capacity(str) => 20481

which actually allocates 1 extra byte, which keeps the allocation from being page aligned.

Writing the fixnum to a char number_str[11] on the stack.

We would still have to resize the output buffer to copy it from the stack to the output buffer, so this approach avoids the copy step.

Or replace the whole thing w/ rb_str_buf_append(output, rb_fix2str(fixnum));?

That would trade-off performance for simplicity, since it would allocate a new string object to write the result to as well as having to copy the result to the output buffer. The primary reason for writing write_fixnum was to avoid that string allocation overhead.

macournoyer · 2020-09-17T17:13:07Z

+                size_t hash_size = *ip++;
+                size_t num_keys_and_values = hash_size * 2;
+                VALUE hash = rb_hash_new();
+                VALUE *args_ptr = vm_stack_pop_n_use_in_place(vm, num_keys_and_values);


This falls into place so nicely, wow! Everything about this block of code is awesome 👌

macournoyer · 2020-09-17T17:42:05Z

+    // since this could be called in the middle of parsing
+    const uint8_t *end_ip = code->instructions.data_end;
+    while (ip < end_ip) {
+        switch (*ip++) {


This type of loop is done in a few places, could be refactored using a table storing info about each instructions:

typedef struct instruction_def_t { size_t consts; size_t operands; } instruction_def_t; instructions_def_t instructions_defs[OP_MAX] = { /* OP_LEAVE */, { .consts = 0, operands = 0 }, /* OP_WRITE_RAW */, { .consts = 2, operands = 0 }, /* OP_WRITE_NODE */, { .consts = 1, operands = 0 }, /* OP_POP_WRITE_VARIABLE */, { .consts = 0, operands = 0 }, /* OP_PUSH_CONST */, { .consts = 1, operands = 0 }, /* OP_HASH_NEW */, { .consts = 0, operands = 1 }, /* OP_FILTER */, { .consts = 0, operands = 1 }, /* OP_PUSH_EVAL_EXPR */, { .consts = 1, operands = 0 }, /* OP_RENDER_VARIABLE_RESCUE */, { .consts = 0, operands = 0 }, };

Then you no longer have to review each of those loops each time an instruction is added or modified:

switch (*ip++) { // Special cases here case ...: break; default: const_ptr += instructions_defs[ip-1].consts; ip += instructions_defs[ip-1].operands; }

I introduced liquid_vm_next_instruction in this PR which is now used in 3 places to reduce duplication. The only remaining two, vm_assembler_gc_mark and vm_render_until_error have to handle all the opcodes. I think the only remaining duplication is between vm_assembler_gc_mark and liquid_vm_next_instruction, but they differ in that vm_assembler_gc_mark also needs to know which constants are ruby objects, so we would need to encode that into the instruction definition as well.

I was considering adding the table based approach to support the disassembler, so perhaps I could re-use it for other purposes if I don't switch to another approach for marking.

Yeah just an idea. I've seen it used often in VMs. Here's the one in .NET Core: https://github.com/dotnet/runtime/blob/master/src/coreclr/src/inc/opcode.def

Could be useful for building a debugger & disassembler later.

pushrax

Partially done this review - will finish tomorrow.

pushrax · 2020-09-30T00:36:33Z

+    }
+}
+
+void liquid_vm_next_instruction(const uint8_t **ip_ptr, const size_t **const_ptr_ptr)


I think the idea is that rather than using a separate ip_ptr and const_ptr_ptr the code would use an ip struct that has equivalent ip_ptr and const_ptr_ptr members. When calling any instruction-related functions you would always pass the ip struct around.

In the end the semantics of "operand" are somewhat arbitrary, at the moment the constants buffer just has pointers/small bits of metadata in it and could be interleaved into the instructions buffer and everything would be quite equivalent, with slightly different performance for some tasks.

@dylanahsmith what are the reasons you ended up going with the separate constants buffer?

pushrax · 2020-09-30T00:41:03Z

    if (nodelist != Qnil)
        return nodelist;
-    nodelist = rb_ary_new_capa(body->instructions.size / sizeof(VALUE));
+    nodelist = rb_ary_new_capa(body->render_score);


Ah this addresses #58 (comment) which you can now ignore.

pushrax · 2020-09-30T00:41:59Z

+        if (*ip == OP_WRITE_RAW) {
+            size_t *size_ptr = &const_ptr[1];
+            if (*size_ptr) {
+                *size_ptr = 0; // effectively a no-op


keeping backwards compatibility

For #nodelist? I guess this discussion will answer #58 (comment) too

pushrax · 2020-09-30T00:43:13Z

-                const char *text = (const char *)*const_ptr++;
-                size_t size = *const_ptr++;
+                const char *text = (const char *)const_ptr[0];
+                size_t size = const_ptr[1];


This is definitely a bit faster to read.

pushrax · 2020-09-30T00:48:12Z

-    buffer->data = xrealloc(buffer->data, new_capacity);
-    buffer->capacity = new_capacity;
+        capacity *= 2;
+    } while (capacity < required_capacity);


There's a non-branching bit twiddling trick for this https://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2

Though that micro-optimization is irrelevant in this code given it's about to do a heap allocation :) Just a fun bit of art.

pushrax · 2020-09-30T00:59:44Z

+{
+    switch (p->cur.type) {
+        case TOKEN_IDENTIFIER:
+        case TOKEN_OPEN_SQUARE:


I think this should have TOKEN_OPEN_ROUND for ranges. Also potentially TOKEN_DOTDOT for completeness though it doesn't look necessary based on the places this is called.

I confirmed that a range is a valid argument to a filter and is a valid variable.

I wasn't too concerned about this function identifying all constants, since it is only used as part of an optimization for calling filters with constant arguments. I did add support for constant ranges in the corresponding try_parse_constant_expression function in the follow-up PR (#60), so I could backport that case to this PR.

Also potentially TOKEN_DOTDOT for completeness though it doesn't look necessary based on the places this is called.

TOKEN_DOTDOT isn't a valid token for starting a constant expression, so handling it with the default case is fine.

pushrax · 2020-09-30T01:14:26Z

+
+static inline void parse_and_compile_expression(parser_t *p, vm_assembler_t *code)
+{
+    bool is_const = will_parse_constant_expression_next(p);


Is this mostly an optimization over checking for #evaluate?

It was mostly a forward thinking approach, since we want to avoid even creating a VariableLookup or RangeLookup when parsing expressions in the future. However, I did also think it would be faster than calling rb_respond_to

…elist

dylanahsmith requested review from macournoyer and pushrax September 16, 2020 17:18

dylanahsmith mentioned this pull request Sep 16, 2020

Refactor to support liquid-c VM compilation of variables Shopify/liquid#1294

Merged

macournoyer approved these changes Sep 17, 2020

View reviewed changes

dylanahsmith force-pushed the vm-variable branch from f88d1a7 to 19be073 Compare September 23, 2020 19:32

dylanahsmith mentioned this pull request Sep 24, 2020

Implement Liquid::C::Expression to optimize expression evaluation #60

Merged

dylanahsmith force-pushed the c-block-body branch from 37dfbfa to 3540250 Compare September 25, 2020 22:26

dylanahsmith force-pushed the vm-variable branch from ea00f8b to 5a5b34a Compare September 25, 2020 22:27

dylanahsmith force-pushed the c-block-body branch 2 times, most recently from 233322a to e00cdf9 Compare September 28, 2020 18:05

dylanahsmith force-pushed the vm-variable branch from 5a5b34a to 9d8a3d7 Compare September 28, 2020 20:49

pushrax reviewed Sep 30, 2020

View reviewed changes

dylanahsmith mentioned this pull request Sep 30, 2020

Add and use Liquid::C::BlockBody for the block body #58

Merged

3 tasks

dylanahsmith force-pushed the vm-variable branch from 9d8a3d7 to 239b210 Compare September 30, 2020 19:27

dylanahsmith force-pushed the c-block-body branch from c05a7ca to 4be2f77 Compare September 30, 2020 19:54

Base automatically changed from c-block-body to master September 30, 2020 19:57

dylanahsmith force-pushed the vm-variable branch from 239b210 to eeae3a1 Compare September 30, 2020 19:57

dylanahsmith mentioned this pull request Oct 2, 2020

Implement Liquid::ResourceLimits #64

Merged

dylanahsmith force-pushed the vm-variable branch from eeae3a1 to a2c628d Compare October 5, 2020 15:29

dylanahsmith force-pushed the vm-variable branch from a2c628d to 25be6e4 Compare October 6, 2020 18:45

dylanahsmith requested a review from peterzhu2118 October 6, 2020 21:29

peterzhu2118 approved these changes Oct 7, 2020

View reviewed changes

Comment thread ext/liquid_c/vm.c

dylanahsmith force-pushed the vm-variable branch from 25be6e4 to c32997e Compare October 7, 2020 15:53

dylanahsmith added 3 commits October 7, 2020 16:09

Attach a hidden Liquid::C:VM object on the Liquid::Context

ecad3a9

Ignore most liquid unit tests to avoid fully supporting BlockBody#nod…

2c0815d

…elist

Move a couple of ID variables to liquid.c and liquid.h for sharing

9c454dd

Remove support for ruby 2.4 that is no longer supported upstream

4c1e009

dylanahsmith force-pushed the vm-variable branch from c32997e to 818b8a0 Compare October 7, 2020 20:11

Compile variable nodes into the Liquid::C::BlockBody VM code

6f561d6

dylanahsmith force-pushed the vm-variable branch from 818b8a0 to 6f561d6 Compare October 7, 2020 20:45

dylanahsmith merged commit 76e66bd into master Oct 7, 2020

dylanahsmith deleted the vm-variable branch October 7, 2020 21:28

This was referenced Oct 8, 2020

Test reporting of liquid error for filter call with wrong number of arguments Shopify/liquid#1311

Merged

Add support for serializing liquid templates along with their VM code #77

Open

Conversation

dylanahsmith commented Sep 16, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark

Uh oh!

macournoyer left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

pushrax left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

dylanahsmith commented Sep 16, 2020 •

edited

Loading