Add and use Liquid::C::BlockBody for the block body by dylanahsmith · Pull Request #58 · Shopify/liquid-c

dylanahsmith · 2020-09-08T15:53:59Z

Problem

Currently liquid-c is mostly used to speed up parsing (evaluating expressions is the exception). However, there is a big opportunity to reduce the overhead of both parsing and rendering by moving to a C representation for block bodies. A block body can be represented as a set of instructions that are executed directly from C rather than indirectly through another VM, and that representation could also be serialized and deserialized to avoid parsing overhead for pre-compiled templates. The block body is the ideal starting place, since it can encapsulate everything it contains, allowing builtin functionality to be compiled directly into the block body's instructions.

@macournoyer created a proof-of-concept liquid VM as part of hack days, which showed significant potential improvements. Even after modifying it to use a Liquid::Context rather than a simple hash for the context, deserializing and rendering the liquid VM code was still 4.4x faster than parsing and rendering liquid using liquid-c. Loading a marshal dump of the Liquid::Template instead of parsing liquid was still 4.0x slower than the liquid VM code, which seems to indicate that we should serialize to a format that can (at least mostly) be executed without deserialization.

Solution

This pull request depends on Shopify/liquid#1289 to provide Block#new_body and Document#new_body to override in order to return a Liquid::C::BlockBody when liquid-c is enabled. These two methods also have access to the parse context, so we still don't need to support profiling for liquid-c's block body.

The code implementing the Liquid::BlockBody#parse override was refactored to implement Liquid::C::BlockBody#parse, so we can build the Liquid::C::BlockBody directly without the overhead of creating a Liquid::BlockBody#nodelist array. This also allows us to avoid copying the raw template strings, which was previously needed to create individual ruby strings, since we can instead just write the output directly from slices of a shared string that contains all the raw template strings.

To keep the scope of this PR more minimal, I haven't included block body serialization and have not added compilation of variables or tags. So the VM currently only has 3 instructions: OP_LEAVE, OP_WRITE_RAW and OP_WRITE_NODE.

The VM keeps track of both an instruction pointer and a constant pointer, which are incremented as instructions or constants are used. This makes the instructions easy to use directly from a serialized state, and we won't have to worry about the complexities of encoding constant references (e.g. a byte constant index is too limiting and a multi-byte index can introduce either extra alignment or decoding concerns). We can always iterate on the VM design or even compile directly to machine code in the future.
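To illustrate the moving constant pointer described above, here is a minimal, self-contained sketch of a three-instruction interpreter loop. It is not the code from this PR: `out_buf_t`, `out_write`, `render_node` and `vm_render` are hypothetical names, the output sink is a fixed array rather than a Ruby string, the "node" constant is assumed to be a plain C string, and constants are stored as `uintptr_t` rather than the `size_t` used in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Opcode names follow the PR description; their numeric values are assumed. */
enum opcode { OP_LEAVE, OP_WRITE_RAW, OP_WRITE_NODE };

/* Hypothetical output sink standing in for the render output string. */
typedef struct {
    char data[256];
    size_t len;
} out_buf_t;

static void out_write(out_buf_t *out, const char *s, size_t n)
{
    assert(out->len + n <= sizeof(out->data));
    memcpy(out->data + out->len, s, n);
    out->len += n;
}

/* Stand-in for rendering an uncompiled node: the constant is assumed
 * to point at a NUL-terminated string. */
static void render_node(out_buf_t *out, uintptr_t constant)
{
    const char *s = (const char *)constant;
    out_write(out, s, strlen(s));
}

/* Both pointers only ever move forward: each instruction consumes
 * exactly the constants it needs, so no constant index is encoded. */
void vm_render(const uint8_t *ip, const uintptr_t *const_ptr, out_buf_t *out)
{
    for (;;) {
        switch (*ip++) {
        case OP_LEAVE:
            return;
        case OP_WRITE_RAW: {
            /* Two constants: start of the slice, then its length. */
            const char *text = (const char *)*const_ptr++;
            size_t size = (size_t)*const_ptr++;
            out_write(out, text, size);
            break;
        }
        case OP_WRITE_NODE:
            /* One constant: the node to render. */
            render_node(out, *const_ptr++);
            break;
        default:
            assert(!"unknown opcode");
            return;
        }
    }
}
```

Because the constant stream is consumed strictly in order, an instruction never has to encode which constant it refers to, which is the property that avoids the byte-index versus multi-byte-index trade-off.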

To make it easier to adopt Liquid::C::BlockBody, I added Liquid::C::BlockBody#nodelist, although I will also be refactoring the hot code paths in Shopify that use it so that they no longer depend on it, for performance reasons. As part of our storefront rewrite we have some code to debug differences in the rendered output, which is quite coupled to the liquid parse tree node ruby objects, but doesn't seem to be performance sensitive, so I added a :disable_liquid_c_nodes parse option so this can be used without disabling liquid-c globally (which wouldn't be thread-safe).

Benchmark

On master:

              parse:    146.973  (± 4.1%) i/s -      1.470k in  10.020119s
             render:    135.523  (± 3.0%) i/s -      1.365k in  10.083170s
     parse & render:     64.774  (± 3.1%) i/s -    648.000  in  10.010060s

On this PR's branch:

              parse:    145.177  (± 2.1%) i/s -      1.456k in  10.032703s
             render:    157.193  (± 5.1%) i/s -      1.575k in  10.048723s
     parse & render:     68.667  (± 2.9%) i/s -    690.000  in  10.060160s

Check List

Get liquid#1289 (Avoid direct coupling to BlockBody instances for liquid-c replacement) reviewed and merged
Get liquid#1285 (Fix render length resource limit so it doesn't multiply nested output) approved and merged (or rebase code to not depend on it)
Remove the "Temporarily use liquid branch until it has been merged" commit

macournoyer

Impressive work 👏

A few comments, no blockers, as long as we can change the bytecode later.

macournoyer · 2020-09-08T17:42:01Z

+static void block_body_add_write_raw(block_body_t *body, const char *string, size_t size)
+{
+    block_body_write_opcode(body, OP_WRITE_RAW);
+    size_t slice[2] = { (size_t)string, size };


Using slices instead of copying strings is an awesome idea!

But I think storing the address of the pointer will prevent body->constants from being serializable. Could you store the starting index instead?

There will already be constants that we won't use directly from the serialized format, such as uncompiled tags or expression constants (e.g. strings, big integers, constant ranges). In the serialized format, there will end up being a split between these constants to reduce deserialization overhead, so that at least the data for constants like raw strings can be used without deserializing.

I don't think it would be that much overhead for resolving the pointers to raw strings at deserialization, but as mentioned in #58 (comment), I do plan on revisiting the decision of using a moving constant pointer.
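As a rough sketch of what resolving raw string pointers at deserialization could look like, the following stores each raw string as an offset and size into the template source and converts it to a pointer once at load time. The types `raw_slice_t` and `raw_ref_t` and the function `resolve_raw_slices` are hypothetical names introduced for illustration, not part of liquid-c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical serializable form: position of a raw string in the source. */
typedef struct {
    size_t offset;
    size_t size;
} raw_slice_t;

/* Hypothetical in-memory form used while rendering. */
typedef struct {
    const char *ptr;
    size_t size;
} raw_ref_t;

/* Resolve every slice against the source buffer in a single pass.
 * Returns 0 on success, or -1 if any slice falls outside the source. */
int resolve_raw_slices(const char *source, size_t source_len,
                       const raw_slice_t *slices, raw_ref_t *refs,
                       size_t count)
{
    assert(source != NULL);
    for (size_t i = 0; i < count; i++) {
        /* Written to avoid overflow in offset + size. */
        if (slices[i].offset > source_len ||
            slices[i].size > source_len - slices[i].offset)
            return -1;
        refs[i].ptr = source + slices[i].offset;
        refs[i].size = slices[i].size;
    }
    return 0;
}
```

The cost is one addition and one bounds check per raw string, which is consistent with the expectation above that resolving these pointers would not add much overhead.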

macournoyer · 2020-09-08T18:06:44Z

+                return;
+            case OP_WRITE_RAW:
+            {
+                const char *text = (const char *)*const_ptr++;


Very nice and simple way to avoid index overflow 👍

However, it will break once we introduce loops (for, etc.)

I do plan on revisiting the moving constant pointer when adding jumps. It could be implemented by having the jump target be a pair of instruction and constant pointer offsets, so there is a trade-off between jump overhead and constant lookup overhead.

Sounds good. First time seeing this approach...

But once you add a pointer offset, I think it will be equivalent to the traditional constants[i], and you'll end up needing a multi-byte index too.
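One way to picture the jump design being discussed is a jump instruction that carries two deltas, one applied to the instruction pointer and one to the constant pointer. The sketch below is purely illustrative: `OP_JUMP`, its single-byte signed operand encoding, and `run_with_jumps` are assumptions, and constants are simplified to C strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* OP_JUMP and its operand encoding are hypothetical. */
enum { OP_LEAVE, OP_WRITE_RAW, OP_JUMP };

/* Runs the code, appending output to out (capacity cap).
 * Returns the number of bytes written. */
size_t run_with_jumps(const uint8_t *code, const char *const *constants,
                      char *out, size_t cap)
{
    const uint8_t *ip = code;
    const char *const *const_ptr = constants;
    size_t len = 0;

    for (;;) {
        switch (*ip++) {
        case OP_LEAVE:
            return len;
        case OP_WRITE_RAW: {
            const char *text = *const_ptr++;
            size_t n = strlen(text);
            assert(len + n <= cap);
            memcpy(out + len, text, n);
            len += n;
            break;
        }
        case OP_JUMP: {
            /* Two signed byte operands. The instruction delta is relative
             * to the byte after the operands; the constant delta is
             * relative to the current constant pointer. */
            int8_t ip_delta = (int8_t)ip[0];
            int8_t const_delta = (int8_t)ip[1];
            ip += 2 + ip_delta;
            const_ptr += const_delta;
            break;
        }
        default:
            assert(!"unknown opcode");
            return len;
        }
    }
}
```

With this encoding a backward jump for a loop would use negative deltas. The second operand is the extra jump cost mentioned above, paid in exchange for keeping constant lookups index-free.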

macournoyer · 2020-09-08T18:22:50Z

+    alias_method :ruby_new_body, :new_body
+
+    def new_body
+      if parse_context.disable_liquid_c_nodes || parse_context[:profile]


Does profiling disable this only because instrumentation is missing from C::BlockBody at the moment?

Liquid's template render profiling support already wasn't supported in liquid-c. I just chose to keep it that way. Note that this is different from stackprof, which we typically use for profiling at Shopify. I would like to revisit profiling in the future, but not in a way that ties us to calling a render_node ruby method that requires a node object to be passed to it.

felix-d · 2020-09-28T17:04:42Z

As part of our storefront rewrite we have some code to debug differences in the rendered output, which is quite coupled to the liquid parse tree node ruby objects, but doesn't seem to be performance sensitive, so I added a :disable_liquid_c_nodes parse option so this can be used without disabling liquid-c globally (which wouldn't be thread-safe).

We don't use liquid tracking anymore. It can be 🔥


dylanahsmith · 2020-09-28T17:52:34Z

Oh right, I removed the use of :disable_liquid_c_nodes in the storefront renderer PR, but didn't completely remove the option here. I've done that now.

pushrax

Very nice, this was easy to follow and looks quite efficient. Also just had some minor non-blocking comments.

pushrax · 2020-09-29T23:07:31Z

+    VALUE obj = TypedData_Make_Struct(klass, block_body_t, &block_body_data_type, body);
+    body->instructions = c_buffer_allocate(8);
+    block_body_add_leave(body);
+    body->constants = c_buffer_allocate(8 * sizeof(VALUE));


How did you choose 8 here?

It was chosen fairly arbitrarily based on the current situation where we could end up with a lot of small block bodies for things like if tags. I tried to keep it as a multiple of 2, since that seemed like it would make it more likely that the memory allocator will re-use buffer allocations for new or expanding buffers and because it could use whole pages for larger memory allocations. However, I haven't tested any of these assumptions yet.

We will probably want to revisit the default Liquid::Document's block body as we start compiling more functionality into VM instructions, since then we will be able to compile builtin block tags into their parent block body.
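For context on why a small initial capacity matters when there are many small block bodies, here is a minimal growable byte buffer that starts at a caller-chosen capacity and grows by doubling. It is a sketch only: `byte_buf_t` and its functions are hypothetical names, the doubling policy is an assumption rather than a description of liquid-c's `c_buffer`, and overflow of the capacity computation is not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; not liquid-c's c_buffer_t. */
typedef struct {
    uint8_t *data;
    size_t size;
    size_t capacity;
} byte_buf_t;

/* Returns 0 on success, -1 on allocation failure. */
int byte_buf_init(byte_buf_t *buf, size_t initial_capacity)
{
    assert(initial_capacity > 0);
    buf->data = malloc(initial_capacity);
    buf->size = 0;
    buf->capacity = buf->data ? initial_capacity : 0;
    return buf->data ? 0 : -1;
}

/* Appends n bytes, doubling the capacity until they fit (assumed policy).
 * Returns 0 on success, -1 on allocation failure. */
int byte_buf_append(byte_buf_t *buf, const void *bytes, size_t n)
{
    if (n > buf->capacity - buf->size) {
        size_t new_capacity = buf->capacity ? buf->capacity : 8;
        while (new_capacity - buf->size < n)
            new_capacity *= 2;
        uint8_t *grown = realloc(buf->data, new_capacity);
        if (!grown)
            return -1;
        buf->data = grown;
        buf->capacity = new_capacity;
    }
    memcpy(buf->data + buf->size, bytes, n);
    buf->size += n;
    return 0;
}

void byte_buf_free(byte_buf_t *buf)
{
    free(buf->data);
    buf->data = NULL;
    buf->size = 0;
    buf->capacity = 0;
}
```

A block body holding only a few instructions never reallocates with this policy, while larger bodies pay a logarithmic number of reallocations. Whether the allocator actually reuses the freed blocks is the untested assumption noted above.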

pushrax · 2020-09-29T23:22:10Z

-                    tags = rb_funcall(self, intern_registered_tags, 0);
-
-                VALUE tag_class = rb_funcall(tags, intern_square_brackets, 1, tag_name);
+                VALUE tag_class = rb_funcall(tag_registry, intern_square_brackets, 1, tag_name);


Looks like you already updated Core to avoid this being a breaking change for SectionTemplate 👍

In theory external Liquid users could rely on this, but it seems pretty unlikely and we just need to note it in the changelog.

We should probably delete BlockBody#registered_tags entirely from Liquid.

pushrax · 2020-09-29T23:59:38Z

+                if (*size_ptr) {
+                    *size_ptr = 0; // effectively a no-op
+                    body->render_score--;
+                }


memmove also seems appropriate here. I guess when we serialize we can elide the extra operation there.

As mentioned in #59 (comment)

I'm not concerned about further optimizing this edge case. Primarily these blank strings are handled in this way to minimize impact on code that isn't relying on this feature while keeping backwards compatibility.

Specifically, once we have jump instructions for control flow tags, we would also be forced to update all the affected jump offsets. That will be easier to introduce without having to deal with jump offset relocation, but afterwards an extra pass could be added that is more clearly beneficial, since we could replace pessimistically large forward jump instructions (e.g. a decoded 24-bit offset or aligned 32-bit offset) with smaller jump instructions (e.g. a byte offset) where possible.

pushrax · 2020-09-30T00:02:37Z

+    VALUE nodelist = rb_attr_get(self, intern_ivar_nodelist);
+    if (nodelist != Qnil)
+        return nodelist;
+    nodelist = rb_ary_new_capa(body->instructions.size / sizeof(VALUE));


Since this is just the capacity it doesn't matter, but it looks like this is including an extra index for the OP_LEAVE.

pushrax · 2020-09-30T00:08:21Z

+    VALUE interrupts = rb_ivar_get(context, id_ivar_interrupts);
+    Check_Type(interrupts, T_ARRAY);
+
+    while (true) {


Would it make sense to do some tracking for how many bytes of the instruction buffer have been consumed, to gracefully handle a corrupted buffer with a missing OP_LEAVE? Perhaps by computing last_ip at the start and doing a single comparison per iteration.

I would prefer to not silently try to handle a corrupt buffer, since that seems likely to lead to more problems. However, I could add an assertion to help with debugging a problem like that.

The idea would be to do something like an rb_bug in that case, yes.
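The check suggested in this thread could look roughly like the sketch below: compute `last_ip` once, compare once per iteration, and treat running off the end as a bug rather than recovering from it. The function `run_checked` and the `OP_NOP` opcode are hypothetical, and the sketch returns -1 where the real extension might call Ruby's `rb_bug`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* OP_NOP is a hypothetical placeholder for any non-terminating opcode. */
enum { OP_LEAVE, OP_NOP };

/* Returns the number of instructions executed before OP_LEAVE,
 * or -1 if the buffer is corrupt (unknown opcode or no OP_LEAVE). */
long run_checked(const uint8_t *code, size_t size)
{
    const uint8_t *ip = code;
    const uint8_t *last_ip = code + size; /* computed once, up front */
    long executed = 0;

    while (ip < last_ip) { /* a single comparison per iteration */
        switch (*ip++) {
        case OP_LEAVE:
            return executed;
        case OP_NOP:
            executed++;
            break;
        default:
            return -1;
        }
    }
    /* Ran past the end without OP_LEAVE. In a Ruby C extension this
     * is where something like rb_bug("...") could be raised. */
    return -1;
}
```

This keeps the failure loud: a corrupted buffer is reported immediately instead of being silently tolerated, which matches the preference stated in the reply above.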


dylanahsmith requested review from macournoyer and pushrax September 8, 2020 15:53

dylanahsmith mentioned this pull request Sep 8, 2020

Avoid direct coupling to BlockBody instances for liquid-c replacement Shopify/liquid#1289

Merged

dylanahsmith force-pushed the c-block-body branch from 7f1963c to ded00aa Compare September 8, 2020 18:02

dylanahsmith mentioned this pull request Sep 8, 2020

Port liquid-c bug compatible whitespace trimming Shopify/liquid#1291

Merged

macournoyer approved these changes Sep 8, 2020


dylanahsmith force-pushed the c-block-body branch from 8c40679 to 51e9c08 Compare September 11, 2020 13:49

dylanahsmith mentioned this pull request Sep 16, 2020

Compile variable nodes into the Liquid::C::BlockBody VM code #59

Merged

dylanahsmith force-pushed the c-block-body branch from 37dfbfa to 3540250 Compare September 25, 2020 22:26

dylanahsmith changed the base branch from master to avoid-expression-match-on-dynamic-variable September 25, 2020 22:26

Base automatically changed from avoid-expression-match-on-dynamic-variable to master September 28, 2020 15:20

dylanahsmith added 3 commits September 28, 2020 13:40

Refactor internal_block_parse to return rather than yield an unknown …

7f94a8b

…tag.

Re-use the same tokenizer object for parsing liquid tags

74988e9

Just save and restore the internal struct state around parsing the liquid tag.

Define a C parse_context struct to pass between C parsing functions

c1cd324

dylanahsmith force-pushed the c-block-body branch from 3540250 to 233322a Compare September 28, 2020 17:49

dylanahsmith force-pushed the c-block-body branch from 233322a to e00cdf9 Compare September 28, 2020 18:05

pushrax approved these changes Sep 30, 2020


dylanahsmith added 2 commits September 30, 2020 15:48

Implement BlockBody in C, exposed to ruby as Liquid::C::BlockBody

4d4d32e

Implement Liquid::C::BlockBody#nodelist for backward compatibility

4be2f77

until we can remove this internal coupling.

dylanahsmith force-pushed the c-block-body branch from c05a7ca to 4be2f77 Compare September 30, 2020 19:54

dylanahsmith merged commit c40913e into master Sep 30, 2020

dylanahsmith deleted the c-block-body branch September 30, 2020 19:57

dylanahsmith mentioned this pull request Oct 2, 2020

Encoding::CompatibilityError when using a binary encoded output #65

Closed
