Add and use Liquid::C::BlockBody for the block body#58
Conversation
7f1963c to
ded00aa
Compare
macournoyer
left a comment
There was a problem hiding this comment.
Impressive work 👏
A few comments, no blockers, as long as we can change the bytecode later.
| static void block_body_add_write_raw(block_body_t *body, const char *string, size_t size) | ||
| { | ||
| block_body_write_opcode(body, OP_WRITE_RAW); | ||
| size_t slice[2] = { (size_t)string, size }; |
There was a problem hiding this comment.
Using slices instead of copying strings is an awesome idea!
But I think storing the address of the pointer will prevent body->constants from being serializable. Could you store the starting index instead?
There was a problem hiding this comment.
There will already be constants that we won't use directly from the serialized format, such as uncompiled tags or expression constants (e.g. strings, big integers, constant ranges). In the serialized format, there will end up being a split between these constants to reduce deserialization overhead, so that at least the data for constants like raw strings can be used without deserializing.
I don't think it would be that much overhead for resolving the pointers to raw strings at deserialization, but as mentioned in #58 (comment), I do plan on revisiting the decision of using a moving constant pointer.
| return; | ||
| case OP_WRITE_RAW: | ||
| { | ||
| const char *text = (const char *)*const_ptr++; |
There was a problem hiding this comment.
Very nice and simple way to avoid index overflow 👍
However, it will break once we introduce loops (for, etc.)
There was a problem hiding this comment.
I do plan on revisiting the moving constant pointer when adding jumps. It could be implemented by have the jump target be a pair of instruction and constant pointer offsets, so there is a trade-off between jump overhead and constant lookup overhead.
There was a problem hiding this comment.
Sounds good. First time seeing this approach...
But when you'll add a pointer offset, I think it will be equivalent to the traditional constants[i], and you'll end up needing multi-byte index too.
| alias_method :ruby_new_body, :new_body | ||
|
|
||
| def new_body | ||
| if parse_context.disable_liquid_c_nodes || parse_context[:profile] |
There was a problem hiding this comment.
Does profiling disables this only because instrumentation is missing from C::BlockBody at the moment?
There was a problem hiding this comment.
Liquid's template render profiling support already wasn't supported in liquid-c. I just chose to keep it that way. Note that this is different from stackprof, which we typically use for profiling at Shopify. I would like to revisit profiling in the future, but not in a way that ties us to calling a render_node ruby method that requires a node object to be passed to it.
8c40679 to
51e9c08
Compare
37dfbfa to
3540250
Compare
We don't use liquid tracking anymore. It can be 🔥 |
Just save and restore the internal struct state around parsing the liquid tag.
3540250 to
233322a
Compare
|
Oh right, I removed the use of |
233322a to
e00cdf9
Compare
pushrax
left a comment
There was a problem hiding this comment.
Very nice, this was easy to follow and looks quite efficient. Also just had some minor non-blocking comments.
| VALUE obj = TypedData_Make_Struct(klass, block_body_t, &block_body_data_type, body); | ||
| body->instructions = c_buffer_allocate(8); | ||
| block_body_add_leave(body); | ||
| body->constants = c_buffer_allocate(8 * sizeof(VALUE)); |
There was a problem hiding this comment.
How did you choose 8 here?
There was a problem hiding this comment.
It was chosen fairly arbitrarily based on the current situation where we could end up with a lot of small block bodies for things like if tags. I tried to keep it as a multiple of 2, since that seemed like it would make it more likely that the memory allocator will re-use buffer allocations for new or expanding buffers and because it could use whole pages for larger memory allocations. However, I haven't tested any of these assumptions yet.
We will probably want to revisit the default Liquid::Document's block body as we start compiling more functionality into VM instructions, since then we will be able to compile builtin block tags into their parent block body.
| tags = rb_funcall(self, intern_registered_tags, 0); | ||
|
|
||
| VALUE tag_class = rb_funcall(tags, intern_square_brackets, 1, tag_name); | ||
| VALUE tag_class = rb_funcall(tag_registry, intern_square_brackets, 1, tag_name); |
There was a problem hiding this comment.
Looks like you already updated Core to avoid this being a breaking change for SectionTemplate 👍
In theory external Liquid users could rely on this, but it seems pretty unlikely and we just need to note it in the changelog.
We should probably delete BlockBody#registered_tags entirely from Liquid.
| if (*size_ptr) { | ||
| *size_ptr = 0; // effectively a no-op | ||
| body->render_score--; | ||
| } |
There was a problem hiding this comment.
memmove also seems appropriate here. I guess when we serialize we can elide the extra operation there.
There was a problem hiding this comment.
As mentioned in #59 (comment)
I'm not concerned about further optimizing this edge case. Primarily these blank strings are handled in this way to minimize impact on code that isn't relying on this feature while keeping backwards compatibility,
Specifically, once we have jump instructions for control flow tags, we would also be forced to update all the affected jump offsets. That will be easier to introduce without having to deal with jump offset relocation, but afterwards an extra pass could be added that is more clearly beneficial, since we could replace pessimistically large forward jump instructions (e.g. a decoded 24-bit offset or aligned 32-bit offset) with smaller jump instructions (e.g. a byte offset) where possible.
| VALUE nodelist = rb_attr_get(self, intern_ivar_nodelist); | ||
| if (nodelist != Qnil) | ||
| return nodelist; | ||
| nodelist = rb_ary_new_capa(body->instructions.size / sizeof(VALUE)); |
There was a problem hiding this comment.
Since this is just the capacity it doesn't matter, but it looks like this is including an extra index for the OP_LEAVE.
| VALUE interrupts = rb_ivar_get(context, id_ivar_interrupts); | ||
| Check_Type(interrupts, T_ARRAY); | ||
|
|
||
| while (true) { |
There was a problem hiding this comment.
Would it make sense to do some tracking for how many bytes of the instruction buffer have been consumed, to gracefully handle a corrupted buffer with a missing OP_LEAVE? Perhaps by computing last_ip at the start and doing a single comparison per iteration.
There was a problem hiding this comment.
I would prefer to not silently try to handle a corrupt buffer, since that seems likely to lead to more problems. However, I could add an assertion to help with debugging a problem like that.
There was a problem hiding this comment.
The idea would be to do something like an rb_bug in that case, yes.
until we can remove this internal coupling.
c05a7ca to
4be2f77
Compare
Problem
Currently liquid-c is mostly used to speed up parsing (evaluating expressions is the exception). However, there is a big opportunity to reduce the overhead of both parsing and rendering by moving to a C representation for block bodies, since we can represent it as a set of instructions that can be efficiently executed directly from C rather than indirectly through another VM and would also allow us to serialize and deserialize it to avoid parsing overhead for pre-compiled templates. The block body is the ideal starting place, since it can encapsulate everything it contains, allowing builtin functionality to be compiled directly into the block bodies instructions.
@macournoyer created a proof of concept liquid VM as part of hack days, which showed significant potential improvements from a liquid VM. Even after modifying it to use a Liquid::Context rather than a simple hash for the context, it was still showing a 4.4x improvement for deserializing and rendering the liquid VM code compared to parsing and rendering liquid using liquid-c. Loading a marshal dump of the Liquid::Template instead of parsing liquid only made it 4.0x slower than liquid VM code, which seems to indicate that we should serialize to a format that can (at least mostly) be executed without deserialization.
Solution
This pull request depends on Shopify/liquid#1289 to provide Block#new_body and Document#new_body to override in order to return a Liquid::C::BlockBody when liquid-c is enabled. These two methods also have access to the parse context, so we still don't need to support profiling for liquid-c's block body.
The code implementing the Liquid::BlockBody#parse override was refactored to implement Liquid::C::BlockBody#parse, so we can build the Liquid::C::BlockBody directly without the overhead of creating a
Liquid::BlockBody#nodelistarray. This also allows us to avoid copying the raw template strings, which was previously needed to create individual ruby strings, since we can instead just write the output directly from slices of a shared string that contains all the raw template strings.To keep the scope of this PR more minimal, I haven't included block body serialization and have not added compilation of variables or tags. So the VM currently only has 3 instructions: OP_LEAVE, OP_WRITE_RAW and OP_WRITE_NODE.
The VM keeps track of both an instruction pointer and a constant pointer, which are incremented as instructions or constants are used, so the instructions will be easy to used directly from a serialized state and we won't have to worry about complexities of encoding a constant references (e.g. a byte constant index is too limiting and a multi-byte index can introduce either extra alignment or decoding concerns). We can always iterate on the VM design or even compile directly to machine code in the future.
To make this easier to adopt Liquid::C::BlockBody, I added Liquid::C::BlockBody#nodelist, although I will also be refactoring hot code paths using that in Shopify to not depend on it for performance reasons.
As part of our storefront rewrite we have some code to debug differences in the rendered output, which is quite coupled to the liquid parse tree node ruby objects, but doesn't seem to be performance sensitive, so I added a:disable_liquid_c_nodesparse option so this can be used without disabling liquid-c globally (which wouldn't be thread-safe).Benchmark
On master:
On this PR's branch:
Check List