[WIP] Optimize RESP protocol parser and encoder for performance #2
Closed
Copilot wants to merge 1 commit into
Thanks for asking me to work on this. I will get started on it and keep this PR's description up to date as I form a plan and make progress.
Original prompt
Problem
The RESP protocol parser and encoder in `crates/protocol/` are too slow and need to be optimized for maximum performance (target: 5-10x faster across parsing and encoding).
Current Benchmark Numbers
Files to Optimize
- `crates/protocol/src/encoder.rs` - RESP frame encoder
- `crates/protocol/src/parser.rs` - RESP frame and command parser
- `crates/protocol/src/types.rs` - Protocol types (`BulkData`, `RespFrame`)
Key Optimization Strategies to Apply
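Two of the strategies listed in this section, reserving once before writing a bulk string and a fast path for short numbers, might look roughly like the sketch below. It is only an illustration under stated assumptions: the names `encode_bulk` and `parse_uint_fast` are hypothetical, it writes into a plain `Vec<u8>` rather than `BytesMut` so that it needs no external crates, and it uses safe code instead of raw pointer writes.

```rust
/// Hypothetical helper: encode one RESP bulk string ("$<len>\r\n<data>\r\n")
/// with a single up-front reservation instead of growing on every write.
pub fn encode_bulk(dst: &mut Vec<u8>, data: &[u8]) {
    // Format the length into a stack buffer; 20 digits cover any u64.
    let mut digits = [0u8; 20];
    let mut pos = digits.len();
    let mut n = data.len() as u64;
    loop {
        pos -= 1;
        digits[pos] = b'0' + (n % 10) as u8;
        n /= 10;
        if n == 0 {
            break;
        }
    }
    let len_bytes = &digits[pos..];

    // '$' + length digits + CRLF + payload + CRLF, reserved in one call.
    dst.reserve(1 + len_bytes.len() + 2 + data.len() + 2);
    dst.push(b'$');
    dst.extend_from_slice(len_bytes);
    dst.extend_from_slice(b"\r\n");
    dst.extend_from_slice(data);
    dst.extend_from_slice(b"\r\n");
}

/// Hypothetical helper: parse an unsigned decimal with a fast path for
/// one- and two-digit inputs. Returns None on empty input, on a non-digit
/// byte, or on overflow.
pub fn parse_uint_fast(buf: &[u8]) -> Option<u64> {
    #[inline(always)]
    fn digit(b: u8) -> Option<u64> {
        // Bytes below b'0' wrap to a large value, so one comparison suffices.
        let d = b.wrapping_sub(b'0');
        if d <= 9 { Some(d as u64) } else { None }
    }

    match buf {
        [] => None,
        [a] => digit(*a),
        [a, b] => Some(digit(*a)? * 10 + digit(*b)?),
        _ => {
            // Longer inputs keep checked arithmetic in this sketch.
            let mut acc: u64 = 0;
            for &b in buf {
                acc = acc.checked_mul(10)?.checked_add(digit(b)?)?;
            }
            Some(acc)
        }
    }
}
```

Calling `encode_bulk(&mut out, b"hello")` on an empty `Vec<u8>` leaves `out` holding `$5\r\nhello\r\n`. The slow path here deliberately keeps overflow checks; whether to drop them, as the list suggests, depends on how much the parser trusts its input.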
Encoder (`encoder.rs`)
- Replace multiple `extend_from_slice` calls with single raw pointer writes - Each `extend_from_slice` call has overhead (bounds check, potential reallocation check). Instead, pre-calculate the total needed size, reserve once, then write directly with raw pointers using `unsafe`.
- Use the `itoa` crate for integer formatting - The `itoa` crate (already in workspace dependencies) is highly optimized for integer-to-string conversion and is faster than the manual `write_int`/`write_uint` implementations.
- Instead of writing `"$" + len + "\r\n" + data + "\r\n"` as separate pieces, build the entire thing in a stack buffer (for small lengths) and write once.
- Remove the `Encoder` struct entirely - It has no state. Make `encode` a free function or use `#[inline(always)]` on the method. The `&mut self` parameter adds unnecessary indirection.
- Use precomputed constants for common frames: `"$0\r\n\r\n"`, `":0\r\n"`, `":1\r\n"`, `"*0\r\n"`, `"*1\r\n"`, `"*2\r\n"`, etc.
- Use the `BufMut` trait's `put_slice` or direct `chunk_mut()` + `advance_mut()` - Write directly into `BytesMut`'s spare capacity without intermediate copies.
Parser (`parser.rs`)
- Use `memchr` for CRLF scanning - The `memchr` crate (already in workspace dependencies) uses SIMD instructions to find `\n` bytes much faster than a byte-by-byte loop, especially for longer lines.
- Remove `checked_mul`/`checked_add` from `parse_uint`/`parse_int` - In RESP, integer values are small (lengths, counts). Use unchecked arithmetic, use wrapping arithmetic with a post-check, or just use `wrapping_mul`/`wrapping_add`, since overflow is practically impossible for valid RESP.
- Optimize `parse_uint` aggressively with a lookup-table approach - For 1-digit and 2-digit numbers (which are the vast majority in RESP), use a fast path.
- Use `src.advance()` instead of `src.split_to()` - `split_to` does extra work to create a new `BytesMut`; `advance` just moves the read cursor forward (already partially done in your code, but some places still use `split_to`).
- Implement `read_line` with `memchr::memchr` - Use SIMD-accelerated `\n` scanning from the `memchr` crate.
- Avoid `Option<u8>` wrapping in `peek()` - Use a direct unsafe access pattern or return the byte with a separate bounds check.
Types (`types.rs`)
- Add `#[inline(always)]` to `BulkData::as_slice()` - This is called in every tight encode loop and should be fully inlined.
- The `profiler::scope()` calls in `as_slice()`, `from_vec()`, etc. add overhead even when profiling is disabled (they still call the function, which may or may not be inlined away).
General
- Use `#[inline(always)]`, not just `#[inline]`.
- Use the existing dependencies (`itoa`, `memchr`, `bytes`, `smallvec`) rather than adding new ones.
Current Source Code
crates/protocol/src/encoder.rs