[WIP] Optimize RESP protocol parser and encoder for performance #2

Closed
Copilot wants to merge 1 commit into main from copilot/optimize-resp-protocol-parser

Copilot AI commented Mar 10, 2026

Thanks for asking me to work on this. I will get started on it and keep this PR's description up to date as I form a plan and make progress.

Original prompt

Problem

The RESP protocol parser and encoder in crates/protocol/ are too slow and need to be optimized for maximum performance (target: 5-10x faster across parsing and encoding).

Current Benchmark Numbers

parse_command_into/inline:       70-77 ns   (~498 MiB/s)
parse_command_into/array/3x8:    37 ns       (~1.15 GiB/s)
parse_command_into/array/8x16:   158-160 ns  (~1.10 GiB/s)
parse_command_into/array/32x24:  561-563 ns  (~1.65 GiB/s)
parse_frame/simple:              22-23 ns    (~293 MiB/s)
parse_frame/bulk:                16-17 ns    (~1.35 GiB/s)
parse_frame/nested:              155-156 ns  (~330 MiB/s)
encode/simple:                   19-20 ns    (~344 MiB/s)
encode/bulk_values_8x16:         109-114 ns  (~1.58 GiB/s)
encode/array_16x24:              247-248 ns  (~1.88 GiB/s)
encode/map:                      54-56 ns    (~695 MiB/s)
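
As a sanity check on the array rows, the throughput figures can be reproduced from the wire size of each command. This is a small sketch that assumes (from the benchmark names only) that `array/NxM` means a RESP array of N bulk strings of M bytes each; `command_wire_size` and `throughput_gib_per_s` are hypothetical helper names, not part of the crate.

```rust
/// Wire size in bytes of a RESP array holding `argc` bulk strings,
/// each `arg_len` bytes long (assumed shape of the `array/NxM` benchmarks).
fn command_wire_size(argc: usize, arg_len: usize) -> usize {
    let digits = |n: usize| n.to_string().len();
    // "*<argc>\r\n" followed by argc times "$<len>\r\n<payload>\r\n"
    (1 + digits(argc) + 2) + argc * (1 + digits(arg_len) + 2 + arg_len + 2)
}

/// Converts a byte count and a duration in nanoseconds into GiB/s.
fn throughput_gib_per_s(bytes: usize, nanos: f64) -> f64 {
    (bytes as f64 / (nanos * 1e-9)) / (1u64 << 30) as f64
}
```

Under that assumption, `array/3x8` is 46 bytes on the wire, and 46 bytes in 37 ns works out to roughly 1.16 GiB/s, in line with the reported ~1.15 GiB/s. A 5-10x speedup on that row would mean about 3.7-7.4 ns per command.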

Files to Optimize

  1. crates/protocol/src/encoder.rs - RESP frame encoder
  2. crates/protocol/src/parser.rs - RESP frame and command parser
  3. crates/protocol/src/types.rs - Protocol types (BulkData, RespFrame)

Key Optimization Strategies to Apply

Encoder (encoder.rs)

  1. Replace multiple extend_from_slice calls with single raw pointer writes - Each extend_from_slice call has overhead (bounds check, potential reallocation check). Instead, pre-calculate the total needed size, reserve once, then write directly with raw pointers using unsafe.
  2. Use itoa crate for integer formatting - The itoa crate (already in workspace dependencies) is highly optimized for integer-to-string conversion and is faster than the manual write_int/write_uint implementations.
  3. Batch small writes into a single contiguous write - For encoding patterns like "$" + len + "\r\n" + data + "\r\n", build the entire thing in a stack buffer (for small lengths) and write once.
  4. Eliminate the Encoder struct entirely - It has no state. Make encode a free function or use #[inline(always)] on the method. The &mut self parameter adds unnecessary indirection.
  5. Add more pre-encoded static byte constants - For common patterns like "$0\r\n\r\n", ":0\r\n", ":1\r\n", "*0\r\n", "*1\r\n", "*2\r\n", etc.
  6. Use BufMut trait's put_slice or direct chunk_mut() + advance_mut() - Write directly into BytesMut's spare capacity without intermediate copies.
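
A rough illustration of strategies 1, 3, and 5 for the bulk-string case, kept in safe Rust. It is a sketch under assumptions, not the crate's encoder: a `Vec<u8>` stands in for `BytesMut`, the hypothetical `format_usize` stands in for `itoa`, and `encode_bulk` and `EMPTY_BULK` are illustrative names. The header is built in a stack buffer and the output is reserved once, so the following writes never reallocate.

```rust
/// Pre-encoded frame for the empty bulk string (strategy 5).
const EMPTY_BULK: &[u8] = b"$0\r\n\r\n";

/// Writes `n` in decimal into the tail of `tmp` and returns the digits.
/// Stand-in for the `itoa` crate, used here to stay std-only.
fn format_usize(n: usize, tmp: &mut [u8; 20]) -> &[u8] {
    let mut pos = tmp.len();
    let mut v = n;
    loop {
        pos -= 1;
        tmp[pos] = b'0' + (v % 10) as u8;
        v /= 10;
        if v == 0 {
            break;
        }
    }
    &tmp[pos..]
}

/// Encodes `data` as a RESP bulk string: "$<len>\r\n<data>\r\n".
fn encode_bulk(out: &mut Vec<u8>, data: &[u8]) {
    if data.is_empty() {
        out.extend_from_slice(EMPTY_BULK);
        return;
    }
    let mut tmp = [0u8; 20];
    let digits = format_usize(data.len(), &mut tmp);

    // Build "$<len>\r\n" on the stack: 1 byte + up to 20 digits + 2 bytes.
    let mut header = [0u8; 23];
    let header_len = 1 + digits.len() + 2;
    header[0] = b'$';
    header[1..1 + digits.len()].copy_from_slice(digits);
    header[1 + digits.len()..header_len].copy_from_slice(b"\r\n");

    // Reserve once for the whole frame, then append without reallocating.
    out.reserve(header_len + data.len() + 2);
    out.extend_from_slice(&header[..header_len]);
    out.extend_from_slice(data);
    out.extend_from_slice(b"\r\n");
}
```

Whether replacing the remaining `extend_from_slice` calls with raw pointer writes is worth the `unsafe` would need to be confirmed by the benchmarks; the safe version is a reasonable baseline to measure against.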

Parser (parser.rs)

  1. Use memchr for CRLF scanning - The memchr crate (already in workspace dependencies) uses SIMD instructions to find \n bytes much faster than a byte-by-byte loop, especially for longer lines.
  2. Remove checked_mul/checked_add from parse_uint/parse_int - In RESP, integer values are small (lengths, counts), so overflow is practically impossible for valid input. Use wrapping_mul/wrapping_add, optionally with a single post-check, instead of checking every multiplication and addition.
  3. Inline parse_uint aggressively with a lookup-table approach - For 1-digit and 2-digit numbers (which are the vast majority in RESP), use a fast path.
  4. Use src.advance() instead of src.split_to() - split_to does extra work to create a new BytesMut; advance just moves the read cursor forward (already partially done in your code but some places still use split_to).
  5. Specialize read_line with memchr::memchr - Use SIMD-accelerated \n scanning from the memchr crate.
  6. Avoid Option<u8> wrapping in peek() - Use a direct unsafe access pattern or return the byte with a separate bounds check.
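
The line-scanning and integer-parsing ideas above could look roughly like this. It is a std-only sketch with hypothetical names (`find_crlf`, `parse_uint`): `iter().position` stands in for `memchr::memchr(b'\n', buf)`, and the overflow guard is a single length check done up front rather than a check after the loop.

```rust
/// Returns the index of the `\r` that ends the first line, or `None` if no
/// complete CRLF-terminated line is available yet. A real parser would also
/// distinguish "incomplete" from "malformed" (a bare `\n`).
fn find_crlf(buf: &[u8]) -> Option<usize> {
    // Stand-in for memchr::memchr(b'\n', buf).
    let nl = buf.iter().position(|&b| b == b'\n')?;
    if nl > 0 && buf[nl - 1] == b'\r' {
        Some(nl - 1)
    } else {
        None
    }
}

/// Parses an unsigned decimal integer, with fast paths for 1 and 2 digits.
fn parse_uint(digits: &[u8]) -> Option<u64> {
    match digits {
        &[a] if a.is_ascii_digit() => return Some((a - b'0') as u64),
        &[a, b] if a.is_ascii_digit() && b.is_ascii_digit() => {
            return Some(((a - b'0') * 10 + (b - b'0')) as u64);
        }
        _ => {}
    }
    // Any value with at most 19 decimal digits fits in a u64, so bounding the
    // length once means the wrapping arithmetic below can never actually wrap.
    if digits.is_empty() || digits.len() > 19 {
        return None;
    }
    let mut v: u64 = 0;
    for &d in digits {
        let x = d.wrapping_sub(b'0');
        if x > 9 {
            return None; // not an ASCII digit
        }
        v = v.wrapping_mul(10).wrapping_add(x as u64);
    }
    Some(v)
}
```

Keeping the length bound means malformed or hostile input is still rejected instead of silently wrapping, which plain `wrapping_mul`/`wrapping_add` without any check would not guarantee.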

Types (types.rs)

  1. Add #[inline(always)] to BulkData::as_slice() - This is called in every tight encode loop and should be fully inlined.
  2. Consider removing the profiler scope calls from hot-path methods - The profiler::scope() calls in as_slice(), from_vec(), etc. add overhead even when profiling is disabled (they still call the function which may or may not be inlined away).
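
One way to keep profiling available without paying for it on the hot path is to compile the scope call out entirely when profiling is off. The sketch below is hypothetical throughout: the real `BulkData` layout is not shown in this prompt, the `profiling` feature name is assumed, and the argument and return type of `profiler::scope` are guesses.

```rust
// Assumption: profiling is toggled by a Cargo feature named "profiling",
// and `profiler::scope` takes a name and returns a guard value.
#[cfg(feature = "profiling")]
macro_rules! profile_scope {
    ($name:expr) => {
        let _guard = crate::profiler::scope($name);
    };
}

// With the feature off, the macro expands to nothing at all.
#[cfg(not(feature = "profiling"))]
macro_rules! profile_scope {
    ($name:expr) => {};
}

/// Hypothetical stand-in for the crate's `BulkData` type.
pub enum BulkData {
    Owned(Vec<u8>),
    Static(&'static [u8]),
}

impl BulkData {
    /// Accessor called in tight encode loops, hence `#[inline(always)]`.
    #[inline(always)]
    pub fn as_slice(&self) -> &[u8] {
        profile_scope!("BulkData::as_slice");
        match self {
            BulkData::Owned(v) => v.as_slice(),
            BulkData::Static(s) => *s,
        }
    }
}
```

With this shape there is no function call left for the optimizer to remove when profiling is disabled, so the result does not depend on inlining decisions.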

General

  • Make sure all hot-path functions are #[inline(always)] not just #[inline].
  • Ensure the benchmarks remain working and passing after optimizations.
  • All existing tests must continue to pass.
  • Keep using existing workspace dependencies (itoa, memchr, bytes, smallvec) rather than adding new ones.

Current Source Code

crates/protocol/src/encoder.rs

use crate::types::RespFrame;
use bytes::BytesMut;

#[inline]
fn write_int(buf: &mut BytesMut, val: i64) {
    let mut tmp = [0u8; 20];
    let mut pos = 20usize;

    if val == 0 {
        buf.extend_from_slice(b"0");
        return;
    }

    let neg = val < 0;
    let mut v: u64 = if neg {
        (!(val as u64)).wrapping_add(1)
    } else {
        val as u64
    };

    while v > 0 {
        pos -= 1;
        unsafe { *tmp.get_unchecked_mut(pos) = (v % 10) as u8 + b'0' };
        v /= 10;
    }
    if neg {
        pos -= 1;
        unsafe { *tmp.get_unchecked_mut(pos) = b'-' };
    }
    buf.extend_from_slice(unsafe...

</details>



*This pull request was created from Copilot chat.*