Skip to content

snappy: optimize UnalignedCopy64 and IncrementalCopy for RISC-V#3360

Open
Felix-Gong wants to merge 1 commit into
apache:masterfrom
Felix-Gong:riscv-snappy-opt
Open

snappy: optimize UnalignedCopy64 and IncrementalCopy for RISC-V#3360
Felix-Gong wants to merge 1 commit into
apache:masterfrom
Felix-Gong:riscv-snappy-opt

Conversation

@Felix-Gong

Copy link
Copy Markdown
Contributor

Summary

  • Optimize UnalignedCopy64() with direct ld/sd inline assembly for 8-byte copy
  • Optimize IncrementalCopy() with 8-byte bulk copies when source/dest don't overlap

Performance Improvement (direct function benchmark)

Scenario Baseline Optimized Improvement
Decompress compressible-256K 728 MB/s 2205 MB/s +203%
Decompress zeros-256K 543 MB/s 1462 MB/s +169%

Changes

  • src/butil/third_party/snappy/snappy-stubs-internal.h: +11 lines
  • src/butil/third_party/snappy/snappy.cc: +24 lines

Test Plan

  • brpc_snappy_compress_unittest passed (7/7)
  • Build verified on RISC-V (SOPHGO SG2044)

Use RISC-V inline assembly (ld/sd) for 8-byte copy operations instead
of generic macro-based implementation.

Changes:
- UnalignedCopy64: direct ld/sd pair for 8-byte copy
- IncrementalCopy: 8-byte bulk copies when source/dest don't overlap

Performance improvement (direct function benchmark):
- Decompress compressible-256K: 728 MB/s -> 2205 MB/s (+203%)
- Decompress zeros-256K: 543 MB/s -> 1462 MB/s (+169%)

Tests: brpc_snappy_compress_unittest passed (7/7)

Signed-off-by: Felix-Gong <gongxiaofei24@iscas.ac.cn>
@wwbmmm

wwbmmm commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

LGTM

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to improve Snappy decompression performance on RV64 by accelerating 8-byte copy paths in UnalignedCopy64() and IncrementalCopy() using inline RISC-V ld/sd.

Changes:

  • Add an RV64-specific 8-byte bulk-copy loop to IncrementalCopy() when overlap is considered safe.
  • Add an RV64-specific ld/sd implementation for UnalignedCopy64().

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/butil/third_party/snappy/snappy.cc Adds RV64 inline-asm 8-byte copying to IncrementalCopy() for non-overlapping (or “safe overlap”) cases.
src/butil/third_party/snappy/snappy-stubs-internal.h Adds RV64 inline-asm ld/sd fast path to UnalignedCopy64().

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +101 to +116
// RISC-V optimized: use 8-byte copies when possible
if (len >= 8 && (op - src >= 8 || src - op >= 8)) {
// Non-overlapping or safe overlap: copy 8 bytes at a time
do {
uint64_t tmp;
__asm__ volatile(
"ld %0, %1\n\t"
"sd %0, %2\n\t"
: "=&r"(tmp)
: "m"(*(const uint64_t*)src), "m"(*(uint64_t*)op)
: "memory");
src += 8;
op += 8;
len -= 8;
} while (len >= 8);
}
Comment on lines +167 to +176
#if defined(__riscv) && __riscv_xlen == 64
// RISC-V optimized: single ld/sd pair for 8-byte copy
uint64_t tmp;
__asm__ volatile(
"ld %0, %1\n\t"
"sd %0, %2\n\t"
: "=&r"(tmp)
: "m"(*(const uint64_t*)src), "m"(*(uint64_t*)dst)
: "memory");
#else
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants