RISC-V: Add optimized decompression path by zhanchangbao-sanechips · Pull Request #236 · google/snappy

zhanchangbao-sanechips · 2026-04-28T07:45:10Z

Summary

This PR adds a RISC-V-optimized decompression path for the branchless inner loop in Snappy.
The base RISC-V ISA has no scalar conditional-move (cmov) instruction, which makes the existing AdvanceToNextTagX86Optimized suboptimal on RISC-V platforms.
We introduce AdvanceToNextTagRISCVOptimized, which uses a branch structure similar to the ARM64 path, and reuse ARM's ExtractOffset / Load16 strategy.
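The branch structure described above can be sketched as follows. This is a minimal illustration, not the PR's code (which is not reproduced on this page). The function name AdvanceToNextTagBranchy is hypothetical, and the pointer contract (on entry, ip points just past the current tag) is an assumption about how the fast loop walks the stream. The tag layout follows the Snappy format: the low two bits select literal, copy with 1-byte offset, copy with 2-byte offset, or copy with 4-byte offset.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of a branch-based tag advance (ARM64-style structure).
// Assumed contract: on entry *ip_p points at the byte just after the current
// tag and *tag holds that tag; on exit *ip_p points just after the next tag
// and *tag holds the next tag. Returns the current tag type (tag & 3).
// Long literals (length code >= 60) and copy-4 elements produce meaningless
// results here; the caller is assumed to detect them and take a slow path.
inline size_t AdvanceToNextTagBranchy(const uint8_t** ip_p, size_t* tag) {
  const uint8_t*& ip = *ip_p;
  const size_t tag_type = *tag & 3;
  if (tag_type == 0) {
    // Literal: the upper 6 bits hold (length - 1), so the next tag follows
    // exactly `literal_len` literal bytes.
    const size_t literal_len = (*tag >> 2) + 1;
    *tag = ip[literal_len];
    ip += literal_len + 1;
  } else {
    // Copy-1 / copy-2: `tag_type` offset bytes sit between this tag and the
    // next one.
    *tag = ip[tag_type];
    ip += tag_type + 1;
  }
  return tag_type;
}
```

Each iteration takes one data-dependent branch instead of computing two candidate pointers and merging them, which is the operation a core without cmov handles poorly.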

Motivation

The decompression loop (DecompressBranchless) is the hottest path for Snappy's RawUncompress, Uncompress, and validation APIs.

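On x86_64, the loop loads both candidate next-tag bytes through volatile pointers and then selects between them with cmov, which minimizes latency on the ip dependency chain.
On ARM64, a simpler branch-based approach works better because csinc (conditional select-increment) removes several register moves.
RISC-V currently falls back to the x86 path, which forces the compiler to emulate cmov with extra register moves and branches.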
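For contrast, the select-based shape that the x86 path relies on can be sketched as below. The name AdvanceToNextTagSelect is hypothetical, and the sketch omits the volatile loads mentioned above. It assumes that on entry ip points just past the current tag. Both candidate tags and pointers are computed unconditionally and one pair is then chosen; on x86_64 compilers typically turn the two ternaries into cmov, whereas a RISC-V target without a scalar conditional move needs a branch or mask sequence for each. Like the real loop, it reads ahead for the untaken candidate, so the input buffer must carry slack bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of a select-based (x86-style) tag advance.
// Assumed contract: *ip_p points just past the current tag on entry and just
// past the next tag on exit; *tag is updated to the next tag.
// Reads up to 64 bytes past *ip_p for the untaken candidate, so the buffer
// needs that much slack.
inline size_t AdvanceToNextTagSelect(const uint8_t** ip_p, size_t* tag) {
  const uint8_t*& ip = *ip_p;
  const size_t literal_len = *tag >> 2;  // (length - 1) for short literals
  const size_t tag_type = *tag & 3;
  const bool is_literal = (tag_type == 0);
  // Load both candidate next tags before deciding which one is real.
  const size_t tag_literal = ip[1 + literal_len];
  const size_t tag_copy = ip[tag_type];
  // Two selects: candidates for cmov on x86_64; without a conditional move
  // the compiler must synthesize them from branches or mask arithmetic.
  *tag = is_literal ? tag_literal : tag_copy;
  ip = is_literal ? ip + 2 + literal_len : ip + 1 + tag_type;
  return tag_type;
}
```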

This patch gives RISC-V its own optimized path.

Changes Made

Added AdvanceToNextTagRISCVOptimized() in snappy.cc
Updated ExtractOffset() to include defined(__riscv) alongside defined(__aarch64__)
Updated DecompressBranchless() to use the new RISC-V path with Load16
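The shared ExtractOffset / Load16 idea can be illustrated as follows. Assumptions: the helper names ExtractOffsetSketch and Load16LE are hypothetical (the latter stands in for Snappy's LittleEndian::Load16), and packing the per-tag-type masks (0, 0xFF, 0xFFFF, 0) into one 64-bit constant is one plausible encoding, not necessarily the exact upstream form. Because no mask is wider than 16 bits, a 16-bit load of the bytes after the tag is enough.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical offset extraction: four 16-bit masks packed into one constant.
//   tag_type 0 (literal) -> 0x0000   tag_type 1 (copy-1) -> 0x00FF
//   tag_type 2 (copy-2)  -> 0xFFFF   tag_type 3 (copy-4) -> 0x0000
// Selecting a mask costs a shift and an AND, with no table lookup in memory.
inline uint32_t ExtractOffsetSketch(uint32_t val, size_t tag_type) {
  constexpr uint64_t kMasks = 0x0000FFFF00FF0000ull;
  return val & static_cast<uint32_t>((kMasks >> (tag_type * 16)) & 0xFFFF);
}

// Hypothetical little-endian 16-bit load of the two bytes after the tag.
// Copy-4 offsets are never extracted on this path (mask 0), so 16 bits
// cover every case the masks above can keep.
inline uint32_t Load16LE(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8);
}
```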

Implementation Details

No new dependencies: the optimization is purely scalar, guarded by #if defined(__riscv)
No API changes: fully backward compatible
Non-RISC-V platforms: zero impact, since all changes are behind preprocessor conditionals
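The guarding scheme listed above can be sketched as compile-time dispatch. The enum and function names are hypothetical; __aarch64__ and __riscv are the standard target macros predefined by GCC and Clang. The 32-bit load width on the default path is an assumption about the existing x86 code.

```cpp
// Hypothetical names throughout; only the target macros are standard.
enum class TagAdvancePath {
  kArm64,    // branch-based path (csinc-friendly)
  kRiscv,    // branch-based path (no scalar cmov in the base ISA)
  kDefault,  // select-based x86-style path, used by every other target
};

// Which tag-advance variant this translation unit would compile.
constexpr TagAdvancePath SelectedTagAdvancePath() {
#if defined(__aarch64__)
  return TagAdvancePath::kArm64;
#elif defined(__riscv)
  return TagAdvancePath::kRiscv;
#else
  return TagAdvancePath::kDefault;
#endif
}

// Width of the load feeding offset extraction: 16 bits on the branch-based
// paths, 32 bits on the default path (assumption).
constexpr int OffsetLoadBits() {
#if defined(__aarch64__) || defined(__riscv)
  return 16;
#else
  return 32;
#endif
}
```

Because every choice is resolved by the preprocessor, in this scheme an x86_64 build compiles only the default branch and never sees the RISC-V code.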

Performance Results

Test Environment

Hardware: Banana Pi K1 (SpacemiT X60)
CPU: 8-core X60 @ 1.6 GHz
Compiler: Clang 17+ / GCC 13+ with -march=rv64gcv

BM_UFlat (Decompression) – Core Improvement

Benchmark	Before (ns)	After (ns)	Speedup (Before/After)	Bandwidth Gain
html/1	228,447	185,498	1.232	+23.2%
html/2	206,562	168,327	1.227	+22.7%
urls/1	2,703,524	2,221,052	1.217	+21.7%
urls/2	2,500,196	2,057,324	1.215	+21.5%
html4/1	930,183	762,913	1.219	+21.9%
txt1/1	1,008,827	817,822	1.234	+23.4%
txt2/1	889,770	724,470	1.228	+22.8%
txt3/1	2,686,800	2,183,861	1.230	+23.0%
txt4/1	3,750,967	3,052,873	1.229	+22.9%
pb/1	201,982	164,958	1.224	+22.4%
gaviota/1	997,243	822,349	1.213	+21.3%
Medley	13,674,217	11,192,662	1.222	+22.2%
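The Speedup and Bandwidth Gain columns appear to be derived as follows. This reading is inferred from the table, with S denoting the fixed uncompressed size of each input; the html/1 row serves as a check:

```latex
\text{Speedup} = \frac{T_{\text{before}}}{T_{\text{after}}}, \qquad
\text{Gain} = \frac{S / T_{\text{after}}}{S / T_{\text{before}}} - 1 = \text{Speedup} - 1

\text{html/1:}\quad \frac{228{,}447}{185{,}498} \approx 1.232
\;\Rightarrow\; \text{Gain} \approx +23.2\%
```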

BM_UValidate (Validation) – Consistent Gains

Benchmark	Before (ns)	After (ns)	Speedup (Before/After)
html/1	141,792	116,440	1.218
pdf/1	13,438	11,055	1.215
txt1/1	631,535	516,461	1.223
txt4/1	2,312,771	1,891,391	1.223
pb/1	123,329	101,392	1.216
gaviota/1	597,290	487,891	1.224
Medley	8,432,222	6,930,131	1.217

Binary Data – No Regression

Benchmark	Before (ns)	After (ns)	Speedup (Before/After)	Note
jpg/1	27,303	27,755	0.984	-1.6% (within measurement noise)
jpg/2	26,663	26,757	0.996	-0.4% (within measurement noise)
jpg_200/1	1,138	1,130	1.007	+0.7%
pdf/1	38,860	36,969	1.051	+5.1%
pdf/2	84,765	75,830	1.118	+11.8%

Other Operations

Operation	Result	Assessment
BM_UFlatSink	+22~23% on text	Consistent with UFlat
BM_ZFlat	±2%	No impact on compression
BM_UIOVecSource	<3% variance	No regression
BM_UIOVecSink	<2% variance	No regression

Test Repeatability

Three independent runs confirm stable and reproducible results.
All text workloads consistently improve by about 21~23%; the JPEG workloads stay within 2% of the baseline (within measurement noise).

Compatibility & Portability

Platform	Behavior
RISC-V (`__riscv` defined)	Uses new optimized path
Non-RISC-V (x86_64, ARM64)	Completely unaffected: the new code is behind `#if defined(__riscv)`

Testing

snappy_unittest passes all tests
snappy_benchmark verified on RISC-V hardware (Banana Pi K1)
No regressions on existing platforms (CI verified)

Checklist

Code follows the project’s C++ style
Comments added for non-obvious logic
Performance data included with multiple test runs
Full backward compatibility maintained
No breaking changes to API or behavior
All existing unit tests pass

Screenshots

Unit Tests - All Pass

Benchmark - Before Optimization

Benchmark - After Optimization

google-cla · 2026-04-28T07:45:14Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

RISC-V lacks conditional-move (cmov) instructions, making the x86 cmov-based path suboptimal. Use a branch-based approach similar to ARM64, and adopt the same ExtractOffset / Load16 strategy.

Benchmarks on RV64 show:
- UFlat/UValidate: +22~26% on text workloads
- UFlatSink: +22~23%
- Binary data (jpg/pdf): no regression
- Compression (ZFlat): unchanged

danilak-G · 2026-05-09T09:25:10Z

This is a copy of #234

zhanchangbao-sanechips force-pushed the add_rvvopt branch from 135db76 to 7410f23 on April 28, 2026 07:55

SongT-50 mentioned this pull request on May 1, 2026 in #237, "Add NDEBUG-safe boundary checks in SnappyIOVecReader::Advance" (open)

danilak-G closed this May 9, 2026

zhanchangbao-sanechips wants to merge 1 commit into google:main from zhanchangbao-sanechips:add_rvvopt
