`simd_json::from_slice` silently produces a wrong `f64` when a decimal number starts near the end of a 64-byte SIMD chunk and enough of the number's tail (≥ 13 bytes) spills into the next chunk. No parse error is returned.
Minimal reproducer
```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Level { price: f64, amount: f64 }

#[derive(Debug, Deserialize)]
struct Book { asks: Vec<Level> }

fn main() {
    // "3077999.0000000000000000" (24 bytes) starts at byte offset 56.
    // Bytes 56-63 (8 bytes) are in chunk 0; the remaining 16 bytes spill into chunk 1.
    let json: &[u8] =
        br#"{"asks":[{"price":3077990.0,"amount":0.111111},{"price":3077999.0000000000000000,"amount":0.5}]}"#;
    assert_eq!(json[56], b'3');
    assert_eq!(&json[56..80], b"3077999.0000000000000000");

    let mut buf = json.to_vec();
    let book: Book = simd_json::from_slice(&mut buf).unwrap();
    let simd_price = book.asks[1].price;

    let serde: Book = serde_json::from_slice(json).unwrap();
    let serde_price = serde.asks[1].price;

    println!("simd-json : {simd_price}"); // 1082.0885052467904 ← WRONG
    println!("serde_json: {serde_price}"); // 3077999 ← correct
    assert_eq!(simd_price, serde_price, "simd-json parsed the wrong value");
}
```
Output (v0.14.3 and v0.17.0):

```text
simd-json : 1082.0885052467904
serde_json: 3077999
thread 'main' panicked: simd-json parsed the wrong value
```
Exact trigger condition
In the minimal reproducer, the bug fires when a number starts at offset `N` within a 64-byte chunk and
`(N + number_length) - 64 >= 13`, i.e. at least 13 bytes of the number extend
into the next chunk. Sweeping number lengths at offset 56 (8 bytes remaining in the chunk):
| Number length (bytes) | Bytes past boundary | simd-json result |
|---|---|---|
| 9–20 | 1–12 | ✅ correct |
| 21 | 13 | ❌ 1233324.59 (wrong) |
| 22 | 14 | ❌ 126519.95 (wrong) |
| 23 | 15 | ❌ 15839.48 (wrong) |
| 24 | 16 | ❌ 1082.09 (wrong) |
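The arithmetic behind the trigger condition and the sweep can be sketched with std only. This is a minimal sketch, not a fix: `bytes_past_boundary`, `meets_trigger_condition` and `sweep_payload` are hypothetical helpers, and it does not call simd-json itself. Assumption: the report does not say which 21–23-byte numbers were used, so `sweep_payload` simply pads `3077999.` with zeros; only the 24-byte row is known to match the original reproducer.

```rust
// Sketch only: encodes the trigger condition and one way to regenerate the
// offset-56 length sweep. std only; no simd-json calls.

/// Chunk size described in the report.
const CHUNK: usize = 64;
/// Smallest spill that produced a wrong value in the offset-56 sweep.
const SPILL_THRESHOLD: usize = 13;

/// Bytes of a number at absolute offset `start` with length `len` that lie
/// past the end of the 64-byte chunk in which the number starts.
fn bytes_past_boundary(start: usize, len: usize) -> usize {
    let n = start % CHUNK; // offset within the starting chunk
    (n + len).saturating_sub(CHUNK)
}

/// `(N + len) - 64 >= 13`: necessary, but per the report not always sufficient.
fn meets_trigger_condition(start: usize, len: usize) -> bool {
    bytes_past_boundary(start, len) >= SPILL_THRESHOLD
}

/// Hypothetical sweep payload: the minimal reproducer with the second price
/// replaced by "3077999." plus `len - 8` zeros, so it still starts at offset 56.
fn sweep_payload(len: usize) -> (String, String) {
    assert!(len >= 9, "need at least one fractional digit");
    let number = format!("3077999.{}", "0".repeat(len - 8));
    let json = format!(
        r#"{{"asks":[{{"price":3077990.0,"amount":0.111111}},{{"price":{number},"amount":0.5}}]}}"#
    );
    (json, number)
}

fn main() {
    for len in 9..=24 {
        let (json, number) = sweep_payload(len);
        // Same sanity check as the minimal reproducer: the number starts at byte 56.
        assert_eq!(&json.as_bytes()[56..56 + len], number.as_bytes());
        // std's parser provides the reference value for each row.
        let reference: f64 = number.parse().unwrap();
        println!(
            "len {:2}: {:2} bytes past boundary, trigger = {}, reference = {}",
            len,
            bytes_past_boundary(56, len),
            meets_trigger_condition(56, len),
            reference
        );
        // To fill in the last table column, pass a mutable copy of `json` to
        // simd_json::from_slice and compare asks[1].price with `reference`.
    }
}
```

Because the condition depends only on `start % 64`, the same helper also covers numbers that start in later chunks (e.g. offset 120 behaves like offset 56).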
Non-deterministic: bug does not fire at every boundary crossing
The trigger condition above is necessary but not always sufficient. In a multi-level JSON
payload (dozens of entries after the affected number), the same number at mod-64 = 57
(the 7 integer digits visible before the boundary) sometimes parses correctly even
though 17 bytes extend past the chunk boundary (17 ≥ 13).
Observed in a reproducer with 8 levels where two entries landed at mod-64 = 57
(`offset % 64 == 57`). The snippet below shows this side by side with mod-64 = 58,
which fails consistently:
```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Level { price: f64, amount: f64 }

#[derive(Debug, Deserialize)]
struct Book { asks: Vec<Level> }

/// Parse `json` with simd-json and compare `asks[idx].price` against `expected`.
fn check(label: &str, json: &[u8], idx: usize, expected: f64) {
    let mut buf = json.to_vec();
    let book: Book = simd_json::from_slice(&mut buf).unwrap();
    let got = book.asks[idx].price;
    let status = if got == expected { "✅ correct" } else { "❌ WRONG" };
    println!("{label}: got {got} ({status})");
}

fn main() {
    // mod-64 = 57: the complete integer part "3077999" (7 bytes) is in chunk 0 and
    // the dot is the first byte of chunk 1. 17 bytes spill (≥ 13), yet simd-json
    // parses correctly when the payload is long enough.
    //
    // The 57-byte prefix before the number is:
    // {"asks":[{"price":3077990.0,"amount":0.1111111},{"price":
    let json_mod57: &[u8] =
        br#"{"asks":[{"price":3077990.0,"amount":0.1111111},{"price":3077999.0000000000000000,"amount":0.5},{"price":3077990.0,"amount":0.1},{"price":3077990.0,"amount":0.1},{"price":3077990.0,"amount":0.1}]}"#;
    assert_eq!(json_mod57[57], b'3', "number must start at offset 57");
    check("mod-64=57 (self-corrects)", json_mod57, 1, 3077999.0);

    // mod-64 = 58: only 6 bytes are visible ("307799", truncated) and 18 bytes
    // spill. This fails reliably regardless of the surrounding payload length.
    //
    // The 58-byte prefix before the number is:
    // {"asks":[{"price":3077990.0,"amount":0.11111111},{"price":
    let json_mod58: &[u8] =
        br#"{"asks":[{"price":3077990.0,"amount":0.11111111},{"price":3077999.0000000000000000,"amount":0.5},{"price":3077990.0,"amount":0.1},{"price":3077990.0,"amount":0.1},{"price":3077990.0,"amount":0.1}]}"#;
    assert_eq!(json_mod58[58], b'3', "number must start at offset 58");
    check("mod-64=58 (always wrong) ", json_mod58, 1, 3077999.0);
}
```
Output:

```text
mod-64=57 (self-corrects): got 3077999 (✅ correct)
mod-64=58 (always wrong) : got 1082.0885052467904 (❌ WRONG)
```
Entries that landed at mod-64 = 57 in the 8-level reproducer:

| Level index | mod-64 offset | Bytes visible before boundary | Bytes past boundary | simd-json result |
|---|---|---|---|---|
| 4 | 57 | `3077999` (7, complete integer part) | 17 | ✅ correct |
| 7 | 57 | `3077999` (7, complete integer part) | 17 | ✅ correct |
Yet in an isolated single-level JSON at offset 57, the same 24-byte number fails.
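Tables like the one above can be produced mechanically for any payload. The sketch below is a hypothetical std-only helper (`price_layouts` and `PriceLayout` are not part of simd-json); it assumes compact JSON with no whitespace between `"price":` and the value, and it only inspects `"price"` keys. It reports layout only and cannot predict whether simd-json will actually misparse a given entry.

```rust
// Sketch only: for each `"price":` value in a compact JSON payload, report
// where it sits relative to 64-byte chunk boundaries.

const CHUNK: usize = 64;

#[derive(Debug)]
struct PriceLayout {
    level: usize,   // order of appearance (equals the asks index for these payloads)
    start: usize,   // absolute byte offset of the first character of the number
    mod64: usize,   // start % 64
    len: usize,     // number length in bytes
    visible: usize, // bytes of the number inside its starting chunk
    past: usize,    // bytes that spill past that chunk's boundary
}

/// Hypothetical helper. Assumes no whitespace between `"price":` and the value.
fn price_layouts(json: &[u8]) -> Vec<PriceLayout> {
    let key: &[u8] = b"\"price\":";
    let mut out = Vec::new();
    let mut i = 0;
    while i + key.len() <= json.len() {
        if &json[i..i + key.len()] != key {
            i += 1;
            continue;
        }
        let start = i + key.len();
        // Count the characters that can appear in a JSON number.
        let len = json[start..]
            .iter()
            .take_while(|&&b| b.is_ascii_digit() || matches!(b, b'-' | b'+' | b'.' | b'e' | b'E'))
            .count();
        let mod64 = start % CHUNK;
        let visible = len.min(CHUNK - mod64);
        let level = out.len();
        out.push(PriceLayout { level, start, mod64, len, visible, past: len - visible });
        i = start + len; // start > i, so the loop always makes progress
    }
    out
}

fn main() {
    // The mod-64 = 57 payload from the snippet above.
    let json: &[u8] = br#"{"asks":[{"price":3077990.0,"amount":0.1111111},{"price":3077999.0000000000000000,"amount":0.5},{"price":3077990.0,"amount":0.1},{"price":3077990.0,"amount":0.1},{"price":3077990.0,"amount":0.1}]}"#;
    println!("level | start | mod-64 | len | visible | past");
    for p in price_layouts(json) {
        println!(
            "{:5} | {:5} | {:6} | {:3} | {:7} | {:4}",
            p.level, p.start, p.mod64, p.len, p.visible, p.past
        );
    }
}
```

Run against production payloads, a scan like this would show which entries fall inside the `past >= 13` zone, which is useful when deciding whether a payload is exposed.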
Why this matters:
- At mod-64 = 57 the complete integer part (`3077999`) is visible in chunk 0; the dot
is the first byte of chunk 1. One possible explanation for the correct result: when the
surrounding JSON is long enough that simd-json has already loaded the next chunk as part
of structural analysis, the number parser may land on a different internal code path and
get the right answer.
- At mod-64 = 56 the integer part plus the dot are visible (8 bytes = `3077999.`),
but 16 bytes still spill, so the bug fires reliably regardless.
- At mod-64 = 58 only 6 bytes are visible (`307799`, truncated integer part); 18 bytes spill
and the bug fires reliably.
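The three cases above differ only in where the chunk boundary cuts the same 24-byte string. A minimal std-only sketch of that split (`split_at_chunk_boundary` is a hypothetical helper); it models byte layout only and says nothing about which simd-json code path runs:

```rust
// Sketch only: how the 24-byte price from this report is divided between
// the starting chunk and the next one at different start offsets.

/// Split `number` as if it started `mod64` bytes into a 64-byte chunk.
/// Returns (bytes inside the starting chunk, bytes spilling into the next).
fn split_at_chunk_boundary(number: &str, mod64: usize) -> (&str, &str) {
    assert!(mod64 < 64, "offset must lie within a chunk");
    // Assumes an ASCII number, so every index is a valid char boundary.
    number.split_at(number.len().min(64 - mod64))
}

fn main() {
    let number = "3077999.0000000000000000";
    for mod64 in [56, 57, 58] {
        let (visible, spill) = split_at_chunk_boundary(number, mod64);
        println!(
            "mod-64 = {}: visible {:?} ({} bytes), spill {:?} ({} bytes)",
            mod64,
            visible,
            visible.len(),
            spill,
            spill.len()
        );
    }
}
```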
The net effect is that whether a given payload triggers the bug depends on:
- The start offset of the number within its 64-byte chunk.
- How many bytes extend past the boundary.
- The total size and layout of the surrounding JSON, which may affect whether simd-json's
structural analysis has already prefetched the next chunk.
This makes the bug latent and intermittent in production: a payload that has always
worked can silently break after an innocuous change to an earlier field (longer symbol
name, extra field, whitespace change) shifts a price string to a different offset.
Visual
```text
chunk 0 (bytes 0–63):
{"asks":[{"price":3077990.0,"amount":0.111111},{"price":3077999.
                                                        ^^^^^^^^
                                                        only 8 bytes of the number visible here
chunk 1 (bytes 64–127):
0000000000000000,"amount":0.5}]}
^^^^^^^^^^^^^^^^
remaining 16 bytes; the parser reads these incorrectly
```
Environment
- simd-json: 0.14.3 and 0.17.0
- OS: Linux x86_64
- Rust: stable (1.96.0)