add fuzzing infrastructure, expand fuzz coverage, fix CI integration #1

Open
osyniakov wants to merge 15 commits into main from claude/add-fuzzing-tests-3EM9A
@osyniakov osyniakov commented Apr 10, 2026

Owner

Summary

Adds cargo-fuzz targets and ClusterFuzzLite CI to satisfy the OSSF Scorecard
Fuzzing check (was scoring 0). Also fixes a real panic discovered during
fuzzing, adds four more fuzz targets covering the highest-priority security
surfaces, and mitigates a panic found in tantivy's query grammar parser.


Changes

Fuzzing infrastructure

  • quickwit/quickwit-fuzz/ — cargo-fuzz crate, member of the parent workspace
  • .clusterfuzzlite/Dockerfile — extends gcr.io/oss-fuzz-base/base-builder-rust; installs protoc 3.20.3 (Ubuntu 20.04's bundled 3.6.x predates --experimental_allow_proto3_optional)
  • .clusterfuzzlite/build.sh — builds all fuzz targets with RUSTFLAGS="--cfg tokio_unstable" cargo +nightly fuzz build --fuzz-dir quickwit-fuzz
  • .github/workflows/clusterfuzzlite.yml — code-change mode (push/PR); path filters skip frontend-only changes
  • .github/workflows/clusterfuzzlite-batch.yml — batch mode on nightly schedule (1 hour); grows the corpus and finds new bugs independently of PR CI

Bug fix found by fuzzer

quickwit-datetime/src/date_time_parsing.rs: parse_timestamp_str panicked
on inputs containing multi-byte UTF-8 characters in the subsecond position.
The old code used str.len().min(9) (byte count) to slice, which can land
mid-character. Fixed with a byte-level scan that always stops at a valid char boundary.
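
To make the failure mode concrete, here is a minimal standalone sketch. The helper name `leading_ascii_digits` and the sample input are illustrative assumptions, not the code in date_time_parsing.rs. It shows why a byte-count cut can land inside a multi-byte character, and how a byte-level scan that stops at the first non-digit avoids that.

```rust
/// Hypothetical helper (not the actual quickwit function): returns the leading
/// run of ASCII digits in `s`, capped at `max_len` bytes.
fn leading_ascii_digits(s: &str, max_len: usize) -> &str {
    let end = s
        .as_bytes()
        .iter()
        .take(max_len)
        .position(|b| !b.is_ascii_digit())
        // No non-digit within the window: keep the whole window.
        .unwrap_or_else(|| s.len().min(max_len));
    // Every byte before `end` is a one-byte ASCII digit, so `end` is always
    // a valid char boundary and this slice cannot panic.
    &s[..end]
}

fn main() {
    // 8 ASCII digits followed by a 2-byte character: 10 bytes in total.
    let input = "12345678é";

    // The old approach: cut at a byte count. Byte 9 is inside 'é', so
    // `&input[..naive_cut]` would panic at runtime.
    let naive_cut = input.len().min(9);
    println!("naive cut is a char boundary: {}", input.is_char_boundary(naive_cut));
    println!("checked slice at naive cut: {:?}", input.get(..naive_cut));

    // The byte-level scan stops before the first non-digit byte instead.
    println!("safe digits: {}", leading_ascii_digits(input, 9));
}
```

`str::get` is used in place of direct indexing so the sketch can report the bad boundary without panicking.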

Tantivy query grammar panic (mitigation)

fuzz_query_string found a panic in tantivy's UserInputLeaf::set_field
(query-grammar/src/user_input_ast.rs:51: "Exist query without a field isn't
allowed") triggered by bare exist-style inputs such as *:. This is a bug in
the upstream tantivy query grammar parser with no tracked issue yet.

Standard catch_unwind is insufficient here because libfuzzer-sys installs a
panic hook that calls abort() before stack unwinding begins. The workaround
temporarily replaces the hook with a no-op around the call, then restores it,
so the fuzzer continues rather than aborting.
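
The hook swap can be sketched with the standard library alone. This is an illustration of the pattern, not the code in the fuzz target: the wrapper name `run_with_silenced_hook` is hypothetical, and the sketch assumes the binary is built with unwinding panics (`panic = "unwind"`), since catch_unwind cannot catch anything otherwise.

```rust
use std::panic::{self, UnwindSafe};

/// Hypothetical wrapper: runs `f` with a no-op panic hook installed, so a hook
/// that would otherwise abort the process never runs, then restores the
/// previous hook.
///
/// The panic hook is process-global, so this is only safe to rely on when no
/// other thread can panic while the no-op hook is installed.
fn run_with_silenced_hook<F, T>(f: F) -> std::thread::Result<T>
where
    F: FnOnce() -> T + UnwindSafe,
{
    // Take ownership of whatever hook is currently registered.
    let previous_hook = panic::take_hook();
    // Install a hook that does nothing, so unwinding can proceed.
    panic::set_hook(Box::new(|_| {}));
    let result = panic::catch_unwind(f);
    // Put the original hook back so genuine panics elsewhere still surface.
    panic::set_hook(previous_hook);
    result
}

fn main() {
    let ok = run_with_silenced_hook(|| 1 + 1);
    let caught = run_with_silenced_hook(|| -> i32 { panic!("simulated upstream panic") });
    println!("ok = {:?}, panic caught = {}", ok.ok(), caught.is_err());
}
```

Restoring the hook right after the call keeps the suppression scoped to the one known-bad call, so panics anywhere else in the target still abort and get reported.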

Fuzz targets (8 total)

| Target | What it fuzzes | Why |
|---|---|---|
| fuzz_query_dsl | ElasticQueryDsl JSON deserialization | Full ES query DSL from user HTTP requests |
| fuzz_query_string | Lucene UserInputQuery parser | Complex parser; found tantivy panic on *: |
| fuzz_datetime | parse_date_time_str across all formats | Found and fixed the subsecond UTF-8 panic |
| fuzz_doc_mapper | DocMapper::doc_from_json_str | Document ingestion path |
| fuzz_doc_mapper_config | DocMapperBuilder deserialization + try_build() | Schema definition via index-creation API; covers regex tokenizer ReDoS |
| fuzz_java_datetime_format | StrptimeParser::from_java_datetime_format / from_strptime | Format-string parsing reachable from ES range query "format" field |
| fuzz_otlp_spans | parse_otlp_spans_protobuf + parse_otlp_spans_json | External tracing agents at gRPC boundary; recursive AnyValue has no depth limit |
| fuzz_otlp_logs | parse_otlp_logs_protobuf + parse_otlp_logs_json | External log collectors at gRPC boundary; same recursion risk |
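
The recursion risk noted for the two OTLP targets is easiest to see on a toy type. The sketch below uses a hypothetical `SketchValue` enum as a stand-in for a recursive value such as AnyValue; it is not the quickwit-opentelemetry code, and this PR does not add a depth limit. It only illustrates what a depth-guarded traversal could look like, as opposed to one that recurses as deep as the input dictates.

```rust
/// Hypothetical stand-in for a recursive OTLP-style value (not the real AnyValue).
#[allow(dead_code)]
#[derive(Debug)]
enum SketchValue {
    Int(i64),
    Text(String),
    List(Vec<SketchValue>),
}

/// Error returned when the input nests deeper than the caller allows.
#[derive(Debug, PartialEq)]
struct TooDeep;

/// Counts leaf values, refusing to descend through more than `max_depth`
/// nested lists. Without the guard, recursion depth is controlled entirely
/// by the (untrusted) input.
fn count_leaves(value: &SketchValue, max_depth: usize) -> Result<usize, TooDeep> {
    match value {
        SketchValue::Int(_) | SketchValue::Text(_) => Ok(1),
        SketchValue::List(items) => {
            if max_depth == 0 {
                return Err(TooDeep);
            }
            let mut total = 0;
            for item in items {
                total += count_leaves(item, max_depth - 1)?;
            }
            Ok(total)
        }
    }
}

/// Wraps `inner` in `levels` nested single-element lists.
fn nest(inner: SketchValue, levels: usize) -> SketchValue {
    let mut value = inner;
    for _ in 0..levels {
        value = SketchValue::List(vec![value]);
    }
    value
}

fn main() {
    let shallow = nest(SketchValue::Int(1), 3);
    let deep = nest(SketchValue::Text("x".to_string()), 100);
    println!("shallow: {:?}", count_leaves(&shallow, 32));
    println!("deep:    {:?}", count_leaves(&deep, 32));
}
```

The limit of 32 is an arbitrary value chosen for the illustration.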

Other

  • .dockerignore — added !.clusterfuzzlite/ exception (was excluded by **/.*)
  • quickwit/Cargo.toml — added quickwit-fuzz to workspace members
  • quickwit/deny.toml — NCSA license exception for libfuzzer-sys
  • LICENSE-3rdparty.csv — added arbitrary and libfuzzer-sys entries

Test plan

  • All 8 fuzz targets build: RUSTFLAGS="--cfg tokio_unstable" cargo +nightly fuzz build --fuzz-dir quickwit-fuzz
  • All 8 targets run without crashes or panics
  • fuzz_datetime no longer panics on multi-byte UTF-8 subsecond inputs
  • fuzz_query_string no longer aborts on *: and similar tantivy-panicking inputs
  • ClusterFuzzLite code-change CI passes end-to-end
  • Frontend-only PRs skip the fuzzing workflow (path filter)
  • Nightly batch workflow runs independently of PR CI
  • OSSF Scorecard detects google/clusterfuzzlite action → Fuzzing score > 0

https://claude.ai/code/session_01PKpEBTpgHSndurjdPbJodB

claude added 7 commits April 10, 2026 05:06
Adds cargo-fuzz targets and ClusterFuzzLite CI to satisfy the OSSF
Scorecard Fuzzing check. Four fuzz targets cover the main parsing paths:
- fuzz_query_dsl: Elasticsearch query DSL JSON deserialization
- fuzz_query_string: Lucene query string parser (UserInputQuery)
- fuzz_datetime: datetime string parsing across all supported formats
- fuzz_doc_mapper: JSON document ingestion via DocMapper

Also fixes a panic in parse_timestamp_str discovered during fuzzing:
subsecond_digits_str.len().min(9) returns byte count and can slice
mid-way through a multi-byte UTF-8 character. The fix uses str::find to
locate the first non-ASCII-digit, ensuring the slice boundary is always
on a valid char boundary.

Building fuzz targets requires:
  RUSTFLAGS="--cfg tokio_unstable" cargo +nightly fuzz build

The ClusterFuzzLite workflow will appear in GitHub Actions and satisfies
the OSSF Scorecard Fuzzing check once merged to main.

https://claude.ai/code/session_01PKpEBTpgHSndurjdPbJodB
The fuzz crate is part of the parent workspace so the workspace-level
Cargo.lock is authoritative. Add Cargo.lock to fuzz/.gitignore.

https://claude.ai/code/session_01PKpEBTpgHSndurjdPbJodB
- workflow: drop unused `id: build` and `id: run` step IDs
- fuzz_doc_mapper: remove "what" comment line; the LazyLock purpose
  is self-evident from the "why" comment that remains
- date_time_parsing: replace `str::find` char-closure with
  `as_bytes().iter().position` — avoids UTF-8 decoding overhead
  since subsecond digits are always ASCII; the byte-level position
  remains a valid str char boundary (non-ASCII-digit byte is either
  ASCII or a multi-byte leading byte, never a continuation byte)

https://claude.ai/code/session_01PKpEBTpgHSndurjdPbJodB
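
The boundary argument in the commit above can be checked with a small comparison. The two function names below are hypothetical and written only for this illustration. On any input, the char-based `str::find` and the byte-based `position` should report the same index, and that index should always be a valid char boundary.

```rust
/// Char-based search: decodes UTF-8 while scanning.
fn first_non_digit_by_char(s: &str) -> Option<usize> {
    s.find(|c: char| !c.is_ascii_digit())
}

/// Byte-based search: no decoding. The first non-digit byte is either an
/// ASCII byte or the leading byte of a multi-byte character, because every
/// byte before it is a complete one-byte character.
fn first_non_digit_by_byte(s: &str) -> Option<usize> {
    s.as_bytes().iter().position(|b| !b.is_ascii_digit())
}

fn main() {
    // Includes plain digits, a multi-byte character mid-string, and
    // full-width (non-ASCII) digits.
    for s in ["", "123", "12é4", "é", "123abc", "１２３"] {
        let by_char = first_non_digit_by_char(s);
        let by_byte = first_non_digit_by_byte(s);
        let on_boundary = by_byte.map_or(true, |i| s.is_char_boundary(i));
        println!("{s:?}: char={by_char:?} byte={by_byte:?} boundary={on_boundary}");
    }
}
```
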
The existing **/.* rule excluded all hidden directories, including
.clusterfuzzlite/. ClusterFuzzLite's Dockerfile COPYs build.sh from
that directory, so the Docker build would fail with "not found".
Add an explicit exception matching the existing pattern for .git/.

https://claude.ai/code/session_01PKpEBTpgHSndurjdPbJodB
The package was already named quickwit-fuzz; rename the directory to
match. Also correct the binary output path in build.sh: as a workspace
member the fuzz targets are built into the workspace-level target/,
not a package-local target/.

https://claude.ai/code/session_01PKpEBTpgHSndurjdPbJodB
Add path filters to push/pull_request triggers matching the pattern
used by ci.yml: run on quickwit/** (excluding quickwit-ui/) plus the
ClusterFuzzLite infra files themselves. schedule and workflow_dispatch
are left unfiltered as GitHub Actions does not support paths: on those
triggers.

https://claude.ai/code/session_01PKpEBTpgHSndurjdPbJodB
…time format

- fuzz_otlp_spans / fuzz_otlp_logs: exercise parse_otlp_spans/logs_protobuf
  and parse_otlp_spans/logs_json from quickwit-opentelemetry.  Both accept
  untrusted bytes from external tracing/logging agents (OTEL Collector,
  Fluentd) at the gRPC boundary.  The recursive AnyValue processing inside
  has no depth limit, making this the highest-priority fuzzing surface.

- fuzz_doc_mapper_config: fuzz the DocMapperBuilder config deserialization
  and try_build() validation — the schema-definition path arriving via the
  index-creation REST API.  Orthogonal to the existing fuzz_doc_mapper which
  tests document ingestion against a fixed schema.

- fuzz_java_datetime_format: fuzz StrptimeParser::from_java_datetime_format
  and from_strptime — the format-string parsing side of datetime handling.
  Reachable from Elasticsearch range queries via the untrusted "format" field.

Also adds --fuzz-dir quickwit-fuzz to build.sh (needed after renaming fuzz/
to quickwit-fuzz/) and expands the binary loop to include the four new targets.

https://claude.ai/code/session_01PKpEBTpgHSndurjdPbJodB
osyniakov changed the title from "add fuzzing infrastructure and fix datetime parsing panic" to "add fuzzing infrastructure, fix datetime panic, expand fuzz coverage" on Apr 12, 2026
osyniakov and others added 8 commits April 12, 2026 10:08
- quickwit-fuzz/Cargo.toml: add license.workspace = true (required for
  cargo deny's unlicensed check on workspace members)
- deny.toml: add NCSA exception for libfuzzer-sys, which carries
  (MIT OR Apache-2.0) AND NCSA; NCSA is OSI/FSF-approved but was not in
  the workspace allow-list; libfuzzer-sys is never shipped in production
- fuzz_doc_mapper_config.rs: collapse nested if-let into a let-chain
  (let-chains are stable since Rust 1.88) to satisfy clippy's
  collapsible_if lint; also apply rustfmt formatting
- fuzz_doc_mapper.rs: apply rustfmt formatting (long line in LazyLock)

https://claude.ai/code/session_01PKpEBTpgHSndurjdPbJodB
libfuzzer-sys and its transitive dep arbitrary were introduced when
quickwit-fuzz became a workspace member.  Regenerated with
dd-rust-license-tool (git version) using the existing openssl-macros
override in license-tool.toml.

https://claude.ai/code/session_01PKpEBTpgHSndurjdPbJodB
quickwit-proto's build.rs runs protobuf codegen via prost-build, which
requires protoc at compile time.  The base-builder-rust image does not
include it.  ci.yml already installs protobuf-compiler for the same
reason; mirror that here.

https://claude.ai/code/session_01PKpEBTpgHSndurjdPbJodB
The base-builder-rust image is Ubuntu 20.04 where apt-get installs
protoc 3.6.1, which predates --experimental_allow_proto3_optional
(added in 3.12, required by prost-build for proto3 optional fields).

Replace with protoc 3.20.3 from the official GitHub releases, which
fully supports the flag.  curl is already present in the base image;
add unzip for the zip archive.

https://claude.ai/code/session_01PKpEBTpgHSndurjdPbJodB
Input `*\x0c...\x0c*` triggers a known tantivy bug in
UserInputLeaf::set_field ("Exist query without a field isn't
allowed"). libfuzzer-sys converts panics to abort(), which ASAN
reports as a crash. Use catch_unwind to suppress the panic so the
fuzzer can continue exploring the input space.

https://claude.ai/code/session_01PKpEBTpgHSndurjdPbJodB
catch_unwind alone is insufficient: libfuzzer-sys installs a panic
hook during initialization that calls std::process::abort() before
stack unwinding begins, so the unwind never reaches catch_unwind.

Fix by temporarily swapping the hook for a no-op around the call,
then restoring the original hook, so the known upstream tantivy bug
("Exist query without a field isn't allowed") is caught rather than
aborting the fuzzer process.

https://claude.ai/code/session_01PKpEBTpgHSndurjdPbJodB
code-change (clusterfuzzlite.yml): runs on push/PR with path filters,
30s run to replay crash corpus and catch regressions.

batch (clusterfuzzlite-batch.yml): runs on nightly schedule, 1-hour
run to grow the corpus and find new bugs.

Previously both modes shared one workflow with a schedule trigger,
causing code-change to run nightly (wasteful) and making it impossible
to tune each mode independently.

https://claude.ai/code/session_01PKpEBTpgHSndurjdPbJodB
osyniakov changed the title from "add fuzzing infrastructure, fix datetime panic, expand fuzz coverage" to "add fuzzing infrastructure, expand fuzz coverage, fix CI integration" on Apr 12, 2026