D-047: test(fuzz): signature extraction fuzzers for non-Rust languages #10
Sephyi wants to merge 1 commit into `development`
Extend the signature-extraction fuzz coverage from Rust-only to all ten supported grammars (Rust, TypeScript, JavaScript, Python, Go, Java, C, C++, Ruby, C#). A new unified `fuzz_signature_multilang` target dispatches on `data[0] % 10` to pick a language, then feeds the remaining bytes (if they are valid UTF-8) to the matching `extract_<lang>_signature` helper. This mirrors the byte-dispatch pattern already used by `fuzz_classify_span` and keeps boilerplate minimal.

To expose the dispatcher, `lib.rs` grows one public wrapper per language, each delegating to a small private `extract_signature_for_language(source, language)` helper that centralises the `Parser::new() -> set_language -> parse -> root.child(0) -> AnalyzerService::extract_signature` pipeline. Each wrapper is gated by its language feature, so the crate still builds with any subset of the `lang-*` features.

`fuzz/Cargo.toml` now pulls in `commitbee` with `default-features = false` plus every `lang-*` feature, and registers the new `fuzz_signature_multilang` binary. Turning off default features also drops the transitive keyring dependency from the fuzz build, which saves build time for a workload that never touches secure storage.

Verified via `cargo check --manifest-path fuzz/Cargo.toml` plus the standard `cargo fmt --check`, `cargo clippy --all-targets --all-features -- -D warnings`, and `cargo test --all-targets`. The fuzzer itself does not need to run to completion: the guarantee is "never panic on any input," and `cargo-fuzz` will exercise that as part of the normal fuzzing workflow.

Closes audit entry D-047 from #3.
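The byte-dispatch scheme described above can be sketched in plain Rust. This is a minimal, std-only illustration under stated assumptions: the `Language` enum, the `LANGUAGES` table, and `split_input` are hypothetical names, not part of the crate's API, and the empty-input guard is assumed (indexing `data[0]` on an empty slice would panic). The selector order follows the `match` arms in the fuzz target (0 = Rust through 9 = C#); the real target forwards `source` to the matching `commitbee::extract_<lang>_signature` wrapper rather than returning it.

```rust
/// Hypothetical stand-in for the ten grammars the fuzz target covers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Language {
    Rust,
    TypeScript,
    JavaScript,
    Python,
    Go,
    Java,
    C,
    Cpp,
    Ruby,
    CSharp,
}

/// Selector order matches the fuzz target's `match` arms (0 = Rust ... 9 = C#).
const LANGUAGES: [Language; 10] = [
    Language::Rust,
    Language::TypeScript,
    Language::JavaScript,
    Language::Python,
    Language::Go,
    Language::Java,
    Language::C,
    Language::Cpp,
    Language::Ruby,
    Language::CSharp,
];

/// Splits a raw fuzz input into (language, source).
///
/// The first byte picks the grammar via `data[0] % 10`; the remaining bytes
/// are the source text. Returns `None` for empty input (assumed guard) or
/// when the tail is not valid UTF-8, like the strict `from_utf8` early return.
pub fn split_input(data: &[u8]) -> Option<(Language, &str)> {
    let (&selector, rest) = data.split_first()?;
    let language = LANGUAGES[usize::from(selector % 10)];
    let source = std::str::from_utf8(rest).ok()?;
    Some((language, source))
}
```

Because the selector is reduced modulo 10, every possible first byte maps to some grammar, so the fuzzer never wastes inputs on an unhandled selector.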
Pull request overview
Adds multi-language fuzz coverage for CommitBee’s signature extraction by exposing language-specific signature wrappers and introducing a new fuzz target that dispatches across supported tree-sitter grammars.
Changes:
- Refactors signature extraction into a shared `extract_signature_for_language` helper in `src/lib.rs` and adds per-language public wrapper functions behind `lang-*` feature flags.
- Introduces a new `fuzz_signature_multilang` fuzz target that selects among 10 languages based on the first input byte.
- Updates the fuzz crate configuration to enable all language features (while disabling default features) and refreshes `fuzz/Cargo.lock` accordingly.
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/lib.rs` | Adds a language-parameterized signature extraction helper and new `extract_*_signature` wrappers for non-Rust languages. |
| `fuzz/fuzz_targets/fuzz_signature_multilang.rs` | New fuzz target that dispatches input to the appropriate language-specific signature extractor. |
| `fuzz/Cargo.toml` | Enables all language features for fuzzing (and disables default features) and registers the new fuzz binary target. |
| `fuzz/Cargo.lock` | Lockfile updates reflecting changed feature/dependency resolution (e.g., dropping secure-storage/keyring from fuzz). |
```diff
@@ -76,6 +72,66 @@ pub fn extract_rust_signature(source: &str) -> Option<String> {
     services::analyzer::AnalyzerService::extract_signature(first_child, source)
 }
```
`extract_signature_for_language` is always compiled, but it's only referenced from the language-specific wrappers, which are all `#[cfg(feature = "lang-*")]`. In a `--no-default-features` build with no language features enabled, this helper becomes unused and will trigger a `dead_code` warning. Consider gating the helper with the same `cfg(any(feature = "lang-rust", ...))` set, or explicitly allowing `dead_code` with a short rationale, so that minimal-feature builds stay warning-free.
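One way to combine both suggestions is `cfg_attr`: keep the helper compiled (so it is still type-checked in every build) but allow `dead_code` only when no language feature is enabled. This is a minimal sketch under assumptions: only `lang-rust` is named in this thread, so `lang-python` is an assumed feature name, just two of the ten languages are shown, `Language` is a hypothetical stand-in for the tree-sitter language handle, and the helper body is a placeholder rather than the real parsing pipeline.

```rust
/// Hypothetical language tag; the real helper receives a tree-sitter language.
#[derive(Debug, Clone, Copy)]
pub enum Language {
    Rust,
    Python,
}

// Only the feature-gated wrappers call this helper. With no `lang-*` feature
// enabled it would be unused, so `dead_code` is allowed in exactly that case.
#[cfg_attr(
    not(any(feature = "lang-rust", feature = "lang-python")),
    allow(dead_code)
)]
fn extract_signature_for_language(source: &str, _language: Language) -> Option<String> {
    // Placeholder for Parser::new() -> set_language -> parse -> root.child(0)
    // -> AnalyzerService::extract_signature: returns the first non-blank line.
    source
        .lines()
        .map(str::trim)
        .find(|line| !line.is_empty())
        .map(str::to_owned)
}

/// Public wrapper, compiled only when its language feature is enabled.
#[cfg(feature = "lang-rust")]
pub fn extract_rust_signature(source: &str) -> Option<String> {
    extract_signature_for_language(source, Language::Rust)
}

/// Assumed feature name; mirrors the Rust wrapper above.
#[cfg(feature = "lang-python")]
pub fn extract_python_signature(source: &str) -> Option<String> {
    extract_signature_for_language(source, Language::Python)
}
```

The trade-off versus gating the helper with `#[cfg(any(...))]` is that the helper still compiles (and can break the build) in a no-language configuration, which catches type errors earlier at the cost of one extra attribute.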
```rust
let Ok(source) = std::str::from_utf8(&data[1..]) else {
    return;
};
match selector {
    0 => {
        let _ = commitbee::extract_rust_signature(source);
    }
    1 => {
        let _ = commitbee::extract_typescript_signature(source);
    }
    2 => {
        let _ = commitbee::extract_javascript_signature(source);
    }
    3 => {
        let _ = commitbee::extract_python_signature(source);
    }
    4 => {
        let _ = commitbee::extract_go_signature(source);
    }
    5 => {
        let _ = commitbee::extract_java_signature(source);
    }
    6 => {
        let _ = commitbee::extract_c_signature(source);
    }
    7 => {
        let _ = commitbee::extract_cpp_signature(source);
    }
    8 => {
        let _ = commitbee::extract_ruby_signature(source);
    }
    9 => {
        let _ = commitbee::extract_csharp_signature(source);
```
The fuzzer drops any input where `data[1..]` is not valid UTF-8, which can significantly reduce fuzz coverage, since many byte sequences will exit early. To exercise the parsers more thoroughly, consider a lossy conversion such as `String::from_utf8_lossy(&data[1..])`, which replaces invalid sequences with U+FFFD, or switching the target input type to `(&str, u8)` via a structured fuzzer input, so that every input still drives the extractor while staying within the `&str` API contract.
```diff
-let Ok(source) = std::str::from_utf8(&data[1..]) else {
-    return;
-};
-match selector {
-    0 => {
-        let _ = commitbee::extract_rust_signature(source);
-    }
-    1 => {
-        let _ = commitbee::extract_typescript_signature(source);
-    }
-    2 => {
-        let _ = commitbee::extract_javascript_signature(source);
-    }
-    3 => {
-        let _ = commitbee::extract_python_signature(source);
-    }
-    4 => {
-        let _ = commitbee::extract_go_signature(source);
-    }
-    5 => {
-        let _ = commitbee::extract_java_signature(source);
-    }
-    6 => {
-        let _ = commitbee::extract_c_signature(source);
-    }
-    7 => {
-        let _ = commitbee::extract_cpp_signature(source);
-    }
-    8 => {
-        let _ = commitbee::extract_ruby_signature(source);
-    }
-    9 => {
-        let _ = commitbee::extract_csharp_signature(source);
+let source = String::from_utf8_lossy(&data[1..]);
+match selector {
+    0 => {
+        let _ = commitbee::extract_rust_signature(source.as_ref());
+    }
+    1 => {
+        let _ = commitbee::extract_typescript_signature(source.as_ref());
+    }
+    2 => {
+        let _ = commitbee::extract_javascript_signature(source.as_ref());
+    }
+    3 => {
+        let _ = commitbee::extract_python_signature(source.as_ref());
+    }
+    4 => {
+        let _ = commitbee::extract_go_signature(source.as_ref());
+    }
+    5 => {
+        let _ = commitbee::extract_java_signature(source.as_ref());
+    }
+    6 => {
+        let _ = commitbee::extract_c_signature(source.as_ref());
+    }
+    7 => {
+        let _ = commitbee::extract_cpp_signature(source.as_ref());
+    }
+    8 => {
+        let _ = commitbee::extract_ruby_signature(source.as_ref());
+    }
+    9 => {
+        let _ = commitbee::extract_csharp_signature(source.as_ref());
```
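To see what the suggested change buys, here is a std-only sketch contrasting the two conversions on one malformed input. `strict_source` and `lossy_source` are hypothetical names; `std::str::from_utf8` and `String::from_utf8_lossy` are the real std functions used by the current code and by the suggestion. Note that the lossy path is not lossless: each invalid sequence becomes U+FFFD, so the extractor sees replacement characters instead of the original bytes. When the input is already valid UTF-8, `from_utf8_lossy` returns `Cow::Borrowed` and does not allocate.

```rust
use std::borrow::Cow;

/// Mirrors the current target: a non-UTF-8 tail is skipped entirely.
pub fn strict_source(tail: &[u8]) -> Option<&str> {
    std::str::from_utf8(tail).ok()
}

/// Mirrors the suggestion: invalid sequences become U+FFFD, so every input
/// still reaches the extractor. Borrows when the bytes are already valid.
pub fn lossy_source(tail: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(tail)
}
```

With the strict version, the input `fn f(\xff) {}` never reaches any parser; with the lossy version it is parsed as `fn f(\u{FFFD}) {}`.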
Summary
test(fuzz): signature extraction fuzzers for non-Rust languages.
Audit context
Closes audit entry D-047 from #3.
Verification
- `cargo fmt --check`
- `cargo clippy --all-targets --all-features -- -D warnings`
- `cargo test --all-targets`