
fix: denylist adversarial fixtures and wrap parsing with catch_unwind #86

Merged
developer0hye merged 3 commits into main from ralph/phase14-denylist-and-infra
Mar 1, 2026

@developer0hye (Owner) commented Mar 1, 2026

Summary

  • Expand the bulk test denylist to 49 corrupted/adversarial fixtures across all three formats (15 DOCX, 11 PPTX, 23 XLSX), so fuzzer-generated, XML bomb, and OOM-inducing files are skipped and do not skew quality metrics
  • Wrap upstream parser.parse() calls with std::panic::catch_unwind in convert_bytes() and convert_bytes_streaming_xlsx() so that panics from upstream libraries (umya-spreadsheet, docx-rs) are caught and returned as ConvertError::Parse instead of propagating
  • Add upstream_panics_return_error integration test verifying known-panicking fixtures return Err
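For readers unfamiliar with the pattern, here is a minimal sketch of the catch_unwind wrapping summarized above. It is not the code from lib.rs: the `parse_guarded` wrapper, its closure-based signature, the error message wording, and the single-variant `ConvertError` are assumptions made to keep the example self-contained. Only the names `ConvertError::Parse` and `extract_panic_message` come from this PR.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};

/// Stand-in for the crate's error type (assumption: the real enum has more variants).
#[derive(Debug, PartialEq)]
enum ConvertError {
    Parse(String),
}

/// Turn a caught panic payload into a readable message.
/// Panic payloads are usually `&'static str` or `String`; anything else
/// falls back to a generic message.
fn extract_panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&'static str>() {
        (*s).to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "unknown panic".to_string()
    }
}

/// Hypothetical wrapper: run a parse closure and convert a panic into
/// `ConvertError::Parse` instead of letting it unwind into the caller.
fn parse_guarded<T>(
    parse: impl FnOnce() -> Result<T, ConvertError>,
) -> Result<T, ConvertError> {
    // AssertUnwindSafe is needed because arbitrary closures are not
    // automatically UnwindSafe; the caller must not reuse state that the
    // closure may have left half-modified.
    match panic::catch_unwind(AssertUnwindSafe(parse)) {
        Ok(result) => result,
        Err(payload) => Err(ConvertError::Parse(format!(
            "upstream parser panicked: {}",
            extract_panic_message(&*payload)
        ))),
    }
}
```

Note that catch_unwind only intercepts unwinding panics. If the crate is built with `panic = "abort"`, the process still terminates, and the default panic hook still prints the panic message to stderr even when the panic is caught.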

Key changes

  • lib.rs: catch_unwind around parse calls + extract_panic_message() helper
  • bulk_conversion.rs: Expanded denylist from 9 → 49 entries with per-format coverage assertions
  • xlsx_fixtures.rs: New upstream_panics_return_error test
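The denylist filtering and per-format coverage checks could look roughly like the sketch below. The file names, the `is_denylisted` and `count_for_extension` helpers, and the flat `DENYLIST` constant are all hypothetical; the actual structure in bulk_conversion.rs may differ.

```rust
use std::path::Path;

/// Hypothetical denylist entries; the real list has 49 fixtures.
const DENYLIST: &[&str] = &[
    "clusterfuzz-example.docx",
    "clusterfuzz-example.pptx",
    "xml-bomb-example.xlsx",
    "oom-example.xlsx",
];

/// Returns true when the fixture's file name appears in the denylist,
/// regardless of which directory the fixture lives in.
fn is_denylisted(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .map_or(false, |name| DENYLIST.contains(&name))
}

/// Counts denylist entries with the given extension, so a test can assert
/// a minimum number of entries per format.
fn count_for_extension(ext: &str) -> usize {
    DENYLIST
        .iter()
        .filter(|&&name| Path::new(name).extension().and_then(|e| e.to_str()) == Some(ext))
        .count()
}
```

A bulk test would skip every fixture for which `is_denylisted` returns true, and a separate test would assert something like `count_for_extension("xlsx") >= 15` so that entries cannot be dropped from the list unnoticed.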

Related: #77, #85, #83, #84

🤖 Generated with Claude Code

Add all fuzzer-generated, crash reporter, XML bomb, and OOM-inducing
test fixtures to the bulk conversion denylist. This covers:
- 15 DOCX: clusterfuzz, crash reporter, deep-table-cell, truncated
- 11 PPTX: clusterfuzz, Divino_Revelado (OOM/hang)
- 23 XLSX: clusterfuzz, crash reporters, 58616, XML bombs, OOM

The test_denylist_filtering test now also verifies minimum counts
per format (>=14 DOCX, >=10 PPTX, >=15 XLSX).

Related: #85, #77

Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>

Wrap parser.parse() with catch_unwind in convert_bytes() so panics
from umya-spreadsheet / docx-rs become ConvertError::Parse instead
of propagating to the caller. Applies to both normal and streaming
XLSX paths.

Add extract_panic_message() helper to extract human-readable messages
from caught panic payloads.

Add upstream_panics_return_error test verifying that known-panicking
fixtures (chart_hyperlink.xlsx, check-boolean.xlsx) return Err
instead of panicking.

Related: #77

Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>

This file causes a CI timeout in the bulk third-party fixture test.
Add it to the denylist to prevent blocking CI.

Refs: #85

Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
@developer0hye merged commit c4fed54 into main on Mar 1, 2026
14 checks passed
@developer0hye deleted the ralph/phase14-denylist-and-infra branch on March 1, 2026 08:22