
fix: catch upstream panics, add zip pre-flight, and guard against OOM #79

Closed
developer0hye wants to merge 4 commits into main from fix/catch-upstream-panics


@developer0hye
Owner

Summary

  • catch_unwind: Wrap parser.parse() and parse_streaming() with std::panic::catch_unwind so upstream panics in umya-spreadsheet / docx-rs return ConvertError::Parse instead of crashing the process
  • cond_fmt range guard: Skip conditional formatting when ranges exceed 1M cells, preventing OOM from triple-nested loops on extreme-dimension spreadsheets
  • xlsx dimension cap: Cap rendered sheet dimensions to 10K rows x 1K cols to bound memory usage
  • zip pre-flight check: Validate all zip entries before parsing — reject invalid archives and entries exceeding 512 MB uncompressed size (blocks shared-string bombs)
  • denylist update: Add bug62181.xlsx (upstream OOM in umya-spreadsheet)
  • DENYLIST.md: Document all 10 denylisted fixtures with source, size, and specific failure reason
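A minimal sketch of the catch_unwind wrapping from the first bullet, assuming a simplified ConvertError with only the Parse variant the PR names. parse_guarded is a hypothetical helper, not the crate's actual API; in convert_bytes() the closure would be the parser.parse() or parse_streaming() call.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Simplified stand-in for the crate's error type; only the `Parse`
/// variant is named in the PR.
#[derive(Debug)]
enum ConvertError {
    Parse(String),
}

/// Hypothetical helper: runs an upstream parse call and turns a panic into
/// `ConvertError::Parse` instead of letting it unwind into the caller.
fn parse_guarded<T, F>(parse: F) -> Result<T, ConvertError>
where
    F: FnOnce() -> Result<T, ConvertError>,
{
    // AssertUnwindSafe assumes the caller discards the parser and its input
    // on error, so no half-updated state is observed after a panic.
    match panic::catch_unwind(AssertUnwindSafe(parse)) {
        Ok(result) => result,
        Err(payload) => {
            // panic!("literal") carries a &str; formatted panics carry a String.
            let msg = if let Some(s) = payload.downcast_ref::<&str>() {
                (*s).to_string()
            } else if let Some(s) = payload.downcast_ref::<String>() {
                s.clone()
            } else {
                String::from("non-string panic payload")
            };
            Err(ConvertError::Parse(format!("upstream parser panicked: {msg}")))
        }
    }
}
```

Two caveats apply to this approach: catch_unwind only works with the default panic = "unwind" strategy, and allocation failures abort the process instead of panicking, so it cannot recover from OOM. That is presumably why the PR pairs it with the cell-count and dimension guards. The default panic hook also still prints the panic message to stderr.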

Test plan

  • upstream_panics_return_error — verifies chart_hyperlink.xlsx and check-boolean.xlsx return Err instead of panicking
  • invalid_zip_returns_error — verifies non-zip data is rejected
  • ranges_exceed_limit_* — 3 unit tests for the cond_fmt guard
  • test_denylist_filtering — verifies denylist matching logic
  • Full test suite: 1,076 tests pass (0 failures)
  • Perf validation: all small/medium tier tests pass within budget
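A minimal sketch of the kind of file-name matching test_denylist_filtering could exercise. DENYLIST and is_denylisted are hypothetical names; only bug62181.xlsx comes from this PR, and the real matching logic may differ from a bare file-name comparison.

```rust
use std::path::Path;

/// Hypothetical denylist of fixture file names skipped by bulk tests.
/// Only bug62181.xlsx is taken from this PR; the full list is in DENYLIST.md.
const DENYLIST: &[&str] = &["bug62181.xlsx"];

/// Compares the final path component only, so a fixture is skipped
/// regardless of the directory it lives in. Near-misses such as
/// "bug62181.xlsx.bak" are not matched.
fn is_denylisted(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .map_or(false, |name| DENYLIST.iter().any(|&denied| denied == name))
}
```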

Related: #77

🤖 Generated with Claude Code

developer0hye and others added 4 commits March 1, 2026 11:53
- Wrap parser.parse() with catch_unwind in convert_bytes() so panics
  from umya-spreadsheet / docx-rs become ConvertError::Parse instead
  of propagating to the caller. Applies to both normal and streaming
  XLSX paths.

- Add MAX_COND_FMT_CELLS guard in cond_fmt.rs to skip conditional
  formatting rules with ranges exceeding 1M cells, preventing OOM
  from billion-cell iteration loops.

- Cap sheet dimensions to 10,000 rows × 1,000 cols in xlsx.rs to
  prevent unbounded row iteration for extreme-dimension spreadsheets.

- Add upstream_panics_return_error integration test verifying that
  known-panicking fixtures return Err instead of panicking.

Related: #77

Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
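A sketch of the two guards from the commit above: the MAX_COND_FMT_CELLS check and the 10,000 × 1,000 dimension cap. Only MAX_COND_FMT_CELLS and the limit values come from the PR. CellRange, ranges_exceed_limit's signature, capped_dimensions, and the MAX_RENDER_* names are hypothetical, and the sketch assumes the 1M limit applies to the total across a rule's ranges (the PR does not say whether it is per range or per rule).

```rust
/// Limit from the PR: skip a rule whose ranges cover more than 1M cells.
const MAX_COND_FMT_CELLS: u64 = 1_000_000;
/// Render caps from the PR (constant names are hypothetical).
const MAX_RENDER_ROWS: u32 = 10_000;
const MAX_RENDER_COLS: u32 = 1_000;

/// Hypothetical inclusive rectangular range (row/col indices as in A1 notation).
struct CellRange {
    first_row: u32,
    last_row: u32,
    first_col: u32,
    last_col: u32,
}

impl CellRange {
    /// Number of cells in the range. A reversed range is clamped to one
    /// row or column; saturating math keeps fuzzed bounds from overflowing.
    fn cell_count(&self) -> u64 {
        let rows = u64::from(self.last_row.saturating_sub(self.first_row)) + 1;
        let cols = u64::from(self.last_col.saturating_sub(self.first_col)) + 1;
        rows.saturating_mul(cols)
    }
}

/// True when the rule's ranges together exceed the budget, so the caller
/// can skip the rule instead of running the nested per-cell loops.
fn ranges_exceed_limit(ranges: &[CellRange]) -> bool {
    let total = ranges
        .iter()
        .map(CellRange::cell_count)
        .fold(0u64, u64::saturating_add);
    total > MAX_COND_FMT_CELLS
}

/// Clamps the sheet's declared extent before the render loops.
/// Returns (rows, cols, truncated).
fn capped_dimensions(max_row: u32, max_col: u32) -> (u32, u32, bool) {
    let rows = max_row.min(MAX_RENDER_ROWS);
    let cols = max_col.min(MAX_RENDER_COLS);
    (rows, cols, rows < max_row || cols < max_col)
}
```

With both caps applied, the render loops visit at most 10,000 × 1,000 = 10M cells however large the declared sheet is; the truncated flag would let a caller log or annotate the cut.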
Validates all zip entries before handing data to upstream parsers.
Rejects archives with entries exceeding 512 MB uncompressed size
(blocks shared-string bombs) and invalid zip data (not OOXML).

Also warms the font cache in small-tier perf tests to eliminate flaky
failures caused by cold-cache overhead during parallel test execution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
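A std-only sketch of the zip pre-flight idea from the commit above. It assumes the entry names and declared uncompressed sizes have already been read from the archive's central directory (for example by a zip-reading crate), since that parsing is out of scope here. preflight, PreflightError, and both constants are hypothetical, and 512 MB is taken as 512 × 1024 × 1024 bytes. The non-zip check only looks for the local file header signature PK\x03\x04 that a normal OOXML package begins with.

```rust
/// Hypothetical per-entry cap: 512 MB of declared uncompressed data.
const MAX_ENTRY_UNCOMPRESSED: u64 = 512 * 1024 * 1024;

/// Local file header signature ("PK\x03\x04") at the start of a zip archive.
const ZIP_LOCAL_HEADER_SIG: [u8; 4] = [0x50, 0x4B, 0x03, 0x04];

#[derive(Debug, PartialEq)]
enum PreflightError {
    NotZip,
    EntryTooLarge { name: String, size: u64 },
}

/// Rejects non-zip input, then any entry whose declared size exceeds the cap.
/// `entries` holds (name, declared uncompressed size) from the central directory.
fn preflight(data: &[u8], entries: &[(&str, u64)]) -> Result<(), PreflightError> {
    if !data.starts_with(&ZIP_LOCAL_HEADER_SIG) {
        return Err(PreflightError::NotZip);
    }
    for &(name, size) in entries {
        if size > MAX_ENTRY_UNCOMPRESSED {
            return Err(PreflightError::EntryTooLarge {
                name: name.to_string(),
                size,
            });
        }
    }
    Ok(())
}
```

Declared sizes come from the archive headers and can be forged, so a stricter version would also cap the bytes actually produced while decompressing.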
Upstream umya-spreadsheet hangs or runs out of memory on this complex
workbook (bug62181.xlsx), causing CI timeouts. Denylisting it keeps it
from blocking bulk tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
Documents each denylisted fixture with source, size, and the specific
reason it is excluded from bulk testing (fuzzer corruption, XML bombs,
shared-string OOM, extreme dimensions, upstream hangs).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
@developer0hye
Owner Author

Superseded by #86, which expanded the denylist and added catch_unwind.

@developer0hye deleted the fix/catch-upstream-panics branch March 1, 2026 08:51