Skip to content

feat: integrate patched docx-rs fork to fix DOCX parse failures#91

Merged
developer0hye merged 1 commit intomainfrom
ralph/phase19-fix-docx-rs
Mar 1, 2026
Merged

feat: integrate patched docx-rs fork to fix DOCX parse failures#91
developer0hye merged 1 commit intomainfrom
ralph/phase19-fix-docx-rs

Conversation

@developer0hye
Copy link
Owner

Summary

  • Fork and patch docx-rs to fix 1 panic and 15 parse errors on valid DOCX files
  • Integrate the patched fork into office2pdf via [patch.crates-io]
  • Add 30 new integration tests (15 smoke + 15 structure) for previously-failing files

Changes in docx-rs fork (developer0hye/docx-rs, branch fix/parse-tolerance)

Fix 1: Width parsing panic (US-350)

The width parser called unwrap() on f64::from_str(), which panicked on Strict OOXML documents using 'dxa' unit values (e.g. 1440dxa). Fixed by parsing the numeric prefix and ignoring trailing unit suffixes.

Fix 2: Missing document rels tolerance (US-351)

read_document_rels() returned ZipError(FileNotFound) when word/_rels/document.xml.rels was missing. Many minimal DOCX files (especially from LibreOffice's test suite) omit this optional file. Fixed by returning empty rels instead of failing.

Fix 3: Missing optional parts tolerance (US-351)

Styles, numberings, settings, and web settings reads now skip gracefully when referenced files are missing or unparseable, rather than failing the entire document. This fixes tdf129659.docx where numbering.xml was referenced in rels but absent from the zip.

Fix 4: Font size unit suffix tolerance (US-351)

Font size values like "20pt" (with unit suffix) caused ParseFloatError. Fixed by stripping known suffixes before parsing.

Results

Before After Delta
1 panic 0 panics -1
16 parse errors 2 expected errors -14 fixed
0/19 files parse 15/19 files parse +15

Remaining expected errors (4 files)

  • tdf171025_pageAfter.docx, tdf171038_pageAfter.docx — ODF files with .docx extension (not real DOCX)
  • math-malformed_xml.docx — intentionally malformed XML (expected error)
  • deep-table-cell.docx — on denylist (stack overflow risk from deeply nested tables)

Test plan

  • cargo test --workspace passes
  • cargo fmt --all -- --check passes
  • cargo clippy --workspace -- -D warnings passes
  • 30 new integration tests for previously-failing DOCX files all pass
  • docx-rs fork's own test suite passes (pre-existing snapshot failures unchanged)

Related: #84

🤖 Generated with Claude Code

Add integration tests for 15 previously-failing DOCX files that now
parse and convert successfully with the patched docx-rs fork.

Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
@developer0hye developer0hye merged commit d120b3a into main Mar 1, 2026
14 checks passed
@developer0hye developer0hye deleted the ralph/phase19-fix-docx-rs branch March 1, 2026 10:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant