feat: detect encrypted/password-protected OOXML files #88

Merged
developer0hye merged 4 commits into main from ralph/phase16-encrypted-file-detection
Mar 1, 2026
Conversation

@developer0hye
Owner

Summary

  • Add ConvertError::UnsupportedEncryption variant for encrypted/password-protected files
  • Detect OLE2 Compound Binary File magic bytes (D0 CF 11 E0 A1 B1 1A E1) before zip extraction, returning a clear error instead of misleading "invalid zip archive" messages
  • Add expected-error categorization in bulk tests so encrypted files don't count against the conversion success rate

Related: #82

Changes

ConvertError::UnsupportedEncryption (US-320)

  • New error variant with descriptive message: "file is encrypted/password-protected and cannot be converted"
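
As a rough illustration of the variant described above, the following is a minimal sketch, not the crate's actual definition. Only `ConvertError::UnsupportedEncryption` and its message come from this PR; the `Parse` variant and the hand-written `Display` impl are assumptions made to keep the example self-contained.

```rust
use std::fmt;

/// Hypothetical, trimmed-down error type. The real `ConvertError`
/// has other variants that are not shown in this PR description.
#[derive(Debug)]
pub enum ConvertError {
    /// Assumed stand-in for the pre-existing generic parse failures.
    Parse(String),
    /// The input is encrypted/password-protected and cannot be converted.
    UnsupportedEncryption,
}

impl fmt::Display for ConvertError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConvertError::Parse(msg) => write!(f, "parse error: {msg}"),
            // Message text quoted from the PR summary.
            ConvertError::UnsupportedEncryption => f.write_str(
                "file is encrypted/password-protected and cannot be converted",
            ),
        }
    }
}

impl std::error::Error for ConvertError {}
```

With this shape, callers can match on the variant to tell an encrypted input apart from a genuinely corrupt archive.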

OLE2 magic byte detection (US-321)

  • is_ole2() function checks the first 8 bytes of input data
  • Check runs in convert_bytes() before parser dispatch — applies to all formats (DOCX, PPTX, XLSX)
  • All 7 encrypted fixture files now return the correct error:
    • Encrypted_LO_Standard_abc.docx
    • Encrypted_MSO2007_abc.docx
    • Encrypted_MSO2010_abc.docx
    • Encrypted_MSO2013_abc.docx
    • bug53475-password-is-pass.docx
    • bug53475-password-is-solrcell.docx
    • protected_passtika.xlsx
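
The detection step described above could look roughly like the sketch below. The magic bytes and the names `is_ole2()` and `convert_bytes()` come from this PR; the `Format` enum, the `InvalidZip` variant, the return type, and the stubbed dispatch are assumptions for illustration only.

```rust
/// OLE2 Compound Binary File signature, as quoted in this PR.
const OLE2_MAGIC: [u8; 8] = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1];

/// Returns true when the input starts with the 8-byte OLE2 signature.
/// Inputs shorter than 8 bytes (including empty input) return false.
pub fn is_ole2(data: &[u8]) -> bool {
    data.starts_with(&OLE2_MAGIC)
}

/// Hypothetical, reduced error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ConvertError {
    UnsupportedEncryption,
    InvalidZip, // assumed stand-in for the old "invalid zip archive" error
}

/// Hypothetical format selector; the real dispatch type is not shown in the PR.
#[derive(Debug, Clone, Copy)]
pub enum Format {
    Docx,
    Pptx,
    Xlsx,
}

/// Sketch of the entry point: the OLE2 check runs before any
/// format-specific parsing, so it applies to every format.
pub fn convert_bytes(data: &[u8], _format: Format) -> Result<Vec<u8>, ConvertError> {
    if is_ole2(data) {
        return Err(ConvertError::UnsupportedEncryption);
    }
    // Stub in place of the real parser dispatch: accept anything that
    // looks like a ZIP container, reject everything else.
    if data.starts_with(b"PK") {
        Ok(Vec::new())
    } else {
        Err(ConvertError::InvalidZip)
    }
}
```

Placing the check ahead of dispatch means no individual parser needs to know about OLE2 containers.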

Bulk test expected-error list (US-322)

  • Added EXPECTED_ERRORS list separate from the denylist
  • Encrypted files are tagged as ExpectedError outcome
  • Expected errors are excluded from the success rate denominator
  • Report distinguishes expected errors from unexpected errors

Test plan

  • Unit tests for is_ole2() with OLE2 magic, ZIP magic, short data, and empty data
  • Unit tests for UnsupportedEncryption error display and debug formatting
  • Integration tests for all 7 encrypted fixture files (6 DOCX + 1 XLSX)
  • Unit tests for convert_bytes() returning UnsupportedEncryption for all 3 formats
  • Unit tests for expected-error filtering and success rate calculation
  • cargo test --workspace passes
  • cargo fmt --all -- --check passes
  • cargo clippy --workspace -- -D warnings passes

🤖 Generated with Claude Code

developer0hye and others added 4 commits March 1, 2026 17:37
Add a new error variant to clearly communicate when a file is
encrypted/password-protected and cannot be converted, instead of
returning misleading generic parse errors.

Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
Check for OLE2 Compound Binary File magic bytes (D0 CF 11 E0 A1 B1 1A E1)
at the start of input data before attempting zip extraction. Encrypted
OOXML files use OLE2 containers, which previously caused misleading
"invalid zip archive" errors.

All 7 encrypted fixture files now return a clear UnsupportedEncryption
error instead of confusing parse failures.

Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
Introduce an EXPECTED_ERRORS list for encrypted/password-protected
fixture files so they are tracked separately from unexpected errors in
bulk conversion tests. Expected errors do not count against the
conversion success rate, since they exercise a valid code path
(OLE2 detection -> UnsupportedEncryption).

Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
The Test CI job does not enable lfs: true, so encrypted fixture files
are Git LFS pointers instead of actual OLE2 binaries. Add is_lfs_pointer()
check to skip these tests gracefully on CI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
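
The LFS guard mentioned in the last commit might be approximated as follows. The name `is_lfs_pointer()` comes from the commit message; the body is an assumption based on the Git LFS pointer format, whose files begin with a `version https://git-lfs.github.com/spec/v1` line. The real helper may take a path or apply different checks.

```rust
/// First line of a Git LFS pointer file (v1 spec).
const LFS_POINTER_PREFIX: &[u8] = b"version https://git-lfs.github.com/spec/v1";

/// Returns true when the bytes look like a Git LFS pointer rather than
/// the real fixture contents. Sketch only; the signature is assumed.
pub fn is_lfs_pointer(data: &[u8]) -> bool {
    data.starts_with(LFS_POINTER_PREFIX)
}
```

A fixture test could read the file, return early when `is_lfs_pointer` is true, and otherwise assert that conversion yields `UnsupportedEncryption`.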
developer0hye force-pushed the ralph/phase16-encrypted-file-detection branch from 495d6e9 to 7e70dc9 on March 1, 2026 08:38
developer0hye merged commit 391d5fa into main on Mar 1, 2026
14 checks passed
developer0hye deleted the ralph/phase16-encrypted-file-detection branch on March 1, 2026 08:43