feat: detect encrypted/password-protected OOXML files #88
Merged
developer0hye merged 4 commits into main on Mar 1, 2026
Add a new error variant to clearly communicate when a file is encrypted/password-protected and cannot be converted, instead of returning misleading generic parse errors.
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
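For readers who want a picture of what such a variant might look like, here is a minimal sketch. Only the names ConvertError and UnsupportedEncryption come from this PR; the Parse variant, the message wording, and the hand-written Display impl are assumptions for illustration, and the real enum may be defined differently (for example through a derive macro).

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical, trimmed-down error type. The real `ConvertError` has other
/// variants that are not shown in this PR description.
#[derive(Debug)]
pub enum ConvertError {
    /// Assumed stand-in for the generic parse failures mentioned above.
    Parse(String),
    /// The input is encrypted/password-protected and cannot be converted.
    UnsupportedEncryption,
}

impl fmt::Display for ConvertError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConvertError::Parse(msg) => write!(f, "parse error: {}", msg),
            // Message text is illustrative, not the wording used in the PR.
            ConvertError::UnsupportedEncryption => write!(
                f,
                "file is encrypted or password-protected and cannot be converted"
            ),
        }
    }
}

impl Error for ConvertError {}
```

A dedicated variant lets callers match on the encrypted case directly instead of inspecting the text of a generic parse error.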
Check for OLE2 Compound Binary File magic bytes (D0 CF 11 E0 A1 B1 1A E1) at the start of input data before attempting zip extraction. Encrypted OOXML files use OLE2 containers, which previously caused misleading "invalid zip archive" errors. All 7 encrypted fixture files now return a clear UnsupportedEncryption error instead of confusing parse failures.
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
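The check described in this commit can be sketched roughly as follows. The magic bytes and the names is_ole2(), convert_bytes(), and UnsupportedEncryption come from the PR; the function signatures, the InvalidZip variant, and the placeholder body after the guard are assumptions, not the project's actual code.

```rust
/// OLE2 Compound Binary File signature, as quoted in the commit message.
const OLE2_MAGIC: [u8; 8] = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1];

/// Returns true when `data` begins with the 8-byte OLE2 signature.
/// Inputs shorter than 8 bytes, including empty input, return false.
pub fn is_ole2(data: &[u8]) -> bool {
    data.starts_with(&OLE2_MAGIC)
}

/// Trimmed-down error type for this sketch only.
#[derive(Debug, PartialEq)]
pub enum ConvertError {
    UnsupportedEncryption,
    /// Assumed stand-in for the old "invalid zip archive" failure.
    InvalidZip,
}

/// Assumed signature; the real function returns converted output.
pub fn convert_bytes(data: &[u8]) -> Result<(), ConvertError> {
    // Guard runs first, so OLE2 input never reaches zip extraction.
    if is_ole2(data) {
        return Err(ConvertError::UnsupportedEncryption);
    }
    // Placeholder for zip extraction and parser dispatch: here we only
    // look for the "PK" prefix that ZIP archives start with.
    if data.starts_with(b"PK") {
        Ok(())
    } else {
        Err(ConvertError::InvalidZip)
    }
}
```

Because the guard only inspects the leading bytes, it costs almost nothing on ordinary ZIP-based input and needs no knowledge of the target format.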
Introduce an EXPECTED_ERRORS list for encrypted/password-protected fixture files so they are tracked separately from unexpected errors in bulk conversion tests. Expected errors do not count against the conversion success rate, since they exercise a valid code path (OLE2 detection -> UnsupportedEncryption).
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
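One way the bulk test bookkeeping could work is sketched below. EXPECTED_ERRORS, ExpectedError, and the fixture file names come from this PR; the Outcome enum, classify(), and success_rate() are hypothetical helpers, and the real test harness may structure this differently.

```rust
/// Fixture names listed in the PR description.
const EXPECTED_ERRORS: &[&str] = &[
    "Encrypted_LO_Standard_abc.docx",
    "Encrypted_MSO2007_abc.docx",
    "Encrypted_MSO2010_abc.docx",
    "Encrypted_MSO2013_abc.docx",
    "bug53475-password-is-pass.docx",
    "bug53475-password-is-solrcell.docx",
    "protected_passtika.xlsx",
];

/// Hypothetical per-file result used by the bulk test.
#[derive(Debug, PartialEq)]
enum Outcome {
    Success,
    ExpectedError,
    UnexpectedError,
}

/// Classifies one fixture from its file name and conversion result.
fn classify(file_name: &str, converted_ok: bool) -> Outcome {
    if converted_ok {
        Outcome::Success
    } else if EXPECTED_ERRORS.iter().any(|name| *name == file_name) {
        Outcome::ExpectedError
    } else {
        Outcome::UnexpectedError
    }
}

/// Success rate over all files except expected errors, which are excluded
/// from the denominator so they do not count against the rate.
fn success_rate(outcomes: &[Outcome]) -> f64 {
    let counted = outcomes
        .iter()
        .filter(|o| **o != Outcome::ExpectedError)
        .count();
    if counted == 0 {
        return 1.0;
    }
    let ok = outcomes.iter().filter(|o| **o == Outcome::Success).count();
    ok as f64 / counted as f64
}
```

Keeping this list apart from the denylist means an encrypted fixture that unexpectedly stops failing, or fails for a different reason, can still be noticed.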
The Test CI job does not enable lfs: true, so encrypted fixture files are Git LFS pointers instead of actual OLE2 binaries. Add is_lfs_pointer() check to skip these tests gracefully on CI.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
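A pointer check along these lines would be enough to tell the two cases apart. The name is_lfs_pointer() comes from the commit message; the signature and the prefix-based approach are assumptions, relying on the fact that Git LFS pointer files are small text files whose first line is the spec version line.

```rust
/// First line of a Git LFS pointer file (pointer spec v1).
const LFS_POINTER_PREFIX: &[u8] = b"version https://git-lfs.github.com/spec/v1";

/// Returns true when `data` looks like a Git LFS pointer rather than the
/// real fixture contents. Assumed signature; the PR's helper may differ.
fn is_lfs_pointer(data: &[u8]) -> bool {
    data.starts_with(LFS_POINTER_PREFIX)
}
```

A test can read the fixture, return early when is_lfs_pointer() is true, and otherwise assert that conversion fails with UnsupportedEncryption.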
495d6e9 to 7e70dc9
Summary
- Add ConvertError::UnsupportedEncryption variant for encrypted/password-protected files
- Detect OLE2 magic bytes (D0 CF 11 E0 A1 B1 1A E1) before zip extraction, returning a clear error instead of misleading "invalid zip archive" messages
Related: #82
Changes
ConvertError::UnsupportedEncryption (US-320)
OLE2 magic byte detection (US-321)
- is_ole2() function checks the first 8 bytes of input data
- Checked in convert_bytes() before parser dispatch; applies to all formats (DOCX, PPTX, XLSX)
- All 7 encrypted fixture files now return UnsupportedEncryption:
  - Encrypted_LO_Standard_abc.docx
  - Encrypted_MSO2007_abc.docx
  - Encrypted_MSO2010_abc.docx
  - Encrypted_MSO2013_abc.docx
  - bug53475-password-is-pass.docx
  - bug53475-password-is-solrcell.docx
  - protected_passtika.xlsx
Bulk test expected-error list (US-322)
- EXPECTED_ERRORS list separate from the denylist
- ExpectedError outcome for these files, tracked separately from unexpected errors
Test plan
- Test is_ole2() with OLE2 magic, ZIP magic, short data, and empty data
- Test UnsupportedEncryption error display and debug formatting
- Test convert_bytes() returning UnsupportedEncryption for all 3 formats
- cargo test --workspace passes
- cargo fmt --all -- --check passes
- cargo clippy --workspace -- -D warnings passes
🤖 Generated with Claude Code