Skip to content

ORC-2167: [C++] Fix integer overflow in PostScript footer length validation#2629

Open
luffy-zh wants to merge 2 commits into
apache:mainfrom
luffy-zh:ORC-2167
Open

ORC-2167: [C++] Fix integer overflow in PostScript footer length validation#2629
luffy-zh wants to merge 2 commits into
apache:mainfrom
luffy-zh:ORC-2167

Conversation

@luffy-zh
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Added overflow-safe bounds checking in three locations in c++/src/Reader.cc using __builtin_add_overflow to prevent integer overflow when parsing malformed
ORC files with extremely large length values in PostScript:

  1. readMetadata() - Use __builtin_add_overflow to safely compute total tail size (footerLength + metadataSize + postscriptLength + 1) before comparing
    against fileLength
  2. createReader() - Use __builtin_add_overflow to safely compute tailSize (1 + postscriptLength + footerSize) before bounds validation
  3. startNextStripe() - Use __builtin_add_overflow to safely sum stripe length components (offset + indexLength + dataLength + footerLength) before
    comparing against fileLength

Why are the changes needed?

When an ORC file's PostScript is crafted with footer_length or metadata_length set to UINT64_MAX, the bounds check using unsigned addition (e.g.,
metadataSize + footerLength + postscriptLength_ + 1) can overflow and wrap around to a small value, bypassing validation.

For example, given a 58-byte file with footerLength = UINT64_MAX, metadataSize = 0, and postscriptLength_ = 23:

  • The sum wraps to 23 (mod 2^64)
  • The check 58 < 23 evaluates to false, allowing invalid offset calculation
  • This can cause SIGBUS crash due to out-of-bounds memory access

Using __builtin_add_overflow (a GCC/Clang builtin) provides a standard, efficient way to detect overflow without manual subtraction logic.

How was this patch tested?

Added two unit tests in c++/test/TestReader.cc:

  1. testMalformedFooterLengthOverflow - Constructs a minimal malformed ORC file with footer_length = UINT64_MAX and verifies that createReader() throws
    ParseError
  2. testMalformedMetadataLengthOverflow - Constructs a malformed file with metadata_length = UINT64_MAX and verifies ParseError is thrown

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude code

shouzhi and others added 2 commits May 22, 2026 17:17
…dation

Added overflow-safe bounds checking in three locations in Reader.cc:
1. readMetadata() - check footer/metadata/postscript lengths before bounds validation
2. createReader() - check footer/postscript lengths before calculating tailSize
3. startNextStripe() - check stripe offset/length values

Added unit tests to verify malformed ORC files with extremely large
footer_length or metadata_length values throw ParseError instead of
causing integer overflow or crash:
- testMalformedFooterLengthOverflow
- testMalformedMetadataLengthOverflow
Use compiler built-in for safer and cleaner overflow detection in
Reader.cc metadata, stripe info, and postscript footer validations.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant