Avoid EndOfStreamException with empty pages #95

Merged
aloneguid merged 2 commits into aloneguid:master from ishepherd:endofstreamexception on Feb 28, 2021
Conversation

@ishepherd
Contributor

Fixes

Fixes #88

Description

Allows a data page to contain no RLE or bit-packed values. Occasionally I find that Spark creates these 'empty' pages: they have a valid header and accurate byte counts, but no values.

I'm not clear whether these are allowed by the spec... but they are 'de facto' correct: Spark writes them; Python/pandas can read them ok.
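
For illustration only, here is a rough Python sketch of the idea. It is not the C# change made in this PR. It decodes Parquet's RLE/bit-packed hybrid encoding and treats a zero-length payload as "no values" instead of failing on the first read. The names `read_uleb128` and `decode_hybrid` are invented for this example, and it assumes the caller has already sliced out the encoded bytes for the page.

```python
import io


def read_uleb128(stream):
    """Read an unsigned LEB128 varint.

    Returns None if the stream is already exhausted (nothing left to read),
    and raises EOFError only if a varint is cut off part-way through.
    """
    result = 0
    shift = 0
    first = True
    while True:
        chunk = stream.read(1)
        if not chunk:
            if first:
                return None  # clean end of data: no further runs
            raise EOFError("truncated varint")
        first = False
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result
        shift += 7


def decode_hybrid(data, bit_width, max_values):
    """Decode RLE/bit-packed hybrid runs, tolerating an empty payload."""
    stream = io.BytesIO(data)
    values = []
    byte_width = (bit_width + 7) // 8
    mask = (1 << bit_width) - 1
    while len(values) < max_values:
        header = read_uleb128(stream)
        if header is None:
            break  # empty page (or all runs consumed): stop without error
        if header & 1 == 0:
            # RLE run: (header >> 1) repetitions of one little-endian value.
            count = header >> 1
            raw = stream.read(byte_width)
            if len(raw) < byte_width:
                raise EOFError("truncated RLE run")
            values.extend([int.from_bytes(raw, "little")] * count)
        else:
            # Bit-packed run: (header >> 1) groups of 8 values, packed LSB first.
            groups = header >> 1
            nbytes = groups * bit_width
            raw = stream.read(nbytes)
            if len(raw) < nbytes:
                raise EOFError("truncated bit-packed run")
            bits = int.from_bytes(raw, "little")
            for i in range(groups * 8):
                values.append((bits >> (i * bit_width)) & mask)
    return values[:max_values]


empty_page = decode_hybrid(b"", bit_width=1, max_values=10)        # []
rle_page = decode_hybrid(bytes([0x0A, 0x01]), bit_width=1, max_values=5)
```

The design choice in this sketch is to distinguish "no bytes at all" from "bytes that end mid-run": the first is accepted as an empty page, while the second still raises, so genuinely corrupt data is not silently ignored.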

There is no test in this PR. This Gist contains the test I use locally, but unfortunately I cannot share the file that goes with it.
⚠️ Help needed. @peteriehl can you help to provide a repro file?

  • I have included unit tests validating this fix.
  • I have updated markdown documentation where required. (Not applicable.)
  • I understand that successful approval of my pull request requires reproducible tests as per Contribution Guideline.

@aloneguid aloneguid added this to the 3.8.6 milestone Feb 28, 2021
@aloneguid aloneguid merged commit 88f37d2 into aloneguid:master Feb 28, 2021
@ishepherd ishepherd deleted the endofstreamexception branch March 2, 2021 01:20
@ishepherd
Contributor Author

@aloneguid My first OSS contribution ✨😄
Thanks for all your work.

Development

Successfully merging this pull request may close these issues.

EndOfStreamException - can't read empty data page

2 participants