Open
Labels
🔮 future improvement: This issue will take some time to integrate.
Description
Library Version
4.5.1
OS
Windows
OS Architecture
64 bit
How to reproduce?
Create a very large parquet file with a single rowgroup. For example, I have a parquet file with 50M rows and a dozen columns.
Attempt to read the file using:
using var fileReader = await ParquetReader.CreateAsync(fileStream);
var rowGroup = await fileReader.ReadEntireRowGroupAsync(0);
foreach (var column in rowGroup)
{
    // ...get data using column.Data
}
The following exception is thrown:
sourceIndex ('-2147483520') must be greater than or equal to '0'. (Parameter 'sourceIndex')
Actual value was -2147483520
-2147483520 is 0x80000080 so it looks like sourceIndex is an Int32 that has wrapped around.
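The arithmetic can be checked in isolation. The sketch below does not touch the library; it only assumes, as the issue suspects but has not confirmed, that an offset was accumulated in a 32-bit signed integer. `WrapAroundDemo` and `ToWrappedInt32` are hypothetical names used for illustration.

```csharp
using System;

public static class WrapAroundDemo
{
    // Reinterprets a 64-bit offset as Int32, truncating the way unchecked arithmetic does.
    public static int ToWrappedInt32(long offset) => unchecked((int)offset);

    public static void Main()
    {
        long offset = 0x80000080L; // 2,147,483,776, which is 128 past int.MaxValue + 1

        // Unchecked conversion wraps silently to a negative value.
        Console.WriteLine(ToWrappedInt32(offset)); // -2147483520

        // A checked conversion surfaces the overflow instead of wrapping.
        try
        {
            _ = checked((int)offset);
        }
        catch (OverflowException)
        {
            Console.WriteLine("overflow detected");
        }
    }
}
```

The wrapped value matches the `sourceIndex` in the exception message, which is consistent with an index or byte offset crossing the 2 GiB boundary.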
It's possible to work around this issue by saving the file in multiple rowgroups, so the assumption seems to be that no single rowgroup will be larger than 0x80000000 bytes(?).
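One way to apply the workaround is to plan the rowgroup boundaries before writing. The following is a minimal sketch that only computes the row ranges; `RowGroupPlanner`, `Plan`, and the 5M rows-per-group figure are hypothetical choices for illustration, not part of the library, and the actual write calls are deliberately left out.

```csharp
using System;
using System.Collections.Generic;

public static class RowGroupPlanner
{
    // Splits totalRows into consecutive (Start, Count) ranges of at most maxRowsPerGroup rows.
    // Because this is an iterator, the argument checks run when enumeration starts.
    public static IEnumerable<(long Start, int Count)> Plan(long totalRows, int maxRowsPerGroup)
    {
        if (totalRows < 0) throw new ArgumentOutOfRangeException(nameof(totalRows));
        if (maxRowsPerGroup <= 0) throw new ArgumentOutOfRangeException(nameof(maxRowsPerGroup));

        for (long start = 0; start < totalRows; start += maxRowsPerGroup)
        {
            // The last range may be shorter than maxRowsPerGroup.
            int count = (int)Math.Min(maxRowsPerGroup, totalRows - start);
            yield return (start, count);
        }
    }

    public static void Main()
    {
        // 50M rows split into groups of 5M rows (an arbitrary, assumed size).
        var groups = new List<(long Start, int Count)>(Plan(50_000_000, 5_000_000));
        Console.WriteLine(groups.Count); // 10
    }
}
```

Each range would then be written as its own rowgroup. A suitable group size depends on column count and value widths, so the figure above should be treated as a starting point rather than a recommendation.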
If the reader is going to keep this assumption, it would be useful for the Write function to throw when asked to write a rowgroup that is too large, so that users do not accidentally build up a library of unreadable files!
Failing test