Parquet crashing on date column #147

@maxigit

Describe the bug
I'm trying to load a Parquet file that contains a date column, and I get this error:

ghci> D.readParquet "/home/max/Stats/Parquet/0_debtor_trans.parquet" 
*** Exception: UNKNOWN field ID for TimestampType3
CallStack (from HasCallStack):
  error, called at src/DataFrame/IO/Parquet/Thrift.hs:1151:18 in dataframe-0.4.1.0-49JzeyNWMOsLQT4sWvaYDS:DataFrame.IO.Parquet.Thrift

To Reproduce
Unfortunately, the file comes from production data that I can't share, and I don't have time right now to anonymize it or extract a subset that reproduces the problem.

If it helps, the Parquet file was created by dumping a MySQL database using Python and DuckDB. The table schema is:

Field	Type	Null	Key	Default	Extra
trans_id	int(11)	NO	PRI	NULL	auto_increment
trans_no	int(11)	NO		0	
stock_id	char(20)	NO	MUL		
type	smallint(6)	NO	MUL	0	
loc_code	char(5)	NO			
tran_date	date	NO		0000-00-00	
person_id	int(11)	YES		NULL	
price	double	NO		0	
reference	char(40)	NO			
qty	double	NO		1	
discount_percent	double	NO		0	
standard_cost	double	NO		0	
visible	tinyint(1)	NO		1	

The issue seems to come from the tran_date column (type date), as every table with a date column crashes.
readParquet seems to expect a timestamp where a plain date (possibly stored as a string) is given.
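
For context, the Parquet format specification defines a DATE logical type that annotates an INT32 holding the number of days since the Unix epoch (1970-01-01), which is distinct from the TIMESTAMP logical type. I have not confirmed that the column in this file is actually written that way. The sketch below only illustrates what decoding such a value could look like; `parquetDateToDay` is a hypothetical helper of mine, not a function from dataframe, and it uses only the `time` package.

```haskell
import Data.Int (Int32)
import Data.Time.Calendar (Day, addDays, fromGregorian)

-- | Reference day for Parquet's DATE logical type.
unixEpoch :: Day
unixEpoch = fromGregorian 1970 1 1

-- | Hypothetical helper (not part of dataframe): interpret an INT32
-- as a count of days since the Unix epoch. Negative values are dates
-- before 1970-01-01.
parquetDateToDay :: Int32 -> Day
parquetDateToDay n = addDays (fromIntegral n) unixEpoch

main :: IO ()
main = mapM_ (print . parquetDateToDay) [0, 365, -1]
-- prints:
-- 1970-01-01
-- 1971-01-01
-- 1969-12-31
```

If the column really is a DATE, I would expect it to be read as a day value like this rather than going through the timestamp path.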
