Currently, when path is an existing but empty directory, you get:
>>> dataset = pq.ParquetDataset(path)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-16-346f72ae525e> in <module>
----> 1 dataset = pq.ParquetDataset(path)
~/scipy/repos/arrow/python/pyarrow/parquet.py in __init__(self, path_or_paths, filesystem, schema, metadata, split_row_groups, validate_schema, filters, metadata_nthreads, memory_map)
989
990 if validate_schema:
--> 991 self.validate_schemas()
992
993 if filters is not None:
~/scipy/repos/arrow/python/pyarrow/parquet.py in validate_schemas(self)
1025 self.schema = self.common_metadata.schema
1026 else:
-> 1027 self.schema = self.pieces[0].get_metadata().schema
1028 elif self.schema is None:
1029 self.schema = self.metadata.schema
IndexError: list index out of range
That could be a nicer error message.
Unless we actually want to allow this? I am not sure there are good use cases for supporting empty directories, though, since we cannot get any schema or metadata information from an empty directory.
It only fails when validating the schemas: with validate_schema=False it actually returns a ParquetDataset object, just with an empty list for pieces and no schema. So it would be easy to also not raise during schema validation for this empty-directory case.
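To make the two options concrete, here is a rough, standard-library-only sketch of the guard that could sit in front of the pieces[0] lookup. It is not the pyarrow implementation: EmptyDatasetError, first_piece_or_error, list_pieces and the allow_empty flag are all hypothetical names made up for illustration, and the file discovery is deliberately simplified.

```python
import os
import tempfile


class EmptyDatasetError(ValueError):
    """Hypothetical error type for illustration; not part of pyarrow."""


def first_piece_or_error(pieces, path, allow_empty=False):
    """Stand-in for the `self.pieces[0]` lookup done during schema validation.

    Returns the first piece, or handles the empty case explicitly instead
    of letting an IndexError escape.
    """
    if not pieces:
        if allow_empty:
            # Option 2: tolerate the empty directory; the schema stays unset.
            return None
        # Option 1: fail, but with a message that names the actual problem.
        raise EmptyDatasetError(
            "No Parquet files found in {!r}; cannot infer a schema "
            "from an empty directory".format(path)
        )
    return pieces[0]


def list_pieces(path):
    # Simplified discovery (assumption): only looks at *.parquet files
    # directly inside `path`, ignoring partition subdirectories.
    return sorted(
        os.path.join(path, name)
        for name in os.listdir(path)
        if name.endswith(".parquet")
    )


with tempfile.TemporaryDirectory() as empty_dir:
    pieces = list_pieces(empty_dir)

    # Strict behaviour: a descriptive error instead of "list index out of range".
    try:
        first_piece_or_error(pieces, empty_dir)
    except EmptyDatasetError as exc:
        message = str(exc)

    # Lenient behaviour: no error, and no piece to take a schema from.
    lenient = first_piece_or_error(pieces, empty_dir, allow_empty=True)
```

Subclassing ValueError here is only one possible choice; whichever behaviour is preferred, the point is that the empty case is handled explicitly before indexing into the list.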
Reporter: Joris Van den Bossche / @jorisvandenbossche
Assignee: Joris Van den Bossche / @jorisvandenbossche
Related issues:
Note: This issue was originally created as ARROW-5310. Please see the migration documentation for further details.