
[C++][Python][Parquet] We should allow for overriding to types by providing a schema #27249

Description

@asfimport

The following should not throw:

>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> import pyarrow.dataset as ds
>>> pa.__version__
'2.0.0'
>>> schema = pa.schema([pa.field("utf8", pa.utf8())])
>>> table = pa.Table.from_pydict({"utf8": ["foo", "bar"]}, schema)
>>> pq.write_table(table, "/tmp/example.parquet")
>>> large_schema = pa.schema([pa.field("utf8", pa.large_utf8())])
>>> ds.dataset("/tmp/example.parquet", schema=large_schema,
...            format="parquet").to_table()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/_dataset.pyx", line 405, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2262, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 107, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: fields had matching names but differing types. From: utf8: string To: utf8: large_string

Reporter: Micah Kornfield / @emkornfield

Related issues:

Note: This issue was originally created as ARROW-11353. Please see the migration documentation for further details.
