GH-36593: [Python] Add rename_columns method to pyarrow datasets #48289
JonatanMartens wants to merge 8 commits into apache:main
raulcd left a comment
I don't think the examples were bad; you just have to add the imports for the doctests to work:
>>> import pyarrow.dataset as ds
as seen here:
arrow/python/pyarrow/_dataset.pyx
Lines 371 to 391 in b2e8f25
I am not sure about the changes in this PR, mainly because I am not very knowledgeable when it comes to Acero and datasets. The functionality seems great to have, but modifying What do you think @rok?
The change looks good to me in principle.
Rationale for this change
See #36593
In particular this change is convenient when the column names stored in a file are different from the logical names associated with the columns (see deltalake column mapping feature as an example).
What changes are included in this PR?
Adds the rename_columns method to datasets in pyarrow. This method lets a user rename the columns in the data returned from a scan before actually creating a scanner object.
Are these changes tested?
This PR also adds a test for the new rename_columns method using an InMemoryDataset.
Are there any user-facing changes?
Adds the rename_columns method to pyarrow datasets.