Right now, I can:
```r
library(arrow)
library(dplyr)

ds <- open_dataset("some.parquet")
ds %>%
  mutate(
    o_orderdate = cast(o_orderdate, date32())
  ) %>%
  write_dataset(path = "new.parquet")
```
but I can't do the same starting from a Table:
```r
tab <- read_parquet("some.parquet", as_data_frame = FALSE)
# Not currently supported: write_parquet() expects a fully-instantiated Table
tab %>%
  mutate(
    o_orderdate = cast(o_orderdate, date32())
  ) %>%
  write_parquet("new.parquet")
```
In this case, I can cast the column as a separate command and then call write_parquet() afterwards, but it would be nice to be able to use write_parquet() at the end of a pipeline.
This will require a libarrow addition: an extension to, or another version of, WriteParquet that takes a RecordBatchReader instead of a fully-instantiated Table.
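A minimal sketch of the current two-step workaround, assuming the arrow R package's dplyr interface, where `compute()` evaluates the query into an Arrow Table that write_parquet() can then accept. The temporary file paths and the integer `o_orderdate` column (days since the epoch) are hypothetical stand-ins for `some.parquet`.

```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical input: a small Parquet file with an int32 column of days since 1970-01-01
src <- tempfile(fileext = ".parquet")
dst <- tempfile(fileext = ".parquet")
write_parquet(data.frame(o_orderdate = c(18628L, 18629L)), src)

tab <- read_parquet(src, as_data_frame = FALSE)

# Step 1: evaluate the query and materialise the result as an Arrow Table
casted <- tab %>%
  mutate(o_orderdate = cast(o_orderdate, date32())) %>%
  compute()

# Step 2: write the materialised Table in a separate call
write_parquet(casted, dst)
```

The cost of this approach is that the whole result is held in memory before writing, which is what a RecordBatchReader-based writer would avoid.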
Reporter: Jonathan Keane / @jonkeane
Related issues:
Note: This issue was originally created as ARROW-14428. Please see the migration documentation for further details.