
[R] [C++] Allow me to write_parquet() from an arrow_dplyr_query  #29992

Description

@asfimport

Right now, I can:

ds <- open_dataset("some.parquet")
ds %>% 
  mutate(
    o_orderdate = cast(o_orderdate, date32())  
  ) %>% 
  write_dataset(path = "new.parquet")

but I can't:

tab <- read_parquet("some.parquet", as_data_frame = FALSE)
tab %>% 
  mutate(
    o_orderdate = cast(o_orderdate, date32())  
  ) %>% 
  write_parquet("new.parquet")

In this case, I can cast the column as a separate command and then call write_parquet() afterwards, but it would be nice to be able to use write_parquet() in a pipeline.
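One way to keep a single pipeline today is to materialize the query with `compute()` before handing it to `write_parquet()`. This is a minimal sketch that assumes a recent arrow R package, where `compute()` on an `arrow_dplyr_query` returns an Arrow Table. The temporary paths `src`/`dst` and the sample timestamp data are hypothetical stand-ins for `some.parquet`. `compute()` still builds the full Table in memory, which is the cost a RecordBatchReader-based writer would avoid.

```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical input file standing in for "some.parquet"
src <- tempfile(fileext = ".parquet")
dst <- tempfile(fileext = ".parquet")
write_parquet(
  data.frame(
    o_orderdate = as.POSIXct(
      c("2021-10-01 12:00:00", "2021-10-02 08:30:00"),
      tz = "UTC"
    )
  ),
  src
)

tab <- read_parquet(src, as_data_frame = FALSE)
tab %>%
  mutate(o_orderdate = cast(o_orderdate, date32())) %>%
  compute() %>%          # evaluate the arrow_dplyr_query into an in-memory Arrow Table
  write_parquet(dst)     # write_parquet() accepts the materialized Table

# Read back as a data.frame to check the cast column
result <- read_parquet(dst)
```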

This will require an addition to libarrow: another version of WriteParquet that takes a RecordBatchReader instead of a fully instantiated Table.

Reporter: Jonathan Keane / @jonkeane

Related issues:

Note: This issue was originally created as ARROW-14428. Please see the migration documentation for further details.
