[C++] The C++ API for writing datasets could be improved #30891

Description

@asfimport

I was working on write dataset testing in the C++ API today and ran into a number of things that were not very intuitive. All of these are abstracted away / hidden by the Python / R interfaces, so this really only applies to anyone using the C++ API directly.

  • If no partitioning is specified, the write will segfault. Instead it should use a default (no-op) partitioning.
  • The min_rows_per_group option should probably default to something higher than 0
  • It's not clear how to specify the format (you do it by creating a format, then setting the file write options, which sets the format privately)
  • There is no default for basename_template
  • There is no default for filesystem (should be local filesystem)
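
To make the requested behaviour concrete, here is a minimal, standalone sketch of "fill in whatever the caller left unset before writing". It does not use Arrow at all: `PartitioningSketch`, `WriteOptionsSketch`, `WithDefaults`, and `kAssumedMinRowsPerGroup` are hypothetical names invented for illustration, the field names only mirror the options mentioned above, and the fallback values (including 1024 rows) are assumptions rather than anything Arrow defines.

```cpp
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical stand-in for a partitioning object; NOT an Arrow class.
struct PartitioningSketch {
  std::string kind;  // e.g. "directory", "hive", or "none" (no-op)
};

// Illustrative value only; the issue does not say what the default should be.
constexpr std::uint64_t kAssumedMinRowsPerGroup = 1024;

// Hypothetical options struct whose fields mirror the ones listed above.
struct WriteOptionsSketch {
  std::shared_ptr<PartitioningSketch> partitioning;  // null unless the caller sets it
  std::string basename_template;                     // empty unless the caller sets it
  std::string filesystem;                            // empty unless the caller sets it
  std::uint64_t min_rows_per_group = kAssumedMinRowsPerGroup;  // non-zero default
};

// Return a copy of `opts` with every unset field replaced by a fallback,
// so a writer never dereferences a null partitioning or uses an empty template.
WriteOptionsSketch WithDefaults(WriteOptionsSketch opts) {
  if (!opts.partitioning) {
    // No-op partitioning: everything lands directly in the base directory.
    opts.partitioning =
        std::make_shared<PartitioningSketch>(PartitioningSketch{"none"});
  }
  if (opts.basename_template.empty()) {
    opts.basename_template = "part-{i}";  // assumed template text
  }
  if (opts.filesystem.empty()) {
    opts.filesystem = "local";  // stands in for a local filesystem object
  }
  return opts;
}
```

With something like this in place, calling `WithDefaults(WriteOptionsSketch{})` yields usable options, while any field the caller did set is left untouched. Whether such defaults belong in the options struct itself or in the write call is a design choice this sketch does not settle.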

Reporter: Weston Pace / @westonpace

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-15409. Please see the migration documentation for further details.
