ARROW-9665 (#7913) added, among other things, a head method for Dataset in R:

    std::shared_ptr<arrow::Table> dataset___Scanner__head(
        const std::shared_ptr<ds::Scanner>& scanner, int n) {
      // TODO: make this a full Slice with offset > 0
      std::vector<std::shared_ptr<arrow::RecordBatch>> batches;
      std::shared_ptr<arrow::RecordBatch> current_batch;

      for (auto st : ValueOrStop(scanner->Scan())) {
        for (auto b : ValueOrStop(ValueOrStop(st)->Execute())) {
          current_batch = ValueOrStop(b);
          batches.push_back(current_batch->Slice(0, n));
          n -= current_batch->num_rows();
          if (n < 0) break;
        }
        if (n < 0) break;
      }
      return ValueOrStop(arrow::Table::FromRecordBatches(std::move(batches)));
    }

It would be nice to move this into the C++ library and expose it on the Python side as well. Since it is already written in C++ on the R side, porting it should be relatively straightforward, I assume.
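
To make the logic to be ported concrete, here is a minimal, library-free sketch of the same "take the first n rows, then stop scanning" loop. It does not use the Arrow API: `Batch` and `HeadBatches` are hypothetical names invented for illustration, and a plain `std::vector<int>` stands in for `arrow::RecordBatch`. One deliberate difference from the R snippet: the sketch checks the remaining count before reading each batch, so it stops as soon as the count reaches zero rather than once it goes negative.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for arrow::RecordBatch: a batch is just a vector of rows.
using Batch = std::vector<int>;

// Collect the first `n` rows from `source`, batch by batch, stopping early.
// If `batches_read` is non-null, it receives the number of batches consumed,
// which shows that the scan does not continue past what is needed.
std::vector<Batch> HeadBatches(const std::vector<Batch>& source, std::size_t n,
                               std::size_t* batches_read = nullptr) {
  std::vector<Batch> out;
  std::size_t consumed = 0;
  for (const Batch& batch : source) {
    if (n == 0) break;  // enough rows collected: do not touch further batches
    ++consumed;
    // Take at most the remaining row count from this batch (like Slice(0, n)).
    const std::size_t take = std::min(n, batch.size());
    out.emplace_back(batch.begin(),
                     batch.begin() + static_cast<std::ptrdiff_t>(take));
    n -= take;
  }
  if (batches_read != nullptr) *batches_read = consumed;
  return out;
}
```

With three batches of three rows each, asking for four rows yields one full batch plus a one-row slice and reads only two batches. Asking for exactly three rows reads only the first batch.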
Reporter: Joris Van den Bossche / @jorisvandenbossche
Assignee: David Li / @lidavidm
PRs and other links:
Note: This issue was originally created as ARROW-9731. Please see the migration documentation for further details.
Source of the snippet above: arrow/r/src/dataset.cpp, lines 266 to 282 in 586c060.