ARROW-11787: [R] Implement write csv #10141
#' @docType class
#' @usage NULL
#' @format NULL
#' @description `CsvReadOptions`, `CsvParseOptions`, `CsvConvertOptions`,
This description doesn't look quite correct.
An alternative to documenting this here (and cleaning up the bad copy-paste) would be to document it with CsvReadOptions et al.
nealrichardson left a comment
Nicely done. Some suggestions/leading questions.
#' }
#' @include arrow-package.R
write_csv_arrow <- function(x,
sink,
Looks like indentation is slightly off here
assert_that(length(include_header) == 1)
assert_that(is.logical(include_header))
What happens if you remove these--will the C++ static typing validate this enough?
What happens if include_header = NA?
I removed those, since the C++ layer already gives perfectly sensible errors, as you say. With the assert_that calls removed, include_header = NA results in no header being written.
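For context on the NA question, here is a minimal base R sketch showing that the two removed assertions would not have rejected `NA` in the first place, because `NA` is a length-one logical value. The `is_flag()` helper is hypothetical and not part of this PR; it only illustrates what a stricter check could look like if one were wanted.

```r
# The two conditions from the removed assert_that() calls, applied to NA.
include_header <- NA

passes_length_check  <- length(include_header) == 1   # NA has length 1
passes_logical_check <- is.logical(include_header)    # NA is of type logical

print(passes_length_check)
print(passes_logical_check)

# Hypothetical stricter check (an assumption, not code from the PR):
# accept only a single, non-missing logical value.
is_flag <- function(x) {
  is.logical(x) && length(x) == 1 && !is.na(x)
}

print(is_flag(NA))
print(is_flag(TRUE))
```

So whether or not the R-level assertions stay, `include_header = NA` reaches the C++ layer unless a check like the one above is added.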
write_options = CsvWriteOptions$create(include_header, batch_size)
Suggested change:
- write_options = CsvWriteOptions$create(include_header, batch_size)
+ write_options <- CsvWriteOptions$create(include_header, batch_size)
x <- Table$create(x)
}

assert_is(x, c("Table", "RecordBatch"))
Suggested change:
- assert_is(x, c("Table", "RecordBatch"))
+ assert_is(x, "ArrowTabular")
})
You should add tests for handling bad inputs too. Also might make more sense to put the writing tests at the bottom of the test file instead of the top.
expect_identical(tbl_in, tbl_expected)

skip("Doesn't yet work with date columns due to ARROW-12540")
I don't think you need to test the file-with-dates in every combination of parameters; just the first one is sufficient.
expect_identical(tbl_in, tbl_expected)
})

test_that("Write a CSV file with different batch sizes", {
What is this testing? What does batch_size do? It doesn't look like there is an observable difference in the output.
Batch size dictates how much data is buffered when translating to CSV.
So the output will be the same, but what's happening internally will be different. I included it as I wanted to make sure I could pass through the param, but I guess it's C++ functionality. Should I remove the tests for the different batch sizes and just make sure I can pass through the param once?
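A rough sketch of the "pass the param through once" idea: write the same data with two batch sizes and compare the resulting files. It assumes the arrow package is installed and that `write_csv_arrow()` accepts a file path as `sink` along with the `batch_size` argument discussed in this PR; the data frame and file names are made up for illustration.

```r
# Sketch only: runs the comparison when the arrow package is available.
if (requireNamespace("arrow", quietly = TRUE)) {
  # Illustrative data, not taken from the PR's test file.
  df <- data.frame(
    x = 1:10,
    y = letters[1:10],
    stringsAsFactors = FALSE
  )

  small_batches <- tempfile(fileext = ".csv")
  large_batches <- tempfile(fileext = ".csv")

  # Same data, different internal buffering.
  arrow::write_csv_arrow(df, small_batches, batch_size = 1L)
  arrow::write_csv_arrow(df, large_batches, batch_size = 1024L)

  # batch_size should only affect buffering, so the files are expected to match.
  same_output <- identical(readLines(small_batches), readLines(large_batches))
  print(same_output)
}
```

A single check like this confirms the argument reaches the writer without repeating the full round-trip test for every batch size.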
nealrichardson left a comment
Looks good! Let's try moving the validation like this, and assuming the tests pass, I'll merge (or someone else can).