ARROW-14942: [R] Bindings for lubridate's dpicoseconds, dnanoseconds, desconds, dmilliseconds, dmicroseconds#12855
ARROW-14942: [R] Bindings for lubridate's dpicoseconds, dnanoseconds, desconds, dmilliseconds, dmicroseconds#12855AlenkaF wants to merge 23 commits into
Conversation
|
@dragosmg mind reviewing this one? |
|
I've been thinking a bit about this. Do you think it's worth having a helper function (to avoid all the repetition), something like make_duration <- function(x, unit) {
x <- build_expr("cast", x, options = cast_options(to_type = int64()))
x$cast(duration(unit))
} |
|
Sure, makes sense 👍 Will do. |
dragosmg
left a comment
There was a problem hiding this comment.
Great work. Many thanks. @thisisnic @jonkeane would you mind taking a look and merging the PR.
jonkeane
left a comment
There was a problem hiding this comment.
This looks great, I have one substantive comment about test additions, one small suggestion, and a comment.
There was a problem hiding this comment.
Should we also test what happens when we pass floats here too?
> lubridate::dseconds(1.5)
[1] "1.5s"
Seems to work, so we should ensure we can do that (or error helpfully if we can't for some reason)
There was a problem hiding this comment.
Of course, thanks for this!
Will search for discussions Dragos already had about casting float -> duration, then test and see =)
There was a problem hiding this comment.
As duration type in Arrow is int64 and we can't pass floats here I will go with erroring helpfully. Will add it in the next commit.
There was a problem hiding this comment.
I added a test for the error when the argument multiplied with the value of the multiplication factor of the duration helper function is float (went with easier solution - didn't go forward with forcing evaluation to check for type of an argument or try catching C++ error).
|
@jonkeane I tried to address all the comments and I think the PR is ready for another review. |
|
The errors do not look related ... |
jonkeane
left a comment
There was a problem hiding this comment.
This is fantastic, thank you so much for the work on this.
I have one small question about possibly adding a comment — let me know if you want to add that and I'll wait to merge
| duration_helpers_map_factory <- function(value, unit) { | ||
| force(value) | ||
| force(unit) | ||
| function(x = 1) make_duration(x * value, unit) | ||
| } | ||
|
|
||
| for (name in names(.helpers_function_map)) { | ||
| register_binding( | ||
| name, | ||
| duration_helpers_map_factory( | ||
| .helpers_function_map[[name]][[1]], | ||
| .helpers_function_map[[name]][[2]] | ||
| ) | ||
| ) | ||
| } |
There was a problem hiding this comment.
Nice! This is actually even shorter than I though it would be!
| # double -> duration not supported in Arrow. | ||
| # Error is generated in the C++ code | ||
| expect_error( | ||
| test_df %>% | ||
| arrow_table() %>% | ||
| mutate(r_obj_dminutes = dminutes(1.12345)) %>% | ||
| collect() | ||
| ) |
There was a problem hiding this comment.
Thanks for this comment about why we are expect_error() but not actually asserting it (since this is all C++). 💯
| ) %>% | ||
| collect(), | ||
| example_d, | ||
| ignore_attr = TRUE |
There was a problem hiding this comment.
I didn't see this in the PR (though might have missed something), what attr are we ignoring? Maybe we should add a comment about what we're using that for
There was a problem hiding this comment.
I will add a comment, you can wait with merging. But I have to remember, if I am honest =) Will do it tomorrow morning and add the comment then.
Thank you for the review!
There was a problem hiding this comment.
We are using ignore_attr = TRUE due to the diff in attributes package, units and class: (difftime vs Duration). I added a comment about it in the beginning of both tests.
|
Errors do not seem to be related to this PR. |
|
Benchmark runs are scheduled for baseline = 0ce8ce8 and contender = c4b646e. c4b646e is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
|
['Python', 'R'] benchmarks have high level of regressions. |
This PR adds bindings for lubridate's
dseconds,dmilliseconds,dmicrosecondsanddnanoseconds.As
picosecondsare not supported by duration in Arrow and duration is of integer type, the call topicoseconds()raises a warning.