ARROW-14942: [R] Bindings for lubridate's dpicoseconds, dnanoseconds, dseconds, dmilliseconds, dmicroseconds #12855

Closed
AlenkaF wants to merge 23 commits into apache:master from AlenkaF:ARROW-14942

AlenkaF commented Apr 11, 2022

Member

This PR adds bindings for lubridate's dseconds, dmilliseconds, dmicroseconds and dnanoseconds.

Because Arrow's duration type is integer-backed and has no picosecond unit, a call to dpicoseconds() raises a warning.

amol- commented Apr 13, 2022

Member

@dragosmg mind reviewing this one?

Comment thread r/R/dplyr-funcs-datetime.R Outdated
Comment thread r/R/dplyr-funcs-datetime.R Outdated
Comment thread r/R/dplyr-funcs-datetime.R Outdated
@dragosmg

Contributor

I've been thinking a bit about this. Do you think it's worth having a helper function (to avoid all the repetition), something like make_duration(x, unit)?
Where:

make_duration <- function(x, unit) {
  x <- build_expr("cast", x, options = cast_options(to_type = int64()))
  x$cast(duration(unit))
}

AlenkaF commented Apr 14, 2022

Member Author

Sure, makes sense 👍 Will do.

Comment thread r/R/dplyr-funcs-datetime.R Outdated

dragosmg left a comment

Contributor

Great work. Many thanks. @thisisnic @jonkeane would you mind taking a look and merging the PR?

thisisnic self-requested a review April 14, 2022 16:56

jonkeane left a comment

Member

This looks great. I have one substantive comment about test additions, one small suggestion, and a comment.

Comment thread r/tests/testthat/test-dplyr-funcs-datetime.R Outdated
Comment thread r/R/dplyr-funcs-datetime.R Outdated

Member

Should we also test what happens when we pass floats here too?

> lubridate::dseconds(1.5)
[1] "1.5s"

lubridate accepts this, so we should make sure we can handle it too (or error helpfully if we can't for some reason).

AlenkaF Apr 15, 2022

Member Author

Of course, thanks for this!
Will search for discussions Dragos already had about casting float -> duration, then test and see =)

Member Author

Since the duration type in Arrow is int64 and we can't pass floats here, I will go with erroring helpfully. I will add it in the next commit.

Contributor

ARROW-16253 might be relevant here too.

Member Author

I added a test for the error raised when the argument, multiplied by the duration helper's multiplication factor, is a float. I went with the easier solution: I did not force evaluation to check the argument's type, and I did not try to catch the C++ error.
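For reference, the alternative that was not taken (checking the value up front) could look roughly like the sketch below for plain R numbers. `check_whole_duration()` is a hypothetical helper, not part of arrow, and it assumes the argument is an already-evaluated numeric rather than an Arrow expression, which is the case the binding deliberately avoids handling.

```r
# Hypothetical helper (not in arrow): validate a plain numeric before it
# would be cast to an int64-backed duration.
check_whole_duration <- function(x, value, unit) {
  scaled <- x * value
  # A value is "whole" if truncating it changes nothing.
  if (any(scaled != trunc(scaled), na.rm = TRUE)) {
    stop(
      "Duration in unit '", unit, "' must be a whole number, got ",
      paste(format(scaled), collapse = ", "),
      call. = FALSE
    )
  }
  scaled
}

# 1.5 minutes is 90 seconds, so a fractional argument can still be valid.
check_whole_duration(1.5, 60, "s")

# 1.12345 minutes is not a whole number of seconds, so this errors.
msg <- tryCatch(
  check_whole_duration(1.12345, 60, "s"),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Note that the check applies to the scaled value, not the raw argument, which matches the wording of the test description above.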

AlenkaF commented Apr 21, 2022

Member Author

@jonkeane I tried to address all the comments and I think the PR is ready for another review.

AlenkaF commented Apr 21, 2022

Member Author

The errors do not look related to this PR.

jonkeane left a comment

Member

This is fantastic, thank you so much for the work on this.

I have one small question about possibly adding a comment — let me know if you want to add that and I'll wait to merge

Comment on lines +373 to +387
duration_helpers_map_factory <- function(value, unit) {
  force(value)
  force(unit)
  function(x = 1) make_duration(x * value, unit)
}

for (name in names(.helpers_function_map)) {
  register_binding(
    name,
    duration_helpers_map_factory(
      .helpers_function_map[[name]][[1]],
      .helpers_function_map[[name]][[2]]
    )
  )
}

Member

Nice! This is actually even shorter than I thought it would be!
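The `force()` calls in the factory above matter because R evaluates function arguments lazily: without them, each closure would look up `name` only when first called, after the loop has finished. The sketch below shows the difference in base R. `helpers_map` and `fake_make_duration()` are hypothetical stand-ins, since the real `.helpers_function_map` and `make_duration()` live in r/R/dplyr-funcs-datetime.R and are not reproduced here.

```r
# Hypothetical stand-ins for the real map and make_duration().
helpers_map <- list(
  dminutes = list(60, "s"),
  dmilliseconds = list(1, "ms")
)
fake_make_duration <- function(x, unit) list(value = x, unit = unit)

# Without force(): value and unit stay unevaluated promises.
lazy_factory <- function(value, unit) {
  function(x = 1) fake_make_duration(x * value, unit)
}

# With force(): value and unit are evaluated when the factory is called.
forced_factory <- function(value, unit) {
  force(value)
  force(unit)
  function(x = 1) fake_make_duration(x * value, unit)
}

lazy <- list()
forced <- list()
for (name in names(helpers_map)) {
  lazy[[name]] <- lazy_factory(
    helpers_map[[name]][[1]], helpers_map[[name]][[2]]
  )
  forced[[name]] <- forced_factory(
    helpers_map[[name]][[1]], helpers_map[[name]][[2]]
  )
}

# The forced closure keeps its own factor and unit.
forced$dminutes(2)

# The lazy closure evaluates its promises only now, when `name` already
# holds the last loop value, so it picks up the dmilliseconds entry.
lazy$dminutes(2)
```

Here `forced$dminutes(2)` carries the value 120 with unit "s", while `lazy$dminutes(2)` ends up with the value 2 and unit "ms".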

Comment on lines +1307 to +1314
# double -> duration not supported in Arrow.
# Error is generated in the C++ code
expect_error(
  test_df %>%
    arrow_table() %>%
    mutate(r_obj_dminutes = dminutes(1.12345)) %>%
    collect()
)

Member

Thanks for this comment about why we use expect_error() without asserting on the error message (since the error comes from C++). 💯

) %>%
collect(),
example_d,
ignore_attr = TRUE

Member

I didn't see this in the PR (though I might have missed something): what attr are we ignoring? Maybe we should add a comment about what we're using that for.

Member Author

I will add a comment, so please hold off on merging. To be honest, I first have to remember why it is there =) I will look into it tomorrow morning and add the comment then.

Thank you for the review!

Member Author

We are using ignore_attr = TRUE because the two results differ in the attributes package, units and class (difftime vs Duration). I added a comment about it at the beginning of both tests.

AlenkaF commented Apr 22, 2022

Member Author

Errors do not seem to be related to this PR.

AlenkaF requested a review from jonkeane April 22, 2022 06:10
jonkeane closed this in c4b646e Apr 22, 2022
AlenkaF deleted the ARROW-14942 branch April 22, 2022 14:45

ursabot commented Apr 25, 2022

Benchmark runs are scheduled for baseline = 0ce8ce8 and contender = c4b646e. c4b646e is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed] test-mac-arm
[Failed ⬇️1.13% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.25% ⬆️0.08%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/586| c4b646e7 ec2-t3-xlarge-us-east-2>
[Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/574| c4b646e7 test-mac-arm>
[Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/572| c4b646e7 ursa-i9-9960x>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/584| c4b646e7 ursa-thinkcentre-m75q>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/585| 0ce8ce8b ec2-t3-xlarge-us-east-2>
[Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/573| 0ce8ce8b test-mac-arm>
[Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/571| 0ce8ce8b ursa-i9-9960x>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/583| 0ce8ce8b ursa-thinkcentre-m75q>
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

ursabot commented Apr 26, 2022

['Python', 'R'] benchmarks have high level of regressions.
ursa-i9-9960x
