Skip to content

Conversation

@jdblischak
Copy link
Collaborator

The profiling indicated that dplyr::group_by() and dplyr::arrange() were two of the biggest bottlenecks:

image

I confirmed these were the ones in pw_info():

gsDesign2/R/pw_info.R

Lines 207 to 210 in 6d4371a

ans <- ans %>%
group_by(time, stratum) %>%
arrange(t, .by_group = TRUE) %>%
ungroup()

In my previous PR #295, I had delayed converting this to data.table. Once I remembered that group_by() also orders the columns, I was able to figure it out. Here is example code to show that the two approaches are identical:

set.seed(1)
d <- data.frame(
  x = sample(letters[1:3], size = 100, replace = TRUE),
  y = sample(letters[4:5], size = 100, replace = TRUE),
  z = sample(1:5, size = 100, replace = TRUE)
)

library("dplyr")
x1 <- d %>% group_by(x, y) %>% arrange(z, .by_group = TRUE) %>% ungroup()
x1 <- as.data.frame(x1)

library("data.table")
x2 <- as.data.table(d)
setorder(x2, x, y)
x2 <- x2[order(z), .SD, by = .(x, y)]
setDF(x2)

all.equal(x1, x2)

In repeated benchmarking of gs_power_ahr(), this update removes reduces the runtime by 70-140 ms

I also removed an isolated dplyr::transmute() from gs_info_rd() and updated the documentation to reflect that now a data frame is returned

@jdblischak jdblischak self-assigned this Dec 19, 2023
@jdblischak
Copy link
Collaborator Author

xref: #219

Copy link
Collaborator

@keaven keaven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update

@nanxstats nanxstats merged commit 9ef6050 into Merck:main Dec 22, 2023
@jdblischak jdblischak deleted the replace-grouped-sort branch December 22, 2023 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants