Use data.table for grouped sort in `pw_info()` #300

jdblischak · 2023-12-19T16:32:55Z

The profiling indicated that dplyr::group_by() and dplyr::arrange() were two of the biggest bottlenecks.

I confirmed these were the ones in pw_info():

Lines 207 to 210 in 6d4371a

    
    ans <- ans %>%
      group_by(time, stratum) %>%
      arrange(t, .by_group = TRUE) %>%
      ungroup()

In my previous PR #295, I had put off converting this to data.table. Once I remembered that the grouped arrange() (with .by_group = TRUE) also sorts the rows by the grouping columns, I was able to figure it out. Here is example code showing that the two approaches give identical results:

set.seed(1)
d <- data.frame(
  x = sample(letters[1:3], size = 100, replace = TRUE),
  y = sample(letters[4:5], size = 100, replace = TRUE),
  z = sample(1:5, size = 100, replace = TRUE)
)

library("dplyr")
x1 <- d %>% group_by(x, y) %>% arrange(z, .by_group = TRUE) %>% ungroup()
x1 <- as.data.frame(x1)

library("data.table")
x2 <- as.data.table(d)
setorder(x2, x, y)
x2 <- x2[order(z), .SD, by = .(x, y)]
setDF(x2)

all.equal(x1, x2)
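Since the grouped arrange() amounts to sorting by the grouping columns and then by z, a single multi-key setorder() call should produce the same row order. The following is only a sketch of that alternative, not the code used in this PR; the names x3 and expected are illustrative, and the base R order() call is there purely as an independent check.

```r
library("data.table")

set.seed(1)
d <- data.frame(
  x = sample(letters[1:3], size = 100, replace = TRUE),
  y = sample(letters[4:5], size = 100, replace = TRUE),
  z = sample(1:5, size = 100, replace = TRUE)
)

# Illustrative alternative: sort by the grouping columns, then by z,
# in one call. setorder() reorders the data.table by reference.
x3 <- as.data.table(d)
setorder(x3, x, y, z)
setDF(x3)

# Independent check with base R: order by the same three keys
expected <- d[order(d$x, d$y, d$z), ]
rownames(expected) <- NULL

all.equal(x3, expected)
```

Whether this is preferable to the two-step version depends on how the surrounding code uses the grouping, so treat it as a starting point for comparison rather than a drop-in replacement.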

In repeated benchmarking of gs_power_ahr(), this update reduces the runtime by 70-140 ms.

I also removed an isolated dplyr::transmute() from gs_info_rd() and updated the documentation to reflect that a data frame is now returned.

jdblischak · 2023-12-20T18:58:06Z

xref: #219

keaven

Thanks for the update

Use data.table for grouped sort in pw_info()

c926182

jdblischak requested review from keaven and nanxstats December 19, 2023 16:32

jdblischak self-assigned this Dec 19, 2023

keaven approved these changes Dec 20, 2023


nanxstats merged commit 9ef6050 into Merck:main Dec 22, 2023

jdblischak deleted the replace-grouped-sort branch December 22, 2023 16:22
