Skip to content

Conversation

@jdblischak
Copy link
Collaborator

This is a follow-up to #427:

  • I converted the dplyr merges to data.table (https://github.com/Merck/gsDesign2/pull/427/files#r1653205459). The column order was slightly different when merging with data.table (the column t was the first column). I wasn't sure if this mattered, but just in case, I enforced the identical column order
  • I did my best to update the documentation of the object returned by pw_info(). Since the column returned is hr and not ahr , I also removed "average" from its description. Please let me know if I should add it back. Also, if someone can provide brief descriptions of the columns stratum and t, I could also add these
  • I renamed the test file for pw_info() to follow the naming convention (https://github.com/Merck/gsDesign2/pull/427/files#r1653206548)

@jdblischak jdblischak self-assigned this Jun 26, 2024
@jdblischak
Copy link
Collaborator Author

I also added the new NSE variables in pw_info() to the list of global variables in globals.R

Copy link
Collaborator

@LittleBeannie LittleBeannie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @jdblischak ! Nice to learn the left join of data.table can be realized by merge.

I did a minor commit by ordering the column n and event together, since they are usually reported jointly.

@LittleBeannie LittleBeannie merged commit 3792965 into Merck:main Jun 27, 2024
@jdblischak
Copy link
Collaborator Author

Nice to learn the left join of data.table can be realized by merge.

@LittleBeannie A quick follow-up. I realized that merge.data.table performs an inner merge by default (only keeps rows that match between x and y). To perform a left join (keep all rows of x regardless of whether they have a match in y), I would need to add the argument all.x = TRUE. Is this required? Will there ever be the case where there will be values in ans that do not have a matching row in tbl_time and tbl_n?

@jdblischak jdblischak deleted the follow-up-427 branch June 27, 2024 17:42
@LittleBeannie
Copy link
Collaborator

Nice to learn the left join of data.table can be realized by merge.

@LittleBeannie A quick follow-up. I realized that merge.data.table performs an inner merge by default (only keeps rows that match between x and y). To perform a left join (keep all rows of x regardless of whether they have a match in y), I would need to add the argument all.x = TRUE. Is this required? Will there ever be the case where there will be values in ans that do not have a matching row in tbl_time and tbl_n?

Hi @jdblischak , there should NOT be a case where there will be values in ans that do not have a matching row in tbl_time and tbl_n. The table ans is commonly a long table, while the tbl_time and tbl_n are commonly short. I plan to keep all the records of ans through left join.

@jdblischak
Copy link
Collaborator Author

jdblischak commented Jun 27, 2024

there should NOT be a case where there will be values in ans that do not have a matching row in tbl_time and tbl_n. The table ans is commonly a long table, while the tbl_time and tbl_n are commonly short. I plan to keep all the records of ans through left join.

I feel like we might still not be 100% on the same page. The default behavior of merge() is to perform an inner join, not a left join. However, given your description above, I don't think performing an inner join is a problem.

If all the records of ans are present in tbl_time and tbl_n, then an inner join is equivalent to a left join. They would only differ if there was a record in ans that wasn't present in the other tables. In that case, an inner join would drop that row, whereas a left join would keep the row while inserting NAs into the new columns.

Here is a quick demo to highlight the difference between inner and left joins:

library("dplyr")

x <- data.frame(key = rep(1:3, times = 3), x = rnorm(9))
y <- data.frame(key = 1:3, y = letters[1:3])

# Inner and left joins are equivalent when all records in x have a match in y
identical(inner_join(x, y), left_join(x, y))
## [1] TRUE

# Inner and left joins differ when record(s) in x are missing from y
x <- rbind(data.frame(key = 4, x = rnorm(1)), x)
identical(inner_join(x, y), left_join(x, y))
## [1] FALSE

Thanks for the clear explanation. I double checked with ans, tbl_n, tbl_time and I could not come up with case when ans has records not covered in tbl_n or tbl_time. So I guess our current code is good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants