feature/DAT-1729#17
Conversation
2f9ed25 to
31bbc5f
Compare
… operations and error handling
michalhuryn-montrose
left a comment
There was a problem hiding this comment.
@parmatys please respond to following findings 😄
| remaining <- partial$input[to_redo] | ||
| args <- list(...) | ||
| args[[input_arg]] <- remaining | ||
|
|
||
| new_rows <- do.call(fn, args) |
There was a problem hiding this comment.
When a resumed run fast-fails, $partial only has the rows we just retried, not the originals. So if someone wants to fix-and-resume in a loop, they can't just keep passing $partial back in, they'd quietly lose the rows that already succeeded the first time. Could we mention that in the README resume section (and ?resume_bulk) with the splice one-liner?
There was a problem hiding this comment.
Behavior fix: when the nested bulk call fast-fails, we now merge that inner partial into the outer tibble and re-throw databraryr_bulk_error with partial set to that full tibble, so repeated e$partial in a loop no longer drops rows that already succeeded
| preflight_session_duplicates <- function(state, vol_id, session_id, vb, rq) { | ||
| filenames <- basename(state$input) | ||
| dupes <- check_duplicate_files_in_session( | ||
| vol_id = vol_id, | ||
| session_id = session_id, | ||
| filenames = filenames, | ||
| vb = vb, | ||
| rq = rq | ||
| ) | ||
| if (is.null(dupes)) { | ||
| return(state) | ||
| } | ||
|
|
||
| exists_lookup <- stats::setNames(dupes$exists, dupes$filename) | ||
| is_dupe <- !is.na(exists_lookup[filenames]) & | ||
| unname(exists_lookup[filenames]) | ||
| is_dupe[is.na(is_dupe)] <- FALSE | ||
|
|
||
| state$status[is_dupe] <- "skipped" | ||
| state$reason[is_dupe] <- "duplicate" | ||
| state | ||
| } | ||
|
|
||
| # Internal: duplicate filenames in folder preflight (mirrors session). | ||
| #' @noRd | ||
| preflight_folder_duplicates <- function(state, vol_id, folder_id, vb, rq) { | ||
| filenames <- basename(state$input) | ||
| dupes <- check_duplicate_files_in_folder( | ||
| vol_id = vol_id, | ||
| folder_id = folder_id, | ||
| filenames = filenames, | ||
| vb = vb, | ||
| rq = rq | ||
| ) | ||
| if (is.null(dupes)) { | ||
| return(state) | ||
| } | ||
|
|
||
| exists_lookup <- stats::setNames(dupes$exists, dupes$filename) | ||
| is_dupe <- !is.na(exists_lookup[filenames]) & | ||
| unname(exists_lookup[filenames]) | ||
| is_dupe[is.na(is_dupe)] <- FALSE | ||
|
|
||
| state$status[is_dupe] <- "skipped" | ||
| state$reason[is_dupe] <- "duplicate" | ||
| state | ||
| } |
There was a problem hiding this comment.
Aside from the call to check_duplicate_files_in_session vs check_duplicate_files_in_folder and the id arg name, these two are line-for-line identical. If we tweak the "duplicate detection" semantics later, say to also stash the existing asset id in reason, we'd have to remember to change both. Could we collapse them into a single preflight_duplicates(state, checker, ...) that takes the check function as an argument and have bulk_files.R pick the right checker, what do you think?
There was a problem hiding this comment.
Collapsed the two helpers into preflight_duplicates(state, checker, ...) and wired bulk_upload_files to pass the appropriate checker
…rations; refactor duplicate checking functions
|
LGMT |
merge after 1725 and 1728,
needed decision on scaffold_session()