
feature/DAT-1729#17

Merged
michalhuryn-montrose merged 3 commits into development from feature/DAT-1729 on May 19, 2026

@michalhuryn-montrose

Collaborator

Merge after 1725 and 1728. A decision on scaffold_session() is needed.

@parmatys force-pushed the feature/DAT-1729 branch from 2f9ed25 to 31bbc5f on May 18, 2026 10:33
@parmatys marked this pull request as ready for review on May 18, 2026 10:42

@michalhuryn-montrose left a comment

Collaborator Author

@parmatys, please respond to the following findings 😄

Comment thread R/bulk_resume.R Outdated
Comment on lines +67 to +71
remaining <- partial$input[to_redo]
args <- list(...)
args[[input_arg]] <- remaining

new_rows <- do.call(fn, args)

Collaborator Author

When a resumed run fast-fails, $partial only contains the rows we just retried, not the originals. So if someone wants to fix-and-resume in a loop, they can't simply keep passing $partial back in; they would silently lose the rows that had already succeeded on the first pass. Could we mention that in the README resume section (and in ?resume_bulk), together with the splice one-liner?
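A minimal sketch of one possible form of that splice, assuming the results table has one row per input and that the indices of the retried rows (analogous to to_redo in the snippet above) are still available. The data and the names full and partial are made up for illustration.

```r
# Illustrative results table from a first bulk run (made-up data).
full <- data.frame(
  input  = c("a.mp4", "b.mp4", "c.mp4", "d.mp4"),
  status = c("success", "failed", "failed", "success"),
  stringsAsFactors = FALSE
)
# Rows to retry, analogous to to_redo in R/bulk_resume.R.
to_redo <- which(full$status != "success")

# Illustrative $partial from a resumed run that fast-failed: it holds only
# the two retried rows, so passing it back in alone would drop a and d.
partial <- data.frame(
  input  = c("b.mp4", "c.mp4"),
  status = c("success", "failed"),
  stringsAsFactors = FALSE
)

# The splice: write the retried rows back into the full table by position.
full[to_redo, ] <- partial
```

After the splice, full again has all four inputs, and the next retry can be computed from it rather than from partial.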

Collaborator

Behavior fix: when the nested bulk call fast-fails, we now merge its inner partial into the outer tibble and re-throw databraryr_bulk_error with partial set to the full tibble. Repeatedly resuming from e$partial in a loop therefore no longer drops rows that had already succeeded.
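The catch-merge-rethrow pattern described above can be sketched in base R as follows. This is an illustration, not the merged code: bulk_error(), resume_rows() and fast_fail() are hypothetical names, and only the condition class databraryr_bulk_error and the partial field come from the thread.

```r
# Hypothetical constructor for a classed error that carries a results table.
# Only the class name comes from the thread; the fields are assumptions.
bulk_error <- function(message, partial) {
  structure(
    class = c("databraryr_bulk_error", "error", "condition"),
    list(message = message, call = NULL, partial = partial)
  )
}

# Hypothetical resume step: rerun only the rows at to_redo via retry_fn.
# Assumes retry_fn returns (or attaches to its error as $partial) one row per
# retried input, in order, with the same columns as outer.
resume_rows <- function(outer, to_redo, retry_fn) {
  tryCatch(
    {
      outer[to_redo, ] <- retry_fn(outer$input[to_redo])
      outer
    },
    databraryr_bulk_error = function(e) {
      # Merge the inner partial into a copy of the full table, then re-throw
      # so that e$partial always covers every original input.
      outer[to_redo, ] <- e$partial
      stop(bulk_error(conditionMessage(e), outer))
    }
  )
}

# Usage with made-up data: the retry succeeds on b.mp4, then fast-fails on c.mp4.
outer <- data.frame(
  input  = c("a.mp4", "b.mp4", "c.mp4", "d.mp4"),
  status = c("success", "failed", "failed", "success"),
  stringsAsFactors = FALSE
)
fast_fail <- function(inputs) {
  stop(bulk_error("fast-fail", data.frame(
    input = inputs, status = c("success", "failed"), stringsAsFactors = FALSE
  )))
}
err <- tryCatch(
  resume_rows(outer, which(outer$status == "failed"), fast_fail),
  databraryr_bulk_error = function(e) e
)
```

Here err$partial has all four rows, so it can be fed straight into the next resume attempt.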

Comment thread R/bulk_apply_internals.R Outdated
Comment on lines +114 to +160
preflight_session_duplicates <- function(state, vol_id, session_id, vb, rq) {
  filenames <- basename(state$input)
  dupes <- check_duplicate_files_in_session(
    vol_id = vol_id,
    session_id = session_id,
    filenames = filenames,
    vb = vb,
    rq = rq
  )
  if (is.null(dupes)) {
    return(state)
  }

  exists_lookup <- stats::setNames(dupes$exists, dupes$filename)
  is_dupe <- !is.na(exists_lookup[filenames]) &
    unname(exists_lookup[filenames])
  is_dupe[is.na(is_dupe)] <- FALSE

  state$status[is_dupe] <- "skipped"
  state$reason[is_dupe] <- "duplicate"
  state
}

# Internal: duplicate filenames in folder preflight (mirrors session).
#' @noRd
preflight_folder_duplicates <- function(state, vol_id, folder_id, vb, rq) {
  filenames <- basename(state$input)
  dupes <- check_duplicate_files_in_folder(
    vol_id = vol_id,
    folder_id = folder_id,
    filenames = filenames,
    vb = vb,
    rq = rq
  )
  if (is.null(dupes)) {
    return(state)
  }

  exists_lookup <- stats::setNames(dupes$exists, dupes$filename)
  is_dupe <- !is.na(exists_lookup[filenames]) &
    unname(exists_lookup[filenames])
  is_dupe[is.na(is_dupe)] <- FALSE

  state$status[is_dupe] <- "skipped"
  state$reason[is_dupe] <- "duplicate"
  state
}

Collaborator Author

Apart from calling check_duplicate_files_in_session vs check_duplicate_files_in_folder and the name of the id argument, these two are line-for-line identical. If we later tweak the duplicate-detection semantics (say, to also stash the existing asset id in reason), we'd have to remember to change both. Could we collapse them into a single preflight_duplicates(state, checker, ...) that takes the check function as an argument, and have bulk_files.R pick the right checker? What do you think?

Collaborator

Collapsed the two helpers into preflight_duplicates(state, checker, ...) and wired bulk_upload_files to pass the appropriate checker.
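A minimal sketch of what the collapsed helper might look like, reconstructed from the two original functions; the merged version may differ. The checker receives filenames = plus whatever is passed through ..., so bulk_files.R could call, for example, preflight_duplicates(state, check_duplicate_files_in_session, vol_id = vol_id, session_id = session_id, vb = vb, rq = rq). fake_session_checker() below is a hypothetical stub standing in for the real API-backed checker.

```r
# Sketch: one preflight for both session and folder duplicates.
# checker must accept filenames = plus the extra ... args and return NULL or
# a data frame with columns filename and exists.
preflight_duplicates <- function(state, checker, ...) {
  filenames <- basename(state$input)
  dupes <- checker(filenames = filenames, ...)
  if (is.null(dupes)) {
    return(state)
  }

  exists_lookup <- stats::setNames(dupes$exists, dupes$filename)
  # Filenames missing from the lookup (NA) count as not duplicated.
  is_dupe <- unname(exists_lookup[filenames])
  is_dupe[is.na(is_dupe)] <- FALSE

  state$status[is_dupe] <- "skipped"
  state$reason[is_dupe] <- "duplicate"
  state
}

# Hypothetical stub for check_duplicate_files_in_session(); the real checker
# queries the API using vol_id, session_id, vb and rq.
fake_session_checker <- function(vol_id, session_id, filenames, vb, rq) {
  data.frame(filename = filenames, exists = filenames == "b.mp4",
             stringsAsFactors = FALSE)
}

state <- data.frame(
  input  = c("/tmp/a.mp4", "/tmp/b.mp4"),
  status = "pending",
  reason = NA_character_,
  stringsAsFactors = FALSE
)
out <- preflight_duplicates(state, fake_session_checker,
                            vol_id = 1, session_id = 2, vb = FALSE, rq = NULL)
```

Passing the checker as an argument keeps the duplicate-marking logic in one place, so a later change (such as recording the existing asset id in reason) only has to be made once.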

…rations; refactor duplicate checking functions
@michalhuryn-montrose

Collaborator Author

LGTM

@michalhuryn-montrose merged commit 316cd0e into development on May 19, 2026
7 checks passed
2 participants