Summary
Expose Lance's UpdateBuilder through the C ABI so C/C++ consumers can apply per-column SQL expression updates against rows matching a predicate, without round-tripping through write/overwrite.
Motivation
Following lance_dataset_delete (#30 / #31), update is the next predicate-driven mutation primitive. Typical use cases:
- Backfill a derived column from a literal or expression (UPDATE ... SET status = 'archived' WHERE event_ts < ...).
- Re-key rows after a model migration (SET tier = CASE ...).
- Bulk normalize bad values without a full rewrite.
It shares the with_mut + block_on plumbing already hardened for delete; merge-insert (upsert) is the natural follow-on.
Proposed API
/**
 * Update rows matching `predicate` by applying per-column SQL expressions.
 * Mutates `dataset` in place; the same handle sees the new version.
 *
 * `predicate` may be NULL to update every row; `columns` and `values` are
 * parallel arrays of length `num_updates` (must be >= 1). Each `values[i]`
 * is an SQL scalar expression evaluated per row (literals, column refs,
 * arithmetic, CASE, etc.).
 */
int32_t lance_dataset_update(
    LanceDataset* dataset,
    const char* predicate,       /* optional; NULL = all rows */
    const char* const* columns,  /* length = num_updates */
    const char* const* values,   /* length = num_updates */
    size_t num_updates,          /* must be >= 1 */
    uint64_t* out_num_updated    /* optional, may be NULL */
);
The C++ wrapper is lance::Dataset::update(predicate, std::vector<std::pair<std::string, std::string>> updates) -> uint64_t; it throws lance::Error on failure.
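C callers that keep assignments as (column, expression) pairs, as the C++ wrapper does, have to split them into the two parallel arrays before calling the C function. A minimal sketch of that step follows. The names UpdatePair and split_update_pairs are hypothetical and not part of the proposed API.

```c
#include <stddef.h>

/* Hypothetical caller-side type: one SET assignment. */
typedef struct {
    const char* column; /* column name, e.g. "status" */
    const char* value;  /* SQL scalar expression, e.g. "'archived'" */
} UpdatePair;

/*
 * Split `pairs` into the parallel `columns` / `values` arrays that the
 * proposed lance_dataset_update() expects. Each output array must have
 * room for `n` pointers. Strings are borrowed, not copied, so they must
 * outlive the update call.
 * Returns 0 on success, -1 if any argument is NULL or `n` is 0.
 */
int split_update_pairs(const UpdatePair* pairs, size_t n,
                       const char** columns, const char** values)
{
    if (pairs == NULL || columns == NULL || values == NULL || n == 0) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        columns[i] = pairs[i].column;
        values[i] = pairs[i].value;
    }
    return 0;
}
```

The filled arrays can then be passed as `columns` and `values`, with `num_updates` set to `n`.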
Error semantics
- LANCE_ERR_INVALID_ARGUMENT for a NULL dataset, an empty-string predicate (a NULL predicate is valid and selects every row), num_updates == 0, NULL columns or values when num_updates > 0, or a NULL or empty entry inside either array. The same code is also surfaced from the upstream parser for unknown columns and malformed SQL filters or expressions, since UpdateBuilder::update_where and set wrap parser errors as Error::InvalidInput.
- LANCE_ERR_COMMIT_CONFLICT for a concurrent writer.
(Exact error-code mapping to be verified empirically, same way as the delete patch.)
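The argument checks listed above could be sketched as a standalone pre-flight function. This is an illustration of the rules only, not the actual implementation: check_update_args and the SKETCH_* codes are hypothetical, the numeric values are placeholders, and the dataset is taken as an opaque pointer so the sketch compiles without the library header.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder codes; the real values come from the library header. */
enum {
    SKETCH_OK = 0,
    SKETCH_ERR_INVALID_ARGUMENT = 1
};

/* Returns 1 if `s` is NULL or the empty string, 0 otherwise. */
int is_null_or_empty(const char* s)
{
    return s == NULL || s[0] == '\0';
}

/* Apply the argument rules before any SQL is parsed. */
int32_t check_update_args(const void* dataset, const char* predicate,
                          const char* const* columns,
                          const char* const* values, size_t num_updates)
{
    if (dataset == NULL) {
        return SKETCH_ERR_INVALID_ARGUMENT;
    }
    /* NULL predicate means "all rows"; an empty string is rejected. */
    if (predicate != NULL && predicate[0] == '\0') {
        return SKETCH_ERR_INVALID_ARGUMENT;
    }
    if (num_updates == 0 || columns == NULL || values == NULL) {
        return SKETCH_ERR_INVALID_ARGUMENT;
    }
    /* Every column name and every expression must be a non-empty string. */
    for (size_t i = 0; i < num_updates; i++) {
        if (is_null_or_empty(columns[i]) || is_null_or_empty(values[i])) {
            return SKETCH_ERR_INVALID_ARGUMENT;
        }
    }
    return SKETCH_OK;
}
```

Unknown columns and malformed SQL are not caught here; per the text above, those are reported by the upstream parser.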
Out of scope (future PRs)
- lance_dataset_merge_insert (upsert)
- Schema mutation (add/drop/alter columns)
- Compaction