Add lance_dataset_update() for predicate + expression-based row updates #32

@LuciferYang

Summary

Expose Lance's UpdateBuilder through the C ABI so C/C++ consumers can apply per-column SQL expression updates against rows matching a predicate, without round-tripping through write/overwrite.

Motivation

Following lance_dataset_delete (#30 / #31), update is the next predicate-driven mutation primitive. Typical use cases:

  • Backfill a derived column from a literal or expression (UPDATE ... SET status = 'archived' WHERE event_ts < ...).
  • Re-key rows after a model migration (SET tier = CASE ...).
  • Bulk normalize bad values without a full rewrite.

It shares the with_mut + block_on plumbing already hardened for delete; merge-insert (upsert) is the natural follow-on.

Proposed API

/**
 * Update rows matching `predicate` by applying per-column SQL expressions.
 * Mutates `dataset` in place; the same handle sees the new version.
 *
 * `predicate` may be NULL to update every row; `columns` and `values` are
 * parallel arrays of length `num_updates` (must be >= 1). Each `values[i]`
 * is an SQL scalar expression evaluated per row (literals, column refs,
 * arithmetic, CASE, etc.).
 */
int32_t lance_dataset_update(
    LanceDataset* dataset,
    const char* predicate,                 /* optional; NULL = all rows */
    const char* const* columns,            /* length = num_updates      */
    const char* const* values,             /* length = num_updates      */
    size_t num_updates,                    /* must be >= 1              */
    uint64_t* out_num_updated              /* optional, may be NULL     */
);

The C++ wrapper is proposed as lance::Dataset::update(predicate, std::vector<std::pair<std::string, std::string>> updates) -> uint64_t; it returns the number of updated rows and throws lance::Error on failure.
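
For illustration, a caller of the proposed C entry point might build the parallel arrays as sketched below. This is only a sketch: the `LanceDataset` struct body, the placeholder definition of `lance_dataset_update`, the `fake_last_num_updates` variable, the `archive_rows` helper, the `revision` column, and the use of 0 as the success code are all assumptions made so the snippet compiles on its own. Note that each value is an SQL expression, so a string literal needs its own SQL quotes inside the C string.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the opaque handle; the real layout is private to the library. */
typedef struct LanceDataset { int unused; } LanceDataset;

/* Placeholder with the proposed signature so this sketch links on its own.
 * It performs no update: it records the update count and reports zero rows. */
static size_t fake_last_num_updates = 0;

int32_t lance_dataset_update(LanceDataset* dataset,
                             const char* predicate,
                             const char* const* columns,
                             const char* const* values,
                             size_t num_updates,
                             uint64_t* out_num_updated) {
    (void)dataset; (void)predicate; (void)columns; (void)values;
    fake_last_num_updates = num_updates;
    if (out_num_updated != NULL) {
        *out_num_updated = 0;
    }
    return 0; /* assumed success code */
}

/* Hypothetical caller: set status to the SQL literal 'archived' and bump a
 * (hypothetical) revision column for every row matching `predicate`. */
int32_t archive_rows(LanceDataset* ds, const char* predicate, uint64_t* out_n) {
    /* columns[i] is assigned the result of values[i], evaluated per row. */
    static const char* const columns[] = { "status",     "revision"     };
    static const char* const values[]  = { "'archived'", "revision + 1" };
    return lance_dataset_update(ds, predicate, columns, values,
                                sizeof columns / sizeof columns[0], out_n);
}
```

Passing NULL for `predicate` would apply the same assignments to every row, and passing NULL for `out_n` skips the row count.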

Error semantics

  • LANCE_ERR_INVALID_ARGUMENT for a NULL dataset, an empty-string predicate (a NULL predicate is allowed and means all rows), num_updates == 0, NULL columns or values, or a NULL or empty entry inside either array. The same code is also surfaced from the upstream parser for unknown columns and malformed SQL filters or expressions, since UpdateBuilder::update_where and set wrap parser errors as Error::InvalidInput.
  • LANCE_ERR_COMMIT_CONFLICT for a concurrent writer.

(Exact error-code mapping to be verified empirically, same way as the delete patch.)
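
The argument checks listed above could be expressed roughly as follows. This is a sketch of the C-side checks only, not the actual implementation: `check_update_args` and `is_null_or_empty` are hypothetical helpers, the numeric values of `LANCE_OK` and `LANCE_ERR_INVALID_ARGUMENT` are assumed, and parser errors and commit conflicts (which come from upstream) are not modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed numeric values; the real codes are defined by the project's header. */
enum { LANCE_OK = 0, LANCE_ERR_INVALID_ARGUMENT = 1 };

static int is_null_or_empty(const char* s) {
    return s == NULL || s[0] == '\0';
}

/* Hypothetical pre-flight check mirroring the bullet above. Returns LANCE_OK
 * when the arguments are acceptable, LANCE_ERR_INVALID_ARGUMENT otherwise. */
int32_t check_update_args(const void* dataset,
                          const char* predicate,
                          const char* const* columns,
                          const char* const* values,
                          size_t num_updates) {
    if (dataset == NULL) {
        return LANCE_ERR_INVALID_ARGUMENT;
    }
    /* NULL predicate means "all rows"; an empty string is rejected. */
    if (predicate != NULL && predicate[0] == '\0') {
        return LANCE_ERR_INVALID_ARGUMENT;
    }
    if (num_updates == 0 || columns == NULL || values == NULL) {
        return LANCE_ERR_INVALID_ARGUMENT;
    }
    /* Every column name and every expression must be a non-empty string. */
    for (size_t i = 0; i < num_updates; ++i) {
        if (is_null_or_empty(columns[i]) || is_null_or_empty(values[i])) {
            return LANCE_ERR_INVALID_ARGUMENT;
        }
    }
    return LANCE_OK;
}
```

Running these checks before any string is handed to the SQL parser keeps NULL-pointer handling on the C side of the boundary, so the upstream parser only ever sees non-empty strings.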

Out of scope (future PRs)

  • lance_dataset_merge_insert (upsert)
  • Schema mutation (add/drop/alter columns)
  • Compaction
