feat: Action-based transactions #6924

Draft
wjones127 wants to merge 111 commits into lance-format:main from wjones127:ralph/milestone-11

@wjones127
Contributor

No description provided.

wjones127 and others added 30 commits May 18, 2026 16:48
…agments

Implement `Action::apply(&mut Manifest)` for the `AddFragments` variant,
mirroring `Transaction::fragments_with_ids` semantics (id 0 = unassigned,
allocated from `max_fragment_id + 1`). Add `Action::conflicts_with` —
two `AddFragments` actions never conflict.

Row-id assignment and version metadata still live in `build_manifest` and
will migrate as later actions in milestone 11 are ported.
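The id-allocation rule described above (id 0 means unassigned, fresh ids start at `max_fragment_id + 1`) can be sketched as follows. This is a minimal illustration, not the PR's code: the `Manifest` struct here is a hypothetical stand-in that keeps only fragment ids and the high-water mark, and starting at 0 for an empty dataset is an assumption.

```rust
/// Hypothetical, stripped-down manifest: only what the sketch needs.
#[derive(Debug, Clone, Default)]
pub struct Manifest {
    pub fragment_ids: Vec<u64>,
    /// Highest id ever handed out; `None` when no fragment exists yet (assumption).
    pub max_fragment_id: Option<u64>,
}

/// Appends `new_ids`, replacing each `0` (unassigned) with the next free id.
pub fn apply_add_fragments(manifest: &mut Manifest, new_ids: &[u64]) {
    // Allocation starts one past the current high-water mark.
    let mut next = manifest.max_fragment_id.map_or(0, |max| max + 1);
    for &id in new_ids {
        let assigned = if id == 0 {
            let fresh = next;
            next += 1;
            fresh
        } else {
            // A caller-assigned id is kept; later allocations stay above it.
            next = next.max(id + 1);
            id
        };
        manifest.fragment_ids.push(assigned);
        let high = manifest.max_fragment_id.map_or(assigned, |m| m.max(assigned));
        manifest.max_fragment_id = Some(high);
    }
}
```

Because ids are only ever taken from above the high-water mark, two appends applied in either order never collide, which fits the claim that two `AddFragments` actions never conflict.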

Refs lance-format#6449
…apply

The Append branch in `Transaction::build_manifest` now drives fragment-id
assignment through `Action::apply` on a scratch clone of the current
manifest, then slices off the newly-appended tail for the existing row-id
and version-metadata handling. Behaviour is unchanged (cloning the
manifest is cheap because fragments sit behind an Arc).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Verifies fmt/clippy clean and ticks off the issue-6449 checklist.

The Action-based transaction tracer-bullet now covers AddFragments end-to-end
internally: Append <-> Vec<Action> translation, Action::apply for fragment-id
assignment, AddFragments conflict detection, and routing build_manifest's
Append branch through the apply loop while preserving existing row-id /
version-meta handling.

Closes lance-format#6449
Extends the action-based transaction scaffolding with two new variants:

- `RemoveFragments` — drop fragments by ID; max_fragment_id stays monotonic.
- `UpdateFragment` — replace a single fragment in-place by ID.

Implements the full pairwise conflict matrix and lands the
`Delete <-> [UpdateFragment*, RemoveFragments]` lossless round-trip. The
predicate is carried on `UserAction::description`, which required changing
`actions_from_operation` to return `UserAction` rather than `Vec<Action>`
(single internal call site in `build_manifest` updated).

Overwrite, Rewrite, Update, and DataReplacement translations remain
unimplemented — they need action types still to be introduced by later
issues in this milestone (schema/config/indices) or apply-time manifest
context (DataReplacement).

Refs lance-format#6450

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ypes

Remaining Operation translations (Overwrite/Rewrite/Update/DataReplacement)
require action vocabulary defined in lance-format#6451 through lance-format#6453 plus two design decisions.
Documents concrete unblock criteria in the issue checklist.
Spike lance-format#6448 split the omnibus UpdateFragment action into three actions.
This lands the rename for the only one lance-format#6450 shipped: UpdateFragment
becomes UpdateDeletionVector, and its payload narrows from a full
Fragment to { fragment_id, new_deletion_file, affected_rows }.

- apply now only swaps fragment.deletion_file rather than replacing the
  whole fragment.
- conflicts_with treats two UpdateDeletionVector actions on the same
  fragment as compatible when both enumerate disjoint affected_rows, and
  conflicting otherwise (unknown or overlapping rows).
- The Delete translation emits UpdateDeletionVector per updated fragment;
  it falls back to the legacy path if an updated fragment carries no
  deletion file.
- operation_from_actions no longer reconstructs Operation::Delete: the
  narrowed payload cannot rebuild the full updated_fragments without the
  manifest. Delete translation is verified forward + apply instead.
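The `conflicts_with` rule for two `UpdateDeletionVector` actions can be sketched as below. The struct is a hypothetical reduction of the payload (it omits `new_deletion_file`), `affected_rows` is modelled as an optional set, and treating actions on different fragments as compatible is an assumption rather than something the commit message states.

```rust
use std::collections::BTreeSet;

/// Hypothetical reduced payload; the real one also carries `new_deletion_file`.
pub struct UpdateDeletionVectorSketch {
    pub fragment_id: u64,
    /// Row offsets this action deletes; `None` when the rows are not enumerated.
    pub affected_rows: Option<BTreeSet<u32>>,
}

/// Returns `true` when the two actions cannot both be applied safely.
pub fn deletion_vectors_conflict(
    a: &UpdateDeletionVectorSketch,
    b: &UpdateDeletionVectorSketch,
) -> bool {
    // Assumption: actions on different fragments touch disjoint state.
    if a.fragment_id != b.fragment_id {
        return false;
    }
    match (&a.affected_rows, &b.affected_rows) {
        // Both sides enumerate their rows: compatible only when disjoint.
        (Some(x), Some(y)) => !x.is_disjoint(y),
        // Unknown rows on either side: conservatively a conflict.
        _ => true,
    }
}
```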

Closes lance-format#6450

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add the four schema actions from issue lance-format#6451 to the action vocabulary:

- AddFields: append new fields to the schema and place their per-fragment
  data files into existing fragments.
- DropFields: drop fields by id (including nested children) and prune data
  files left covering no live field, mirroring the Project path.
- UpdateSchemaMetadata: apply an UpdateMap to schema-level metadata.
- ChangeSchema: replace the schema wholesale.

Implements Action::apply for all four and extends the conflict matrix:
ChangeSchema conflicts with every other schema action; concurrent field
changes do not commute; field changes and metadata changes are disjoint;
two metadata updates are ambiguous to merge; AddFields conflicts with a
RemoveFragments only when it writes into a removed fragment.

Translation (Merge/Project) and AddFields::validate land in follow-ups.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`AddFields` appends new data files to existing fragments rather than
replacing them, so fragment row counts are preserved structurally. The
legacy `Operation::Merge` carried whole replacement fragments and needed
`merge_fragments_valid` to catch row-count drift; `AddFields::validate`
instead enforces the metadata-level preconditions:

- every targeted fragment id exists in the manifest and is unique
- every new field id (including nested children) is fresh
- new data files reference only the fields being added

`Action::apply` runs `validate` first, so a malformed `AddFields` fails
without mutating the manifest.
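A rough sketch of the three preconditions and the validate-before-mutate ordering follows. All types here (`ManifestView`, `AddFieldsSketch`, `NewDataFile`) are hypothetical simplifications, nested children are flattened into a plain id list, and errors are plain strings for brevity.

```rust
use std::collections::BTreeSet;

pub struct NewDataFile {
    pub fragment_id: u64,
    pub field_ids: Vec<i32>,
}

pub struct AddFieldsSketch {
    pub new_field_ids: Vec<i32>,
    pub files: Vec<NewDataFile>,
}

/// Hypothetical view of the manifest: existing fragment and field ids only.
pub struct ManifestView {
    pub fragment_ids: BTreeSet<u64>,
    pub field_ids: BTreeSet<i32>,
}

pub fn validate_add_fields(action: &AddFieldsSketch, manifest: &ManifestView) -> Result<(), String> {
    // 1. Every targeted fragment exists and is targeted once.
    let mut seen = BTreeSet::new();
    for file in &action.files {
        if !manifest.fragment_ids.contains(&file.fragment_id) {
            return Err(format!("unknown fragment {}", file.fragment_id));
        }
        if !seen.insert(file.fragment_id) {
            return Err(format!("fragment {} targeted twice", file.fragment_id));
        }
    }
    // 2. Every new field id is fresh.
    let mut added = BTreeSet::new();
    for &id in &action.new_field_ids {
        if manifest.field_ids.contains(&id) || !added.insert(id) {
            return Err(format!("field id {id} is not fresh"));
        }
    }
    // 3. New data files reference only the fields being added.
    for file in &action.files {
        if let Some(bad) = file.field_ids.iter().find(|&&id| !added.contains(&id)) {
            return Err(format!("data file references field {bad} that is not being added"));
        }
    }
    Ok(())
}

/// Validation runs first, so a malformed action leaves the manifest untouched.
pub fn apply_add_fields(action: &AddFieldsSketch, manifest: &mut ManifestView) -> Result<(), String> {
    validate_add_fields(action, manifest)?;
    manifest.field_ids.extend(action.new_field_ids.iter().copied());
    Ok(())
}
```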

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add the two operation translations that the schema actions unlock:

- `Project -> [ChangeSchema]`: pure translation in `actions_from_operation`,
  round-trips both ways via `operation_from_actions`.
- `Merge -> [AddFields]`: needs the prior manifest to diff added fields and
  new per-fragment data files, so it goes through the new
  `actions_from_operation_with_manifest` entry point. Only a pure column add
  translates; a merge that removes/modifies a field, changes the fragment
  set, or rewrites a data file returns `None` and falls back to the legacy
  path.

`ChangeSchema::apply` now also prunes data files that reference no live
field, mirroring the legacy `Operation::Project` arm and `DropFields::apply`;
without it the translated `Project` would leave an inconsistent manifest.
The pruning is extracted into the shared `prune_dead_data_files` helper.
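The pruning step can be pictured roughly as below. This is a sketch under assumptions: `DataFile` is a hypothetical two-field struct, and "dead" is read as "references no field id that is still in the schema"; the real helper's signature is not shown in the PR text.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, PartialEq)]
pub struct DataFile {
    pub path: String,
    pub field_ids: Vec<i32>,
}

/// Drops every data file that no longer covers at least one live field.
pub fn prune_dead_data_files_sketch(files: &mut Vec<DataFile>, live_fields: &BTreeSet<i32>) {
    files.retain(|file| file.field_ids.iter().any(|id| live_fields.contains(id)));
}
```

A file that still covers even one live field is kept whole; only files with no live field at all are removed.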

Closes lance-format#6451

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add the seven index actions from spike lance-format#6448 to the action vocabulary:
AddIndex, RemoveIndex, RewriteIndex, InvalidateIndexCoverage,
RebindIndexCoverage, RegisterFragReuse, and UpdateMergedGenerations.

`Action::apply` now takes the dataset's `Vec<IndexMetadata>` alongside the
manifest, since `Manifest` stores only a file offset for the index
section. Fragment- and schema-level actions ignore it.

AddIndex coverage is criterion-based: the payload carries a
`FragmentCoverage` (`All` or `Bitmap`) resolved against the manifest at
apply time, and conflict detection implements the criterion-vs-concrete
overlap rule (AddIndex over field X vs InvalidateIndexCoverage on X).
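To illustrate what "resolved against the manifest at apply time" might look like, here is a minimal sketch. The enum mirrors the two variants named above, but the `resolve` method, its signature, and the choice to take a `Bitmap` as written (without intersecting it with the manifest) are assumptions.

```rust
use std::collections::BTreeSet;

pub enum FragmentCoverage {
    /// Criterion: whatever fragments exist when the action is applied.
    All,
    /// Concrete set of fragment ids.
    Bitmap(BTreeSet<u64>),
}

impl FragmentCoverage {
    /// Turns the criterion into a concrete set using the manifest's fragments.
    pub fn resolve(&self, manifest_fragments: &BTreeSet<u64>) -> BTreeSet<u64> {
        match self {
            FragmentCoverage::All => manifest_fragments.clone(),
            FragmentCoverage::Bitmap(ids) => ids.clone(),
        }
    }
}
```

Resolving late means an `All` index built concurrently with an append covers the fragments present in whichever manifest it is finally applied to.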

Operation translations land in a follow-up.
Add operation translations for the index actions:

- CreateIndex -> [RemoveIndex* (removed), AddIndex* (new)]. RemoveIndex
  precedes AddIndex so a UUID in both sets ends up added, matching
  build_manifest's retain-then-extend. AddIndex coverage is taken from
  the new index's fragment_bitmap; an index without one returns None.
- UpdateMemWalState -> [UpdateMergedGenerations], lossless both ways.
- Rewrite stays untranslated: its fragment half needs the RewriteFragments
  action from the lance-format#6450 UpdateFragment split, which does not exist yet, and
  translating only the index half would silently drop the fragment rewrite.

operation_from_actions reverses a removal-free CreateIndex (a pure AddIndex
list) and UpdateMemWalState. A CreateIndex with removals is forward-only,
since RemoveIndex keeps only a UUID.

Closes lance-format#6452

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the config and metadata actions for action-based transactions:

- `UpdateConfig`: set/remove table config (`lance.*`) and table-metadata
  key-value pairs. Two concurrent `UpdateConfig`s conflict iff their key
  sets overlap or either is a `replace`, reusing `update_maps_conflict`.
- `AddBases`: append base paths, allocating unset ids from the high-water
  mark and rejecting duplicate id/name/path. Conflicts only on a shared
  id, name, or path.
- `ReserveFragmentIds`: advance `max_fragment_id` without adding fragments;
  commutes with every other action.
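The `UpdateConfig` rule (conflict iff the key sets overlap or either side is a `replace`) reduces to a one-line predicate. The sketch below uses a hypothetical `UpdateMapSketch` type and a hypothetical function name; it is not the real `update_maps_conflict`, whose signature is not shown here.

```rust
use std::collections::BTreeSet;

pub struct UpdateMapSketch {
    /// Keys this update sets or removes.
    pub keys: BTreeSet<String>,
    /// When true, the update replaces the whole map rather than patching it.
    pub replace: bool,
}

/// Small helper to build a key set from string literals.
pub fn keys(names: &[&str]) -> BTreeSet<String> {
    names.iter().map(|name| name.to_string()).collect()
}

pub fn update_maps_conflict_sketch(a: &UpdateMapSketch, b: &UpdateMapSketch) -> bool {
    a.replace || b.replace || !a.keys.is_disjoint(&b.keys)
}
```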

Translations: `UpdateConfig` -> `[UpdateConfig, UpdateSchemaMetadata?]`,
`UpdateBases` -> `[AddBases]`, `ReserveFragments` -> `[ReserveFragmentIds]`,
each round-tripping losslessly. An `UpdateConfig` carrying per-field
metadata updates has no action form and falls back to the legacy path.

Closes lance-format#6453

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add the issue-6454 checklist with a scope reality check: 4 operations
(Rewrite, DataReplacement, Update, Overwrite) have no action translation
yet, so lance-format#6454 does a partial rewire of the 9 translatable operations and
carves the rest out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `Transaction::build_manifest_via_actions`, an early-return router in
`build_manifest`. For CreateIndex, UpdateMemWalState, ReserveFragments,
UpdateBases, and UpdateConfig it clones the current manifest, applies the
operation's translated actions via `Action::apply`, and rebuilds the result
manifest from the mutated clone — so schema, fragments, config, schema
metadata, base paths, and the fragment-id high-water mark all flow from the
action pipeline rather than the legacy per-Operation arms.

The legacy arms still run for variants not yet ported and as a fallback when
translation declines (CreateIndex without a fragment bitmap, UpdateConfig with
per-field metadata updates); removing the now-dead arms is later lance-format#6454 work.

Part of lance-format#6454.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d ops

ReserveFragments, UpdateBases, and UpdateMemWalState always translate to
actions, so `build_manifest_via_actions` handles them whenever a current
manifest exists. Their legacy `build_manifest` arms (and the post-match
base-path / max_fragment_id blocks) were therefore dead code.

`build_manifest_via_actions` now returns an internal error instead of
falling through to the legacy path when a routed operation has no current
manifest — these operations all mutate an existing dataset. The legacy
match collapses the three always-routed ops into a single arm that errors
if reached. CreateIndex and UpdateConfig keep their arms as genuine
fallbacks (index without a fragment bitmap; per-field metadata updates).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`build_manifest_via_actions` now handles `Delete` and `Project` in
addition to the metadata/index operations. It runs `retain_relevant_indices`
as a shared post-step, gated on the action list containing a
schema-narrowing or fragment-removing action (`ChangeSchema`, `DropFields`,
`RemoveFragments`), reproducing the index pruning the legacy Delete/Project
arms performed.
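The gate on the shared post-step can be sketched as a scan over the action list. `ActionKind` is a hypothetical tag-only enum holding just enough variants for the example; the real `Action` enum carries payloads.

```rust
/// Hypothetical tag-only view of a few action variants.
pub enum ActionKind {
    AddFragments,
    ChangeSchema,
    DropFields,
    RemoveFragments,
    UpdateConfig,
}

/// True when the list contains a schema-narrowing or fragment-removing action,
/// i.e. when index pruning has to run after the actions are applied.
pub fn needs_index_pruning(actions: &[ActionKind]) -> bool {
    actions.iter().any(|action| {
        matches!(
            action,
            ActionKind::ChangeSchema | ActionKind::DropFields | ActionKind::RemoveFragments
        )
    })
}
```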

The legacy `Project` arm is deleted — `Project` always translates, so it
joins the always-routed `Error::internal` arm. The legacy `Delete` arm is
kept as a fallback: a `Delete` whose updated fragment carries no deletion
file has no action representation.

Refs lance-format#6454

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Route a pure column-add `Merge` through `build_manifest_via_actions` by
switching the helper to `actions_from_operation_with_manifest` (the
manifest-aware translation) and adding `Merge` to its routed-operation set.

`merge_to_add_fields` now declines translation on a stable-row-id dataset:
the legacy `Operation::Merge` arm refreshes per-row
`last_updated_at_version_meta` for every fragment that gains a data file,
and that refresh has no `Action` representation. Such merges fall back to
the legacy arm, which is kept (like `Delete`'s) as a documented fallback.

Refs lance-format#6454
…esolution audit

Audited conflict_resolver.rs against Action::conflicts_with. The legacy
per-Operation conflict methods cannot be removed within lance-format#6454: conflicts_with
returns a bool and so cannot carry the retryable/incompatible classification
the resolver depends on, there is no Action::rebase to replace the resolver's
fragment-rebasing, and the action vocabulary is incomplete for conflict
detection (UpdateConfig schema/field metadata; 4 untranslatable ops).

Phases 1-2 (build_manifest single-apply-path) stay landed and verified;
acceptance criterion #2 needs a human scoping decision (see ## Blocker).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the spike lance-format#6448 design doc at
rust/lance-table/design/action_based_conflict_resolution.md. Folds in all
ten resolved open questions from the spike thread, freezes the §4 action
catalog at 19 actions (matching the in-tree `Action` enum), and leaves the
§5/§6 conflict matrix as explicit incremental TODOs for the per-operation
issues to fill.

First deliverable of lance-format#6892.
Phase 1 of the action-based conflict-resolution framework (lance-format#6892). New
module `transaction/conflict.rs` defines the §7 lifecycle support types:

- `ManifestMask` — an 8-region `u16` bitset (fragment list, fragment-id
  counter, schema, schema metadata, indices, config, table metadata,
  base paths). It is the cheap conflict pre-screen: a non-intersection of
  one action's write-set with another's read-or-write-set proves the two
  commute without inspecting payloads.
- `ActionOutput` / `TxnContext` — the late-bound-id accumulator the driver
  will thread through a transaction's action list.

`Action::reads()` / `Action::writes()` return the regions each of the 17
in-tree variants touches, taken from the §4 catalog in the design doc.
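The mask pre-screen can be sketched as follows. The eight region names come from the list above, but the bit positions, the `Footprint` struct, and the `provably_commute` helper are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ManifestMask(u16);

impl ManifestMask {
    // Bit assignments are illustrative; only the region names come from the PR.
    pub const FRAGMENT_LIST: Self = Self(1 << 0);
    pub const FRAGMENT_ID_COUNTER: Self = Self(1 << 1);
    pub const SCHEMA: Self = Self(1 << 2);
    pub const SCHEMA_METADATA: Self = Self(1 << 3);
    pub const INDICES: Self = Self(1 << 4);
    pub const CONFIG: Self = Self(1 << 5);
    pub const TABLE_METADATA: Self = Self(1 << 6);
    pub const BASE_PATHS: Self = Self(1 << 7);

    pub const fn union(self, other: Self) -> Self {
        Self(self.0 | other.0)
    }

    pub const fn intersects(self, other: Self) -> bool {
        self.0 & other.0 != 0
    }
}

/// Regions one action reads and writes.
pub struct Footprint {
    pub reads: ManifestMask,
    pub writes: ManifestMask,
}

/// True when neither action writes a region the other reads or writes.
pub fn provably_commute(a: &Footprint, b: &Footprint) -> bool {
    !a.writes.intersects(b.reads.union(b.writes))
        && !b.writes.intersects(a.reads.union(a.writes))
}
```

A `false` result is not a conflict verdict; it only means the payloads have to be inspected.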

Additive only — the legacy `conflict_resolver.rs` is untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Completes Phase 1 of the action-based conflict resolution framework
(lance-format#6892). Adds the two remaining spike §7 lifecycle methods to `Action`:

- `validate(&self, manifest)` — the pure structural pre-screen.
  Dispatches by variant; `AddFields` delegates to the existing
  `AddFields::validate`, every other variant is intrinsically
  well-formed and returns `Ok(())`. Per-operation issues fill in the
  structural checks their actions need.
- `rebase(&mut self, current, concurrent)` — async; `unimplemented!()`
  stub for every variant. Phase 3 fills in the Append/Delete actions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implement `conflict::commit`, the transaction driver from the
action-based conflict-resolution design (§7). It replays an ordered
action list onto a working clone of the latest manifest, rebasing each
action only when its read/write mask intersects the intervening
commits' write-set — a disjoint mask proves commutativity and skips the
rebase call. The TxnContext and index list are threaded through;
TxnContext stays a write-only accumulator until an action consumes
another's output.
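The mask-gated replay loop can be pictured with the toy driver below. Everything in it is a simplification: the manifest is a bare list of fragment ids, masks are raw `u16` values with illustrative bits, `PushFragment` is a hypothetical action type, and `rebase` only counts its calls so the gate is observable.

```rust
// Illustrative region bits (assumptions, not the PR's layout).
pub const FRAGMENTS: u16 = 1 << 0;
pub const INDICES: u16 = 1 << 4;

/// Hypothetical action that appends one fragment id.
pub struct PushFragment {
    pub id: u64,
    pub reads: u16,
    pub writes: u16,
    /// Counts rebase invocations so the gate can be observed.
    pub rebase_calls: u32,
}

impl PushFragment {
    fn rebase(&mut self) -> Result<(), String> {
        self.rebase_calls += 1;
        Ok(())
    }

    fn apply(&self, manifest: &mut Vec<u64>) {
        manifest.push(self.id);
    }
}

/// Replays `actions` onto a working clone of `latest`, rebasing an action only
/// when its read/write mask intersects what intervening commits wrote.
pub fn commit_sketch(
    latest: &[u64],
    actions: &mut [PushFragment],
    intervening_writes: u16,
) -> Result<Vec<u64>, String> {
    let mut working = latest.to_vec();
    for action in actions.iter_mut() {
        if (action.reads | action.writes) & intervening_writes != 0 {
            action.rebase()?;
        }
        action.apply(&mut working);
    }
    Ok(working)
}
```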

Part of lance-format#6892.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the action-based conflict resolution entry point (spike §7 Phase 2):

- `commit` now returns `CommitOutput { manifest, indices, actions }` so
  conflict resolution can read back the rebased action list — the rebased
  transaction — without duplicating the driver's mask-gated loop.
- `ActionRebase` mirrors the legacy `TransactionRebase` lifecycle
  (`try_new`/`check_txn`/`finish`) so the lance-format#6454 cutover is a near-one-line
  call-site swap. `check_txn` accumulates concurrent commits; per-action
  conflicts are deferred to the driver's `rebase` calls.
- `Restore`/`Clone` are handled as top-level non-action transaction kinds:
  a committing `Restore`/`Clone` resets the baseline and never conflicts; a
  concurrent `Restore` is an incompatible conflict.

Not wired into the production commit path; `conflict_resolver.rs` is
unmodified and remains the production path and differential oracle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of the action-based conflict resolver (lance-format#6892): fill in the §7
`rebase` lifecycle method for the three actions Append and Delete use.

- `AddFragments::rebase` is a no-op — Append commutes with every action;
  `apply` reallocates fragment ids from the rebased high-water mark.
- `RemoveFragments::rebase` is a no-op for the Append/Delete catalog — a
  whole-fragment removal subsumes a concurrent partial delete, is
  idempotent against a concurrent removal, and is disjoint from appends.
- `UpdateDeletionVector::rebase` reports a retryable conflict when a
  concurrent commit removed its target fragment or rewrote that
  fragment's deletion vector.

All other variants keep the `unimplemented!()` stub for their
per-operation issues. `conflict_resolver.rs` is unmodified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`RemoveFragments::rebase` was an unconditional no-op, but the legacy
`check_delete_txn` oracle reports two `Delete`s touching the same
fragment as a retryable conflict. The shadow resolver must return
identical verdicts to the oracle until the cutover (lance-format#6454).

Add `rebase_remove_fragments`: a retryable conflict when a concurrent
`RemoveFragments` overlaps a target fragment or a concurrent
`UpdateDeletionVector` targets one; a concurrent `AddFragments` still
commutes.
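The rule added here can be sketched as a scan over the concurrent actions. The `Concurrent` enum and the two-valued `Verdict` are hypothetical reductions; only the three cases described above are modelled.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Compatible,
    Retryable,
}

/// Hypothetical reduced view of the concurrent actions relevant to this rule.
pub enum Concurrent {
    AddFragments,
    RemoveFragments(BTreeSet<u64>),
    UpdateDeletionVector { fragment_id: u64 },
}

pub fn rebase_remove_fragments_sketch(targets: &BTreeSet<u64>, concurrent: &[Concurrent]) -> Verdict {
    for action in concurrent {
        let clash = match action {
            // Appends never touch an existing fragment.
            Concurrent::AddFragments => false,
            // Another removal overlapping a target fragment.
            Concurrent::RemoveFragments(ids) => !ids.is_disjoint(targets),
            // A partial delete on a fragment this action removes.
            Concurrent::UpdateDeletionVector { fragment_id } => targets.contains(fragment_id),
        };
        if clash {
            return Verdict::Retryable;
        }
    }
    Verdict::Compatible
}
```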

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `conflict::tests::differential`: a harness that resolves each input
pair through both the legacy `TransactionRebase` oracle and the new
`ActionRebase`, classifies each into a verdict (Compatible/Retryable/
Incompatible), and asserts the two agree. `m_by_m_append_delete` runs
the full M×M matrix over Append, partial Delete, and whole-fragment
Delete operations.
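The shape of such a differential harness, stripped of anything Lance-specific, might look like the sketch below. The resolvers are passed in as plain closures and the function name is hypothetical; the real harness drives `TransactionRebase` and `ActionRebase` and asserts rather than collecting disagreements.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Compatible,
    Retryable,
    Incompatible,
}

/// Runs every (committing, concurrent) pair through both resolvers and returns
/// the pairs on which they disagree as (row, column, oracle, candidate).
pub fn differential_matrix<Op>(
    ops: &[Op],
    oracle: impl Fn(&Op, &Op) -> Verdict,
    candidate: impl Fn(&Op, &Op) -> Verdict,
) -> Vec<(usize, usize, Verdict, Verdict)> {
    let mut disagreements = Vec::new();
    for (i, committing) in ops.iter().enumerate() {
        for (j, concurrent) in ops.iter().enumerate() {
            let expected = oracle(committing, concurrent);
            let actual = candidate(committing, concurrent);
            if expected != actual {
                disagreements.push((i, j, expected, actual));
            }
        }
    }
    disagreements
}
```

An empty result over the full M×M matrix is the agreement the test asserts.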

Also close the gap where `ActionRebase::finish` could not rebuild a
rebased `Delete`: when `commit` leaves every action byte-identical the
operation is unchanged, so `finish` returns the original transaction
instead of routing through `operation_from_actions` (which drops the
`Delete` fragment payload).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tests

Phase 4 of the action-based conflict resolver. Adds property-based
transaction generators for Append/Delete and ports the legacy
`test_conflicts` Append/Delete fixtures onto the shadow resolver.

`generators::operation_strategy` produces Append (1-2 fresh fragments)
and Delete (disjoint partial/whole fragment sets over the `0..6`
universe) operations. `prop_append_delete_resolvers_agree` generates a
committing op plus 0-2 concurrent ops and asserts the shadow resolver
reaches the same verdict as the legacy oracle.

`ported_legacy_append_delete_conflicts` runs the Append/Delete rows of
the legacy `test_conflicts` matrix through the differential harness and
additionally pins each verdict to the class the legacy unit test
documented, catching a case where both resolvers drift together.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…amework done

Phase 5 verification: conflict_resolver.rs unmodified, workspace clippy
clean, transaction + conflict_resolver test suites green.

Closes lance-format#6892

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the `ReplaceFragmentColumns` action (spike lance-format#6448 §4 A4) for the
DataReplacement vertical slice of milestone 11.

The payload carries `{ fragment_id, field_ids, new_data_files,
updated_row_offsets }`. `apply` swaps the matching data files on the
target fragment — appending instead when filling an all-NULL column —
and invalidates the coverage bitmap of every index over a replaced
field, preserving the legacy `DataReplacement` index-pruning behavior
under a translation that emits only `ReplaceFragmentColumns`. `validate` enforces the same-layout rule
against `&Manifest`: each new file must align with the fragment's
current layout by field set + file version, or be a disjoint all-NULL
add.

`reads`/`writes`/`validate` dispatch and the dead `conflicts_with`
predicate get their arms; `rebase` is deferred. The index-invalidation
loop is extracted into a shared `invalidate_index_coverage` helper used
by both `ReplaceFragmentColumns` and `InvalidateIndexCoverage`.

Translation, `build_manifest` routing, and conflict resolution follow in
later commits.

Issue lance-format#6842

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
wjones127 and others added 24 commits May 22, 2026 13:57
…ctions

A stable-row-id compaction group can produce more fragments than it
consumes (splitting a large deletion-heavy fragment) or none at all
(a fully-deleted fragment). The `IndexCoverageRemap::Pairs` 1:1 swap
could not express that, so `rewrite_to_actions` declined such a
rewrite and the cutover would have regressed it onto the legacy arm.

Replace `Pairs` with `RewriteGroups`, a per-group remap that stores
each group's old and new fragment ids in full. `apply` performs the
all-or-nothing recalculation `recalculate_fragment_bitmap` does:
an index covering all of a group's old ids has them swapped for all
the new ids; partial coverage is rejected as a split of indexed and
non-indexed data. This expresses arbitrary group cardinality, so the
translation no longer declines (closes audit gap D-R3).
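The all-or-nothing remap can be sketched as below. `RewriteGroup` and `remap_index_coverage` are hypothetical names, the index coverage is modelled as a plain set of fragment ids, and errors are strings; the point is only the per-group rule (swap on full coverage, skip on none, reject on partial).

```rust
use std::collections::BTreeSet;

/// One compaction group: the fragments it consumed and the ones it produced.
pub struct RewriteGroup {
    pub old_ids: Vec<u64>,
    pub new_ids: Vec<u64>,
}

pub fn remap_index_coverage(
    coverage: &BTreeSet<u64>,
    groups: &[RewriteGroup],
) -> Result<BTreeSet<u64>, String> {
    let mut remapped = coverage.clone();
    for group in groups {
        let covered = group.old_ids.iter().filter(|&id| coverage.contains(id)).count();
        if covered == 0 {
            // The index never covered this group; nothing to swap.
            continue;
        }
        if covered != group.old_ids.len() {
            return Err("group mixes indexed and non-indexed fragments".to_string());
        }
        // Full coverage: swap all old ids for all new ids, whatever the counts.
        for id in &group.old_ids {
            remapped.remove(id);
        }
        remapped.extend(group.new_ids.iter().copied());
    }
    Ok(remapped)
}
```

Because the swap is set-for-set, a group may produce more fragments than it consumed, or none at all.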

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… Rewrite

Add `m_by_m_stable_rewrite`, the rewrite differential matrix replayed on
a stable-row-id base and including a group that splits one fragment into
two. A stable-row-id rewrite translates to `RewriteFragments` plus a
`RebindIndexCoverage`; the matrix confirms the per-group remap routes a
`|new| > |old|` group through the action resolver in agreement with the
legacy oracle. Before D-R3 this rewrite hit `NotSupported` in
`ActionRebase::try_new`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Merge

Add `recast_merge_op` and include it in `m_by_m_schema` and
`m_by_m_stable_merge`. An `alter_columns` data-type cast emits an
`Operation::Merge` that recasts a column under a fresh field id and
drops its prior data file — not a pure column add, so it translates
through `merge_recast_columns` to `[AddFields, ChangeSchema]` (plus a
`RefreshRowVersionMetadata` on a stable-row-id base). The matrices
confirm that translation resolves in agreement with the legacy oracle
(audit gap D-M1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…atrix confirmed

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `ColumnReplacement::Tombstone` action re-derived the post-rewrite file
layout in `apply`, dropping data files left fully tombstoned. But the two
`RewriteColumns` producers disagree: `FileFragment::update_columns` drops
emptied files, while `merge_insert::update_fragments` tombstones moved fields
in place and keeps the emptied file as a `[-2,-2]` entry. Re-derivation could
only match one, so a partial-schema `merge_insert` `Update` declined
translation and would hard-fail (`NotSupported`) once the commit path routes
through the action resolver.

`Tombstone` now carries the fragment's complete post-rewrite file list and
`apply` installs it verbatim — producer-agnostic, and exactly what the legacy
`build_manifest` arm does. `tombstone_column_diff` is removed; the two
shape-decline tests (dropped-file, relayout) become positive translate-verbatim
tests since those shapes now translate.

Prep for the lance-format#6454 action-resolver cutover.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…write group

A compaction Rewrite group whose old fragments are straddled by a
concurrently created index (some covered, some not) would, once the
group is spliced out, leave that index covering a mix of indexed and
non-indexed data — the broken state load_indices rejects. The legacy
check_rewrite_txn reports this as a retryable conflict; the action-level
resolver was missing the check.

Add FragmentCoverage::straddles_fragments and an AddIndex arm in
rebase_rewrite_fragments. Widen RewriteFragments.reads() to include
INDICES so the mask gate invokes rebase when the only intervening
writes are to indices (an AddIndex writes only INDICES). Full overlap
remains a commute, matching the legacy deferred-remap rule.
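A possible reading of the straddle check is sketched below. The `Coverage` enum and the method signature are assumptions, as is the choice that an `All` coverage never straddles (it is taken to cover every old fragment of the group, which is the full-overlap case).

```rust
use std::collections::BTreeSet;

pub enum Coverage {
    All,
    Bitmap(BTreeSet<u64>),
}

impl Coverage {
    /// True when the index covers some, but not all, of the group's old fragments.
    pub fn straddles(&self, group_old_ids: &[u64]) -> bool {
        match self {
            // Assumption: `All` covers every old fragment, so it is full overlap.
            Coverage::All => false,
            Coverage::Bitmap(ids) => {
                let covered = group_old_ids.iter().filter(|&id| ids.contains(id)).count();
                covered > 0 && covered < group_old_ids.len()
            }
        }
    }
}
```

Under this reading, a straddle maps to the retryable conflict, while zero or full overlap lets the rewrite proceed.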

Closes Phase 2a item R2 on issue lance-format#6454.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…an for R3 logged

Re-ran the swap diagnostic this iteration. Under the swap, only R3 and R4
fail; R5 passes (likely closed by iter 5's R1 tombstone fix). R4 is an
already-documented accepted divergence and only needs a test rewrite at
swap time. R3 is the only resolver-logic gap left: the legacy resolver
smart-merges frag-reuse index version lists on a cleanup×compaction rebase,
whereas the action resolver rejects the collision. Logged the implementation
plan (eager-embed the FRI version list, replace the reject arms with a merge,
add a post-driver hook for the rewritten details file). No production code
changed.
…ayload field

Adds Option<Vec<FragReuseVersion>> field to AddIndex and RegisterFragReuse
payloads, defaulted None at every construction site. Mechanical setup for
R3 — embedding the FRI version list eagerly so the rebase rule can
smart-merge against a concurrent registration without async object-store
I/O. Population logic (async enrichment step), the smart-merge rebase
arms, and the post-driver materialisation hook follow in the next
iteration.

345 dataset::transaction + 60 io::commit tests green; clippy clean on
-p lance.
…lpers + tests

Adds the two pure sync helpers that mirror the legacy
finish_create_index/finish_rewrite frag-reuse-index smart merges:

- merge_cleanup_with_concurrent_compactions: cleanup absorbs each
  concurrent compaction's max FragReuseVersion.
- merge_compaction_with_concurrent_cleanups: compaction filters its
  versions by the max cleanup cutoff (empty cleanup = keep only own max).

Both operate on Vec<FragReuseVersion> only, so they can be invoked from
sync Action::rebase. 8 new unit tests cover identity, max-of-cutoffs,
empty-cleanup, and the invalid-input internal-error cases. Marked
#[allow(dead_code)] with a pointer at the next iteration's wiring into
rebase_add_index/rebase_register_frag_reuse + async enrichment +
post-driver materialisation hook.

19/19 action.rs tests green; clippy clean on -p lance --tests --benches.
…ge into rebase

Replace the two reject-on-collision arms in `rebase_add_index` and
`rebase_register_frag_reuse` (frag-reuse cleanup vs concurrent compaction,
and vice versa) with the smart-merge helpers landed in step 2. Both
rebase fns now take `&mut` payloads so the merged version list can be
written back onto the action. When either side lacks an embedded
`frag_reuse_versions` payload the rebase falls back to the legacy
retryable-conflict reject. `RegisterFragReuse` vs `RegisterFragReuse`
still rejects — two parallel compactions are out of scope for these
helpers. Adds five unit tests for the wiring.

End-to-end tests still fail until the eager translation step populates
the payload and a post-driver hook materialises the merged details file.

Refs lance-format#6454.
…enrichment

Adds ActionRebase::enrich_frag_reuse_payload(&Dataset) plus the standalone
helper enrich_action_frag_reuse_payload. Walks both the committing
transaction's UserOperation actions (via new UserOperation::actions_mut)
and the intervening_actions; for each AddIndex of the frag-reuse system
index and each RegisterFragReuse with no payload, loads the
FragReuseIndexDetails via load_frag_reuse_index_details and stores the
version list on the action. Already-Some payloads and non-frag-reuse
AddIndexes are skipped — re-enrichment on retry is a no-op.

This is the eager translation step the iter-10 smart-merge wiring needs:
rebase_add_index and rebase_register_frag_reuse smart-merge only when
both sides carry payloads; without this step they fall back to the
legacy reject-as-retryable arm. Next iteration wires this call into
commit_transaction and adds the post-driver materialisation hook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…terialisation hook

Adds `materialise_frag_reuse_details_files` and `materialise_action_frag_reuse_details`
helpers in `dataset::transaction::conflict`. Walks an action list and, for
each `AddIndex`/`RegisterFragReuse` of the frag-reuse system index whose
`frag_reuse_versions` payload was populated by the smart-merge rebase rule,
rebuilds a `FragReuseIndexDetails` from the list, writes a fresh details
file via `build_frag_reuse_index_metadata`, and replaces the action's
`IndexMetadata`. Pure helper — not yet wired into `commit_transaction`;
that wiring lands with the Phase 2 cutover.

Two integration tests cover the rewrite path (mutated version list →
fresh uuid + index_details + on-disk details file matches mutated list)
and the no-op paths (None payload, non-frag-reuse `AddIndex`).

Refs lance-format#6454.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… ActionRebase::finish

`ActionRebase::finish` now takes `dataset: Option<&Dataset>` and calls
`materialise_frag_reuse_details_files` on the rebased action list before
the reduction to `Operation`. The smart-merge rebase rules added in step
3 mutate the embedded `frag_reuse_versions` list on `AddIndex` /
`RegisterFragReuse`; the materialisation hook rebuilds the
`FragReuseIndexDetails` from the merged list and writes a fresh details
file via `build_frag_reuse_index_metadata` so the resulting
`IndexMetadata` points at it. Tests pass `None` when no frag-reuse
action is touched; production (next iter) passes `Some(&dataset)`.

`#[allow(dead_code)]` removed from the three materialisation helpers
(they are load-bearing now).
…egressions logged

Attempted the cutover swap in `commit_transaction` (TransactionRebase →
ActionRebase + enrich + finish(Some(&ds))). Compiles clean, but 14 lib
tests regress. Categorised the regressions into 5 clusters (R3
reconstruction, cleanup, mem-wal, UpdateBases, index_section) and one
already-accepted divergence (R4). Each cluster gets its own follow-up
fix commit before the swap re-lands.

Swap reverted; tree green. Diagnostic-only.
`ActionRebase::finish` previously round-tripped the rebased actions through
`operation_from_actions`. That fails for `Operation::Rewrite` (whose
`RewriteFragments` action drops the full `Fragment` payloads) and for
`CreateIndex` with `removed_indices` (whose `RemoveIndex` keeps only a uuid)
— so the frag-reuse smart-merge rebase regresses the corresponding
`test_concurrent_cleanup_and_compaction_rebase_*` tests under the call-site
swap.

Add `splice_rebased_into_operation` that, for those two shapes, carries the
original operation's structure and only threads in the rebase-mutated
`IndexMetadata` — the trailing `RegisterFragReuse.frag_reuse_index` for a
`Rewrite`, the per-uuid `AddIndex.index` for a `CreateIndex`. The `Rewrite`
arm bails to the existing `operation_from_actions` path if any non-
`RegisterFragReuse` action changed, so a rebase that also splices fragments
out of a `RewriteFragments` does not silently misreconstruct. Five unit
tests cover the happy paths, the bail, and the unhandled-op pass-through.

Refs lance-format#6454.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CreateIndex arm of `splice_rebased_into_operation` paired rebased
`AddIndex` actions to the original `new_indices` entries by uuid. That
broke under the frag-reuse smart-merge path: `materialise_frag_reuse_details_files`
writes a fresh details file under a brand-new uuid, so the rebased
`AddIndex.index.uuid` no longer matches the pre-rebase entry — splice
returned `None` and the cleanup-vs-compaction rebase test failed with
"rebased CreateIndex transaction cannot be translated back to an operation".

Pair positionally instead: translation emits `[RemoveIndex×N, AddIndex×M]`
in `new_indices` order, and rebase preserves that order. New unit test
`create_index_pairs_rebased_add_by_position_not_uuid` covers the case.
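
The positional pairing described above can be sketched as follows. This is a minimal illustration, not the actual `splice_rebased_into_operation` code: `IndexMeta`, the two-variant `Action` enum, and `pair_by_position` are hypothetical stand-ins that model only the fields needed here.

```rust
// Hypothetical stand-in for the real index metadata type.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct IndexMeta {
    uuid: u128,
    name: String,
}

// Hypothetical stand-in modelling only the two index actions.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Action {
    RemoveIndex { uuid: u128 },
    AddIndex { index: IndexMeta },
}

/// Pair each original `new_indices` entry with the rebased `AddIndex`
/// payload at the same position, ignoring uuids entirely. Returns `None`
/// when the counts differ so the caller can fall back rather than mis-pair.
fn pair_by_position(
    original: &[IndexMeta],
    rebased: &[Action],
) -> Option<Vec<(IndexMeta, IndexMeta)>> {
    // Keep only the AddIndex payloads, in the order the rebase left them.
    let adds: Vec<&IndexMeta> = rebased
        .iter()
        .filter_map(|action| match action {
            Action::AddIndex { index } => Some(index),
            Action::RemoveIndex { .. } => None,
        })
        .collect();
    if adds.len() != original.len() {
        return None;
    }
    Some(
        original
            .iter()
            .cloned()
            .zip(adds.into_iter().cloned())
            .collect(),
    )
}
```

Pairing by position tolerates a rebased entry whose uuid changed during materialisation, which is the case the uuid-keyed lookup could not handle.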

Refs lance-format#6454.
…or stale FRI singleton

In the cleanup→rebased-against-concurrent-compactions path, `RemoveIndex`
matches only by UUID. When concurrent compactions rewrote the frag-reuse
index to a fresh UUID, the cleanup's `RemoveIndex(original_FRI_uuid)`
became a no-op and the concurrent FRI survived alongside the merged one
— tripping `detect_overlapping_fragments` plus a `migrate_indices`
debug assert that expects single-field indices.

`splice_rebased_into_operation` now takes `latest_indices` and, when a
spliced `new_indices` entry is a system index (FRI / MEM_WAL), extends
`removed_indices` with same-name entries from `latest_indices` whose
UUID isn't already covered. Mirrors legacy `finish_create_index`
(`io/commit/conflict_resolver.rs:1568-1585`). Two new unit tests cover
the extend behaviour and the no-self-cancel invariant.
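
A rough sketch of the extend step, under stated assumptions: `IndexMeta`, the two name constants, and `extend_removed_with_stale` are hypothetical, the system-index check is reduced to a name comparison, and "no-self-cancel" is read here as "never schedule the index being added for removal".

```rust
// Hypothetical stand-in for the real index metadata type.
#[derive(Debug, Clone, PartialEq)]
struct IndexMeta {
    uuid: u128,
    name: String,
}

// Placeholder names; the real reserved names may differ.
const FRAG_REUSE_NAME: &str = "frag_reuse_placeholder";
const MEM_WAL_NAME: &str = "mem_wal_placeholder";

// Assumption: a system index is recognised by its reserved name.
fn is_system_index(index: &IndexMeta) -> bool {
    index.name == FRAG_REUSE_NAME || index.name == MEM_WAL_NAME
}

/// For every system index being added, also remove any same-name entry in
/// the latest manifest that is not already listed for removal.
fn extend_removed_with_stale(
    new_indices: &[IndexMeta],
    removed: &mut Vec<IndexMeta>,
    latest: &[IndexMeta],
) {
    for new in new_indices.iter().filter(|i| is_system_index(i)) {
        for stale in latest.iter().filter(|l| l.name == new.name) {
            // Never remove the entry we are adding (no self-cancel).
            let is_self = stale.uuid == new.uuid;
            // Skip uuids that the removal list already covers.
            let covered = removed.iter().any(|r| r.uuid == stale.uuid);
            if !is_self && !covered {
                removed.push(stale.clone());
            }
        }
    }
}
```

With this shape, a concurrent frag-reuse index written under a fresh uuid is removed by name even though no `RemoveIndex` names its uuid.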

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cted (5 of 14 pre-existing)

Iter 17: re-applied the call-site swap to verify iter-16's splice fix,
ran the iter-13 regression list against both swapped HEAD and clean
baseline as control. R3 (cleanup + compaction) confirmed closed. Five
of the iter-13 "regressions" actually fail on baseline without the
swap (pre-existing, unrelated). Net swap regressions reduce to 5 tests
in 3 clusters (B1 mem_wal CreateIndex, B2 add_bases wording,
B3 R4 accepted divergence). Logged in Phase 2b/2c; swap reverted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Unbound

The decline audit said `CreateIndex` without a `fragment_bitmap` only
occurs for legacy on-disk indices. That was wrong: the MemWAL system
index sets `fragment_bitmap: None` by design (it's inline and covers
no data fragments), so `create_index_to_actions` declined every MemWAL
init and the action resolver would hard-fail with NotSupported under
the planned cutover.

Add a third `FragmentCoverage` variant — `Unbound` — for indices not
bound to any fragments. `resolve` returns `None` for it (apply keeps
`IndexMetadata.fragment_bitmap = None`); `intersects_fragments` and
`straddles_fragments` return false. `create_index_to_actions` emits
`Unbound` when the index has no bitmap and is a system index per
`lance_index::is_system_index`; regular column indices without a
bitmap still decline.
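
The behaviour of the new variant might look roughly like this. The commit does not describe the two existing variants, so a single placeholder variant `Fragments` stands in for them; `coverage_for` is a hypothetical helper that mirrors the decision in `create_index_to_actions`, and the bitmap is modelled as a `BTreeSet<u32>`.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, PartialEq)]
enum FragmentCoverage {
    // Placeholder for the existing fragment-bound variants (assumption).
    Fragments(BTreeSet<u32>),
    // Index not bound to any data fragments, e.g. the MemWAL system index.
    Unbound,
}

impl FragmentCoverage {
    /// `None` means apply leaves `fragment_bitmap` unset.
    fn resolve(&self) -> Option<BTreeSet<u32>> {
        match self {
            Self::Fragments(ids) => Some(ids.clone()),
            Self::Unbound => None,
        }
    }

    /// An unbound index never intersects any set of touched fragments.
    fn intersects_fragments(&self, touched: &BTreeSet<u32>) -> bool {
        match self {
            Self::Fragments(ids) => !ids.is_disjoint(touched),
            Self::Unbound => false,
        }
    }
}

/// Returns `None` to signal a declined translation.
fn coverage_for(
    bitmap: Option<BTreeSet<u32>>,
    is_system_index: bool,
) -> Option<FragmentCoverage> {
    match (bitmap, is_system_index) {
        (Some(ids), _) => Some(FragmentCoverage::Fragments(ids)),
        (None, true) => Some(FragmentCoverage::Unbound),
        // Regular column index without a bitmap: still declined.
        (None, false) => None,
    }
}
```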

Issue lance-format#6454, Phase 2b regression B1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Production commit path now goes through ActionRebase (action-level
resolver) instead of TransactionRebase. Closes the two test-only
regressions surfaced by the iter-17 swap survey:

* B2 — broaden `test_concurrent_add_bases_{name,path}_conflict` to
  accept the action resolver's wording ("collide on id, name, or
  path") alongside the legacy "incompatible with concurrent
  transaction". Same outcome (commit fails), different message.
* B3/R4 — rewrite `test_delete_execute_uncommitted_preserves_affected_rows_for_rebase`
  to assert `Error::RetryableCommitConflict`, reflecting the accepted
  divergence in design doc §5: file-level deletion-vector auto-merge
  is out of scope for the manifest-only `Action::rebase`.
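
The broadened B2 assertion amounts to accepting either message. A minimal sketch, where `is_expected_add_bases_conflict` is a hypothetical helper and the two substrings are the ones quoted above:

```rust
/// True when a commit error message matches either the action resolver's
/// wording or the legacy resolver's wording for an add-bases conflict.
fn is_expected_add_bases_conflict(message: &str) -> bool {
    message.contains("collide on id, name, or path")
        || message.contains("incompatible with concurrent transaction")
}
```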

The `affected_rows` parameter on `commit_transaction` becomes
`_affected_rows`; the deletion-vector auto-merge it fed is gone but
the parameter stays to keep the call signature stable until the legacy
resolver plumbing comes out.

The remaining checklist items (delete TransactionRebase + check_*_txn,
delete the differential test harness, delete legacy build_manifest
arms and the vestigial Action::conflicts_with) follow in later
commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The differential matrix in dataset/transaction/conflict.rs ran every input
pair through both the legacy TransactionRebase oracle and the new
ActionRebase and asserted agreement. With the cutover landed in iter 7
the legacy oracle is no longer the production path, so the harness's
purpose is satisfied.

Delete `mod differential` (2428 lines) and promote its inner `mod splice`
to a sibling of `mod resolver` with inlined `index_meta` / `rewrite_op`
helpers. The 8 splice tests are permanent coverage of
`splice_rebased_into_operation` and stay green.

Also fixes 5 pre-existing clippy redundant-clone / `&[x.clone()]`-to-
`from_ref` errors the splice tests inherited from their old home.

Refs lance-format#6454.
The action-based resolver (ActionRebase) has been the production commit
path since iter 7 of lance-format#6454; remove the now-dead legacy resolver.

- Delete rust/lance/src/io/commit/conflict_resolver.rs (3655 lines:
  TransactionRebase + check_*_txn + finish_* + internal unit tests).
- Drop the module from rust/lance/src/io/commit.rs.
- Remove four Operation helpers that only served check_*_txn:
  get_upsert_config_keys, get_delete_config_keys,
  modifies_same_metadata, upsert_key_conflict, plus the test that
  exercised modifies_same_metadata. Operation::update_maps_conflict
  stays — config_actions_conflict in action.rs still uses it.
- Fix doc-comment intra-doc links in conflict.rs that pointed at the
  deleted TransactionRebase type.

Progress on lance-format#6454.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The conflict resolver swap (iter 7) made `Action::conflicts_with` dead
production code — the live conflict rules now sit in `reads`/`writes` +
`rebase`, evaluated by `ActionRebase` against the manifest mask. The
predicate stayed only because of its 14 unit tests.

Delete:
- `Action::conflicts_with` and its 220-line doc-commented match,
- `index_actions_conflict` and `config_actions_conflict` (only called
  by `conflicts_with`),
- `UpdateDeletionVector::rows_disjoint_from` (only called by the
  index helper),
- the 14 unit tests that exercise it directly,
- three stale doc-comment links.

`fields_overlap` and `bases_conflict` remain — both still feed live
`rebase_*` paths.

Refs lance-format#6454
The legacy `Operation::Append` arm in `build_manifest` was a redundant
duplicate of `build_manifest_via_actions`: `actions_from_operation`
already translates `Append` to `[AddFragments]`, and `build_manifest_via_actions`
already stamps row ids and version metadata on every new fragment whose
`row_id_meta` is `None`. Add `Append` to the routed-set match and make
the legacy arm an `internal` "router was bypassed" error, matching the
treatment of the other always-routed operations
(`ReserveFragments`/`UpdateBases`/`UpdateMemWalState`/`Project`/`DataReplacement`).

Concludes the lance-format#6454 work: the legacy `TransactionRebase` resolver and
the vestigial `Action::conflicts_with` were deleted in earlier
iterations (iters 9, 10); this commit removes the last always-hit
legacy `build_manifest` arm. The remaining per-operation arms
(`Delete`, `Update`, CREATE-mode `Overwrite`, `Rewrite`, `CreateIndex`,
`UpdateConfig`, `Merge`) stay as documented fallbacks for declined
translations and the no-prior-manifest CREATE path — per the iteration-2
exhaustive decline audit they are unreachable or error→error for valid
production transactions.

Closes lance-format#6454
@github-actions github-actions Bot added the enhancement New feature or request label May 23, 2026
@github-actions
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

wjones127 and others added 4 commits May 22, 2026 21:58
…ressions

Iter 1 of the QA loop:

- `cargo clippy --all --tests --benches -- -D warnings`: clean.
- `cargo fmt --all`: clean.
- `cargo test -p lance --lib dataset::transaction`: 324/324 green.
- `cargo test -p lance --lib`: 1858 passed, 9 failed. 7 match the
  `issue-6454.md` Phase 2c pre-existing baseline list. The remaining 2 are
  newly-identified swap regressions logged as F1 in `.ralph/checklists/qa.md`
  for iter 2:
    * `test_create_index_rebase_against_update_mem_wal_state` — the action
      resolver rejects a MemWAL `AddIndex` rebased against a concurrent
      `UpdateMergedGenerations` instead of smart-merging the generations.
    * `test_update_mem_wal_state_against_create_index_lower_generation` —
      the action resolver silently commutes an `UpdateMemWalState` with a
      lower generation past a concurrent `CreateIndex` of MemWalIndex instead
      of rejecting with `IncompatibleTransaction`.

Drops `proptest` from `rust/lance` dev-deps — added in b30adc4 for the
differential property tests, which were removed in 2dd0d8d. `grep -rn
proptest rust/lance/` confirms no remaining references.
…teMemWalState

The action-resolver swap left two MemWAL bidirectional smart-merges as
hard rejects, regressing against the legacy `check_create_index_txn`
MemWAL branch:

- `rebase_add_index` now folds each concurrent `UpdateMergedGenerations`'s
  generations into the new MemWAL index's inline `MemWalIndexDetails`
  (higher generation per shard), instead of rejecting.
- `rebase_update_merged_generations` now also consults concurrent
  `AddIndex` actions for the MemWAL system index, applying the same rule
  it already applied against concurrent `UpdateMergedGenerations`:
  `committed >= ours` yields `IncompatibleTransaction`, and
  `committed < ours` yields a retryable conflict.
- `splice_rebased_into_operation` now handles `CreateIndex` with empty
  `removed_indices`, so the smart-merged MemWAL `CreateIndex` picks up
  the stale concurrent MemWAL entry by name dedup and the manifest
  ends with exactly one MemWAL entry.

Unlike the frag-reuse smart-merge, MemWAL details live inline in the
protobuf Any so no async post-driver materialisation hook is required;
the merge is a pure in-memory rewrite of `index_details`.
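
The two rules above can be sketched in isolation. Everything here is an assumption made for illustration: shards are keyed by a `u32`, generations are `u64`, and `merge_generations`, `classify_update`, and `RebaseError` are hypothetical names rather than the real Lance types.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum RebaseError {
    IncompatibleTransaction,
    RetryableCommitConflict,
}

/// Fold committed generations into ours, keeping the higher generation
/// per shard. Shards seen only on the committed side are carried over.
fn merge_generations(ours: &mut HashMap<u32, u64>, committed: &HashMap<u32, u64>) {
    for (&shard, &generation) in committed {
        let slot = ours.entry(shard).or_insert(generation);
        *slot = (*slot).max(generation);
    }
}

/// Classify an update that lost the race on the same shard: a committed
/// generation at or above ours is incompatible, a lower one is retryable.
fn classify_update(ours: u64, committed: u64) -> RebaseError {
    if committed >= ours {
        RebaseError::IncompatibleTransaction
    } else {
        RebaseError::RetryableCommitConflict
    }
}
```

Because the merge only touches in-memory maps, it needs no I/O, which matches the note that no async materialisation hook is required for MemWAL.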

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
clean-diff sub-agent: empty. understandability: 8 low/medium findings;
only F7 acted on (added a justification comment on the `.unwrap()` in
`merge_compaction_with_concurrent_cleanups`: the preceding `.min()?` on
the same iterator already guarantees it is non-empty). The other 7 are
doc-polish recommendations, recorded with triage. test-coverage:
16 gaps, deferred to iter 4 for verification before adding tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rebase

rebase_rewrite_fragments already rejected when a concurrent
UpdateDeletionVector or ReplaceFragmentColumns touched a fragment in the
rewrite group, but the reverse rebase was not symmetric: when the rewrite
committed first, the UDV / column replacement passed rebase and then failed
at apply with a non-retryable "fragment id not present" internal error.
Add the missing Action::RewriteFragments match arms to both helpers so the
losing side gets a principled RetryableCommitConflict and retries. The bug
applied to both preserves_row_ids = false and = true rewrites.
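
The missing reverse check can be illustrated with a small sketch. The `Action` enum, its field names, and `rebase_update_deletion_vector` are hypothetical simplifications; the real helpers take richer arguments.

```rust
#[derive(Debug, PartialEq)]
enum RebaseOutcome {
    Ok,
    RetryableCommitConflict,
}

// Hypothetical, trimmed-down action set.
#[allow(dead_code)]
enum Action {
    RewriteFragments { old_fragment_ids: Vec<u64> },
    UpdateDeletionVector { fragment_id: u64 },
    AddFragments,
}

/// Rebase a deletion-vector update for `fragment_id` against actions that
/// committed first. If a committed rewrite consumed the fragment, report a
/// retryable conflict instead of letting apply fail on a missing fragment.
fn rebase_update_deletion_vector(fragment_id: u64, committed: &[Action]) -> RebaseOutcome {
    for action in committed {
        if let Action::RewriteFragments { old_fragment_ids } = action {
            if old_fragment_ids.contains(&fragment_id) {
                return RebaseOutcome::RetryableCommitConflict;
            }
        }
    }
    RebaseOutcome::Ok
}
```

The same arm would apply to the column-replacement helper, so whichever side commits second fails in the same retryable way.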

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>