
extradoc: simple-table cell edits cause whole row to be deleted+recreated, inheriting bold from header row #64

@sripathikrishnan

Description


Repro

Test doc: https://docs.google.com/document/d/1YKyqqH8wZa3kSnoBEdlwAumI94gRivSsZB1qvc9y4CA (title: sample3)

The doc contains a simple 6-row markdown table with a bold header row:

| **Screen Reader** | **Responses** | **Share** |

Data rows (JAWS / NVDA / etc.) are plain text.

Edit applied to the pulled markdown (JAWS row only, no bold added):

  • Before: | JAWS | 853 | 49% |
  • After: | JAWS | 900 | 52% |

Push, then re-pull.

Expected

Only the two edited cells (853 -> 900, 49% -> 52%) update. The JAWS cell is untouched, and the whole row keeps its plain-text formatting.

Actual

After re-pull, the entire JAWS row is bold:

| **JAWS** | **900** | **52%** |

The unchanged JAWS cell has become bold too. This breaks the table's visual layout and silently mutates content that wasn't edited.

Investigation / root cause mechanism

The reconciler does have an in-place cell-content diff path — _diff_table in extradoc/src/extradoc/diffmerge/diff.py:1131 calls _diff_tables_structural (row/col inserts/deletes) and then iterates get_matched_rows to emit per-cell UpdateBodyContentOps for matched rows. So in principle, edits inside a cell should become deleteContentRange + insertText against the existing cell, not a row replacement.
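For comparison, here is roughly what the per-cell path would produce if the JAWS row were matched. This is a minimal sketch, not the code in diff.py. per_cell_updates and its (row, col, old_text, new_text) tuples are hypothetical stand-ins for UpdateBodyContentOp, and the sketch assumes matched rows have the same column count.

```python
from typing import List, Tuple

Row = List[str]


def per_cell_updates(
    matched: List[Tuple[int, int]], base: List[Row], desired: List[Row]
) -> List[Tuple[int, int, str, str]]:
    """Return (row, col, old_text, new_text) for every changed cell in matched rows.

    Hypothetical helper: assumes each matched pair has equal column counts.
    """
    updates = []
    for b_idx, d_idx in matched:
        for col, (old, new) in enumerate(zip(base[b_idx], desired[d_idx])):
            if old != new:  # unchanged cells produce no op at all
                updates.append((b_idx, col, old, new))
    return updates


base = [["Screen Reader", "Responses", "Share"], ["JAWS", "853", "49%"]]
desired = [["Screen Reader", "Responses", "Share"], ["JAWS", "900", "52%"]]
print(per_cell_updates([(0, 0), (1, 1)], base, desired))
```

With both rows matched, only the 853 and 49% cells get an update; the JAWS cell and the header row are never touched, so their styling is left alone.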

The bug is in the fuzzy row matcher. _fuzzy_lcs_indices in extradoc/src/extradoc/diffmerge/table_diff.py:188 matches rows by recall similarity over the sets of cell-text hashes, with match_threshold=0.5:

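```python
def _recall(b_set, d_set):
    # Fraction of the base row's cell-text hashes that also appear in the desired row.
    if not b_set:
        return 1.0  # an empty base row trivially matches
    return len(b_set & d_set) / len(b_set)
```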

For the JAWS row, base cell texts = {"JAWS", "853", "49%"} and desired cell texts = {"JAWS", "900", "52%"}. The intersection is {"JAWS"}, so recall = 1/3 ≈ 0.333, which is below the 0.5 threshold, and the row fails to match.
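The arithmetic can be checked directly. row_recall below is a hypothetical wrapper (not a function in the repo) that mirrors _recall on plain strings instead of hashes. It assumes a row matches when recall >= 0.5; the issue only establishes that values below 0.5 fail. A single-cell edit is included for contrast.

```python
MATCH_THRESHOLD = 0.5  # threshold quoted from table_diff.py in this issue


def row_recall(base_cells, desired_cells):
    """Mirror of _recall, using raw cell strings in place of cell-text hashes."""
    b_set, d_set = set(base_cells), set(desired_cells)
    if not b_set:
        return 1.0
    return len(b_set & d_set) / len(b_set)


base = ["JAWS", "853", "49%"]
for desired in (["JAWS", "900", "52%"],   # the edit from this issue: 2 of 3 cells changed
                ["JAWS", "900", "49%"]):  # hypothetical 1-cell edit, for contrast
    r = row_recall(base, desired)
    print(f"{desired}: recall={r:.3f} matched={r >= MATCH_THRESHOLD}")
```

This prints recall=0.333 matched=False for the issue's edit and recall=0.667 matched=True for the single-cell edit.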

Consequence:

  1. diff_tables treats the base JAWS row as deleted and the desired JAWS row as a new insert (table_diff.py:325-331).
  2. DeleteTableRowOp is emitted, followed by InsertTableRowOp with new_cell_texts=["JAWS", "900", "52%"] (table_diff.py:454-526).
  3. At lowering time (extradoc/src/extradoc/reconcile_v3/lower.py:626, 3552), this becomes an insertTableRow request, which creates an empty row. Per Google Docs API behavior, cells in a newly inserted row inherit text style from the adjacent row, which in this doc is the bold header row. insertText then populates the empty cells, and the inserted text picks up the inherited bold styling.
  4. Because the JAWS row is no longer a matched row, no UpdateBodyContentOp is emitted for it, so nothing clears the inherited bold.
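The steps above can be reproduced with a small stand-alone model of the matcher. This is a hypothetical re-implementation, not _fuzzy_lcs_indices itself: it runs a textbook LCS dynamic program in which two rows count as equal when recall >= 0.5, and it uses only the header and JAWS rows from the issue. Unpaired base rows are the ones that would become DeleteTableRowOps; unpaired desired rows would become InsertTableRowOps.

```python
from typing import List, Tuple

Row = List[str]
MATCH_THRESHOLD = 0.5  # threshold quoted in this issue


def recall(base_row: Row, desired_row: Row) -> float:
    b, d = set(base_row), set(desired_row)
    return 1.0 if not b else len(b & d) / len(b)


def fuzzy_lcs_pairs(base: List[Row], desired: List[Row]) -> List[Tuple[int, int]]:
    """LCS over rows, where two rows are 'equal' if recall >= MATCH_THRESHOLD."""
    n, m = len(base), len(desired)
    # dp[i][j] = size of the best order-preserving matching of base[i:] and desired[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if recall(base[i], desired[j]) >= MATCH_THRESHOLD:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table to recover the matched (base_idx, desired_idx) pairs.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if recall(base[i], desired[j]) >= MATCH_THRESHOLD:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


header = ["Screen Reader", "Responses", "Share"]
base = [header, ["JAWS", "853", "49%"]]
desired = [header, ["JAWS", "900", "52%"]]

pairs = fuzzy_lcs_pairs(base, desired)
deleted = [i for i in range(len(base)) if i not in {b for b, _ in pairs}]
inserted = [j for j in range(len(desired)) if j not in {d for _, d in pairs}]
print(pairs, deleted, inserted)
```

Only the header pairs up; base row 1 lands in deleted and desired row 1 in inserted, which is the delete-row + insert-row pair described in steps 1 and 2.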

In summary: any edit that changes 2 of the 3 cells in a 3-column row (more generally, any edit where fewer than 50% of the base row's cell-text hashes survive) silently falls off the fuzzy-LCS match path and is rewritten as a delete-row + insert-row pair. The inserted row then inherits styling from whatever row happens to be adjacent in the base table.
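The threshold arithmetic generalizes. Assuming the cells in a row have distinct texts and no edited cell happens to equal another cell's text (so the set intersection just counts unchanged cells), a k-column row with m edited cells gives:

```math
\operatorname{recall}(B, D) = \frac{|B \cap D|}{|B|} = \frac{k - m}{k},
\qquad
\frac{k - m}{k} < \frac{1}{2} \iff m > \frac{k}{2}
```

For k = 3, a single edit gives 2/3 and the row stays matched, while two edits give 1/3 and the row is rewritten. If the comparison is >= (an assumption), a row with exactly half its cells edited, such as 2 of 4, would still match at 0.5.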

Notes for a fix (not in scope for this issue)

Options worth considering:

  • Lower / remove the fuzzy LCS recall threshold for simple tables, or fall back to positional row matching when row counts are unchanged on both sides.
  • When emitting InsertTableRowOp, explicitly clear text style on the newly-created cells before running insertText, so they don't inherit from the adjacent row.
  • Prefer emitting per-cell UpdateBodyContentOps over row delete+insert whenever base and desired row counts are equal — the structural change is not needed.
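A minimal sketch of the positional fallback from the first and third options, under stated assumptions: match_rows is a hypothetical helper, and its fuzzy argument stands in for the existing _fuzzy_lcs_indices, whose real signature may differ.

```python
from typing import Callable, List, Tuple

Row = List[str]
Matcher = Callable[[List[Row], List[Row]], List[Tuple[int, int]]]


def match_rows(base: List[Row], desired: List[Row], fuzzy: Matcher) -> List[Tuple[int, int]]:
    """Pair rows by position when row counts agree; otherwise defer to a fuzzy matcher.

    Hypothetical helper: 'fuzzy' is a placeholder for the real fuzzy-LCS matcher.
    """
    if len(base) == len(desired):
        # Equal row counts: treat every row as kept in its slot, so cell edits
        # flow through the per-cell update path instead of row delete+insert.
        return [(i, i) for i in range(len(base))]
    return fuzzy(base, desired)


base = [["Screen Reader", "Responses", "Share"], ["JAWS", "853", "49%"]]
desired = [["Screen Reader", "Responses", "Share"], ["JAWS", "900", "52%"]]
print(match_rows(base, desired, fuzzy=lambda b, d: []))
```

One trade-off: equal row counts do not guarantee that no rows moved (one deletion plus one insertion elsewhere keeps the count), so positional pairing would rewrite every shifted row cell by cell. The text still ends up correct, but per-row styling stays attached to positions rather than following the moved rows.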

Environment

  • main @ f252136
  • File: extradoc/src/extradoc/diffmerge/table_diff.py (_fuzzy_lcs_indices, threshold 0.5)
  • File: extradoc/src/extradoc/diffmerge/diff.py:1131 (_diff_table)
  • File: extradoc/src/extradoc/reconcile_v3/lower.py:3552 (insertTableRow lowering)

Metadata


  • Assignees: no one assigned
  • Labels: bug (Something isn't working)
  • Type: no type
  • Projects: no projects
  • Milestone: no milestone
  • Relationships: none yet
  • Development: no branches or pull requests
