fix(paste): detect heading levels from Google Docs styled paragraphs#2178

Open
ErickPetru wants to merge 2 commits into superdoc-dev:main from ErickPetru:fix/google-docs-heading-detection

@ErickPetru ErickPetru commented Feb 25, 2026

Problem

When pasting content from Google Docs, H1 and H3-H6 headings lose their heading level and paste as plain paragraphs. Only H2 is reliably preserved as a heading tag.

Closes #2152

SuperDoc with pasted Google Docs headings after the fix is applied

Cause

Google Docs converts most heading levels to <p> tags with inline font-size/font-weight styling instead of semantic <h1> to <h6> tags. ProseMirror's paragraph.parseDOM has rules for h1 to h6, but they never fire for these styled <p> elements, so the heading level is silently dropped.

Fix

  • Added convertStyledHeadings to the Google Docs paste pipeline, running before ProseMirror parses the pasted DOM
  • Detects <p> elements carrying bold + a recognised font-size (checked on both the <p> itself and its first child <span>, covering both Google Docs style placements) and replaces them with the corresponding <h1> to <h5> element
  • Font-size thresholds accommodate the default Google Docs theme and alternate themes (e.g. 20 pt and 24 pt both resolve to h1)
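
The detection rule in the list above can be sketched as a pure function over an inline style string. This is a hedged illustration, not the PR's actual convertStyledHeadings code: the names parseHeadingStyle, headingLevelFor and SIZE_TO_LEVEL are hypothetical, the bold cutoff of 700 is an assumption, and only the 20 pt and 24 pt entries come from the description. The other table rows are placeholders.

```typescript
// Hypothetical sketch; names and most thresholds are assumptions for illustration.
interface HeadingStyle {
  fontSizePt: number | null; // null when no pt font-size is present
  bold: boolean;
}

// Extract font-size (in pt) and boldness from an inline style string.
function parseHeadingStyle(style: string): HeadingStyle {
  const size = /(?:^|;)\s*font-size\s*:\s*([\d.]+)pt/i.exec(style);
  const weight = /(?:^|;)\s*font-weight\s*:\s*([a-z0-9]+)/i.exec(style);
  const rawSize = size?.[1];
  const rawWeight = (weight?.[1] ?? '').toLowerCase();
  // Assumption: treat the keyword "bold" or a numeric weight >= 700 as bold.
  const bold = rawWeight === 'bold' || Number(rawWeight) >= 700;
  return { fontSizePt: rawSize !== undefined ? parseFloat(rawSize) : null, bold };
}

const SIZE_TO_LEVEL: ReadonlyMap<number, number> = new Map<number, number>([
  [24, 1], // stated in the PR description
  [20, 1], // stated in the PR description
  [16, 2], // placeholder, assumed for illustration
  [14, 3], // placeholder, assumed for illustration
]);

// Return the heading level for a styled paragraph, or null if it is not a heading.
function headingLevelFor(style: string): number | null {
  const { fontSizePt, bold } = parseHeadingStyle(style);
  if (!bold || fontSizePt === null) return null;
  return SIZE_TO_LEVEL.get(fontSizePt) ?? null;
}
```

Requiring both bold and a recognised size keeps ordinary bold body text and large non-bold text from being promoted to headings.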


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 167f5aa820

Contributor

@caio-pizzol caio-pizzol left a comment

nice work on this, thanks for picking it up! the code is clean, tests are thorough, and the prior bot feedback (bold false positives + list-item corruption) has been addressed in the current version.

two things worth discussing:

multi-span headings won't be detected

getHeadingStyleProps only reads styles from a single child <span> (children.length === 1). a heading that contains a link, bookmark, or line break will produce multiple child elements — and since Google Docs usually puts the styles on the spans (not the <p>), those headings will be silently skipped.

e.g. this wouldn't be detected:

<p>
  <span style="font-size:20pt;font-weight:700">Heading with </span>
  <a href="..."><span style="font-size:20pt;font-weight:700">a link</span></a>
</p>

one option: check that all child elements share the same font-size and bold status instead of requiring exactly one span. up to you whether that's worth the added complexity for this PR or a follow-up.
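
a rough sketch of that "all children agree" check, written over plain style strings so it stays independent of the DOM. the helper names (readSpanStyle, sharedHeadingStyle) are hypothetical and not part of the PR; it assumes the caller has already collected the inline style of every styled span in the paragraph, including spans nested inside <a>.

```typescript
// Hypothetical sketch; not the PR's getHeadingStyleProps implementation.
interface SpanStyle {
  fontSizePt: number | null;
  bold: boolean;
}

// Read font-size (pt) and boldness from one inline style string.
function readSpanStyle(style: string): SpanStyle {
  const size = /(?:^|;)\s*font-size\s*:\s*([\d.]+)pt/i.exec(style);
  const weight = /(?:^|;)\s*font-weight\s*:\s*([a-z0-9]+)/i.exec(style);
  const rawSize = size?.[1];
  const rawWeight = (weight?.[1] ?? '').toLowerCase();
  // Assumption: "bold" or a numeric weight >= 700 counts as bold.
  const bold = rawWeight === 'bold' || Number(rawWeight) >= 700;
  return { fontSizePt: rawSize !== undefined ? parseFloat(rawSize) : null, bold };
}

// Return the common style when every span is bold with the same font-size,
// otherwise null (mixed styling is treated as a normal paragraph).
function sharedHeadingStyle(spanStyles: string[]): SpanStyle | null {
  const parsed = spanStyles.map(readSpanStyle);
  const [first] = parsed;
  if (!first || !first.bold || first.fontSizePt === null) return null;
  const uniform = parsed.every(
    (s) => s.fontSizePt === first.fontSizePt && s.bold === first.bold,
  );
  return uniform ? first : null;
}
```

with the example above, both spans carry font-size:20pt;font-weight:700, so the paragraph would qualify. a paragraph where only one span is bold would not.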

minor: closest('li') instead of parentElement

the list-item filter only checks the immediate parent. the rest of the pipeline makes the same assumption (<li> > <p> direct child), so it's consistent. but !p.closest('li') would be a zero-cost safety net in case Google Docs ever wraps list content differently.
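
to illustrate the difference, here is a small sketch using a hypothetical NodeLike stand-in for DOM elements (so it runs without a browser). in real DOM code the two checks would be roughly p.parentElement?.tagName === 'LI' versus p.closest('li') !== null.

```typescript
// Hypothetical minimal node model; real code would use DOM Element instead.
interface NodeLike {
  tagName: string;
  parent: NodeLike | null;
}

// Mirrors the immediate-parent check: only a direct <li> parent counts.
function hasDirectListParent(node: NodeLike): boolean {
  return node.parent?.tagName === 'LI';
}

// Mirrors Element.closest('li'): walks from the node itself up through ancestors.
function hasListAncestor(node: NodeLike): boolean {
  for (let cur: NodeLike | null = node; cur !== null; cur = cur.parent) {
    if (cur.tagName === 'LI') return true;
  }
  return false;
}

// <li><div><p>…</p></div></li>: a wrapper sits between the <li> and the <p>.
const li: NodeLike = { tagName: 'LI', parent: null };
const wrapper: NodeLike = { tagName: 'DIV', parent: li };
const wrappedP: NodeLike = { tagName: 'P', parent: wrapper };
```

for wrappedP the direct-parent check misses the list item while the ancestor walk still finds it, which is the safety net described above.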
