feat(markdown): polish — sub/sup, HTML entities, blockquote, pre-code by rainxchzed · Pull Request #620 · OpenHub-Store/GitHub-Store

rainxchzed · 2026-05-16T15:33:04Z

Preprocessor bundle for the long tail of HTML-in-markdown:

`<pre><code class="language-XYZ">…</code></pre>` → ```XYZ … ``` fence (preserves lang for the syntax-highlighter PR's slot).
`<blockquote>…</blockquote>` → line-prefixed `> ` markdown blockquote.
`<sup>X</sup>` / `<sub>X</sub>` → per-character Unicode superscript / subscript (H₂O, x², ⁿ). Unmappable chars stay literal; the markdown lib has no inline BaselineShift hook, so this is the best we can do without a custom span renderer.
Expanded HTML entity decoder: ©, ®, ™, —, –, …, ‘curly’ + “quotes”, «guillemets», ×, ÷, ±, °, µ, €, £, ¥, ¢, ←→↑↓, ≤≥≠, etc. Plus hex `&#xHHHH;` entities.
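
For illustration, a minimal sketch of the per-character sub/sup mapping described above. It assumes the inputs are HTML `<sup>`/`<sub>` tags; the table contents and the names `SUPERSCRIPTS`, `SUBSCRIPTS`, `mapChars`, and `replaceSubSup` are hypothetical and not taken from the PR's code.

```kotlin
// Hypothetical lookup tables; the PR's actual tables are not shown in this thread.
val SUPERSCRIPTS: Map<Char, Char> = mapOf(
    '0' to '⁰', '1' to '¹', '2' to '²', '3' to '³', '4' to '⁴',
    '5' to '⁵', '6' to '⁶', '7' to '⁷', '8' to '⁸', '9' to '⁹',
    '+' to '⁺', '-' to '⁻', 'n' to 'ⁿ', 'i' to 'ⁱ',
)

val SUBSCRIPTS: Map<Char, Char> = mapOf(
    '0' to '₀', '1' to '₁', '2' to '₂', '3' to '₃', '4' to '₄',
    '5' to '₅', '6' to '₆', '7' to '₇', '8' to '₈', '9' to '₉',
    '+' to '₊', '-' to '₋',
)

// Map each character independently; characters without a mapping stay literal.
fun mapChars(text: String, table: Map<Char, Char>): String =
    text.map { table[it] ?: it }.joinToString("")

val SUP_TAG = Regex("""<sup>(.*?)</sup>""", RegexOption.IGNORE_CASE)
val SUB_TAG = Regex("""<sub>(.*?)</sub>""", RegexOption.IGNORE_CASE)

// Replace each tag with its mapped content, dropping the tag itself.
fun replaceSubSup(input: String): String =
    input
        .replace(SUP_TAG) { mapChars(it.groupValues[1], SUPERSCRIPTS) }
        .replace(SUB_TAG) { mapChars(it.groupValues[1], SUBSCRIPTS) }
```

With these tables, `replaceSubSup("H<sub>2</sub>O")` yields `H₂O`, while a character with no entry (for example `q` inside `<sup>`) is emitted unchanged.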

Test plan

Compile clean
Device: open repo with footnoted `¹`, copyright `©`, em-dashes, embedded `<pre><code>`, and verify all render correctly.



Summary by CodeRabbit

Improved

Enhanced Markdown rendering: better HTML blockquote and code-block formatting with language hints
Added Unicode subscripts/superscripts (e.g., H₂O, x²)
Expanded HTML entity decoding for more symbols (©, ™, em/en dashes, curly quotes, etc.)

greptile-apps

rainxchzed has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

coderabbitai · 2026-05-16T15:33:16Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 03b9187c-028e-40cb-8fca-6d6a3253d390

📥 Commits

Reviewing files that changed from the base of the PR and between 3eb2ffb and 485b5d1.

📒 Files selected for processing (14)

core/presentation/src/commonMain/composeResources/files/whatsnew/18.json
core/presentation/src/commonMain/composeResources/files/whatsnew/ar/18.json
core/presentation/src/commonMain/composeResources/files/whatsnew/bn/18.json
core/presentation/src/commonMain/composeResources/files/whatsnew/es/18.json
core/presentation/src/commonMain/composeResources/files/whatsnew/fr/18.json
core/presentation/src/commonMain/composeResources/files/whatsnew/hi/18.json
core/presentation/src/commonMain/composeResources/files/whatsnew/it/18.json
core/presentation/src/commonMain/composeResources/files/whatsnew/ja/18.json
core/presentation/src/commonMain/composeResources/files/whatsnew/ko/18.json
core/presentation/src/commonMain/composeResources/files/whatsnew/pl/18.json
core/presentation/src/commonMain/composeResources/files/whatsnew/ru/18.json
core/presentation/src/commonMain/composeResources/files/whatsnew/tr/18.json
core/presentation/src/commonMain/composeResources/files/whatsnew/zh-CN/18.json
feature/details/data/src/commonMain/kotlin/zed/rainxch/details/data/utils/preprocessMarkdown.kt

Walkthrough

This PR extends Markdown rendering by implementing HTML-to-Markdown conversions in the preprocessor and expanding HTML entity decoding to cover named entity references and a wider Unicode range. The feature is documented in thirteen localized release-note files describing improved blockquote/code-fence handling, Unicode sub/superscript rendering, and HTML entity decoding.

Changes

Markdown Rendering Polish and Release Notes

| Layer / File(s) | Summary |
| --- | --- |
| HTML-to-Markdown block conversions `feature/details/data/src/commonMain/kotlin/zed/rainxch/details/data/utils/preprocessMarkdown.kt` | `<pre><code>` blocks are converted to Markdown fences with optional language hints; `<blockquote>` elements become blockquote-prefixed lines. |
| Character mapping and expanded entity decoding `feature/details/data/src/commonMain/kotlin/zed/rainxch/details/data/utils/preprocessMarkdown.kt` | Superscript/subscript characters map to Unicode equivalents via lookup tables; HTML entity decoding uses a centralized `HTML_ENTITIES` map and handles decimal/hex entities across a broader Unicode range (32..0x10FFFF). |
| Localized release notes for v1.8.3 `core/presentation/src/commonMain/composeResources/files/whatsnew/*/18.json` | Thirteen localized files (EN, AR, BN, ES, FR, HI, IT, JA, KO, PL, RU, TR, ZH-CN) update the "IMPROVED" section to describe Markdown rendering fixes: blockquote/code-fence correctness, Unicode sub/superscript handling, and named HTML entity decoding. |
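
As a rough sketch of the `<blockquote>` conversion summarized in the table, the snippet below prefixes every line of the element's content with `> `. The names `BLOCKQUOTE` and `blockquoteToMarkdown` are hypothetical, nested blockquotes are not handled, and the PR's actual implementation may differ.

```kotlin
// [\s\S] is used instead of a dot-matches-all option so the pattern
// does not depend on a platform-specific RegexOption.
val BLOCKQUOTE = Regex("""<blockquote[^>]*>([\s\S]*?)</blockquote>""", RegexOption.IGNORE_CASE)

// Turn each <blockquote> element into "> "-prefixed Markdown lines.
fun blockquoteToMarkdown(input: String): String =
    input.replace(BLOCKQUOTE) { match ->
        val quoted = match.groupValues[1]
            .trim()
            .lines()
            .joinToString("\n") { line ->
                // Keep blank lines inside the quote so paragraphs stay in one block.
                if (line.isBlank()) ">" else "> ${line.trim()}"
            }
        // Surround with newlines so the quote starts and ends on its own line.
        "\n$quoted\n"
    }
```

Emitting a bare `>` for blank lines is a design choice in this sketch: it keeps multi-paragraph quotes in a single Markdown blockquote instead of splitting them.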

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

OpenHub-Store/GitHub-Store#618: Modifies the same preprocessMarkdown.kt pipeline (SVG skipping logic overlap).
OpenHub-Store/GitHub-Store#617: Also updates preprocessMarkdown.kt (adds emoji-shortcode rendering).
OpenHub-Store/GitHub-Store#619: Works with fenced code output that this PR now emits (syntax-highlighting integration).

Poem

A rabbit hops through Markdown's grace,
With blockquotes, quotes, and code's embrace.
H₂O now shines in Unicode bright,
And entities decode just right.
Thirteen tongues now tell the story true! 🐰

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: adding Markdown rendering polish with support for sub/superscripts, HTML entities, blockquotes, and pre-code blocks. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@feature/details/data/src/commonMain/kotlin/zed/rainxch/details/data/utils/preprocessMarkdown.kt`:
- Line 341: The regex in preprocessMarkdown.kt currently uses language-(\w+)
which fails to capture common language IDs like c++, f#, objective-c, or
shell-session; update that capture group to accept plus signs, hashes and
hyphens (e.g. replace language-(\w+) with language-([A-Za-z0-9_+#-]+) or more
generally language-([^"'\s>]+)) so the regex on the line containing
"""<pre[^>]*>\s*<code(?:\s+[^>]*?class\s*=\s*["'][^"']*?language-(\w+)[^"']*?["'])?[^>]*>(.*?)</code>\s*</pre>"""
correctly preserves all common fenced-code language IDs.
- Line 347: The current Markdown code-fence generation always emits triple
backticks and can break if the variable code contains backticks; update the
block in preprocessMarkdown.kt where you build the fence for a code snippet (the
section referencing lang and code) to compute the longest run of backticks
inside code (use a Regex like `` `{3,} `` to find matches), default to 2 if none
found, then create a fence string of one more backtick (e.g., fence =
"`".repeat(maxTicks+1)) and use that fence instead of the hardcoded "```" so the
emitted "\n$fence$lang\n$code\n$fence\n" remains well-formed.
- Around line 494-505: The Unicode entity decoding replaces numeric entities by
calling code.toChar().toString(), which truncates supplementary-plane code
points; in the numeric-decimal and numeric-hex replace blocks (the lambda used
to build processed via processed.replace(...){ match -> ... }), replace
code.toChar().toString() with a construct that preserves full Unicode code
points (e.g., create a String from the code point such as
String(intArrayOf(code))) so both decimal and hex entity decoding handle values
up to 0x10FFFF correctly; keep the existing range check (32..0x10FFFF) and apply
the same change in both places.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cc90bac9-daf8-4250-91c7-fff281eb8e8e

📥 Commits

Reviewing files that changed from the base of the PR and between 01226dc and 3eb2ffb.

📒 Files selected for processing (14)

core/presentation/src/commonMain/composeResources/files/whatsnew/18.json
core/presentation/src/commonMain/composeResources/files/whatsnew/ar/18.json
core/presentation/src/commonMain/composeResources/files/whatsnew/bn/18.json
core/presentation/src/commonMain/composeResources/files/whatsnew/es/18.json
core/presentation/src/commonMain/composeResources/files/whatsnew/fr/18.json
core/presentation/src/commonMain/composeResources/files/whatsnew/hi/18.json
core/presentation/src/commonMain/composeResources/files/whatsnew/it/18.json
core/presentation/src/commonMain/composeResources/files/whatsnew/ja/18.json
core/presentation/src/commonMain/composeResources/files/whatsnew/ko/18.json
core/presentation/src/commonMain/composeResources/files/whatsnew/pl/18.json
core/presentation/src/commonMain/composeResources/files/whatsnew/ru/18.json
core/presentation/src/commonMain/composeResources/files/whatsnew/tr/18.json
core/presentation/src/commonMain/composeResources/files/whatsnew/zh-CN/18.json
feature/details/data/src/commonMain/kotlin/zed/rainxch/details/data/utils/preprocessMarkdown.kt

coderabbitai · 2026-05-16T15:36:45Z

+    processed =
+        processed.replace(
+            Regex(
+                """<pre[^>]*>\s*<code(?:\s+[^>]*?class\s*=\s*["'][^"']*?language-(\w+)[^"']*?["'])?[^>]*>(.*?)</code>\s*</pre>""",


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Language capture is too restrictive for common fenced-code language IDs.

Line 341 uses language-(\w+), which misses values like c++, objective-c, f#, and shell-session. This silently drops syntax hints.

💡 Proposed fix

-                """<pre[^>]*>\s*<code(?:\s+[^>]*?class\s*=\s*["'][^"']*?language-(\w+)[^"']*?["'])?[^>]*>(.*?)</code>\s*</pre>""",
+                """<pre[^>]*>\s*<code(?:\s+[^>]*?class\s*=\s*["'][^"']*?language-([A-Za-z0-9_+#.-]+)[^"']*?["'])?[^>]*>(.*?)</code>\s*</pre>""",
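
To make the difference concrete, here is a small sketch that isolates just the `language-…` capture and compares the two character classes. `NARROW`, `WIDE`, and `languageHint` are hypothetical names for illustration only. Note that in this isolated form the `\w+` capture truncates the ID (`c++` becomes `c`) rather than dropping it entirely.

```kotlin
// The capture as currently written in the PR: word characters only.
val NARROW = Regex("""language-(\w+)""")

// The widened capture from the proposed fix: also accepts +, #, ., and -.
val WIDE = Regex("""language-([A-Za-z0-9_+#.-]+)""")

// Extract the language ID from a class attribute value, or "" when absent.
fun languageHint(classAttr: String, pattern: Regex): String =
    pattern.find(classAttr)?.groupValues?.get(1).orEmpty()
```

For example, `languageHint("language-c++", NARROW)` returns `c`, while `languageHint("language-c++", WIDE)` returns `c++`.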


coderabbitai · 2026-05-16T15:36:45Z

+        ) { match ->
+            val lang = match.groupValues[1]
+            val code = match.groupValues[2]
+            "\n```$lang\n$code\n```\n"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Code-fence generation can break when code content contains triple backticks.

Line 347 always emits ``` fences. If code already includes triple backticks, the generated Markdown block becomes malformed.

💡 Proposed fix

         ) { match ->
             val lang = match.groupValues[1]
             val code = match.groupValues[2]
-            "\n```$lang\n$code\n```\n"
+            val maxTicks = Regex("""`{3,}""").findAll(code).maxOfOrNull { it.value.length } ?: 2
+            val fence = "`".repeat(maxTicks + 1)
+            "\n$fence$lang\n$code\n$fence\n"
         }
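
The same idea as a standalone sketch, assuming CommonMark's rule that a backtick fence is closed only by a run of at least as many backticks: pick a fence one backtick longer than the longest run in the code, with a minimum of three. `fenceFor` and `toFencedBlock` are hypothetical helper names, not functions from the PR.

```kotlin
// Choose a fence longer than any backtick run inside the code (minimum 3).
fun fenceFor(code: String): String {
    val longestRun = Regex("`+").findAll(code).maxOfOrNull { it.value.length } ?: 0
    return "`".repeat(maxOf(3, longestRun + 1))
}

// Wrap the code in the computed fence, keeping the language hint on the opening line.
fun toFencedBlock(lang: String, code: String): String {
    val fence = fenceFor(code)
    return "\n$fence$lang\n$code\n$fence\n"
}
```

Code without backticks still gets the usual three-backtick fence, so output for typical READMEs is unchanged; only snippets that themselves contain fences get a longer one.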


coderabbitai · 2026-05-16T15:36:45Z

+            if (code != null && code in 32..0x10FFFF) {
+                code.toChar().toString()
+            } else {
+                match.value
+            }
+        }
+    // Numeric HTML entities (hex): &#xHHHH; → char.
+    processed =
+        processed.replace(Regex("""&#x([0-9A-Fa-f]+);""")) { match ->
+            val code = match.groupValues[1].toIntOrNull(16)
+            if (code != null && code in 32..0x10FFFF) {
                code.toChar().toString()


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Verify the potentially lossy conversion path and where it is used.
rg -n -C2 '0x10FFFF|toChar' feature/details/data/src/commonMain/kotlin/zed/rainxch/details/data/utils/preprocessMarkdown.kt

# Demonstrate why toChar is unsafe for supplementary code points.
python - <<'PY'
cp = 0x1F600  # 😀
print("Expected code point:", hex(cp), chr(cp))
print("Truncated via 16-bit lower unit:", hex(cp & 0xFFFF), chr(cp & 0xFFFF))
PY
```

Repository: OpenHub-Store/GitHub-Store

Length of output: 735

🏁 Script executed:

```shell
# Check if Character.toChars() is available in Kotlin
rg -n 'Character\.' feature/details/data/ || echo "No existing Character usage found"

# Check Kotlin version and available APIs
head -20 feature/details/data/build.gradle.kts || head -20 feature/details/data/build.gradle

# Verify the context: is this commonMain code (multiplatform)?
head -5 feature/details/data/src/commonMain/kotlin/zed/rainxch/details/data/utils/preprocessMarkdown.kt

# Look for other potentially unsafe toChar() conversions
rg -n 'toChar()' feature/details/data/src/commonMain/kotlin/
```

Repository: OpenHub-Store/GitHub-Store

Length of output: 1094

🏁 Script executed:

```shell
# Check what Kotlin version constraint says (from coding guidelines: Kotlin 2.3.10)
# Search for existing surrogate pair or Unicode handling in the codebase
rg -n 'surrogate|Surrogate|\\\\u|toChars|fromCodePoint' feature/details/data/src/ | head -20

# Check if there's a Kotlin extension or utility for code point to string
rg -n 'codePoint|CodePoint' feature/ | head -20

# Look for similar HTML entity handling or Unicode operations in related files
fd -t f -e kt feature/details/ -exec grep -l 'Regex\|replace\|entity' {} \; | head -10
```

Repository: OpenHub-Store/GitHub-Store

Length of output: 328

🏁 Script executed:

```shell
# Search for existing multiplatform Unicode handling
rg -n 'codePoint|CodePoint|surrogate' feature/details/

# Check what's available in Kotlin stdlib for code point handling
# In Kotlin, String constructor can take IntArray of code points
rg -n 'String(' feature/details/data/src/commonMain/kotlin/zed/rainxch/details/data/utils/preprocessMarkdown.kt | head -20

# Look at the full context of the entity decoding to understand better
sed -n '490,510p' feature/details/data/src/commonMain/kotlin/zed/rainxch/details/data/utils/preprocessMarkdown.kt
```

Repository: OpenHub-Store/GitHub-Store

Length of output: 882

🏁 Script executed:

```shell
# Verify String constructor with code points exists in Kotlin stdlib
# by checking if it's used elsewhere in the project
rg -n 'String(intArrayOf|String(.*codePoint' feature/

# Also check the Kotlin version to confirm modern stdlib
cat feature/details/data/build.gradle.kts | grep -i kotlin
```

Repository: OpenHub-Store/GitHub-Store

Length of output: 321

🏁 Script executed:

```shell
# Simple search for intArrayOf usage
rg 'intArrayOf' feature/

# Search for code point handling
rg 'codePoint' feature/

# Check if there's any String construction with arrays
rg 'String.*Array' feature/details/data/src/commonMain/
```

Repository: OpenHub-Store/GitHub-Store

Length of output: 52

🏁 Script executed:

```shell
# Let's check the actual Kotlin stdlib version available
rg 'stdlib' feature/details/data/build.gradle.kts -A 2 -B 2

# Search more broadly for how Unicode strings are created in this codebase
rg 'String\(' feature/details/data/src/commonMain/kotlin/ | head -20

# Let's verify the issue one more time with a simpler approach
# Check the exact problematic code and its context
sed -n '492,508p' feature/details/data/src/commonMain/kotlin/zed/rainxch/details/data/utils/preprocessMarkdown.kt
```

Repository: OpenHub-Store/GitHub-Store

Length of output: 4160

Unicode entity decoding truncates supplementary plane code points.

Lines 495 and 505 use toChar(), which only preserves the lower 16 bits despite the range check allowing up to 0x10FFFF. This corrupts supplementary plane characters (emoji, rare scripts, historic symbols).

Build the string from the full code point instead of truncating it. The suggestion uses `String(intArrayOf(code))`; verify that this constructor resolves in `commonMain` for every target before committing:

💡 Fix

             if (code != null && code in 32..0x10FFFF) {
-                code.toChar().toString()
+                String(intArrayOf(code))
             } else {
                 match.value
             }

Apply to both decimal (line 495) and hexadecimal (line 505) entity decoding blocks.
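
If no code-point-to-String constructor is available in common code (as far as I know, the `String(IntArray, Int, Int)` overload is JVM-only), one alternative is to encode the UTF-16 surrogate pair by hand. The sketch below is an assumption-laden illustration, not the PR's code: `codePointToString` and `decodeHexEntities` are hypothetical names, and rejecting lone surrogates is an extra choice the review thread does not ask for.

```kotlin
// Convert a Unicode code point to a String using only common stdlib calls.
// Returns null for values that are not valid scalar values.
fun codePointToString(codePoint: Int): String? {
    if (codePoint !in 0..0x10FFFF) return null
    if (codePoint in 0xD800..0xDFFF) return null // lone surrogates are not characters
    if (codePoint < 0x10000) return codePoint.toChar().toString() // BMP: one UTF-16 unit
    // Supplementary planes: split the 20-bit offset into a surrogate pair.
    val offset = codePoint - 0x10000
    val high = (0xD800 + (offset shr 10)).toChar()
    val low = (0xDC00 + (offset and 0x3FF)).toChar()
    return charArrayOf(high, low).concatToString()
}

val HEX_ENTITY = Regex("""&#x([0-9A-Fa-f]+);""")

// Decode hex entities; anything unparseable or out of range is left as written.
fun decodeHexEntities(input: String): String =
    input.replace(HEX_ENTITY) { match ->
        val code = match.groupValues[1].toIntOrNull(16)
        val decoded = if (code != null && code >= 32) codePointToString(code) else null
        decoded ?: match.value
    }
```

The lower bound of 32 mirrors the existing range check in the diff; the decimal block could reuse `codePointToString` with `toIntOrNull()` in the same way.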


greptile-apps Bot reviewed May 16, 2026

coderabbitai Bot reviewed May 16, 2026

feat(markdown): unicode sub/sup, HTML entities, blockquote, pre-code

485b5d1

rainxchzed force-pushed the feat/markdown-html-polish branch from 3eb2ffb to 485b5d1 Compare May 16, 2026 15:45

rainxchzed merged commit 83d4d92 into main May 16, 2026
1 check was pending

greptile-apps Bot reviewed May 16, 2026

rainxchzed deleted the feat/markdown-html-polish branch May 16, 2026 15:45

This was referenced May 16, 2026

fix(markdown): inline details inside table cells stay one-line #623

Merged

perf(details): drop BoxWithConstraints subcompose + lazy keys #636

Merged
