"bbox": [
317.98,
201.75,
641.93,
264.3
]
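import pymupdf
rect = pymupdf.Rect(317.98, 201.75, 641.93, 264.3)  # the "bbox" values listed above
text = '''<span style="font-size:60.02pt;color:rgb(255,255,211);">ค่าธรรมเนียมชำระเมื่อมาถึง</span>'''  # Thai: "fee payable upon arrival"
# or, with soft hyphens (&shy;) at the word boundaries:
# text = '''<span style="font-size:60.02pt;color:rgb(255,255,211);">ค่าธรรมเนียม&shy;ชำระ&shy;เมื่อ&shy;มาถึง</span>'''
page.insert_htmlbox(rect, text, scale_low=0)  # page: a page of an already opened document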
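The second `text` variant above places a soft hyphen at each word boundary. A minimal sketch of how such markup could be built from already-segmented words follows. It assumes the word boundaries come from elsewhere (they are hard-coded here), and `build_span` is a hypothetical helper for illustration, not part of PyMuPDF.

```python
import html

SOFT_HYPHEN = "&shy;"  # HTML entity for U+00AD, a soft line-break opportunity


def build_span(words, font_size_pt=60.02, color=(255, 255, 211), soft_breaks=True):
    """Wrap words in a styled <span>, optionally joined by soft hyphens.

    Hypothetical helper for illustration; not part of PyMuPDF.
    """
    sep = SOFT_HYPHEN if soft_breaks else ""
    body = sep.join(html.escape(word) for word in words)
    r, g, b = color
    style = f"font-size:{font_size_pt}pt;color:rgb({r},{g},{b});"
    return f'<span style="{style}">{body}</span>'


# Word boundaries are supplied by hand here; a Thai tokenizer would normally provide them.
words = ["ค่าธรรมเนียม", "ชำระ", "เมื่อ", "มาถึง"]
plain_text = build_span(words, soft_breaks=False)  # no break opportunities
hyphenated_text = build_span(words)                # &shy; between the words
```

Either string can then be passed as the `text` argument of `page.insert_htmlbox(rect, text, scale_low=0)`.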
Description of the bug
When using insert_htmlbox to render Thai text or long, purely numeric sequences (neither of which contains natural word breaks), the following problems occur:
No auto-scaling:
insert_htmlbox does not auto-scale (shrink) the text to fit the given rectangle. As a result, the content overflows the boundary and is cut off.
Wrong hyphen for Thai (&shy; handling):
When using &shy; to represent soft line-break opportunities, insert_htmlbox may insert a "-" (hyphen) at the line break. Adding a hyphen at a word break does not follow Thai writing conventions and is visually and semantically incorrect.
How to reproduce the bug
Thai case:
Number case:
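Both cases might be reproduced with a short script along the following lines. This is a sketch under stated assumptions: the page size is chosen only so that it contains the reported rectangle, the numeric string is a hypothetical stand-in because the original value is not given, and `try_insert` is a hypothetical helper name. The PyMuPDF import sits inside the function so that the constants load even where the library is not installed.

```python
# Rectangle from the report; the page size is an assumption chosen to contain it.
RECT = (317.98, 201.75, 641.93, 264.3)
PAGE_SIZE = (960, 540)
STYLE = "font-size:60.02pt;color:rgb(255,255,211);"

# Thai case, with soft hyphens at the word boundaries.
THAI_HTML = f'<span style="{STYLE}">ค่าธรรมเนียม&shy;ชำระ&shy;เมื่อ&shy;มาถึง</span>'
# Number case: hypothetical stand-in for a long sequence without break opportunities.
NUMBER_HTML = f'<span style="{STYLE}">{"1234567890" * 4}</span>'


def try_insert(html_text, out_path=None):
    """Insert html_text into RECT on a fresh page and return (spare_height, scale)."""
    import pymupdf  # imported here so the constants above load without PyMuPDF

    doc = pymupdf.open()  # new, empty PDF
    page = doc.new_page(width=PAGE_SIZE[0], height=PAGE_SIZE[1])
    # scale_low=0 permits unlimited shrinking; a spare_height of -1 means
    # the content did not fit into the rectangle.
    spare_height, scale = page.insert_htmlbox(
        pymupdf.Rect(*RECT), html_text, scale_low=0
    )
    if out_path is not None:
        doc.save(out_path)
    doc.close()
    return spare_height, scale
```

Calling `try_insert(THAI_HTML, "thai.pdf")` and `try_insert(NUMBER_HTML, "number.pdf")`, then inspecting the returned tuple and the saved pages, should show whether the text was scaled and whether a "-" appears at a break. The pale text color from the report is hard to see on a white page, so a darker color may help when checking the output.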
Expected:
insert_htmlbox should apply auto-scaling so that the full content fits.
When using &shy; as a break point (for example, after Thai tokenization), no hyphen character should be inserted at the break; in Thai, no such symbol should appear.
Observed:
A "-" is added at Thai &shy; line breaks, which is not appropriate for Thai (and similarly for Chinese, Japanese, Korean, etc.).
PyMuPDF version
1.26.3
Operating system
Windows
Python version
3.10