Optimize Unicode lookup #236

Merged
benoitkugler merged 27 commits into main from unicode-lookup
Feb 25, 2026

Conversation

@benoitkugler
Contributor

Supersede #227

Micro benchmark shows a 20x speed-up; binary size also decreases from 18KB to 12KB
Micro benchmark shows a 2x speed-up; binary size is roughly the same
…able

Micro benchmark shows some speed-up, with binary size reduction from around 8 * 32 KB to 15KB
Micro benchmark shows a 2x speed-up and binary size reduction.
Micro benchmark shows around 4x speed-up and 30% binary size reduction.
Micro benchmark shows around 5x speed-up and 3x binary size reduction.
Micro benchmark shows around 7x speed-up and 2x binary size reduction.
Micro benchmark shows around 27x speed-up (!) and almost 2x binary size reduction.
Micro benchmark shows a 20x speed-up; binary size also decreases from 11KB to 2KB
Micro benchmark shows around 1.5x speed-up and binary size reduction.
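
For readers wondering what kind of storage change can give both a speed-up and a smaller binary: the commit messages do not spell out the table layout, so the following is only a sketch of one common technique, a two-stage (block-indexed) table, and not necessarily what this PR implements. Every name here (`Category`, `table`, `build`, `lookup`) is hypothetical and not taken from the library.

```go
package main

import (
	"fmt"
	"unicode"
)

// Category is a hypothetical one-byte property value (not the library's type).
type Category uint8

const (
	Other Category = iota
	Letter
	Digit
)

const (
	blockShift = 8 // 256 code points per block (an arbitrary choice)
	blockSize  = 1 << blockShift
	maxRune    = 0x10FFFF
)

// table is a two-stage lookup table: index maps a block number to a
// deduplicated block stored in blocks.
type table struct {
	index  []uint16
	blocks [][blockSize]Category
}

// build compresses a per-rune classifier into a two-stage table,
// storing each distinct block only once.
func build(classify func(rune) Category) *table {
	t := &table{}
	seen := map[[blockSize]Category]uint16{}
	for start := rune(0); start <= maxRune; start += blockSize {
		var blk [blockSize]Category
		for i := range blk {
			blk[i] = classify(start + rune(i))
		}
		id, ok := seen[blk]
		if !ok {
			id = uint16(len(t.blocks))
			seen[blk] = id
			t.blocks = append(t.blocks, blk)
		}
		t.index = append(t.index, id)
	}
	return t
}

// lookup is two array reads, with no binary search over ranges.
func (t *table) lookup(r rune) Category {
	if r < 0 || r > maxRune {
		return Other
	}
	return t.blocks[t.index[r>>blockShift]][r&(blockSize-1)]
}

func main() {
	t := build(func(r rune) Category {
		switch {
		case unicode.IsLetter(r):
			return Letter
		case unicode.IsDigit(r):
			return Digit
		}
		return Other
	})
	fmt.Println(t.lookup('A'), t.lookup('7'), t.lookup(' ')) // prints "1 2 0"
	fmt.Println(len(t.blocks), "distinct blocks for", len(t.index), "index entries")
}
```

Because large parts of the code space share identical blocks (unassigned ranges, long runs of one script), deduplication keeps the table small while lookups stay constant-time. In a real project the table would normally be generated ahead of time and emitted as Go source rather than built at startup.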
Comment thread internal/unicodedata/unicode_test.go Outdated
Member

@whereswaldon whereswaldon left a comment

My only concern is the commented benchmarks, and it's minor. Looks like a hefty batch of improvements in both binary size and performance. Thanks very much for putting this together!

@benoitkugler
Contributor Author

Merging since there is no API change.

@benoitkugler benoitkugler merged commit 40da633 into main Feb 25, 2026
14 checks passed
@benoitkugler benoitkugler deleted the unicode-lookup branch February 25, 2026 15:59
3ace pushed a commit to unidoc/typesetting that referenced this pull request Apr 2, 2026
* [unicode] use smarter storage for unicode general category.

Micro benchmark shows a 20x speed-up; binary size also decreases from 18KB to 12KB

* more compact name

* remove unused file

* minor fix in generated tables

* [unicode] use smarter storage for unicode mirroring characters

Micro benchmark shows a 2x speed-up; binary size is roughly the same

* move script tests into package language

* [unicode] use smarter storage for unicode composition/decomposition table

Micro benchmark shows some speed-up, with binary size reduction from around 8 * 32 KB to 15KB

* enforce deterministic order

* [unicode] use smarter storage for unicode emojis table

Micro benchmark shows a 2x speed-up and binary size reduction.

* [harfbuzz/unicode] use more compact storage for USE categories

* [font] simplify pua remap generation

* remove unused code

* remove unused file

* [unicode] use smarter storage for unicode Indic Conjunct Break table

Micro benchmark shows around 4x speed-up and 30% binary size reduction.

* [unicode] use smarter storage for unicode Grapheme Break table

Micro benchmark shows around 5x speed-up and 3x binary size reduction.

* upgrade to Unicode 17

* [unicode] use smarter storage for unicode Word Break table

Micro benchmark shows around 7x speed-up and 2x binary size reduction.

* [unicode] use smarter storage for unicode Line Break table

Micro benchmark shows around 27x speed-up (!) and almost 2x binary size reduction.

* [unicode] be more coherent with names

* [segmenter] optimize comparison with flag unions

* [segmenter] add benchmark

* [segmenter] more caching for speed up

* [segmenter] optimize by returning early instead of always applying ALL the rules

* remove tmp benchmark

* [harfbuzz] simplify code

* [unicode] use smarter storage for unicode combining category

Micro benchmark shows a 20x speed-up; binary size also decreases from 11KB to 2KB

* [unicode] use smarter storage for unicode East Asian Width table

Micro benchmark shows around 1.5x speed-up and binary size reduction.
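
As an illustration of the "[segmenter] optimize comparison with flag unions" commit above, here is a minimal sketch of the general idea, assuming break classes are encoded as single bits so that membership in a set of classes becomes one AND. The type, constants and the chosen set are hypothetical; the segmenter's actual encoding and rules are not shown in this thread.

```go
package main

import "fmt"

// breakClass is a hypothetical bit-flag encoding of break classes.
// The names echo UAX #14 classes, but the values are illustrative only.
type breakClass uint32

const (
	clsAL breakClass = 1 << iota // alphabetic
	clsNU                        // numeric
	clsSP                        // space
	clsCL                        // close punctuation
	clsCP                        // close parenthesis
	clsEX                        // exclamation
	clsSY                        // symbol allowing break after
)

// closers is a union of flags naming a whole set of classes
// (an arbitrary set chosen for the example).
const closers = clsCL | clsCP | clsEX | clsSY

// in reports whether c belongs to set using a single AND.
func (c breakClass) in(set breakClass) bool { return c&set != 0 }

// inSlow is the comparison chain that a flag union can replace.
func inSlow(c breakClass) bool {
	return c == clsCL || c == clsCP || c == clsEX || c == clsSY
}

func main() {
	for _, c := range []breakClass{clsAL, clsNU, clsSP, clsCL, clsCP, clsEX, clsSY} {
		fmt.Println(c.in(closers), inSlow(c))
	}
}
```

The trade-off is that each class needs its own bit, so a 32-bit type holds at most 32 classes; a wider integer type is needed beyond that.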

2 participants