Description
This proposal adds support for hyphenation of non-English languages. It is a first step toward supporting internationalization.
Proposal
- Add a new type:
  - hyphen-dict: Hyphenation pattern. Underlying OCaml representation is LoadHyph.t.
- Add new primitives:
  - load-hyphen-dict : string -> hyphen-dict
  - set-hyphen-dict : hyphen-dict -> ctx -> ctx
  - get-hyphen-dict : ctx -> hyphen-dict
- Use BCP 47 Language Tag or UTS#35 Language Identifier for filenames of hyphenation dictionary files.
  - The current hyphenation file english.satysfi-hyph needs to be renamed to en.satysfi-hyph.
load-hyphen-pattern language loads a hyphenation dictionary from hyph/<language>.satysfi-hyph. It raises an exception when the file is not found.
set-hyphen-pattern hyph ctx returns a copy of ctx whose ctx.hyphenation_pattern is set to the hyphenation pattern hyph.
get-hyphen-pattern ctx returns the hyphenation pattern stored in ctx.hyphenation_pattern.
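The behavior described above can be modeled with a minimal OCaml sketch. It is not the SATySFi implementation: the record types, the exception Hyphen_dict_not_found, the helper dict_path, and the lib_root parameter are all hypothetical, and the dictionary is kept as raw file contents instead of a parsed LoadHyph.t. The function names follow the signatures listed in the proposal.

```ocaml
(* Hypothetical stand-in for LoadHyph.t: the raw, unparsed file contents. *)
type hyphen_dict = { language : string; raw : string }

(* Hypothetical, heavily reduced context record. *)
type ctx = { hyphenation_pattern : hyphen_dict; font_size : float }

(* Hypothetical exception carrying the path that could not be found. *)
exception Hyphen_dict_not_found of string

(* Maps a language tag such as "en" to <lib_root>/hyph/en.satysfi-hyph. *)
let dict_path ~lib_root language =
  Filename.concat
    (Filename.concat lib_root "hyph")
    (language ^ ".satysfi-hyph")

(* load-hyphen-dict : string -> hyphen-dict
   Raises Hyphen_dict_not_found when the file does not exist. *)
let load_hyphen_dict ~lib_root language =
  let path = dict_path ~lib_root language in
  if not (Sys.file_exists path) then raise (Hyphen_dict_not_found path)
  else begin
    let ic = open_in_bin path in
    let raw =
      Fun.protect
        ~finally:(fun () -> close_in ic)
        (fun () -> really_input_string ic (in_channel_length ic))
    in
    { language; raw }
  end

(* set-hyphen-dict : hyphen-dict -> ctx -> ctx
   Returns a new context; the original one is left unchanged. *)
let set_hyphen_dict hyph ctx = { ctx with hyphenation_pattern = hyph }

(* get-hyphen-dict : ctx -> hyphen-dict *)
let get_hyphen_dict ctx = ctx.hyphenation_pattern
```

Because contexts are immutable in this sketch, a document could switch dictionaries for one passage by passing the updated context only to that passage.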
Current Implementation
- English hyphenation is located at lib-satysfi/dist/hyph/english.satysfi-hyph.
- english.satysfi-hyph is loaded at SATySFi/src/frontend/primitives.cppo.ml, Line 604 in 1243829:
  default_hyphen_dictionary := LoadHyph.main (Config.resolve_lib_file_exn (make_lib_path "dist/hyph/english.satysfi-hyph"));
- The only operation which sets hyphenation_dictionary is get_pdf_mode_initial_context at SATySFi/src/frontend/primitives.cppo.ml, Line 497 in 1243829:
  hyphen_dictionary = !default_hyphen_dictionary;
Alternative Options
Activate multiple hyphen-dicts at the same time
This proposal is based on a design where users can replace the English hyphenation pattern with that of another language. If we decide to extend the multi-language system, where English and Japanese are automatically detected by script type, it may be more natural to set a hyphenation dictionary per language/script (i.e., set-hyphen-dict : language-tag -> hyphen-dict -> ctx -> ctx or set-hyphen-dict : hyphen-dict language-tag-map -> ctx -> ctx) rather than applying a given hyphenation pattern globally.
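As a rough illustration of the per-language variant, the context could hold a map from language tags to dictionaries. Everything in this sketch is an assumption made for illustration: the LangMap module, the hyphen_dicts field, the lookup function find_hyphen_dict, and the use of a string list in place of LoadHyph.t.

```ocaml
(* Map keyed by language tag, e.g. "en" or "de". *)
module LangMap = Map.Make (String)

(* Hypothetical stand-in for LoadHyph.t. *)
type hyphen_dict = string list

(* Hypothetical, heavily reduced context record. *)
type ctx = { hyphen_dicts : hyphen_dict LangMap.t; font_size : float }

(* set-hyphen-dict : language-tag -> hyphen-dict -> ctx -> ctx
   Registers (or replaces) the dictionary for one language tag. *)
let set_hyphen_dict tag dict ctx =
  { ctx with hyphen_dicts = LangMap.add tag dict ctx.hyphen_dicts }

(* Returns the dictionary registered for a tag, or None if there is none.
   What to do in the None case (no hyphenation, a fallback language, ...)
   is left open here. *)
let find_hyphen_dict tag ctx = LangMap.find_opt tag ctx.hyphen_dicts
```

With this shape, the line-breaking code would look up the dictionary for the language detected for each run of text instead of reading a single global field.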
Introducing new type hyphen-dict
Instead of introducing hyphen-dict and having users explicitly handle hyphenation dictionaries, we could provide primitives that get/set strings representing languages (e.g., set-hyphen-dict : string -> ctx -> ctx).
However, a hyphen-dict type allows more extension points in the future (e.g., tweaking hyphenation patterns, adding exceptional words ad hoc).
load-hyphen-dict throwing exceptions
load-hyphen-dict could instead have the signature load-hyphen-dict : string -> hyphen-dict option. I don't have a strong opinion about this. I was thinking of having a new package for each language, so specifying a wrong filename is unlikely.
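The two signatures differ only in how a missing dictionary is reported, and one can be derived from the other. The sketch below uses an in-memory association list instead of files; the exception name and both function names are hypothetical.

```ocaml
(* Hypothetical exception for a language with no dictionary. *)
exception Hyphen_dict_not_found of string

(* Exception-raising variant, modeled over an association list of
   (language tag, dictionary) pairs instead of files on disk. *)
let load_hyphen_dict_exn available language =
  match List.assoc_opt language available with
  | Some dict -> dict
  | None -> raise (Hyphen_dict_not_found language)

(* Option-returning variant built on top of the raising one. *)
let load_hyphen_dict_opt available language =
  try Some (load_hyphen_dict_exn available language)
  with Hyphen_dict_not_found _ -> None
```

The option form forces every caller to handle the missing case, while the raising form keeps call sites short when a wrong name is expected to be rare.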
Having a primitive to get available hyphenation dictionary files
I could include another primitive get-hyph-dict-list that returns available files under hyph/ (for example, returning [ "en" ]). This primitive is not mandatory.
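A possible shape for such a primitive, sketched in OCaml under the assumption that dictionaries are the files in a single directory whose names end in .satysfi-hyph. The function names and the hyph_dir parameter are hypothetical; the sorting and the empty result for a missing directory are choices of this sketch, not part of the proposal.

```ocaml
let suffix = ".satysfi-hyph"

(* Keeps the names ending in the dictionary suffix, strips the suffix,
   and sorts the resulting language tags. *)
let languages_of_filenames names =
  names
  |> List.filter_map (fun name ->
         if Filename.check_suffix name suffix
         then Some (Filename.chop_suffix name suffix)
         else None)
  |> List.sort String.compare

(* Lists the language tags available in hyph_dir.
   Returns the empty list when the directory does not exist. *)
let get_hyph_dict_list hyph_dir =
  if Sys.file_exists hyph_dir && Sys.is_directory hyph_dir
  then languages_of_filenames (Array.to_list (Sys.readdir hyph_dir))
  else []
```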
Renaming english.satysfi-hyph to en.satysfi-hyph
We could leave the filename as is. However, considering that even TeX has already adopted a naming scheme based on BCP 47 Language Tags, there is no reason to stick to the traditional scheme of English language names.
And the hyphenation file names of that thing called TeX look like this nowadays, too. #TeX pic.twitter.com/48vtJFz8G7
— 某ZR(ざんねん🙃) (@zr_tex8r) January 11, 2020