diff --git a/README.rst b/README.rst
index 81dbbb11..6e7406f7 100644
--- a/README.rst
+++ b/README.rst
@@ -13,6 +13,27 @@ Uses
 This is used as the scanner inside `Mathics `_ but it can also be used
 for tokenizing and formatting WL code. In fact we intend to write one.
 
+Implementation
+==============
+
+mathics_scanner.characters
+--------------------------
+
+This module consists mostly of translation tables between WL and Unicode/ASCII.
+Because of the large size of these tables, it was decided to store them in a
+file and read them from disk at runtime (when the module is imported). Our
+tests showed that storing the tables as JSON and using
+`ujson <https://github.com/ultrajson/ultrajson>`_ to read them is the most
+efficient way to access them. However, this is merely an implementation
+detail and consumers of this library should not rely on this assumption.
+
+For maintainability and efficiency, we decided to store this data in a
+human-readable YAML file (`data/named-characters.yml`) and compile it into
+the JSON tables used internally by the library (`data/characters.json`) for
+faster access at runtime. The conversion of the data is performed by the
+script `admin-tools/compile-translation-tables.py` at each commit to the
+`master` branch via GitHub Actions.
+
 Contributing
 ------------
 
diff --git a/mathics_scanner/data/named-characters.yml b/mathics_scanner/data/named-characters.yml
index 95a63972..a8d0a2d2 100644
--- a/mathics_scanner/data/named-characters.yml
+++ b/mathics_scanner/data/named-characters.yml
@@ -6935,7 +6935,7 @@ Upsilon:
 # looks more like U+26E2 (Astronomical Symbol for Uranus) than the Standard Unicode equavalent
 # seen at https://www.compart.com/en/unicode/U+2645.
 # As with the Earth, we are going off of the name and the code point rather than the
-# visual representation of the symbo.
+# visual representation of the symbol.
 Uranus:
   has-unicode-inverse: false
   is-letter-like: false
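
Review note: the README hunk describes compiling the named-character data into JSON tables and reading them back at import time. A minimal sketch of that flow follows, offered as an illustration under assumptions rather than as the contents of `admin-tools/compile-translation-tables.py`. The field names `wl-unicode` and `unicode-equivalent`, the table key `wl-to-unicode`, the `HypotheticalChar` entry, and the helpers `compile_tables` and `load_tables` are all hypothetical. Only `has-unicode-inverse` and `is-letter-like` appear in the diff. The YAML parsing step is skipped so the sketch needs only the standard library, and `ujson` is treated as optional.

```python
import json
import os
import tempfile

try:
    import ujson as fast_json  # optional third-party reader named in the README
except ImportError:
    fast_json = json  # standard-library fallback exposing the same load() call


def compile_tables(named_characters):
    """Build a WL-to-Unicode table from already-parsed named-character entries.

    Entries that lack either character field are skipped.
    """
    wl_to_unicode = {}
    for entry in named_characters.values():
        wl_char = entry.get("wl-unicode")  # hypothetical field name
        uni_char = entry.get("unicode-equivalent")  # hypothetical field name
        if wl_char is not None and uni_char is not None:
            wl_to_unicode[wl_char] = uni_char
    return {"wl-to-unicode": wl_to_unicode}  # hypothetical table key


def load_tables(path):
    """Read the compiled JSON tables from disk, as a module might do on import."""
    with open(path, "r", encoding="utf-8") as f:
        return fast_json.load(f)


# Stand-in for the parsed YAML file. "Uranus" is truncated to the two fields
# visible in the diff; "HypotheticalChar" is invented for illustration.
named_characters = {
    "HypotheticalChar": {
        "wl-unicode": "\uf000",
        "unicode-equivalent": "x",
        "has-unicode-inverse": True,
        "is-letter-like": False,
    },
    "Uranus": {"has-unicode-inverse": False, "is-letter-like": False},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "characters.json")
    # Compile step: write the derived tables once, ahead of time.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(compile_tables(named_characters), f)
    # Runtime step: load the precompiled tables.
    tables = load_tables(path)
```

Keeping the loader behind a single function, as sketched here, is one way to honor the README's statement that the storage format is an implementation detail: callers see only the resulting dictionaries, not the JSON file or the parser used to read it.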