Merged
21 changes: 21 additions & 0 deletions README.rst
@@ -13,6 +13,27 @@ Uses

This is used as the scanner inside `Mathics <https://mathics.org>`_ but it can also be used for tokenizing and formatting WL code. In fact we intend to write one.

Implementation
==============

mathics_scanner.characters
--------------------------

This module consists mostly of translation tables between WL and Unicode/ASCII.
Because of the large size of these tables, it was decided to store them in a
file and read them from disk at runtime (when the module is imported). Our
tests showed that storing the tables as JSON and using
`ujson <https://github.com/ultrajson/ultrajson>`_ to read them is the most
efficient way to access them. However, this is merely an implementation
detail, and consumers of this library should not rely on it.
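
As a rough illustration of the load-at-import approach described above, the
sketch below reads a JSON table once, preferring ``ujson`` and falling back to
the standard ``json`` module when it is not installed. The ``load_tables``
helper, the ``wl-to-unicode`` key, and the sample entry are hypothetical and
are not taken from the library.

```python
import json
import os
import tempfile

try:
    import ujson as fast_json  # optional third-party dependency
except ImportError:
    fast_json = json  # the standard library parser is a drop-in fallback here


def load_tables(path):
    """Read translation tables from a JSON file (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        return fast_json.load(f)


# Stand-in for the real data file: one illustrative entry written to a
# temporary file. It is not copied from the library's actual tables.
sample = {"wl-to-unicode": {"\uf74c": "\u2146"}}
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    json.dump(sample, tmp)

# In a real module this call would sit at module level so that the tables
# are read exactly once, when the module is first imported.
tables = load_tables(tmp.name)
os.remove(tmp.name)
```

Keeping the parser behind a single helper like this is one way to make sure
callers never depend on the storage format.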

For maintainability and efficiency, we decided to store this data in a
human-readable YAML file (``data/named-characters.yml``) and compile it into
the JSON tables used internally by the library (``data/characters.json``) for
faster access at runtime. The conversion is performed by the script
``admin-tools/compile-translation-tables.py`` on each commit to the
``master`` branch via GitHub Actions.
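
The compile step described above could look roughly like the sketch below. It
is not the actual script: the YAML parsing is left out (the input is a plain
dictionary shaped like what a YAML parser would return), the field names
``wl-unicode`` and ``unicode-equivalent`` and the output keys are assumptions,
and the rule that only entries with ``has-unicode-inverse`` get a reverse
mapping is a guess at how that flag might be used.

```python
import json


def compile_tables(named_characters):
    """Build lookup tables from parsed character data (illustrative only)."""
    wl_to_unicode = {}
    unicode_to_wl = {}
    for name, props in named_characters.items():
        wl = props.get("wl-unicode")            # assumed field name
        uni = props.get("unicode-equivalent")   # assumed field name
        if wl is None or uni is None:
            continue  # nothing to translate for this entry
        wl_to_unicode[wl] = uni
        # Assumption: only entries flagged this way get a reverse mapping.
        if props.get("has-unicode-inverse", False):
            unicode_to_wl[uni] = wl
    return {"wl-to-unicode": wl_to_unicode, "unicode-to-wl": unicode_to_wl}


# Made-up entries in the shape a YAML parser would return for the data file.
parsed = {
    "SampleA": {"wl-unicode": "\u03b1", "unicode-equivalent": "\u03b1",
                "has-unicode-inverse": True, "is-letter-like": True},
    "SampleB": {"wl-unicode": "\uf700", "unicode-equivalent": "x",
                "has-unicode-inverse": False, "is-letter-like": False},
    "SampleC": {"is-letter-like": False},
}

compiled = compile_tables(parsed)
# Serialize the tables; a real script would write this string to a file.
compiled_json = json.dumps(compiled, sort_keys=True)
```

Doing this work ahead of time means the runtime only has to parse flat JSON
dictionaries rather than walk the full YAML description of every character.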


Contributing
------------
2 changes: 1 addition & 1 deletion mathics_scanner/data/named-characters.yml
@@ -6935,7 +6935,7 @@ Upsilon:
# looks more like U+26E2 (Astronomical Symbol for Uranus) than the Standard Unicode equivalent
# seen at https://www.compart.com/en/unicode/U+2645.
# As with the Earth, we are going off of the name and the code point rather than the
# visual representation of the symbo.
# visual representation of the symbol.
Uranus:
has-unicode-inverse: false
is-letter-like: false