Conversation
| (new LexicalModelCompiler).compile({ | ||
| format: 'trie-1.0', | ||
| wordBreaking: { | ||
| allowedCharacters: { initials: 'abcdefghijklmnopqrstuvwxyz', medials: 'abcdefghijklmnopqrstuvwxyz', finals: 'abcdefghijklmnopqrstuvwxyz' }, |
There was a problem hiding this comment.
Although I like how compact this notation is, JavaScript's strings may not work well, because there's no explicit way to denote full characters that are greater than one code point (e.g., <X̱> in Northern Haida, two code points in NFC). Also, characters that are outside the BMP will be a pain, and error-prone.
There was a problem hiding this comment.
So could we support string -- for simple situations or array of strings for others? e.g. ['a','b',...,'X̱']?
I am unsure on how this is going to be used -- I would be happy to have the source format support both but compiled into array of strings for simplicity of consumption.
| return indexes; | ||
| } | ||
|
|
||
| return indexesOf(text); |
There was a problem hiding this comment.
Do you feel the indices in the string will be more useful than returning the actual words as strings? If word breaker functions return both the start and the end index of a word, then we can have it both ways: default is to return indices, and a simple function call can convert indices into words as strings.
There was a problem hiding this comment.
Good questions. Am happy to adjust this per the requirements of the wordbreaking when we get to it. I've defaulted to the most basic approach possible now of just marking the characters which are wordbreakers -- but I know that is naive. There are plenty of other wordbreak algorithms that we should research as well (yay!)
| # Define terminal colours | ||
| # | ||
|
|
||
| if [ -t 2 ]; then |
There was a problem hiding this comment.
I had to look up what [ -t 2] is. Could you please provide a comment explain why this is here? I reckon it's to define ANSI colours ONLY when stderr is outputting to the terminal (as opposed to redirected to a file).
| <FileVersion>12.0</FileVersion> | ||
| </System> | ||
| <Options> | ||
| <FollowKeyboardVersion/> |
There was a problem hiding this comment.
?
How would the model follow the keyboard version? As I recall, models and keyboards aren't allowed within the same package.
There was a problem hiding this comment.
Yes, this is not great... but, we will overload this tag name to mean "Follow Lexical Model Version" as well. We may bubble that in the UI more appropriately. It should have been called "Follow Content File Version" but my crystal ball was on the blink on the day I chose the tag name :)
| return fs.readFileSync(path.join(sourcePath, source), 'utf8'); | ||
| }); | ||
|
|
||
| let oc: LexicalModelCompiled = {id:model_info.id, format:o.format, wordBreaking:o.wordBreaking}; |
There was a problem hiding this comment.
format and wordBreaking are not listed as properties in tools/lexical-model.ts.
There was a problem hiding this comment.
interface LexicalModelCompiled extends LexicalModel :)
| // | ||
| // Filename expectations | ||
| // | ||
| const kpsFileName = '../source/'+model_info.id+'.model.kps'; |
There was a problem hiding this comment.
Suggestion: String templates may be cleaner here:
const kpsFileName = `../source/${model_info.id}.model.kps`;| @@ -0,0 +1,79 @@ | |||
| interface KmpJsonFile { | |||
There was a problem hiding this comment.
Where will the official documentation (i.e., meaning of each field) for the JSON file exist?
There was a problem hiding this comment.
I have an open PR on the help site (currently private, although shortly hoping to move it public) which documents these fields. See https://help.keyman.com/developer/11.0/reference/file-types/metadata (probably will be in 12.0 URL when that lands).
|
|
||
| interface LexicalModel { | ||
| readonly format: 'trie-1.0'|'fst-foma-1.0'|'custom-1.0', | ||
| readonly wordBreaking?: { |
There was a problem hiding this comment.
Note that a custom model may provide its own word breaking.
| //... metadata ... | ||
| } | ||
|
|
||
| interface LexicalModelPrediction { |
There was a problem hiding this comment.
Are these interfaces to correspond with the interfaces in the main repo?
See: https://github.com/keymanapp/keyman/blob/master/common/predictive-text/message.d.ts#L145-L224
There was a problem hiding this comment.
Yes, this is really a stub for the real thing. I will update LexicalModelPrediction to correspond more cleanly with the Transform interface (but will not rely on the one in the main repo for now -- refactor can come once everything stabilises I think).
| fst: string; | ||
| } | ||
|
|
||
| interface LexicalModelCompiledCustom extends LexicalModelCompiled { |
There was a problem hiding this comment.
This will most likely also contain src: string, but we'll cross that bridge when we get there :D
| @@ -0,0 +1,33 @@ | |||
| interface ModelInfoFile { | |||
There was a problem hiding this comment.
Where will the official documentation for this live?
There was a problem hiding this comment.
It will live in the https://help.keyman.com/developer/cloud/ area, alongside the https://help.keyman.com/DEVELOPER/cloud/keyboard_info/1.0/ .keyboard_info file. Again, I have a PR open for this.
There is also a PR open against api.keyman.com/schemas for the corresponding JSON schema (and the updated kmp.json schema as well). Also hopefully to go public soon.
There was a problem hiding this comment.
- [api.keyman.com] Lexical model schema drafts api.keyman.com#1 covers the JSON schema and changes to package metadata format.
- [help.keyman.com] Lexical model schema updates help.keyman.com#1 includes the documentation for the .model_info file (as of now, WIP).
These PRs are both #1 because we've just moved api and help sites to public source.
|
My major concerns so far are: where the JSON documentation will exist; and whether the prediction interfaces will be duplicated and compatible with those in keymanapp/keyman. Aside from that, LGTM! |
|
Okay, I've addressed a bunch of bits and pieces with the compiler and documentation, and tried to cover the review comments. At this point, there are still a bunch of unfinished bits and pieces but I think that's fine, because this still gives us a base for generating .kmp files for use in the target applications, and for establishing CI. |
| pushd build | ||
| mkdir obj | ||
| ../../../../node_modules/.bin/tsc --outDir ./obj ../source/model.ts | ||
| ../../../../node_modules/.bin/tsc --module commonjs --target es6 --outDir ./obj ../source/model.ts |
There was a problem hiding this comment.
- would it be cleaner to use
npx tscinstead of../../../../node_modules/.bin/tsc> - Is it worth extracting the compiler options to an explicit
tsconfig.json?
There was a problem hiding this comment.
-
Updated to use
npx. Note, on Windows, this requires Node.js version 10.0 or later. -
Am I right in thinking this would require an explicit
tsconfig.jsonfor each model? If so for now I'll leave it in the script.
chore: Fixup broken build script
add lexical model for chechen latin
No description provided.