First pass lexical model compiler by mcdurdin · Pull Request #1 · keymanapp/lexical-models

mcdurdin · 2019-01-28T16:56:30Z

No description provided.

eddieantonio · 2019-01-28T17:32:33Z

+(new LexicalModelCompiler).compile({
+  format: 'trie-1.0',
+  wordBreaking: {
+    allowedCharacters: { initials: 'abcdefghijklmnopqrstuvwxyz', medials: 'abcdefghijklmnopqrstuvwxyz', finals: 'abcdefghijklmnopqrstuvwxyz' },


Although I like how compact this notation is, JavaScript's strings may not work well, because there's no explicit way to denote full characters that are greater than one code point (e.g., <X̱> in Northern Haida, two code points in NFC). Also, characters that are outside the BMP will be a pain, and error-prone.

So could we support a string for simple situations, or an array of strings for others? e.g. ['a','b',...,'X̱']?

I am unsure on how this is going to be used -- I would be happy to have the source format support both but compiled into array of strings for simplicity of consumption.

I like this idea!
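A minimal sketch of the "accept both, compile to an array of strings" idea discussed above. The `CharacterSet` type and the `toCharacterList` helper are hypothetical names used only for illustration; they are not part of this PR. Note that the compact string form can only be split per code point, so a multi-code-point character such as X̱ still needs the array form.

```typescript
// Hypothetical source-format type: a compact string or an explicit list.
type CharacterSet = string | string[];

// Hypothetical helper (an assumption, not the PR's code): normalise either
// form into an array of NFC strings for simpler consumption downstream.
function toCharacterList(chars: CharacterSet): string[] {
  if (Array.isArray(chars)) {
    // Each entry is kept whole, even if it spans several code points.
    return chars.map(c => c.normalize('NFC'));
  }
  // Array.from iterates a string by code point, so characters outside the
  // BMP are not split into surrogate halves.
  return Array.from(chars.normalize('NFC'));
}

const simpleSet = toCharacterList('abc');
const explicitSet = toCharacterList(['a', 'b', 'X\u0331']);
```

With the array form, 'X\u0331' survives as a single entry; passing the same two code points inside a plain string would yield two separate entries.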

eddieantonio · 2019-01-28T17:36:07Z

+    return indexes;
+  }
+
+  return indexesOf(text);


Do you feel the indices in the string will be more useful than returning the actual words as strings? If word breaker functions return both the start and the end index of a word, then we can have it both ways: default is to return indices, and a simple function call can convert indices into words as strings.

Good questions. Am happy to adjust this per the requirements of the wordbreaking when we get to it. I've defaulted to the most basic approach possible now of just marking the characters which are wordbreakers -- but I know that is naive. There are plenty of other wordbreak algorithms that we should research as well (yay!)
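The "have it both ways" suggestion above could be sketched roughly as follows. This assumes a hypothetical `Span` shape with half-open `[start, end)` offsets and a deliberately naive whitespace-based breaker; neither is the word breaker in this PR, and `breakWords` and `spansToWords` are illustrative names only.

```typescript
// Hypothetical span type: [start, end) offsets into the original text.
interface Span {
  start: number;
  end: number;
}

// Naive breaker for illustration only: each run of non-whitespace is a word.
function breakWords(text: string): Span[] {
  const spans: Span[] = [];
  const pattern = /\S+/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    spans.push({ start: match.index, end: match.index + match[0].length });
  }
  return spans;
}

// Converting indices into words is then a single function call.
function spansToWords(text: string, spans: Span[]): string[] {
  return spans.map(s => text.substring(s.start, s.end));
}

const sampleText = 'hello  lexical model';
const sampleSpans = breakWords(sampleText);
const sampleWords = spansToWords(sampleText, sampleSpans);
```

Returning spans by default keeps positional information for callers that need it, while callers that only want strings pay for one extra mapping step.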

eddieantonio · 2019-01-28T17:39:06Z

+# Define terminal colours
+#
+
+if [ -t 2 ]; then


I had to look up what [ -t 2 ] is. Could you please provide a comment explaining why this is here? I reckon it's to define ANSI colours ONLY when stderr is outputting to the terminal (as opposed to redirected to a file).

jahorton · 2019-01-31T15:59:44Z

+    <FileVersion>12.0</FileVersion>
+  </System>
+  <Options>
+    <FollowKeyboardVersion/>


?

How would the model follow the keyboard version? As I recall, models and keyboards aren't allowed within the same package.

Yes, this is not great... but, we will overload this tag name to mean "Follow Lexical Model Version" as well. We may bubble that in the UI more appropriately. It should have been called "Follow Content File Version" but my crystal ball was on the blink on the day I chose the tag name :)

jahorton · 2019-01-31T16:06:22Z

+      return fs.readFileSync(path.join(sourcePath, source), 'utf8'); 
+    });
+
+    let oc: LexicalModelCompiled = {id:model_info.id, format:o.format, wordBreaking:o.wordBreaking};


format and wordBreaking are not listed as properties in tools/lexical-model.ts.

interface LexicalModelCompiled extends LexicalModel :)
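To illustrate the reply above: because the compiled interface extends the base one, `format` and `wordBreaking` are inherited rather than redeclared. The field lists below are simplified stand-ins, and the `wordBreaking` shape and the id value are assumptions made for the sake of a self-contained sketch, not the definitions in tools/lexical-model.ts.

```typescript
// Simplified stand-in for the base interface (fields are assumptions).
interface LexicalModel {
  readonly format: 'trie-1.0' | 'fst-foma-1.0' | 'custom-1.0';
  readonly wordBreaking?: { allowedCharacters?: string[] };
}

// The compiled interface inherits format and wordBreaking from LexicalModel
// and only adds what is new.
interface LexicalModelCompiled extends LexicalModel {
  readonly id: string;
}

// An object literal may therefore set inherited and new properties together.
const compiledModel: LexicalModelCompiled = {
  id: 'example.en.sample', // hypothetical model id
  format: 'trie-1.0',
  wordBreaking: { allowedCharacters: ['a', 'b'] },
};
```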

eddieantonio · 2019-02-05T16:09:37Z

+    //
+    // Filename expectations
+    //
+    const kpsFileName = '../source/'+model_info.id+'.model.kps';


Suggestion: String templates may be cleaner here:

const kpsFileName = `../source/${model_info.id}.model.kps`;

eddieantonio · 2019-02-05T16:13:21Z

@@ -0,0 +1,79 @@
+interface KmpJsonFile {


Where will the official documentation (i.e., meaning of each field) for the JSON file exist?

I have an open PR on the help site (currently private, although shortly hoping to move it public) which documents these fields. See https://help.keyman.com/developer/11.0/reference/file-types/metadata (probably will be in 12.0 URL when that lands).

eddieantonio · 2019-02-05T16:15:13Z

+
+interface LexicalModel {
+  readonly format: 'trie-1.0'|'fst-foma-1.0'|'custom-1.0',
+  readonly wordBreaking?: {


Note that a custom model may provide its own word breaking.

eddieantonio · 2019-02-05T16:16:35Z

+  //... metadata ...
+}
+
+interface LexicalModelPrediction {


Are these interfaces to correspond with the interfaces in the main repo?

See: https://github.com/keymanapp/keyman/blob/master/common/predictive-text/message.d.ts#L145-L224

Yes, this is really a stub for the real thing. I will update LexicalModelPrediction to correspond more cleanly with the Transform interface (but will not rely on the one in the main repo for now -- refactor can come once everything stabilises I think).

eddieantonio · 2019-02-05T16:17:49Z

+  fst: string;
+}
+
+interface LexicalModelCompiledCustom extends LexicalModelCompiled {


This will most likely also contain src: string, but we'll cross that bridge when we get there :D

eddieantonio · 2019-02-05T16:18:37Z

@@ -0,0 +1,33 @@
+interface ModelInfoFile {


Where will the official documentation for this live?

It will live in the https://help.keyman.com/developer/cloud/ area, alongside the https://help.keyman.com/DEVELOPER/cloud/keyboard_info/1.0/ .keyboard_info file. Again, I have a PR open for this.

There is also a PR open against api.keyman.com/schemas for the corresponding JSON schema (and the updated kmp.json schema as well). Also hopefully to go public soon.

[api.keyman.com] Lexical model schema drafts api.keyman.com#1 covers the JSON schema and changes to package metadata format.

[help.keyman.com] Lexical model schema updates help.keyman.com#1 includes the documentation for the .model_info file (as of now, WIP).

These PRs are both #1 because we've just moved api and help sites to public source.

eddieantonio · 2019-02-05T16:21:14Z

My major concerns so far are: where the JSON documentation will exist; and whether the prediction interfaces will be duplicated and compatible with those in keymanapp/keyman.

Aside from that, LGTM!

mcdurdin · 2019-02-11T20:36:29Z

Okay, I've addressed a number of issues in the compiler and documentation, and tried to cover the review comments. Some pieces are still unfinished, but I think that's fine, because this still gives us a base for generating .kmp files for use in the target applications, and for establishing CI.

eddieantonio · 2019-02-11T21:31:49Z

  pushd build
  mkdir obj
-  ../../../../node_modules/.bin/tsc --outDir ./obj ../source/model.ts 
+  ../../../../node_modules/.bin/tsc --module commonjs --target es6 --outDir ./obj ../source/model.ts 


Would it be cleaner to use npx tsc instead of ../../../../node_modules/.bin/tsc?

Is it worth extracting the compiler options to an explicit tsconfig.json?

Updated to use npx. Note, on Windows, this requires Node.js version 10.0 or later.

Am I right in thinking this would require an explicit tsconfig.json for each model? If so for now I'll leave it in the script.

mcdurdin added 7 commits January 26, 2019 10:56

Basic compiler, first steps

de2f50e

Basic compiler, create templated kmp.json

05dfe7f

Basic build script

d979334

Compiler polish and refactor

3fc2c77

More refactoring

4277704

Continued refactoring

d9d3670

Last ditch refactor

65d6fb9

eddieantonio reviewed Jan 28, 2019

Remove wordlist example from custom

e79b45f

mcdurdin added the enhancement label Jan 31, 2019

mcdurdin added this to the P5S3 milestone Jan 31, 2019

Address review comments

6e036ca

jahorton reviewed Jan 31, 2019

eddieantonio reviewed Feb 5, 2019

mcdurdin modified the milestones: P5S3, P5S4 Feb 8, 2019

[lexical-models] Add required fields to model_info, fixup ts layout a…

f56ef82

…nd config

eddieantonio reviewed Feb 11, 2019

[lexical-models] Address review comments

48963a3

mcdurdin merged commit 4bf71a5 into master Feb 13, 2019

mcdurdin deleted the basic-compiler branch February 13, 2019 23:30

DavidLRowe pushed a commit that referenced this pull request Mar 10, 2022

Merge pull request #1 from ind-nt/add-license-1

9909574

Create LICENSE.md

mcdurdin pushed a commit that referenced this pull request Mar 28, 2022

Merge pull request #1 from keymanapp/fv-bea-tsaadane

58204b9

chore: Fixup broken build script

DavidLRowe pushed a commit that referenced this pull request Jul 11, 2022

Merge pull request #1 from katelem24/katelem24-patch-1

e58e5dd

Getat LM for Obolo v 1.1

DavidLRowe pushed a commit that referenced this pull request Jan 29, 2025

Merge pull request #1 from chechen-language/chechen_latin

104d7d8

add lexical model for chechen latin
