OPENNLP-910: Add checkstyle by smarthi · Pull Request #29 · apache/opennlp

smarthi · 2017-01-03T12:29:17Z

No description provided.

kottmann · 2017-01-03T15:02:42Z

+    </module>
+    <module name="OverloadMethodsDeclarationOrder"/>
+    <module name="VariableDeclarationUsageDistance"/>
+    <module name="CustomImportOrder">


These days the IDE takes care of adding imports for us. Do we need to enforce an order? Does that make sense?

Yes, most projects I have seen enforce an order on the imports to keep them consistent throughout. If anything, this will ensure a uniform standard across the project.
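
For reference, an import-order rule of the kind discussed here might look like the fragment below. This is a minimal sketch, not the configuration from this pull request: the grouping (static imports, then standard Java packages, then third-party packages) and the two boolean options are assumptions chosen for illustration.

```xml
<!-- Hypothetical grouping; the rule order used in this PR is not shown in the thread. -->
<module name="CustomImportOrder">
  <!-- Import groups are checked in this order; ### separates the group names. -->
  <property name="customImportOrderRules"
            value="STATIC###STANDARD_JAVA_PACKAGE###THIRD_PARTY_PACKAGE"/>
  <!-- Require alphabetical ordering inside each group. -->
  <property name="sortImportsInGroupAlphabetically" value="true"/>
  <!-- Require a blank line between groups. -->
  <property name="separateLineBetweenGroups" value="true"/>
</module>
```

Most IDEs can be configured to produce the same grouping, so an enforced order does not have to conflict with automatic import handling.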

kottmann · 2017-01-03T15:07:19Z

+      <message key="ws.notPreceded"
+               value="GenericWhitespace ''{0}'' is not preceded with whitespace."/>
+    </module>
+    <module name="Indentation">


+1 This one is important, and will make sure people configure their editor correctly to modify our code.
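
As an illustration of what the Indentation check can pin down, here is a sketch with explicit offsets. The values are assumptions (a two-space basic offset with four-space continuation lines), not values confirmed anywhere in this thread.

```xml
<!-- Assumed offsets for illustration; adjust to the project's code conventions. -->
<module name="Indentation">
  <!-- Spaces per nesting level. -->
  <property name="basicOffset" value="2"/>
  <!-- Braces sit at the same level as the statement that opens them. -->
  <property name="braceAdjustment" value="0"/>
  <!-- Indentation of case labels inside a switch. -->
  <property name="caseIndent" value="2"/>
  <!-- Indentation of a throws clause wrapped onto its own line. -->
  <property name="throwsIndent" value="4"/>
  <!-- Indentation of continuation lines. -->
  <property name="lineWrappingIndentation" value="4"/>
  <!-- Indentation of array initializer elements. -->
  <property name="arrayInitIndent" value="2"/>
</module>
```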

kottmann · 2017-01-03T15:09:01Z

+      <message key="name.invalidPattern"
+               value="Parameter name ''{0}'' must match pattern ''{1}''."/>
+    </module>
+    <module name="CatchParameterName">


This does not allow e as a parameter name? Is it better if people have to write ex or something similar? I would remove this one.

Agreed, we can remove this.
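
For comparison only: the thread settles on removing the check, but a relaxed pattern would be another way to permit single-letter names such as e. The regular expression below is a hypothetical choice, not something proposed in this pull request.

```xml
<!-- Hypothetical alternative to removal: accept e, ex, ioException, and similar. -->
<module name="CatchParameterName">
  <!-- One lowercase letter, optionally followed by letters or digits. -->
  <property name="format" value="^[a-z][a-zA-Z0-9]*$"/>
</module>
```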

kottmann · 2017-01-03T15:26:51Z

+      <property name="allowByTailComment" value="true"/>
+      <property name="allowNonPrintableEscapes" value="true"/>
+    </module>
+    <module name="LineLength">


The code conventions say 80 to 100 as maximum line length. Maybe we should define a hard limit at 110 or 120 rather than 140?

We can set that to 110.
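
The agreed limit would translate into a fragment like the following. This is a sketch: where the module belongs depends on the Checkstyle version in use (older releases expect it under TreeWalker, newer ones directly under Checker), and the version is not stated in this thread.

```xml
<!-- Hard limit agreed above; parent module depends on the Checkstyle version. -->
<module name="LineLength">
  <property name="max" value="110"/>
</module>
```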

kottmann · 2017-01-03T15:38:48Z

+      <property name="tagOrder" value="@param, @return, @throws, @deprecated"/>
+      <property name="target" value="CLASS_DEF, INTERFACE_DEF, ENUM_DEF, METHOD_DEF, CTOR_DEF, VARIABLE_DEF"/>
+    </module>
+    <module name="JavadocMethod">


There are many places where JavaDoc is missing, and JavaDoc only has value if someone actually takes the time to write something. I suggest we either do not enable this check for now or genuinely improve the JavaDoc. I am against adding lots of empty, no-value JavaDoc just to make this check happy.
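
One possible middle ground, offered as a sketch and not as what was adopted here, is to keep the check but lower its severity so that findings are reported without being treated as errors. Whether a warning fails the build depends on how the build plugin is configured, which is an assumption outside this thread.

```xml
<!-- Sketch: report Javadoc findings as warnings instead of errors. -->
<module name="JavadocMethod">
  <!-- severity is a common property of Checkstyle modules: ignore, info, warning, error. -->
  <property name="severity" value="warning"/>
</module>
```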

kottmann · 2017-01-03T15:41:13Z

I suggest excluding the Porter and Snowball stemmers from checkstyle for now.

kottmann · 2017-01-06T18:15:30Z

It looks like the current config is not excluding the stemmers. This seems to work:
<excludes>**/stemmer/**/*</excludes>
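
For context, the excludes element is a configuration parameter of the maven-checkstyle-plugin, so in a pom.xml it would sit roughly as shown below. This is a sketch under assumptions: the configLocation path is hypothetical, the plugin version is left out, and the surrounding build section of the project is not reproduced.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-checkstyle-plugin</artifactId>
  <configuration>
    <!-- Hypothetical path to the rules file; adjust to the project layout. -->
    <configLocation>checkstyle.xml</configLocation>
    <!-- Skip every file under any stemmer package, as suggested above. -->
    <excludes>**/stemmer/**/*</excludes>
    <!-- Fail the build when violations are found. -->
    <failOnViolation>true</failOnViolation>
  </configuration>
</plugin>
```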

Additive Unicode text handling for matching, search, and tokenization preprocessing (new types only, no breaking changes).

UAX #29 word tokenizer (opennlp.tools.tokenize.uax29):
- WordSegmenter, WordTokenizer (implements opennlp.tools.tokenize.Tokenizer), and WordType. A single-pass, table-driven engine with O(1) Word_Break lookups and no regular expression; 100% conformant on the official Unicode 17.0 WordBreakTest suite (1944/1944). Offset-preserving spans and a zero-allocation streaming API.

Text normalization (opennlp.tools.util.normalizer):
- The layered Term model (Dimension, Term, TermAnalyzer): a token as a stack of normalization layers (NFC, NFKC, whitespace, dash, case fold, accent fold, confusable fold, stem, lemma) with eager configured layers, lazy memoized extras, and O(1) peel; integrates the UAX #29 tokenizer and the existing Stemmer/Lemmatizer as the token-level layers.
- Confusable (homoglyph) skeleton folding per UTS #39, from the bundled Unicode security data.
- Per-language profiles (NormalizationProfile, NormalizationProfiles) mirroring the Snowball algorithm set with LanguageDetector fallback, including a German DIN 5007-2 umlaut fold (a-umlaut to ae, eszett to ss).
- First-class builder configuration: whitespace/dash fold targets, locale case folding, accent-fold script scope, and max token length, over a general transform(dimension, normalizer) hook.

Documentation: a Text Normalization chapter and a UAX #29 tokenizer section in the manual; the bundled Unicode data files (WordBreakProperty, emoji-data, WordBreakTest, confusables) are attributed in NOTICE.

Tests: UAX #29 boundary conformance and unit tests, and unit tests for the normalizer engine, term model, confusables, language profiles, and German fold.

Builds on the normalization foundation.
- opennlp-runtime tokenize/uax29: the UAX #29 word segmenter and Tokenizer implementation (WordSegmenter, WordTokenizer, WordType, WordBreak, boundary engine) with bundled Unicode WordBreakProperty and emoji ExtendedPictographic data, validated against the official WordBreakTest conformance suite (1944/1944).
- The layered Term model (Term, TermAnalyzer) that tokenizes then normalizes per token over the Dimension ladder, the per-language NormalizationProfile registry, and the confusable-fold coverage.
- Extends the bundled-Unicode attribution (NOTICE, NOTICE.template, LICENSE, rat-excludes) to the WordBreakProperty / ExtendedPictographic / WordBreakTest data files, and restores Dimension's javadoc cross-links now that the Term layer is present.

- WordBoundaryConformanceTest: guard the conformance resource stream with Objects.requireNonNull and a clear message instead of an opaque NPE in InputStreamReader, and remove the unused NO_BOUNDARY constant.
- NormalizationProfiles.forLanguage: fail loud on a null language argument at the public entry point, with a null-rejection test.

Adds the normalizer manual chapter and updates the tokenizer, doccat, namefinder, and introduction chapters (and the master opennlp.xml) to cover the new normalization pipeline and word tokenizer.

smarthi changed the title ~~OPENNLP-910: Add checkstyle~~ OPENNLP-910: [WIP: Do Not Merge] Add checkstyle Jan 3, 2017

kottmann reviewed Jan 3, 2017

OPENNLP-910: Add checkstyle

3efd2a6

smarthi changed the title ~~OPENNLP-910: [WIP: Do Not Merge] Add checkstyle~~ OPENNLP-910: Add checkstyle Jan 6, 2017

asfgit closed this in a83ea28 Jan 6, 2017

smarthi deleted the OPENNLP-910 branch January 6, 2017 22:19

asfgit pushed a commit that referenced this pull request Apr 16, 2017

OPENNLP-910: Add checkstyle, this closes #29

aa567a5

asfgit pushed a commit that referenced this pull request Apr 20, 2017

OPENNLP-910: Add checkstyle, this closes #29

5eee6f6

smarthi wants to merge 1 commit into apache:trunk from smarthi:OPENNLP-910
