[SPARK-15064][ML] Locale support in StopWordsRemover #12968
 * Default: English locale ("en")
 * @group param
 */
val locale: Param[String] = new Param[String](this, "locale",
Hm, shouldn't all this perhaps be linked to the stopwords set? If you loaded the French stopwords, wouldn't you always want the French locale?
Yes, but how can we know that the user loaded the French stopwords? A user can load stopwords with StopWordsRemover.loadDefaultStopWords("french") and then set them with new StopWordsRemover().setStopWords(stopWords). Do you have any suggestions for that case?
For supported languages, we can know the appropriate locale and maintain an internal mapping. So "french" is known to map to Locale.FRENCH. For loading an arbitrary list, we don't know, but you could provide an overload where you provide a Locale.
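The mapping idea above could be sketched roughly as follows. This is only an illustration, not the code in this PR: the object name `StopWordLocales`, the helper names, and the particular set of languages in the map are all assumptions made for the example.

```scala
import java.util.Locale

object StopWordLocales {
  // Hypothetical internal mapping from a bundled stop-word list name to a locale.
  // The language names follow the "english"/"french" style used in this thread;
  // which languages are actually supported is an assumption here.
  private val languageToLocale: Map[String, Locale] = Map(
    "english" -> Locale.ENGLISH,
    "french"  -> Locale.FRENCH,
    "german"  -> Locale.GERMAN,
    "italian" -> Locale.ITALIAN
  )

  // Returns the locale for a known language name, or None for an unknown one.
  def localeFor(language: String): Option[Locale] =
    languageToLocale.get(language.toLowerCase(Locale.ROOT))

  // For an arbitrary stop-word list the locale cannot be inferred,
  // so the caller supplies it explicitly (the "overload" idea).
  def removeStopWords(
      terms: Seq[String],
      stopWords: Seq[String],
      locale: Locale): Seq[String] = {
    val lowered = stopWords.map(_.toLowerCase(locale)).toSet
    // Null terms are kept, matching a null-tolerant toLower.
    terms.filter(t => t == null || !lowered.contains(t.toLowerCase(locale)))
  }
}
```

With this shape, a default list can pick up its locale from `localeFor`, while a custom list goes through the variant that takes a `Locale` argument.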
(@burakkose I think the
@HyukjinKwon, thank you for letting me know. Yes, you're right.
setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
/**
 * Locale for doing a case sensitive comparison
 * Default: English locale ("en")
Shall we list the available options, or provide a reference here?
Made a pass. That's all from me.
This is blocking the user guide/examples update for 2.0.
# Conflicts: # mllib/src/test/scala/org/apache/spark/ml/feature/StopWordsRemoverSuite.scala
Can you specify the blocking?
} else {
  // TODO: support user locale (SPARK-15064)
  val toLower = (s: String) => if (s != null) s.toLowerCase else s
  val loadedLocale = StopWordsRemover.loadLocale($(locale))
Maybe just new Locale($(locale))
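As a rough illustration of why the locale passed to the lowercasing step matters, the sketch below builds a `java.util.Locale` directly from the parameter string, as suggested, and lowercases with it. The object and method names (`LocaleLowercase`, `toLower`) are made up for the example and are not part of the patch.

```scala
import java.util.Locale

object LocaleLowercase {
  // Lowercases a term under the locale named by a language code such as "en" or "tr".
  // Null terms are passed through unchanged, as in the existing toLower helper.
  def toLower(s: String, localeTag: String): String =
    if (s != null) s.toLowerCase(new Locale(localeTag)) else s
}

object LocaleLowercaseDemo extends App {
  println(LocaleLowercase.toLower("TITLE", "en")) // prints "title"
  println(LocaleLowercase.toLower("TITLE", "tr")) // prints "tıtle" (dotless i, U+0131)
}
```

Under the Turkish locale the uppercase "I" maps to the dotless "ı", so a stop-word list lowercased under one locale may fail to match terms lowercased under another.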
I'm not sure if this will be shipped with Spark 2.0. If so, we should update the user guide accordingly.
/**
 * Locale for doing a case sensitive comparison
 * Default: English locale ("en")
 * @see [[http://www.localeplanet.com/java/]]
Please link to the official Java doc: https://docs.oracle.com/javase/8/docs/api/java/util/Locale.html or the Locale class.
@burakkose, is this something you are still working on? If so, can you update it to master and look at @mengxr's comments? If you're no longer interested in working on it, no worries.
Can one of the admins verify this patch?
Closes apache#12968 Closes apache#16215 Closes apache#16212 Closes apache#16086 Closes apache#15713 Closes apache#16413 Closes apache#16396 Author: Sean Owen <sowen@cloudera.com> Closes apache#16447 from srowen/CloseStalePRs.
What changes were proposed in this pull request?
How was this patch tested?
unit tests