Conversation
|
Can you share how you're using this? We also plan to deprecate all of blevex soon as well because no one is maintaining it. As for |
|
I don't actually use this feature (detect_lang_filter), but I am actively trying to support Blevex. |
|
BTW, you mentioned deprecating all of blevex soon, Japanese tokenizer will also be deprecated? |
|
So, unless we can better understand whether the detect_lang filter has some actual use, I would prefer to get rid of it rather than change which library it uses. The Japanese tokenizer is the only thing in blevex that I think makes sense to save. Most likely it would move to be its own top-level module. Do you think there is anything else of value in blevex?
|
Yes, I think you're right about that. Basically, I'd like to keep the language analysis modules. For example,
|
Does the ICU tokenizer still work? Does it work with a recent version of ICU, or only with some specific old one? It hasn't been touched for 5 years, and it was difficult to get working back then, so I'd be surprised if it does. I believe all the languages supported by libstemmer (via cgo) are also supported by our pure Go snowball stemmers: https://github.com/blevesearch/snowballstem The only two languages not covered there are Japanese, which we plan to continue supporting, and Thai, which uses a dictionary-based tokenizer as part of ICU. So it seems Thai is the only language we would lose support for. Are you aware of any alternative tokenizers for Thai?
|
I was not aware of the existence of snowballstem. With it, I don't need to use libstemmer. Thank you for letting me know! How about this for a Thai tokenizer?
I would also like to replace cld2 with whatlanggo, since cld2 seems to be archived and no longer maintained.
What do you think about this?