Indonesian stopwords file contains too many words that aren't stopwords #211

@luthfianto

I'm aware that there is no single universal list of stop words. But the current stopwords-id.txt (currently 1309 sloc) contains too many words that aren't stop words (that is, words that aren't function words). It's more of an "Indonesian word list" than a list of "Indonesian stop words".

I think it's safer to start from another existing stop word list and then improve it incrementally.
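
To make the comparison concrete, here is a minimal sketch of how the current list could be diffed against a smaller baseline. It assumes both files hold one word per line; the helper names `load_wordlist` and `extra_words` are hypothetical, and the sample words are only illustrative, not taken from either list.

```python
from pathlib import Path


def load_wordlist(path):
    """Read a one-word-per-line file into a set, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def extra_words(current, reference):
    """Return the words in `current` that are missing from `reference`, sorted."""
    return sorted(set(current) - set(reference))


# Illustrative in-memory data (assumption, not the real file contents):
# "yang", "dan", "di", "ke" are function words; "rumah" (house) and
# "makan" (eat) are content words that would be review candidates.
current = {"yang", "dan", "di", "rumah", "makan"}
reference = {"yang", "dan", "di", "ke"}
print(extra_words(current, reference))  # ['makan', 'rumah']
```

With real files, `load_wordlist("stopwords-id.txt")` would supply the current list, and the baseline would come from whichever existing list is chosen. The words it flags are only candidates for review, since a baseline can itself be missing legitimate stop words.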

References:
