Chore: Gilda annotations throw an error unless nltk.download is executed before invocation #851

@DnlRKorn

Our Gilda package requires certain nltk datasets to be downloaded before it can run properly. Our test suite fails unless the following is executed first:
import nltk; nltk.download("stopwords"); nltk.download("punkt_tab")

The block of code which fails is the following:

def _gilda_annotate(self, text: str) -> Iterator[TextAnnotation]:
    from gilda.ner import annotate
    for match_text, match, start, end in annotate(text, grounder=self.grounder):
        yield TextAnnotation(
            subject_start=start,
            subject_end=end,
            subject_label=match_text,
            object_id=match.term.get_curie(),
            object_label=match.term.entry_name,
            matches_whole_text=start == 0 and end == len(text),
        )

Having to know ahead of time that these downloads are needed is a bit annoying. I expect it would be better to run these nltk.download commands automatically whenever a GildaImplementation instance is created.
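One possible shape for this is sketched below; it is an illustration, not the project's actual fix. The helper name `ensure_nltk_resources` and the `REQUIRED_NLTK_RESOURCES` mapping are hypothetical. The resource paths assume the standard nltk data layout (`corpora/stopwords`, `tokenizers/punkt_tab`). The sketch checks for each dataset with `nltk.data.find` and only calls `nltk.download` when the lookup fails, so repeated construction does not hit the network.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical mapping: nltk.download() name -> path checked by nltk.data.find().
# Paths assume the standard nltk data directory layout.
REQUIRED_NLTK_RESOURCES: Dict[str, str] = {
    "stopwords": "corpora/stopwords",
    "punkt_tab": "tokenizers/punkt_tab",
}


def ensure_nltk_resources(
    find: Optional[Callable[[str], object]] = None,
    download: Optional[Callable[[str], object]] = None,
    resources: Dict[str, str] = REQUIRED_NLTK_RESOURCES,
) -> List[str]:
    """Download any missing nltk datasets; return the names that were fetched.

    `find` and `download` default to the nltk functions and are injectable
    so the logic can be exercised without nltk installed or network access.
    """
    if find is None or download is None:
        import nltk  # imported lazily, like gilda in _gilda_annotate

        find = find or nltk.data.find
        download = download or (lambda name: nltk.download(name, quiet=True))

    fetched = []
    for name, path in resources.items():
        try:
            find(path)  # raises LookupError when the dataset is absent
        except LookupError:
            download(name)
            fetched.append(name)
    return fetched
```

GildaImplementation could call `ensure_nltk_resources()` once during initialization (for example in `__post_init__`, if the class is a dataclass, which is an assumption here), before the first call to `_gilda_annotate`.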

Labels: chore