I found strange behaviour using the pipe() method (only verified with the German model):
Parsing a document with pipe() can raise a ValueError, while calling nlp(text) on the same text works fine. I boiled it down to single words: German words work, but English words like 'Windows' fail.
Steps to reproduce:
import spacy
nlp = spacy.load('de')
def texts():
    yield "Windows"
for doc in nlp.pipe(texts(), n_threads=16, batch_size=1000):
    print(len(doc))  # doc access -> ValueError
Trace
ValueError Traceback (most recent call last)
<ipython-input-2-9a095ec5505b> in <module>()
8 def texts():
9 yield "Windows"
---> 10 for doc in nlp.pipe(texts(), n_threads=16, batch_size=1000):
11 print(len(doc))
.../venv/lib/python3.4/site-packages/spacy/language.py in pipe(self, texts, tag, parse, entity, n_threads, batch_size)
254 stream = self.entity.pipe(stream,
255 n_threads=1, batch_size=batch_size)
--> 256 for doc in stream:
257 yield doc
258
ValueError: Error parsing doc: Windows
Calling nlp("Windows") directly works fine. Also, if you call nlp("Windows") before the same pipe() call, pipe() does not raise an exception (maybe a dictionary entry is built on first use?)
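To check more words than just "Windows", something like the following sketch might help. `compare_pipe_and_call` is a hypothetical helper written for this report, not part of spaCy; it only assumes that the object passed in is callable and has a pipe() method, as in the steps above. Because calling nlp(text) first seems to hide the error, it runs pipe() before the direct call for each word.

```python
def compare_pipe_and_call(nlp, texts, **pipe_kwargs):
    """Run each text through nlp.pipe() first, then through nlp(text).

    Hypothetical diagnostic helper (not a spaCy API). Returns a list of
    (text, pipe_error, call_error) tuples; an error is None on success,
    otherwise the ValueError that was caught.
    """
    results = []
    for text in texts:
        # pipe() goes first: calling nlp(text) beforehand appears to
        # mask the error on a later pipe() call for the same word.
        try:
            list(nlp.pipe([text], **pipe_kwargs))
            pipe_error = None
        except ValueError as exc:
            pipe_error = exc
        try:
            nlp(text)
            call_error = None
        except ValueError as exc:
            call_error = exc
        results.append((text, pipe_error, call_error))
    return results
```

With the nlp object from the steps to reproduce, a call could look like `compare_pipe_and_call(nlp, ["Windows", "Fenster"], n_threads=16, batch_size=1000)`, where "Fenster" is just an example German word picked for comparison. Each word should appear only once in the list, since the direct call on a word may change the result of any later pipe() call for it.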
Versions:
Python 3.4.3 (the problem is not related to IPython)
spacy 0.101.0
Maybe this is related to this region in syntax/parser.pyx:
if not eg.is_valid[guess]:
    # with gil:
    #     move_name = self.moves.move_name(action.move, action.label)
    #     print 'invalid action:', move_name
    return 1