vector_norm appears to be calculated incorrectly: it does not match the L2 norm of the vector itself.
import spacy
import numpy as np
nlp = spacy.load("en")
# using u"apples" just as an example
apples = nlp.vocab[u"apples"]
print(apples.vector_norm)
# prints 1.4142135381698608, or sqrt(2)
print(np.sqrt(np.dot(apples.vector, apples.vector)))
# prints 1.0
vector_norm is then used in similarity. If every norm is too large by a factor of sqrt(2), as above, the denominator is too large by a factor of 2, so similarity returns half of the correct value.
def similarity(self, other):
    if self.vector_norm == 0 or other.vector_norm == 0:
        return 0.0
    return numpy.dot(self.vector, other.vector) / (self.vector_norm * other.vector_norm)
This is fine if the scores are only used to rank candidates, such as synonyms, because a constant factor does not change the ordering. But the cosine similarity score itself is incorrect.
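To illustrate the factor of two without needing spaCy or a model, here is a minimal sketch in plain Python. The helper names (l2_norm, cosine) and the toy vectors are hypothetical, not spaCy code; the sketch only assumes that each stored norm is sqrt(2) times the true L2 norm, as observed for "apples" above.

```python
import math

def l2_norm(vec):
    """True L2 norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))

def cosine(a, b, norm_a=None, norm_b=None):
    """Cosine similarity; optional norms stand in for a cached vector_norm."""
    norm_a = l2_norm(a) if norm_a is None else norm_a
    norm_b = l2_norm(b) if norm_b is None else norm_b
    if norm_a == 0 or norm_b == 0:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors, not real spaCy embeddings.
a = [0.6, 0.8]
b = [1.0, 0.0]

# Similarity computed from the true norms.
correct = cosine(a, b)

# Simulate the reported behaviour: each stored norm is sqrt(2) too large.
inflated = cosine(a, b,
                  l2_norm(a) * math.sqrt(2),
                  l2_norm(b) * math.sqrt(2))

print(correct, inflated, inflated / correct)
```

Under that assumption the ratio of the two scores comes out at about 0.5 for any pair of non-zero, non-orthogonal vectors, which matches the halved similarity values described above.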