vector_norm and similarity value incorrect

vector_norm appears to be calculated incorrectly. For a lexeme whose vector has an L2 norm of 1.0, vector_norm reports sqrt(2) instead.

```
import spacy
import numpy as np
nlp = spacy.load("en")
# using u"apples" just as an example
apples = nlp.vocab[u"apples"]
print(apples.vector_norm)
# prints 1.4142135381698608, or sqrt(2)
# L2 norm computed directly from the vector, for comparison
print(np.sqrt(np.dot(apples.vector, apples.vector)))
# prints 1.0
```

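vector_norm is then used in similarity. If both norms in the denominator are too large by a factor of sqrt(2), as observed above, their product is too large by a factor of 2, so similarity returns half of the correct value.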

```
def similarity(self, other):
    if self.vector_norm == 0 or other.vector_norm == 0:
        return 0.0
    return numpy.dot(self.vector, other.vector) / (self.vector_norm * other.vector_norm)
```
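To make the arithmetic concrete, here is a minimal sketch that does not use spaCy at all. It applies the same formula to two made-up unit vectors, once with their true L2 norms and once with norms inflated by sqrt(2) to simulate the behaviour reported here. The helper names `cosine` and `l2_norm` and the vector values are my own, for illustration only.

```python
import math

def l2_norm(u):
    """Plain L2 norm of a sequence of floats."""
    return math.sqrt(sum(a * a for a in u))

def cosine(u, v, norm_u, norm_v):
    """Same formula as similarity(), but with the norms passed in explicitly."""
    if norm_u == 0 or norm_v == 0:
        return 0.0
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (norm_u * norm_v)

# Made-up unit vectors (illustrative values, not real spaCy vectors).
u = [0.6, 0.8]
v = [1.0, 0.0]

# Score with the true norms.
true_score = cosine(u, v, l2_norm(u), l2_norm(v))

# Simulate the reported bug: each stored norm is sqrt(2) times the true norm.
inflated_score = cosine(u, v, math.sqrt(2) * l2_norm(u), math.sqrt(2) * l2_norm(v))

print(true_score, inflated_score)
```

The second score should come out at half of the first, up to floating-point rounding.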

This does not matter if the use case is only to rank similarity scores, e.g. for synonyms, because a constant scale factor preserves the ordering. But the returned value itself is not the correct cosine similarity.
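As a possible workaround until this is fixed, the cosine can be computed from the raw vectors without relying on the stored norm. The sketch below is plain Python with made-up vectors standing in for `.vector` values; `true_cosine` is a hypothetical helper of mine, not part of spaCy. It also checks that halving every score leaves the ranking unchanged.

```python
import math

def true_cosine(u, v):
    """Cosine similarity computed from the raw vectors, ignoring any stored norm."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)

# Made-up vectors (illustrative only); with spaCy these would be lexeme vectors.
query = [1.0, 0.0]
candidates = {"a": [0.6, 0.8], "b": [0.8, 0.6], "c": [0.0, 1.0]}

correct = {name: true_cosine(query, vec) for name, vec in candidates.items()}
# Scores as described in this issue: each one half of the correct value.
halved = {name: score / 2 for name, score in correct.items()}

# Rank candidates from most to least similar under both scorings.
rank_correct = sorted(correct, key=correct.get, reverse=True)
rank_halved = sorted(halved, key=halved.get, reverse=True)

print(rank_correct == rank_halved)
```

With real vectors, I would expect a call along the lines of `true_cosine(apples.vector, other.vector)` to work the same way, since the helper only iterates over the elements.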


vector_norm and similarity value incorrect #522
