
Commit 7ed1494

Merge pull request #1179 from JuGecko/patch-1
Update README: small correction of a linguistic error
2 parents aa89ef6 + 71b0f60 commit 7ed1494

1 file changed: +1 -1 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -80,7 +80,7 @@ following three tokens.
 > [Hello] [World] [.]
 
 One observation is that the original input and tokenized sequence are **NOT
-reversibly convertible**. For instance, the information that is no space between
+reversibly convertible**. For instance, the information that there is no space between
 “World” and “.” is dropped from the tokenized sequence, since e.g., `Tokenize(“World.”) == Tokenize(“World .”)`
 
 SentencePiece treats the input text just as a sequence of Unicode characters. Whitespace is also handled as a normal symbol. To handle the whitespace as a basic token explicitly, SentencePiece first escapes the whitespace with a meta symbol "▁" (U+2581) as follows.
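
For readers of this diff, the point made in the surrounding README text can be illustrated with a small Python sketch. It is not SentencePiece's implementation: `naive_tokenize`, `escape_whitespace`, `char_pieces`, and `detokenize` are hypothetical helpers written for this illustration, and the escaping is simplified to replacing each ASCII space with "▁" (U+2581). It shows that a whitespace-discarding tokenizer maps "World." and "World ." to the same tokens, while keeping whitespace as an ordinary symbol lets the original text be recovered.

```python
import re
from typing import List

META = "\u2581"  # "▁" (U+2581), the meta symbol named in the README text


def naive_tokenize(text: str) -> List[str]:
    """Hypothetical tokenizer that splits words and punctuation and drops whitespace."""
    return re.findall(r"\w+|[^\w\s]", text)


def escape_whitespace(text: str) -> str:
    """Simplified assumption: replace each ASCII space with the meta symbol."""
    return text.replace(" ", META)


def char_pieces(text: str) -> List[str]:
    """Split the escaped text into single-character pieces (the simplest possible segmentation)."""
    return list(escape_whitespace(text))


def detokenize(pieces: List[str]) -> str:
    """Concatenate the pieces and turn the meta symbol back into a space."""
    return "".join(pieces).replace(META, " ")


if __name__ == "__main__":
    # Whitespace information is lost: both inputs give the same tokens.
    print(naive_tokenize("World.") == naive_tokenize("World ."))

    # With whitespace escaped, the two inputs stay distinguishable
    # and the original string can be reconstructed from the pieces.
    print(escape_whitespace("World.") == escape_whitespace("World ."))
    print(detokenize(char_pieces("Hello World .")) == "Hello World .")
```

Because the space survives as a visible symbol inside the pieces, detokenization here needs no language-specific rules; it is only concatenation followed by one replacement.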
