Change tokenizer name to bpe_tokenizer and extract a base class #3009
larryliu0820 wants to merge 1 commit into main
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/3009
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 4d1d502 with merge base 17c64a3.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D56052583
Summary: Pull Request resolved: #3009

We want to be able to support more than one tokenizer implementation. Currently `tokenizer.cpp` is adapted from `llama2.c`, but we also want to support `Tiktoken` (to be added in the next PR). This PR extracts a base class `Tokenizer` that different implementations can extend.

Differential Revision: D56052583
Force-pushed from d68a45b to 37368da.
Force-pushed from 37368da to 1a42760.
Force-pushed from 1a42760 to fd635c5.
Force-pushed from fd635c5 to 473f107.
Force-pushed from 473f107 to 0603bc5.
Reviewed By: mergennachin (added to the commit summary)
Force-pushed from 0603bc5 to 084fad1.
Force-pushed from 084fad1 to 4d1d502.
This pull request has been merged in 21fdc4e.
Summary:
We want to be able to support more than one tokenizer implementation. Currently `tokenizer.cpp` is adapted from `llama2.c`, but we also want to support `Tiktoken` (to be added in the next PR).
This PR extracts a base class `Tokenizer` that different implementations can extend.
Differential Revision: D56052583
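
For readers unfamiliar with the pattern, here is a minimal sketch of what extracting a `Tokenizer` base class could look like. It is not the code from this PR: the method names and signatures, the `ByteTokenizer` class, and the `round_trip` helper are all hypothetical and chosen only to show how callers can depend on the base class while implementations vary.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical interface: these names and signatures are illustrative
// assumptions, not the declarations introduced by this PR.
class Tokenizer {
 public:
  virtual ~Tokenizer() = default;

  // Turn text into a sequence of token ids.
  virtual std::vector<uint64_t> encode(const std::string& text) const = 0;

  // Turn a single token id back into text.
  virtual std::string decode(uint64_t token) const = 0;
};

// Toy implementation standing in for a real one (for example a BPE
// tokenizer): each byte of the input becomes one token id.
class ByteTokenizer : public Tokenizer {
 public:
  std::vector<uint64_t> encode(const std::string& text) const override {
    std::vector<uint64_t> ids;
    ids.reserve(text.size());
    for (unsigned char c : text) {
      ids.push_back(c);
    }
    return ids;
  }

  std::string decode(uint64_t token) const override {
    assert(token < 256 && "ByteTokenizer only knows byte-valued tokens");
    return std::string(1, static_cast<char>(token));
  }
};

// Caller code depends only on the base class, so a different
// implementation can be passed in without changing this function.
std::string round_trip(const Tokenizer& tokenizer, const std::string& text) {
  std::string out;
  for (uint64_t id : tokenizer.encode(text)) {
    out += tokenizer.decode(id);
  }
  return out;
}
```

With this shape, a second implementation would only need to subclass `Tokenizer` and override `encode` and `decode`; code such as `round_trip` would accept it unchanged.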