Description
System Info
- `transformers` version: 4.29.0.dev0
- Platform: Linux-4.18.0-305.19.1.el8_4.x86_64-x86_64-with-glibc2.28
- Python version: 3.9.7
- Huggingface_hub version: 0.13.3
- Safetensors version: 0.3.0
- PyTorch version (GPU?): 2.1.0.dev20230411+cu117 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
As mentioned in the title, the LLaMA tokenizer does not add the `eos_token` at the end of the inputs. This only happens with the fast version (`use_fast=True`).
Steps to reproduce the behaviour:
1. Load the LLaMA tokenizer:

```python
tokenizer = AutoTokenizer.from_pretrained(LLAMA_PATH, add_eos_token=True, use_fast=True)
```

2. Tokenize something:

```python
simple_sentence = "This is a sentence to test if the tokenizer adds eos token."
simple_sentence_ids = tokenizer(
    simple_sentence, add_special_tokens=True
).input_ids
```

3. Print the `input_ids` to check if the `eos_token_id` (2) is added at the end:

```python
print(simple_sentence_ids)
```

4. Output:

```python
[1, 910, 338, 263, 10541, 304, 1243, 565, 278, 5993, 3950, 12778, 321, 359, 5993, 29889]
```

Expected behavior

Expected output:

```python
[1, 910, 338, 263, 10541, 304, 1243, 565, 278, 5993, 3950, 12778, 321, 359, 5993, 29889, 2]
```
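Until the fast tokenizer respects `add_eos_token`, one possible workaround is to append the EOS id manually after tokenization. This is a minimal sketch, assuming the standard LLaMA `eos_token_id` of 2; the helper name `ensure_eos` is hypothetical, not part of the library:

```python
def ensure_eos(input_ids, eos_token_id=2):
    """Append eos_token_id if the tokenizer did not already add it."""
    if not input_ids or input_ids[-1] != eos_token_id:
        return input_ids + [eos_token_id]
    return input_ids

# Applied to the ids produced above, this yields the expected output:
ids = [1, 910, 338, 263, 10541, 304, 1243, 565, 278, 5993, 3950, 12778, 321, 359, 5993, 29889]
fixed = ensure_eos(ids)
```

Note this only patches the symptom on single sequences; batched inputs would need the same check per sequence, and padding should be applied after appending the EOS id.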