Fix llama tokenizer padding_side when using model.generate in inference mode #3644
Summary
When using Unsloth for batch inference, `model.generate()` changes `tokenizer.padding_side` from left to right. This causes an issue when the tokenizer is then used to decode the response.
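A minimal repro sketch of the symptom, assuming a standard Unsloth inference setup (the checkpoint name and prompts are illustrative, not taken from this PR):

```python
from unsloth import FastLanguageModel

# Checkpoint name is illustrative; any Llama-family model shows the same behavior.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

tokenizer.padding_side = "left"  # left padding is required for batched generation
batch = tokenizer(
    ["Hello!", "Tell me a joke."], return_tensors="pt", padding=True
).to(model.device)

print(tokenizer.padding_side)    # "left"
out = model.generate(**batch, max_new_tokens=16)
print(tokenizer.padding_side)    # flipped to "right" before this fix
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```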
Root Cause
I debugged this and found that the `padding_side` changes when `FastLlamaModel.for_training(self)` is called after generation, since that call sets `padding_side` to right.
Changes
To fix the issue, I changed the code to call `FastLlamaModel.for_training(self)` after generation only if the model was in training mode before generation. If the model was already in inference mode, it now stays in inference mode. A simplified sketch of the approach is shown below.
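This is a minimal sketch of the idea, not the exact diff; `unsloth_generate` and `_original_generate` are illustrative stand-ins for Unsloth's internal generate wrapper:

```python
def unsloth_generate(self, *args, **kwargs):
    # Remember whether the model was in training mode before generation.
    was_training = self.training

    # Switch to inference mode for generation (padding_side = "left").
    FastLlamaModel.for_inference(self)
    try:
        output = self._original_generate(*args, **kwargs)
    finally:
        # Previously for_training() was called unconditionally here, which
        # flipped padding_side back to "right" even for pure-inference use.
        # Now we only restore training mode if the model was training before.
        if was_training:
            FastLlamaModel.for_training(self)
    return output
```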
Related Issues
Fixes #2217
Fixes #3283
Testing
I tested the use cases reported in both of these issues on my branch to verify the fix. For issue #2217, the output now looks like:

[output screenshot]

And for issue #3283, the test script output now looks like:

[output screenshot]