Support defaulting to infinity or -1 for completions #111
abetlen merged 5 commits into abetlen:main
Conversation
Force-pushed from 8c93cf8 to cc0fe43
  stream: bool = False
  stop: List[str] = []
- max_tokens: int = 128
+ max_tokens: Optional[Union[int, None]] = -1
The type signature max_tokens: Optional[Union[int, None]] doesn't make sense: Optional[T] is already an alias for Union[T, None], so wrapping the Union in Optional is redundant.
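To see this concretely, here is a quick check (plain Python, not code from this repo) showing that the two spellings collapse to the same type:

```python
from typing import Optional, Union

# Optional[int] is defined as Union[int, None] ...
print(Optional[int] == Union[int, None])            # True

# ... and Union flattens and deduplicates its arguments,
# so the doubled-up annotation is the same type again:
print(Optional[Union[int, None]] == Optional[int])  # True
```

The cleaner spelling here would simply be max_tokens: Optional[int] = -1.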
Hey guys, seems like there's a related error in #983 where the server errors out if you don't specify the max_tokens parameter.
@K-Mistele yup, I added a few commits to hopefully fix that in the server error handler. @swg thanks for the contribution! Sorry it took so long to merge.
With these fixes, what is now the proper way to set an unlimited number of response tokens?
@K-Mistele passing None / null for max_tokens.
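For example (a minimal sketch assuming a local llama-cpp-python server on the default port 8000; adjust host and port for your setup):

```python
import requests

# Passing None serializes to JSON null, which the server treats as
# "no token limit": generate until EOS or the context window is full.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "prompt": "Q: Name the planets in the solar system. A:",
        "max_tokens": None,  # null -> unlimited response tokens
    },
)
print(resp.json()["choices"][0]["text"])
```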
Hello,
As per llama.cpp help:
    -n N, --n_predict N   number of tokens to predict (default: 128, -1 = infinity)

The OpenAI docs state that the /completions endpoint defaults to 16, but is optional:

    max_tokens   integer   Optional   Defaults to 16
    https://platform.openai.com/docs/api-reference/completions/create#completions/create-max_tokens
However, /chat/completions defaults to infinity, and is again optional:
    max_tokens   integer   Optional   Defaults to inf
    https://platform.openai.com/docs/api-reference/chat/create#chat/create-max_tokens
This pull request modifies the fast_api example, the main server, and the supporting Python lib to allow passing null, which will default to infinity, mimicking the OpenAI API.
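As a rough illustration of that defaulting behavior (the function and parameter names below are illustrative, not the library's actual internals):

```python
def resolve_max_tokens(max_tokens, n_ctx, prompt_len):
    """Map null/None or -1 to 'generate until EOS or the context is full'."""
    if max_tokens is None or max_tokens < 0:
        # No explicit cap: bound only by the remaining context window.
        return n_ctx - prompt_len
    return max_tokens

# resolve_max_tokens(None, 2048, 100) -> 1948 (effectively unlimited)
# resolve_max_tokens(-1, 2048, 100)   -> 1948
# resolve_max_tokens(128, 2048, 100)  -> 128
```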