Note: This issue was copied from ggml-org#9291
Original Author: @ggerganov
Original Issue Number: ggml-org#9291
Created: 2024-09-03T06:56:11Z
Overview
This is a list of changes to the public HTTP interface of the `llama-server` example. Collaborators are encouraged to edit this post to reflect important changes to the API that end up merged into the `master` branch.
If you are building a third-party project that relies on `llama-server`, it is recommended to follow this issue and to check it carefully before upgrading to new versions.
See also:
- `libllama` API
Recent API changes (most recent at the top)
| version | PR | description |
| --- | --- | --- |
| b6523 | ggml-org#16109 | In stream mode, error events are now OAI-compatible |
| b6508 | ggml-org#16052 | Include usage statistics only when `stream_options.include_usage` is specified |
| b6399 | ggml-org#15827 | Added `return_progress` and `timings.cache_n` |
| b6243 | ggml-org#15108 | Add multimodal support to `/completions` and `/embeddings` endpoints |
| b6205 | ggml-org#15416 | Disable context shift by default |
| b5441 | ggml-org#13660 | Remove `/metrics` fields related to KV cache tokens and cells |
| b5223 | ggml-org#13174 | For chat completion, if the last message is an assistant message, it is treated as a prefilled message |
| b4599 | ggml-org#9639 | `/v1/chat/completions` now supports `tools` & `tool_choice` |
| TBD. | ggml-org#10974 | `/v1/completions` is now OAI-compatible |
| TBD. | ggml-org#10783 | `logprobs` is now OAI-compatible; defaults to pre-sampling probabilities |
| TBD. | ggml-org#10861 | `/embeddings` supports pooling type `none` |
| TBD. | ggml-org#10853 | Add optional `tokens` output to `/completions` endpoint |
| b4337 | ggml-org#10803 | Remove `penalize_nl` |
| b4265 | ggml-org#10626 | CPU Docker images working directory changed to `/app` |
| b4285 | ggml-org#10691 | (Again) Change `/slots` and `/props` responses |
| b4283 | ggml-org#10704 | Change `/slots` and `/props` responses |
| b4027 | ggml-org#10162 | `/slots` endpoint: remove `slot[i].state`, add `slot[i].is_processing` |
| b3912 | ggml-org#9865 | Add option to time-limit the generation phase |
| b3911 | ggml-org#9860 | Remove self-extend support |
| b3910 | ggml-org#9857 | Remove legacy system prompt support |
| b3897 | ggml-org#9776 | Change default security settings; `/slots` is now disabled by default; endpoints now check for API key if it is set |
| b3887 | ggml-org#9510 | Add `/rerank` endpoint |
| b3754 | ggml-org#9459 | Add `[DONE]\n\n` in OAI stream response to match the spec |
| b3721 | ggml-org#9398 | Add `seed_cur` to completion response |
| b3683 | ggml-org#9308 | Environment variable updated |
| b3599 | ggml-org#9056 | Change `/health` and `/slots` |
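As a hedged illustration of how several of the entries above combine in practice, the sketch below builds an OpenAI-compatible `/v1/chat/completions` request body. It is a minimal sketch, not actual server output: the model name and the `get_weather` tool are hypothetical placeholders; the field names follow the OpenAI chat API shape that the table says `llama-server` accepts.

```python
import json

payload = {
    "model": "any",  # placeholder model name
    "messages": [
        {"role": "user", "content": "What is the weather in Paris?"},
        # b5223: a trailing assistant message is treated as a prefill,
        # i.e. generation continues from this partial reply.
        {"role": "assistant", "content": "The weather in Paris is"},
    ],
    # b4599: tools & tool_choice are supported.
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
    # b6508: in stream mode, usage statistics are only
    # included when explicitly requested.
    "stream": True,
    "stream_options": {"include_usage": True},
}

body = json.dumps(payload)
```

The resulting `body` would be POSTed to the server's `/v1/chat/completions` endpoint; consult the server README for the authoritative list of accepted fields.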
For older changes, use:

```shell
git log --oneline -p b3599 -- examples/server/README.md
```
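The b3754 entry above notes that streaming responses now end with `[DONE]\n\n` to match the OpenAI spec. A minimal, hedged sketch of client-side handling, assuming each event arrives as a `data: <json>` line and treating `data: [DONE]` as the termination sentinel (the chunk payloads below are illustrative, not actual server output):

```python
import json

def parse_sse_events(raw: str) -> list:
    """Collect JSON payloads from SSE lines, stopping at the [DONE] sentinel."""
    events = []
    for line in raw.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines, comments, etc.
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # b3754: end-of-stream marker per the OAI spec
        events.append(json.loads(data))
    return events

# Simulated tail of a stream (illustrative payloads):
raw = (
    'data: {"choices":[{"delta":{"content":"Hi"}}]}\n\n'
    'data: {"choices":[{"delta":{"content":"!"}}]}\n\n'
    "data: [DONE]\n\n"
)
chunks = parse_sse_events(raw)
```

A real client would read the HTTP response incrementally rather than from a string, but the framing rules are the same.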
Upcoming API changes