return the last token's logprobs, logits and last_hidden_states if include_stop_str_in_output is requested#4000

Merged
lvhan028 merged 1 commit into InternLM:main from lvhan028:fix-include-stop-str
Sep 22, 2025

Conversation

@lvhan028
Collaborator

No description provided.

@lvhan028
Collaborator Author

# Launch the API server with raw logprobs enabled:
lmdeploy serve api_server Qwen/Qwen3-8B --backend pytorch --logprobs-mode raw_logprobs

# Query it with the OpenAI-compatible client:
from openai import OpenAI

client = OpenAI(api_key='11', base_url='http://0.0.0.0:23333/v1/')
model_name = client.models.list().data[0].id

response = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'user',
        'content': "Hello!",
    }],
    temperature=0.8,
    top_p=0.8,
    logprobs=True,
    top_logprobs=1,
    stream=False,
    extra_body={
        "include_stop_str_in_output": True,
        "return_token_ids": True,
    })

# Collect the per-token logprobs and compare their count against
# the generated token ids returned via return_token_ids.
logprobs = [item.logprob for item in response.choices[0].logprobs.content]
print(len(logprobs), logprobs)
print(len(response.choices[0].message.gen_tokens), response.choices[0].message.gen_tokens)
print(response)

The logprob of <|im_end|> is expected to be included in the output.
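In other words, with include_stop_str_in_output the number of returned logprobs should match the number of generated token ids, and the final token should be the stop token itself. A minimal sketch of that invariant check on mock data (the helper name, the sample values, and the stop-token id are all hypothetical, not taken from the PR or from Qwen3's real vocabulary):

```python
def check_stop_token_logprob(logprobs, gen_tokens, stop_token_id):
    """With include_stop_str_in_output=True, every generated token,
    including the trailing stop token, should carry a logprob."""
    assert len(logprobs) == len(gen_tokens), "one logprob per generated token"
    assert gen_tokens[-1] == stop_token_id, "last token is the stop token"
    # After this fix, the stop token's logprob is present and returned last.
    return logprobs[-1]

# Mock data standing in for response.choices[0] fields:
EOS = 151645  # hypothetical stand-in id for <|im_end|>
lp = check_stop_token_logprob([-0.1, -0.4, -0.02], [9906, 0, EOS], EOS)
print(lp)  # -0.02
```

Before the fix, the list of logprobs stopped one short of gen_tokens when the stop string was included, so the first assertion above would fail.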

@lvhan028 lvhan028 merged commit 67f8eda into InternLM:main Sep 22, 2025
5 checks passed
irexyc pushed a commit to irexyc/lmdeploy that referenced this pull request Sep 23, 2025
lvhan028 added a commit that referenced this pull request Nov 19, 2025
* use driver flag

* update

* accurate mask iter

* use fast divmod

* remove cp_O

* remove unused

* return the last token's logprobs if include_stop_str_in_output is requested (#4000)

* [Fix] device args in chat cli when using pytorch engine (#3999)

* [Fix] device args in chat cli when using pytorch engine

* [Fix] change device into device_type in chat cli

* fix NULL raw data

* add attn_cp_size to cli

* build cutlass::FastDivmod on host

* use single buffer

* update comm

* use two stage reduce

* remove unused

* better AllreduceResidualRMSnorm

* fix max_session_len

* update docs

* fix embedding/lm_head split

* use same split_k on different cp_rank

* always use separate reduce for cp

* add cp configuration parameter

* remove redundant parameters

* remove redundant parameters

* fix build

* fix xgrammar build

* update docs

* remove unused

* fix test_attention

* unify attn split_k reduction w/ w/o cp

* fix nccl found

* update reduce

* fix windows build

* remove print

* revert is_driver_

* prevent create new allocator

* use Store to write partial_ML

* use expressive names

* use cdiv

* remove separate_reduce

* apply attention sink on cp_rank0

* move cp_utils.* to kernels/attention

* update cli description

---------

Co-authored-by: Lyu Han <lvhan_028@163.com>
Co-authored-by: CyCle1024 <chenchiyu@pjlab.org.cn>
