Return the last token's logprobs, logits and last_hidden_states if include_stop_str_in_output is requested (#4000)
Merged
lvhan028 merged 1 commit into InternLM:main on Sep 22, 2025
Conversation
Serve the model with raw logprobs enabled:

```shell
lmdeploy serve api_server Qwen/Qwen3-8B --backend pytorch --logprobs-mode raw_logprobs
```

Then query it with the OpenAI client:

```python
from openai import OpenAI

client = OpenAI(api_key='11', base_url='http://0.0.0.0:23333/v1/')
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'user',
        'content': "Hello!",
    }],
    temperature=0.8,
    top_p=0.8,
    logprobs=True,
    top_logprobs=1,
    stream=False,
    extra_body={
        "include_stop_str_in_output": True,
        "return_token_ids": True,
    })
logprobs = []
for item in response.choices[0].logprobs.content:
    logprobs.append(item.logprob)
print(len(logprobs), logprobs)
print(len(response.choices[0].message.gen_tokens), response.choices[0].message.gen_tokens)
print(response)
```

The logprobs of |
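Since this PR makes the server return the stop token's logprob when `include_stop_str_in_output` is set, the `gen_tokens` list and the collected logprobs should line up one-to-one. A minimal offline sketch of that pairing check (the helper name and the dummy token ids/values below are illustrative assumptions, not part of lmdeploy):

```python
def pair_tokens_with_logprobs(gen_tokens, logprobs):
    # With include_stop_str_in_output=True, the final stop token's
    # logprob is returned as well, so both lists should align 1:1.
    if len(gen_tokens) != len(logprobs):
        raise ValueError(f'length mismatch: {len(gen_tokens)} token ids '
                         f'vs {len(logprobs)} logprobs')
    return list(zip(gen_tokens, logprobs))

# Dummy values standing in for response.choices[0].message.gen_tokens
# and the per-token logprobs gathered from the response above.
pairs = pair_tokens_with_logprobs([9707, 0, 151645], [-0.12, -0.05, -0.8])
```

Without this fix, the stop token would be present in `gen_tokens` but absent from the logprobs, and a check like the one above would raise.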
irexyc pushed a commit to irexyc/lmdeploy that referenced this pull request on Sep 23, 2025
lvhan028 added a commit that referenced this pull request on Nov 19, 2025
* use driver flag
* update
* accurate mask iter
* use fast divmod
* remove cp_O
* remove unused
* return the last token's logprobs if include_stop_str_in_output is requested (#4000)
* [Fix] device args in chat cli when using pytorch engine (#3999)
* [Fix] device args in chat cli when using pytorch engine
* [Fix] change device into device_type in chat cli
* fix NULL raw data
* add attn_cp_size to cli
* build cutlass::FastDivmod on host
* use single buffer
* update comm
* use two stage reduce
* remove unused
* better AllreduceResidualRMSnorm
* fix max_session_len
* update docs
* fix embedding/lm_head split
* use same split_k on different cp_rank
* always use separate reduce for cp
* add cp configuration parameter
* remove redundant parameters
* remove redundant parameters
* fix build
* fix xgrammar build
* update docs
* remove unused
* fix test_attention
* unify attn split_k reduction w/ w/o cp
* fix nccl found
* update reduce
* fix windows build
* remove print
* revert is_driver_
* prevent create new allocator
* use Store to write partial_ML
* use expressive names
* use cdiv
* remove separate_reduce
* apply attention sink on cp_rank0
* move cp_utils.* to kernels/attention
* update cli description

Co-authored-by: Lyu Han <lvhan_028@163.com>
Co-authored-by: CyCle1024 <chenchiyu@pjlab.org.cn>