Qwen3 next by grimoire · Pull Request #4039 · InternLM/lmdeploy

grimoire · 2025-10-14T10:48:43Z

Qwen3-Next requires kernels from:

https://github.com/Dao-AILab/causal-conv1d
https://github.com/fla-org/flash-linear-attention

We need an environment check for different model-device combinations.
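A minimal sketch of such a check, assuming the two packages are importable as `causal_conv1d` and `fla` and that they are only needed on CUDA devices. The helper names (`find_missing_kernels`, `check_qwen3_next_env`) are hypothetical and are not taken from this PR:

```python
import importlib.util

# Assumed import names for the two kernel packages linked above.
REQUIRED_KERNELS = {
    "causal_conv1d": "https://github.com/Dao-AILab/causal-conv1d",
    "fla": "https://github.com/fla-org/flash-linear-attention",
}


def find_missing_kernels(required=REQUIRED_KERNELS):
    """Return {module_name: install_url} for every module that cannot be found."""
    missing = {}
    for module_name, url in required.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module_name) is None:
            missing[module_name] = url
    return missing


def check_qwen3_next_env(device_type="cuda"):
    """Raise a readable error if the kernels are missing (hypothetical helper)."""
    if device_type != "cuda":
        # Assumption: the two packages only matter on CUDA devices.
        return
    missing = find_missing_kernels()
    if missing:
        lines = [f"  {name}: {url}" for name, url in missing.items()]
        raise ImportError("Qwen3-Next requires kernels from:\n" + "\n".join(lines))
```

Calling `check_qwen3_next_env("cuda")` before building the model would turn a late import failure into an early message that names the missing package and where to get it.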

windreamer

Do we need to consider integrating the SSM cache pool into PD migration requests?

lvhan028 · 2025-11-01T14:36:04Z

OpenCompass evaluation failed:

2025-11-01 22:35:12,565 - lmdeploy - ERROR - engine.py:1234 - exception happened: <class 'IndexError'> index 1032 is out of bounds for axis 0 with size 1032
Traceback (most recent call last):
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 1229, in async_loop
    await self._async_loop_main(resp_que=resp_que,
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 1128, in _async_loop_main
    forward_inputs, next_running = await inputs_maker.prefetch_next_inputs()
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 320, in prefetch_next_inputs
    return await self._send_next_inputs_impl(prefill, True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 285, in _send_next_inputs_impl
    forward_inputs = self._make_forward_inputs(prefill, enable_empty)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 227, in _make_forward_inputs
    return self.engine._make_forward_inputs(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 897, in _make_forward_inputs
    scheduler_output = scheduler.schedule(is_prefill=prefill, prealloc_size=prealloc_size)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/paging/scheduler.py", line 299, in schedule
    output = self._schedule_prefill(0)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nvme1/lvhan/lmdeploy/lmdeploy/utils.py", line 271, in __func_warpper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/paging/scheduler.py", line 232, in _schedule_prefill
    if not __evict_for_seq(seq, waiting):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/paging/scheduler.py", line 213, in __evict_for_seq
    return eviction_helper.evict_for_seq(seq, evictable, prealloc_size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/paging/eviction_helper/recompute_eviction_helper.py", line 74, in _evict_for_ssm
    state_manager.free(evict_seq)
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/paging/state_manager.py", line 55, in free
    self.allocator.free(seq.logical_state)
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/paging/state_manager.py", line 29, in free
    self._free_states[num_used] = state_id
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^
IndexError: index 1032 is out of bounds for axis 0 with size 1032

serving:

 lmdeploy serve api_server Qwen/Qwen3-Next-80B-A3B-Instruct --tp 4
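
For context, the traceback ends in a write to `_free_states` at an index equal to the array's size. One way this can happen in a stack-style free list is when more slots are freed than were allocated, for example by freeing the same state twice. The sketch below is a toy illustration of that failure mode and a guard against it. It is not lmdeploy's allocator: the class name and fields are hypothetical, and it uses a plain Python list rather than an array.

```python
class StateFreeList:
    """Toy fixed-capacity allocator for state slots (hypothetical, not lmdeploy code)."""

    def __init__(self, num_states):
        self.num_states = num_states
        # Stack of free slot ids; only the first `_num_free` entries are valid.
        self._free_states = list(range(num_states))
        self._num_free = num_states
        # Tracks which slots are currently handed out.
        self._in_use = [False] * num_states

    def allocate(self):
        """Pop a free slot id from the top of the stack."""
        if self._num_free == 0:
            raise RuntimeError("no free state slots")
        self._num_free -= 1
        state_id = self._free_states[self._num_free]
        self._in_use[state_id] = True
        return state_id

    def free(self, state_id):
        """Push a slot id back, rejecting ids that are not currently allocated."""
        # Without this guard, a repeated free could eventually write
        # past the end of `_free_states` and raise IndexError.
        if not 0 <= state_id < self.num_states or not self._in_use[state_id]:
            raise ValueError(f"state {state_id} is not allocated")
        self._in_use[state_id] = False
        self._free_states[self._num_free] = state_id
        self._num_free += 1
```

With the guard in place, a second `free` of the same slot fails immediately with a `ValueError` at the call site instead of corrupting the stack and surfacing later as an out-of-bounds write.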

grimoire · 2025-11-02T11:17:30Z

Fixed

lvhan028 · 2025-11-03T08:22:21Z

The OpenCompass evaluation failed because quite a few prompts from the aime2025 dataset got stuck in repetitive generation.

grimoire · 2025-11-03T12:40:17Z

Fixed

lvhan028 · 2025-11-05T04:18:00Z

mmlu_pro scored lower than the OpenCompass academic leaderboard reports:

dataset                       version    metric                        mode    qwe3-next-instruct
----------------------------  ---------  ----------------------------  ------  --------------------
core_average                  -          -                             -       -
                              -          -                             -       -
Instruction Following         -          -                             -       -
IFEval                        353ae7     Prompt-level-strict-accuracy  gen     80.22
                              -          -                             -       -
General Reasoning             -          -                             -       -
hle_llmjudge                  6ff468     accuracy                      gen     8.90
GPQA_diamond_repeat_4         772ea0     accuracy (4 runs average)     gen     74.49
                              -          -                             -       -
Math Calculation              -          -                             -       -
aime2025_repeat_32            5e9f4f     accuracy (32 runs average)    gen     70.00
                              -          -                             -       -
Knowledge                     -          -                             -       -
mmlu_pro                      -          naive_average                 gen     79.51
                              -          -                             -       -
Code                          -          -                             -       -
lcb_code_generation_repeat_6  -          -                             -       -

Reference scores from https://rank.opencompass.org.cn/leaderboard-llm-academic/?m=REALTIME:
Qwen3-Next-80B-A3B-Instruct: IFEval (87.6), mmlu_pro (81.3), GPQA_diamond (74.1), HLE (8), aime2025 (69.2)
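
The gap to the leaderboard can be computed directly from the numbers above. This is a small sketch that assumes the local dataset variants (for example `GPQA_diamond_repeat_4`, `hle_llmjudge`) are comparable to the leaderboard entries of the same family. The leaderboard's exact evaluation settings are not stated here, and `score_gaps` is an illustrative helper name.

```python
# Scores copied from the table above (local run) and from the leaderboard line.
local = {
    "IFEval": 80.22,
    "mmlu_pro": 79.51,
    "GPQA_diamond": 74.49,
    "HLE": 8.90,
    "aime2025": 70.00,
}
leaderboard = {
    "IFEval": 87.6,
    "mmlu_pro": 81.3,
    "GPQA_diamond": 74.1,
    "HLE": 8.0,
    "aime2025": 69.2,
}


def score_gaps(local_scores, reference_scores):
    """Return local minus reference for each benchmark, rounded to 2 decimals."""
    return {
        name: round(local_scores[name] - reference_scores[name], 2)
        for name in reference_scores
    }


gaps = score_gaps(local, leaderboard)
# Print the largest shortfall first.
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    print(f"{name:14s} {gap:+.2f}")
```

On these numbers the shortfall is 1.79 points on mmlu_pro and 7.38 points on IFEval, while GPQA_diamond, HLE, and aime2025 are at or slightly above the leaderboard values.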

lvhan028 · 2025-11-07T03:49:08Z

dataset                       version    metric                        mode    qwe3-next-instruct
----------------------------  ---------  ----------------------------  ------  --------------------
core_average                  -          naive_average                 gen     62.64
                              -          -                             -       -
Instruction Following         -          -                             -       -
IFEval                        353ae7     Prompt-level-strict-accuracy  gen     87.43
                              -          -                             -       -
General Reasoning             -          -                             -       -
hle_llmjudge                  6ff468     accuracy                      gen     9.04
GPQA_diamond_repeat_4         772ea0     accuracy (4 runs average)     gen     73.36
                              -          -                             -       -
Math Calculation              -          -                             -       -
aime2025_repeat_32            5e9f4f     accuracy (32 runs average)    gen     69.38
                              -          -                             -       -
Knowledge                     -          -                             -       -
mmlu_pro                      -          naive_average                 gen     81.19
                              -          -                             -       -
Code                          -          -                             -       -
lcb_code_generation_repeat_6  b5b6c5     pass@1 (6 runs average)       gen     55.43
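
As a sanity check on the table, the `core_average` row appears to be the unweighted mean of the six dataset scores. That reading is an assumption based on the `naive_average` metric label, and it reproduces the reported value:

```python
# Scores copied from the rerun table above.
scores = {
    "IFEval": 87.43,
    "hle_llmjudge": 9.04,
    "GPQA_diamond_repeat_4": 73.36,
    "aime2025_repeat_32": 69.38,
    "mmlu_pro": 81.19,
    "lcb_code_generation_repeat_6": 55.43,
}

# Unweighted mean over all listed datasets.
core_average = sum(scores.values()) / len(scores)
print(f"{core_average:.2f}")  # 62.64, matching the core_average row
```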

lvhan028 · 2025-11-07T05:19:46Z

cc @zhulinJulia24

grimoire added 8 commits September 18, 2025 15:39

WIP

bbe8729

merge main

d00c39d

Merge branch 'main' into qwen3-next

b935db6

wip

4549a50

WIP

290f73d

first

1996e9a

fix chat

e048e19

add env check

111c65d

lvhan028 requested a review from windreamer October 15, 2025 07:47

lvhan028 added the enhancement label Oct 15, 2025

windreamer reviewed Oct 15, 2025

grimoire added 4 commits October 16, 2025 16:54

add comment

5147183

fix pad

822aa84

cudagraph

e1e9856

merge main

fb448f8

windreamer approved these changes Oct 30, 2025

lvhan028 self-requested a review November 1, 2025 14:36

fix free seq with state

e562146

grimoire added 2 commits November 3, 2025 20:29

init cache

688d545

fix

42873f5

grimoire added 2 commits November 4, 2025 11:14

mem pool

7d346be

fix state allocate

e27b65c

grimoire added 3 commits November 5, 2025 20:56

add skip warmup flag

1d905c9

fix graph capture

b1bb56d

update conv state

e4d53b6

lvhan028 approved these changes Nov 7, 2025

lvhan028 merged commit 281e101 into InternLM:main Nov 7, 2025
5 checks passed

lvhan028 merged 20 commits into InternLM:main from grimoire:qwen3-next