fix qwen3-vl-moe long context#4342
Conversation
There was a problem hiding this comment.
Pull request overview
This PR aims to fix long-context generation behavior by ensuring updated per-step model_metas (e.g., sequence length tracking) are correctly propagated through the PyTorch engine after a forward pass.
Changes:
- Add a
context.is_model_meta_updatedflag inInternVLpaths that (re)initializecontext.model_metas. - Change the engine’s
model_forwardlogic to only takemodel_metasfromcontextwhenis_model_meta_updatedis set.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
lmdeploy/pytorch/models/internvl.py |
Marks context.model_metas as “updated” in several InternVL Flash/long-context paths. |
lmdeploy/pytorch/engine/model_agent/agent.py |
Alters how model_forward decides whether to return model_metas from update_model_metas() vs context.model_metas. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| prev_lens = [meta.get('new_seqlen', 0) for meta in context.model_metas] | ||
|
|
||
| for idx, meta in enumerate(context.model_metas): | ||
| meta.update({'new_seqlen': prev_lens[idx] + new_seqlens[idx]}) |
There was a problem hiding this comment.
context.is_model_meta_updated is only set when initializing context.model_metas, but in the has_model_metas branch you mutate context.model_metas in-place (meta.update(...)) without setting the flag. With the updated model_forward logic (agent now relies on is_model_meta_updated), these in-place updates won’t be propagated to output['model_metas'], which can break long-context segmented prefill where new_seqlen must carry across segments. Set context.is_model_meta_updated = True whenever new_seqlen is changed, including the in-place update path.
| meta.update({'new_seqlen': prev_lens[idx] + new_seqlens[idx]}) | |
| meta.update({'new_seqlen': prev_lens[idx] + new_seqlens[idx]}) | |
| context.is_model_meta_updated = True |
lmdeploy/pytorch/models/internvl.py
Outdated
| else: | ||
| # init model metas | ||
| context.model_metas = [{'new_seqlen': seqlen} for seqlen in seq_lengths.tolist()] | ||
| context.is_model_meta_updated = True |
There was a problem hiding this comment.
In the long-context prefill path, context.model_metas is updated in the has_model_metas branch above via in-place meta.update(...), but context.is_model_meta_updated is only set when initializing model metas here. With model_forward now keying off is_model_meta_updated, the updated new_seqlen may not be returned to the scheduler on subsequent segments. Set the flag when new_seqlen is updated in the has_model_metas branch as well (or otherwise ensure updated metas are returned).
| if context.model_metas is not None and context.model_metas[0] is not None: | ||
| # if context.model_metas is not None and context.model_metas[0] is not None: | ||
| if getattr(context, 'is_model_meta_updated', False): | ||
| model_metas = context.model_metas |
There was a problem hiding this comment.
Switching the post-forward override to only use context.model_metas when context.is_model_meta_updated is true can drop model_metas for models that don’t implement update_model_metas (it returns None) and rely on context.model_metas being carried through the engine. Consider adding a safe fallback such as using context.model_metas when model_metas is None and context.model_metas is present/non-empty, in addition to (or instead of) the flag, to avoid regressions.
| model_metas = context.model_metas | |
| model_metas = context.model_metas | |
| elif model_metas is None: | |
| # Fallback: if update_model_metas returned None but context has model_metas, | |
| # reuse context.model_metas to avoid dropping metadata for models | |
| # that rely on it being carried through the engine. | |
| ctx_model_metas = getattr(context, 'model_metas', None) | |
| if ctx_model_metas: | |
| model_metas = ctx_model_metas |
| # InternVL-3.5-Flash will change the seqlen, model_metas during forward | ||
| if context.model_metas is not None and context.model_metas[0] is not None: | ||
| # if context.model_metas is not None and context.model_metas[0] is not None: | ||
| if getattr(context, 'is_model_meta_updated', False): |
There was a problem hiding this comment.
The commented-out if context.model_metas ... line is dead code now and makes the control flow harder to read. Consider removing it and updating the surrounding comment to document the new intended behavior (when context.model_metas should override update_model_metas).
| # InternVL-3.5-Flash will change the seqlen, model_metas during forward | ||
| if context.model_metas is not None and context.model_metas[0] is not None: | ||
| # if context.model_metas is not None and context.model_metas[0] is not None: | ||
| if getattr(context, 'is_model_meta_updated', False): | ||
| model_metas = context.model_metas | ||
| output['model_metas'] = model_metas |
There was a problem hiding this comment.
The PR description is still the default template and doesn’t explain the motivation/repro for the “qwen3-vl-moe long context” issue or how these model_metas changes address it. Please add a brief reproduction scenario and expected/actual behavior so reviewers can validate the fix and assess risk.
| output = dict(hidden_states=output) | ||
| # InternVL-3.5-Flash will change the seqlen, model_metas during forward | ||
| if context.model_metas is not None and context.model_metas[0] is not None: | ||
| # if context.model_metas is not None and context.model_metas[0] is not None: |
There was a problem hiding this comment.
May delete this line. Tested on qwen3vl and internvl-3.5-flash with a large image context, functionality looks good.
Thanks for your contribution and we appreciate it a lot. The following instructions would make your pull request more healthy and more easily receiving feedbacks. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.
Motivation
Please describe the motivation of this PR and the goal you want to achieve through this PR.
Modification
Please briefly describe what modification is made in this PR.
BC-breaking (Optional)
Does the modification introduce changes that break the backward-compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.
Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.
Checklist