
fix inference crashed on v100 with qwen3.5-0.8b#4420

Merged
lvhan028 merged 1 commit into InternLM:main from lvhan028:fix-v100
Mar 18, 2026

Conversation

@lvhan028
Collaborator

No description provided.

Copilot AI review requested due to automatic review settings March 17, 2026 10:56
@lvhan028 lvhan028 requested a review from lzhangzz March 17, 2026 10:57
Contributor

Copilot AI left a comment


Pull request overview

This PR addresses a V100 (SM70) decoding-time crash observed with Qwen3.5-0.8B by reducing resource usage in the SM70 HeadDim=256 decoding kernel and by improving Qwen3.5 weight export handling when embeddings are tied.

Changes:

  • Tune SM70 HeadDim=256 decoding kernel parameters (CTA_S and staging) to reduce shared-memory usage and avoid runtime launch failures.
  • For Qwen3.5 export, honor tie_word_embeddings by mapping the output head weight to the token embedding weight.
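The PR diff is only partially shown here, but the shared-memory arithmetic behind the first bullet can be sketched. Every number below (the tile size CTA_S, the pipeline stage count, fp16 K/V tiles, and V100's 96 KB per-block opt-in shared-memory limit) is an illustrative assumption, not a value taken from the kernel:

```python
# Back-of-envelope sketch of why shrinking CTA_S / stages matters on V100 (SM70).
# All parameters are assumptions for illustration, not the kernel's real values.

HEAD_DIM = 256               # kHeadDim in the touched kernel
ELEM_BYTES = 2               # fp16 elements
V100_SMEM_LIMIT = 96 * 1024  # SM70 max opt-in shared memory per block

def smem_bytes(cta_s: int, stages: int) -> int:
    """Shared memory for K and V tiles of shape [cta_s, HEAD_DIM], multi-staged."""
    return 2 * cta_s * HEAD_DIM * ELEM_BYTES * stages  # 2 = K tile + V tile

# A large, double-buffered tile overflows the SM70 budget ...
print(smem_bytes(128, 2))  # 262144 bytes, > 96 KB -> kernel launch would fail
# ... while a smaller, single-staged tile fits.
print(smem_bytes(64, 1))   # 65536 bytes, <= 96 KB
```

If the requested shared memory exceeds the per-block limit, the launch fails at runtime rather than at compile time, which matches the "inference crashed" symptom the PR title describes.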

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/turbomind/kernels/attention/kernel/decoding_sm70_256.cu — Lowers the SM70 decoding kernel's tile size / stage count to reduce the shared-memory footprint and prevent launch aborts on V100.
lmdeploy/turbomind/deploy/source_model/qwen.py — Sets output_weight_key to the token embedding weight when tie_word_embeddings is enabled for Qwen3.5 export.


From src/turbomind/kernels/attention/kernel/decoding_sm70_256.cu:

```cpp
@@ -12,9 +12,9 @@
namespace turbomind::attention {

constexpr int kHeadDim = 256;
```

From lmdeploy/turbomind/deploy/source_model/qwen.py:

```python
self.tok_embeddings_key = 'model.language_model.embed_tokens.weight'
self.norm_weight_key = 'model.language_model.norm.weight'

tie_word_embeddings = self.model_cfg.get('tie_word_embeddings', False)
```
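The tied-embedding handling can be illustrated with a minimal, self-contained sketch. `resolve_output_weight_key` and `default_key` are hypothetical names invented for this example; only `tie_word_embeddings` and the embedding key appear in the excerpt above:

```python
def resolve_output_weight_key(model_cfg: dict,
                              tok_embeddings_key: str,
                              default_key: str = 'lm_head.weight') -> str:
    # Hypothetical helper, not the PR's actual code. With tied embeddings the
    # checkpoint stores no separate output-head tensor, so the exporter must
    # read the output projection from the token embedding weight instead.
    if model_cfg.get('tie_word_embeddings', False):
        return tok_embeddings_key
    return default_key

key = resolve_output_weight_key({'tie_word_embeddings': True},
                                'model.language_model.embed_tokens.weight')
print(key)  # model.language_model.embed_tokens.weight
```

Without this fallback, an exporter that unconditionally looks for a standalone output-head tensor would fail (or export garbage) on tied-embedding checkpoints such as small Qwen3.5 variants.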
@lvhan028 lvhan028 merged commit a30b976 into InternLM:main Mar 18, 2026
10 of 13 checks passed


3 participants