fix: correct weight loading prefix mapping for Qwen3-VL (#18024)

Kangyan-Zhou merged 3 commits into sgl-project:main
Conversation
This doesn't seem to be a valid fix; my test still shows garbage output.
Force-pushed 400c549 to ce0e92f
…to lm_head

The weight loading code unconditionally copied embed_tokens.weight to lm_head.weight, which is incorrect for models with tie_word_embeddings=False (e.g. Qwen3-VL-8B). This caused garbage output from the 8B model. Add a check for self.config.tie_word_embeddings to ensure embed_tokens is only copied to lm_head when they are supposed to share weights.

Fixes sgl-project#17887
Force-pushed ce0e92f to 670aec5
/tag-and-rerun-ci
@JustinTong0323 Thanks for testing! The previous fix was incorrect; I've updated the PR with the correct fix now.

Root Cause Clarification

The issue only affects models with `tie_word_embeddings=False`. For models with `tie_word_embeddings=True`, copying `embed_tokens.weight` into `lm_head.weight` is correct, since the two are meant to share weights. For Qwen3-VL-8B (`tie_word_embeddings=False`), `lm_head` has independent trained weights, so the unconditional copy overwrites them and causes the garbage output.

Updated Fix

The new fix adds a check for `self.config.tie_word_embeddings`:

```python
if (
    self.pp_group.is_last_rank
    and "model.embed_tokens.weight" in name
    and self.config.tie_word_embeddings  # <-- only copy when weights are shared
):
```

Could you please test again with Qwen3-VL-8B specifically? The 2B/4B models should work fine with or without this fix.
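To see why the unconditional copy breaks untied models, here is a minimal, self-contained sketch with toy arrays. The `load_weights` helper below is a hypothetical stand-in for the loader's behavior, not SGLang's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, hidden = 8, 4

# Toy checkpoint: input embedding matrix and an independently
# trained output projection (as in tie_word_embeddings=False).
embed_tokens = rng.normal(size=(vocab, hidden))
lm_head_untied = rng.normal(size=(vocab, hidden))

def load_weights(tie_word_embeddings, lm_head_checkpoint):
    """Mimic the guarded copy from the fix: only overwrite lm_head
    with embed_tokens when the config says the weights are shared."""
    lm_head = lm_head_checkpoint.copy()
    if tie_word_embeddings:  # the check this PR adds
        lm_head = embed_tokens.copy()
    return lm_head

# Tied config: the copy is correct and expected.
assert np.allclose(load_weights(True, lm_head_untied), embed_tokens)

# Untied config (e.g. Qwen3-VL-8B): the copy must NOT happen, or
# lm_head's trained weights are clobbered and logits become garbage.
assert np.allclose(load_weights(False, lm_head_untied), lm_head_untied)
```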
…18024) Co-authored-by: liuxiaoming <liuxiaoming@modelbest.cn> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Summary
Fix Qwen3-VL-8B model producing garbage output due to incorrect weight loading.
Fixes #17887
Problem
The weight loading code unconditionally copies `embed_tokens.weight` to `lm_head.weight`. This is incorrect for models with `tie_word_embeddings=False` (like Qwen3-VL-8B), where `lm_head` has independent weights that should not be overwritten.

Fix
Add a check to only copy when `tie_word_embeddings=True`:
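The diff itself is not reproduced above. As an illustration, a runnable sketch of the guarded copy could look like the following; the helper name `copy_tied_embeddings`, the plain-dict `params`, and the `is_last_rank` flag are illustrative assumptions, not the model's actual parameter handling:

```python
from types import SimpleNamespace

def copy_tied_embeddings(params, name, loaded_weight, config, is_last_rank=True):
    """Store a loaded weight, and propagate embed_tokens into lm_head
    only when the config declares the two weights as shared."""
    params[name] = loaded_weight
    if (
        is_last_rank
        and "model.embed_tokens.weight" in name
        and config.tie_word_embeddings  # the check this PR adds
    ):
        params["lm_head.weight"] = loaded_weight  # share, don't overwrite

# Untied config (Qwen3-VL-8B): lm_head keeps its own checkpoint weights.
cfg = SimpleNamespace(tie_word_embeddings=False)
params = {"lm_head.weight": [1.0, 2.0]}
copy_tied_embeddings(params, "model.embed_tokens.weight", [9.0, 9.0], cfg)
assert params["lm_head.weight"] == [1.0, 2.0]

# Tied config (e.g. the 2B/4B variants): embed_tokens is propagated.
cfg = SimpleNamespace(tie_word_embeddings=True)
copy_tied_embeddings(params, "model.embed_tokens.weight", [9.0, 9.0], cfg)
assert params["lm_head.weight"] == [9.0, 9.0]
```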