[megatron] fix: Qwen3.5 LoRA & MTP support (with Megatron-Bridge)#5599
HollowMan6 wants to merge 3 commits into verl-project:main
Conversation
Code Review
This pull request introduces significant improvements and fixes for Qwen3.5 LoRA support within the Megatron framework. The core change involves refactoring the weight synchronization logic by moving the .base_layer normalization from the sender (Megatron) to the receiver (vLLM). This makes the system more robust and less brittle, especially for newer models with packed projections and fused MoE modules. The PR also addresses a critical bug in asynchronous LoRA updates, ensuring that adapters split across multiple IPC buckets are applied correctly only after the final bucket is received. Additionally, it includes several compatibility fixes for MTP with nested Hugging Face text configurations and resolves an incorrect engine offload behavior. The changes are well-supported by new regression tests, enhancing the overall stability and maintainability of the codebase.
Pull request overview
This PR hardens Megatron→vLLM weight synchronization (notably for Qwen3.5 + LoRA), fixes multi-bucket LoRA IPC updates, improves MTP config compatibility, and corrects actor engine offload behavior when param offload is disabled.
Changes:
- Move weight-name normalization (including `.base_layer` and packed-module alias handling) into the vLLM receiver side and make it robust to fused MoE/packed projections.
- Update bucketed IPC receiving to surface `is_last` and accumulate LoRA tensors across buckets before issuing a single `add_lora`.
- Improve MTP compatibility for nested HF `text_config` and `mtp_num_hidden_layers`, avoid incorrect CPU offload when param offload is disabled, and add regression tests.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `verl/workers/rollout/vllm_rollout/utils.py` | Adds receiver-side weight-name normalization logic and multi-bucket LoRA accumulation before `add_lora`. |
| `verl/workers/rollout/vllm_rollout/bucketed_weight_transfer.py` | Extends the per-bucket callback contract to include `is_last`. |
| `verl/workers/megatron_workers.py` | Updates MTP enable/disable logic to support nested `text_config` and newer MTP fields. |
| `verl/workers/fsdp_workers.py` | Mirrors the MTP-disable behavior for nested `text_config` and `mtp_num_hidden_layers`. |
| `verl/workers/engine_workers.py` | Offloads the actor engine to CPU only when param offload is enabled. |
| `verl/workers/engine/megatron/transformer_impl.py` | Removes sender-side `.base_layer` rewriting for non-merged LoRA sync. |
| `verl/utils/megatron_utils.py` | Updates MTP detection/disable logic for nested `text_config` and alternate MTP fields. |
| `verl/utils/megatron_peft_utils.py` | Replaces hard-coded stacked-param suffix logic with generic `.base_layer` name helpers. |
| `verl/utils/checkpoint/megatron_checkpoint_manager.py` | Preserves `vision_config` in transformer config backup state. |
| `tests/workers/test_engine_workers_update_weights.py` | Adds regression tests for param-offload-gated CPU offload behavior. |
| `tests/utils/test_vllm_weight_name_normalization_on_cpu.py` | Adds targeted tests for vLLM receiver normalization and multi-bucket LoRA accumulation. |
| `tests/utils/test_megatron_peft_utils.py` | Tests the new `.base_layer` resolver utilities and GDN module mapping expansion. |
| `tests/utils/test_bucketed_weight_transfer.py` | Updates the test callback signature to match the new `is_last` contract. |
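The `is_last` contract change referenced in the table above can be illustrated with a minimal sketch. Only the `is_last` flag comes from this PR; the receiver class, callback shape, and names below are hypothetical stand-ins, not verl's actual `BucketedWeightReceiver` API.

```python
# Minimal sketch of a per-bucket callback that receives an `is_last` flag.
# All names here are illustrative, not the real verl implementation.
from typing import Callable

Tensor = bytes  # stand-in for a real tensor type


class BucketedReceiverSketch:
    """Delivers named tensors bucket by bucket and flags the final bucket."""

    def __init__(self, on_bucket: Callable[[list, bool], None]):
        self.on_bucket = on_bucket

    def receive(self, buckets: list) -> None:
        for i, bucket in enumerate(buckets):
            self.on_bucket(bucket, i == len(buckets) - 1)


applied = []


def on_bucket(named_tensors, is_last):
    # Accumulate LoRA tensors across buckets; apply the adapter only once,
    # after the final bucket has arrived.
    applied.extend(name for name, _ in named_tensors)
    if is_last:
        applied.append("<add_lora>")


BucketedReceiverSketch(on_bucket).receive(
    [[("lora_a", b""), ("lora_b", b"")], [("lora_c", b"")]]
)
print(applied)  # ['lora_a', 'lora_b', 'lora_c', '<add_lora>']
```

Without the flag, the callback would have had to guess when an adapter was complete; with it, the single `add_lora`-style application is deferred until the last bucket.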
What does this PR do?
Depends on the following PRs:

- [megatron] fix: MTP patch for newer mcore #5587
- [Bugfix][LoRA] Fix Qwen35 LoRA vllm-project/vllm#36976
- fix: generalize LoRA layer handling for N-way fused projections vllm-project/vllm#37019
- [Bugfix] out-of-bounds error for routed experts capture vllm-project/vllm#37118
- [Bugfix] LoRA: extend expert base_layer loading to Qwen3.5 and Step3.x vllm-project/vllm#37114
- Fix Megatron->HF export when PP>1 for Qwen3.5 VL MoE models with MTP enabled NVIDIA-NeMo/Megatron-Bridge#2799
- LoRA bridge & merge for Qwen3.5 NVIDIA-NeMo/Megatron-Bridge#2736
- move `.base_layer` normalization out of the Megatron sender and into the vLLM receiver
- make vLLM weight-name normalization robust to packed modules and fused MoE logical aliases
- fix bucketed LoRA IPC updates so multi-bucket adapters are applied only after the final bucket arrives
- avoid incorrect engine offload behavior when param offload is disabled
- improve MTP compatibility for nested HF text configs and newer Megatron-LM APIs
- add targeted regression coverage for the new weight-sync and worker behaviors
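The first two bullets amount to a name-resolution step on the receiver: probe the live vLLM parameter namespace with `.base_layer` and packed-projection variants of an incoming name. This is an illustration only; the helper name, alias table, and lookup order are assumptions, not verl's actual code.

```python
# Illustrative sketch of receiver-side weight-name normalization.
# `resolve_name` and PACKED_ALIASES are hypothetical, not the verl API.
PACKED_ALIASES = {"q_proj": "qkv_proj", "k_proj": "qkv_proj", "v_proj": "qkv_proj"}


def resolve_name(name: str, live_params: set) -> "str | None":
    """Probe the live vLLM namespace for a parameter matching `name`."""
    variants = {name}
    prefix, _, leaf = name.rpartition(".")
    # Try adding/removing `.base_layer` before a `weight`/`bias` leaf.
    if leaf in ("weight", "bias"):
        if prefix.endswith(".base_layer"):
            variants.add(prefix[: -len(".base_layer")] + "." + leaf)
        else:
            variants.add(prefix + ".base_layer." + leaf)
    # Try rewriting packed-module aliases, e.g. q_proj -> qkv_proj.
    for v in list(variants):
        for alias, owner in PACKED_ALIASES.items():
            marker = "." + alias + "."
            if marker in v:
                variants.add(v.replace(marker, "." + owner + "."))
    for v in variants:
        if v in live_params:
            return v
    return None


live = {"model.layers.0.self_attn.qkv_proj.base_layer.weight"}
print(resolve_name("model.layers.0.self_attn.q_proj.weight", live))
```

The key point is that normalization is driven by what the loaded vLLM model actually contains, rather than by sender-side rewriting rules.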
The previous weight sync path relied on sender-side name rewriting, including hard-coded `.base_layer` handling. That made the Megatron/vLLM boundary brittle for newer models, especially packed projections and fused MoE modules.

In addition, the async LoRA update path assumed each adapter arrived in a single IPC bucket, which is not guaranteed by the bucketed transport. That could produce incomplete `add_lora` requests when LoRA tensors were split across multiple buckets.

There were also a few compatibility issues around:

- Qwen-style nested `text_config` MTP fields
- the newer Megatron-LM `process_mtp_loss` API
- actor engine offload behavior when automatic reload is not enabled
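The nested `text_config` issue in the first bullet boils down to a tolerant field lookup. A minimal sketch, assuming hypothetical helper and field-access order (verl's actual helper may differ):

```python
# Hedged sketch of MTP-layer detection across HF config shapes.
from types import SimpleNamespace


def get_mtp_layers(hf_config):
    # Qwen-style multimodal configs nest text fields under `text_config`.
    cfg = getattr(hf_config, "text_config", None) or hf_config
    # Older configs use `num_nextn_predict_layers`; newer ones may use
    # `mtp_num_hidden_layers` instead.
    for field in ("num_nextn_predict_layers", "mtp_num_hidden_layers"):
        value = getattr(cfg, field, None)
        if value is not None:
            return value
    return 0


flat = SimpleNamespace(num_nextn_predict_layers=1)
nested = SimpleNamespace(text_config=SimpleNamespace(mtp_num_hidden_layers=2))
print(get_mtp_layers(flat), get_mtp_layers(nested))  # 1 2
```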
API and Usage Example
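The upstream PR leaves this section as a template placeholder. As an illustration only, generic `.base_layer` name helpers of the kind the PR describes for `megatron_peft_utils` might look like this (function names and signatures are hypothetical, not verl's actual API):

```python
# Hypothetical sketch of generic `.base_layer` name helpers.
def add_base_layer(name: str) -> str:
    """Insert `.base_layer` before a `weight`/`bias` leaf, if not present."""
    prefix, _, leaf = name.rpartition(".")
    if leaf in ("weight", "bias") and not prefix.endswith(".base_layer"):
        return prefix + ".base_layer." + leaf
    return name


def remove_base_layer(name: str) -> str:
    """Strip `.base_layer` segments from a parameter name."""
    return name.replace(".base_layer.", ".")


print(add_base_layer("layers.0.q_proj.weight"))
print(remove_base_layer("layers.0.q_proj.base_layer.weight"))
```

In the PR, resolution between the two forms is done by probing the target namespace rather than by fixed suffix lists.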
Design & Code Changes
1. Receiver-side base-layer normalization for vLLM weight sync
Instead of mutating exported Megatron parameter names on the sender side, the vLLM colocate worker now resolves incoming names against the live vLLM parameter namespace.
This includes:
- generic add/remove resolution for `.base_layer` leaf params (`weight` / `bias`)
- stripping Bridge-inserted `.base_layer` from non-leaf fused-MoE logical aliases
- packed-owner lookup for aliases such as `q_proj -> qkv_proj`

This keeps the sender simpler and lets the receiver normalize names based on the actual loaded vLLM model structure.

2. Multi-bucket LoRA update support
`BucketedWeightReceiver` now forwards `is_last` to the callback. The async rollout weight update path uses that signal to:

- keep standard base-weight loading per bucket
- accumulate LoRA tensors across buckets
- call `add_lora` only once, after the final bucket is received

This aligns VERL's bucketed transport semantics with vLLM's expectation that one `add_lora` request contains a complete adapter tensor dict.

3. Megatron PEFT utility cleanup
The old hard-coded stacked-parameter suffix list was removed.
`megatron_peft_utils` now exposes generic helpers to:

- add `.base_layer`
- remove `.base_layer`
- resolve the correct name by probing the target namespace

The Megatron-to-HF module mapping was also expanded for GDN modules, including:

- `in_proj -> [in_proj_qkv, in_proj_z, in_proj_b, in_proj_a]`
- `out_proj -> [out_proj]`

4. MTP compatibility updates
MTP checks and disabling logic now work with nested HF text configs as well as configs that use `mtp_num_hidden_layers` instead of only `num_nextn_predict_layers`.

The Megatron MTP patch also prefers the newer upstream `process_mtp_loss` helper when available, while preserving a fallback path for older Megatron-LM versions.

5. Worker/runtime fixes
- `ActorRolloutRefWorker.update_weights` now only offloads the actor engine back to CPU when param offload is actually enabled.
- `vision_config` is now preserved in Megatron checkpoint manager backup state.

Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
- Run `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` locally.
- Request CI via the `ci-request` channel in the `verl` Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
- If the PR is related to the `recipe` submodule, please also update the reference to the submodule commit via `git submodule update --remote` or `cd recipe && git pull origin main`.