[megatron] fix: Qwen3.5 LoRA & MTP support (with Megatron-Bridge)#5599
HollowMan6 wants to merge 3 commits into verl-project:main
Conversation
Code Review
This pull request introduces significant improvements and fixes for Qwen3.5 LoRA support within the Megatron framework. The core change involves refactoring the weight synchronization logic by moving the .base_layer normalization from the sender (Megatron) to the receiver (vLLM). This makes the system more robust and less brittle, especially for newer models with packed projections and fused MoE modules. The PR also addresses a critical bug in asynchronous LoRA updates, ensuring that adapters split across multiple IPC buckets are applied correctly only after the final bucket is received. Additionally, it includes several compatibility fixes for MTP with nested Hugging Face text configurations and resolves an incorrect engine offload behavior. The changes are well-supported by new regression tests, enhancing the overall stability and maintainability of the codebase.
Pull request overview
This PR hardens Megatron→vLLM weight synchronization (notably for Qwen3.5 + LoRA), fixes multi-bucket LoRA IPC updates, improves MTP config compatibility, and corrects actor engine offload behavior when param offload is disabled.
Changes:
- Move weight-name normalization (including `.base_layer` and packed-module alias handling) into the vLLM receiver side and make it robust to fused MoE/packed projections.
- Update bucketed IPC receiving to surface `is_last` and accumulate LoRA tensors across buckets before issuing a single `add_lora`.
- Improve MTP compatibility for nested HF `text_config` and `mtp_num_hidden_layers`, avoid incorrect CPU offload when param offload is disabled, and add regression tests.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `verl/workers/rollout/vllm_rollout/utils.py` | Adds receiver-side weight-name normalization logic and multi-bucket LoRA accumulation before `add_lora`. |
| `verl/workers/rollout/vllm_rollout/bucketed_weight_transfer.py` | Extends the per-bucket callback contract to include `is_last`. |
| `verl/workers/megatron_workers.py` | Updates MTP enable/disable logic to support nested `text_config` and newer MTP fields. |
| `verl/workers/fsdp_workers.py` | Mirrors the MTP-disable behavior for nested `text_config` and `mtp_num_hidden_layers`. |
| `verl/workers/engine_workers.py` | Offloads the actor engine to CPU only when param offload is enabled. |
| `verl/workers/engine/megatron/transformer_impl.py` | Removes sender-side `.base_layer` rewriting for non-merged LoRA sync. |
| `verl/utils/megatron_utils.py` | Updates MTP detection/disable logic for nested `text_config` and alternate MTP fields. |
| `verl/utils/megatron_peft_utils.py` | Replaces hard-coded stacked-param suffix logic with generic `.base_layer` name helpers. |
| `verl/utils/checkpoint/megatron_checkpoint_manager.py` | Preserves `vision_config` in transformer config backup state. |
| `tests/workers/test_engine_workers_update_weights.py` | Adds regression tests for param-offload-gated CPU offload behavior. |
| `tests/utils/test_vllm_weight_name_normalization_on_cpu.py` | Adds targeted tests for vLLM receiver normalization and multi-bucket LoRA accumulation. |
| `tests/utils/test_megatron_peft_utils.py` | Tests the new `.base_layer` resolver utilities and GDN module mapping expansion. |
| `tests/utils/test_bucketed_weight_transfer.py` | Updates the test callback signature to match the new `is_last` contract. |
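The `is_last` contract change referenced in the table above can be illustrated with a minimal sketch. Only the `is_last` flag comes from this PR; the receiver class, callback shape, and names below are hypothetical stand-ins, not verl's actual `BucketedWeightReceiver` API.

```python
# Minimal sketch of a per-bucket callback that receives an `is_last` flag.
# All names here are illustrative, not the real verl implementation.
from typing import Callable

Tensor = bytes  # stand-in for a real tensor type


class BucketedReceiverSketch:
    """Delivers named tensors bucket by bucket and flags the final bucket."""

    def __init__(self, on_bucket: Callable[[list, bool], None]):
        self.on_bucket = on_bucket

    def receive(self, buckets: list) -> None:
        for i, bucket in enumerate(buckets):
            self.on_bucket(bucket, i == len(buckets) - 1)


applied = []


def on_bucket(named_tensors, is_last):
    # Accumulate LoRA tensors across buckets; apply the adapter only once,
    # after the final bucket has arrived.
    applied.extend(name for name, _ in named_tensors)
    if is_last:
        applied.append("<add_lora>")


BucketedReceiverSketch(on_bucket).receive(
    [[("lora_a", b""), ("lora_b", b"")], [("lora_c", b"")]]
)
print(applied)  # ['lora_a', 'lora_b', 'lora_c', '<add_lora>']
```

Without the flag, the callback would have had to guess when an adapter was complete; with it, the single `add_lora`-style application is deferred until the last bucket.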
What does this PR do?
Depends on the following PRs:

- [megatron] fix: MTP patch for newer mcore #5587
- [Bugfix][LoRA] Fix Qwen35 LoRA vllm-project/vllm#36976
- fix: generalize LoRA layer handling for N-way fused projections vllm-project/vllm#37019
- [Bugfix] out-of-bounds error for routed experts capture vllm-project/vllm#37118
- [Bugfix] LoRA: extend expert base_layer loading to Qwen3.5 and Step3.x vllm-project/vllm#37114
- Fix Megatron->HF export when PP>1 for Qwen3.5 VL MoE models with MTP enabled NVIDIA-NeMo/Megatron-Bridge#2799
- LoRA bridge & merge for Qwen3.5 NVIDIA-NeMo/Megatron-Bridge#2736
- move `.base_layer` normalization out of the Megatron sender and into the vLLM receiver
- make vLLM weight-name normalization robust to packed modules and fused MoE logical aliases
- fix bucketed LoRA IPC updates so multi-bucket adapters are applied only after the final bucket arrives
- avoid incorrect engine offload behavior when param offload is disabled
- improve MTP compatibility for nested HF text configs and newer Megatron-LM APIs
- add targeted regression coverage for the new weight-sync and worker behaviors
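The first two bullets amount to a name-resolution step on the receiver: probe the live vLLM parameter namespace with `.base_layer` and packed-projection variants of an incoming name. This is an illustration only; the helper name, alias table, and lookup order are assumptions, not verl's actual code.

```python
# Illustrative sketch of receiver-side weight-name normalization.
# `resolve_name` and PACKED_ALIASES are hypothetical, not the verl API.
PACKED_ALIASES = {"q_proj": "qkv_proj", "k_proj": "qkv_proj", "v_proj": "qkv_proj"}


def resolve_name(name: str, live_params: set) -> "str | None":
    """Probe the live vLLM namespace for a parameter matching `name`."""
    variants = {name}
    prefix, _, leaf = name.rpartition(".")
    # Try adding/removing `.base_layer` before a `weight`/`bias` leaf.
    if leaf in ("weight", "bias"):
        if prefix.endswith(".base_layer"):
            variants.add(prefix[: -len(".base_layer")] + "." + leaf)
        else:
            variants.add(prefix + ".base_layer." + leaf)
    # Try rewriting packed-module aliases, e.g. q_proj -> qkv_proj.
    for v in list(variants):
        for alias, owner in PACKED_ALIASES.items():
            marker = "." + alias + "."
            if marker in v:
                variants.add(v.replace(marker, "." + owner + "."))
    for v in variants:
        if v in live_params:
            return v
    return None


live = {"model.layers.0.self_attn.qkv_proj.base_layer.weight"}
print(resolve_name("model.layers.0.self_attn.q_proj.weight", live))
```

The key point is that normalization is driven by what the loaded vLLM model actually contains, rather than by sender-side rewriting rules.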
The previous weight sync path relied on sender-side name rewriting, including hard-coded `.base_layer` handling. That made the Megatron/vLLM boundary brittle for newer models, especially packed projections and fused MoE modules.

In addition, the async LoRA update path assumed each adapter arrived in a single IPC bucket, which is not guaranteed by the bucketed transport. That could produce incomplete `add_lora` requests when LoRA tensors were split across multiple buckets.

There were also a few compatibility issues around:

- Qwen-style nested `text_config` MTP fields
- the newer Megatron-LM `process_mtp_loss` API
- actor engine offload behavior when automatic reload is not enabled
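The nested `text_config` issue in the first bullet boils down to a tolerant field lookup. A minimal sketch, assuming hypothetical helper and field-access order (verl's actual helper may differ):

```python
# Hedged sketch of MTP-layer detection across HF config shapes.
from types import SimpleNamespace


def get_mtp_layers(hf_config):
    # Qwen-style multimodal configs nest text fields under `text_config`.
    cfg = getattr(hf_config, "text_config", None) or hf_config
    # Older configs use `num_nextn_predict_layers`; newer ones may use
    # `mtp_num_hidden_layers` instead.
    for field in ("num_nextn_predict_layers", "mtp_num_hidden_layers"):
        value = getattr(cfg, field, None)
        if value is not None:
            return value
    return 0


flat = SimpleNamespace(num_nextn_predict_layers=1)
nested = SimpleNamespace(text_config=SimpleNamespace(mtp_num_hidden_layers=2))
print(get_mtp_layers(flat), get_mtp_layers(nested))  # 1 2
```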
API and Usage Example
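The upstream PR leaves this section as a template placeholder. As an illustration only, generic `.base_layer` name helpers of the kind the PR describes for `megatron_peft_utils` might look like this (function names and signatures are hypothetical, not verl's actual API):

```python
# Hypothetical sketch of generic `.base_layer` name helpers.
def add_base_layer(name: str) -> str:
    """Insert `.base_layer` before a `weight`/`bias` leaf, if not present."""
    prefix, _, leaf = name.rpartition(".")
    if leaf in ("weight", "bias") and not prefix.endswith(".base_layer"):
        return prefix + ".base_layer." + leaf
    return name


def remove_base_layer(name: str) -> str:
    """Strip `.base_layer` segments from a parameter name."""
    return name.replace(".base_layer.", ".")


print(add_base_layer("layers.0.q_proj.weight"))
print(remove_base_layer("layers.0.q_proj.base_layer.weight"))
```

In the PR, resolution between the two forms is done by probing the target namespace rather than by fixed suffix lists.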
Design & Code Changes
1. Receiver-side base-layer normalization for vLLM weight sync
Instead of mutating exported Megatron parameter names on the sender side, the vLLM colocate worker now resolves incoming names against the live vLLM parameter namespace.
This includes:
- generic add/remove resolution for `.base_layer` leaf params (`weight` / `bias`)
- stripping Bridge-inserted `.base_layer` from non-leaf fused-MoE logical aliases
- packed-owner lookup for aliases such as `q_proj -> qkv_proj`

This keeps the sender simpler and lets the receiver normalize names based on the actual loaded vLLM model structure.

2. Multi-bucket LoRA update support
`BucketedWeightReceiver` now forwards `is_last` to the callback. The async rollout weight update path uses that signal to:

- keep standard base-weight loading per bucket
- accumulate LoRA tensors across buckets
- call `add_lora` only once, after the final bucket is received

This aligns VERL's bucketed transport semantics with vLLM's expectation that one `add_lora` request contains a complete adapter tensor dict.

3. Megatron PEFT utility cleanup
The old hard-coded stacked-parameter suffix list was removed.
`megatron_peft_utils` now exposes generic helpers to:

- add `.base_layer`
- remove `.base_layer`
- resolve the correct name by probing the target namespace

The Megatron-to-HF module mapping was also expanded for GDN modules, including:

- `in_proj -> [in_proj_qkv, in_proj_z, in_proj_b, in_proj_a]`
- `out_proj -> [out_proj]`

4. MTP compatibility updates
MTP checks and disabling logic now work with nested HF text configs as well as configs that use `mtp_num_hidden_layers` instead of only `num_nextn_predict_layers`.

The Megatron MTP patch also prefers the newer upstream `process_mtp_loss` helper when available, while preserving a fallback path for older Megatron-LM versions.

5. Worker/runtime fixes
- `ActorRolloutRefWorker.update_weights` now only offloads the actor engine back to CPU when param offload is actually enabled.
- `vision_config` is now preserved in Megatron checkpoint manager backup state.

Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
- Run `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` locally.
- Request CI via the `ci-request` channel in the `verl` Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
- If the PR is related to the `recipe` submodule, please also update the reference to the submodule commit via `git submodule update --remote` or `cd recipe && git pull origin main`.