[Fix] Add lora tied lm head support (for Qwen2.5, Gemma, etc model need)#18634
Conversation
Pull request overview
This PR fixes LoRA adapter loading/application for models where tie_word_embeddings=True (so lm_head is the same module object as embed_tokens), and improves compatibility with PEFT configs that use shorthand target_modules values and/or rename lm_head to unembed_tokens.
Changes:
- Add logic to make tied `lm_head` appear as an independent module so LoRA can wrap it.
- Improve PEFT config compatibility: handle string shorthands for `target_modules` and remap `unembed_tokens` → `lm_head` during weight loading.
- Add a CUDA nightly regression test covering tied `lm_head` LoRA behavior and HF-vs-SGLang logprob consistency.
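The tied-head issue behind the first bullet, and the untying fix, can be illustrated with a toy module. This is a minimal sketch: a plain `nn.Linear` stands in for SGLang's `ParallelLMHead`, and `TinyTiedModel` is a hypothetical name, not code from this PR.

```python
import torch.nn as nn

class TinyTiedModel(nn.Module):
    def __init__(self, vocab=10, hidden=4):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab, hidden)
        # tie_word_embeddings=True: lm_head is the SAME Python object.
        self.lm_head = self.embed_tokens

model = TinyTiedModel()
names = [n for n, _ in model.named_modules()]
# named_modules() deduplicates by object identity, so "lm_head"
# never shows up and LoRA has nothing to wrap:
assert "lm_head" not in names

# The fix: replace lm_head with a fresh module object that shares the
# same weight tensor (no copy, zero extra memory).
new_head = nn.Linear(4, 10, bias=False)
new_head.weight = model.embed_tokens.weight  # shared storage
model.lm_head = new_head

names = [n for n, _ in model.named_modules()]
assert "lm_head" in names  # now a distinct module LoRA can wrap
assert model.lm_head.weight.data_ptr() == model.embed_tokens.weight.data_ptr()
```

Because only the Python wrapper object is new while the parameter storage is shared, the module graph changes but memory usage does not.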
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `python/sglang/srt/lora/lora_manager.py` | Handles PEFT shorthand `target_modules` and creates an untied `lm_head` module for LoRA wrapping when embeddings are tied. |
| `python/sglang/srt/lora/lora.py` | Remaps PEFT `unembed_tokens` weights to `lm_head` and loosens embedding weight filtering when normalized targets are empty. |
| `python/sglang/srt/lora/utils.py` | Accepts string `target_modules` inputs (PEFT shorthands) and adds `unembed_tokens` → `lm_head` normalization mapping. |
| `test/registered/lora/test_lora_tied_lm_head.py` | New regression test for tied `lm_head` LoRA, including NaN checks, base-vs-LoRA difference, and HF parity. |
Comments suppressed due to low confidence (1)
test/registered/lora/test_lora_tied_lm_head.py:325
- 'except' clause does nothing but pass and there is no explanatory comment.
```python
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="cpu",
```
Using device_map="cpu" in Transformers typically pulls in the Accelerate dependency; if it’s missing, this will error even though the test only needs a CPU load. Consider removing device_map (CPU is the default) or explicitly moving the model to CPU after load to avoid an unnecessary dependency in CI.
@yushengsu-thu Can you please post the local result of running the newly added test?
Motivation

Added test CI: 324 lines. Modified code in SGLang: 87 lines.

When `tie_word_embeddings=True` (e.g., Qwen2.5, Gemma), the model's `lm_head` is the same Python object as `embed_tokens`. PyTorch's `named_modules()` deduplicates by object identity, so `lm_head` never appears as a separate module and LoRA cannot wrap it. This causes LoRA adapters that target `lm_head` to silently fail or produce incorrect results.

Additionally, PEFT adapters may use shorthand strings like `"all-linear"` or `"all"` for `target_modules`, which SGLang previously did not handle, leading to crashes during adapter loading. PEFT also renames `lm_head` to `unembed_tokens` internally in some configurations, which was not recognized by SGLang's weight loader.

Modifications

`python/sglang/srt/lora/lora_manager.py`
- Untie `lm_head` for LoRA wrapping: when `lm_head` is the same object as `embed_tokens`, create a new `ParallelLMHead` that shares the base weight tensor (no extra GPU memory) so that `named_modules()` yields it as an independent module.
- Shorthand `target_modules`: gracefully handle `"all-linear"` and `"all"` strings by requiring the user to specify `--lora-target-modules` at server startup. Raise clear error messages for unrecognized string values.

`python/sglang/srt/lora/lora.py`
- Remap `unembed_tokens` to `lm_head`: PEFT internally renames `lm_head` to `unembed_tokens` in some adapter configs. The weight loader now remaps this key so the weight is loaded into the correct buffer.
- Loosen filtering when `normalized_target_modules` is empty: when `target_modules` is a shorthand like `"all-linear"`, the normalized set is empty. Allow `embed_tokens`/`lm_head` weights to be loaded in this case, deferring to `--lora-target-modules` for module selection.

`python/sglang/srt/lora/utils.py`
- Accept string `target_modules` in `get_normalized_target_modules()`: return an empty set for PEFT shorthands so callers can fall back to the CLI config.
- Add the `unembed_tokens` → `lm_head` mapping to `params_mapping`.

`test/registered/lora/test_lora_tied_lm_head.py` (new)
- Regression test for LoRA with `lm_head` in `target_modules` on a model with `tie_word_embeddings=True` (Qwen/Qwen2.5-0.5B).
- `test_tied_lm_head_lora_no_nan`: verifies SGLang does not produce NaN values.
- `test_tied_lm_head_lora_differs_from_base`: confirms LoRA output differs from the base model (i.e., `lm_head` LoRA is actually applied).
- `test_tied_lm_head_lora_hf_sgl_logprob_match`: compares prefill and decode logprobs between HuggingFace+PEFT and SGLang, ensuring numerical consistency within threshold.

Accuracy Tests
The new test `test_lora_tied_lm_head.py` validates the behaviors described above: no NaN outputs, LoRA output differing from the base model, and HF/SGLang logprob parity.

Tested with `Qwen/Qwen2.5-0.5B` (`tie_word_embeddings=True`) + the triton LoRA backend.

Benchmarking and Profiling

No impact on inference speed: the untied `lm_head` shares the same weight tensor as `embed_tokens`, adding zero GPU memory overhead. The change only affects the module graph structure during initialization.

Checklist