Support llm-compressor symmetric quantized model inference in TurboMind #4305
lvhan028 merged 5 commits into InternLM:main from
Conversation
Pull request overview
Adds support for symmetric AWQ/GPTQ models quantized by llm-compressor when running inference with TurboMind by ensuring a zeros (zero-point) tensor exists for compressed weights.
Changes:
- Add a fallback path to generate `weight_zero_point` tensors when missing (intended for symmetric quantized compressed-tensors models).
- Adjust `Parameter.take()` to return the matched key list, enabling `CompressedWeight` to detect whether `weight_zero_point` exists.
- Update `get_params()` wiring to pass matched keys into `CompressedWeight`.
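For context on why a constant zero point is valid here, a quick arithmetic sketch: in symmetric quantization the zero point sits at the midpoint of the unsigned integer range, so for 4-bit weights it is fixed at 8. The names and values below are illustrative, not taken from the PR:

```python
# Symmetric 4-bit dequantization: real = (q - zero_point) * scale.
# In a symmetric scheme the zero point is the midpoint of the unsigned
# range [0, 2**bits - 1], i.e. 2**(bits - 1) == 8 for 4-bit weights.
bits = 4
zero_point = 2 ** (bits - 1)   # 8
scale = 0.5                    # illustrative per-group scale
quantized = [3, 8, 13]         # illustrative stored 4-bit values
dequantized = [(q - zero_point) * scale for q in quantized]
print(dequantized)  # [-2.5, 0.0, 2.5]
```

This is why the fallback can fill the missing `weight_zero_point` tensor with a constant 8 rather than reading it from the checkpoint.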
```python
import torch
weight_shapes = g('weight_shape')
result = []
for weight_shape in weight_shapes:
    row, col = weight_shape
    tensor = torch.full((row, col // 128), 8, dtype=torch.uint8)
```
`generate_zero_point()` relies on `g('weight_shape')`, but `weight_shape` is not a supported kind in the input policies (e.g. `process_compressed_tensor`) and many readers (e.g. `LlamaReader._attn`) use `dict.get`, so this can return `None` or a tensor of shape values. In either case, `row, col = weight_shape` will produce tensors/`None` and `torch.full((row, col // 128), ...)` will raise at runtime. Prefer deriving the zeros tensor shapes from `g('weight_scale')` (or `g('weight_packed')`), which are guaranteed present for `CompressedWeight`, and allocate on the same device as the other tensors.
Suggested change:
```diff
 import torch
-weight_shapes = g('weight_shape')
-result = []
-for weight_shape in weight_shapes:
-    row, col = weight_shape
-    tensor = torch.full((row, col // 128), 8, dtype=torch.uint8)
+weight_scales = g('weight_scale')
+# Normalize to a tuple of tensors for uniform handling
+if isinstance(weight_scales, torch.Tensor):
+    weight_scales = (weight_scales, )
+result = []
+for scale in weight_scales:
+    # Match the shape and device of the corresponding scale tensor
+    tensor = torch.full(scale.shape, 8, dtype=torch.uint8, device=scale.device)
```
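Assembled as a standalone function, the suggested approach might look like the sketch below. The `g` accessor is simulated here with a plain `dict.get`, and the function body is an illustration based on the suggestion above, not the merged code:

```python
import torch

def generate_zero_point(g):
    """Sketch of the suggested fallback: build constant zero-point tensors
    (value 8, the symmetric midpoint for 4-bit weights) shaped and placed
    like the weight_scale tensors."""
    weight_scales = g('weight_scale')
    # Normalize to a tuple of tensors for uniform handling
    if isinstance(weight_scales, torch.Tensor):
        weight_scales = (weight_scales, )
    result = []
    for scale in weight_scales:
        # Match the shape and device of the corresponding scale tensor
        result.append(
            torch.full(scale.shape, 8, dtype=torch.uint8, device=scale.device))
    return result

# Minimal usage with a dict-backed accessor standing in for the real reader
params = {'weight_scale': torch.ones(128, 4)}
zeros = generate_zero_point(params.get)
```

Deriving the shape from `weight_scale` sidesteps the unsupported `weight_shape` kind entirely and keeps the zeros tensor on the same device as the rest of the quantized weights.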
```python
def generate_zero_point(g):
    import torch
```
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it receive feedback more easily. If you do not understand some items, don't worry; just open the pull request and seek help from maintainers.
Motivation
Enable symmetric AWQ/GPTQ models quantized by llm-compressor to run inference in TurboMind.
Modification
`lmdeploy/lmdeploy/turbomind/deploy/parameter.py`: add handling for modules that need `weight_zero_point` initialized when a model quantized by llm-compressor is symmetric.
Use cases (Optional)
Checklist