[Feat] Support llm-compressor AWQ models in TurboMind#4290
lvhan028 merged 6 commits into InternLM:main from
Conversation
…be inferenced in TurboMind
Pull request overview
This PR adds support for AWQ models quantized by llm-compressor in TurboMind by treating them as a variant of the AWQ format. The implementation adds handlers for the "compressed-tensors" quantization format throughout the deployment pipeline.
Changes:
- Added support for parsing llm-compressor's "compressed-tensors" quantization config format
- Implemented weight unpacking logic specific to llm-compressor's packed tensor format
- Added parameter export handlers to convert llm-compressor weights to TurboMind format
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| lmdeploy/turbomind/deploy/converter.py | Adds parsing logic for compressed-tensors config and converts format to 'awq' internally |
| lmdeploy/turbomind/deploy/policy.py | Implements process_compressed_tensor function to unpack llm-compressor quantized weights |
| lmdeploy/turbomind/deploy/parameter.py | Adds QuantWeightCompressorOnly class to handle llm-compressor weight parameter export |
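To make the converter change concrete, here is a minimal sketch of how the "compressed-tensors" quant config can be parsed and mapped onto the AWQ path. This is illustrative only, not the exact `converter.py` code; the function name `parse_compressed_tensors_config` and the returned dict shape are assumptions, while the key names and assertions mirror the diffs reviewed below.

```python
# Hypothetical sketch of the compressed-tensors config parsing described in
# this PR. Key names (config_groups, group_0, format, weights, num_bits,
# type, group_size) follow llm-compressor's config.json layout as shown in
# the review diffs; the helper itself is illustrative.

def parse_compressed_tensors_config(quant_config: dict) -> dict:
    """Extract AWQ-compatible settings from a compressed-tensors config."""
    config_groups = quant_config.get('config_groups') or {}
    group_0 = config_groups.get('group_0') or {}
    fmt = group_0.get('format')
    assert fmt == 'pack-quantized', (
        f'format is {fmt}; only pack-quantized format is supported '
        'when quant_method is compressed-tensors')
    weights = group_0.get('weights') or {}
    assert weights.get('num_bits') == 4 and weights.get('type') == 'int', \
        'only 4 bit integer weight type is supported'
    # downstream, TurboMind treats the model as plain AWQ
    return dict(model_format='awq', group_size=weights.get('group_size'))


cfg = {
    'quant_method': 'compressed-tensors',
    'config_groups': {
        'group_0': {
            'format': 'pack-quantized',
            'weights': {'num_bits': 4, 'type': 'int', 'group_size': 128},
        }
    },
}
print(parse_compressed_tensors_config(cfg))
# {'model_format': 'awq', 'group_size': 128}
```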
```python
_config_groups = quant_config.get('config_groups')
_group_0 = _config_groups.get('group_0')
_format = _group_0.get('format')
assert _format == 'pack-quantized', \
    f'format is {_format}' \
    f'only pack-quantized format is supported' \
    f'when quant_method is compressed-tensors'
_weights = _group_0.get('weights')
```
The code does not handle the case where group_0 might be None or missing. If _config_groups.get('group_0') returns None, the subsequent _group_0.get('format') will raise an AttributeError. Consider adding a check to ensure this nested key exists before accessing it.
Suggested change:

```diff
-_config_groups = quant_config.get('config_groups')
-_group_0 = _config_groups.get('group_0')
+_config_groups = quant_config.get('config_groups') or {}
+_group_0 = _config_groups.get('group_0') or {}
 _format = _group_0.get('format')
 assert _format == 'pack-quantized', \
     f'format is {_format}' \
     f'only pack-quantized format is supported' \
     f'when quant_method is compressed-tensors'
-_weights = _group_0.get('weights')
+_weights = _group_0.get('weights') or {}
```
```python
_config_groups = quant_config.get('config_groups')
_group_0 = _config_groups.get('group_0')
_format = _group_0.get('format')
assert _format == 'pack-quantized', \
    f'format is {_format}' \
    f'only pack-quantized format is supported' \
    f'when quant_method is compressed-tensors'
_weights = _group_0.get('weights')
_group_size = _weights.get('group_size')
_num_bits = _weights.get('num_bits')
_type = _weights.get('type')
assert _num_bits == 4 and _type == 'int', \
    'only 4 bit integer weighttype is supported' \
```
The code does not handle the case where weights might be None or missing. If _group_0.get('weights') returns None, the subsequent calls to _weights.get('group_size'), _weights.get('num_bits'), and _weights.get('type') will raise AttributeError. Consider adding a check to ensure this nested key exists before accessing it.
Suggested change:

```diff
-_config_groups = quant_config.get('config_groups')
-_group_0 = _config_groups.get('group_0')
+_config_groups = quant_config.get('config_groups') or {}
+_group_0 = _config_groups.get('group_0') or {}
 _format = _group_0.get('format')
 assert _format == 'pack-quantized', \
     f'format is {_format}' \
     f'only pack-quantized format is supported' \
     f'when quant_method is compressed-tensors'
-_weights = _group_0.get('weights')
+_weights = _group_0.get('weights') or {}
 _group_size = _weights.get('group_size')
 _num_bits = _weights.get('num_bits')
 _type = _weights.get('type')
-assert _num_bits == 4 and _type == 'int', \
-    'only 4 bit integer weighttype is supported' \
+assert _group_size is not None and _num_bits == 4 \
+    and _type == 'int', \
+    'only 4 bit integer weighttype is supported ' \
```
```python
_num_bits = _weights.get('num_bits')
_type = _weights.get('type')
assert _num_bits == 4 and _type == 'int', \
    'only 4 bit integer weighttype is supported' \
```
The word "weighttype" should be two separate words: "weight type". This improves readability and follows standard English conventions.
Suggested change:

```diff
-    'only 4 bit integer weighttype is supported' \
+    'only 4 bit integer weight type is supported' \
```
```diff
-if model_format in ['awq', 'gptq']:
+if model_format in ['awq', 'gptq', 'compressed-tensors']:
     weight_type = 'int4'
     dtype = 'float16'  # force float16 for GPTQ/AWQ weights
```
The comment refers to "GPTQ/AWQ weights" but now compressed-tensors format is also included in this conditional branch. The comment should be updated to "GPTQ/AWQ/compressed-tensors weights" or more generically "int4 quantized weights" to accurately reflect the code behavior.
Suggested change:

```diff
-dtype = 'float16'  # force float16 for GPTQ/AWQ weights
+dtype = 'float16'  # force float16 for int4 quantized weights
```
```python
    f'format is {_format}' \
    f'only pack-quantized format is supported' \
    f'when quant_method is compressed-tensors'
_weights = _group_0.get('weights')
_group_size = _weights.get('group_size')
_num_bits = _weights.get('num_bits')
_type = _weights.get('type')
assert _num_bits == 4 and _type == 'int', \
    'only 4 bit integer weighttype is supported' \
```
The error message strings are not properly joined and are missing spaces between the concatenated f-strings. This will produce a message like "format is pack-quantizedonly pack-quantized format is supportedwhen quant_method is compressed-tensors" instead of the intended message with proper spacing.
Suggested change:

```diff
-    f'format is {_format}' \
-    f'only pack-quantized format is supported' \
-    f'when quant_method is compressed-tensors'
+    f'format is {_format}. ' \
+    'only pack-quantized format is supported ' \
+    'when quant_method is compressed-tensors'
 _weights = _group_0.get('weights')
 _group_size = _weights.get('group_size')
 _num_bits = _weights.get('num_bits')
 _type = _weights.get('type')
 assert _num_bits == 4 and _type == 'int', \
-    'only 4 bit integer weighttype is supported' \
+    'only 4 bit integer weight type is supported ' \
```
```python
    f'format is {_format}' \
    f'only pack-quantized format is supported' \
    f'when quant_method is compressed-tensors'
_weights = _group_0.get('weights')
_group_size = _weights.get('group_size')
_num_bits = _weights.get('num_bits')
_type = _weights.get('type')
assert _num_bits == 4 and _type == 'int', \
    'only 4 bit integer weighttype is supported' \
```
The error message strings are not properly joined and are missing spaces between the concatenated strings. This will produce a message like "only 4 bit integer weighttype is supportedwhen format is pack-quantized" instead of the intended message with proper spacing.
Suggested change:

```diff
-    f'format is {_format}' \
-    f'only pack-quantized format is supported' \
+    f'format is {_format} ' \
+    f'only pack-quantized format is supported ' \
     f'when quant_method is compressed-tensors'
 _weights = _group_0.get('weights')
 _group_size = _weights.get('group_size')
 _num_bits = _weights.get('num_bits')
 _type = _weights.get('type')
 assert _num_bits == 4 and _type == 'int', \
-    'only 4 bit integer weighttype is supported' \
+    'only 4 bit integer weighttype is supported ' \
```
```python
_config_groups = quant_config.get('config_groups')
_group_0 = _config_groups.get('group_0')
_format = _group_0.get('format')
assert _format == 'pack-quantized', \
    f'format is {_format}' \
    f'only pack-quantized format is supported' \
    f'when quant_method is compressed-tensors'
_weights = _group_0.get('weights')
```
The code does not handle the case where config_groups might be None or missing from the quant_config. If quant_config.get('config_groups') returns None, the subsequent _config_groups.get('group_0') will raise an AttributeError. Consider adding a check to ensure the nested structure exists before accessing it.
Suggested change:

```diff
 _config_groups = quant_config.get('config_groups')
+assert isinstance(_config_groups, dict), \
+    f'invalid quant_config: expected "config_groups" dict, ' \
+    f'got {_config_groups!r}'
 _group_0 = _config_groups.get('group_0')
+assert isinstance(_group_0, dict), \
+    f'invalid quant_config: expected "group_0" dict, ' \
+    f'got {_group_0!r}'
 _format = _group_0.get('format')
 assert _format == 'pack-quantized', \
     f'format is {_format}' \
     f'only pack-quantized format is supported' \
     f'when quant_method is compressed-tensors'
 _weights = _group_0.get('weights')
+assert isinstance(_weights, dict), \
+    f'invalid quant_config: expected "weights" dict, ' \
+    f'got {_weights!r}'
```
cc @zhulinJulia24 we need to add an llm-compressor quantized model to our test cases.
Support llm-compressor AWQ models in TurboMind. (Fix #3917)
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it receive feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from the maintainers.
Motivation
Enable AWQ models quantized by llm-compressor to be inferenced in TurboMind.
Modification
lmdeploy/lmdeploy/turbomind/deploy/converter.py: add logic that reads llm-compressor's quantization config from config.json.
lmdeploy/lmdeploy/turbomind/deploy/policy.py: add logic that unpacks AWQ weights produced by llm-compressor.
lmdeploy/lmdeploy/turbomind/deploy/parameter.py: add logic that exports llm-compressor weights in TurboMind format.
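To illustrate the unpacking step in policy.py: llm-compressor's "pack-quantized" format stores eight 4-bit integers per int32. The sketch below shows the general idea only; the function name `unpack_int4`, the nibble ordering, and the packed axis are assumptions and may differ from the actual `process_compressed_tensor` implementation.

```python
# Illustrative sketch (not the PR's actual code) of unpacking pack-quantized
# int4 weights: eight signed 4-bit values per int32, low nibble first.
import numpy as np


def unpack_int4(packed: np.ndarray) -> np.ndarray:
    """Unpack an int32 array of packed 4-bit values into int8 in [-8, 7]."""
    shifts = np.arange(0, 32, 4, dtype=np.int32)   # 8 nibbles per int32
    vals = (packed[..., None] >> shifts) & 0xF     # extract low-to-high nibbles
    vals = vals.astype(np.int8)
    vals[vals >= 8] -= 16                          # sign-extend 4-bit values
    return vals.reshape(*packed.shape[:-1], -1)    # expand the packed axis


packed = np.array([[0x76543210]], dtype=np.int32)
print(unpack_int4(packed))  # [[0 1 2 3 4 5 6 7]]
```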
Use cases (Optional)
Checklist