
[Feat] Support llm-compressor AWQ models in TurboMind #4290

Merged
lvhan028 merged 6 commits into InternLM:main from 43758726:add/awq_pack
Jan 28, 2026

Conversation

Collaborator

@43758726 43758726 commented Jan 26, 2026

Support llm-compressor AWQ models in TurboMind. (Fixes #3917)

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it receive feedback more easily. If you do not understand some items, don't worry; just make the pull request and seek help from the maintainers.

Motivation

Enable AWQ models quantized by llm-compressor to run inference in TurboMind.

Modification

lmdeploy/turbomind/deploy/converter.py: add logic that reads the llm-compressor quantization config from config.json.
lmdeploy/turbomind/deploy/policy.py: add logic that unpacks AWQ weights produced by llm-compressor.
lmdeploy/turbomind/deploy/parameter.py: add logic that exports llm-compressor weights to TurboMind.
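The converter-side config parsing described above can be sketched roughly as follows. The helper name and the returned dict are hypothetical; the actual converter.py logic differs in detail, but the `quantization_config` fields shown here are the ones discussed in this PR:

```python
def parse_compressed_tensors_config(cfg: dict) -> dict:
    """Map llm-compressor's quantization_config onto AWQ-style params.

    Hypothetical sketch, not the real converter.py implementation.
    """
    quant = cfg.get('quantization_config') or {}
    assert quant.get('quant_method') == 'compressed-tensors'
    weights = quant['config_groups']['group_0']['weights']
    return {
        'model_format': 'awq',  # treated internally as an AWQ variant
        'group_size': weights['group_size'],
        'bits': weights['num_bits'],
    }

# Example config shape, assumed from the fields referenced in this PR.
example = {
    'quantization_config': {
        'quant_method': 'compressed-tensors',
        'config_groups': {
            'group_0': {
                'format': 'pack-quantized',
                'weights': {'group_size': 128, 'num_bits': 4, 'type': 'int'},
            }
        },
    }
}
print(parse_compressed_tensors_config(example))
```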

Use cases (Optional)

from lmdeploy import pipeline, TurbomindEngineConfig
engine_config = TurbomindEngineConfig()
pipe = pipeline("{awq model path quantized by llm-compressor}",
                backend_config=engine_config)
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
print(response)

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  3. If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

Copilot AI review requested due to automatic review settings January 26, 2026 04:14
Contributor

Copilot AI left a comment


Pull request overview

This PR adds support for AWQ models quantized by llm-compressor in TurboMind by treating them as a variant of the AWQ format. The implementation adds handlers for the "compressed-tensors" quantization format throughout the deployment pipeline.

Changes:

  • Added support for parsing llm-compressor's "compressed-tensors" quantization config format
  • Implemented weight unpacking logic specific to llm-compressor's packed tensor format
  • Added parameter export handlers to convert llm-compressor weights to TurboMind format

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

File Description
lmdeploy/turbomind/deploy/converter.py Adds parsing logic for compressed-tensors config and converts format to 'awq' internally
lmdeploy/turbomind/deploy/policy.py Implements process_compressed_tensor function to unpack llm-compressor quantized weights
lmdeploy/turbomind/deploy/parameter.py Adds QuantWeightCompressorOnly class to handle llm-compressor weight parameter export
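The "pack-quantized" format referenced in the review packs eight 4-bit values into each int32. A minimal numpy sketch of the unpacking idea follows; the nibble order (lowest-order nibble first) and two's-complement signedness are assumptions for illustration, and llm-compressor's exact layout may differ:

```python
import numpy as np

def unpack_int32_to_int4(packed: np.ndarray) -> np.ndarray:
    """Unpack each int32 into 8 signed 4-bit values.

    Assumes the lowest-order nibble holds the first value; treat this
    as an illustration, not llm-compressor's actual layout.
    """
    u = packed.astype(np.uint32)
    shifts = np.arange(0, 32, 4, dtype=np.uint32)   # 8 nibble offsets
    nibbles = (u[..., None] >> shifts) & 0xF        # unsigned 0..15
    signed = nibbles.astype(np.int32)
    signed[signed >= 8] -= 16                       # sign-extend to int4
    return signed.reshape(*packed.shape[:-1], -1)

packed = np.array([0x76543210], dtype=np.int32)
print(unpack_int32_to_int4(packed))  # [0 1 2 3 4 5 6 7]
```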


Comment on lines +153 to +160
_config_groups = quant_config.get('config_groups')
_group_0 = _config_groups.get('group_0')
_format = _group_0.get('format')
assert _format == 'pack-quantized', \
f'format is {_format}' \
f'only pack-quantized format is supported' \
f'when quant_method is compressed-tensors'
_weights = _group_0.get('weights')

Copilot AI Jan 26, 2026


The code does not handle the case where group_0 might be None or missing. If _config_groups.get('group_0') returns None, the subsequent _group_0.get('format') will raise an AttributeError. Consider adding a check to ensure this nested key exists before accessing it.

Suggested change
- _config_groups = quant_config.get('config_groups')
- _group_0 = _config_groups.get('group_0')
+ _config_groups = quant_config.get('config_groups') or {}
+ _group_0 = _config_groups.get('group_0') or {}
  _format = _group_0.get('format')
  assert _format == 'pack-quantized', \
      f'format is {_format}' \
      f'only pack-quantized format is supported' \
      f'when quant_method is compressed-tensors'
- _weights = _group_0.get('weights')
+ _weights = _group_0.get('weights') or {}
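As a minimal illustration of why the suggested `or {}` guard matters, assuming a config dict with the `config_groups` key absent:

```python
# Without the guard, quant_config.get('config_groups') may return None,
# and None.get('group_0') would raise AttributeError.
quant_config = {'quant_method': 'compressed-tensors'}  # no 'config_groups'
config_groups = quant_config.get('config_groups') or {}
group_0 = config_groups.get('group_0') or {}
fmt = group_0.get('format')
print(fmt)  # None, instead of raising AttributeError
```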

Comment on lines +153 to +165
_config_groups = quant_config.get('config_groups')
_group_0 = _config_groups.get('group_0')
_format = _group_0.get('format')
assert _format == 'pack-quantized', \
f'format is {_format}' \
f'only pack-quantized format is supported' \
f'when quant_method is compressed-tensors'
_weights = _group_0.get('weights')
_group_size = _weights.get('group_size')
_num_bits = _weights.get('num_bits')
_type = _weights.get('type')
assert _num_bits == 4 and _type == 'int', \
'only 4 bit integer weighttype is supported' \

Copilot AI Jan 26, 2026


The code does not handle the case where weights might be None or missing. If _group_0.get('weights') returns None, the subsequent calls to _weights.get('group_size'), _weights.get('num_bits'), and _weights.get('type') will raise AttributeError. Consider adding a check to ensure this nested key exists before accessing it.

Suggested change
- _config_groups = quant_config.get('config_groups')
- _group_0 = _config_groups.get('group_0')
+ _config_groups = quant_config.get('config_groups') or {}
+ _group_0 = _config_groups.get('group_0') or {}
  _format = _group_0.get('format')
  assert _format == 'pack-quantized', \
      f'format is {_format}' \
      f'only pack-quantized format is supported' \
      f'when quant_method is compressed-tensors'
- _weights = _group_0.get('weights')
+ _weights = _group_0.get('weights') or {}
  _group_size = _weights.get('group_size')
  _num_bits = _weights.get('num_bits')
  _type = _weights.get('type')
- assert _num_bits == 4 and _type == 'int', \
-     'only 4 bit integer weighttype is supported' \
+ assert _group_size is not None and _num_bits == 4 \
+     and _type == 'int', \
+     'only 4 bit integer weighttype is supported ' \

_num_bits = _weights.get('num_bits')
_type = _weights.get('type')
assert _num_bits == 4 and _type == 'int', \
'only 4 bit integer weighttype is supported' \

Copilot AI Jan 26, 2026


The word "weighttype" should be two separate words: "weight type". This improves readability and follows standard English conventions.

Suggested change
- 'only 4 bit integer weighttype is supported' \
+ 'only 4 bit integer weight type is supported' \

- if model_format in ['awq', 'gptq']:
+ if model_format in ['awq', 'gptq', 'compressed-tensors']:
      weight_type = 'int4'
      dtype = 'float16' # force float16 for GPTQ/AWQ weights

Copilot AI Jan 26, 2026


The comment refers to "GPTQ/AWQ weights" but now compressed-tensors format is also included in this conditional branch. The comment should be updated to "GPTQ/AWQ/compressed-tensors weights" or more generically "int4 quantized weights" to accurately reflect the code behavior.

Suggested change
- dtype = 'float16' # force float16 for GPTQ/AWQ weights
+ dtype = 'float16' # force float16 for int4 quantized weights

Comment on lines +157 to +165
f'format is {_format}' \
f'only pack-quantized format is supported' \
f'when quant_method is compressed-tensors'
_weights = _group_0.get('weights')
_group_size = _weights.get('group_size')
_num_bits = _weights.get('num_bits')
_type = _weights.get('type')
assert _num_bits == 4 and _type == 'int', \
'only 4 bit integer weighttype is supported' \

Copilot AI Jan 26, 2026


The error message strings are not properly joined and are missing spaces between the concatenated f-strings. This will produce a message like "format is pack-quantizedonly pack-quantized format is supportedwhen quant_method is compressed-tensors" instead of the intended message with proper spacing.

Suggested change
- f'format is {_format}' \
- f'only pack-quantized format is supported' \
- f'when quant_method is compressed-tensors'
+ f'format is {_format}. ' \
+ 'only pack-quantized format is supported ' \
+ 'when quant_method is compressed-tensors'
  _weights = _group_0.get('weights')
  _group_size = _weights.get('group_size')
  _num_bits = _weights.get('num_bits')
  _type = _weights.get('type')
  assert _num_bits == 4 and _type == 'int', \
- 'only 4 bit integer weighttype is supported' \
+ 'only 4 bit integer weight type is supported ' \
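The spacing problem the review points out comes from Python's implicit concatenation of adjacent string literals, which joins the pieces with no separator. A minimal demonstration:

```python
# Adjacent string literals (including f-strings) are concatenated as-is,
# so each piece must carry its own trailing space or punctuation.
broken = ('format is pack-quantized'
          'only pack-quantized format is supported')
fixed = ('format is pack-quantized. '
         'only pack-quantized format is supported')
print(broken)  # format is pack-quantizedonly pack-quantized format is supported
print(fixed)   # format is pack-quantized. only pack-quantized format is supported
```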

Comment on lines +157 to +165
f'format is {_format}' \
f'only pack-quantized format is supported' \
f'when quant_method is compressed-tensors'
_weights = _group_0.get('weights')
_group_size = _weights.get('group_size')
_num_bits = _weights.get('num_bits')
_type = _weights.get('type')
assert _num_bits == 4 and _type == 'int', \
'only 4 bit integer weighttype is supported' \

Copilot AI Jan 26, 2026


The error message strings are not properly joined and are missing spaces between the concatenated strings. This will produce a message like "only 4 bit integer weighttype is supportedwhen format is pack-quantized" instead of the intended message with proper spacing.

Suggested change
- f'format is {_format}' \
- f'only pack-quantized format is supported' \
+ f'format is {_format} ' \
+ f'only pack-quantized format is supported ' \
  f'when quant_method is compressed-tensors'
  _weights = _group_0.get('weights')
  _group_size = _weights.get('group_size')
  _num_bits = _weights.get('num_bits')
  _type = _weights.get('type')
  assert _num_bits == 4 and _type == 'int', \
- 'only 4 bit integer weighttype is supported' \
+ 'only 4 bit integer weighttype is supported ' \

Comment on lines +153 to +160
_config_groups = quant_config.get('config_groups')
_group_0 = _config_groups.get('group_0')
_format = _group_0.get('format')
assert _format == 'pack-quantized', \
f'format is {_format}' \
f'only pack-quantized format is supported' \
f'when quant_method is compressed-tensors'
_weights = _group_0.get('weights')

Copilot AI Jan 26, 2026


The code does not handle the case where config_groups might be None or missing from the quant_config. If quant_config.get('config_groups') returns None, the subsequent _config_groups.get('group_0') will raise an AttributeError. Consider adding a check to ensure the nested structure exists before accessing it.

Suggested change
  _config_groups = quant_config.get('config_groups')
+ assert isinstance(_config_groups, dict), \
+     f'invalid quant_config: expected "config_groups" dict, ' \
+     f'got {_config_groups!r}'
  _group_0 = _config_groups.get('group_0')
+ assert isinstance(_group_0, dict), \
+     f'invalid quant_config: expected "group_0" dict, ' \
+     f'got {_group_0!r}'
  _format = _group_0.get('format')
  assert _format == 'pack-quantized', \
      f'format is {_format}' \
      f'only pack-quantized format is supported' \
      f'when quant_method is compressed-tensors'
  _weights = _group_0.get('weights')
+ assert isinstance(_weights, dict), \
+     f'invalid quant_config: expected "weights" dict, ' \
+     f'got {_weights!r}'

@lvhan028 lvhan028 added the enhancement New feature or request label Jan 26, 2026
f(i, g('weight'), 'weight', identity)


class QuantWeightCompressorOnly(Parameter):
Collaborator


How about Int4CompressedTensor? @lzhangzz

Collaborator


CompressedWeight

Collaborator


@43758726 Please use "CompressedWeight"

@lvhan028
Collaborator

cc @zhulinJulia24 we need to add an llm-compressor quantized model to our test cases.

@lvhan028 lvhan028 merged commit bce2098 into InternLM:main Jan 28, 2026
8 of 9 checks passed


Development

Successfully merging this pull request may close these issues.

[Feature] Request support for the various quantization formats of compressed-tensors
