debug vllm per-block by Charles2530 · Pull Request #456 · ModelTC/LightCompress

Charles2530 · 2026-03-16T03:58:05Z

debug: when exporting to vLLM, the original check entered the per-block branch even for per-channel or per-token quantization

JiwaniZakir

The fix correctly addresses a logic bug in export_vllm.py. The original config.quant.weight.get('granularity', 'per_block') was effectively always truthy: any non-empty granularity string, such as 'per_channel', is also truthy in a boolean context, so this branch was entered incorrectly for non-per_block configurations.

One subtle behavioral change worth noting: when the 'granularity' key is absent entirely, the old code defaulted to 'per_block' (truthy, entering the branch), while the new code evaluates None == 'per_block' → False (skipping the branch). It's worth confirming whether omitting granularity should imply per_block behavior or skip the branch. If the former, the fix should use config.quant.weight.get('granularity', 'per_block') == 'per_block' to preserve that default.
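The three variants discussed above can be compared side by side. This is a minimal sketch, not the code from export_vllm.py: the function names are hypothetical, a plain dict stands in for config.quant.weight, and the old condition is assumed to have been used directly as an if test.

```python
# Hypothetical stand-ins for the condition in export_vllm.py.
# A plain dict replaces config.quant.weight (an assumption for illustration).

def old_check(weight_cfg):
    # Assumed original form: the looked-up string itself is the condition,
    # so any non-empty granularity value is truthy.
    return bool(weight_cfg.get('granularity', 'per_block'))


def new_check(weight_cfg):
    # Explicit comparison; a missing key yields None, which is not 'per_block'.
    return weight_cfg.get('granularity') == 'per_block'


def new_check_with_default(weight_cfg):
    # Explicit comparison that keeps 'per_block' as the default for a missing key.
    return weight_cfg.get('granularity', 'per_block') == 'per_block'


if __name__ == '__main__':
    cases = {
        'per_block': {'granularity': 'per_block'},
        'per_channel': {'granularity': 'per_channel'},
        'key absent': {},
    }
    for label, cfg in cases.items():
        print(label, old_check(cfg), new_check(cfg), new_check_with_default(cfg))
```

The only case where the two explicit comparisons disagree is the missing key, which is exactly the default-behavior question raised above.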

The commented-out elif on line 33 should be removed rather than left in; dead code in a condition chain adds noise and could confuse future readers about the intended logic.

debug vllm per-block

63e1825

JiwaniZakir reviewed Apr 1, 2026

Charles2530 wants to merge 1 commit into ModelTC:main from Charles2530:feat/block
