Conversation
JiwaniZakir
left a comment
There was a problem hiding this comment.
The fix correctly addresses a logic bug in export_vllm.py. The original config.quant.weight.get('granularity', 'per_block') was effectively always truthy — any other granularity string like 'per_channel' would also evaluate to True, causing this branch to be entered incorrectly for non-per_block configurations.
One subtle behavioral change worth noting: when the 'granularity' key is absent entirely, the old code defaulted to 'per_block' (truthy, entering the branch), while the new code returns None == 'per_block' → False (skipping the branch). It's worth confirming whether omitting granularity should imply per_block behavior or fall through — if the former, the fix should use config.quant.weight.get('granularity', 'per_block') == 'per_block' to preserve that default.
The commented-out elif on line 33 should be removed rather than left in; dead code in a condition chain adds noise and could confuse future readers about the intended logic.
debug: original judge will get into per-block when per-channel or per-token quant when export vllm