[quantization] Support PTQ quantization for VLM Qwen3-vl#635

Draft
Torrero wants to merge 1 commit into Samsung:main from Torrero:qwen_ptq_quantization

Torrero commented Apr 16, 2026

This PR introduces post-training quantization (PTQ) for the Qwen3-VL vision-language model (VLM).

For benchmark evaluation, Qwen3VLForConditionalGeneration now inherits from GenerationMixin.
In addition, quant_vision_model.py now supports two modes: it either computes position_embeddings and cu_seqlens dynamically, or it uses values precomputed for a fixed grid_thw so that a static graph can be exported.
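
To illustrate the dynamic versus precomputed split for cu_seqlens, here is a minimal sketch in plain Python. It is not the code from quant_vision_model.py. It assumes the usual Qwen-VL convention that each grid_thw entry is (t, h, w) in patches and that cu_seqlens holds cumulative patch counts at frame boundaries with a leading 0. The names compute_cu_seqlens and VisionSeqlens are hypothetical and exist only for this example.

```python
from itertools import accumulate
from typing import List, Optional, Sequence, Tuple

# One (t, h, w) entry per image or video, measured in patches (assumed convention).
GridTHW = Sequence[Tuple[int, int, int]]


def compute_cu_seqlens(grid_thw: GridTHW) -> List[int]:
    """Cumulative patch counts at frame boundaries, starting at 0."""
    # Each of the t frames of an entry contributes h * w patches.
    per_frame = [h * w for t, h, w in grid_thw for _ in range(t)]
    return [0] + list(accumulate(per_frame))


class VisionSeqlens:
    """Hypothetical helper, not part of the PR.

    Without a fixed grid it recomputes cu_seqlens on every call (dynamic mode).
    With a fixed grid it computes the values once and reuses them, which is
    what makes the result a constant from the point of view of a static graph.
    """

    def __init__(self, fixed_grid_thw: Optional[GridTHW] = None) -> None:
        if fixed_grid_thw is None:
            self.fixed_grid_thw = None
            self.precomputed = None
        else:
            self.fixed_grid_thw = [tuple(g) for g in fixed_grid_thw]
            self.precomputed = compute_cu_seqlens(self.fixed_grid_thw)

    def __call__(self, grid_thw: GridTHW) -> List[int]:
        if self.precomputed is None:
            return compute_cu_seqlens(grid_thw)
        # Static mode: inputs must match the grid used for precomputation.
        if [tuple(g) for g in grid_thw] != self.fixed_grid_thw:
            raise ValueError("grid_thw differs from the fixed grid used for export")
        return self.precomputed
```

In this sketch the static mode rejects any grid other than the one it was built for, since precomputed values are only valid for that exact input shape.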

TICO-DCO-1.0-Signed-off-by: Evgenii Maltsev <e.maltsev@samsung.com>

This commit introduces PTQ quantization for VLM Qwen3-vl.

TICO-DCO-1.0-Signed-off-by: Evgenii Maltsev <e.maltsev@samsung.com>