
[quantization] Precompute RoPE position embeddings in QuantQwen3VLTextModel #636

Draft
dvsav wants to merge 1 commit into Samsung:main from dvsav:quant_text_model_precompute_rope


@dvsav dvsav commented Apr 16, 2026

What

This PR implements precomputation of position_ids and position_embeddings in QuantQwen3VLTextModel.

Why

The detailed motivation for this change is described in the issue "Qwen3VL: Fixed Input Data Format At Inference Time".
In short:

  • Precomputing rotary position embeddings is already common in TICO (see tico/quantization/wrapq/wrappers/qwen_vl/quant_vision_model.py or tico/quantization/wrapq/wrappers/llama/quant_decoder_layer.py).
  • Precomputing position_ids and position_embeddings in QuantQwen3VLTextModel becomes possible once THW and the starting position of the visual data within the input prompt are fixed.
  • Precomputing position_embeddings eliminates the need to wrap or quantize the Qwen3VLTextRotaryEmbedding module.
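The idea behind the bullets above can be illustrated with a minimal, framework-free sketch. It is not the code from this PR: the helper precompute_rope, the class FixedRope, and the default base of 10000.0 are hypothetical names and values chosen for illustration, and it uses plain sequential 1D positions, whereas the position ids in the real model are more involved. It only shows that once the position ids are fixed, the cos/sin tables can be built once at construction time and reused on every forward call.

```python
import math


def precompute_rope(position_ids, head_dim, base=10000.0):
    """Build cos/sin tables of shape [len(position_ids)][head_dim].

    Hypothetical helper for illustration; `base` is an assumed default.
    """
    half = head_dim // 2
    # One inverse frequency per channel pair: base^(-2i / head_dim).
    inv_freq = [base ** (-2.0 * i / head_dim) for i in range(half)]
    cos, sin = [], []
    for pos in position_ids:
        angles = [pos * f for f in inv_freq]
        # Repeat the angles so the table spans the full head dimension.
        angles = angles + angles
        cos.append([math.cos(a) for a in angles])
        sin.append([math.sin(a) for a in angles])
    return cos, sin


class FixedRope:
    """Hypothetical holder: tables are computed once, not per forward call."""

    def __init__(self, seq_len, head_dim):
        # Assumption: positions are simply 0..seq_len-1 and never change.
        self.position_ids = list(range(seq_len))
        self.cos, self.sin = precompute_rope(self.position_ids, head_dim)

    def forward(self):
        # No trigonometry at inference time: return the cached tables.
        return self.cos, self.sin


rope = FixedRope(seq_len=3, head_dim=4)
cos, sin = rope.forward()
```

Because the tables become constants under this assumption, the module that would otherwise compute them at run time no longer needs to be part of the quantized graph.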

BEFORE

$ python tico/quantization/wrapq/examples/qwen/quantize_model.py
┌───────────── Quantization Error Summary ─────────────
│ Mean |diff|: 0.188704
│ PEIR       : 23.927758 %
└──────────────────────────────────────────────────────
    ┌────────────────────────────────────────────┐
 3.5┤                                            │
    │                                        ••  │
    │                                 •  • •••   │
 2.3┤                                 • •••••    │
    │                              ••••••••      │
    │                          ••• ••••••        │
    │                     •    •••••••• •        │
 1.1┤                   •••  •••••••••••         │
    │                     ••••••••••• •          │
    │                   ••••••••••••             │
-0.2┤                • ••••••••••                │
    │                •••••••••••                 │
    │            •••••••••••• •                  │
    │            ••••••••••                      │
-1.4┤           •••••••••                        │
    │         ••••••••• •                        │
    │      ••••••••••                            │
-2.6┤      •••••••                               │
    │     ••••                                   │
    │  •••• •                                    │
    │  •                                         │
-3.8┤                                            │
    └┬──────────┬──────────┬─────────┬──────────┬┘
   -3.8       -2.0       -0.2       1.7       3.5 

[QuantCheck] WARNING: 34 nodes without qparam detected (see logs).
Circle model saved as 'qwen3vl_model.q.circle'

AFTER

$ python tico/quantization/wrapq/examples/qwen/quantize_model.py
┌───────────── Quantization Error Summary ─────────────
│ Mean |diff|: 0.227891
│ PEIR       : 23.428268 %
└──────────────────────────────────────────────────────
    ┌────────────────────────────────────────────┐
 3.5┤                                            │
    │                                       •••  │
    │                                 • ••••••   │
 2.3┤                            •  • •••••••    │
    │                            • ••••••••      │
    │                         •••••••••••        │
 1.1┤                   • • • •••••••••••        │
    │                    ••••••••••••••          │
    │                  •••••••••••••             │
-0.2┤                ••••••••••••••              │
    │               ••••••••••••                 │
    │            ••••••••••••••                  │
    │           ••••••••••••                     │
-1.4┤         •••••••••••                        │
    │      •••••••••••  •                        │
    │      ••••••• •• •                          │
-2.6┤     •••••                                  │
    │  •••••                                     │
    │  •                                         │
-3.8┤                                            │
    └┬──────────┬──────────┬─────────┬──────────┬┘
   -3.8       -2.0       -0.2       1.7       3.5 

Circle model saved as 'qwen3vl_model.q.circle'

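Note that the warning "[QuantCheck] WARNING: 34 nodes without qparam detected (see logs)." no longer appears. Mean |diff| rises from 0.188704 to 0.227891, while PEIR drops slightly from 23.927758 % to 23.428268 %.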
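For readers unfamiliar with the two numbers in the Quantization Error Summary, here is a rough sketch of how such metrics could be computed. This PR does not define them, so the sketch assumes that Mean |diff| is the mean absolute difference between reference and quantized outputs, and that PEIR stands for peak-error-to-interval ratio (the largest absolute error divided by the value range of the reference). The function names are hypothetical and are not taken from TICO.

```python
def mean_abs_diff(ref, out):
    """Mean absolute difference between reference and quantized outputs."""
    return sum(abs(r - o) for r, o in zip(ref, out)) / len(ref)


def peir(ref, out):
    """Assumed definition: peak absolute error over the reference interval."""
    peak_error = max(abs(r - o) for r, o in zip(ref, out))
    interval = max(ref) - min(ref)
    if interval == 0:
        raise ValueError("reference values span a zero-width interval")
    return peak_error / interval


# Toy data, not taken from the runs above.
ref = [-1.0, 0.0, 1.0, 3.0]
out = [-1.0, 0.5, 1.0, 2.0]

print(f"Mean |diff|: {mean_abs_diff(ref, out):.6f}")
print(f"PEIR       : {peir(ref, out) * 100:.6f} %")
```

Under these assumed definitions, a single outlier can change PEIR noticeably while barely changing the mean, so the two metrics need not move in the same direction.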

$ python tico/quantization/wrapq/examples/qwen/quantize_text_model.py
┌───────────── Quantization Error Summary ─────────────
│ Mean |diff|: 0.132488
│ PEIR       : 8.692164 %
└──────────────────────────────────────────────────────
    ┌────────────────────────────────────────────┐
 4.0┤                                            │
    │                                       •••  │
    │                                    •••••   │
 2.5┤                                  ••••••    │
    │                                ••••••      │
    │                              •••••••       │
    │                           ••••••••         │
 1.0┤                         ••••••••           │
    │                       ••••••••             │
    │                     ••••••••               │
-0.6┤                   ••••••••                 │
    │                 ••••••••                   │
    │               •••••••                      │
    │             ••••••••                       │
-2.1┤            •••••••                         │
    │         •••••••                            │
    │        ••••••                              │
-3.6┤        •••                                 │
    │     ••••                                   │
    │                                            │
    │  •                                         │
-5.1┤                                            │
    └┬──────────┬──────────┬─────────┬──────────┬┘
   -5.1       -2.9       -0.6       1.7       4.0 

Circle model saved as 'qwen3vl_text_model.q.circle'

Unit Tests

test_quant_text_model.py

$ python -m pytest test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py -v
============================================================ test session starts ============================================================
platform linux -- Python 3.10.12, pytest-8.4.0, pluggy-1.6.0 -- /home/d.savchenkov/myenv/bin/python
cachedir: .pytest_cache
rootdir: /home/d.savchenkov/TICO
configfile: pyproject.toml
plugins: anyio-4.12.0, mock-3.15.1, xdist-3.7.0, cov-6.2.1
collected 15 items                                                                                                                          

test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_deepstack_injection PASSED         [  6%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_different_batch_sizes PASSED       [ 13%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_different_sequence_lengths PASSED  [ 20%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_embedding_layer_quantization PASSED [ 26%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_forward_diff PASSED                [ 33%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_inputs_embeds_path PASSED          [ 40%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_layers_wrapped PASSED              [ 46%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_mode_transitions PASSED            [ 53%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_no_cache_mode PASSED               [ 60%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_norm_wrapped PASSED                [ 66%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_observer_count PASSED              [ 73%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_output_shape PASSED                [ 80%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_per_module_override PASSED         [ 86%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_registration_in_registry PASSED    [ 93%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_rotary_emb_not_wrapped PASSED      [100%]

====================================================== 15 passed, 2 warnings in 5.92s =======================================================

test_quant_model.py

$ python -m pytest test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py -v
============================================================ test session starts ============================================================
platform linux -- Python 3.10.12, pytest-8.4.0, pluggy-1.6.0 -- /home/d.savchenkov/myenv/bin/python
cachedir: .pytest_cache
rootdir: /home/d.savchenkov/TICO
configfile: pyproject.toml
plugins: anyio-4.12.0, mock-3.15.1, xdist-3.7.0, cov-6.2.1
collected 22 items                                                                                                                          

test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_activation_stats_collected_text_only PASSED [  4%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_activation_stats_collected_with_images PASSED [  9%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_compute_3d_position_ids_reuses_cached_rope_deltas PASSED [ 13%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_different_batch_sizes_text_only PASSED      [ 18%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_dtype_override PASSED                       [ 22%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_forward_diff_text_only PASSED               [ 27%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_forward_input_validation PASSED             [ 31%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_forward_text_only PASSED                    [ 36%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_forward_with_both_images_and_videos PASSED  [ 40%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_forward_with_images PASSED                  [ 45%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_forward_with_inputs_embeds PASSED           [ 50%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_forward_with_inputs_embeds_and_images PASSED [ 54%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_forward_with_videos PASSED                  [ 59%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_get_rope_index_with_images_and_videos PASSED [ 63%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_graph_tracing_behavior_with_images PASSED   [ 68%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_graph_tracing_behavior_with_videos PASSED   [ 72%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_mode_transitions PASSED                     [ 77%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_multiple_calibration_steps_text_only PASSED [ 81%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_observer_count PASSED                       [ 86%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_registration_in_registry PASSED             [ 90%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_rope_deltas_computed_after_forward PASSED   [ 95%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_wraps_submodules PASSED                     [100%]

====================================================== 22 passed, 2 warnings in 11.00s ======================================================

test_quant_for_conditional_generation.py

$ python -m pytest test/quantization/wrapq/wrappers/qwen_vl/test_quant_for_conditional_generation.py -v
============================================================ test session starts ============================================================
platform linux -- Python 3.10.12, pytest-8.4.0, pluggy-1.6.0 -- /home/d.savchenkov/myenv/bin/python
cachedir: .pytest_cache
rootdir: /home/d.savchenkov/TICO
configfile: pyproject.toml
plugins: anyio-4.12.0, mock-3.15.1, xdist-3.7.0, cov-6.2.1
collected 7 items                                                                                                                           

test/quantization/wrapq/wrappers/qwen_vl/test_quant_for_conditional_generation.py::TestQuantQwen3VLForConditionalGeneration::test_forward_text_only PASSED [ 14%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_for_conditional_generation.py::TestQuantQwen3VLForConditionalGeneration::test_forward_with_both_images_and_videos PASSED [ 28%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_for_conditional_generation.py::TestQuantQwen3VLForConditionalGeneration::test_forward_with_images PASSED [ 42%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_for_conditional_generation.py::TestQuantQwen3VLForConditionalGeneration::test_forward_with_videos PASSED [ 57%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_for_conditional_generation.py::TestQuantQwen3VLForConditionalGeneration::test_mode_transitions PASSED [ 71%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_for_conditional_generation.py::TestQuantQwen3VLForConditionalGeneration::test_registration_in_registry PASSED [ 85%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_for_conditional_generation.py::TestQuantQwen3VLForConditionalGeneration::test_wraps_submodules PASSED [100%]

======================================================= 7 passed, 2 warnings in 6.05s =======================================================

@dvsav dvsav changed the title from "[quantization] Precompute RoPE position embeddings in QuantQwen3VLTex…" to "[quantization] Precompute RoPE position embeddings in QuantQwen3VLTextModel" on Apr 16, 2026
@dvsav dvsav force-pushed the quant_text_model_precompute_rope branch from c03ba39 to 18a86b2 on April 16, 2026 14:31
…tModel

This PR implements precomputation of position_ids and position_embeddings in QuantQwen3VLTextModel.

TICO-DCO-1.0-Signed-off-by: d.savchenkov <d.savchenkov@partner.samsung.com>
@dvsav dvsav force-pushed the quant_text_model_precompute_rope branch from 18a86b2 to f0c112a on April 16, 2026 14:45