[quantization] Precompute RoPE position embeddings in QuantQwen3VLTextModel by dvsav · Pull Request #636 · Samsung/TICO

dvsav · 2026-04-16T13:43:53Z

What

This PR precomputes position_ids and position_embeddings in QuantQwen3VLTextModel.
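The general idea of precomputing rotary position embeddings can be sketched in plain Python. This is a minimal illustration only. It assumes standard 1D Llama-style RoPE with theta=10000. The real Qwen3-VL text model uses 3D multimodal RoPE on PyTorch tensors, and the names PrecomputedRopeSketch, precompute_cos_sin and rope_inv_freq are hypothetical, not TICO APIs.

```python
import math


def rope_inv_freq(head_dim: int, theta: float = 10000.0) -> list:
    """Standard RoPE inverse frequencies: theta ** (-2i / head_dim)."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def precompute_cos_sin(position_ids: list, head_dim: int, theta: float = 10000.0):
    """Build cos/sin tables of shape [len(position_ids)][head_dim] once."""
    inv_freq = rope_inv_freq(head_dim, theta)
    cos_table, sin_table = [], []
    for pos in position_ids:
        freqs = [pos * f for f in inv_freq]
        emb = freqs + freqs  # concatenate the two halves (rotate_half convention)
        cos_table.append([math.cos(x) for x in emb])
        sin_table.append([math.sin(x) for x in emb])
    return cos_table, sin_table


class PrecomputedRopeSketch:
    """Hypothetical wrapper: the sequence layout is fixed, so the RoPE
    tables are computed once in __init__ and reused by every forward."""

    def __init__(self, seq_len: int, head_dim: int):
        self.position_ids = list(range(seq_len))
        self.cos, self.sin = precompute_cos_sin(self.position_ids, head_dim)

    def position_embeddings(self):
        # No rotary-embedding module runs here. The tables are constants,
        # so there is nothing to wrap or observe for quantization.
        return self.cos, self.sin
```

Because the tables are built once, every call returns the same constants. In a PyTorch wrapper the equivalent would typically be tensors stored on the module (for example as buffers) rather than Python lists.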

Why

The detailed motivation for this change is described in the issue "Qwen3VL: Fixed Input Data Format At Inference Time".
Long story short:

- Precomputing rotary position embeddings is an established pattern in TICO (see tico/quantization/wrapq/wrappers/qwen_vl/quant_vision_model.py or tico/quantization/wrapq/wrappers/llama/quant_decoder_layer.py).
- Precomputing position_ids and position_embeddings in QuantQwen3VLTextModel becomes possible because the THW (temporal, height, width) grid of the visual data and its starting position within the input prompt are now fixed.
- Precomputing position_embeddings removes the need to wrap and quantize the Qwen3VLTextRotaryEmbedding module.

BEFORE

$ python tico/quantization/wrapq/examples/qwen/quantize_model.py
┌───────────── Quantization Error Summary ─────────────
│ Mean |diff|: 0.188704
│ PEIR       : 23.927758 %
└──────────────────────────────────────────────────────
    ┌────────────────────────────────────────────┐
 3.5┤                                            │
    │                                        ••  │
    │                                 •  • •••   │
 2.3┤                                 • •••••    │
    │                              ••••••••      │
    │                          ••• ••••••        │
    │                     •    •••••••• •        │
 1.1┤                   •••  •••••••••••         │
    │                     ••••••••••• •          │
    │                   ••••••••••••             │
-0.2┤                • ••••••••••                │
    │                •••••••••••                 │
    │            •••••••••••• •                  │
    │            ••••••••••                      │
-1.4┤           •••••••••                        │
    │         ••••••••• •                        │
    │      ••••••••••                            │
-2.6┤      •••••••                               │
    │     ••••                                   │
    │  •••• •                                    │
    │  •                                         │
-3.8┤                                            │
    └┬──────────┬──────────┬─────────┬──────────┬┘
   -3.8       -2.0       -0.2       1.7       3.5 

[QuantCheck] WARNING: 34 nodes without qparam detected (see logs).
Circle model saved as 'qwen3vl_model.q.circle'

AFTER

$ python tico/quantization/wrapq/examples/qwen/quantize_model.py
┌───────────── Quantization Error Summary ─────────────
│ Mean |diff|: 0.227891
│ PEIR       : 23.428268 %
└──────────────────────────────────────────────────────
    ┌────────────────────────────────────────────┐
 3.5┤                                            │
    │                                       •••  │
    │                                 • ••••••   │
 2.3┤                            •  • •••••••    │
    │                            • ••••••••      │
    │                         •••••••••••        │
 1.1┤                   • • • •••••••••••        │
    │                    ••••••••••••••          │
    │                  •••••••••••••             │
-0.2┤                ••••••••••••••              │
    │               ••••••••••••                 │
    │            ••••••••••••••                  │
    │           ••••••••••••                     │
-1.4┤         •••••••••••                        │
    │      •••••••••••  •                        │
    │      ••••••• •• •                          │
-2.6┤     •••••                                  │
    │  •••••                                     │
    │  •                                         │
-3.8┤                                            │
    └┬──────────┬──────────┬─────────┬──────────┬┘
   -3.8       -2.0       -0.2       1.7       3.5 

Circle model saved as 'qwen3vl_model.q.circle'

Note that the warning "[QuantCheck] WARNING: 34 nodes without qparam detected (see logs)." printed in the BEFORE run no longer appears.

$ python tico/quantization/wrapq/examples/qwen/quantize_text_model.py
┌───────────── Quantization Error Summary ─────────────
│ Mean |diff|: 0.132488
│ PEIR       : 8.692164 %
└──────────────────────────────────────────────────────
    ┌────────────────────────────────────────────┐
 4.0┤                                            │
    │                                       •••  │
    │                                    •••••   │
 2.5┤                                  ••••••    │
    │                                ••••••      │
    │                              •••••••       │
    │                           ••••••••         │
 1.0┤                         ••••••••           │
    │                       ••••••••             │
    │                     ••••••••               │
-0.6┤                   ••••••••                 │
    │                 ••••••••                   │
    │               •••••••                      │
    │             ••••••••                       │
-2.1┤            •••••••                         │
    │         •••••••                            │
    │        ••••••                              │
-3.6┤        •••                                 │
    │     ••••                                   │
    │                                            │
    │  •                                         │
-5.1┤                                            │
    └┬──────────┬──────────┬─────────┬──────────┬┘
   -5.1       -2.9       -0.6       1.7       4.0 

Circle model saved as 'qwen3vl_text_model.q.circle'
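The two figures in each Quantization Error Summary can be reproduced roughly as below. This sketch assumes that Mean |diff| is the mean absolute difference between the floating-point and quantized outputs, and that PEIR (peak error-to-interval ratio) is the largest absolute difference divided by the value range of the floating-point output, expressed in percent. These definitions are assumptions; the exact formulas used by the quantize_model.py example may differ, and error_summary is a hypothetical helper.

```python
def error_summary(ref: list, quant: list) -> tuple:
    """Return (mean |diff|, PEIR in %) for flattened reference/quantized outputs."""
    if len(ref) != len(quant) or not ref:
        raise ValueError("ref and quant must be non-empty and the same length")
    diffs = [abs(r - q) for r, q in zip(ref, quant)]
    mean_abs_diff = sum(diffs) / len(diffs)
    interval = max(ref) - min(ref)  # value range of the reference output
    if interval == 0:
        raise ValueError("reference output has zero range; PEIR is undefined")
    peir_percent = 100.0 * max(diffs) / interval  # peak error / interval
    return mean_abs_diff, peir_percent
```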

Unit Tests

test_quant_text_model.py

$ python -m pytest test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py -v
============================================================ test session starts ============================================================
platform linux -- Python 3.10.12, pytest-8.4.0, pluggy-1.6.0 -- /home/d.savchenkov/myenv/bin/python
cachedir: .pytest_cache
rootdir: /home/d.savchenkov/TICO
configfile: pyproject.toml
plugins: anyio-4.12.0, mock-3.15.1, xdist-3.7.0, cov-6.2.1
collected 15 items                                                                                                                          

test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_deepstack_injection PASSED         [  6%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_different_batch_sizes PASSED       [ 13%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_different_sequence_lengths PASSED  [ 20%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_embedding_layer_quantization PASSED [ 26%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_forward_diff PASSED                [ 33%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_inputs_embeds_path PASSED          [ 40%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_layers_wrapped PASSED              [ 46%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_mode_transitions PASSED            [ 53%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_no_cache_mode PASSED               [ 60%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_norm_wrapped PASSED                [ 66%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_observer_count PASSED              [ 73%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_output_shape PASSED                [ 80%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_per_module_override PASSED         [ 86%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_registration_in_registry PASSED    [ 93%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_rotary_emb_not_wrapped PASSED      [100%]

====================================================== 15 passed, 2 warnings in 5.92s =======================================================

test_quant_model.py

$ python -m pytest test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py -v
============================================================ test session starts ============================================================
platform linux -- Python 3.10.12, pytest-8.4.0, pluggy-1.6.0 -- /home/d.savchenkov/myenv/bin/python
cachedir: .pytest_cache
rootdir: /home/d.savchenkov/TICO
configfile: pyproject.toml
plugins: anyio-4.12.0, mock-3.15.1, xdist-3.7.0, cov-6.2.1
collected 22 items                                                                                                                          

test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_activation_stats_collected_text_only PASSED [  4%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_activation_stats_collected_with_images PASSED [  9%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_compute_3d_position_ids_reuses_cached_rope_deltas PASSED [ 13%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_different_batch_sizes_text_only PASSED      [ 18%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_dtype_override PASSED                       [ 22%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_forward_diff_text_only PASSED               [ 27%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_forward_input_validation PASSED             [ 31%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_forward_text_only PASSED                    [ 36%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_forward_with_both_images_and_videos PASSED  [ 40%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_forward_with_images PASSED                  [ 45%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_forward_with_inputs_embeds PASSED           [ 50%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_forward_with_inputs_embeds_and_images PASSED [ 54%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_forward_with_videos PASSED                  [ 59%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_get_rope_index_with_images_and_videos PASSED [ 63%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_graph_tracing_behavior_with_images PASSED   [ 68%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_graph_tracing_behavior_with_videos PASSED   [ 72%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_mode_transitions PASSED                     [ 77%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_multiple_calibration_steps_text_only PASSED [ 81%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_observer_count PASSED                       [ 86%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_registration_in_registry PASSED             [ 90%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_rope_deltas_computed_after_forward PASSED   [ 95%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_model.py::TestQuantQwen3VLModel::test_wraps_submodules PASSED                     [100%]

====================================================== 22 passed, 2 warnings in 11.00s ======================================================

test_quant_for_conditional_generation.py

$ python -m pytest test/quantization/wrapq/wrappers/qwen_vl/test_quant_for_conditional_generation.py -v
============================================================ test session starts ============================================================
platform linux -- Python 3.10.12, pytest-8.4.0, pluggy-1.6.0 -- /home/d.savchenkov/myenv/bin/python
cachedir: .pytest_cache
rootdir: /home/d.savchenkov/TICO
configfile: pyproject.toml
plugins: anyio-4.12.0, mock-3.15.1, xdist-3.7.0, cov-6.2.1
collected 7 items                                                                                                                           

test/quantization/wrapq/wrappers/qwen_vl/test_quant_for_conditional_generation.py::TestQuantQwen3VLForConditionalGeneration::test_forward_text_only PASSED [ 14%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_for_conditional_generation.py::TestQuantQwen3VLForConditionalGeneration::test_forward_with_both_images_and_videos PASSED [ 28%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_for_conditional_generation.py::TestQuantQwen3VLForConditionalGeneration::test_forward_with_images PASSED [ 42%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_for_conditional_generation.py::TestQuantQwen3VLForConditionalGeneration::test_forward_with_videos PASSED [ 57%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_for_conditional_generation.py::TestQuantQwen3VLForConditionalGeneration::test_mode_transitions PASSED [ 71%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_for_conditional_generation.py::TestQuantQwen3VLForConditionalGeneration::test_registration_in_registry PASSED [ 85%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_for_conditional_generation.py::TestQuantQwen3VLForConditionalGeneration::test_wraps_submodules PASSED [100%]

======================================================= 7 passed, 2 warnings in 6.05s =======================================================

[quantization] Precompute RoPE position embeddings in QuantQwen3VLTextModel (commit f0c112a)

This PR implements precomputation of position_ids and position_embeddings in QuantQwen3VLTextModel.
TICO-DCO-1.0-Signed-off-by: d.savchenkov <d.savchenkov@partner.samsung.com>

dvsav wants to merge 1 commit into Samsung:main from dvsav:quant_text_model_precompute_rope
