
Ensure correct module path when calling nnx. #2080

Merged
copybara-service[bot] merged 1 commit into main from dangyi_fix_module_path on Aug 5, 2025

@liudangyi liudangyi commented Aug 4, 2025

Collaborator

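Qwix relies on linen module paths to select the quantization config for each op. Those paths break when part of the model is converted to nnx: for example, the three MLP matmuls are all reported under 'decoder/layers/mlp' instead of under their own submodule paths. This patch introduces a mechanism that updates the linen module path whenever an nnx module is called.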
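The effect on the reported paths can be illustrated with a small, framework-free sketch. It is not the code in this patch: `module_scope`, `current_path`, and `run` are hypothetical names, and the path stack is only a stand-in for the bookkeeping that linen does internally. It shows how a child module that never registers its name collapses onto its parent's path, and how pushing the name at call time restores the full path.

```python
import contextlib

# Hypothetical stand-in for the framework's module-path bookkeeping.
# Neither Flax nor Qwix exposes these names.
_path_stack: list[str] = []


@contextlib.contextmanager
def module_scope(name: str):
    """Push a module name for the duration of a call, then pop it."""
    _path_stack.append(name)
    try:
        yield
    finally:
        _path_stack.pop()


def current_path() -> str:
    """Return the slash-joined path of the modules currently being called."""
    return "/".join(_path_stack)


def mlp(track_submodules: bool) -> list[str]:
    """Record the path seen by each of the three matmuls in an MLP block."""
    seen = []
    for child in ("wi_0", "wi_1", "wo"):
        if track_submodules:
            # Patched behaviour (assumed): the child's name is pushed
            # before the child is called.
            with module_scope(child):
                seen.append(current_path())
        else:
            # Broken behaviour (assumed): the child runs without
            # registering its name, so it reports the parent's path.
            seen.append(current_path())
    return seen


def run(track_submodules: bool) -> list[str]:
    with module_scope("decoder"), module_scope("layers"), module_scope("mlp"):
        return mlp(track_submodules)
```

Calling `run(False)` yields the collapsed `decoder/layers/mlp` path three times, while `run(True)` yields `decoder/layers/mlp/wi_0`, `decoder/layers/mlp/wi_1`, and `decoder/layers/mlp/wo`, which mirrors the before and after logs in this description.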

This patch also disables quantization of dot-product attention, which preserves the previous behavior.

For model_name=default, before:

[QWIX] module='decoder/layers/self_attention/query' op=dot_general0 rule=0
[QWIX] module='decoder/layers/self_attention/key' op=dot_general0 rule=0
[QWIX] module='decoder/layers/self_attention/value' op=dot_general0 rule=0
[QWIX] module='decoder/layers/self_attention/attention_op' op=einsum0 rule=0
[QWIX] module='decoder/layers/self_attention/attention_op' op=einsum1 rule=0
[QWIX] module='decoder/layers/self_attention/out' op=dot_general0 rule=0
[QWIX] module='decoder/layers/mlp' op=dot_general0 rule=0
[QWIX] module='decoder/layers/mlp' op=dot_general1 rule=0
[QWIX] module='decoder/layers/mlp' op=dot_general2 rule=0
[QWIX] module='decoder/logits_dense' op=dot_general0 rule=None

after:

[QWIX] module='decoder/layers/self_attention/query' op=dot_general0 rule=1
[QWIX] module='decoder/layers/self_attention/key' op=dot_general0 rule=1
[QWIX] module='decoder/layers/self_attention/value' op=dot_general0 rule=1
[QWIX] module='decoder/layers/self_attention/attention_op' op=einsum0 rule=0
[QWIX] module='decoder/layers/self_attention/attention_op' op=einsum1 rule=0
[QWIX] module='decoder/layers/self_attention/out' op=dot_general0 rule=1
[QWIX] module='decoder/layers/mlp/wi_0' op=dot_general0 rule=1
[QWIX] module='decoder/layers/mlp/wi_1' op=dot_general1 rule=1
[QWIX] module='decoder/layers/mlp/wo' op=dot_general2 rule=1
[QWIX] module='decoder/logits_dense' op=dot_general0 rule=None

For model_name=deepseek3-671b, before:

[QWIX] module='decoder/dense_layers/self_attention/wq_a' op=dot_general0 rule=0
[QWIX] module='decoder/dense_layers/self_attention/wq_b' op=dot_general0 rule=0
[QWIX] module='decoder/dense_layers/self_attention/wkv_a' op=dot_general0 rule=0
[QWIX] module='decoder/dense_layers/self_attention/wkv_b' op=dot_general0 rule=0
[QWIX] module='decoder/dense_layers/self_attention/attention_op' op=einsum0 rule=0
[QWIX] module='decoder/dense_layers/self_attention/attention_op' op=einsum1 rule=0
[QWIX] module='decoder/dense_layers/self_attention/out' op=dot_general0 rule=0
[QWIX] module='decoder/dense_layers/mlp' op=dot_general0 rule=0
[QWIX] module='decoder/dense_layers/mlp' op=dot_general1 rule=0
[QWIX] module='decoder/dense_layers/mlp' op=dot_general2 rule=0
[QWIX] module='decoder/moe_layers/self_attention/wq_a' op=dot_general0 rule=0
[QWIX] module='decoder/moe_layers/self_attention/wq_b' op=dot_general0 rule=0
[QWIX] module='decoder/moe_layers/self_attention/wkv_a' op=dot_general0 rule=0
[QWIX] module='decoder/moe_layers/self_attention/wkv_b' op=dot_general0 rule=0
[QWIX] module='decoder/moe_layers/self_attention/attention_op' op=einsum0 rule=0
[QWIX] module='decoder/moe_layers/self_attention/attention_op' op=einsum1 rule=0
[QWIX] module='decoder/moe_layers/self_attention/out' op=dot_general0 rule=0
[QWIX] module='decoder/moe_layers/DeepSeekMoeBlock_0/MoeBlock_0/gate' op=dot_general0 rule=0
[QWIX] module='decoder/moe_layers/DeepSeekMoeBlock_0/MoeBlock_0' op=einsum0 rule=0
[QWIX] module='decoder/moe_layers/DeepSeekMoeBlock_0/shared_experts' op=dot_general0 rule=0
[QWIX] module='decoder/moe_layers/DeepSeekMoeBlock_0/shared_experts' op=dot_general1 rule=0
[QWIX] module='decoder/moe_layers/DeepSeekMoeBlock_0/shared_experts' op=dot_general2 rule=0
[QWIX] module='decoder/logits_dense' op=dot_general0 rule=None

after:

[QWIX] module='decoder/dense_layers/self_attention/wq_a' op=dot_general0 rule=1
[QWIX] module='decoder/dense_layers/self_attention/wq_b' op=dot_general0 rule=1
[QWIX] module='decoder/dense_layers/self_attention/wkv_a' op=dot_general0 rule=1
[QWIX] module='decoder/dense_layers/self_attention/wkv_b' op=dot_general0 rule=1
[QWIX] module='decoder/dense_layers/self_attention/attention_op' op=einsum0 rule=0
[QWIX] module='decoder/dense_layers/self_attention/attention_op' op=einsum1 rule=0
[QWIX] module='decoder/dense_layers/self_attention/out' op=dot_general0 rule=1
[QWIX] module='decoder/dense_layers/mlp/wi_0' op=dot_general0 rule=1
[QWIX] module='decoder/dense_layers/mlp/wi_1' op=dot_general1 rule=1
[QWIX] module='decoder/dense_layers/mlp/wo' op=dot_general2 rule=1
[QWIX] module='decoder/moe_layers/self_attention/wq_a' op=dot_general0 rule=1
[QWIX] module='decoder/moe_layers/self_attention/wq_b' op=dot_general0 rule=1
[QWIX] module='decoder/moe_layers/self_attention/wkv_a' op=dot_general0 rule=1
[QWIX] module='decoder/moe_layers/self_attention/wkv_b' op=dot_general0 rule=1
[QWIX] module='decoder/moe_layers/self_attention/attention_op' op=einsum0 rule=0
[QWIX] module='decoder/moe_layers/self_attention/attention_op' op=einsum1 rule=0
[QWIX] module='decoder/moe_layers/self_attention/out' op=dot_general0 rule=1
[QWIX] module='decoder/moe_layers/DeepSeekMoeBlock_0/MoeBlock_0/gate' op=dot_general0 rule=1
[QWIX] module='decoder/moe_layers/DeepSeekMoeBlock_0/MoeBlock_0' op=einsum0 rule=1
[QWIX] module='decoder/moe_layers/DeepSeekMoeBlock_0/shared_experts/wi_0' op=dot_general0 rule=1
[QWIX] module='decoder/moe_layers/DeepSeekMoeBlock_0/shared_experts/wi_1' op=dot_general1 rule=1
[QWIX] module='decoder/moe_layers/DeepSeekMoeBlock_0/shared_experts/wo' op=dot_general2 rule=1
[QWIX] module='decoder/logits_dense' op=dot_general0 rule=None

I also tested with PR #2066 (migrating attention to nnx) and the module paths are the same.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

@liudangyi liudangyi force-pushed the dangyi_fix_module_path branch from e715dfb to 0d7d2c3 Compare August 4, 2025 21:39
@copybara-service copybara-service Bot merged commit b293f8f into main Aug 5, 2025
20 checks passed
@copybara-service copybara-service Bot deleted the dangyi_fix_module_path branch August 5, 2025 04:48