[relax] Fix tree attention for Qwen2-1.5 models #17700

Merged
MasterJH5574 merged 1 commit into apache:main from Hzfengsy:fix_qwen_1.5b on Mar 3, 2025

Hzfengsy commented Mar 3, 2025

Fix the compilation error (mlc-ai/mlc-llm#3143) for Qwen2-1.5 models in the tree attention implementation for the Vulkan backend.

cc @spectrometerHBH @vinx13

Hzfengsy commented Mar 3, 2025

One additional note: this PR provides an immediate fix for the issue, but it doesn't address the underlying problem: the arithmetic simplifier can itself introduce integer overflow. For illustration, here's a minimal reproducible example:

import tvm

x = tvm.tir.Var("x", "int32")
# Creating an expression that triggers integer overflow during simplification
expr = (tvm.tir.Div(x + 1073741826, 3) - 357913942) * 1536
ana = tvm.arith.Analyzer()
print(ana.simplify(expr))
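
To see why these particular constants are risky, here is a rough sketch in plain Python (no TVM needed) of the arithmetic involved. It assumes, without having checked the simplifier's actual rewrite sequence, that at some point the constant term gets multiplied through by 1536; `wrap_int32` and `original` are hypothetical helpers for illustration only, and Python's `//` matches the truncating `Div` only for non-negative operands.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Reduce an unbounded Python int to its two's-complement int32 value."""
    return (value + 2**31) % 2**32 - 2**31


def original(x: int) -> int:
    """Evaluate the repro expression as written, for non-negative x."""
    return ((x + 1073741826) // 3 - 357913942) * 1536


# The subtracted constant is exactly the added constant divided by 3,
# so for small non-negative x the expression as written stays within int32.
assert 357913942 * 3 == 1073741826
print(original(6))  # 3072

# Assumed intermediate step: distributing the multiplication by 1536
# over the constant term produces a value far outside the int32 range.
folded = 357913942 * 1536
print(folded)              # 549755814912
print(folded > INT32_MAX)  # True
print(wrap_int32(folded))  # 1024
```

The folded constant equals 2^39 + 1024, so if it were stored in an int32 it would wrap to 1024 and silently change the meaning of the expression.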

cc @tqchen

MasterJH5574 left a comment

Thanks for the fix!

MasterJH5574 merged commit c286638 into apache:main on Mar 3, 2025
ShiboXing pushed a commit to ShiboXing/tvm that referenced this pull request Aug 10, 2025
Fix the compilation error for Qwen2-1.5 models in the tree attention
implementation for vulkan backend.