[Builtin] Sliding window and sink support for PagedKVCache #16729
Merged
tqchen merged 1 commit into apache:main on Mar 16, 2024
Conversation
tqchen approved these changes on Mar 15, 2024
Force-pushed from b0f5506 to 8ea3b5f
This PR supports sliding window attention and attention sinks for PagedKVCache, so that PagedKVCache can back models such as Mistral.

Meanwhile, this PR removes the "Attention" function (the one without fused QKV) from the AttentionKVCache interface, since its usage is now fully covered by the "AttentionWithFusedQKV" function. Given the maintenance cost, we decided to remove it for now; if the need arises in the future, we will add it back.

This PR also unifies the global function names of PagedKVCache with the KVState introduced earlier, and introduces a new KV cache raw-info query function that returns the current total sequence length in the KV cache.
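For intuition, the position-selection rule implied by a sliding window plus attention sinks can be sketched as below. This is a minimal illustration only: the helper and its parameters (`attended_positions`, `window_size`, `num_sink`) are hypothetical and not part of the actual PagedKVCache API.

```python
# Sketch of which KV positions a new token attends to under
# sliding-window attention with attention sinks. Illustrative only;
# this helper is not part of the PagedKVCache interface.

def attended_positions(seq_len: int, window_size: int, num_sink: int) -> list[int]:
    """Positions visible to the token at index seq_len - 1.

    The first `num_sink` tokens stay pinned in the cache (attention
    sinks); beyond that, only the most recent `window_size` tokens
    remain visible, and older entries can be evicted or overwritten.
    """
    if seq_len <= num_sink + window_size:
        return list(range(seq_len))  # nothing falls out of the window yet
    sinks = list(range(num_sink))                         # pinned prefix
    recent = list(range(seq_len - window_size, seq_len))  # sliding window
    return sinks + recent

# Tiny example: window of 4 with 2 sink tokens over a 10-token sequence.
print(attended_positions(seq_len=10, window_size=4, num_sink=2))
# -> [0, 1, 6, 7, 8, 9]
```

Because positions inside the window recycle cache slots, the cache footprint stays bounded by `num_sink + window_size` regardless of sequence length, which is what allows a paged cache to serve long generations for sliding-window models.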
Force-pushed from 8ea3b5f to 52fbbd7
CharlieFRuan added a commit to mlc-ai/web-llm that referenced this pull request on Apr 1, 2024
This PR supports SWA under PagedKVCache; with this, all models running on WebLLM will now be compiled with PagedKVCache (rather than the old KVCache, which is no longer maintained). Hence, we removed code in `llm_chat.ts` that was kept for backward compatibility. However, old wasms will still work with npm <= 0.2.29, since wasm versioning will be introduced in 0.2.30. Note that the API for `forwardTokensAndSample()` has changed, since we no longer need `curPos`. Relevant PRs: - mlc-ai/mlc-llm#1967 - apache/tvm#16729
thaisacs pushed a commit to thaisacs/tvm that referenced this pull request on Apr 3, 2024
atebites-hub pushed a commit to atebites-hub/web-llm that referenced this pull request on Oct 4, 2025