[Performance] Optimize radix cache eviction performance #14339
stmatengss merged 1 commit into sgl-project:main
/tag-and-rerun-ci
Branch force-pushed from 86533f5 to 3a846cf.
Let's get this rebased after merging PR #13334.
/tag-and-rerun-ci
    for child in node.children.values():
        if not child.evicted:
            if node in self.evictable_leaves:
Should we remove this? As long as there is a non-evicted child, the node should be removed from the set.
When a new leaf node x is added to the device pool, both _update_leaf_status(x) and _update_leaf_status(x.parent) should be called, because the parent node must update its status based on its child's state (see Insert() in HiCache).
I mean, should we remove line 756?
Or, alternatively, should we update the leaf status in a different order: first update the node itself, then check its parent?
If the parent node was not previously in the set (e.g., it already had other children when the new child was added), the remove operation would raise an error without this check.
If we only increment or decrement a node's lock reference count without changing the tree structure, there is no need to update the parent's status.
I will always update the node itself, and will additionally check its parent (_update_leaf_status(x) followed by _update_leaf_status(x.parent)) only when the tree structure changes (insert, delete, evict, promote).
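A minimal sketch of the update rule agreed on above, assuming a simplified `TreeNode` with `parent`, `children`, `evicted`, and `lock_ref` fields and a hypothetical `LeafTracker` wrapper (neither is SGLang's actual class). It shows why the membership check is needed before `set.remove()`, and why only structural changes re-check the parent.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes fit in a set
class TreeNode:
    # Simplified stand-in for a radix tree node; field names are assumptions.
    parent: Optional["TreeNode"] = field(default=None, repr=False)
    children: Dict[int, "TreeNode"] = field(default_factory=dict)
    evicted: bool = False  # e.g. evicted from the device pool but still in the tree
    lock_ref: int = 0


class LeafTracker:
    """Hypothetical wrapper that maintains the set of evictable leaves."""

    def __init__(self) -> None:
        self.evictable_leaves: Set[TreeNode] = set()

    def _update_leaf_status(self, node: TreeNode) -> None:
        # Evictable leaf: resident, unlocked, and every child already evicted.
        is_leaf = (
            not node.evicted
            and node.lock_ref == 0
            and all(child.evicted for child in node.children.values())
        )
        if is_leaf:
            self.evictable_leaves.add(node)
        elif node in self.evictable_leaves:
            # Guard: set.remove() raises KeyError for a node that was never
            # added, e.g. a parent that already had resident children.
            self.evictable_leaves.remove(node)

    def on_structure_change(self, node: TreeNode) -> None:
        # insert / delete / evict / promote: update the node, then its parent.
        self._update_leaf_status(node)
        if node.parent is not None:
            self._update_leaf_status(node.parent)

    def on_lock_change(self, node: TreeNode) -> None:
        # Lock changes leave the tree shape intact, so the parent is untouched.
        self._update_leaf_status(node)
```

With a root that already has one resident child, inserting a second child re-checks the root, which is not in the set; the guard turns that into a no-op instead of a `KeyError`.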
/rerun-failed-ci
/rerun-failed-ci
Signed-off-by: Xingrui Yi <yixingrui@linux.alibaba.com> Co-authored-by: Xuchun Shang <xuchun.shang@gmail.com>
/rerun-failed-ci
/rerun-failed-ci
…14339) Signed-off-by: Xingrui Yi <yixingrui@linux.alibaba.com> Co-authored-by: Xuchun Shang <xuchun.shang@gmail.com>
Motivation
Currently, the RadixCache.evict method calls _collect_leaves() to traverse the entire Radix Tree to find evictable nodes. This operation has a time complexity of O(N), where N is the total number of nodes in the tree.
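To make that cost concrete, here is a minimal sketch of the full-scan strategy described above, assuming a simplified node with `key`, `num_tokens`, `last_access_time`, `lock_ref`, `parent`, and `children` fields (all hypothetical, not SGLang's actual `TreeNode`). `collect_leaves` stands in for `_collect_leaves()`: it walks every node before the LRU heap can be built, so each call pays O(N) no matter how few leaves are actually evictable.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Node:
    # Simplified node; all field names are illustrative assumptions.
    key: int = 0
    num_tokens: int = 1
    last_access_time: float = 0.0
    lock_ref: int = 0
    parent: Optional["Node"] = field(default=None, repr=False)
    children: Dict[int, "Node"] = field(default_factory=dict)


def collect_leaves(root: Node) -> List[Node]:
    """Visit every node in the tree: O(N) work on every call."""
    leaves: List[Node] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            stack.extend(node.children.values())
        else:
            leaves.append(node)
    return leaves


def evict_by_full_scan(root: Node, num_tokens: int) -> int:
    """Try to free num_tokens by evicting least-recently-used unlocked leaves."""
    heap = [
        (n.last_access_time, id(n), n)
        for n in collect_leaves(root)
        if n is not root and n.lock_ref == 0
    ]
    heapq.heapify(heap)
    freed = 0
    while freed < num_tokens and heap:
        _, _, node = heapq.heappop(heap)
        parent = node.parent
        del parent.children[node.key]
        freed += node.num_tokens
        # A parent whose last child was just removed becomes a new candidate.
        if parent is not root and not parent.children and parent.lock_ref == 0:
            heapq.heappush(heap, (parent.last_access_time, id(parent), parent))
    return freed
```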
In high-concurrency scenarios where the GPU memory is fully utilized, evict is triggered frequently. The O(N) traversal causes significant CPU overhead and leads to latency jitter (spikes) during the decoding phase, especially when the Radix Tree is large.
Modifications
Introduced self.evictable_leaves: replaced the dynamic search with an incrementally maintained set.
Incremental updates: updated the insert, _delete_leaf, inc_lock_ref, and dec_lock_ref methods to add nodes to, or remove them from, self.evictable_leaves whenever their state changes (e.g., a node's lock reference count drops to 0 or a node becomes a leaf).
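The incremental scheme can be sketched as follows. This is a simplified illustration, not the code merged in this PR: `Node`, `IncrementalLeafCache`, and the `key`, `num_tokens`, `last_access_time`, and `lock_ref` fields are hypothetical, and the root is pinned with `lock_ref=1` so it is never evicted. Every structural or lock change calls `_update_leaf_status`, so `evict` seeds its LRU heap directly from `evictable_leaves` instead of walking the tree.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass(eq=False)  # identity hashing so nodes can be stored in a set
class Node:
    # Simplified node; field names are illustrative assumptions.
    key: int = 0
    num_tokens: int = 1
    last_access_time: float = 0.0
    lock_ref: int = 0
    parent: Optional["Node"] = field(default=None, repr=False)
    children: Dict[int, "Node"] = field(default_factory=dict)


class IncrementalLeafCache:
    """Hypothetical cache that keeps evictable leaves in a set as the tree changes."""

    def __init__(self) -> None:
        self.root = Node(lock_ref=1)  # root is pinned and never evicted
        self.evictable_leaves: Set[Node] = set()

    def _update_leaf_status(self, node: Node) -> None:
        if node.lock_ref == 0 and not node.children:
            self.evictable_leaves.add(node)
        else:
            self.evictable_leaves.discard(node)  # no KeyError if absent

    def insert(self, parent: Node, key: int, num_tokens: int, now: float) -> Node:
        node = Node(key=key, num_tokens=num_tokens, last_access_time=now, parent=parent)
        parent.children[key] = node
        self._update_leaf_status(node)    # the new node is a leaf
        self._update_leaf_status(parent)  # the parent may stop being a leaf
        return node

    def inc_lock_ref(self, node: Node) -> None:
        # Pin the whole path up to (but excluding) the root.
        while node is not self.root:
            node.lock_ref += 1
            self._update_leaf_status(node)
            node = node.parent

    def dec_lock_ref(self, node: Node) -> None:
        while node is not self.root:
            node.lock_ref -= 1
            self._update_leaf_status(node)
            node = node.parent

    def _delete_leaf(self, node: Node) -> None:
        del node.parent.children[node.key]
        self.evictable_leaves.discard(node)
        self._update_leaf_status(node.parent)  # the parent may become a leaf

    def evict(self, num_tokens: int) -> int:
        # Seed the LRU heap from the maintained set: no full-tree traversal.
        heap = [(n.last_access_time, id(n), n) for n in self.evictable_leaves]
        heapq.heapify(heap)
        freed = 0
        while freed < num_tokens and heap:
            _, _, node = heapq.heappop(heap)
            parent = node.parent
            freed += node.num_tokens
            self._delete_leaf(node)
            if parent in self.evictable_leaves:
                heapq.heappush(heap, (parent.last_access_time, id(parent), parent))
        return freed
```

Here `set.discard()` plays the role of the `if node in self.evictable_leaves: remove(node)` guard discussed in the review; both avoid a `KeyError` when the node is absent. The heap setup in `evict` still costs time proportional to the number of evictable leaves, but it no longer scales with the total number of nodes N.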
Benchmarking and Profiling
Radix Cache Test
Tested on H20 with TP8, using Qwen/Qwen3-0.6B.
Before optimization: 7 ms per eviction
After optimization: 0.5 ms per eviction
HiCache Test