[Feat] add deepseek eagle3 #5194

Conversation
Signed-off-by: zhaozx-cn <zhaozx2116@163.com>
Co-authored-by: drslark <slarksblood@qq.com>
Co-authored-by: hwhaokun <haokun0405@163.com>
Code Review
This pull request adds support for deepseek eagle3 speculative decoding. The changes involve patching several parts of the codebase to handle the eagle3 method, including model verification, attention metadata building, and model-specific logic for deepseek_v2. My review identifies several areas where the code could be made more robust and maintainable. Specifically, there are multiple instances of complex, hardcoded conditions that are brittle and difficult to understand. There is also a critical issue with monkey-patching a model's __init__ method, which poses a significant maintenance risk.
```python
def __init__(self, *, vllm_config: VllmConfig, prefix: str = ""):
    super().__init__()

    config = vllm_config.model_config.hf_config
    quant_config = vllm_config.quant_config
    self.config = config
    self.device = current_platform.device_type

    self.vocab_size = config.vocab_size
    self.is_v32 = hasattr(config, "index_topk")
    if self.is_v32:
        topk_tokens = config.index_topk
        topk_indices_buffer = torch.empty(
            vllm_config.scheduler_config.max_num_batched_tokens,
            topk_tokens,
            dtype=torch.int32,
            device=self.device,
        )
    else:
        topk_indices_buffer = None

    if get_pp_group().is_first_rank:
        self.embed_tokens = VocabParallelEmbedding(
            config.vocab_size,
            config.hidden_size,
            quant_config=quant_config,
            prefix=f"{prefix}.embed_tokens",
        )
    else:
        self.embed_tokens = PPMissingLayer()
    self.start_layer, self.end_layer, self.layers = make_layers(
        config.num_hidden_layers,
        lambda prefix: DeepseekV2DecoderLayer(
            vllm_config, prefix, topk_indices_buffer=topk_indices_buffer
        ),
        prefix=f"{prefix}.layers",
    )

    if get_pp_group().is_last_rank:
        self.norm = RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
    else:
        self.norm = PPMissingLayer()
    self.make_empty_intermediate_tensors = make_empty_intermediate_tensors_factory(
        ["hidden_states", "residual"], config.hidden_size
    )
    self.aux_hidden_state_layers: tuple[int, ...] = ()


DeepseekV2Model.__init__ = __init__
```
This patch completely overwrites the `DeepseekV2Model.__init__` method just to add `self.aux_hidden_state_layers`. This monkey-patching approach is extremely fragile and poses a significant maintenance risk: if the original `__init__` in vllm changes in a future update, this patch will likely break or introduce subtle bugs. A less invasive approach, such as wrapping the original `__init__`, should be used. If wrapping is not feasible, please add a prominent comment warning about the fragility of this patch and the need to keep it synchronized with the upstream vllm implementation.
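For reference, a minimal sketch of the wrapping approach — assuming the patch module runs after vllm is importable; `_orig_init` and `_patched_init` are illustrative names, not part of either codebase:

```python
from vllm.model_executor.models.deepseek_v2 import DeepseekV2Model

# Keep a reference to the upstream constructor instead of re-implementing it.
_orig_init = DeepseekV2Model.__init__


def _patched_init(self, *, vllm_config, prefix: str = ""):
    # Delegate all existing setup to upstream vllm, then add only the
    # eagle3-specific attribute; upstream changes to __init__ are then
    # picked up automatically.
    _orig_init(self, vllm_config=vllm_config, prefix=prefix)
    self.aux_hidden_state_layers: tuple[int, ...] = ()


DeepseekV2Model.__init__ = _patched_init
```

This keeps the patch to a few lines and removes the need to track upstream changes to the constructor body.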
```python
if self.vllm_config.model_config.hf_config.model_type == "deepseek_v3":
    builder = self.runner.attn_groups[0][1].get_metadata_builder()
```
The logic for selecting the metadata builder for `deepseek_v3` uses a hardcoded index `[1]`. This is brittle and not self-documenting: if the structure of `attn_groups` changes, this could lead to silent failures. Please add a comment explaining why index `[1]` is the right choice for `deepseek_v3`. For better maintainability, consider refactoring this logic into a helper function that finds the correct builder based on its properties rather than a fixed index.
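As a sketch of the property-based lookup — the helper name and the exact group layout are assumptions to be adapted to the runner's actual structure:

```python
def _find_metadata_builder(self, predicate):
    # Hypothetical helper: scan all attention groups and return the first
    # metadata builder matching `predicate`, instead of relying on a
    # hardcoded attn_groups[0][1] index that may silently go stale.
    for groups in self.runner.attn_groups:
        for group in groups:
            builder = group.get_metadata_builder()
            if predicate(builder):
                return builder
    raise ValueError("No attention metadata builder matched the predicate")
```

The call site then becomes e.g. `builder = self._find_metadata_builder(lambda b: isinstance(b, AscendAttentionMetadataBuilder))`, which documents the intent directly.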
```python
if isinstance(builder, AscendAttentionMetadataBuilder):
    attn_state = AscendAttentionState.DecodeOnly
```
The branch `if isinstance(builder, AscendAttentionMetadataBuilder): attn_state = AscendAttentionState.DecodeOnly` overrides whatever `attn_state` was computed earlier whenever the builder is an Ascend one. This lacks context and could hide underlying issues or make debugging difficult. Please add a comment explaining why this override is necessary for `AscendAttentionMetadataBuilder` during dummy metadata creation.
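For example, something along these lines — the comment text is illustrative and should state the real constraint, which only the authors know:

```python
if isinstance(builder, AscendAttentionMetadataBuilder):
    # NOTE: during dummy metadata creation the Ascend builder only handles
    # the decode path (<replace with the actual reason>), so the previously
    # derived attn_state is deliberately overridden here.
    attn_state = AscendAttentionState.DecodeOnly
```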
```python
if not self.model_config.use_mla or (self.speculative_config and
        self.speculative_config.method == "eagle3" and
        self.vllm_config.model_config.hf_config.model_type == "deepseek_v3" and
        str(eagle_layer) in kv_cache_tensor.shared_by[0]):
```
This conditional logic is very complex and hard to read, making it difficult to maintain. It combines multiple checks, including string comparisons on the model type and layer name, which are brittle. Please refactor it into a separate helper function with a descriptive name (e.g., `_should_use_full_attention_for_eagle3(...)`) to improve code clarity and maintainability.
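A sketch of such a helper, reusing only names from the snippet above — the method name is a suggestion, not existing API:

```python
def _should_use_full_attention_for_eagle3(self, eagle_layer, kv_cache_tensor) -> bool:
    # True when eagle3 speculative decoding targets a deepseek_v3 model
    # and this kv-cache tensor is shared by the eagle draft layer.
    return (self.speculative_config is not None
            and self.speculative_config.method == "eagle3"
            and self.vllm_config.model_config.hf_config.model_type == "deepseek_v3"
            and str(eagle_layer) in kv_cache_tensor.shared_by[0])
```

The condition then reads `if not self.model_config.use_mla or self._should_use_full_attention_for_eagle3(eagle_layer, kv_cache_tensor):`, which is both shorter and self-documenting.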
```python
if self.speculative_config and self.speculative_config.method == 'eagle3' and str(eagle_layer) in layer_name:
    k_shape = kv_cache_shape
    v_shape = k_shape
```
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
This pull request has conflicts; please resolve them before we can evaluate the pull request.
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?