feat(provider): add DashScope text-embedding-v4 embedding provider #8315

Open: LLsetnow wants to merge 2 commits into AstrBotDevs:master from LLsetnow:feat/8067-add-dashscope-embedding

@LLsetnow LLsetnow commented May 24, 2026

Summary

Add DashScope (Alibaba Cloud) text-embedding-v4 as a new embedding provider. Uses OpenAI-compatible API (dashscope.aliyuncs.com/compatible-mode/v1).
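
As a rough illustration of what "OpenAI-compatible" means here, the sketch below builds (without sending) an OpenAI-style `POST /embeddings` request against the base URL from the summary. The `https` scheme, the helper name `build_embedding_request`, and the placeholder API key are assumptions for illustration and are not taken from the PR's code.

```python
import json
import urllib.request

# Base URL taken from the PR summary; the https scheme is an assumption.
DASHSCOPE_BASE = "https://dashscope.aliyuncs.com/compatible-mode/v1"


def build_embedding_request(text, api_key, model="text-embedding-v4", dimensions=None):
    """Build (but do not send) an OpenAI-style POST /embeddings request.

    Hypothetical helper: the PR itself uses an AsyncOpenAI client instead.
    """
    payload = {"model": model, "input": text}
    if dimensions is not None:
        payload["dimensions"] = dimensions  # optional output vector size
    return urllib.request.Request(
        url=f"{DASHSCOPE_BASE}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder key; nothing is sent over the network here.
req = build_embedding_request("hello", api_key="sk-placeholder", dimensions=1024)
```

Only the request is constructed, so the sketch can be inspected without credentials or network access.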

Changes

  • New: astrbot/core/provider/sources/dashscope_embedding_source.py — Provider implementation
  • Modified: astrbot/core/provider/manager.py — Register in dynamic_import_provider()
  • Modified: astrbot/core/config/default.py — Add default config template
  • Modified: 3 i18n locale files — Add hint translations

Verification

[Screenshot: DashScope Embedding in WebUI]

  • ruff format --check . passed
  • ruff check . passed

Closes #8067

Summary by Sourcery

Add a new DashScope text-embedding-v4 embedding provider and wire it into the provider system and default configuration.

New Features:

  • Introduce DashscopeEmbeddingProvider implementing DashScope text-embedding-v4 embeddings via an OpenAI-compatible API endpoint.
  • Expose DashScope embedding as a configurable provider option in the default configuration, including proxy, model, and dimension settings.

Enhancements:

  • Register the DashScope embedding provider in the dynamic provider manager for runtime loading.
  • Extend i18n config metadata to include UI hints for configuring the DashScope embedding provider.
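
A default config template with proxy, model, and dimension settings might look roughly like the sketch below. Only the `embedding_dimensions` key and the `text-embedding-v4` model name appear in this PR's discussion; every other key name, and the `make_provider_config` helper, are assumptions rather than the contents of `default.py`.

```python
# Hypothetical template. Only "embedding_dimensions" and the model name come
# from the PR discussion; all other key names are assumptions.
DASHSCOPE_EMBEDDING_TEMPLATE = {
    "id": "dashscope_embedding",
    "type": "dashscope_embedding",
    "embedding_api_key": "",
    "embedding_api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "embedding_model": "text-embedding-v4",
    "embedding_dimensions": 1024,
    "proxy": "",
}


def make_provider_config(overrides=None):
    """Return a copy of the template with user overrides applied on top."""
    config = dict(DASHSCOPE_EMBEDDING_TEMPLATE)
    config.update(overrides or {})
    return config


custom = make_provider_config({"embedding_dimensions": 512, "proxy": "http://127.0.0.1:7890"})
```

Copying the template before applying overrides keeps the shared default untouched when several provider instances are configured.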

- Create dashscope_embedding_source.py with OpenAI-compatible AsyncOpenAI client
- Register the provider in manager.py dynamic_import_provider()
- Add default config template with text-embedding-v4 model in default.py
- Add i18n hint entries for en-US, zh-CN, ru-RU

Closes AstrBotDevs#8067

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@auto-assign auto-assign Bot requested review from Fridemn and anka-afk May 24, 2026 13:06
@dosubot dosubot Bot added the labels size:L (This PR changes 100-499 lines, ignoring generated files.) and area:provider (The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner.) May 24, 2026

@sourcery-ai sourcery-ai Bot left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • In DashscopeEmbeddingProvider.__init__, you create a custom httpx.AsyncClient for proxy support but don’t keep a reference to it; consider storing it on self and explicitly closing it in terminate() to avoid leaking connections.
  • provider_settings is assigned to self.provider_settings in DashscopeEmbeddingProvider.__init__ but never used; consider removing it from the instance state or wiring it into behavior if it’s intended to configure the provider.
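
The first point can be sketched as follows: keep a reference to the client and close it in `terminate()`. `ProviderSketch` is a hypothetical shape, not the PR's class, and `FakeAsyncClient` stands in for `httpx.AsyncClient` (whose async close method is `aclose()`) so the example runs without httpx installed.

```python
import asyncio


class FakeAsyncClient:
    """Stand-in for httpx.AsyncClient so the sketch runs without httpx."""

    def __init__(self):
        self.closed = False

    async def aclose(self):
        self.closed = True


class ProviderSketch:
    """Hypothetical provider shape; not the PR's actual class."""

    def __init__(self, http_client=None):
        # Keep a reference so the client can be closed explicitly later.
        self._http_client = http_client

    async def terminate(self):
        # Close the custom client (if any) and drop the reference.
        if self._http_client is not None:
            await self._http_client.aclose()
            self._http_client = None


client = FakeAsyncClient()
provider = ProviderSketch(http_client=client)
asyncio.run(provider.terminate())
```

Clearing the reference after closing makes a second `terminate()` call a no-op.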
## Individual Comments

### Comment 1
<location path="astrbot/core/provider/sources/dashscope_embedding_source.py" line_range="26" />
<code_context>
+        http_client = None
+        if proxy:
+            logger.info(f"[DashScope Embedding] {provider_id} Using proxy: {proxy}")
+            http_client = httpx.AsyncClient(proxy=proxy)
+        api_base = (
+            provider_config.get(
</code_context>
<issue_to_address>
**issue (bug_risk):** The `httpx.AsyncClient` initialization uses an unsupported `proxy` argument and will raise at runtime.

httpx expects the `proxies` keyword, not `proxy`, so this will raise a `TypeError` and prevent the provider from initializing. Please switch to `httpx.AsyncClient(proxies=proxy)` (or the correct mapping form) and confirm it matches httpx’s expected config format.
</issue_to_address>
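
Which keyword `httpx.AsyncClient` accepts appears to depend on the installed httpx version: as far as I know, `proxy` was added in httpx 0.26.0 and the older `proxies` was removed in 0.28.0, so the project's pinned version is worth checking before changing the call. A version-agnostic sketch is below; `proxy_kwargs` and the two stand-in client classes are hypothetical and exist only so the example runs without httpx.

```python
import inspect


def proxy_kwargs(client_cls, proxy):
    """Return the proxy keyword argument that client_cls actually accepts.

    Hypothetical helper: prefers `proxy`, falls back to `proxies`.
    """
    if not proxy:
        return {}
    params = inspect.signature(client_cls.__init__).parameters
    for name in ("proxy", "proxies"):
        if name in params:
            return {name: proxy}
    raise TypeError(f"{client_cls.__name__} accepts neither 'proxy' nor 'proxies'")


class NewStyleClient:
    """Stand-in for a client whose constructor takes `proxy`."""

    def __init__(self, *, proxy=None):
        self.proxy = proxy


class OldStyleClient:
    """Stand-in for a client whose constructor takes `proxies`."""

    def __init__(self, *, proxies=None):
        self.proxies = proxies


new_client = NewStyleClient(**proxy_kwargs(NewStyleClient, "http://127.0.0.1:7890"))
old_client = OldStyleClient(**proxy_kwargs(OldStyleClient, "http://127.0.0.1:7890"))
```

With real httpx the call would be `httpx.AsyncClient(**proxy_kwargs(httpx.AsyncClient, proxy))`, assuming the constructor exposes the keyword explicitly in its signature.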


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for the DashScope text-embedding-v4 provider, adding the necessary configuration, dynamic provider loading, and the core implementation of the DashscopeEmbeddingProvider class. Localization files were also updated to include user hints for the new provider. Feedback identifies redundant attribute assignments in the constructor, a potential IndexError when processing API responses, and opportunities to improve the get_dim method by providing a default dimension value and refactoring duplicated logic.

Comment on lines +18 to +20
        super().__init__(provider_config, provider_settings)
        self.provider_config = provider_config
        self.provider_settings = provider_settings

medium

The assignments of self.provider_config and self.provider_settings are redundant. These attributes are already initialized and stored by the EmbeddingProvider (and its parent AbstractProvider) base class constructors when super().__init__(provider_config, provider_settings) is called.

Suggested change
-        super().__init__(provider_config, provider_settings)
-        self.provider_config = provider_config
-        self.provider_settings = provider_settings
+        super().__init__(provider_config, provider_settings)

Comment on lines +47 to +54
    async def get_embedding(self, text: str) -> list[float]:
        kwargs = self._embedding_kwargs()
        embedding = await self.client.embeddings.create(
            input=text,
            model=self.model,
            **kwargs,
        )
        return embedding.data[0].embedding

medium

Accessing embedding.data[0] without checking if the data list is non-empty could lead to an IndexError. While the API is expected to return data for a valid request, it is safer to handle cases where the response might be empty or malformed.

Suggested change
     async def get_embedding(self, text: str) -> list[float]:
         kwargs = self._embedding_kwargs()
         embedding = await self.client.embeddings.create(
             input=text,
             model=self.model,
             **kwargs,
         )
+        if not embedding.data:
+            raise Exception("DashScope API returned no embedding data.")
         return embedding.data[0].embedding

Comment on lines +76 to +84
    def get_dim(self) -> int:
        if "embedding_dimensions" in self.provider_config:
            try:
                return int(self.provider_config["embedding_dimensions"])
            except (ValueError, TypeError):
                logger.warning(
                    f"embedding_dimensions in embedding configs is not a valid integer: '{self.provider_config['embedding_dimensions']}', ignored."
                )
        return 0

medium

The get_dim method returns 0 if embedding_dimensions is missing or invalid, which can cause issues during vector database initialization. Since this provider is specifically for text-embedding-v4, it should default to 1024 (the model's default dimension). Additionally, the parsing logic is duplicated from _embedding_kwargs and should be refactored into a shared helper function to avoid code duplication.

    def get_dim(self) -> int:
        try:
            return int(self.provider_config.get("embedding_dimensions", 1024))
        except (ValueError, TypeError):
            return 1024
References
  1. When implementing similar functionality for different cases, refactor the logic into a shared helper function to avoid code duplication.
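
The shared helper this comment asks for could look roughly like the sketch below. It assumes `_embedding_kwargs` only needs to forward a `dimensions` argument when one is configured (the PR's actual method is not shown in this thread), and the names `parse_dimensions`, `embedding_kwargs`, and `DEFAULT_DIM` are hypothetical. The functions take the config dict directly so the example runs standalone.

```python
import logging

logger = logging.getLogger(__name__)

DEFAULT_DIM = 1024  # default suggested in the review for text-embedding-v4


def parse_dimensions(provider_config, default=None):
    """Parse `embedding_dimensions` once; return `default` if missing or invalid."""
    raw = provider_config.get("embedding_dimensions")
    if raw is None or raw == "":
        return default
    try:
        return int(raw)
    except (ValueError, TypeError):
        logger.warning(
            "embedding_dimensions is not a valid integer: %r, ignored.", raw
        )
        return default


def embedding_kwargs(provider_config):
    """Assumed behavior: pass `dimensions` to the API only when configured."""
    dims = parse_dimensions(provider_config)
    return {"dimensions": dims} if dims is not None else {}


def get_dim(provider_config):
    """Fall back to the model default instead of returning 0."""
    return parse_dimensions(provider_config, default=DEFAULT_DIM)
```

Both callers now share one parsing path, so an invalid value is logged and handled the same way in each place.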

- Remove redundant self.provider_config / self.provider_settings assignments
- Add empty response guard in get_embedding()
- Change get_dim() default from 0 to 1024 for text-embedding-v4

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Successfully merging this pull request may close these issues.

[Feature] Add Alibaba Cloud embedding model
