feat(provider): add DashScope text-embedding-v4 embedding provider#8315
feat(provider): add DashScope text-embedding-v4 embedding provider#8315LLsetnow wants to merge 2 commits into
Conversation
- Create dashscope_embedding_source.py with OpenAI-compatible AsyncOpenAI client - Register the provider in manager.py dynamic_import_provider() - Add default config template with text-embedding-v4 model in default.py - Add i18n hint entries for en-US, zh-CN, ru-RU Closes AstrBotDevs#8067 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- In
DashscopeEmbeddingProvider.__init__, you create a customhttpx.AsyncClientfor proxy support but don’t keep a reference to it; consider storing it onselfand explicitly closing it interminate()to avoid leaking connections. provider_settingsis assigned toself.provider_settingsinDashscopeEmbeddingProvider.__init__but never used; consider removing it from the instance state or wiring it into behavior if it’s intended to configure the provider.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `DashscopeEmbeddingProvider.__init__`, you create a custom `httpx.AsyncClient` for proxy support but don’t keep a reference to it; consider storing it on `self` and explicitly closing it in `terminate()` to avoid leaking connections.
- `provider_settings` is assigned to `self.provider_settings` in `DashscopeEmbeddingProvider.__init__` but never used; consider removing it from the instance state or wiring it into behavior if it’s intended to configure the provider.
## Individual Comments
### Comment 1
<location path="astrbot/core/provider/sources/dashscope_embedding_source.py" line_range="26" />
<code_context>
+ http_client = None
+ if proxy:
+ logger.info(f"[DashScope Embedding] {provider_id} Using proxy: {proxy}")
+ http_client = httpx.AsyncClient(proxy=proxy)
+ api_base = (
+ provider_config.get(
</code_context>
<issue_to_address>
**issue (bug_risk):** The `httpx.AsyncClient` initialization uses an unsupported `proxy` argument and will raise at runtime.
httpx expects the `proxies` keyword, not `proxy`, so this will raise a `TypeError` and prevent the provider from initializing. Please switch to `httpx.AsyncClient(proxies=proxy)` (or the correct mapping form) and confirm it matches httpx’s expected config format.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| http_client = None | ||
| if proxy: | ||
| logger.info(f"[DashScope Embedding] {provider_id} Using proxy: {proxy}") | ||
| http_client = httpx.AsyncClient(proxy=proxy) |
There was a problem hiding this comment.
issue (bug_risk): The httpx.AsyncClient initialization uses an unsupported proxy argument and will raise at runtime.
httpx expects the proxies keyword, not proxy, so this will raise a TypeError and prevent the provider from initializing. Please switch to httpx.AsyncClient(proxies=proxy) (or the correct mapping form) and confirm it matches httpx’s expected config format.
There was a problem hiding this comment.
Code Review
This pull request introduces support for the DashScope text-embedding-v4 provider, adding the necessary configuration, dynamic provider loading, and the core implementation of the DashscopeEmbeddingProvider class. Localization files were also updated to include user hints for the new provider. Feedback identifies redundant attribute assignments in the constructor, a potential IndexError when processing API responses, and opportunities to improve the get_dim method by providing a default dimension value and refactoring duplicated logic.
| super().__init__(provider_config, provider_settings) | ||
| self.provider_config = provider_config | ||
| self.provider_settings = provider_settings |
There was a problem hiding this comment.
The assignments of self.provider_config and self.provider_settings are redundant. These attributes are already initialized and stored by the EmbeddingProvider (and its parent AbstractProvider) base class constructors when super().__init__(provider_config, provider_settings) is called.
| super().__init__(provider_config, provider_settings) | |
| self.provider_config = provider_config | |
| self.provider_settings = provider_settings | |
| super().__init__(provider_config, provider_settings) |
| async def get_embedding(self, text: str) -> list[float]: | ||
| kwargs = self._embedding_kwargs() | ||
| embedding = await self.client.embeddings.create( | ||
| input=text, | ||
| model=self.model, | ||
| **kwargs, | ||
| ) | ||
| return embedding.data[0].embedding |
There was a problem hiding this comment.
Accessing embedding.data[0] without checking if the data list is non-empty could lead to an IndexError. While the API is expected to return data for a valid request, it is safer to handle cases where the response might be empty or malformed.
| async def get_embedding(self, text: str) -> list[float]: | |
| kwargs = self._embedding_kwargs() | |
| embedding = await self.client.embeddings.create( | |
| input=text, | |
| model=self.model, | |
| **kwargs, | |
| ) | |
| return embedding.data[0].embedding | |
| async def get_embedding(self, text: str) -> list[float]: | |
| kwargs = self._embedding_kwargs() | |
| embedding = await self.client.embeddings.create( | |
| input=text, | |
| model=self.model, | |
| **kwargs, | |
| ) | |
| if not embedding.data: | |
| raise Exception("DashScope API returned no embedding data.") | |
| return embedding.data[0].embedding |
| def get_dim(self) -> int: | ||
| if "embedding_dimensions" in self.provider_config: | ||
| try: | ||
| return int(self.provider_config["embedding_dimensions"]) | ||
| except (ValueError, TypeError): | ||
| logger.warning( | ||
| f"embedding_dimensions in embedding configs is not a valid integer: '{self.provider_config['embedding_dimensions']}', ignored." | ||
| ) | ||
| return 0 |
There was a problem hiding this comment.
The get_dim method returns 0 if embedding_dimensions is missing or invalid, which can cause issues during vector database initialization. Since this provider is specifically for text-embedding-v4, it should default to 1024 (the model's default dimension). Additionally, the parsing logic is duplicated from _embedding_kwargs and should be refactored into a shared helper function to avoid code duplication.
def get_dim(self) -> int:
try:
return int(self.provider_config.get("embedding_dimensions", 1024))
except (ValueError, TypeError):
return 1024References
- When implementing similar functionality for different cases, refactor the logic into a shared helper function to avoid code duplication.
- Remove redundant self.provider_config / self.provider_settings assignments - Add empty response guard in get_embedding() - Change get_dim() default from 0 to 1024 for text-embedding-v4 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Add DashScope (Alibaba Cloud) text-embedding-v4 as a new embedding provider. Uses OpenAI-compatible API (
dashscope.aliyuncs.com/compatible-mode/v1).Changes
astrbot/core/provider/sources/dashscope_embedding_source.py— Provider implementationastrbot/core/provider/manager.py— Register indynamic_import_provider()astrbot/core/config/default.py— Add default config templateVerification
ruff format --check .passedruff check .passedCloses #8067
Summary by Sourcery
Add a new DashScope text-embedding-v4 embedding provider and wire it into the provider system and default configuration.
New Features:
Enhancements: