-
-
Notifications
You must be signed in to change notification settings - Fork 6
fix: ensure vision-capable models for image attachments #388
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -39,9 +39,15 @@ Analyze the user's prompt and the image to provide a holistic understanding of t | |||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||
| const filteredMessages = messages.filter(msg => msg.role !== 'system'); | ||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||
| // Check if any message contains an image (resolution search is specifically for image analysis) | ||||||||||||||||||||||||||||||||||||||||||||||||
| const hasImage = messages.some(message => | ||||||||||||||||||||||||||||||||||||||||||||||||
| Array.isArray(message.content) && | ||||||||||||||||||||||||||||||||||||||||||||||||
| message.content.some(part => part.type === 'image') | ||||||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+42
to
+46
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick | 🔵 Trivial Consider extracting duplicate image detection logic. This exact image detection pattern appears in both 🔎 Example helper functionAdd to +export function hasImageContent(messages: CoreMessage[]): boolean {
+ return messages.some(message =>
+ Array.isArray(message.content) &&
+ message.content.some(part => part.type === 'image')
+ )
+}Then use it in both files: - const hasImage = messages.some(message =>
- Array.isArray(message.content) &&
- message.content.some(part => part.type === 'image')
- )
+ const hasImage = hasImageContent(messages)
🤖 Prompt for AI Agents |
||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||
| // Use generateObject to get the full object at once. | ||||||||||||||||||||||||||||||||||||||||||||||||
| const { object } = await generateObject({ | ||||||||||||||||||||||||||||||||||||||||||||||||
| model: getModel(), | ||||||||||||||||||||||||||||||||||||||||||||||||
| model: getModel(hasImage), | ||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+42
to
+50
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick | 🔵 Trivial Consider validating that images are actually present. Since 🔎 Suggested validation const hasImage = messages.some(message =>
Array.isArray(message.content) &&
message.content.some(part => part.type === 'image')
)
+
+ if (!hasImage) {
+ throw new Error('resolutionSearch requires at least one image in the messages')
+ }
const { object } = await generateObject({📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents
Comment on lines
40
to
+50
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
For correctness and consistency, compute SuggestionBase the check on const hasImage = filteredMessages.some(message =>
Array.isArray(message.content) &&
message.content.some(part => part.type === 'image')
)Reply with "@CharlieHelps yes please" if you'd like me to add a commit with this fix. |
||||||||||||||||||||||||||||||||||||||||||||||||
| system: systemPrompt, | ||||||||||||||||||||||||||||||||||||||||||||||||
| messages: filteredMessages, | ||||||||||||||||||||||||||||||||||||||||||||||||
| schema: resolutionSearchSchema, | ||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -16,15 +16,16 @@ export function generateUUID(): string { | |||||||||
| return uuidv4(); | ||||||||||
| } | ||||||||||
|
|
||||||||||
| export function getModel() { | ||||||||||
| export function getModel(requireVision: boolean = false) { | ||||||||||
| const xaiApiKey = process.env.XAI_API_KEY | ||||||||||
| const gemini3ProApiKey = process.env.GEMINI_3_PRO_API_KEY | ||||||||||
| const awsAccessKeyId = process.env.AWS_ACCESS_KEY_ID | ||||||||||
| const awsSecretAccessKey = process.env.AWS_SECRET_ACCESS_KEY | ||||||||||
| const awsRegion = process.env.AWS_REGION | ||||||||||
| const bedrockModelId = '' | ||||||||||
| const bedrockModelId = process.env.BEDROCK_MODEL_ID || 'anthropic.claude-3-5-sonnet-20241022-v2:0' | ||||||||||
|
|
||||||||||
| if (xaiApiKey) { | ||||||||||
| // If vision is required, skip models that don't support it | ||||||||||
| if (!requireVision && xaiApiKey) { | ||||||||||
|
Comment on lines
+27
to
+28
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick | 🔵 Trivial Good vision-gating logic. Correctly skips xAI when vision is required, as xAI models don't support multimodal inputs. Optional: Consider clarifying the comment- // If vision is required, skip models that don't support it
+ // If vision is required, skip xAI (no vision support) and try vision-capable providers📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents |
||||||||||
| const xai = createXai({ | ||||||||||
| apiKey: xaiApiKey, | ||||||||||
| baseURL: 'https://api.x.ai/v1', | ||||||||||
|
|
@@ -67,7 +68,7 @@ export function getModel() { | |||||||||
| return model | ||||||||||
| } | ||||||||||
|
|
||||||||||
| // Default fallback (OpenAI) | ||||||||||
| // Default fallback (OpenAI gpt-4o supports vision) | ||||||||||
| const openai = createOpenAI({ | ||||||||||
| apiKey: process.env.OPENAI_API_KEY, | ||||||||||
| }) | ||||||||||
|
|
||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
contentis typed asCoreMessage['content'], but the code only checkshasImageto decide whether to send an array vs. a joined string. This can silently produce an invalid payload when non-image non-text parts exist (e.g., futurefile,audio,toolparts) or if atextpart is missing.text. In that case, the non-image branch will dopart.textand potentially drop/flatten data.Given the stated goal (multimodal correctness), it’s safer to build
contentby validating/normalizing parts and only joining when you’re sure all parts are text.Suggestion
Tighten the branching to ensure the string path only runs when every part is text, otherwise keep the array form:
This avoids lossy conversion if additional part types are introduced later. Reply with "@CharlieHelps yes please" if you'd like me to add a commit with this change.