
fix: ensure vision-capable models for image attachments #388

Merged
ngoiyaeric merged 2 commits into main from fix/image-attachment-tokens
Dec 30, 2025

Conversation

@ngoiyaeric
Collaborator

@ngoiyaeric commented Dec 30, 2025

User description

Problem

When users attach images to their messages, the AI returns no streamed tokens: the system appears to accept the image but never produces a response.

Root Cause

The getModel() function could return a model that cannot handle multimodal input:

  • xAI (grok-4-fast-non-reasoning): Does not support vision
  • AWS Bedrock: Had an empty model ID, causing failures
  • No image-content detection, so model selection never accounted for whether vision was needed

Solution

  1. Enhanced getModel(): Added requireVision parameter to ensure vision-capable models are used when images are present
  2. Updated researcher agent: Detects images in messages and requests vision-capable models
  3. Updated resolution-search agent: Uses vision models for image analysis
  4. Fixed Bedrock config: Set valid Claude 3.5 Sonnet model ID as default
  5. Improved type safety: Properly typed multimodal content
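The provider order that the enhanced getModel() follows can be sketched as a pure function. This is a minimal sketch, not the PR's code: selectProvider, ProviderEnv and Selection are hypothetical names, the function returns a provider label and model ID instead of the AI SDK model instances built by createXai, createAmazonBedrock and createOpenAI, and it omits the try/catch around xAI and the Gemini branch that a later review mentions.

```typescript
// Hypothetical, simplified model of getModel(requireVision) provider selection.
// The real function returns AI SDK model instances; this returns labels for clarity.
type Provider = 'xai' | 'bedrock' | 'openai'

interface ProviderEnv {
  XAI_API_KEY?: string
  AWS_ACCESS_KEY_ID?: string
  AWS_SECRET_ACCESS_KEY?: string
  BEDROCK_MODEL_ID?: string
}

interface Selection {
  provider: Provider
  modelId: string
}

const DEFAULT_BEDROCK_MODEL_ID = 'anthropic.claude-3-5-sonnet-20241022-v2:0'

function selectProvider(env: ProviderEnv, requireVision: boolean = false): Selection {
  // grok-4-fast-non-reasoning is treated as text-only, so xAI is skipped when images are present.
  if (!requireVision && env.XAI_API_KEY) {
    return { provider: 'xai', modelId: 'grok-4-fast-non-reasoning' }
  }
  // Bedrock needs both credentials; an unset BEDROCK_MODEL_ID falls back to Claude 3.5 Sonnet.
  if (env.AWS_ACCESS_KEY_ID && env.AWS_SECRET_ACCESS_KEY) {
    return { provider: 'bedrock', modelId: env.BEDROCK_MODEL_ID || DEFAULT_BEDROCK_MODEL_ID }
  }
  // gpt-4o is the vision-capable last resort.
  return { provider: 'openai', modelId: 'gpt-4o' }
}

// Usage: the same environment yields different providers depending on requireVision.
const env: ProviderEnv = { XAI_API_KEY: 'xai-key', AWS_ACCESS_KEY_ID: 'id', AWS_SECRET_ACCESS_KEY: 'secret' }
console.log(selectProvider(env).provider)       // xai
console.log(selectProvider(env, true).provider) // bedrock
```

Keeping the decision free of side effects makes the vision gating testable without any provider credentials.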

Changes

  • lib/utils/index.ts: Added requireVision parameter to getModel()
  • lib/agents/researcher.tsx: Added image detection logic
  • lib/agents/resolution-search.tsx: Added image detection logic
  • app/actions.tsx: Improved content type safety

Testing

The fix ensures that when images are attached:

  1. Content is properly formatted as multimodal array
  2. Image detection triggers use of vision-capable models (gpt-4o or Claude 3.5 Sonnet)
  3. Tokens stream back to the user successfully

Fixes the issue described in the attached documentation.


PR Type

Bug fix, Enhancement


Description

  • Add requireVision parameter to getModel() for vision-capable model selection

  • Fix AWS Bedrock configuration with valid Claude 3.5 Sonnet model ID

  • Detect images in researcher and resolution-search agents

  • Improve type safety for multimodal content handling


Diagram Walkthrough

flowchart LR
  A["Image Attachment"] --> B["Detect Image Content"]
  B --> C["Set requireVision Flag"]
  C --> D["getModel with Vision Support"]
  D --> E["Vision-Capable Model Selected"]
  E --> F["Proper Token Streaming"]

File Walkthrough

Relevant files
Bug fix
index.ts
Add vision parameter and fix Bedrock model config               

lib/utils/index.ts

  • Added requireVision parameter to getModel() function to conditionally
    select vision-capable models
  • Fixed AWS Bedrock configuration by setting default Claude 3.5 Sonnet
    model ID
  • Skip xAI grok model when vision is required since it doesn't support
    multimodal inputs
  • Updated comments to clarify vision support for each model provider
+7/-7     
Enhancement
actions.tsx
Improve type safety for multimodal content                             

app/actions.tsx

  • Improved type safety for multimodal content by properly typing content
    as CoreMessage['content']
  • Added explicit type casting for message objects to CoreMessage
  • Detect image presence in message parts to determine content structure
+4/-3     
researcher.tsx
Add image detection for vision model selection                     

lib/agents/researcher.tsx

  • Added image detection logic to check if any message contains image
    content
  • Pass hasImage flag to getModel() to request vision-capable models when
    needed
  • Enables proper handling of image attachments in research agent
+7/-1     
resolution-search.tsx
Add image detection for resolution search agent                   

lib/agents/resolution-search.tsx

  • Added image detection logic to identify image content in messages
  • Pass hasImage flag to getModel() for vision-capable model selection
  • Ensures resolution search agent uses appropriate models for image
    analysis
+7/-1     
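Both agents apply the same check before calling getModel(). A minimal, self-contained sketch of that check follows. TextPart, ImagePart and ChatMessage are simplified stand-ins assumed here for the AI SDK's CoreMessage content types (the real image part type is broader), and hasImageContent is a hypothetical helper name.

```typescript
// Simplified stand-ins for AI SDK message parts (assumption, not the real CoreMessage types).
type TextPart = { type: 'text'; text: string }
type ImagePart = { type: 'image'; image: string }

interface ChatMessage {
  role: 'system' | 'user' | 'assistant'
  // Plain string for text-only turns; an array of parts for multimodal turns.
  content: string | Array<TextPart | ImagePart>
}

// True when any message carries at least one image part in array-form content.
function hasImageContent(messages: ChatMessage[]): boolean {
  return messages.some(
    message =>
      Array.isArray(message.content) &&
      message.content.some(part => part.type === 'image')
  )
}

// Usage: check the messages actually sent to the model, e.g. after removing system turns.
const messages: ChatMessage[] = [
  { role: 'system', content: 'Analyse the attached imagery.' },
  {
    role: 'user',
    content: [
      { type: 'text', text: 'What is in this picture?' },
      { type: 'image', image: 'https://example.com/photo.png' },
    ],
  },
]
const filtered = messages.filter(m => m.role !== 'system')
console.log(hasImageContent(filtered)) // true
```

Text-only turns stored as a plain string can never match, which is why the check only inspects array-form content.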

Summary by CodeRabbit

  • New Features

    • Automatic image detection to pick vision-capable AI models when conversations include images.
  • Refactor

    • Improved type safety for message content and message handling.
    • Model selection logic enhanced to consider whether vision is required and to use appropriate provider fallbacks.


- Add requireVision parameter to getModel() to select vision-capable models
- Update researcher agent to detect images and use vision models
- Update resolution-search agent to use vision models for image analysis
- Fix AWS Bedrock configuration with valid Claude 3.5 Sonnet model ID
- Improve type safety for multimodal content in actions.tsx

Fixes the issue where image attachments were not returning tokens because
non-vision models were being used for multimodal content processing.
@vercel
Contributor

vercel Bot commented Dec 30, 2025

The latest updates on your projects.

Project: qcx · Deployment: Ready · Review: Preview, Comment · Updated (UTC): Dec 30, 2025 8:37am

@coderabbitai
Contributor

coderabbitai Bot commented Dec 30, 2025

Walkthrough

Introduces image-detection and vision-aware model selection across agents and utilities, adds explicit CoreMessage['content'] typing when submitting messages, and changes getModel() to accept a requireVision boolean that influences provider selection and xAI usage.

Changes

Cohort / File(s) | Summary
Model selection util (lib/utils/index.ts) | getModel() signature changed to getModel(requireVision: boolean = false); provider selection now conditions xAI usage on requireVision; bedrockModelId is read from BEDROCK_MODEL_ID with a default.
Agent image detection (lib/agents/researcher.tsx, lib/agents/resolution-search.tsx) | Scan message content parts for type 'image', compute hasImage, and pass it to getModel(hasImage); the selected model is used for streamText/generateObject.
Message typing / actions (app/actions.tsx) | Explicitly type content as CoreMessage['content'] in the hasImage branch and cast the pushed message to CoreMessage when adding it to messages.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Action as Action Handler
    participant Agent as Agent (Researcher / ResolutionSearch)
    participant ModelSelection as getModel(requireVision)
    participant ModelAPI as Model Provider

    Client->>Action: Submit message (may include images)
    Action->>Action: Build message object\ncontent typed as CoreMessage['content']
    Action->>Agent: Invoke agent with messages
    Agent->>Agent: Inspect message.content parts\nset hasImage flag
    Agent->>ModelSelection: Call getModel(hasImage)
    alt hasImage = true
        Note right of ModelSelection: Vision required\nxAI is skipped
        ModelSelection-->>Agent: Return vision-capable model
    else hasImage = false
        Note right of ModelSelection: Vision not required\nstandard provider selection
        ModelSelection-->>Agent: Return standard model
    end
    Agent->>ModelAPI: streamText / generateObject using selected model
    ModelAPI->>Agent: Response / stream
    Agent->>Client: Return streamed/generated output

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

Review effort 4/5

Poem

🐰 I hopped through code with eager paws,
Images found — I sounded the cause.
Models picked with sight in mind,
Typed messages neat and kind.
Hooray — a smarter, vision-friendly cause! 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Docstring Coverage: ⚠️ Warning. Docstring coverage is 20.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
Title check: ✅ Passed. The title accurately summarizes the main change: adding vision model selection logic for image attachments by introducing a requireVision parameter and image detection across agents.

@qodo-code-review
Contributor

qodo-code-review Bot commented Dec 30, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified: no security vulnerabilities were detected by AI analysis. Human verification is advised for critical code.
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis and compliance.

Status: Passed


Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code self-documenting.

Status: Passed


Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while providing sufficient detail for internal debugging.

Status: Passed


Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive information like PII, PHI, or cardholder data.

Status: Passed


🔴
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful degradation.

Status:
Missing config validation: The Bedrock initialization path does not validate awsRegion and can attempt to create a Bedrock client with an undefined region, leading to runtime failures without clear, contextual error handling.

Referred Code
const xaiApiKey = process.env.XAI_API_KEY
const awsAccessKeyId = process.env.AWS_ACCESS_KEY_ID
const awsSecretAccessKey = process.env.AWS_SECRET_ACCESS_KEY
const awsRegion = process.env.AWS_REGION
const bedrockModelId = process.env.BEDROCK_MODEL_ID || 'anthropic.claude-3-5-sonnet-20241022-v2:0'

// If vision is required, skip models that don't support it
if (!requireVision && xaiApiKey) {
  const xai = createXai({
    apiKey: xaiApiKey,
    baseURL: 'https://api.x.ai/v1',
  })
  // Optionally, add a check for credit status or skip xAI if credits are exhausted
  try {
    return xai('grok-4-fast-non-reasoning')
  } catch (error) {
    console.warn('xAI API unavailable, falling back to OpenAI:')
  }
}

// AWS Bedrock - Claude models support vision


 ... (clipped 5 lines)
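One way to address this finding is to resolve and validate the Bedrock settings before any client is created, returning null so getModel() can fall through to the next provider. This is an illustrative sketch under assumptions: resolveBedrockConfig, BedrockEnv and BedrockConfig are hypothetical names, and passing the result into createAmazonBedrock (its bedrockOptions.region and credentials, as in the excerpt above) is left out.

```typescript
// Hypothetical shape of the environment variables read by getModel().
interface BedrockEnv {
  AWS_ACCESS_KEY_ID?: string
  AWS_SECRET_ACCESS_KEY?: string
  AWS_REGION?: string
  BEDROCK_MODEL_ID?: string
}

// Fully validated settings, safe to hand to a Bedrock client factory.
interface BedrockConfig {
  region: string
  accessKeyId: string
  secretAccessKey: string
  modelId: string
}

const FALLBACK_BEDROCK_MODEL_ID = 'anthropic.claude-3-5-sonnet-20241022-v2:0'

// Returns a complete config, or null so the caller can try the next provider.
function resolveBedrockConfig(env: BedrockEnv): BedrockConfig | null {
  const { AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION } = env
  // Bedrock is simply not configured: skip silently.
  if (!AWS_ACCESS_KEY_ID || !AWS_SECRET_ACCESS_KEY) return null
  // Credentials without a region would fail later at request time: warn and skip now.
  if (!AWS_REGION) {
    console.warn('AWS_REGION is not set; skipping AWS Bedrock and falling back to the next provider')
    return null
  }
  return {
    region: AWS_REGION,
    accessKeyId: AWS_ACCESS_KEY_ID,
    secretAccessKey: AWS_SECRET_ACCESS_KEY,
    modelId: env.BEDROCK_MODEL_ID || FALLBACK_BEDROCK_MODEL_ID,
  }
}

// Usage: credentials without a region are rejected up front.
const config = resolveBedrockConfig({ AWS_ACCESS_KEY_ID: 'id', AWS_SECRET_ACCESS_KEY: 'secret' })
console.log(config) // warns about AWS_REGION, then prints null
```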


Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent vulnerabilities.

Status:
Weak env validation: The new Bedrock selection logic uses a default bedrockModelId and only checks for access keys, but does not validate required configuration like AWS_REGION, which can cause insecure/undefined behavior when selecting external providers.

Referred Code: identical to the lib/utils/index.ts excerpt quoted under the previous check.


Compliance status legend 🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Contributor

qodo-code-review Bot commented Dec 30, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Always request a vision-capable model

To ensure the resolutionSearch agent always uses a vision-capable model, modify
the getModel call to always pass true instead of the hasImage variable.

lib/agents/resolution-search.tsx [42-54]

 // Check if any message contains an image (resolution search is specifically for image analysis)
 const hasImage = messages.some(message => 
   Array.isArray(message.content) && 
   message.content.some(part => part.type === 'image')
 )
 
 // Use generateObject to get the full object at once.
 const { object } = await generateObject({
-  model: getModel(hasImage),
+  model: getModel(true),
   system: systemPrompt,
   messages: filteredMessages,
   schema: resolutionSearchSchema,
 })
Suggestion importance[1-10]: 7


Why: The suggestion correctly identifies that the resolutionSearch agent's purpose is image analysis, so it should always request a vision-capable model, making the agent more robust.

Medium
Ensure Bedrock model supports vision

Add a condition in getModel to verify that the bedrockModelId corresponds to a
vision-capable model (e.g., 'claude-3') when requireVision is true, falling back
to another provider if it does not.

lib/utils/index.ts [40-55]

-// AWS Bedrock - Claude models support vision
+// AWS Bedrock - Claude 3 models support vision
 if (awsAccessKeyId && awsSecretAccessKey && bedrockModelId) {
-  const bedrock = createAmazonBedrock({
-    bedrockOptions: {
-      region: awsRegion,
-      credentials: {
-        accessKeyId: awsAccessKeyId,
-        secretAccessKey: awsSecretAccessKey,
+  // Only use Bedrock for vision if a Claude 3 model is specified.
+  if (requireVision && !bedrockModelId.includes('claude-3')) {
+    // Do not use Bedrock for vision if the model is not a known vision model,
+    // fall through to the next provider (OpenAI).
+  } else {
+    const bedrock = createAmazonBedrock({
+      bedrockOptions: {
+        region: awsRegion,
+        credentials: {
+          accessKeyId: awsAccessKeyId,
+          secretAccessKey: awsSecretAccessKey,
+        },
       },
-    },
-  })
-  const model = bedrock(bedrockModelId, {
-    additionalModelRequestFields: { top_k: 350 },
-  })
-  return model
+    })
+    const model = bedrock(bedrockModelId, {
+      additionalModelRequestFields: { top_k: 350 },
+    })
+    return model
+  }
 }
Suggestion importance[1-10]: 6


Why: This suggestion improves the robustness of model selection by adding a check to ensure the configured Bedrock model supports vision when required, preventing potential runtime errors.

Low

Contributor

@coderabbitai Bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8977bad and f89ae98.

📒 Files selected for processing (4)
  • app/actions.tsx
  • lib/agents/researcher.tsx
  • lib/agents/resolution-search.tsx
  • lib/utils/index.ts
🧰 Additional context used
🧬 Code graph analysis (3)
lib/agents/resolution-search.tsx (2)
lib/db/schema.ts (1)
  • messages (26-37)
lib/utils/index.ts (1)
  • getModel (19-62)
app/actions.tsx (3)
components/chat-panel.tsx (1)
  • e (72-107)
lib/actions/chat.ts (1)
  • msg (119-127)
lib/actions/chat-db.ts (1)
  • msg (117-121)
lib/agents/researcher.tsx (2)
lib/db/schema.ts (1)
  • messages (26-37)
lib/utils/index.ts (1)
  • getModel (19-62)
🔇 Additional comments (5)
lib/agents/researcher.tsx (2)

99-103: Image detection logic looks correct.

The implementation properly checks for multimodal content by inspecting message.content arrays for image parts. This aligns with the PR objective to enable vision-capable model selection when images are present.


106-106: Type assertion is necessary due to missing return type annotation on getModel().

The getModel() function in lib/utils/index.ts lacks an explicit return type annotation and returns instances from different provider factories (createXai, createAmazonBedrock, createOpenAI). Without the as LanguageModel cast, TypeScript cannot guarantee that these provider instances are recognized as LanguageModel. To ensure type safety across all callers, the assertion should be retained. Alternatively, add an explicit return type annotation to getModel() to eliminate the need for repeated casts at every call site.

Likely an incorrect or invalid review comment.

app/actions.tsx (1)

253-282: Type safety improvements look good.

The explicit typing of content as CoreMessage['content'] (line 254) and the message cast (line 282) properly align the multimodal content handling with the expected CoreMessage structure. The conditional logic correctly handles both string content (text-only) and array content (with images).

lib/utils/index.ts (2)

19-38: requireVision parameter correctly gates model selection.

The implementation properly skips xAI's non-vision model when requireVision is true, ensuring that only vision-capable models (Bedrock Claude or OpenAI gpt-4o) are selected for image processing tasks. This directly addresses the root cause described in the PR objectives.


24-24: The default Bedrock model ID is valid and supported.

The model anthropic.claude-3-5-sonnet-20241022-v2:0 is a current Claude 3.5 Sonnet version supported by AWS Bedrock. No changes needed. Regional availability and IAM permissions depend on your AWS configuration—verify these at runtime using your AWS credentials if the model fails to load.

Comment on lines +42 to +50
+// Check if any message contains an image (resolution search is specifically for image analysis)
+const hasImage = messages.some(message =>
+  Array.isArray(message.content) &&
+  message.content.some(part => part.type === 'image')
+)
+
 // Use generateObject to get the full object at once.
 const { object } = await generateObject({
-  model: getModel(),
+  model: getModel(hasImage),
Contributor


🧹 Nitpick | 🔵 Trivial

Consider validating that images are actually present.

Since resolutionSearch is specifically designed for satellite image analysis (per the system prompt), consider adding a validation check to ensure images are present before proceeding. This would provide clearer error messages if called incorrectly.

🔎 Suggested validation
   const hasImage = messages.some(message => 
     Array.isArray(message.content) && 
     message.content.some(part => part.type === 'image')
   )
+
+  if (!hasImage) {
+    throw new Error('resolutionSearch requires at least one image in the messages')
+  }

   const { object } = await generateObject({
🤖 Prompt for AI Agents
In lib/agents/resolution-search.tsx around lines 42 to 50, the code only detects
image parts but does not halt when none exist; add an explicit validation after
computing hasImage that checks if hasImage is false and then return or throw a
clear, contextual error (e.g., "resolutionSearch requires at least one image in
messages") so the function short-circuits before calling
generateObject/getModel; ensure the validation uses the same
Array.isArray(message.content) && message.content.some(...) check and that the
error path provides a helpful message for callers and logs where appropriate.

Comment on lines +42 to +46
+// Check if any message contains an image (resolution search is specifically for image analysis)
+const hasImage = messages.some(message =>
+  Array.isArray(message.content) &&
+  message.content.some(part => part.type === 'image')
+)
Contributor


🧹 Nitpick | 🔵 Trivial

Consider extracting duplicate image detection logic.

This exact image detection pattern appears in both lib/agents/researcher.tsx (lines 99-103) and here. Consider extracting it to a shared utility function to maintain consistency and reduce duplication.

🔎 Example helper function

Add to lib/utils/index.ts:

+export function hasImageContent(messages: CoreMessage[]): boolean {
+  return messages.some(message => 
+    Array.isArray(message.content) && 
+    message.content.some(part => part.type === 'image')
+  )
+}

Then use it in both files:

-  const hasImage = messages.some(message => 
-    Array.isArray(message.content) && 
-    message.content.some(part => part.type === 'image')
-  )
+  const hasImage = hasImageContent(messages)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In lib/agents/resolution-search.tsx around lines 42 to 46 the image-detection
logic (checking message.content is an array and any part.type === 'image') is
duplicated elsewhere (lib/agents/researcher.tsx lines 99-103); extract this into
a shared utility (e.g., export function hasImage(messages: Message[]): boolean)
placed in lib/utils/index.ts, implement the same array-and-part-type checks once
in that function, export it, then replace the inline detection in both files
with an import of the new hasImage utility and call it (adjust imports/typing
accordingly).


@charliecreates Bot left a comment


The core direction is correct, but getModel(requireVision) does not actually guarantee a vision-capable model when Bedrock is configured with an arbitrary BEDROCK_MODEL_ID, which can reintroduce the original failure mode. app/actions.tsx still has a lossy conversion path (joining parts to string) that may break if non-text parts exist, and it uses a broad as CoreMessage cast that can mask structural issues. Image detection logic is duplicated across agents and is inconsistent in resolution-search (computed from a different message set than the one sent to the model). Overall reliability would improve with a centralized messagesHaveImage() helper and stricter runtime selection/validation for vision models.

Additional notes (4)
  • Maintainability | app/actions.tsx:279-282
    Casting the entire object to CoreMessage (} as CoreMessage) hides structural issues and makes it easy to accidentally omit required fields (or include incompatible ones) without noticing. Since you already have a CoreMessage['content'] variable, it’s better to construct an object that is explicitly a CoreMessage (or ensure the array messages is typed so the push is checked) rather than forcing a cast at the end.

  • Maintainability | lib/agents/researcher.tsx:99-103
    The hasImage detection is duplicated across agents and is subtly coupled to the exact content representation (Array.isArray(message.content) + part.type === 'image'). If the message format evolves (different part types, or nested content), you’ll need to update multiple locations and risk inconsistent behavior.

Given this PR’s theme (vision gating), centralizing image detection will reduce drift and make it easier to test.

  • Compatibility | lib/utils/index.ts:19-19
    getModel(requireVision) currently assumes Bedrock is vision-capable whenever credentials exist and a model id is set. However, BEDROCK_MODEL_ID is configurable and could be set to a non-vision model, which would reintroduce the original failure mode when requireVision=true.

Since this function is now responsible for guaranteeing vision support, it should enforce that guarantee (or have an explicit allowlist of known vision-capable IDs) when requireVision is true.

  • Readability | lib/utils/index.ts:40-40
    When requireVision is true, the function skips xAI (good), but the selection order is still Bedrock-first, then OpenAI. That’s fine, but there’s no explicit check that the Bedrock region is set (awsRegion). If AWS_REGION is missing, this will likely fail at runtime and prevent fallback.

Given the goal is reliability for image inputs, consider requiring region (or catching Bedrock creation errors and falling back).
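The compatibility note above suggests an explicit allowlist when requireVision is true. A minimal sketch under stated assumptions: canUseBedrock and VISION_BEDROCK_MODEL_IDS are hypothetical, and the three IDs are illustrative Claude 3 family entries, not an exhaustive list; confirm entries against the Bedrock model catalog (cross-region inference profile IDs, for example, carry a prefix and would need their own entries). Compared with a substring check such as bedrockModelId.includes('claude-3'), an exact allowlist fails closed: an unknown ID sends vision requests to the next provider instead of risking a text-only model.

```typescript
// Illustrative allowlist of Bedrock model IDs assumed to accept image input (not exhaustive).
const VISION_BEDROCK_MODEL_IDS: ReadonlySet<string> = new Set([
  'anthropic.claude-3-5-sonnet-20241022-v2:0',
  'anthropic.claude-3-5-sonnet-20240620-v1:0',
  'anthropic.claude-3-haiku-20240307-v1:0',
])

// Decide whether the configured Bedrock model may serve this request.
function canUseBedrock(modelId: string, requireVision: boolean): boolean {
  // Text-only requests can use whatever model BEDROCK_MODEL_ID names.
  if (!requireVision) return true
  // Vision requests need an exact allowlist match; unknown IDs fall through to the next provider.
  return VISION_BEDROCK_MODEL_IDS.has(modelId)
}

console.log(canUseBedrock('anthropic.claude-3-5-sonnet-20241022-v2:0', true)) // true
console.log(canUseBedrock('example.text-only-model-v1', true))                // false
```

In getModel(), the Bedrock branch would then be guarded by canUseBedrock(bedrockModelId, requireVision), at the cost of keeping the list current as new models ship.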

Summary of changes

What changed

  • Added vision-aware model selection by extending getModel() to accept a requireVision: boolean parameter and using it to avoid xAI non-vision models when images are present.
  • Fixed AWS Bedrock default model id by reading process.env.BEDROCK_MODEL_ID with a fallback to anthropic.claude-3-5-sonnet-20241022-v2:0.
  • Added image detection in both agents:
    • lib/agents/researcher.tsx detects multimodal image parts and calls getModel(hasImage).
    • lib/agents/resolution-search.tsx does the same for image analysis and calls getModel(hasImage).
  • Improved message content typing in app/actions.tsx by using CoreMessage['content'] instead of any, and casting the pushed message to CoreMessage.

Files touched

  • app/actions.tsx
  • lib/agents/researcher.tsx
  • lib/agents/resolution-search.tsx
  • lib/utils/index.ts

Comment thread app/actions.tsx
Comment on lines 252 to 256
 const hasImage = messageParts.some(part => part.type === 'image')
-const content = hasImage
-  ? (messageParts as any)
+// Properly type the content based on whether it contains images
+const content: CoreMessage['content'] = hasImage
+  ? messageParts as CoreMessage['content']
   : messageParts.map(part => part.text).join('\n')


content is typed as CoreMessage['content'], but the code only checks hasImage to decide whether to send an array vs. a joined string. This can silently produce an invalid payload when non-image non-text parts exist (e.g., future file, audio, tool parts) or if a text part is missing .text. In that case, the non-image branch will do part.text and potentially drop/flatten data.

Given the stated goal (multimodal correctness), it’s safer to build content by validating/normalizing parts and only joining when you’re sure all parts are text.

Suggestion

Tighten the branching to ensure the string path only runs when every part is text, otherwise keep the array form:

const isAllText = messageParts.every(p => p.type === 'text')
const content: CoreMessage['content'] = isAllText
  ? messageParts.map(p => p.text).join('\n')
  : (messageParts as CoreMessage['content'])

This avoids lossy conversion if additional part types are introduced later.

Comment on lines 40 to +50
 const filteredMessages = messages.filter(msg => msg.role !== 'system');
 
+// Check if any message contains an image (resolution search is specifically for image analysis)
+const hasImage = messages.some(message =>
+  Array.isArray(message.content) &&
+  message.content.some(part => part.type === 'image')
+)
+
 // Use generateObject to get the full object at once.
 const { object } = await generateObject({
-  model: getModel(),
+  model: getModel(hasImage),


filteredMessages is computed, but hasImage is checked against messages (unfiltered). If a system message ever contains image parts (or the caller passes different arrays), you could incorrectly select a vision model when it’s not needed, or vice versa.

For correctness and consistency, compute hasImage from the actual messages array you pass to the model (filteredMessages).

Suggestion

Base the check on filteredMessages:

const hasImage = filteredMessages.some(message =>
  Array.isArray(message.content) &&
  message.content.some(part => part.type === 'image')
)


@charliecreates Bot removed the request for review from CharlieHelps December 30, 2025 08:34
Contributor

@coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
lib/utils/index.ts (2)

37-37: Fix misleading fallback message.

The warning message claims a fallback to OpenAI, but the actual fallback order is Gemini → Bedrock → OpenAI.

🔎 Suggested fix
-      console.warn('xAI API unavailable, falling back to OpenAI:')
+      console.warn('xAI API unavailable, falling back to next provider')

55-69: Consider adding error handling for consistency.

Unlike xAI and Gemini providers, the Bedrock block lacks a try-catch. If Bedrock initialization fails, the function won't fall back to OpenAI. Adding error handling would improve reliability and consistency, especially for vision-required scenarios.

🔎 Suggested improvement
   if (awsAccessKeyId && awsSecretAccessKey) {
-    const bedrock = createAmazonBedrock({
-      bedrockOptions: {
-        region: awsRegion,
-        credentials: {
-          accessKeyId: awsAccessKeyId,
-          secretAccessKey: awsSecretAccessKey,
-        },
-      },
-    })
-    const model = bedrock(bedrockModelId, {
-      additionalModelRequestFields: { top_k: 350 },
-    })
-    return model
+    try {
+      const bedrock = createAmazonBedrock({
+        bedrockOptions: {
+          region: awsRegion,
+          credentials: {
+            accessKeyId: awsAccessKeyId,
+            secretAccessKey: awsSecretAccessKey,
+          },
+        },
+      })
+      const model = bedrock(bedrockModelId, {
+        additionalModelRequestFields: { top_k: 350 },
+      })
+      return model
+    } catch (error) {
+      console.warn('AWS Bedrock unavailable, falling back to next provider:', error)
+    }
   }
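The pattern behind the last two comments can be sketched as an ordered provider list. Each candidate is wrapped in try/catch, and vision gating happens in one place. This is a sketch under stated assumptions and does not reproduce the project's `getModel()`. `ProviderCandidate`, `selectModel`, and the stub providers are hypothetical. Also, the real function treats OpenAI as the unconditional last resort, whereas this sketch throws when nothing can be created.

```typescript
// Hypothetical provider descriptor: `create` may throw if construction fails.
interface ProviderCandidate<M> {
  name: string
  supportsVision: boolean
  isConfigured: () => boolean
  create: () => M
}

// Walks candidates in order. It skips unconfigured providers, skips non-vision
// providers when vision is required, and falls through to the next one if `create` throws.
function selectModel<M>(candidates: ProviderCandidate<M>[], requireVision = false): M {
  for (const candidate of candidates) {
    if (!candidate.isConfigured()) continue
    if (requireVision && !candidate.supportsVision) continue
    try {
      return candidate.create()
    } catch (error) {
      console.warn(`${candidate.name} unavailable, falling back to next provider:`, error)
    }
  }
  throw new Error('No configured provider could be initialized')
}

// Stub providers that return model names instead of real SDK model objects.
const candidates: ProviderCandidate<string>[] = [
  { name: 'xAI', supportsVision: false, isConfigured: () => true, create: () => 'grok' },
  {
    name: 'Bedrock',
    supportsVision: true,
    isConfigured: () => true,
    create: () => {
      throw new Error('bad credentials')
    }
  },
  { name: 'OpenAI', supportsVision: true, isConfigured: () => true, create: () => 'gpt-4o' }
]

const textModel = selectModel(candidates) // 'grok'
const visionModel = selectModel(candidates, true) // xAI skipped, Bedrock throws, so 'gpt-4o'
```

One caveat applies here: provider model objects are likely built lazily, so a try/catch around construction only catches synchronous setup errors. Credential or network failures that surface during the request itself would still need handling at the call site.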
📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f89ae98 and 8ec42ae.

📒 Files selected for processing (1)
  • lib/utils/index.ts
🔇 Additional comments (4)
lib/utils/index.ts (4)

19-19: LGTM! Clean function signature with clear intent.

The requireVision parameter with a sensible default provides an explicit, backward-compatible API for vision-aware model selection.


71-76: LGTM! Clear documentation of vision support.

The updated comment accurately reflects that gpt-4o supports vision, making it a suitable fallback for multimodal inputs.


41-51: Gemini 3 Pro vision support is confirmed.

Gemini 3 Pro is natively multimodal and supports image inputs (up to 900 images per prompt, with support for PNG, JPEG, WebP, HEIC, and HEIF formats). The code correctly does not skip Gemini when requireVision=true.


25-25: No issues found. The model ID 'anthropic.claude-3-5-sonnet-20241022-v2:0' is valid and documented in AWS Bedrock, and Claude 3.5 Sonnet supports vision/multimodal capabilities as required by the requireVision parameter in the getModel() function.
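The original Bedrock failure came from an empty model ID, so the choice of default operator matters. `??` only replaces `undefined`/`null`, which means an empty string set in the environment would still slip through. Below is a minimal sketch that treats empty or whitespace-only values as missing. The function name is hypothetical, and the default value is the one cited above.

```typescript
// Default model ID as cited in the review (Claude 3.5 Sonnet v2 on Bedrock).
const DEFAULT_BEDROCK_MODEL_ID = 'anthropic.claude-3-5-sonnet-20241022-v2:0'

// Hypothetical helper: unset, empty, or whitespace-only values fall back to the default,
// so an empty model ID never reaches the Bedrock client.
function resolveBedrockModelId(configured: string | undefined): string {
  const trimmed = configured?.trim()
  return trimmed ? trimmed : DEFAULT_BEDROCK_MODEL_ID
}
```

In the project, the argument would come from whatever environment variable holds the Bedrock model ID. This sketch does not assume its name.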

Comment thread lib/utils/index.ts
Comment on lines +27 to +28
// If vision is required, skip models that don't support it
if (!requireVision && xaiApiKey) {
Contributor


🧹 Nitpick | 🔵 Trivial

Good vision-gating logic.

Correctly skips xAI when vision is required, as xAI models don't support multimodal inputs.

Optional: Consider clarifying the comment
-  // If vision is required, skip models that don't support it
+  // If vision is required, skip xAI (no vision support) and try vision-capable providers
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// If vision is required, skip models that don't support it
if (!requireVision && xaiApiKey) {
// If vision is required, skip xAI (no vision support) and try vision-capable providers
if (!requireVision && xaiApiKey) {
🤖 Prompt for AI Agents
In lib/utils/index.ts around lines 27 to 28, the existing comment "If vision is
required, skip models that don't support it" is misleading given the
conditional; update the comment to accurately describe the logic (e.g., explain
that when vision is not required and an xAI API key is present, xAI-only models
are considered, otherwise xAI is skipped) so future readers understand the
condition and intent.
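Pulling the condition `if (!requireVision && xaiApiKey)` into a named predicate makes its intent explicit, which is the point of the comment nit. Here is a minimal sketch with a hypothetical helper name.

```typescript
// xAI is attempted only when vision is NOT required and an API key is present.
// When vision is required, xAI is skipped and the vision-capable providers are tried instead.
function shouldTryXai(requireVision: boolean, xaiApiKey: string | undefined): boolean {
  return !requireVision && Boolean(xaiApiKey)
}
```

An empty-string key is treated as absent, matching the truthiness check in the original condition.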

@ngoiyaeric ngoiyaeric merged commit 13697f3 into main Dec 30, 2025
5 checks passed
ngoiyaeric added a commit that referenced this pull request Dec 30, 2025
This commit documents that the branch has been synchronized with the latest
changes from main branch. All recent updates have been merged including:

- Gemini 3 Pro model support (PR #389)
- Image attachment token fixes (PR #388)
- Comprehensive E2E test suite (PR #350)
- Playwright GitHub Actions CI/CD
- All dependency updates and bug fixes

The Supabase backend implementation and collaboration features from this
PR have been preserved and are compatible with the latest main branch changes.
