Fix: decouple chat from local Ollama, route inference through Rust #369

Merged
senamakel merged 8 commits into tinyhumansai:main from senamakel:fix/ollama-2
Apr 6, 2026

Conversation

@senamakel
Member

@senamakel senamakel commented Apr 6, 2026

Summary

Fixes #366 — Chat was blocked while Ollama downloaded; local model was used as the primary chat backend.

  • Chat no longer blocked by Ollama status — removed the gate that required either local model readiness OR socket connection. Chat always works via cloud backend when socket is connected.
  • All inference routed through Rust RPC — chatSend() and chatCancel() now call openhuman.channel_web_chat / openhuman.channel_web_cancel via core RPC instead of emitting socket events directly. Zero inference logic in TypeScript.
  • Rust presentation layer for message segmentation + emoji reactions — new presentation.rs module runs on local Ollama (zero cloud cost). Smart segmentation: only splits prose into human-feeling bubbles; never splits code blocks, structured content (lists/tables), or short messages. Emits chat_segment events with natural delays. The splitting rules are sketched after this list.
  • Removed Ollama as primary chat fallback — local model is now supplementary only (reactions, message splitting, autocomplete). No fallback to Ollama on rate limits or cloud errors.
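
As an illustration of the splitting rules above, here is a minimal Rust sketch of the guard logic; should_segment and the length threshold are assumed names/values for illustration, not the actual presentation.rs API:

/// Sketch: decide whether a response may be split into multiple bubbles.
fn should_segment(response: &str) -> bool {
    const MIN_SEGMENTABLE_LEN: usize = 240; // assumed "short message" cutoff

    // Never split code blocks.
    if response.contains("```") {
        return false;
    }
    // Never split structured content (lists, tables).
    let structured = response.lines().any(|line| {
        let t = line.trim_start();
        t.starts_with("- ")
            || t.starts_with("* ")
            || t.starts_with('|')
            || (t.chars().next().map_or(false, |c| c.is_ascii_digit()) && t.contains(". "))
    });
    if structured {
        return false;
    }
    // Never split short messages; only long prose qualifies.
    response.len() >= MIN_SEGMENTABLE_LEN
}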

Test plan

  • Verify chat works when Ollama is not installed / still downloading
  • Verify chat messages route through cloud backend (check Rust logs for channel_web_chat)
  • Verify chat_segment events deliver multi-bubble responses when local model is ready
  • Verify emoji reactions appear on user messages when local model is ready
  • Verify code block responses are NOT split into multiple bubbles
  • Verify structured content (lists, tables) is NOT split
  • Run cargo test --lib 'presentation::tests' — 7 unit tests pass (an illustrative test sketch follows this list)
  • Run yarn typecheck and yarn lint — clean
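
For reference, a hypothetical unit test in the spirit of the presentation::tests suite (the real tests and their assertions may differ):

#[cfg(test)]
mod tests {
    use super::*;

    // Hypothetical: a response containing a code fence must stay one segment.
    #[test]
    fn code_blocks_are_not_split() {
        let response = "Here is a fix:\n```rust\nfn main() {}\n```\nGive that a try.";
        let segments = segment_for_delivery(response);
        assert_eq!(segments.len(), 1);
    }
}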

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Chat responses now stream in multiple segments as they arrive
    • Added automatic reaction emoji support from the service
    • Updated typing indicators to display only during active sending
  • Bug Fixes

    • Fixed UTF-8 text truncation in debug output
  • Chores

    • Simplified chat infrastructure by consolidating to cloud-based architecture

senamakel and others added 5 commits April 6, 2026 10:27
…se delivery logic

- Eliminated the `deliverLocalResponse` function, which handled segmenting and dispatching local model responses, simplifying the message delivery process.
- Updated socket status checks to remove unnecessary local model condition, ensuring clarity in connection handling.
- Adjusted comments to reflect the new routing logic for cloud backend interactions, emphasizing that local models are now used only for supplementary features.
- Replaced Socket.IO message sending with core RPC calls for sending and canceling chat messages, enhancing the communication mechanism.
- Improved error handling for socket connection checks, ensuring robust event routing.
- Updated documentation to reflect the new RPC-based approach for chat message processing.
…tions

- Introduced a new `ChatSegmentEvent` interface to support segmented message delivery, allowing for more natural chat interactions.
- Updated the `Conversations` component to handle segmented responses and apply emoji reactions based on user messages.
- Refactored the message delivery logic to improve clarity and maintainability, ensuring that responses are dispatched correctly based on segmentation.
- Enhanced the chat service to emit segment events, facilitating a smoother user experience during conversations.
… and optimizing rendering logic

- Removed unnecessary imports related to local model status and message segmentation, streamlining the codebase.
- Updated the rendering logic to enhance clarity by simplifying conditions for displaying sending and delivering states in the chat interface.
- Improved overall readability and maintainability of the Conversations component.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented Apr 6, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 13354d95-c4f3-4a45-85d5-7e10db0cf4c0

📥 Commits

Reviewing files that changed from the base of the PR and between 0f9d8bb and ec718e8.

📒 Files selected for processing (1)
  • src/openhuman/subconscious/integration_test.rs

📝 Walkthrough

Walkthrough

This PR removes the client-side local model chat path and migrates message sending to core RPC calls (with responses delivered over the socket), introducing server-side response segmentation and emoji reactions delivered via new chat_segment events.

Changes

Chat Frontend & Service (app/src/pages/Conversations.tsx, app/src/services/chatService.ts):
Removed local model state and delivery; replaced Socket.IO emissions with callCoreRpc for openhuman.channel_web_chat and openhuman.channel_web_cancel; added ChatSegmentEvent support with an onSegment handler for dispatching incremental responses and reaction emojis; simplified onDone to finalize the sending state only.

Event Model & Serialization (src/core/socketio.rs):
Added optional fields to WebChannelEvent: reaction_emoji, segment_index, segment_total (serialized in snake_case, skipped when unset).
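
A plausible shape for these fields, assuming serde derives; the actual struct in socketio.rs may differ:

use serde::Serialize;

#[derive(Serialize)]
#[serde(rename_all = "snake_case")]
pub struct WebChannelEvent {
    // ...existing fields elided...
    #[serde(skip_serializing_if = "Option::is_none")]
    pub reaction_emoji: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub segment_index: Option<usize>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub segment_total: Option<usize>,
}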

Response Delivery (src/openhuman/channels/providers/presentation.rs):
New module implementing deliver_response() with paragraph/sentence-based segmentation logic (respecting code fences, structured content, and character limits) and optional emoji reactions; includes unit tests for segmentation rules and delay enforcement.

Provider Integration (src/openhuman/channels/providers/mod.rs, src/openhuman/channels/providers/web.rs):
Added the presentation module declaration; integrated presentation::deliver_response() for emitting segmented chat_segment events and a final chat_done with reaction/segment metadata; extended all event publications with the new optional fields.
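
To make the event flow concrete, an illustrative sketch of the emission loop; emit_event, its signature, and the pacing value are assumptions rather than web.rs's real API:

use std::time::Duration;

/// Stand-in for the provider's real event emitter (hypothetical signature).
fn emit_event(name: &str, body: &str, index: Option<usize>, total: Option<usize>, reaction: Option<&str>) {
    println!("{name} [{index:?}/{total:?}] {reaction:?}: {body}");
}

/// Sketch: emit each bubble with a natural pause, reaction on the first only,
/// then a final chat_done carrying the segment total and full response.
async fn emit_segments(segments: &[String], full_response: &str, reaction_emoji: Option<&str>) {
    let total = segments.len();
    for (i, segment) in segments.iter().enumerate() {
        let reaction = if i == 0 { reaction_emoji } else { None };
        emit_event("chat_segment", segment, Some(i), Some(total), reaction);
        tokio::time::sleep(Duration::from_millis(400)).await; // assumed pacing
    }
    emit_event("chat_done", full_response, None, Some(total), None);
}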

Test Utilities (src/openhuman/subconscious/integration_test.rs):
Added a truncate_at_char_boundary() helper to prevent UTF-8 boundary panics when truncating debug output.
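
A minimal sketch of such a helper, assuming it mirrors the described behavior:

/// Truncate without panicking inside a multi-byte UTF-8 character.
fn truncate_at_char_boundary(s: &str, max_len: usize) -> &str {
    if s.len() <= max_len {
        return s;
    }
    // Walk back until the index lands on a char boundary so slicing is safe.
    let mut idx = max_len;
    while !s.is_char_boundary(idx) {
        idx -= 1;
    }
    &s[..idx]
}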

Documentation (docs/TODO.md):
Added a memory-skill integration TODO.

Sequence Diagram

sequenceDiagram
    participant Client as Client (Frontend)
    participant Socket as Socket.IO
    participant CoreRPC as Core RPC Layer
    participant Web as Web Provider
    participant Present as Presentation Module
    
    Client->>Socket: connect (with socket.id)
    Socket-->>Client: connected
    
    Client->>CoreRPC: callCoreRpc('openhuman.channel_web_chat', {client_id, thread_id, message})
    CoreRPC->>Web: route to web.rs chat handler
    Web->>Web: run_chat_task (get full_response)
    Web->>Present: deliver_response(client_id, thread_id, full_response, user_message)
    
    Present->>Present: segment_for_delivery (check code/length/structure)
    
    alt Multiple Segments
        Present->>Web: emit chat_segment events (with segment_index, reaction_emoji on first)
        Web->>Socket: emit 'chat:segment' for each segment
        Socket-->>Client: onSegment fires for each segment
        Client->>Client: dispatch addInferenceResponse
        Present->>Web: emit chat_done (segment_total, full_response)
        Web->>Socket: emit 'chat:done'
        Socket-->>Client: onDone finalizes state
    else Single Segment (≤1)
        Present->>Web: emit chat_done (no segment events)
        Web->>Socket: emit 'chat:done'
        Socket-->>Client: onDone adds response bubble
    end

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Hopping past the local model's slow embrace,
Cloud sockets now win the message race,
Segments flutter in from Rust's swift hand,
Reactions bloom where delivery was planned,
Chat flows free, no more standing still! 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): The title clearly summarizes the main change: decoupling chat from local Ollama and routing inference through Rust. It directly reflects the core transformation in the changeset.
  • Linked Issues Check (✅ Passed): The PR meets all coding requirements from issue #366: chat is decoupled from local Ollama (Conversations.tsx, chatService.ts), cloud routing via Rust RPC is implemented (chatService.ts, web.rs), and the local presentation layer runs independently (presentation.rs). Message segmentation, emoji reactions, and smart parsing (code blocks, structured content) are implemented with unit tests.
  • Out of Scope Changes Check (✅ Passed): All changes are in scope for the objectives: chat decoupling (Conversations.tsx, chatService.ts), Rust RPC routing (chatService.ts, web.rs, socketio.rs), local presentation layer (presentation.rs), and code cleanup. One minor documentation update (TODO.md) tangentially relates to memory skill integration.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.


@senamakel senamakel marked this pull request as ready for review April 6, 2026 18:23
Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (5)
docs/TODO.md (1)

68-68: Consider making the TODO more specific and actionable.

The TODO "integrate memory to all the various skills, like autocomplete and others" is somewhat vague. For better tracking and actionability, consider specifying:

  • What "integrate memory" means concretely (e.g., "use memory embeddings for context-aware autocomplete suggestions")
  • Which specific skills should be prioritized
  • Who owns this work (like other TODOs that include names)

Example:

-  [] integrate memory to all the various skills, like autocomplete and others.
+  [] integrate memory embeddings into autocomplete suggestions for context-aware completions (owner: TBD)
+  [] extend memory integration to skill execution context (segmentation, reactions)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/TODO.md` at line 68, Replace the vague TODO "integrate memory to all the
various skills, like autocomplete and others." with a specific, actionable task:
define exactly what "integrate memory" means (e.g., "use vector embeddings +
short-term session store for context-aware autocomplete and conversation
history"), list prioritized skills to update (e.g., "autocomplete, code
completion, search suggestions"), assign an owner and target milestone, and
include clear acceptance criteria (examples, performance constraints, and tests)
so the task is ready to pick up; use the existing TODO text as the identifier
when updating the entry.
src/openhuman/channels/providers/presentation.rs (2)

199-200: Minor: Numbered list detection may have false positives.

The condition trimmed.chars().next().map_or(false, |c| c.is_ascii_digit()) && trimmed.contains(". ") would match lines like "2023. A great year" or "100. Some text" that aren't list items. Consider tightening to match typical list patterns like 1. , 2. , etc.

🔧 Stricter numbered list detection
-                || trimmed.chars().next().map_or(false, |c| c.is_ascii_digit())
-                    && trimmed.contains(". ")
+                || is_numbered_list_item(trimmed)

Add helper:

fn is_numbered_list_item(line: &str) -> bool {
    let mut chars = line.chars().peekable();
    // Consume leading digits
    let mut has_digit = false;
    while chars.peek().map_or(false, |c| c.is_ascii_digit()) {
        chars.next();
        has_digit = true;
    }
    // Must have at least one digit followed by ". "
    has_digit && chars.next() == Some('.') && chars.next() == Some(' ')
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/channels/providers/presentation.rs` around lines 199 - 200, The
current numbered-list test using trimmed.chars().next().map_or(false, |c|
c.is_ascii_digit()) && trimmed.contains(". ") is too permissive and can match
dates or decimals; introduce a helper like is_numbered_list_item(line: &str)
that consumes one or more leading ASCII digits and then requires an exact ". "
sequence, and replace the existing condition (where trimmed is checked) with a
call to is_numbered_list_item(trimmed) so only classic list prefixes such as "1.
" or "12. " are accepted.

33-34: Misleading comment: reaction is awaited sequentially, not in parallel.

The comment says "Fire-and-forget reaction decision (runs in parallel)" but try_reaction is awaited directly before segmentation begins, making it sequential. This blocks the first segment delivery until the local model reaction decision completes (or times out).

If parallel execution is desired, use tokio::spawn or futures::join!. If sequential is intentional (to attach reaction to the first segment), update the comment.

🔧 Option A: Fix comment to reflect actual behavior
-    // Fire-and-forget reaction decision (runs in parallel).
+    // Get reaction decision from local model (blocks until complete or timeout).
     let reaction_emoji = try_reaction(user_message).await;
🔧 Option B: Actually run the reaction decision in parallel with segmentation
 pub async fn deliver_response(
     client_id: &str,
     thread_id: &str,
     request_id: &str,
     full_response: &str,
     user_message: &str,
 ) {
-    // Fire-and-forget reaction decision (runs in parallel).
-    let reaction_emoji = try_reaction(user_message).await;
-
-    // Attempt segmentation.
-    let segments = segment_for_delivery(full_response);
+    // Spawn the reaction decision so it overlaps with segmentation.
+    let owned_message = user_message.to_string();
+    let reaction_task = tokio::spawn(async move { try_reaction(&owned_message).await });
+
+    // Segment while the reaction runs, then join before the first emit.
+    let segments = segment_for_delivery(full_response);
+    let reaction_emoji = reaction_task.await.ok().flatten();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/channels/providers/presentation.rs` around lines 33 - 34, The
comment is wrong: try_reaction(user_message) is awaited sequentially which
blocks segmentation; either update the comment to reflect sequential behavior or
run the reaction decision in parallel by spawning it and awaiting when
needed—e.g., call tokio::spawn(async move { try_reaction(user_message).await })
(store the JoinHandle) before starting segmentation, continue producing
segments, then await the handle when you need the reaction (e.g., to attach to
the first_segment); reference try_reaction, user_message, segmentation, and
first_segment when making the change.
app/src/services/chatService.ts (1)

41-49: Field naming may cause confusion: full_response contains segment text.

The ChatSegmentEvent.full_response field actually contains the segment text, not the full response. The inline comment clarifies this, but the field name inherited from the Rust WebChannelEvent struct could mislead developers. Consider whether a type alias or wrapper interface with clearer naming would help.

💡 Alternative: Add a clearer accessor
/** A single segment of a multi-bubble response, emitted before `chat_done`. */
export interface ChatSegmentEvent {
  thread_id: string;
  /** The segment text (named full_response for wire compatibility with WebChannelEvent). */
  full_response: string;
  request_id: string;
  segment_index: number;
  segment_total: number;
  reaction_emoji?: string | null;
}

/** Helper to extract segment content with clearer semantics. */
export const getSegmentText = (event: ChatSegmentEvent): string => event.full_response;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/services/chatService.ts` around lines 41 - 49, The field name
ChatSegmentEvent.full_response is misleading because it holds a segment's text,
so add a clear accessor (e.g., getSegmentText(event: ChatSegmentEvent): string)
that returns event.full_response and use it throughout the codebase, or
introduce a thin wrapper/alias interface (e.g., ChatSegmentView with
segment_text) that maps to the original ChatSegmentEvent for wire compatibility;
ensure the original full_response is kept for compatibility but all internal
consumers call the new accessor/wrapper to avoid confusion.
app/src/pages/Conversations.tsx (1)

502-505: Verify socket connection check aligns with RPC-based sending.

The guard checks socketStatus !== 'connected' before sending, but chatSend now uses core RPC (HTTP) rather than socket emit. The socket is still needed for receiving events and for the client_id, but consider whether the error message accurately reflects the failure mode.

💡 Consider clarifying the error context
   if (socketStatus !== 'connected') {
-    setSendError(chatSendError('socket_disconnected', 'Realtime socket is not connected.'));
+    setSendError(chatSendError('socket_disconnected', 'Realtime connection unavailable — needed for receiving responses.'));
     return;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/pages/Conversations.tsx` around lines 502 - 505, The current guard
treats any non-'connected' socketStatus as a send failure, but chatSend now uses
core RPC (HTTP) and only requires the socket for receive/client_id-related
realtime features; update the check around socketStatus in Conversations (the
block that calls setSendError and chatSendError) to validate the actual
requirement: if client_id is missing or realtime features are required and
socketStatus !== 'connected' then set a clearer error via
setSendError(chatSendError(...)) that mentions missing client_id or realtime
socket being disconnected; otherwise allow RPC send to proceed and only log/warn
that realtime delivery/acknowledgement may be unavailable. Ensure references
include socketStatus, client_id, chatSend, setSendError, and chatSendError so
the fix is applied in the right code path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 27b328f1-5574-426c-9ace-3391c39cd45b

📥 Commits

Reviewing files that changed from the base of the PR and between 9b73d00 and 422a424.

📒 Files selected for processing (10)
  • app/src/components/settings/panels/ScreenIntelligencePanel.tsx
  • app/src/pages/Conversations.tsx
  • app/src/services/chatService.ts
  • docs/TODO.md
  • src/core/screen_intelligence_cli.rs
  • src/core/socketio.rs
  • src/openhuman/channels/providers/mod.rs
  • src/openhuman/channels/providers/presentation.rs
  • src/openhuman/channels/providers/web.rs
  • src/openhuman/screen_intelligence/engine.rs


Development

Successfully merging this pull request may close these issues.

[Bug] Chat blocked until local/Ollama model finishes; users stuck while large pull runs
