Fix: decouple chat from local Ollama, route inference through Rust #369

Merged
senamakel merged 8 commits into tinyhumansai:main from senamakel:fix/ollama-2
Apr 6, 2026

Conversation

@senamakel
Member

@senamakel senamakel commented Apr 6, 2026

Summary

Fixes #366 — Chat was blocked while Ollama downloaded; local model was used as the primary chat backend.

  • Chat no longer blocked by Ollama status — removed the gate that required either local model readiness OR socket connection. Chat always works via cloud backend when socket is connected.
  • All inference routed through Rust RPC — chatSend() and chatCancel() now call openhuman.channel_web_chat / openhuman.channel_web_cancel via core RPC instead of emitting socket events directly. Zero inference logic in TypeScript.
  • Rust presentation layer for message segmentation + emoji reactions — new presentation.rs module runs on local Ollama (zero cloud cost). Smart segmentation: only splits prose into human-feeling bubbles; never splits code blocks, structured content (lists/tables), or short messages. Emits chat_segment events with natural delays. The splitting rules are sketched after this list.
  • Removed Ollama as primary chat fallback — local model is now supplementary only (reactions, message splitting, autocomplete). No fallback to Ollama on rate limits or cloud errors.
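
As an illustration of the splitting rules above, here is a minimal Rust sketch of the guard logic; should_segment and the length threshold are assumed names/values for illustration, not the actual presentation.rs API:

/// Sketch: decide whether a response may be split into multiple bubbles.
fn should_segment(response: &str) -> bool {
    const MIN_SEGMENTABLE_LEN: usize = 240; // assumed "short message" cutoff

    // Never split code blocks.
    if response.contains("```") {
        return false;
    }
    // Never split structured content (lists, tables).
    let structured = response.lines().any(|line| {
        let t = line.trim_start();
        t.starts_with("- ")
            || t.starts_with("* ")
            || t.starts_with('|')
            || (t.chars().next().map_or(false, |c| c.is_ascii_digit()) && t.contains(". "))
    });
    if structured {
        return false;
    }
    // Never split short messages; only long prose qualifies.
    response.len() >= MIN_SEGMENTABLE_LEN
}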

Test plan

  • Verify chat works when Ollama is not installed / still downloading
  • Verify chat messages route through cloud backend (check Rust logs for channel_web_chat)
  • Verify chat_segment events deliver multi-bubble responses when local model is ready
  • Verify emoji reactions appear on user messages when local model is ready
  • Verify code block responses are NOT split into multiple bubbles
  • Verify structured content (lists, tables) is NOT split
  • Run cargo test --lib 'presentation::tests' — 7 unit tests pass (an illustrative test sketch follows this list)
  • Run yarn typecheck and yarn lint — clean
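
For reference, a hypothetical unit test in the spirit of the presentation::tests suite (the real tests and their assertions may differ):

#[cfg(test)]
mod tests {
    use super::*;

    // Hypothetical: a response containing a code fence must stay one segment.
    #[test]
    fn code_blocks_are_not_split() {
        let response = "Here is a fix:\n```rust\nfn main() {}\n```\nGive that a try.";
        let segments = segment_for_delivery(response);
        assert_eq!(segments.len(), 1);
    }
}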

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Chat responses now stream in multiple segments as they arrive
    • Added automatic reaction emoji support from the service
    • Updated typing indicators to display only during active sending
  • Bug Fixes

    • Fixed UTF-8 text truncation in debug output
  • Chores

    • Simplified chat infrastructure by consolidating to cloud-based architecture

senamakel and others added 5 commits April 6, 2026 10:27
…se delivery logic

- Eliminated the `deliverLocalResponse` function, which handled segmenting and dispatching local model responses, simplifying the message delivery process.
- Updated socket status checks to remove unnecessary local model condition, ensuring clarity in connection handling.
- Adjusted comments to reflect the new routing logic for cloud backend interactions, emphasizing that local models are now used only for supplementary features.
- Replaced Socket.IO message sending with core RPC calls for sending and canceling chat messages, enhancing the communication mechanism.
- Improved error handling for socket connection checks, ensuring robust event routing.
- Updated documentation to reflect the new RPC-based approach for chat message processing.
…tions

- Introduced a new `ChatSegmentEvent` interface to support segmented message delivery, allowing for more natural chat interactions.
- Updated the `Conversations` component to handle segmented responses and apply emoji reactions based on user messages.
- Refactored the message delivery logic to improve clarity and maintainability, ensuring that responses are dispatched correctly based on segmentation.
- Enhanced the chat service to emit segment events, facilitating a smoother user experience during conversations.
… and optimizing rendering logic

- Removed unnecessary imports related to local model status and message segmentation, streamlining the codebase.
- Updated the rendering logic to enhance clarity by simplifying conditions for displaying sending and delivering states in the chat interface.
- Improved overall readability and maintainability of the Conversations component.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented Apr 6, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 13354d95-c4f3-4a45-85d5-7e10db0cf4c0

📥 Commits

Reviewing files that changed from the base of the PR and between 0f9d8bb and ec718e8.

📒 Files selected for processing (1)
  • src/openhuman/subconscious/integration_test.rs

📝 Walkthrough

Walkthrough

This PR removes the client-side local model chat path and migrates message sending to core RPC calls (with responses delivered over the socket), introducing server-side response segmentation and emoji reactions delivered via new chat_segment events.

Changes

Chat Frontend & Service (app/src/pages/Conversations.tsx, app/src/services/chatService.ts):
Removed local model state and delivery; replaced Socket.IO emissions with callCoreRpc for openhuman.channel_web_chat and openhuman.channel_web_cancel; added ChatSegmentEvent support with an onSegment handler for dispatching incremental responses and reaction emojis; simplified onDone to finalize the sending state only.

Event Model & Serialization (src/core/socketio.rs):
Added optional fields to WebChannelEvent: reaction_emoji, segment_index, segment_total (serialized in snake_case, skipped when unset).
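
A plausible shape for these fields, assuming serde derives; the actual struct in socketio.rs may differ:

use serde::Serialize;

#[derive(Serialize)]
#[serde(rename_all = "snake_case")]
pub struct WebChannelEvent {
    // ...existing fields elided...
    #[serde(skip_serializing_if = "Option::is_none")]
    pub reaction_emoji: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub segment_index: Option<usize>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub segment_total: Option<usize>,
}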

Response Delivery (src/openhuman/channels/providers/presentation.rs):
New module implementing deliver_response() with paragraph/sentence-based segmentation logic (respecting code fences, structured content, and character limits) and optional emoji reactions; includes unit tests for segmentation rules and delay enforcement.

Provider Integration (src/openhuman/channels/providers/mod.rs, src/openhuman/channels/providers/web.rs):
Added the presentation module declaration; integrated presentation::deliver_response() for emitting segmented chat_segment events and a final chat_done with reaction/segment metadata; extended all event publications with the new optional fields.
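
To make the event flow concrete, an illustrative sketch of the emission loop; emit_event, its signature, and the pacing value are assumptions rather than web.rs's real API:

use std::time::Duration;

/// Stand-in for the provider's real event emitter (hypothetical signature).
fn emit_event(name: &str, body: &str, index: Option<usize>, total: Option<usize>, reaction: Option<&str>) {
    println!("{name} [{index:?}/{total:?}] {reaction:?}: {body}");
}

/// Sketch: emit each bubble with a natural pause, reaction on the first only,
/// then a final chat_done carrying the segment total and full response.
async fn emit_segments(segments: &[String], full_response: &str, reaction_emoji: Option<&str>) {
    let total = segments.len();
    for (i, segment) in segments.iter().enumerate() {
        let reaction = if i == 0 { reaction_emoji } else { None };
        emit_event("chat_segment", segment, Some(i), Some(total), reaction);
        tokio::time::sleep(Duration::from_millis(400)).await; // assumed pacing
    }
    emit_event("chat_done", full_response, None, Some(total), None);
}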

Test Utilities (src/openhuman/subconscious/integration_test.rs):
Added a truncate_at_char_boundary() helper to prevent UTF-8 boundary panics when truncating debug output.
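
A minimal sketch of such a helper, assuming it mirrors the described behavior:

/// Truncate without panicking inside a multi-byte UTF-8 character.
fn truncate_at_char_boundary(s: &str, max_len: usize) -> &str {
    if s.len() <= max_len {
        return s;
    }
    // Walk back until the index lands on a char boundary so slicing is safe.
    let mut idx = max_len;
    while !s.is_char_boundary(idx) {
        idx -= 1;
    }
    &s[..idx]
}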

Documentation (docs/TODO.md):
Added a memory-skill integration TODO.

Sequence Diagram

sequenceDiagram
    participant Client as Client (Frontend)
    participant Socket as Socket.IO
    participant CoreRPC as Core RPC Layer
    participant Web as Web Provider
    participant Present as Presentation Module
    
    Client->>Socket: connect (with socket.id)
    Socket-->>Client: connected
    
    Client->>CoreRPC: callCoreRpc('openhuman.channel_web_chat', {client_id, thread_id, message})
    CoreRPC->>Web: route to web.rs chat handler
    Web->>Web: run_chat_task (get full_response)
    Web->>Present: deliver_response(client_id, thread_id, full_response, user_message)
    
    Present->>Present: segment_for_delivery (check code/length/structure)
    
    alt Multiple Segments
        Present->>Web: emit chat_segment events (with segment_index, reaction_emoji on first)
        Web->>Socket: emit 'chat:segment' for each segment
        Socket-->>Client: onSegment fires for each segment
        Client->>Client: dispatch addInferenceResponse
        Present->>Web: emit chat_done (segment_total, full_response)
        Web->>Socket: emit 'chat:done'
        Socket-->>Client: onDone finalizes state
    else Single Segment (≤1)
        Present->>Web: emit chat_done (no segment events)
        Web->>Socket: emit 'chat:done'
        Socket-->>Client: onDone adds response bubble
    end

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Hopping past the local model's slow embrace,
Cloud sockets now win the message race,
Segments flutter in from Rust's swift hand,
Reactions bloom where delivery was planned,
Chat flows free, no more standing still! 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): The title clearly summarizes the main change: decoupling chat from local Ollama and routing inference through Rust. It directly reflects the core transformation in the changeset.
  • Linked Issues Check (✅ Passed): The PR meets all coding requirements from issue #366: chat is decoupled from local Ollama (Conversations.tsx, chatService.ts), cloud routing via Rust RPC is implemented (chatService.ts, web.rs), and the local presentation layer runs independently (presentation.rs). Message segmentation, emoji reactions, and smart parsing (code blocks, structured content) are implemented with unit tests.
  • Out of Scope Changes Check (✅ Passed): All changes are in scope for the objectives: chat decoupling (Conversations.tsx, chatService.ts), Rust RPC routing (chatService.ts, web.rs, socketio.rs), local presentation layer (presentation.rs), and code cleanup. One minor documentation update (TODO.md) tangentially relates to memory skill integration.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.


@senamakel senamakel marked this pull request as ready for review April 6, 2026 18:23
Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (5)
docs/TODO.md (1)

68-68: Consider making the TODO more specific and actionable.

The TODO "integrate memory to all the various skills, like autocomplete and others" is somewhat vague. For better tracking and actionability, consider specifying:

  • What "integrate memory" means concretely (e.g., "use memory embeddings for context-aware autocomplete suggestions")
  • Which specific skills should be prioritized
  • Who owns this work (like other TODOs that include names)

Example:

-  [] integrate memory to all the various skills, like autocomplete and others.
+  [] integrate memory embeddings into autocomplete suggestions for context-aware completions (owner: TBD)
+  [] extend memory integration to skill execution context (segmentation, reactions)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/TODO.md` at line 68, Replace the vague TODO "integrate memory to all the
various skills, like autocomplete and others." with a specific, actionable task:
define exactly what "integrate memory" means (e.g., "use vector embeddings +
short-term session store for context-aware autocomplete and conversation
history"), list prioritized skills to update (e.g., "autocomplete, code
completion, search suggestions"), assign an owner and target milestone, and
include clear acceptance criteria (examples, performance constraints, and tests)
so the task is ready to pick up; use the existing TODO text as the identifier
when updating the entry.
src/openhuman/channels/providers/presentation.rs (2)

199-200: Minor: Numbered list detection may have false positives.

The condition trimmed.chars().next().map_or(false, |c| c.is_ascii_digit()) && trimmed.contains(". ") would match lines like "2023. A great year" or "100. Some text" that aren't list items. Consider tightening to match typical list patterns like 1. , 2. , etc.

🔧 Stricter numbered list detection
-                || trimmed.chars().next().map_or(false, |c| c.is_ascii_digit())
-                    && trimmed.contains(". ")
+                || is_numbered_list_item(trimmed)

Add helper:

fn is_numbered_list_item(line: &str) -> bool {
    let mut chars = line.chars().peekable();
    // Consume leading digits
    let mut has_digit = false;
    while chars.peek().map_or(false, |c| c.is_ascii_digit()) {
        chars.next();
        has_digit = true;
    }
    // Must have at least one digit followed by ". "
    has_digit && chars.next() == Some('.') && chars.next() == Some(' ')
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/channels/providers/presentation.rs` around lines 199 - 200, The
current numbered-list test using trimmed.chars().next().map_or(false, |c|
c.is_ascii_digit()) && trimmed.contains(". ") is too permissive and can match
dates or decimals; introduce a helper like is_numbered_list_item(line: &str)
that consumes one or more leading ASCII digits and then requires an exact ". "
sequence, and replace the existing condition (where trimmed is checked) with a
call to is_numbered_list_item(trimmed) so only classic list prefixes such as "1.
" or "12. " are accepted.

33-34: Misleading comment: reaction is awaited sequentially, not in parallel.

The comment says "Fire-and-forget reaction decision (runs in parallel)" but try_reaction is awaited directly before segmentation begins, making it sequential. This blocks the first segment delivery until the local model reaction decision completes (or times out).

If parallel execution is desired, use tokio::spawn or futures::join!. If sequential is intentional (to attach reaction to the first segment), update the comment.

🔧 Option A: Fix comment to reflect actual behavior
-    // Fire-and-forget reaction decision (runs in parallel).
+    // Get reaction decision from local model (blocks until complete or timeout).
     let reaction_emoji = try_reaction(user_message).await;
🔧 Option B: Actually run the reaction decision in parallel with segmentation
 pub async fn deliver_response(
     client_id: &str,
     thread_id: &str,
     request_id: &str,
     full_response: &str,
     user_message: &str,
 ) {
-    // Fire-and-forget reaction decision (runs in parallel).
-    let reaction_emoji = try_reaction(user_message).await;
-
-    // Attempt segmentation.
-    let segments = segment_for_delivery(full_response);
+    // Spawn the reaction decision so it overlaps with segmentation.
+    let owned_message = user_message.to_string();
+    let reaction_task = tokio::spawn(async move { try_reaction(&owned_message).await });
+
+    // Segment while the reaction runs, then join before the first emit.
+    let segments = segment_for_delivery(full_response);
+    let reaction_emoji = reaction_task.await.ok().flatten();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/channels/providers/presentation.rs` around lines 33 - 34, The
comment is wrong: try_reaction(user_message) is awaited sequentially which
blocks segmentation; either update the comment to reflect sequential behavior or
run the reaction decision in parallel by spawning it and awaiting when
needed—e.g., call tokio::spawn(async move { try_reaction(user_message).await })
(store the JoinHandle) before starting segmentation, continue producing
segments, then await the handle when you need the reaction (e.g., to attach to
the first_segment); reference try_reaction, user_message, segmentation, and
first_segment when making the change.
app/src/services/chatService.ts (1)

41-49: Field naming may cause confusion: full_response contains segment text.

The ChatSegmentEvent.full_response field actually contains the segment text, not the full response. The inline comment clarifies this, but the field name inherited from the Rust WebChannelEvent struct could mislead developers. Consider whether a type alias or wrapper interface with clearer naming would help.

💡 Alternative: Add a clearer accessor
/** A single segment of a multi-bubble response, emitted before `chat_done`. */
export interface ChatSegmentEvent {
  thread_id: string;
  /** The segment text (named full_response for wire compatibility with WebChannelEvent). */
  full_response: string;
  request_id: string;
  segment_index: number;
  segment_total: number;
  reaction_emoji?: string | null;
}

/** Helper to extract segment content with clearer semantics. */
export const getSegmentText = (event: ChatSegmentEvent): string => event.full_response;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/services/chatService.ts` around lines 41 - 49, The field name
ChatSegmentEvent.full_response is misleading because it holds a segment's text,
so add a clear accessor (e.g., getSegmentText(event: ChatSegmentEvent): string)
that returns event.full_response and use it throughout the codebase, or
introduce a thin wrapper/alias interface (e.g., ChatSegmentView with
segment_text) that maps to the original ChatSegmentEvent for wire compatibility;
ensure the original full_response is kept for compatibility but all internal
consumers call the new accessor/wrapper to avoid confusion.
app/src/pages/Conversations.tsx (1)

502-505: Verify socket connection check aligns with RPC-based sending.

The guard checks socketStatus !== 'connected' before sending, but chatSend now uses core RPC (HTTP) rather than socket emit. The socket is still needed for receiving events and for the client_id, but consider whether the error message accurately reflects the failure mode.

💡 Consider clarifying the error context
   if (socketStatus !== 'connected') {
-    setSendError(chatSendError('socket_disconnected', 'Realtime socket is not connected.'));
+    setSendError(chatSendError('socket_disconnected', 'Realtime connection unavailable — needed for receiving responses.'));
     return;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/pages/Conversations.tsx` around lines 502 - 505, The current guard
treats any non-'connected' socketStatus as a send failure, but chatSend now uses
core RPC (HTTP) and only requires the socket for receive/client_id-related
realtime features; update the check around socketStatus in Conversations (the
block that calls setSendError and chatSendError) to validate the actual
requirement: if client_id is missing or realtime features are required and
socketStatus !== 'connected' then set a clearer error via
setSendError(chatSendError(...)) that mentions missing client_id or realtime
socket being disconnected; otherwise allow RPC send to proceed and only log/warn
that realtime delivery/acknowledgement may be unavailable. Ensure references
include socketStatus, client_id, chatSend, setSendError, and chatSendError so
the fix is applied in the right code path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 27b328f1-5574-426c-9ace-3391c39cd45b

📥 Commits

Reviewing files that changed from the base of the PR and between 9b73d00 and 422a424.

📒 Files selected for processing (10)
  • app/src/components/settings/panels/ScreenIntelligencePanel.tsx
  • app/src/pages/Conversations.tsx
  • app/src/services/chatService.ts
  • docs/TODO.md
  • src/core/screen_intelligence_cli.rs
  • src/core/socketio.rs
  • src/openhuman/channels/providers/mod.rs
  • src/openhuman/channels/providers/presentation.rs
  • src/openhuman/channels/providers/web.rs
  • src/openhuman/screen_intelligence/engine.rs


Development

Successfully merging this pull request may close these issues.

[Bug] Chat blocked until local/Ollama model finishes; users stuck while large pull runs
