
Feat/343 screen intelligence e2e tests#359

Merged
graycyrus merged 3 commits into tinyhumansai:main from
YellowSnnowmann:feat/343-screen-intelligence-e2e-tests
Apr 6, 2026

Conversation

Contributor

@YellowSnnowmann YellowSnnowmann commented Apr 6, 2026

Summary

  • Deliver a verifiable end-to-end screen intelligence pipeline: active window capture → compress → local LLM vision → structured summary (parse_vision_summary_output) → persist_vision_summary into unified memory.
  • Add automated proof tests (unit/integration with mocks or fixtures, plus scripted E2E where supported) so regressions in capture, vision, or persistence fail loudly instead of silently.
  • Improve observability along the path with structured [screen_intelligence] logs and visible failure modes in tests.
  • V1 scope: macOS-only for this subsystem; document how tests behave on Linux CI vs local macOS (permissions, Screen Recording, OPENHUMAN_WORKSPACE isolation).

Problem

  • Screen intelligence code exists under src/openhuman/screen_intelligence/ (capture workers, vision worker, persist_vision_summary), but operators cannot rely on the full chain: permissions, scheduling, model wiring, and memory writes may fail silently or go untested.
  • Without scripted proof, regressions in screenshot capture, LLM calls, or persistence are likely.
  • Local vision must not silently fall back to remote unless explicitly configured and documented.

Solution

  • Pipeline: Foreground window bounds + capture → compress → local vision model inference (same configuration story as existing local AI / vision paths) → parse structured summary → persist_vision_summary (unified memory namespace).
  • Local LLM: Vision step uses the local model path; remote fallback only if explicitly configured and documented.
  • Tests: Layered proof — unit/integration (mock vision backend or fixture images, assert memory writes); scripted E2E (Rust JSON-RPC or harness) with OPENHUMAN_WORKSPACE isolation; optional manual macOS checklist for real Screen Recording permission where automation cannot grant it in CI.
  • Privacy / safety: Denylist / policy behavior unchanged or improved; no secrets in logs; screenshot retention follows keep_screenshots and related config.
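The metric bookkeeping implied by this flow can be sketched in Rust. This is a hypothetical simplification: `SessionMetrics` and `record_persist_outcome` are illustrative stand-ins, not the engine's actual `SessionRuntime`/`SessionStatus` types, and the key format is invented.

```rust
// Hypothetical sketch of persist-outcome tracking for the vision pipeline.
// The real types live in src/openhuman/screen_intelligence/ and differ in detail.
#[derive(Debug, Default)]
struct SessionMetrics {
    vision_persist_count: u64,
    last_vision_persisted_key: Option<String>,
    last_vision_persist_error: Option<String>,
}

// Record one persistence attempt: the count advances on success and failure
// alike so operators can see attempts, while key/error capture the outcome.
fn record_persist_outcome(metrics: &mut SessionMetrics, outcome: Result<String, String>) {
    metrics.vision_persist_count += 1;
    match outcome {
        Ok(key) => {
            metrics.last_vision_persisted_key = Some(key);
            metrics.last_vision_persist_error = None;
        }
        Err(err) => metrics.last_vision_persist_error = Some(err),
    }
}

fn main() {
    let mut metrics = SessionMetrics::default();
    record_persist_outcome(&mut metrics, Ok("vision/frame-0001".into()));
    record_persist_outcome(&mut metrics, Err("memory write failed".into()));
    assert_eq!(metrics.vision_persist_count, 2);
    println!("{metrics:?}");
}
```

Counting failed attempts as well as successes is what makes regressions fail loudly: a rising count with a populated error field is visible in status output even when no summary lands in memory.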

Key files (reference):

  • Engine and workers: src/openhuman/screen_intelligence/engine.rs
  • Memory persistence: src/openhuman/screen_intelligence/helpers.rs (persist_vision_summary)
  • Tests: extend src/openhuman/screen_intelligence/tests.rs with E2E-style coverage
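This PR changes persist_vision_summary to return Result<PersistVisionSummaryResult, String> so persistence errors propagate instead of vanishing. A minimal synchronous sketch of that shape — the real helper in helpers.rs is async, writes into unified memory, and the validation and key format here are invented for illustration:

```rust
// Hypothetical, synchronous stand-in for persist_vision_summary.
// Real code: async, writes to unified memory, different key scheme.
#[derive(Debug, PartialEq)]
struct PersistVisionSummaryResult {
    memory_key: String,
}

fn persist_vision_summary(app_name: &str, summary: &str) -> Result<PersistVisionSummaryResult, String> {
    if summary.trim().is_empty() {
        // Propagate a loud error instead of silently dropping the write, so
        // callers can surface it (e.g. in last_vision_persist_error).
        return Err("refusing to persist empty vision summary".to_string());
    }
    Ok(PersistVisionSummaryResult {
        memory_key: format!("screen_intelligence/{app_name}"),
    })
}

fn main() {
    let ok = persist_vision_summary("Safari", "Reading release notes").unwrap();
    assert_eq!(ok.memory_key, "screen_intelligence/Safari");
    assert!(persist_vision_summary("Safari", "  ").is_err());
}
```

Returning a result type rather than `()` is what lets the engine's two-stage flow distinguish an analysis failure from a persistence failure in tests.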

Submission Checklist

  • Unit tests — Vitest (app/) and/or cargo test (core) for logic you add or change
  • E2E / integration — Where behavior is user-visible or crosses UI → Tauri → sidecar → JSON-RPC; use existing harnesses (app/test/e2e, mock backend, tests/json_rpc_e2e.rs as appropriate)
  • N/A — If truly not applicable, say why (e.g. change is documentation-only)
  • Doc comments — /// and //! (Rust), JSDoc or brief file/module headers (TS) on public APIs and non-obvious modules
  • Inline comments — Where logic, invariants, or edge cases aren’t clear from names alone (keep them grep-friendly; avoid restating the code)

Feature-specific (issue #343):

  • E2E pipeline verified on macOS with screen intelligence enabled and permissions granted: capture cycle produces a vision summary stored in unified memory (queryable via defined RPC/memory read path).
  • Proof tests documented: how to run locally vs CI; CI limitations on non-macOS hosts called out.
  • No silent remote vision fallback unless explicitly configured and documented.
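The "no silent fallback" requirement amounts to a config guard ahead of the vision call. A minimal sketch, assuming the local_ai.enabled / provider = "ollama" requirement described in this PR; the struct and field names are simplified stand-ins for the real config types:

```rust
// Simplified stand-in for the engine's config check: vision runs only when
// local AI is explicitly enabled with the "ollama" provider; anything else
// fails loudly rather than silently reaching for a remote model.
struct LocalAiConfig {
    enabled: bool,
    provider: String,
}

fn validate_vision_config(cfg: &LocalAiConfig) -> Result<(), String> {
    if !cfg.enabled {
        return Err("local_ai.enabled must be true for screen intelligence vision".to_string());
    }
    if cfg.provider != "ollama" {
        return Err(format!(
            "vision provider '{}' is not supported; remote fallback requires explicit opt-in",
            cfg.provider
        ));
    }
    Ok(())
}

fn main() {
    let good = LocalAiConfig { enabled: true, provider: "ollama".to_string() };
    assert!(validate_vision_config(&good).is_ok());

    let remote = LocalAiConfig { enabled: true, provider: "openai".to_string() };
    assert!(validate_vision_config(&remote).is_err());
}
```

Failing the guard with an Err (rather than falling through to a remote provider) is also what the provider-guard test can assert on directly.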

Impact

  • Platform: Screen intelligence V1 is macOS-only; Linux CI runs mocked or limited tests per documented matrix.
  • Security / privacy: Screenshot handling and retention remain governed by existing config (keep_screenshots, etc.); logs must not contain secrets.

Related


Branch: feat/343-screen-intelligence-e2e-tests

Summary by CodeRabbit

  • New Features

    • Added vision persistence metrics to session status, including persist count, last persisted key, and persistence error tracking.
    • Enhanced screen recording permission validation with explicit error messaging during session startup.
  • Bug Fixes

    • Improved error handling and control flow in vision analysis and persistence operations with stricter validation.
  • Tests

    • Added comprehensive end-to-end tests for vision analysis, persistence, and screen capture workflows.

YellowSnnowmann and others added 2 commits April 6, 2026 15:53
…sai#343

Add layered test coverage proving the full capture → vision → memory
pipeline: screenshot save/cleanup disk paths, VisionSummary serde
roundtrip, JSON-RPC shape tests for status and vision_recent endpoints.

- tests/screen_intelligence_vision_e2e.rs: save_screenshot_to_disk
  creates a PNG and keep_screenshots=false cleanup removes it;
  VisionSummary struct serializes/persists/is queryable end-to-end;
  platform support table + macOS checklist added to module doc
- tests/json_rpc_e2e.rs: screen_intelligence_status shape test
  (platform_supported, session.active, permissions.screen_recording);
  vision_recent returns empty summaries without an active session
- src/openhuman/screen_intelligence/tests.rs: save_screenshot_to_disk
  unit tests for the write path and the no-image-ref error path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…or handling

- Added new fields to track vision persistence count, last persisted key, and last persist error in SessionRuntime and SessionStatus.
- Implemented error handling for vision summary persistence, ensuring errors are logged and state is updated accordingly.
  • Introduced a new method, analyze_and_persist_frame, to analyze a frame and persist the summary, improving the vision processing pipeline.
- Updated tests to validate the new functionality and ensure proper behavior with mocked vision outputs.

This commit improves the robustness of the screen intelligence pipeline by enhancing the tracking and handling of vision summary persistence.
Contributor

coderabbitai Bot commented Apr 6, 2026

📝 Walkthrough

Walkthrough

This PR extends the screen intelligence engine with persistent vision analysis capabilities. It introduces synchronous vision frame analysis and memory persistence, adds state tracking for vision persistence metrics, refines session start permission checks, and expands test coverage with unit and E2E tests for the new functionality.

Changes

Cohort / File(s) Summary
Core Vision Persistence Pipeline
src/openhuman/screen_intelligence/engine.rs
Added analyze_and_persist_frame() public method combining analysis and persistence. SessionRuntime extended with vision_persist_count, last_vision_persisted_key, and last_vision_persist_error fields. vision_flush() refactored to two-stage synchronous flow with explicit error handling at analysis and persistence stages. Vision worker updated to await persistence synchronously and track session metrics on completion. analyze_frame_with_vision() now validates config (requires local_ai.enabled=true and provider="ollama") and supports mocked vision via environment variable.
Persistence Helpers & Types
src/openhuman/screen_intelligence/helpers.rs, src/openhuman/screen_intelligence/types.rs
persist_vision_summary() signature changed from async fn() -> () to async fn() -> Result<PersistVisionSummaryResult, String> with propagated error handling. Added public constants (VISION_MEMORY_NAMESPACE, VISION_MEMORY_SOURCE_TYPE, VISION_MEMORY_CATEGORY, VISION_MEMORY_TAG) and result type PersistVisionSummaryResult. Extended SessionStatus public API with three new vision persistence fields: vision_persist_count, last_vision_persisted_key, last_vision_persist_error.
Test Infrastructure & Unit Tests
src/openhuman/screen_intelligence/tests.rs
Added test synchronization mechanism (SCREEN_INTELLIGENCE_ENV_LOCK) for environment mutation safety. Introduced helpers: EnvVarGuard for env var restoration, write_screen_intelligence_test_config() for per-test config, make_test_png_uri() for in-memory PNG generation. Added unit tests for save_screenshot_to_disk() and async pipeline tests for analyze_and_persist_frame() covering mocked vision, local AI provider validation, and error cases.
E2E Test Coverage
tests/json_rpc_e2e.rs, tests/screen_intelligence_vision_e2e.rs
Added two JSON-RPC E2E tests (openhuman.screen_intelligence_status and openhuman.screen_intelligence_vision_recent) validating response schema including new vision_persist_count and last_vision_persist_error fields. Expanded screen_intelligence_vision_e2e.rs with disk I/O tests, serde roundtrip validation, engine pipeline test using mocked vision, provider-guard test, and macOS-only real capture test. Enhanced test helpers with env lock poisoning recovery and config helper functions.
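The test-infrastructure pieces above — a shared env lock with poisoning recovery plus an RAII guard that restores environment variables — follow a standard Rust pattern. This sketch uses invented names (lock_env, EnvVarGuard::set, OPENHUMAN_TEST_VAR) and is not the exact code in tests.rs:

```rust
use std::env;
use std::sync::{Mutex, MutexGuard};

// Shared lock serializing tests that mutate process-wide environment
// variables; a poisoned lock (a previous test panicked while holding it)
// is recovered instead of cascading failures into unrelated tests.
static ENV_LOCK: Mutex<()> = Mutex::new(());

fn lock_env() -> MutexGuard<'static, ()> {
    ENV_LOCK.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
}

// RAII guard: sets a variable for the test's duration and restores the
// previous value (or removes the variable) on drop, even on panic.
struct EnvVarGuard {
    key: String,
    previous: Option<String>,
}

impl EnvVarGuard {
    fn set(key: &str, value: &str) -> Self {
        let previous = env::var(key).ok();
        env::set_var(key, value);
        EnvVarGuard { key: key.to_string(), previous }
    }
}

impl Drop for EnvVarGuard {
    fn drop(&mut self) {
        match self.previous.take() {
            Some(v) => env::set_var(&self.key, v),
            None => env::remove_var(&self.key),
        }
    }
}

fn main() {
    let _env = lock_env();
    {
        let _guard = EnvVarGuard::set("OPENHUMAN_TEST_VAR", "mocked");
        assert_eq!(env::var("OPENHUMAN_TEST_VAR").unwrap(), "mocked");
    }
    // Guard dropped: the variable is restored (here, removed again).
    assert!(env::var("OPENHUMAN_TEST_VAR").is_err());
}
```

Pairing the lock with the guard matters because env vars are process-global: without serialization, the mocked-vision tests could observe each other's state.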

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client/Test
    participant Engine as AccessibilityEngine
    participant Vision as Vision Analyzer
    participant Memory as UnifiedMemory
    participant Disk as Disk I/O

    Client->>Engine: analyze_and_persist_frame(frame)
    activate Engine
    
    Engine->>Engine: validate config<br/>(local_ai enabled,<br/>provider = ollama)
    alt Config Invalid
        Engine-->>Client: Err(message)
    else Config Valid
        rect rgba(100, 150, 200, 0.5)
        Note over Engine,Vision: Analysis Phase
        Engine->>Vision: analyze_frame_with_vision(frame)
        activate Vision
        Vision-->>Engine: VisionSummary
        deactivate Vision
        end
        
        alt Analysis Error
            Engine->>Engine: update state.last_error
            Engine-->>Client: Err(analysis_error)
        else Analysis Success
            rect rgba(150, 100, 200, 0.5)
            Note over Engine,Memory: Persistence Phase
            Engine->>Memory: persist_vision_summary(summary)
            activate Memory
            Memory->>Disk: write document
            activate Disk
            Disk-->>Memory: ok/err
            deactivate Disk
            Memory-->>Engine: Result<key, error>
            deactivate Memory
            end
            
            alt Persist Error
                Engine->>Engine: update state.last_error
                Engine->>Engine: vision_persist_count++
                Engine-->>Client: Err(persist_error)
            else Persist Success
                Engine->>Engine: vision_persist_count++
                Engine->>Engine: last_vision_persisted_key = key
                Engine-->>Client: Ok(VisionSummary)
            end
        end
    end
    deactivate Engine

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

  • [Feature] Screen intelligence E2E: capture, local LLM vision, memory persistence, and proof tests #343: Directly implements the same screen-intelligence pipeline code including engine analysis/persist flow and the new analyze_and_persist_frame method with accompanying test coverage.
  • tinyhumansai/openhuman#227: Implements fixes and expanded test coverage for vision worker behavior and E2E tests as described in the feature issue.
  • tinyhumansai/openhuman#247: Implements screenshot capture, vision-summary persistence, and comprehensive unit/E2E test workflows for the screen intelligence pipeline.

Possibly related PRs

Poem

🐰 The vision persists, frame by frame,
Through memory's halls it finds its aim.
Sync'd with care, no race in sight,
Config guarded, tests shining bright,
Persistence blooms—the state's delight!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed. Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed. The title 'Feat/343 screen intelligence e2e tests' accurately describes the primary focus of the PR: adding end-to-end tests for screen intelligence functionality. It directly reflects the core contribution of the changeset.
  • Docstring Coverage — ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


Brings in Discord server/channel picker (tinyhumansai#349) and autocomplete
observability improvements (tinyhumansai#308) from main. Resolves conflict in
tests/json_rpc_e2e.rs by keeping both the screen intelligence tests
(this branch) and the autocomplete runtime settings test (main).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@YellowSnnowmann YellowSnnowmann marked this pull request as ready for review April 6, 2026 15:28
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/openhuman/screen_intelligence/tests.rs (1)

643-646: Remove redundant imports - already imported at file scope.

These imports duplicate lines 6-9 at the top of the file.

♻️ Proposed cleanup
 #[test]
 fn save_screenshot_to_disk_writes_png_to_workspace() {
-    use base64::{engine::general_purpose::STANDARD as B64, Engine};
-    use image::codecs::png::PngEncoder;
-    use image::{ImageBuffer, Rgb, RgbImage};
-    use tempfile::tempdir;
-
     let tmp = tempdir().expect("tempdir");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/screen_intelligence/tests.rs` around lines 643 - 646, Duplicate
imports of B64/Engine, PngEncoder, ImageBuffer/Rgb/RgbImage, and tempdir are
present inside the test scope and already imported at file scope; remove the
redundant inner use statements (those containing
base64::{engine::general_purpose::STANDARD as B64, Engine},
image::codecs::png::PngEncoder, image::{ImageBuffer, Rgb, RgbImage}, and
tempfile::tempdir) so the test reuses the top-level imports instead.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 80a9968a-d314-4505-af57-447d30ccc263

📥 Commits

Reviewing files that changed from the base of the PR and between 0c14aea and 240aba0.

📒 Files selected for processing (6)
  • src/openhuman/screen_intelligence/engine.rs
  • src/openhuman/screen_intelligence/helpers.rs
  • src/openhuman/screen_intelligence/tests.rs
  • src/openhuman/screen_intelligence/types.rs
  • tests/json_rpc_e2e.rs
  • tests/screen_intelligence_vision_e2e.rs

Comment thread tests/json_rpc_e2e.rs
Comment on lines +916 to +919
let summaries = recent_result
.get("summaries")
.and_then(Value::as_array)
.expect("expected summaries array: {recent_result}");
Contributor


⚠️ Potential issue | 🟡 Minor

Fix expect panic message - variable not interpolated in string literal.

The expect call uses a string literal that won't interpolate {recent_result}. Use a formatted panic message instead.

🐛 Proposed fix
     let summaries = recent_result
         .get("summaries")
         .and_then(Value::as_array)
-        .expect("expected summaries array: {recent_result}");
+        .unwrap_or_else(|| panic!("expected summaries array: {recent_result}"));
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    let summaries = recent_result
-        .get("summaries")
-        .and_then(Value::as_array)
-        .expect("expected summaries array: {recent_result}");
+    let summaries = recent_result
+        .get("summaries")
+        .and_then(Value::as_array)
+        .unwrap_or_else(|| panic!("expected summaries array: {}", recent_result));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/json_rpc_e2e.rs` around lines 916 - 919, The expect message on the
chain calling
recent_result.get("summaries").and_then(Value::as_array).expect(...) is a plain
string containing `{recent_result}` that won't interpolate; change it to use a
formatted panic message (e.g., use format! or panic! to include recent_result)
so the actual recent_result value is shown on failure—update the expect argument
for the summaries extraction to something like expect(&format!("expected
summaries array: {:?}", recent_result)).


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Screen intelligence E2E: capture, local LLM vision, memory persistence, and proof tests

3 participants