
Feat/343 screen intelligence e2e tests#359

Merged
graycyrus merged 3 commits into tinyhumansai:main from
YellowSnnowmann:feat/343-screen-intelligence-e2e-tests
Apr 6, 2026

Conversation

Contributor

@YellowSnnowmann YellowSnnowmann commented Apr 6, 2026

Summary

  • Deliver a verifiable end-to-end screen intelligence pipeline: active window capture → compress → local LLM vision → structured summary (parse_vision_summary_output) → persist_vision_summary into unified memory.
  • Add automated proof tests (unit/integration with mocks or fixtures, plus scripted E2E where supported) so regressions in capture, vision, or persistence fail loudly instead of silently.
  • Improve observability along the path with structured [screen_intelligence] logs and visible failure modes in tests.
  • V1 scope: macOS-only for this subsystem; document how tests behave on Linux CI vs local macOS (permissions, Screen Recording, OPENHUMAN_WORKSPACE isolation).

Problem

  • Screen intelligence code exists under src/openhuman/screen_intelligence/ (capture workers, vision worker, persist_vision_summary), but operators cannot rely on the full chain: permissions, scheduling, model wiring, and memory writes may fail silently or go untested.
  • Without scripted proof, regressions in screenshot capture, LLM calls, or persistence are likely.
  • Local vision must not silently fall back to remote unless explicitly configured and documented.

Solution

  • Pipeline: Foreground window bounds + capture → compress → local vision model inference (same configuration story as existing local AI / vision paths) → parse structured summary → persist_vision_summary (unified memory namespace).
  • Local LLM: Vision step uses the local model path; remote fallback only if explicitly configured and documented.
  • Tests: Layered proof — unit/integration (mock vision backend or fixture images, assert memory writes); scripted E2E (Rust JSON-RPC or harness) with OPENHUMAN_WORKSPACE isolation; optional manual macOS checklist for real Screen Recording permission where automation cannot grant it in CI.
  • Privacy / safety: Denylist / policy behavior unchanged or improved; no secrets in logs; screenshot retention follows keep_screenshots and related config.
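The metric bookkeeping implied by this flow can be sketched in Rust. This is a hypothetical simplification: `SessionMetrics` and `record_persist_outcome` are illustrative stand-ins, not the engine's actual `SessionRuntime`/`SessionStatus` types, and the key format is invented.

```rust
// Hypothetical sketch of persist-outcome tracking for the vision pipeline.
// The real types live in src/openhuman/screen_intelligence/ and differ in detail.
#[derive(Debug, Default)]
struct SessionMetrics {
    vision_persist_count: u64,
    last_vision_persisted_key: Option<String>,
    last_vision_persist_error: Option<String>,
}

// Record one persistence attempt: the count advances on success and failure
// alike so operators can see attempts, while key/error capture the outcome.
fn record_persist_outcome(metrics: &mut SessionMetrics, outcome: Result<String, String>) {
    metrics.vision_persist_count += 1;
    match outcome {
        Ok(key) => {
            metrics.last_vision_persisted_key = Some(key);
            metrics.last_vision_persist_error = None;
        }
        Err(err) => metrics.last_vision_persist_error = Some(err),
    }
}

fn main() {
    let mut metrics = SessionMetrics::default();
    record_persist_outcome(&mut metrics, Ok("vision/frame-0001".into()));
    record_persist_outcome(&mut metrics, Err("memory write failed".into()));
    assert_eq!(metrics.vision_persist_count, 2);
    println!("{metrics:?}");
}
```

Counting failed attempts as well as successes is what makes regressions fail loudly: a rising count with a populated error field is visible in status output even when no summary lands in memory.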

Key files (reference):

  • Engine and workers: src/openhuman/screen_intelligence/engine.rs
  • Memory persistence: src/openhuman/screen_intelligence/helpers.rs (persist_vision_summary)
  • Tests: extend src/openhuman/screen_intelligence/tests.rs with E2E-style coverage
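This PR changes persist_vision_summary to return Result<PersistVisionSummaryResult, String> so persistence errors propagate instead of vanishing. A minimal synchronous sketch of that shape — the real helper in helpers.rs is async, writes into unified memory, and the validation and key format here are invented for illustration:

```rust
// Hypothetical, synchronous stand-in for persist_vision_summary.
// Real code: async, writes to unified memory, different key scheme.
#[derive(Debug, PartialEq)]
struct PersistVisionSummaryResult {
    memory_key: String,
}

fn persist_vision_summary(app_name: &str, summary: &str) -> Result<PersistVisionSummaryResult, String> {
    if summary.trim().is_empty() {
        // Propagate a loud error instead of silently dropping the write, so
        // callers can surface it (e.g. in last_vision_persist_error).
        return Err("refusing to persist empty vision summary".to_string());
    }
    Ok(PersistVisionSummaryResult {
        memory_key: format!("screen_intelligence/{app_name}"),
    })
}

fn main() {
    let ok = persist_vision_summary("Safari", "Reading release notes").unwrap();
    assert_eq!(ok.memory_key, "screen_intelligence/Safari");
    assert!(persist_vision_summary("Safari", "  ").is_err());
}
```

Returning a result type rather than `()` is what lets the engine's two-stage flow distinguish an analysis failure from a persistence failure in tests.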

Submission Checklist

  • Unit tests — Vitest (app/) and/or cargo test (core) for logic you add or change
  • E2E / integration — Where behavior is user-visible or crosses UI → Tauri → sidecar → JSON-RPC; use existing harnesses (app/test/e2e, mock backend, tests/json_rpc_e2e.rs as appropriate)
  • N/A — If truly not applicable, say why (e.g. change is documentation-only)
  • Doc comments — /// and //! (Rust), JSDoc or brief file/module headers (TS) on public APIs and non-obvious modules
  • Inline comments — Where logic, invariants, or edge cases aren’t clear from names alone (keep them grep-friendly; avoid restating the code)

Feature-specific (issue #343):

  • E2E pipeline verified on macOS with screen intelligence enabled and permissions granted: capture cycle produces a vision summary stored in unified memory (queryable via defined RPC/memory read path).
  • Proof tests documented: how to run locally vs CI; CI limitations on non-macOS hosts called out.
  • No silent remote vision fallback unless explicitly configured and documented.
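The "no silent fallback" requirement amounts to a config guard ahead of the vision call. A minimal sketch, assuming the local_ai.enabled / provider = "ollama" requirement described in this PR; the struct and field names are simplified stand-ins for the real config types:

```rust
// Simplified stand-in for the engine's config check: vision runs only when
// local AI is explicitly enabled with the "ollama" provider; anything else
// fails loudly rather than silently reaching for a remote model.
struct LocalAiConfig {
    enabled: bool,
    provider: String,
}

fn validate_vision_config(cfg: &LocalAiConfig) -> Result<(), String> {
    if !cfg.enabled {
        return Err("local_ai.enabled must be true for screen intelligence vision".to_string());
    }
    if cfg.provider != "ollama" {
        return Err(format!(
            "vision provider '{}' is not supported; remote fallback requires explicit opt-in",
            cfg.provider
        ));
    }
    Ok(())
}

fn main() {
    let good = LocalAiConfig { enabled: true, provider: "ollama".to_string() };
    assert!(validate_vision_config(&good).is_ok());

    let remote = LocalAiConfig { enabled: true, provider: "openai".to_string() };
    assert!(validate_vision_config(&remote).is_err());
}
```

Failing the guard with an Err (rather than falling through to a remote provider) is also what the provider-guard test can assert on directly.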

Impact

  • Platform: Screen intelligence V1 is macOS-only; Linux CI runs mocked or limited tests per documented matrix.
  • Security / privacy: Screenshot handling and retention remain governed by existing config (keep_screenshots, etc.); logs must not contain secrets.

Related


Branch: feat/343-screen-intelligence-e2e-tests

Summary by CodeRabbit

  • New Features

    • Added vision persistence metrics to session status, including persist count, last persisted key, and persistence error tracking.
    • Enhanced screen recording permission validation with explicit error messaging during session startup.
  • Bug Fixes

    • Improved error handling and control flow in vision analysis and persistence operations with stricter validation.
  • Tests

    • Added comprehensive end-to-end tests for vision analysis, persistence, and screen capture workflows.

YellowSnnowmann and others added 2 commits April 6, 2026 15:53
…sai#343

Add layered test coverage proving the full capture → vision → memory
pipeline: screenshot save/cleanup disk paths, VisionSummary serde
roundtrip, JSON-RPC shape tests for status and vision_recent endpoints.

- tests/screen_intelligence_vision_e2e.rs: save_screenshot_to_disk
  creates a PNG and keep_screenshots=false cleanup removes it;
  VisionSummary struct serializes/persists/is queryable end-to-end;
  platform support table + macOS checklist added to module doc
- tests/json_rpc_e2e.rs: screen_intelligence_status shape test
  (platform_supported, session.active, permissions.screen_recording);
  vision_recent returns empty summaries without an active session
- src/openhuman/screen_intelligence/tests.rs: save_screenshot_to_disk
  unit tests for the write path and the no-image-ref error path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…or handling

- Added new fields to track vision persistence count, last persisted key, and last persist error in SessionRuntime and SessionStatus.
- Implemented error handling for vision summary persistence, ensuring errors are logged and state is updated accordingly.
  • Introduced a new method, analyze_and_persist_frame, to analyze a frame and persist the summary, improving the vision processing pipeline.
- Updated tests to validate the new functionality and ensure proper behavior with mocked vision outputs.

This commit improves the robustness of the screen intelligence pipeline by enhancing the tracking and handling of vision summary persistence.
Contributor

coderabbitai Bot commented Apr 6, 2026

📝 Walkthrough

Walkthrough

This PR extends the screen intelligence engine with persistent vision analysis capabilities. It introduces synchronous vision frame analysis and memory persistence, adds state tracking for vision persistence metrics, refines session start permission checks, and expands test coverage with unit and E2E tests for the new functionality.

Changes

Cohort / File(s) Summary
Core Vision Persistence Pipeline
src/openhuman/screen_intelligence/engine.rs
Added analyze_and_persist_frame() public method combining analysis and persistence. SessionRuntime extended with vision_persist_count, last_vision_persisted_key, and last_vision_persist_error fields. vision_flush() refactored to two-stage synchronous flow with explicit error handling at analysis and persistence stages. Vision worker updated to await persistence synchronously and track session metrics on completion. analyze_frame_with_vision() now validates config (requires local_ai.enabled=true and provider="ollama") and supports mocked vision via environment variable.
Persistence Helpers & Types
src/openhuman/screen_intelligence/helpers.rs, src/openhuman/screen_intelligence/types.rs
persist_vision_summary() signature changed from async fn() -> () to async fn() -> Result<PersistVisionSummaryResult, String> with propagated error handling. Added public constants (VISION_MEMORY_NAMESPACE, VISION_MEMORY_SOURCE_TYPE, VISION_MEMORY_CATEGORY, VISION_MEMORY_TAG) and result type PersistVisionSummaryResult. Extended SessionStatus public API with three new vision persistence fields: vision_persist_count, last_vision_persisted_key, last_vision_persist_error.
Test Infrastructure & Unit Tests
src/openhuman/screen_intelligence/tests.rs
Added test synchronization mechanism (SCREEN_INTELLIGENCE_ENV_LOCK) for environment mutation safety. Introduced helpers: EnvVarGuard for env var restoration, write_screen_intelligence_test_config() for per-test config, make_test_png_uri() for in-memory PNG generation. Added unit tests for save_screenshot_to_disk() and async pipeline tests for analyze_and_persist_frame() covering mocked vision, local AI provider validation, and error cases.
E2E Test Coverage
tests/json_rpc_e2e.rs, tests/screen_intelligence_vision_e2e.rs
Added two JSON-RPC E2E tests (openhuman.screen_intelligence_status and openhuman.screen_intelligence_vision_recent) validating response schema including new vision_persist_count and last_vision_persist_error fields. Expanded screen_intelligence_vision_e2e.rs with disk I/O tests, serde roundtrip validation, engine pipeline test using mocked vision, provider-guard test, and macOS-only real capture test. Enhanced test helpers with env lock poisoning recovery and config helper functions.
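The test-infrastructure pieces above — a shared env lock with poisoning recovery plus an RAII guard that restores environment variables — follow a standard Rust pattern. This sketch uses invented names (lock_env, EnvVarGuard::set, OPENHUMAN_TEST_VAR) and is not the exact code in tests.rs:

```rust
use std::env;
use std::sync::{Mutex, MutexGuard};

// Shared lock serializing tests that mutate process-wide environment
// variables; a poisoned lock (a previous test panicked while holding it)
// is recovered instead of cascading failures into unrelated tests.
static ENV_LOCK: Mutex<()> = Mutex::new(());

fn lock_env() -> MutexGuard<'static, ()> {
    ENV_LOCK.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
}

// RAII guard: sets a variable for the test's duration and restores the
// previous value (or removes the variable) on drop, even on panic.
struct EnvVarGuard {
    key: String,
    previous: Option<String>,
}

impl EnvVarGuard {
    fn set(key: &str, value: &str) -> Self {
        let previous = env::var(key).ok();
        env::set_var(key, value);
        EnvVarGuard { key: key.to_string(), previous }
    }
}

impl Drop for EnvVarGuard {
    fn drop(&mut self) {
        match self.previous.take() {
            Some(v) => env::set_var(&self.key, v),
            None => env::remove_var(&self.key),
        }
    }
}

fn main() {
    let _env = lock_env();
    {
        let _guard = EnvVarGuard::set("OPENHUMAN_TEST_VAR", "mocked");
        assert_eq!(env::var("OPENHUMAN_TEST_VAR").unwrap(), "mocked");
    }
    // Guard dropped: the variable is restored (here, removed again).
    assert!(env::var("OPENHUMAN_TEST_VAR").is_err());
}
```

Pairing the lock with the guard matters because env vars are process-global: without serialization, the mocked-vision tests could observe each other's state.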

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client/Test
    participant Engine as AccessibilityEngine
    participant Vision as Vision Analyzer
    participant Memory as UnifiedMemory
    participant Disk as Disk I/O

    Client->>Engine: analyze_and_persist_frame(frame)
    activate Engine
    
    Engine->>Engine: validate config<br/>(local_ai enabled,<br/>provider = ollama)
    alt Config Invalid
        Engine-->>Client: Err(message)
    else Config Valid
        rect rgba(100, 150, 200, 0.5)
        Note over Engine,Vision: Analysis Phase
        Engine->>Vision: analyze_frame_with_vision(frame)
        activate Vision
        Vision-->>Engine: VisionSummary
        deactivate Vision
        end
        
        alt Analysis Error
            Engine->>Engine: update state.last_error
            Engine-->>Client: Err(analysis_error)
        else Analysis Success
            rect rgba(150, 100, 200, 0.5)
            Note over Engine,Memory: Persistence Phase
            Engine->>Memory: persist_vision_summary(summary)
            activate Memory
            Memory->>Disk: write document
            activate Disk
            Disk-->>Memory: ok/err
            deactivate Disk
            Memory-->>Engine: Result<key, error>
            deactivate Memory
            end
            
            alt Persist Error
                Engine->>Engine: update state.last_error
                Engine->>Engine: vision_persist_count++
                Engine-->>Client: Err(persist_error)
            else Persist Success
                Engine->>Engine: vision_persist_count++
                Engine->>Engine: last_vision_persisted_key = key
                Engine-->>Client: Ok(VisionSummary)
            end
        end
    end
    deactivate Engine

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

  • [Feature] Screen intelligence E2E: capture, local LLM vision, memory persistence, and proof tests #343: Directly implements the same screen-intelligence pipeline code including engine analysis/persist flow and the new analyze_and_persist_frame method with accompanying test coverage.
  • tinyhumansai/openhuman#227: Implements fixes and expanded test coverage for vision worker behavior and E2E tests as described in the feature issue.
  • tinyhumansai/openhuman#247: Implements screenshot capture, vision-summary persistence, and comprehensive unit/E2E test workflows for the screen intelligence pipeline.

Possibly related PRs

Poem

🐰 The vision persists, frame by frame,
Through memory's halls it finds its aim.
Sync'd with care, no race in sight,
Config guarded, tests shining bright,
Persistence blooms—the state's delight!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed. Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed. The title 'Feat/343 screen intelligence e2e tests' accurately describes the primary focus of the PR: adding end-to-end tests for screen intelligence functionality. It directly reflects the core contribution of the changeset.
  • Docstring Coverage — ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


Brings in Discord server/channel picker (tinyhumansai#349) and autocomplete
observability improvements (tinyhumansai#308) from main. Resolves conflict in
tests/json_rpc_e2e.rs by keeping both the screen intelligence tests
(this branch) and the autocomplete runtime settings test (main).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@YellowSnnowmann YellowSnnowmann marked this pull request as ready for review April 6, 2026 15:28
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/openhuman/screen_intelligence/tests.rs (1)

643-646: Remove redundant imports - already imported at file scope.

These imports duplicate lines 6-9 at the top of the file.

♻️ Proposed cleanup
 #[test]
 fn save_screenshot_to_disk_writes_png_to_workspace() {
-    use base64::{engine::general_purpose::STANDARD as B64, Engine};
-    use image::codecs::png::PngEncoder;
-    use image::{ImageBuffer, Rgb, RgbImage};
-    use tempfile::tempdir;
-
     let tmp = tempdir().expect("tempdir");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/screen_intelligence/tests.rs` around lines 643 - 646, Duplicate
imports of B64/Engine, PngEncoder, ImageBuffer/Rgb/RgbImage, and tempdir are
present inside the test scope and already imported at file scope; remove the
redundant inner use statements (those containing
base64::{engine::general_purpose::STANDARD as B64, Engine},
image::codecs::png::PngEncoder, image::{ImageBuffer, Rgb, RgbImage}, and
tempfile::tempdir) so the test reuses the top-level imports instead.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 80a9968a-d314-4505-af57-447d30ccc263

📥 Commits

Reviewing files that changed from the base of the PR and between 0c14aea and 240aba0.

📒 Files selected for processing (6)
  • src/openhuman/screen_intelligence/engine.rs
  • src/openhuman/screen_intelligence/helpers.rs
  • src/openhuman/screen_intelligence/tests.rs
  • src/openhuman/screen_intelligence/types.rs
  • tests/json_rpc_e2e.rs
  • tests/screen_intelligence_vision_e2e.rs

Comment thread tests/json_rpc_e2e.rs
Comment on lines +916 to +919
let summaries = recent_result
.get("summaries")
.and_then(Value::as_array)
.expect("expected summaries array: {recent_result}");
Contributor


⚠️ Potential issue | 🟡 Minor

Fix expect panic message - variable not interpolated in string literal.

The expect call uses a string literal that won't interpolate {recent_result}. Use a formatted panic message instead.

🐛 Proposed fix
     let summaries = recent_result
         .get("summaries")
         .and_then(Value::as_array)
-        .expect("expected summaries array: {recent_result}");
+        .unwrap_or_else(|| panic!("expected summaries array: {recent_result}"));
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    let summaries = recent_result
-        .get("summaries")
-        .and_then(Value::as_array)
-        .expect("expected summaries array: {recent_result}");
+    let summaries = recent_result
+        .get("summaries")
+        .and_then(Value::as_array)
+        .unwrap_or_else(|| panic!("expected summaries array: {}", recent_result));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/json_rpc_e2e.rs` around lines 916 - 919, The expect message on the
chain calling
recent_result.get("summaries").and_then(Value::as_array).expect(...) is a plain
string containing `{recent_result}` that won't interpolate; change it to use a
formatted panic message (e.g., use format! or panic! to include recent_result)
so the actual recent_result value is shown on failure—update the expect argument
for the summaries extraction to something like expect(&format!("expected
summaries array: {:?}", recent_result)).


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Screen intelligence E2E: capture, local LLM vision, memory persistence, and proof tests

3 participants