Skip to content

Latest commit

 

History

History
128 lines (103 loc) · 4.41 KB

File metadata and controls

128 lines (103 loc) · 4.41 KB

AGENTS.md

This file provides guidance to Claude Code (claude.ai/code) and other similar tools when working with code in this repository.

Project Overview

Ghostwriter is a Vision-LLM agent for the reMarkable tablet that watches handwritten input and responds by drawing or typing back to the screen. It takes screenshots, sends them to LLM APIs (OpenAI, Anthropic, Google), and uses the responses to interact with the device through simulated pen and keyboard input.

Core Architecture

  • main.rs: Main application entry point with CLI argument parsing and orchestration
  • LLM Engine Layer (src/llm_engine/): Pluggable backends for different AI providers
    • openai.rs: OpenAI API integration (GPT models)
    • anthropic.rs: Anthropic API integration (Claude models)
    • google.rs: Google API integration (Gemini models)
    • mod.rs: Common LLM engine trait and interface
  • Device Interaction (src/):
    • screenshot.rs: Screen capture functionality for both reMarkable2 and Paper Pro
    • pen.rs: SVG drawing and bitmap rendering to screen
    • keyboard.rs: Virtual keyboard input via uinput
    • touch.rs: Touch event detection and gesture recognition
  • Image Processing:
    • segmenter.rs: Image segmentation to provide spatial context to LLMs
    • util.rs: SVG to bitmap conversion and other utilities
  • Configuration:
    • prompts/: JSON-based prompt templates and tool definitions
    • embedded_assets.rs: Bundled configuration files

Common Development Commands

Building

# Local development build
cargo build --release
# or
./build.sh local

# Cross-compile for reMarkable2 (armv7)
cross build --release --target=armv7-unknown-linux-gnueabihf
# or
./build.sh

# Cross-compile for reMarkable Paper Pro (aarch64)  
cross build --release --target=aarch64-unknown-linux-gnu
# or
./build.sh rmpp

Code Quality and Formatting

# Format code with rustfmt
cargo fmt

# Check formatting without applying changes
cargo fmt -- --check

# Run clippy linting
cargo clippy

# Run clippy with stricter warnings
cargo clippy -- -D warnings

# Check code compiles
cargo check --all-targets --all-features

Testing and Evaluation

# Run evaluation suite across multiple models and configurations
./run_eval.sh

# Test with local input file (no device required)
./target/release/ghostwriter \
  --input-png evaluations/test_case/input.png \
  --output-file tmp/result.out \
  --save-bitmap tmp/result.png \
  --no-draw --no-loop --no-trigger

# Run with specific model and options
./ghostwriter --model gpt-4o-mini --apply-segmentation --thinking

Deployment

# Deploy to reMarkable (replace IP address)
scp target/armv7-unknown-linux-gnueabihf/release/ghostwriter root@192.168.1.117:
# or for Paper Pro
scp target/aarch64-unknown-linux-gnu/release/ghostwriter root@192.168.1.117:

Key Development Notes

Cross-compilation Setup Required

  • Install cross: cargo install cross --git https://github.com/cross-rs/cross
  • Add targets: rustup target add armv7-unknown-linux-gnueabihf aarch64-unknown-linux-gnu
  • Docker required for cross-compilation

Device-Specific Considerations

  • reMarkable2 uses armv7 architecture, Paper Pro uses aarch64
  • Paper Pro requires uinput kernel module to be loaded (handled automatically)
  • Screen resolutions and input handling differ between devices
  • Virtual display size is normalized to 768x1024 pixels

Environment Variables

Set API keys for different LLM providers:

export OPENAI_API_KEY=your-key-here
export ANTHROPIC_API_KEY=your-key-here
export GOOGLE_API_KEY=your-key-here

Prompt System

  • Prompts are defined in prompts/ directory as JSON files
  • Tools (draw_text, draw_svg) are defined separately and referenced by prompts
  • Runtime prompt overrides supported by copying files to device
  • Default prompt is general.json

Testing and Evaluation Framework

  • evaluations/ contains test scenarios with input images
  • run_eval.sh runs systematic evaluations across models and configurations
  • Results include merged images showing original input + AI output in red
  • Supports segmentation analysis for improved spatial awareness

reMarkable Integration

  • Touch trigger in upper-right corner activates the assistant
  • Progress indication through keyboard text and screen taps
  • Supports both SVG drawing (via pen simulation) and text output (via virtual keyboard)
  • Background execution supported with nohup ./ghostwriter &