| 1 | +# Local Browser - On-Device AI Web Automation |
| 2 | + |
| 3 | +# Support for runanywhere-web-sdk is launching soon in our main repo. Check it out: https://github.com/RunanywhereAI/runanywhere-sdks |
| 4 | + |
| 5 | +A Chrome extension that uses WebLLM to run AI-powered web automation entirely on-device. No cloud APIs, no API keys, fully private. |
| 6 | + |
| 7 | +## Demo |
| 8 | + |
| 9 | +https://github.com/user-attachments/assets/898cc5c2-db77-4067-96e6-233c5da2bae5 |
| 10 | + |
| 11 | + |
| 12 | +## Features |
| 13 | + |
| 14 | +- **On-Device AI**: Uses WebLLM with WebGPU acceleration for local LLM inference |
| 15 | +- **Multi-Agent System**: Planner + Navigator agents for intelligent task execution |
| 16 | +- **Browser Automation**: Navigate, click, type, and extract data from web pages |
| 17 | +- **Privacy-First**: All AI runs locally, no data leaves your device |
| 18 | +- **Offline Support**: Works offline after initial model download |
| 19 | + |
| 20 | +## Quick Start |
| 21 | + |
| 22 | +### Prerequisites |
| 23 | + |
| 24 | +- **Chrome 124+** (required for WebGPU in service workers) |
| 25 | +- **Node.js 18+** and npm |
| 26 | +- **GPU with WebGPU support** (most modern GPUs work) |
| 27 | + |
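A quick way to confirm the GPU prerequisite is to ask the browser for a WebGPU adapter. The sketch below is an illustration, not part of this project's source: `checkWebGpu`, `GpuLike`, and `WebGpuStatus` are hypothetical names, and the only real API it relies on is `navigator.gpu.requestAdapter()`, which resolves to `null` when no usable adapter exists.

```typescript
// Minimal shape of the WebGPU entry point (navigator.gpu), declared locally
// so the sketch compiles without the @webgpu/types package.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}

type WebGpuStatus = "ok" | "no-api" | "no-adapter";

// Hypothetical helper: reports whether WebGPU looks usable.
async function checkWebGpu(gpu: GpuLike | undefined): Promise<WebGpuStatus> {
  if (!gpu) return "no-api"; // the browser does not expose WebGPU at all
  const adapter = await gpu.requestAdapter();
  return adapter ? "ok" : "no-adapter"; // null means no usable GPU was found
}

// In a page or service worker console:
// checkWebGpu((navigator as any).gpu).then(console.log);
```

Passing the `gpu` object in as a parameter keeps the helper easy to exercise outside a browser.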
| 28 | +### Installation |
| 29 | + |
| 30 | +1. **Install dependencies** from your local clone of the repository: |
| 31 | + ```bash |
| 32 | + cd local-browser |
| 33 | + npm install |
| 34 | + ``` |
| 35 | + |
| 36 | +2. **Build the extension**: |
| 37 | + ```bash |
| 38 | + npm run build |
| 39 | + ``` |
| 40 | + |
| 41 | +3. **Load in Chrome**: |
| 42 | + - Open `chrome://extensions` |
| 43 | + - Enable "Developer mode" (top right) |
| 44 | + - Click "Load unpacked" |
| 45 | + - Select the `dist` folder from this project |
| 46 | + |
| 47 | +4. **First run**: |
| 48 | + - Click the extension icon in your toolbar |
| 49 | + - The first run downloads the AI model (~1GB), so it needs an internet connection |
| 50 | + - The model is cached, so later runs skip the download |
| 51 | + |
| 52 | +### Usage |
| 53 | + |
| 54 | +1. Navigate to any webpage |
| 55 | +2. Click the Local Browser extension icon |
| 56 | +3. Type a task like: |
| 57 | + - "Search for 'WebGPU' on Wikipedia and extract the first paragraph" |
| 58 | + - "Go to example.com and tell me what's there" |
| 59 | + - "Find the search box and search for 'AI news'" |
| 60 | +4. Watch the AI execute the task step by step |
| 61 | + |
| 62 | +## Development |
| 63 | + |
| 64 | +### Development Mode |
| 65 | + |
| 66 | +```bash |
| 67 | +npm run dev |
| 68 | +``` |
| 69 | + |
| 70 | +This watches for changes and rebuilds automatically. |
| 71 | + |
| 72 | +### Project Structure |
| 73 | + |
| 74 | +``` |
| 75 | +local-browser/ |
| 76 | +├── manifest.json # Chrome extension manifest (MV3) |
| 77 | +├── src/ |
| 78 | +│ ├── background/ # Service worker |
| 79 | +│ │ ├── index.ts # Entry point & message handling |
| 80 | +│ │ ├── llm-engine.ts # WebLLM wrapper |
| 81 | +│ │ └── agents/ # AI agent system |
| 82 | +│ │ ├── base-agent.ts |
| 83 | +│ │ ├── planner-agent.ts |
| 84 | +│ │ ├── navigator-agent.ts |
| 85 | +│ │ └── executor.ts |
| 86 | +│ ├── content/ # Content scripts |
| 87 | +│ │ ├── dom-observer.ts # Page state extraction |
| 88 | +│ │ └── action-executor.ts |
| 89 | +│ ├── popup/ # React popup UI |
| 90 | +│ │ ├── App.tsx |
| 91 | +│ │ └── components/ |
| 92 | +│ └── shared/ # Shared types & constants |
| 93 | +└── dist/ # Build output |
| 94 | +``` |
| 95 | + |
| 96 | +### How It Works |
| 97 | + |
| 98 | +1. **User enters a task** in the popup UI |
| 99 | +2. **Planner Agent** analyzes the task and creates a high-level strategy |
| 100 | +3. **Navigator Agent** examines the current page DOM and decides on the next action |
| 101 | +4. **Content Script** executes the action (click, type, extract, etc.) |
| 102 | +5. **The loop** repeats steps 3 and 4 until the task completes or fails |
| 103 | + |
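The steps above can be sketched as a plan-once, then observe/decide/execute loop. Everything in this block is an assumption made for illustration: `runTask`, `LoopDeps`, `StepAction`, the `done` action, and the `maxSteps` cap are hypothetical and may not match the code in `src/background/agents/`.

```typescript
// Hypothetical action shape: either a browser action or a "done" signal.
type StepAction =
  | { type: "navigate" | "click" | "type" | "extract" | "scroll" | "wait"; target?: string; value?: string }
  | { type: "done"; result: string };

// The four collaborators from the list above, injected so the loop is testable.
interface LoopDeps {
  plan(task: string): Promise<string>;                           // Planner Agent
  observe(): Promise<string>;                                    // page state from the content script
  decide(plan: string, pageState: string): Promise<StepAction>;  // Navigator Agent
  execute(action: StepAction): Promise<void>;                    // content script action
}

// Plans once, then loops until the navigator reports "done" or the step cap is hit.
async function runTask(task: string, deps: LoopDeps, maxSteps = 10): Promise<string> {
  const plan = await deps.plan(task);
  for (let step = 0; step < maxSteps; step++) {
    const pageState = await deps.observe();
    const action = await deps.decide(plan, pageState);
    if (action.type === "done") return action.result;
    await deps.execute(action);
  }
  throw new Error(`Task did not finish within ${maxSteps} steps`);
}
```

The step cap is one simple way to turn "or fails" into a concrete stopping rule; the actual failure handling in this project may differ.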
| 104 | +### Agent System |
| 105 | + |
| 106 | +The extension uses a two-agent architecture inspired by Nanobrowser: |
| 107 | + |
| 108 | +- **PlannerAgent**: Strategic planning, creates step-by-step approach |
| 109 | +- **NavigatorAgent**: Tactical execution, chooses specific actions based on page state |
| 110 | + |
| 111 | +Both agents output structured JSON that is parsed and executed. |
| 112 | + |
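Since the agents emit JSON as text, the parsing step has to tolerate malformed output. Below is a minimal sketch of one defensive approach. The `NavigatorAction` shape and `parseAction` helper are hypothetical; only the action names are taken from the Limitations section of this README.

```typescript
// Action names come from the README; the object shape is an assumption.
const ACTION_TYPES = ["navigate", "click", "type", "extract", "scroll", "wait"] as const;
type ActionType = (typeof ACTION_TYPES)[number];

interface NavigatorAction {
  type: ActionType;
  target?: string;
  value?: string;
}

// Returns a validated action, or null if the model output is unusable.
function parseAction(raw: string): NavigatorAction | null {
  // Model output may wrap the JSON in extra prose, so take the
  // outermost {...} span before parsing.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  let data: unknown;
  try {
    data = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;

  const obj = data as Record<string, unknown>;
  if (!ACTION_TYPES.includes(obj.type as ActionType)) return null;
  return {
    type: obj.type as ActionType,
    target: typeof obj.target === "string" ? obj.target : undefined,
    value: typeof obj.value === "string" ? obj.value : undefined,
  };
}
```

Returning `null` instead of throwing lets the caller decide whether to retry the model or fail the task.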
| 113 | +## Model Configuration |
| 114 | + |
| 115 | +Default model: `Qwen2.5-1.5B-Instruct-q4f16_1-MLC` (~1GB) |
| 116 | + |
| 117 | +Alternative models (configured in `src/shared/constants.ts`): |
| 118 | +- `Phi-3.5-mini-instruct-q4f16_1-MLC` (~2GB, better reasoning) |
| 119 | +- `Llama-3.2-1B-Instruct-q4f16_1-MLC` (~0.7GB, smaller) |
| 120 | + |
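For illustration, a model registry along these lines could live in `src/shared/constants.ts`. The model IDs and approximate sizes are taken from the list above; the structure and the names `ModelOption`, `MODELS`, `DEFAULT_MODEL_ID`, and `resolveModelId` are assumptions, not the project's actual code.

```typescript
// IDs and sizes come from the README; the surrounding structure is hypothetical.
interface ModelOption {
  id: string;
  approxSizeGB: number;
}

const DEFAULT_MODEL_ID = "Qwen2.5-1.5B-Instruct-q4f16_1-MLC";

const MODELS: readonly ModelOption[] = [
  { id: DEFAULT_MODEL_ID, approxSizeGB: 1 },                      // default
  { id: "Phi-3.5-mini-instruct-q4f16_1-MLC", approxSizeGB: 2 },   // better reasoning
  { id: "Llama-3.2-1B-Instruct-q4f16_1-MLC", approxSizeGB: 0.7 }, // smaller
];

// Falls back to the default when the requested ID is missing or unknown.
function resolveModelId(requested?: string): string {
  const match = MODELS.find((m) => m.id === requested);
  return match ? match.id : DEFAULT_MODEL_ID;
}
```

The resolved ID is the kind of value that WebLLM's `CreateMLCEngine` takes as its first argument, presumably from inside `llm-engine.ts`.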
| 121 | +## Troubleshooting |
| 122 | + |
| 123 | +### WebGPU not supported |
| 124 | +- Update Chrome to version 124 or later |
| 125 | +- Check `chrome://gpu` to verify WebGPU status |
| 126 | +- Some GPUs do not support WebGPU; the extension cannot run on-device inference on them |
| 127 | + |
| 128 | +### Model fails to load |
| 129 | +- Ensure you have enough disk space (~2GB free) |
| 130 | +- Check browser console for errors |
| 131 | +- Try clearing the extension's storage and reloading |
| 132 | + |
| 133 | +### Actions not executing |
| 134 | +- Chrome blocks content scripts on some pages (`chrome://` URLs, extension pages) |
| 135 | +- Try a regular webpage such as wikipedia.org |
| 136 | + |
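One way to surface this problem early is to check the tab URL before starting a task. The helper below is a hypothetical sketch, not code from this project. As a simplifying assumption it treats only `http:` and `https:` pages as scriptable, which is stricter than Chrome's actual rules.

```typescript
// Hypothetical helper: true if a content script can plausibly run on the URL.
// chrome://, chrome-extension://, about: and similar schemes are rejected.
function isScriptableUrl(url: string): boolean {
  try {
    const { protocol } = new URL(url);
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false; // not a valid absolute URL
  }
}
```

In the popup, the active tab's URL (for example from `chrome.tabs.query({ active: true, currentWindow: true })`) could be passed through this check to show a clear message instead of a silent failure.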
| 137 | +### Extension not working after Chrome update |
| 138 | +- Go to `chrome://extensions` |
| 139 | +- Click the reload button on the extension |
| 140 | + |
| 141 | +## Limitations |
| 142 | + |
| 143 | +- **POC Scope**: This is a proof-of-concept, not production software |
| 144 | +- **No Vision**: Uses text-only DOM analysis (no screenshot understanding) |
| 145 | +- **Single Tab**: Only works with the currently active tab |
| 146 | +- **Basic Actions**: Supports navigate, click, type, extract, scroll, wait |
| 147 | +- **Model Size**: Smaller models may struggle with complex tasks |
| 148 | + |
| 149 | +## Tech Stack |
| 150 | + |
| 151 | +- **WebLLM**: On-device LLM inference with WebGPU |
| 152 | +- **React**: Popup UI |
| 153 | +- **TypeScript**: Type-safe development |
| 154 | +- **Vite + CRXJS**: Chrome extension bundling |
| 155 | +- **Chrome Extension Manifest V3**: Modern extension architecture |
| 156 | + |
| 157 | +## Credits |
| 158 | + |
| 159 | +This project is inspired by: |
| 160 | +- [Nanobrowser](https://github.com/nanobrowser/nanobrowser) - Multi-agent web automation (MIT License) |
| 161 | +- [WebLLM](https://github.com/mlc-ai/web-llm) - In-browser LLM inference (Apache-2.0 License) |
| 162 | + |
| 163 | +### Dependency Licenses |
| 164 | + |
| 165 | +| Package | License | |
| 166 | +|---------|---------| |
| 167 | +| @mlc-ai/web-llm | Apache-2.0 | |
| 168 | +| React | MIT | |
| 169 | +| Vite | MIT | |
| 170 | +| @crxjs/vite-plugin | MIT | |
| 171 | +| TypeScript | Apache-2.0 | |
| 172 | + |
| 173 | +## License |
| 174 | + |
| 175 | +MIT License - See LICENSE file for details. |