Backscroll


A full-text search engine for AI assistant sessions — Claude Code, Pi, and any source with an input manifest.

Backscroll treats your local AI sessions as a searchable archive: it indexes conversation logs incrementally, strips machine-generated noise, and provides instant full-text search with relevance ranking.



Installation

Backscroll ships as a single static binary with no external dependencies. Runtime input manifests are separate user configuration files loaded from <config_dir>/backscroll/inputs/*.inputs.toml.

Install Script (Recommended)

curl -fsSL https://raw.githubusercontent.com/pablontiv/backscroll/master/install.sh | bash

Detects your platform (Linux x86_64 / macOS aarch64), installs the binary to ~/.local/bin/, and installs the shipped Claude, Pi, and OpenCode input presets into the user input config directory without overwriting existing manifests.

Windows (PowerShell):

irm https://raw.githubusercontent.com/pablontiv/backscroll/master/install.ps1 | iex

Installs the binary to %LOCALAPPDATA%\backscroll\bin\, adds it to your PATH, and installs the shipped Claude, Pi, and OpenCode input presets into %APPDATA%\backscroll\inputs\ without overwriting existing manifests. Compatible with Windows PowerShell 5.1+.

Install input presets

Backscroll ships Claude, Pi, and OpenCode input presets at inputs/claude.inputs.toml, inputs/pi.inputs.toml, and inputs/opencode.inputs.toml. The install scripts copy those files into the user input config directory and skip existing manifests by default; set BACKSCROLL_FORCE_INPUTS=1 only when you intentionally want to replace edited presets. Default input config directories:

OS       Input manifest directory
Linux    ${XDG_CONFIG_HOME:-$HOME/.config}/backscroll/inputs/
macOS    $HOME/Library/Application Support/backscroll/inputs/
Windows  %APPDATA%\backscroll\inputs\

Set BACKSCROLL_CONFIG_DIR to override the <config_dir> base; manifests are then read from $BACKSCROLL_CONFIG_DIR/backscroll/inputs/.
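The lookup rule above can be expressed as a small shell function. This is a sketch for Linux and macOS only, built from the table and the override rule; backscroll_inputs_dir is a hypothetical helper name, not a Backscroll command, and Windows is left out.

```shell
# Hypothetical helper: print the directory Backscroll reads input manifests from.
backscroll_inputs_dir() {
  local base
  if [ -n "${BACKSCROLL_CONFIG_DIR:-}" ]; then
    # An explicit override replaces the OS config directory.
    base="$BACKSCROLL_CONFIG_DIR"
  else
    case "$(uname -s)" in
      Darwin) base="$HOME/Library/Application Support" ;;
      *)      base="${XDG_CONFIG_HOME:-$HOME/.config}" ;;
    esac
  fi
  printf '%s\n' "$base/backscroll/inputs"
}
```

Running ls "$(backscroll_inputs_dir)" should then show the installed *.inputs.toml files if the presets are in place.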

If you install from a source checkout, copy presets without clobbering existing files:

config_dir="${BACKSCROLL_CONFIG_DIR:-${XDG_CONFIG_HOME:-$HOME/.config}}"
mkdir -p "$config_dir/backscroll/inputs"
cp -n inputs/claude.inputs.toml inputs/pi.inputs.toml inputs/opencode.inputs.toml "$config_dir/backscroll/inputs/"
backscroll inputs validate
backscroll inputs list

Windows (PowerShell):

$configDir = if ($env:BACKSCROLL_CONFIG_DIR) { $env:BACKSCROLL_CONFIG_DIR } else { $env:APPDATA }
$inputsDir = Join-Path $configDir "backscroll\inputs"
New-Item -ItemType Directory -Force $inputsDir | Out-Null
foreach ($name in "claude.inputs.toml", "pi.inputs.toml", "opencode.inputs.toml") {
  $dest = Join-Path $inputsDir $name
  if (-not (Test-Path $dest)) { Copy-Item (Join-Path "inputs" $name) $dest }
}
backscroll inputs validate
backscroll inputs list

From Source

go install github.com/pablontiv/backscroll/cmd/backscroll@latest

Quick Start

# 1. Confirm global input manifests are installed and valid
backscroll inputs validate
backscroll inputs list

# 2. Sync — index files declared in <config_dir>/backscroll/inputs/*.inputs.toml
backscroll sync

# 3. Search — find past conversations by keyword
backscroll search "migration plan"

# 4. Search by project — limit results to a specific project
backscroll search "error handling" --project "backscroll"

# 5. Path lookup — narrow search to an indexed source path
backscroll search "migration" --source-path "*/session.jsonl" --robot

# 6. Status — check index health
backscroll status
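
For cron jobs or shell hooks, steps 1 and 2 can be chained so that a broken manifest stops the run before indexing. A minimal sketch, assuming backscroll inputs validate exits non-zero when a manifest is invalid (this README does not state its exit codes); backscroll_refresh is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: validate input manifests, then sync only if they are valid.
backscroll_refresh() {
  # Assumption: a non-zero exit status from `inputs validate` means invalid manifests.
  if ! backscroll inputs validate; then
    echo "input manifests failed validation; skipping sync" >&2
    return 1
  fi
  backscroll sync
}
```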

Core Idea

AI assistants like Claude Code, Pi, and OpenCode produce valuable reasoning logs, but they are scattered across session files with no built-in way to search across them. Backscroll makes them searchable, persistent, and fast.

  • Sessions are indexed incrementally — only changed files are re-processed
  • Noise is stripped automatically — system-reminders, task-notifications, subagent chatter
  • Search uses BM25 ranking with highlighted snippets
  • Output adapts to the consumer — human-readable, JSON, or compact LLM format

Backscroll does not modify your logs. It indexes them.


The Session Index

Each AI assistant stores conversations in its own format. Backscroll normalizes them via input manifests — shipped presets exist for Claude, Pi (both JSONL), and OpenCode (SQLite via decode.format = "opencode"), and any source with a compatible manifest is supported.

Backscroll reads these files and extracts the conversation: user and assistant messages only. Everything else — tool calls, system-reminders, task-notifications, local command output — is stripped as noise.
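
As a rough illustration of the role selection only, the filter below keeps user and assistant records from a JSONL log using jq. It assumes each record carries its role at .message.role, matching the $.message.role selector in the example manifest under Configuration. conversation_only is a hypothetical helper and this is not Backscroll's own filter: system-reminders and similar blocks need handling that this sketch does not attempt.

```shell
# Hypothetical helper: read JSONL on stdin, print only user and assistant records.
conversation_only() {
  # Records without .message.role (or with any other role) are dropped.
  jq -c 'select(.message.role == "user" or .message.role == "assistant")'
}
```

For example, conversation_only < session.jsonl prints one compact JSON object per kept record.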

Incremental sync

Backscroll computes a SHA-256 hash for each session file. On subsequent syncs, only files whose content has changed are re-processed — syncing thousands of sessions takes seconds after the initial run.

backscroll inputs validate
backscroll sync
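
The change detection idea can be sketched with sha256sum and a flat state file. This is an illustration only: Backscroll keeps its own state, and the state file layout and the changed_files and record_files helper names are assumptions made for the example. It requires sha256sum from GNU coreutils; on macOS, shasum -a 256 would need to be substituted.

```shell
# Hypothetical state file: one "<sha256>  <path>" line per file, as printed by sha256sum.

# Print every given path whose current digest is not recorded in the state file.
changed_files() {
  local state_file="$1"
  shift
  local path digest
  for path in "$@"; do
    digest="$(sha256sum -- "$path" | cut -d' ' -f1)"
    # A new file or a changed digest means no exact "<digest>  <path>" line exists.
    if ! grep -qxF -- "$digest  $path" "$state_file" 2>/dev/null; then
      printf '%s\n' "$path"
    fi
  done
}

# Overwrite the state file with the current digests of the given paths.
record_files() {
  local state_file="$1"
  shift
  sha256sum -- "$@" > "$state_file"
}
```

On the first run every file is reported as changed; after record_files, only files whose content differs are reported again.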

Subagent handling is controlled by the active input manifest. The shipped Claude preset excludes subagents/ paths with a discovery exclude glob; edit your installed preset if you intentionally want a different corpus.

See Sync & Indexing docs for input manifests, noise filtering, and project metadata behavior. See Downstream audit integration contract for deterministic indexed-only status/session/event queries.


CLI

# Input config and indexing
backscroll inputs validate                             # Validate global input manifests
backscroll inputs list                                 # List loaded manifests and inputs
backscroll inputs test --input claude --file <PATH>    # Dry-run one file without writing SQLite
backscroll sync                                        # Index files declared by active inputs
backscroll status [--json]                            # Show index health and metrics

# Retrieval
backscroll search <QUERY> [--project <NAME>] [--json|--robot] [--fields <minimal|full>] [--max-tokens <N>] [--source-path <PATH_OR_PATTERN>]
backscroll list --indexed-only --json                  # Query the existing index without auto-sync
backscroll sessions query --jsonl --all-projects       # Stream indexed records in deterministic order
backscroll events query --jsonl --indexed-only          # Stream normalized events without auto-sync

Output Formats

Search results can be consumed in three formats, depending on whether the reader is a human, a script, or an LLM:

# Human-readable (default) — terminal bold for match highlights
backscroll search "query terms"

# JSON lines — one JSON object per result, for pipelines and scripting
backscroll search "query terms" --json

# Robot — compact tab-separated format, designed for LLM consumption
backscroll search "query terms" --robot --max-tokens 2000

The --fields flag controls field density (minimal or full), and --max-tokens caps output by approximate token count. See Search docs for output shapes and flag reference.

Indexed path lookup

Use backscroll search ... --source-path <PATH_OR_PATTERN> to retrieve matching messages from an already indexed file path through SQLite. Patterns may use * globs, so a UUID-like session filename can be matched with --source-path '*019e0d38-c437-7565-ba11-5dd57d516744*'.

For exhaustive local tooling, use backscroll sessions query --jsonl to stream indexed records without a search term, ordered deterministically by source_path, ordinal, and timestamp. For audit tooling that needs tool calls/results and command/error metadata, use backscroll events query --jsonl --indexed-only. See Path lookup docs and the audit integration contract.
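
Quote the pattern so the shell passes the * to Backscroll instead of expanding it against local files. To see why the surrounding * characters matter, the sketch below mimics the matching with a shell case statement. It assumes * matches any run of characters including /, which the */session.jsonl example in Quick Start suggests but this README does not spell out; path_matches is a hypothetical helper, not Backscroll's matcher.

```shell
# Hypothetical helper: succeed if PATTERN (with * wildcards) matches the whole PATH.
path_matches() {
  local pattern="$1" path="$2"
  # Unquoted $pattern is treated as a glob here; in case patterns * also matches "/".
  case "$path" in
    $pattern) return 0 ;;
    *)        return 1 ;;
  esac
}
```

With this helper, path_matches '*019e0d38*' /tmp/019e0d38.jsonl succeeds, while the bare fragment '019e0d38' fails because the pattern has to cover the whole path.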

Status

backscroll status shows index health: files indexed, message count, projects discovered, database size, and last sync time. Use backscroll status --json for a versioned machine-readable status document; add --indexed-only to avoid auto-syncing while inspecting the current SQLite snapshot.


AI-Native

Backscroll is designed as a retrieval layer for AI assistants. The --robot and --json output formats produce stable, compact results suitable for tool use and automation.

Use --max-tokens to fit results within a context window:

# Feed search results into an LLM pipeline
backscroll search "architecture decisions" --robot --max-tokens 4000

# Structured output for programmatic consumption
backscroll search "migration plan" --json --fields full | jq '.snippet'

# Project-scoped retrieval
backscroll search "error handling" --project "backscroll" --robot

All output is deterministic and machine-parseable. No ANSI escape codes in --json or --robot modes.
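
When wiring Backscroll into an assistant as a tool, a thin wrapper keeps the flags in one place. A sketch using only the flags shown above; recall is a hypothetical function name and the 2000-token default is an arbitrary choice for the example.

```shell
# Hypothetical wrapper: recall QUERY [MAX_TOKENS] [PROJECT]
recall() {
  local query="$1"
  local max_tokens="${2:-2000}"   # arbitrary default for this sketch
  local project="${3:-}"
  if [ -n "$project" ]; then
    backscroll search "$query" --robot --max-tokens "$max_tokens" --project "$project"
  else
    backscroll search "$query" --robot --max-tokens "$max_tokens"
  fi
}
```

For example, recall "error handling" 4000 backscroll runs the project-scoped search with a 4000-token cap.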


Configuration

Backscroll separates application configuration from input configuration.

  • Application config (backscroll.toml) controls database and embedding settings. By default, Backscroll creates an index at ~/.backscroll.db.
  • Input config (*.inputs.toml) controls what files are ingested. The canonical runtime location is <config_dir>/backscroll/inputs/*.inputs.toml, where <config_dir> is the OS config directory or BACKSCROLL_CONFIG_DIR when set.

Override app settings by creating ~/.config/backscroll/config.toml or backscroll.toml in the current directory:

database_path = "/home/user/.backscroll.db"

[embedding]
model_name = "all-MiniLM-L6-v2"
similarity_threshold = 0.3

Environment variables are also supported:

export BACKSCROLL_DATABASE_PATH="/tmp/custom.db"

Canonical ingestion inputs live in global user-scoped manifests:

version = 1

[[inputs]]
id = "claude"
source = "session"
active = true

[inputs.discover]
roots = ["/home/user/.claude/projects"]
include = ["**/*.jsonl"]
exclude = ["**/subagents/**"]

[inputs.decode]
format = "jsonl"

[inputs.map]
role = "$.message.role"

[inputs.content]
selector = "$.message.content"

The repository presets are examples to install into the global input directory; Backscroll does not read the repository inputs/ directory at runtime. Legacy app-config ingestion keys such as session_dir and session_dirs are not canonical input config, and sync does not silently fall back to them.

See Configuration docs for the full resolution order and all options.


Documentation

Topic                    Description
Sync & Indexing          Incremental sync, noise filtering, project detection
Search Engine            BM25 ranking, output formats, token limiting
Indexed Path Lookup      DB-backed lookup using search_items.source_path
Configuration            Config resolution, TOML format, environment variables
Generic Input Contract   Global *.inputs.toml contract for provider-neutral ingestion
Session Search Research  Feasibility study: axioms, evidence tables, capabilities matrix

Development

just check              # gofmt --check + go vet
just test               # Run all tests
just fmt                # Auto-format code (gofmt -w)
just build              # Build binary
just coverage-summary   # Go test coverage report
just audit              # go mod verify

Commits follow Conventional Commits (type(scope): description).


License

PolyForm Noncommercial 1.0.0 — free for non-commercial use.
