📚 Knowledge Forge

A persistent, compounding knowledge base maintained by LLMs.
Drop sources in. Watch a wiki build itself.

Quick Start · How It Works · Commands · Architecture · Roadmap

Dark-themed web UI with sidebar, type filters, search, and wiki link navigation

Source pages auto-extract concepts and entities with clickable wiki links

Concept pages accumulate cross-references from multiple sources

Inspired by Andrej Karpathy's LLM Wiki pattern.

"Instead of just retrieving from raw documents at query time, the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files that sits between you and the raw sources." — Andrej Karpathy

What It Does

Knowledge Forge takes raw documents and turns them into a living, interconnected wiki. Not a one-shot RAG pipeline — a compounding knowledge base that gets richer with every source you feed it.

📥 Ingests markdown/text sources → auto-extracts concepts and entities
🔗 Links related pages together with wiki-style [[links]]
📋 Indexes everything into a navigable catalog
🔍 Lints the wiki: finds orphans, dangling links, missing metadata
🌐 Serves a dark-themed web UI to browse and explore
📝 Logs every operation chronologically

Current Status

This repo is intentionally a functional proof of concept.

That means it already proves the end-to-end pattern:

raw sources → wiki pages
cross-linking between pages
persistent markdown artifact
index + log
browseable UI
health checks / linting

But it does not yet implement the full autonomous LLM maintainer vision described by Karpathy.

What is already real

A working ingestion pipeline
Persistent wiki generation on disk
Concept and entity page creation
Incremental wiki updates from new sources
A usable local web UI
A concrete repo anyone can clone, run, and extend

What is still missing

LLM-powered semantic extraction
- Right now extraction is heuristic (word frequency + bigrams), not model-based
Natural-language querying
- You can browse the wiki, but not yet ask questions like "compare X vs Y" and have answers filed back automatically
Contradiction handling
- The current version does not yet detect or annotate conflicts between sources
Human-in-the-loop workflows
- No review queue, approval flow, or source triage loop yet
Richer search / retrieval
- No BM25/vector search yet, only file-based navigation and simple UI filtering
Autonomous maintenance loop
- No background agent that continuously ingests, revises, and improves the wiki over time

So the right framing is:

Knowledge Forge is a functional prototype of the LLM Wiki pattern, with the core architecture working today and the full LLM-native maintainer loop left as the next step.

Why Not Just RAG?

Aspect	RAG	Knowledge Forge
Knowledge	Re-derived on every query	Compiled once, kept current
Cross-references	Missing	Built-in `[[wiki links]]`
Contradictions	Undetected	Flagged on ingest (planned; not yet implemented)
Accumulation	None: each query is independent	Compounds with every source
Maintenance cost	Low (but shallow)	Near zero (bookkeeping is automated)

Quick Start

git clone https://github.com/ESJavadex/knowledge-forge.git
cd knowledge-forge
npm install
npm run demo        # bootstrap + 3 sample sources
npm start           # launch web UI at http://localhost:3000

Open http://localhost:3000 and browse the wiki. The sidebar lets you filter by type, search pages, and navigate through wiki links.

Commands

node src/cli.js init              # Create folder structure + special files
node src/cli.js demo              # Create 3 sample sources and ingest them
node src/cli.js ingest <file.md>  # Ingest a markdown source into the wiki
node src/cli.js lint              # Health-check: orphans, dangling links, metadata
node src/cli.js serve             # Start the web UI (port 3000)

Or via npm scripts:

npm run init
npm run demo
npm run ingest -- <file.md>
npm run lint
npm start

How It Works

1. Ingest

Drop a .md file into raw/ and run ingest. The engine:

Reads the source and extracts a summary
Identifies concepts (recurring themes) and entities (named things, tools, products) using frequency analysis + bigram detection
Creates a source summary page in wiki/sources/
Creates or updates concept pages in wiki/concepts/
Creates or updates entity pages in wiki/entities/
Links everything together with [[wiki links]]
Updates the index and appends to the log

A single source can touch 20+ wiki pages.
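
The frequency analysis and bigram detection described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/ingest.js: the function names, the stopword list, and the `minCount` and `limit` options are all assumptions made for the example.

```javascript
// Hypothetical sketch of frequency + bigram extraction (not the actual src/ingest.js).
// The stopword list is deliberately tiny and is an assumption of this example.
const STOPWORDS = new Set([
  "the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
  "for", "on", "with", "that", "this", "it", "as", "by", "be",
]);

// Lowercase the text and split it into alphanumeric tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z][a-z0-9-]*/g) ?? [];
}

// Count how often each item occurs.
function countItems(items) {
  const counts = new Map();
  for (const item of items) counts.set(item, (counts.get(item) ?? 0) + 1);
  return counts;
}

// Keep terms seen at least `minCount` times, ranked by count, ties broken alphabetically.
function topTerms(counts, { minCount = 2, limit = 10 } = {}) {
  return [...counts.entries()]
    .filter(([, n]) => n >= minCount)
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit)
    .map(([term]) => term);
}

function extractCandidates(text, options) {
  const tokens = tokenize(text);
  const words = tokens.filter((t) => !STOPWORDS.has(t) && t.length > 2);
  // Bigrams are adjacent token pairs with no stopword on either side.
  // Sentence boundaries are ignored here, which a real extractor may want to respect.
  const bigrams = [];
  for (let i = 0; i < tokens.length - 1; i++) {
    const a = tokens[i];
    const b = tokens[i + 1];
    if (!STOPWORDS.has(a) && !STOPWORDS.has(b)) bigrams.push(`${a} ${b}`);
  }
  return {
    words: topTerms(countItems(words), options),
    bigrams: topTerms(countItems(bigrams), options),
  };
}
```

Recurring single words are plausible concept candidates and recurring bigrams are plausible multi-word names. How the real engine separates concepts from entities is not specified here, so the sketch returns both lists without classifying them.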

2. Query (Browse)

Open the web UI and explore. Wiki links are clickable and navigate between related pages. Every page shows its type, creation date, mention count, and linked sources.
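
One way to picture how a page's mention count and linked sources could accumulate across ingests is the sketch below. The record shape (`name`, `type`, `created`, `mentions`, `sources`) and the `upsertPage` helper are hypothetical; the actual frontmatter fields in this repo may differ.

```javascript
// Hypothetical page record and upsert helper; field names are assumptions.
// `pages` is a Map from page name to its metadata record.
function upsertPage(pages, { name, type, source, date }) {
  const existing = pages.get(name);
  if (!existing) {
    // First mention: create the page and remember when it was created.
    const created = { name, type, created: date, mentions: 1, sources: [source] };
    pages.set(name, created);
    return created;
  }
  // Later mentions: bump the counter and link the new source once.
  existing.mentions += 1;
  if (!existing.sources.includes(source)) existing.sources.push(source);
  return existing;
}
```

Because an update only adds to an existing record, earlier sources stay linked and the creation date stays fixed, which is the compounding behavior the wiki is meant to have.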

3. Lint

Run a health check to find:

👻 Orphan pages — no other page links to them
🔗 Dangling links — [[links]] to pages that don't exist yet
📋 Missing frontmatter — pages without YAML metadata
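
The three checks above could be implemented along these lines. This is a sketch under stated assumptions, not the code in src/lint.js: pages are passed in as an in-memory Map of page name to markdown text, frontmatter is detected only by a leading `---` line, and special files such as the index get no special treatment.

```javascript
// Hypothetical lint sketch (not the actual src/lint.js).
// `pages` is a Map from page name to raw markdown text.
function lintWiki(pages) {
  const inbound = new Map([...pages.keys()].map((name) => [name, 0]));
  const dangling = [];
  const missingFrontmatter = [];

  for (const [name, body] of pages) {
    // Assumption: YAML frontmatter always starts on the first line with "---".
    if (!body.startsWith("---\n")) missingFrontmatter.push(name);

    // Find every [[Page Name]] link in the page body.
    for (const match of body.matchAll(/\[\[([^\]]+)\]\]/g)) {
      const target = match[1].trim();
      if (!pages.has(target)) {
        dangling.push({ from: name, to: target });
      } else if (target !== name) {
        // Self-links do not rescue a page from being an orphan.
        inbound.set(target, inbound.get(target) + 1);
      }
    }
  }

  // Orphans are pages that no other page links to.
  const orphans = [...inbound].filter(([, n]) => n === 0).map(([name]) => name);
  return { orphans, dangling, missingFrontmatter };
}
```

A real checker would probably load pages from wiki/ and exclude the index from the orphan calculation, since an index that links to everything would otherwise hide every orphan.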

Architecture

knowledge-forge/
├── raw/                    # 📥 Immutable source documents (never modified)
│   └── *.md
├── wiki/                   # 📚 LLM-generated knowledge base
│   ├── sources/            # Summary pages for each ingested source
│   ├── concepts/           # Recurring themes and topics
│   ├── entities/           # Named things, tools, products
│   ├── analyses/           # Synthesized answers (user queries filed back; planned)
│   ├── index.md            # Catalog of all pages
│   └── log.md              # Append-only chronological record
├── schema/
│   └── AGENTS.md           # Rules for the wiki maintainer agent
├── src/
│   ├── cli.js              # CLI entry point
│   ├── ingest.js           # Source ingestion + extraction engine
│   ├── lint.js             # Wiki health checker
│   ├── server.js           # Express web UI + API
│   └── utils.js            # Shared utilities
├── public/
│   └── index.html          # Single-page web UI
└── package.json

Three Layers

Raw sources — Your curated documents. Immutable. The LLM reads from them but never writes to them.
The wiki — Structured markdown pages maintained entirely by the LLM. Source summaries, concept pages, entity pages, cross-references.
The schema — Configuration (AGENTS.md) that tells the LLM how to structure, maintain, and evolve the wiki.

Wiki Link Format

Pages reference each other with Obsidian-style [[Page Name]] links. The web UI resolves these into clickable navigation. Dangling links (to pages that don't exist yet) are marked with ❓.
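
As a rough illustration of that resolution step, the sketch below rewrites `[[Page Name]]` links into anchors and marks dangling ones with ❓. The `#/page/...` route and the `dangling` CSS class are invented for this example, and the real UI may resolve links differently.

```javascript
// Hypothetical link renderer; the URL scheme and class name are assumptions.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// `existingPages` is a Set of page names that exist in the wiki.
function renderWikiLinks(markdown, existingPages) {
  return markdown.replace(/\[\[([^\]]+)\]\]/g, (_, raw) => {
    const name = raw.trim();
    const label = escapeHtml(name);
    if (!existingPages.has(name)) {
      // Dangling link: show the name but do not make it clickable.
      return `<span class="dangling">${label} ❓</span>`;
    }
    return `<a href="#/page/${encodeURIComponent(name)}">${label}</a>`;
  });
}
```

Running the replacement before the markdown renderer keeps the `[[...]]` syntax out of the renderer's way; whether this project does it before or after rendering is not stated here.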

Web UI

The built-in UI features:

🌙 Dark theme
📂 Sidebar with type filters (Sources, Concepts, Entities, Analyses)
🔍 Full-text search across all pages
📊 Stats bar showing page counts by type
🔗 SPA navigation through wiki links
📱 Responsive layout
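
The type filter and search box can be thought of as two filters applied to the page list. The sketch below assumes each page is an object with `title`, `type`, and `body` fields and uses a case-insensitive substring match; these are illustrative assumptions, not the UI's actual data model or matching logic.

```javascript
// Hypothetical client-side filter; the page shape { title, type, body } is an assumption.
function searchPages(pages, { query = "", type = null } = {}) {
  const needle = query.trim().toLowerCase();
  return pages
    // Type filter: null means "all types".
    .filter((p) => type === null || p.type === type)
    // Text filter: case-insensitive substring match over title and body.
    .filter((p) => needle === "" || `${p.title}\n${p.body}`.toLowerCase().includes(needle))
    .map((p) => p.title);
}
```

A substring scan like this is fine for a small local wiki. It is not ranked retrieval, which is consistent with BM25 and vector search being listed as missing.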

Tech Stack

Runtime: Node.js (ESM)
Server: Express.js
Markdown: marked (rendering) + gray-matter (frontmatter parsing)
UI: Vanilla HTML/CSS/JS — zero build step
VCS: Git (your wiki is a git repo with full history)

Demo Sources Included

Source	Concepts	Entities
Transformer Architecture	10	10
Retrieval-Augmented Generation	10	10
Knowledge Graphs in AI	10	10

Run npm run demo to generate all of them.

Roadmap

Author

Javier Santos
javadex.es · GitHub

Head of AI · Electronic Engineer · Building the future, one repo at a time.

License

MIT — use it, fork it, build on top of it.

Built with ☕ by Javier Santos · Inspired by Andrej Karpathy
