Build Andrej Karpathy's LLM Knowledge Base in One Command
Andrej Karpathy, one of the most followed researchers in AI, recently described how he builds personal knowledge bases for LLM research. His workflow has four steps:
- Collect raw sources (web articles, papers, repos) and convert each to .md in a raw/ folder
- Feed the raw files to an LLM, which compiles them into a structured wiki: concept articles, backlinks, an index
- Open the wiki in Obsidian to browse and search
- Run Q&A against the wiki; the AI's answers loop back as new entries
He added: "I think there is room here for an incredible new product instead of a hacky collection of scripts."
That comment stuck with us. Web2MD already handled step 1, but only one URL at a time, whether through the browser extension or the CLI. Today we are closing that gap with two new capabilities: vault mode for the CLI and Compile Wiki for the extension's Collections panel.
CLI Vault Mode — One Command to Ingest
The new CLI flags let you replace Karpathy's shell scripts with a single npx command.
npx web2md --vault ~/Research/ --batch urls.txt --concurrency 5
This command:
- Reads every URL from urls.txt (one per line, # for comments)
- Converts each page to clean Markdown in parallel (5 at a time)
- Saves each file to ~/Research/raw/YYYY-MM-DD-article-title.md with YAML frontmatter
- Appends an Obsidian wikilink entry to ~/Research/INDEX.md
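If you want to check a URL list before handing it to the CLI, the file format is simple enough to parse yourself. The sketch below is our own illustration in Python, not Web2MD's actual code; `parse_url_list` is a hypothetical helper that follows the rules described above (one URL per line, lines starting with # are comments) and additionally assumes blank lines are ignored.

```python
def parse_url_list(text: str) -> list[str]:
    """Return the URLs in a batch file, skipping blank lines and # comments."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines are ignored; "#" lines are comments.
        if not line or line.startswith("#"):
            continue
        urls.append(line)
    return urls
```

Calling `parse_url_list(open("urls.txt", encoding="utf-8").read())` gives the list of pages a batch run would attempt.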
No global install. No setup. npx handles everything.
What the output looks like
Each saved file has proper YAML frontmatter that Obsidian reads immediately:
---
title: "Attention Is All You Need"
source: "https://arxiv.org/abs/1706.03762"
date: "2026-04-07"
wordCount: 3841
---
# Attention Is All You Need
...
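For readers who want to produce compatible files from their own scripts, here is a minimal Python sketch of the naming scheme and frontmatter shown above. It is an approximation under stated assumptions: the slug rule (lowercase, non-alphanumeric runs become hyphens) is inferred from the sample filenames, and `slugify`, `raw_filename`, and `frontmatter` are hypothetical names, not part of the Web2MD CLI.

```python
import re
from datetime import date


def slugify(title: str) -> str:
    """Lowercase the title and join alphanumeric runs with hyphens (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def raw_filename(title: str, saved_on: date) -> str:
    """Build a YYYY-MM-DD-article-title.md filename."""
    return f"{saved_on.isoformat()}-{slugify(title)}.md"


def _quote(value: str) -> str:
    """Double-quote a YAML scalar, escaping backslashes and embedded quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def frontmatter(title: str, source: str, saved_on: date, word_count: int) -> str:
    """Render the four fields from the sample as a YAML frontmatter block."""
    return "\n".join([
        "---",
        f"title: {_quote(title)}",
        f"source: {_quote(source)}",
        f"date: {_quote(saved_on.isoformat())}",
        f"wordCount: {word_count}",
        "---",
    ])
```

The word count is passed in rather than computed, since how the CLI counts words is not specified here.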
The INDEX.md file grows as an Obsidian-compatible wikilink index:
# Knowledge Base Index
LLM-maintained index of all raw sources. Do not edit manually.
- [[raw/2026-04-07-attention-is-all-you-need.md|Attention Is All You Need]] — source: https://arxiv.org/abs/1706.03762 · 3841 words · 2026-04-07
- [[raw/2026-04-07-scaling-laws-for-neural-language-models.md|Scaling Laws for Neural Language Models]] — source: https://arxiv.org/abs/2001.08361 · 5210 words · 2026-04-07
Drop the folder into Obsidian, and it immediately renders as a linked knowledge base — clickable wikilinks, full-text search, graph view.
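The index entries follow a regular pattern, so they are easy to generate or validate in a script. The Python sketch below reproduces the line format from the sample above; `index_entry` is a hypothetical helper of ours, and the two Unicode escapes stand for the dash and the middle dot that appear in the sample lines.

```python
EM_DASH = "\u2014"  # separator before "source:" in the sample entries
MID_DOT = "\u00b7"  # separator between the trailing fields


def index_entry(rel_path: str, title: str, source: str, words: int, day: str) -> str:
    """Format one INDEX.md bullet as an Obsidian wikilink plus metadata."""
    # Assumption: titles contain no "|" or "]]", which would break the wikilink.
    link = f"[[{rel_path}|{title}]]"
    return f"- {link} {EM_DASH} source: {source} {MID_DOT} {words} words {MID_DOT} {day}"
```

Feeding it the path, title, source, word count, and date of a saved file yields a line in the same shape as the entries shown in the sample index.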
All four new flags
| Flag | What it does |
|------|-------------|
| --batch <file> | Read URLs from a file, one per line. # lines are comments. |
| --output-dir <dir> | Write each URL to a separate .md file in this directory, with YAML frontmatter. |
| --vault <dir> | Obsidian vault mode: saves to <dir>/raw/ and updates <dir>/INDEX.md. |
| --concurrency <n> | Number of parallel fetches (default 3, max 20). |
The flags compose. --batch + --vault together is the Karpathy workflow in one line. --batch + --output-dir without --vault gives you flat files in any directory. Single URLs still work as before.
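If you drive the CLI from a script, the composition rules can be captured in a small argument builder. This Python sketch uses only the four flags from the table; `web2md_batch_args` is a hypothetical wrapper, the lower bound of 1 for concurrency is our assumption, and what happens when --vault and --output-dir are combined is not covered here, so the sketch simply passes through whatever you give it.

```python
def web2md_batch_args(batch_file, vault=None, output_dir=None, concurrency=3):
    """Build an argv list for a batch run using the documented flags."""
    # The table gives a default of 3 and a maximum of 20; the minimum of 1 is assumed.
    if not 1 <= concurrency <= 20:
        raise ValueError("concurrency must be between 1 and 20")
    args = ["npx", "web2md", "--batch", batch_file]
    if vault is not None:
        args += ["--vault", vault]
    if output_dir is not None:
        args += ["--output-dir", output_dir]
    args += ["--concurrency", str(concurrency)]
    return args
```

The result can be passed to `subprocess.run(args, check=True)`. Note that `~` is expanded by the shell, not by `subprocess`, so expand paths with `os.path.expanduser` first.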
If any URL fails, it is logged to failed.txt in the output directory with the error message. You can re-run just the failures.
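One way to re-run the failures is to turn failed.txt back into a batch file. The exact layout of failed.txt is not specified in this post, so the Python sketch below makes a loud assumption: each line contains the failed URL before its error message. It pulls the first http(s) URL from every line; `failed_urls` is a hypothetical helper.

```python
import re

URL_RE = re.compile(r"https?://\S+")


def failed_urls(log_text: str) -> list[str]:
    """Return the first http(s) URL on each line, de-duplicated in order."""
    seen, urls = set(), []
    for line in log_text.splitlines():
        match = URL_RE.search(line)
        if not match:
            continue
        # Assumption: a trailing ":" "," or ";" is a separator, not part of the URL.
        url = match.group(0).rstrip(":,;")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Write the result to a new file, one URL per line, and pass that file to --batch for the retry.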
Who this is for
If you are building a personal research library, preparing training data, running a RAG pipeline, or just trying to keep a reading list in a format an AI can query, this is the missing piece. The tooling that used to require custom Python scripts now runs in one terminal command, with nothing to install beyond Node.js.
For a deeper look at how Markdown reduces token costs and improves LLM output quality, see Why Markdown Improves LLM Output Quality.
Compile Wiki — AI Turns Your Collection into a Knowledge Base
Vault mode handles ingest. Compile Wiki handles the compile step — turning a pile of raw articles into a structured, interlinked wiki.
The button lives in the Collections panel of the Web2MD extension. Hover over any collection to see it (the small brain icon). Click it, and within 20-30 seconds you get a .zip file ready to open in Obsidian.
What is inside the zip:
my-research-wiki.zip
├── INDEX.md          ← links to every raw article and every concept
├── raw/
│   ├── 2026-04-07-article-one.md
│   ├── 2026-04-07-article-two.md
│   └── ...
└── concepts/
    ├── transformer-architecture.md
    ├── scaling-laws.md
    ├── attention-mechanism.md
    └── ...
Each concept article is a genuine AI-written wiki entry — not just a summary, but a synthesized explanation that draws from all the articles in your collection. At the bottom of each concept article, you get a "Related Articles" section with wikilinks back to the original sources.
The INDEX.md links everything together in Obsidian wikilink format. Open the vault, and Obsidian's graph view shows you exactly how concepts connect to sources.
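If you want to inspect those links outside Obsidian, the wikilink syntax is easy to extract. The Python sketch below handles the standard `[[target]]`, `[[target|alias]]`, and `[[target#heading]]` forms; `wikilinks` is a hypothetical helper, and the precise contents of a compiled INDEX.md are an assumption on our part.

```python
import re

# target, optional #heading (discarded), optional |alias
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|([^\]]*))?\]\]")


def wikilinks(markdown: str) -> list[tuple[str, str]]:
    """Return (target, display text) pairs for every wikilink in the text."""
    pairs = []
    for target, alias in WIKILINK_RE.findall(markdown):
        # Without an alias, Obsidian displays the target itself.
        pairs.append((target.strip(), (alias or target).strip()))
    return pairs
```

Grouping the targets by their leading folder (raw/ versus concepts/) gives a quick count of sources and concepts in a vault.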
What makes this different from a summary
A summary tells you what one article says. Compile Wiki identifies the themes that run across your entire collection — things like "this concept appears in five of your twelve articles, and here is the synthesis."
If you collected a dozen articles about transformer models, Compile Wiki might extract concepts like "Attention Mechanism," "Positional Encoding," "Scaling Laws," and "Fine-Tuning Strategies" — each with a full wiki article drawing from everything you saved.
This is Karpathy's step 2, the compile step. It turns a folder of raw sources into something you can actually think with.
The full workflow, end to end
# Step 1: Ingest raw sources via CLI
npx web2md --vault ~/Research/ --batch papers.txt --concurrency 5
# Step 2: Open in Obsidian, browse raw/ articles
# (optional: add more pages via the browser extension)
# Step 3: In the Web2MD extension Collections panel,
# click Compile Wiki on your collection
# → downloads my-collection-wiki.zip
# Step 4: Unzip into Obsidian vault
# → browse concepts/, query with an AI, let answers loop back in
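Step 4 can also be scripted. The following Python sketch extracts a downloaded wiki archive into a vault folder using the standard library; `unzip_into_vault` is a hypothetical helper, and the archive and vault paths are whatever you choose.

```python
import zipfile
from pathlib import Path


def unzip_into_vault(zip_path: str, vault_dir: str) -> list[str]:
    """Extract a wiki archive into the vault and return the archive's member names."""
    vault = Path(vault_dir).expanduser()
    vault.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Existing files with the same names (for example INDEX.md) are overwritten.
        archive.extractall(vault)
        return archive.namelist()
```

Because both the CLI vault and the zip contain an INDEX.md, extracting straight into ~/Research/ replaces the CLI's index. Extracting into a subfolder of the vault avoids that if you want to keep both.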
A note on the CLI
Web2MD ships as an npm package called web2md-cli. You do not need to install anything globally — npx web2md works on any machine with Node.js 18+.
The CLI was introduced in a previous release for single-URL conversions and agent-friendly pipelines. Vault mode is the natural extension: the same zero-setup philosophy, now supporting the full Karpathy workflow.
For local development, build the CLI with cd packages/cli && pnpm build. The package is published to npm automatically with each release.
What is next
The current Compile Wiki button runs AI on demand, per collection. The next step is making this ambient — an option to auto-compile after each new article is added, so the wiki stays current without any manual steps.
We are also looking at making the concept articles editable within the extension, so you can annotate, correct, or expand them before the next compile cycle.
If you are using vault mode or Compile Wiki in your workflow, we want to hear about it. The best feedback is specific: what the collection is about, how many articles it contains, what the AI got right, and what it missed.
Web2MD is a free Chrome extension. The CLI is open source. Compile Wiki is a Pro feature (the AI calls are not free). Try it.