What is Andrej Karpathy's LLM knowledge base workflow?

Four steps: (1) collect raw sources (articles, papers, repos) as .md files in a `raw/` folder; (2) feed the raw files to an LLM that compiles a structured wiki with concept articles and backlinks; (3) browse the wiki in Obsidian; (4) run Q&A against the wiki, looping new answers back in as entries. Karpathy described the setup as 'a hacky collection of scripts' and said there was room for a real product instead.

How does Web2MD's vault mode replace Karpathy's shell scripts?

`npx web2md --vault ~/Research/ --batch urls.txt --concurrency 5` does the full step-1 ingest in one command. Reads URLs from `urls.txt`, converts each to Markdown in parallel, saves to `~/Research/raw/YYYY-MM-DD-title.md` with YAML frontmatter, appends Obsidian wikilink entries to `INDEX.md`. No global install, no setup.

What does the Compile Wiki feature do?

It takes the raw Markdown articles in one of your collections and runs an LLM pass that synthesizes concept articles, backlinks, and an index, automating Karpathy's step 2. You launch it from the extension's Collections panel. The output is Obsidian-compatible Markdown you can browse, search, and extend with new captures.

Do I need Obsidian to use Web2MD's vault mode?

No — the output is plain Markdown with YAML frontmatter. Works in any editor (VS Code, Typora) or as input to other RAG systems. Obsidian is the recommended viewer because its graph view and backlinks match Karpathy's workflow design, but the .md files are portable.

How is this different from just using Firecrawl or Jina Reader?

Firecrawl and Jina handle the fetch+convert step but stop there. Web2MD's CLI adds Obsidian-aware output (YAML frontmatter, wikilink INDEX.md, file naming for vault conventions) and the Compile Wiki button automates the LLM-synthesis step. End-to-end knowledge base pipeline in one tool.

What does the YAML frontmatter look like for ingested files?

Each saved file gets `title`, `source` URL, `date`, and `wordCount` in YAML at the top — Obsidian renders these as properties immediately. Example: `title: 'Attention Is All You Need'`, `source: 'https://arxiv.org/abs/1706.03762'`, `date: '2026-04-07'`, `wordCount: 3841`. Standard Obsidian properties syntax.

Build Andrej Karpathy's LLM Knowledge Base in One Command

Andrej Karpathy, one of the most followed researchers in AI, recently described how he builds personal knowledge bases for LLM research. His workflow has four steps:

1. Collect raw sources (web articles, papers, repos) and convert each to .md in a raw/ folder
2. Feed the raw files to an LLM, which compiles them into a structured wiki: concept articles, backlinks, an index
3. Open the wiki in Obsidian to browse and search
4. Run Q&A against the wiki; the AI's answers loop back as new entries

He added: "I think there is room here for an incredible new product instead of a hacky collection of scripts."

That comment stuck with us. Web2MD was already the best tool for step 1, but it worked one URL at a time, mostly through the browser extension. Today we are closing that gap with two new capabilities: vault mode for the CLI and Compile Wiki for the extension's Collections panel.

CLI Vault Mode — One Command to Ingest

The new CLI flags let you replace Karpathy's shell scripts with a single npx command.

npx web2md --vault ~/Research/ --batch urls.txt --concurrency 5

This command:

Reads every URL from urls.txt (one per line, # for comments)
Converts each page to clean Markdown in parallel (5 at a time)
Saves each file to ~/Research/raw/YYYY-MM-DD-article-title.md with YAML frontmatter
Appends an Obsidian wikilink entry to ~/Research/INDEX.md
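
The batch-file rule in the first item (one URL per line, # for comments) is easy to pin down in code. Below is a minimal TypeScript sketch, not Web2MD's source: `parseBatchFile` is a hypothetical name, and ignoring blank lines and surrounding whitespace is an assumption the post does not state.

```typescript
// Hypothetical sketch of reading a --batch file: one URL per line,
// lines starting with "#" are comments. Not Web2MD's actual code.
function parseBatchFile(text: string): string[] {
  return text
    .split(/\r?\n/) // handle both LF and CRLF line endings
    .map((line) => line.trim())
    // Assumption: blank lines are skipped. Only lines that *start* with "#"
    // count as comments, so a URL like "page#section" keeps its fragment.
    .filter((line) => line.length > 0 && !line.startsWith("#"));
}

// Usage: the comment line and the blank line are dropped, leaving two URLs.
const urls = parseBatchFile(
  "# transformers reading list\nhttps://arxiv.org/abs/1706.03762\n\nhttps://arxiv.org/abs/2001.08361\n",
);
```

Treating only whole lines that begin with # as comments is the safe choice here: stripping everything after any # would truncate URLs that carry a fragment.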

No global install. No setup. npx handles everything.

What the output looks like

Each saved file has proper YAML frontmatter that Obsidian reads immediately:

---
title: "Attention Is All You Need"
source: "https://arxiv.org/abs/1706.03762"
date: "2026-04-07"
wordCount: 3841
---

# Attention Is All You Need

...
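
The file name and frontmatter above follow a predictable pattern. This TypeScript sketch reproduces the sample output under assumptions: `slugify`, `vaultFilename`, and `frontmatter` are hypothetical helpers, the date is assumed to arrive already formatted as YYYY-MM-DD, and the slug rule (lowercase, every run of other characters collapsed to one hyphen) is inferred from the two sample filenames rather than taken from Web2MD.

```typescript
// Hypothetical sketch: derive a vault filename and YAML frontmatter for one
// converted page, matching the sample output. Not Web2MD's source.
interface PageMeta {
  title: string;
  source: string;
  date: string; // assumed already formatted as YYYY-MM-DD
  wordCount: number;
}

// "Attention Is All You Need" -> "attention-is-all-you-need"
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse each non-alphanumeric run into one hyphen
    .replace(/^-+|-+$/g, ""); // trim hyphens at either end
}

function vaultFilename(meta: PageMeta): string {
  return `${meta.date}-${slugify(meta.title)}.md`;
}

function frontmatter(meta: PageMeta): string {
  // A JSON string is also a valid YAML double-quoted scalar, so this quotes safely.
  const q = (s: string): string => JSON.stringify(s);
  return [
    "---",
    `title: ${q(meta.title)}`,
    `source: ${q(meta.source)}`,
    `date: ${q(meta.date)}`,
    `wordCount: ${meta.wordCount}`,
    "---",
    "",
  ].join("\n");
}
```

The ASCII-only slug is a simplification: a title with no Latin letters or digits would produce an empty slug, so a real implementation would need a fallback.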

The INDEX.md file grows as an Obsidian-compatible wikilink index:

# Knowledge Base Index

LLM-maintained index of all raw sources. Do not edit manually.

- [[raw/2026-04-07-attention-is-all-you-need.md|Attention Is All You Need]] — source: https://arxiv.org/abs/1706.03762 · 3841 words · 2026-04-07
- [[raw/2026-04-07-scaling-laws-for-neural-language-models.md|Scaling Laws for Neural Language Models]] — source: https://arxiv.org/abs/2001.08361 · 5210 words · 2026-04-07
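
Appending to INDEX.md is a string-building exercise. The sketch below is illustrative only: `formatIndexLine` and `appendToIndex` are hypothetical names, the line format is copied from the two sample entries, and starting a missing index with the sample header is an assumption.

```typescript
// Hypothetical sketch of the INDEX.md bookkeeping shown above: build one
// wikilink line per saved file and append it. Not Web2MD's source.
interface IndexEntry {
  file: string; // filename inside raw/, e.g. "2026-04-07-attention-is-all-you-need.md"
  title: string;
  source: string;
  wordCount: number;
  date: string;
}

// Header copied from the sample INDEX.md.
const INDEX_HEADER =
  "# Knowledge Base Index\n\nLLM-maintained index of all raw sources. Do not edit manually.\n\n";

function formatIndexLine(e: IndexEntry): string {
  // [[path|label]] is Obsidian's wikilink-with-alias syntax.
  // \u2014 and \u00b7 are the dash and middle dot used in the sample lines.
  return `- [[raw/${e.file}|${e.title}]] \u2014 source: ${e.source} \u00b7 ${e.wordCount} words \u00b7 ${e.date}`;
}

function appendToIndex(current: string, e: IndexEntry): string {
  // Assumption: an empty or missing INDEX.md starts with the sample header.
  const base =
    current.trim() === "" ? INDEX_HEADER : current.endsWith("\n") ? current : current + "\n";
  return base + formatIndexLine(e) + "\n";
}
```

A title containing | or ]] would break the wikilink alias, so a production version would need to sanitize titles before writing them.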

Open the folder as a vault in Obsidian, and it immediately renders as a linked knowledge base: clickable wikilinks, full-text search, and graph view.

All four new flags

| Flag | What it does |
|------|-------------|
| --batch <file> | Read URLs from a file, one per line. # lines are comments. |
| --output-dir <dir> | Write each URL to a separate .md file in this directory, with YAML frontmatter. |
| --vault <dir> | Obsidian vault mode: saves to <dir>/raw/ and updates <dir>/INDEX.md. |
| --concurrency <n> | Number of parallel fetches (default 3, max 20). |

The flags compose. --batch + --vault together is the Karpathy workflow in one line. --batch + --output-dir without --vault gives you flat files in any directory. Single URLs still work as before.
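
One way to picture how the flags compose is to resolve them into a plan before fetching anything. This is a hypothetical TypeScript sketch: `resolvePlan` and its types are invented for illustration, --batch is left out because it does not affect where files go, the default of 3 and the cap of 20 come from the table, and everything else (raising values below 1, letting --vault win over --output-dir) is an assumption. The shell expands ~/Research/ before the CLI runs, so the sketch works with already-expanded paths.

```typescript
// Hypothetical sketch: resolve --output-dir / --vault / --concurrency into a
// plan before any fetching starts. Names and precedence are assumptions.
interface CliFlags {
  outputDir?: string;
  vault?: string;
  concurrency?: number;
}

interface Plan {
  saveDir?: string; // where .md files are written; undefined when neither directory flag is set
  indexPath?: string; // INDEX.md to update (vault mode only)
  concurrency: number;
}

const DEFAULT_CONCURRENCY = 3; // default from the flags table
const MAX_CONCURRENCY = 20; // maximum from the flags table

function stripTrailingSlash(dir: string): string {
  return dir.replace(/\/+$/, "");
}

function resolvePlan(flags: CliFlags): Plan {
  const requested =
    flags.concurrency !== undefined && Number.isFinite(flags.concurrency)
      ? Math.floor(flags.concurrency)
      : DEFAULT_CONCURRENCY;
  // Cap at the documented maximum; raising values below 1 to 1 is an assumption.
  const concurrency = Math.min(MAX_CONCURRENCY, Math.max(1, requested));

  if (flags.vault) {
    // Assumption: if both --vault and --output-dir are given, --vault wins.
    const root = stripTrailingSlash(flags.vault);
    return { saveDir: `${root}/raw`, indexPath: `${root}/INDEX.md`, concurrency };
  }
  if (flags.outputDir) {
    return { saveDir: stripTrailingSlash(flags.outputDir), concurrency };
  }
  return { concurrency };
}
```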

If any URL fails, it is logged to failed.txt in the output directory with the error message. You can re-run just the failures.
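
The parallel fetch and the failure log fit together naturally as a bounded worker pool. The sketch below is an illustration under assumptions, not Web2MD's implementation: `runBatch` and `formatFailedTxt` are hypothetical names, and the failed.txt layout (each error as a # comment line above its URL) is invented so that, under the batch-file rules above, the file could be passed straight back to --batch. The post itself only says failures are logged with their error message.

```typescript
// Hypothetical sketch: convert URLs with at most `concurrency` calls in
// flight, collecting failures instead of aborting the batch. Not Web2MD's code.
interface Failure {
  url: string;
  error: string;
}

interface BatchResult<T> {
  ok: { url: string; value: T }[]; // in completion order, not input order
  failed: Failure[];
}

async function runBatch<T>(
  urls: string[],
  concurrency: number,
  convert: (url: string) => Promise<T>,
): Promise<BatchResult<T>> {
  const result: BatchResult<T> = { ok: [], failed: [] };
  let next = 0; // shared cursor; safe because JavaScript callbacks never run simultaneously

  async function worker(): Promise<void> {
    while (next < urls.length) {
      const url = urls[next++]; // claim the next URL before awaiting
      try {
        result.ok.push({ url, value: await convert(url) });
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        result.failed.push({ url, error: message });
      }
    }
  }

  // Start up to `concurrency` workers (at least one); each pulls URLs until none remain.
  const workerCount = Math.max(1, Math.min(concurrency, urls.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return result;
}

// Assumed failed.txt layout: the error as a "#" comment line, then the URL,
// so the file would itself be a valid --batch input.
function formatFailedTxt(failed: Failure[]): string {
  return failed.map((f) => `# ${f.error.replace(/\s+/g, " ")}\n${f.url}\n`).join("");
}
```

A shared cursor with N workers keeps exactly N conversions in flight until the list runs out, which is simpler than batching URLs into fixed groups of N and waiting for the slowest one in each group.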

Who this is for

If you are building a personal research library, preparing training data, running a RAG pipeline, or just trying to keep a reading list in a format an AI can query — this is the missing piece. The tooling that used to require custom Python scripts now runs in one terminal command, with no dependencies to install.

For a deeper look at how Markdown reduces token costs and improves LLM output quality, see Why Markdown Improves LLM Output Quality.

Compile Wiki — AI Turns Your Collection into a Knowledge Base

Vault mode handles ingest. Compile Wiki handles the compile step — turning a pile of raw articles into a structured, interlinked wiki.

The button lives in the Collections panel of the Web2MD extension. Hover over any collection to see it (the small brain icon). Click it, and within 20-30 seconds you get a .zip file ready to open in Obsidian.

What is inside the zip:

my-research-wiki.zip
├── INDEX.md              ← links to every raw article and every concept
├── raw/
│   ├── 2026-04-07-article-one.md
│   ├── 2026-04-07-article-two.md
│   └── ...
└── concepts/
    ├── transformer-architecture.md
    ├── scaling-laws.md
    ├── attention-mechanism.md
    └── ...

Each concept article is a genuine AI-written wiki entry — not just a summary, but a synthesized explanation that draws from all the articles in your collection. At the bottom of each concept article, you get a "Related Articles" section with wikilinks back to the original sources.

The INDEX.md links everything together in Obsidian wikilink format. Open the vault, and Obsidian's graph view shows you exactly how concepts connect to sources.
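
The link structure of the compiled wiki can be sketched separately from the AI writing. In this hypothetical TypeScript sketch, the concept text itself is out of scope, the `sources` list on each concept is an assumed output of the LLM pass, and the heading names are placeholders; only the raw/ and concepts/ layout comes from the zip listing above.

```typescript
// Hypothetical sketch of the compiled wiki's link structure: each concept
// page ends with a "Related Articles" list, and INDEX.md links every raw
// article and every concept. The LLM-written body is not modeled here.
interface RawArticle {
  file: string; // e.g. "2026-04-07-article-one.md" inside raw/
  title: string;
}

interface Concept {
  file: string; // e.g. "scaling-laws.md" inside concepts/
  title: string;
  sources: RawArticle[]; // assumption: the LLM pass reports which articles fed each concept
}

function wikilink(path: string, label: string): string {
  return `[[${path}|${label}]]`; // Obsidian wikilink with a display alias
}

function relatedArticlesSection(c: Concept): string {
  // Assumption: a level-2 heading; the post only names the section.
  const lines = c.sources.map((a) => `- ${wikilink(`raw/${a.file}`, a.title)}`);
  return ["## Related Articles", "", ...lines, ""].join("\n");
}

function buildWikiIndex(raw: RawArticle[], concepts: Concept[]): string {
  // Placeholder headings; the real INDEX.md layout is not documented.
  return [
    "# Index",
    "",
    "## Concepts",
    ...concepts.map((c) => `- ${wikilink(`concepts/${c.file}`, c.title)}`),
    "",
    "## Raw articles",
    ...raw.map((a) => `- ${wikilink(`raw/${a.file}`, a.title)}`),
    "",
  ].join("\n");
}
```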

What makes this different from a summary

A summary tells you what one article says. Compile Wiki identifies the themes that run across your entire collection — things like "this concept appears in five of your twelve articles, and here is the synthesis."

If you collected a dozen articles about transformer models, Compile Wiki might extract concepts like "Attention Mechanism," "Positional Encoding," "Scaling Laws," and "Fine-Tuning Strategies" — each with a full wiki article drawing from everything you saved.
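
Compile Wiki does this with an LLM, and the post does not describe its internals. The sketch below only illustrates the "appears in N of M articles" idea as a post-processing step: it assumes the model has already returned, for each candidate concept, the articles it drew on, and it keeps concepts found in at least two articles. `crossCutting`, `coverageLabel`, the input shape, and the threshold of two are all hypothetical.

```typescript
// Hypothetical post-processing step, not Compile Wiki's implementation:
// keep concepts that span several articles and report their coverage.
interface CandidateConcept {
  name: string;
  articleIds: string[]; // articles the LLM cited for this concept (assumed output shape)
}

interface CrossCuttingConcept {
  name: string;
  count: number;
  total: number;
}

function crossCutting(
  candidates: CandidateConcept[],
  totalArticles: number,
  minArticles = 2, // assumption: a theme must appear in at least two articles
): CrossCuttingConcept[] {
  return candidates
    .map((c) => ({ name: c.name, count: new Set(c.articleIds).size, total: totalArticles }))
    .filter((c) => c.count >= minArticles) // single-article topics are just summaries
    .sort((a, b) => b.count - a.count || a.name.localeCompare(b.name));
}

function coverageLabel(c: CrossCuttingConcept): string {
  return `${c.name}: appears in ${c.count} of your ${c.total} articles`;
}
```

Counting distinct article IDs with a Set guards against the model citing the same article twice for one concept.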

This is Karpathy's step 2, the compile step. It turns a folder of raw sources into something you can actually think with.

The full workflow, end to end

# Step 1: Ingest raw sources via CLI
npx web2md --vault ~/Research/ --batch papers.txt --concurrency 5

# Step 2: Open in Obsidian, browse raw/ articles
# (optional: add more pages via the browser extension)

# Step 3: In the Web2MD extension Collections panel,
# click Compile Wiki on your collection
# → downloads my-collection-wiki.zip

# Step 4: Unzip into Obsidian vault
# → browse concepts/, query with an AI, let answers loop back in

A note on the CLI

Web2MD ships as an npm package called web2md-cli. You do not need to install anything globally — npx web2md works on any machine with Node.js 18+.

The CLI was introduced in a previous release for single-URL conversions and agent-friendly pipelines. Vault mode is the natural extension: the same zero-setup philosophy, now supporting the full Karpathy workflow.

For local development, build with cd packages/cli && pnpm build. The package is published to npm automatically with each release.

What is next

The current Compile Wiki button runs AI on demand, per collection. The next step is making this ambient — an option to auto-compile after each new article is added, so the wiki stays current without any manual steps.

We are also looking at making the concept articles editable within the extension, so you can annotate, correct, or expand them before the next compile cycle.

If you are using vault mode or Compile Wiki in your workflow, we want to hear about it. The most useful feedback is specific: what the collection is about, how many articles it holds, what the AI got right, and what it missed.

Web2MD is a free Chrome extension. The CLI is open source. Compile Wiki is a Pro feature (the AI calls are not free). Try it.

