CLI - Web2MD

Overview

The Web2MD CLI converts web pages to clean Markdown directly from your terminal. You can pipe the output to LLMs, batch-process lists of URLs, or ingest content into your Obsidian vault, all without opening a browser.

npx web2md https://example.com/article

The CLI requires Node.js 18+. Run node -v to check your version.
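If you wrap the CLI in a script, you can check this requirement before running it. The following is a minimal sketch. node_version_ok is a hypothetical helper, not part of Web2MD, and it parses the vX.Y.Z string that node -v prints:

```shell
# Succeed if a `node -v` string such as "v20.11.0" has major version 18 or newer.
node_version_ok() {
  local v=${1#v}        # strip the leading "v": "20.11.0"
  local major=${v%%.*}  # keep the text before the first dot: "20"
  # An empty or non-numeric major makes `[` fail; its error text is discarded.
  [ "$major" -ge 18 ] 2>/dev/null
}
```

For example:

node_version_ok "$(node -v)" || echo "web2md needs Node.js 18+" >&2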

Installation

No installation is needed. Run it directly with npx:

npx web2md <url> [options]

Or install globally for faster startup:

npm install -g web2md
web2md <url> [options]

Modes

The Web2MD CLI operates in one of three modes, depending on your configuration:

Local

The default mode. No API key is required: pages are fetched and converted locally. Works for most public websites.

Server

Enabled when an API key is set. Set WEB2MD_API_KEY to unlock Reddit, Fandom/Wikia, and other restricted sites that require server-side handling.

Bridge

Enabled with the --bridge flag. Uses the Web2MD Chrome extension to fetch JS-rendered or login-protected pages that static fetching cannot reach.

Flags

Flag	Description
`--no-images`	Strip image references from output
`--no-links`	Strip hyperlinks from output
`--meta`	Add YAML frontmatter (title, source, wordCount, tokenCount, readingTime, date)
`--json`	Output as JSON `{ markdown, metadata }`
`-o, --output <file>`	Write output to a file
`--output-dir <dir>`	Write each URL to a separate `.md` file
`--batch <file>`	Read URLs from a file (one per line, `#` = comment)
`--vault <dir>`	Obsidian vault mode: saves to `<dir>/raw/` and updates `<dir>/INDEX.md`
`--concurrency <n>`	Max parallel fetches (default: 3, max: 20)
`--bridge`	Use Chrome extension for JS-rendered or login-protected sites
`-q, --quiet`	Suppress progress messages

Environment variables

Variable	Description
`WEB2MD_API_KEY`	API key (`w2m_xxx`) for Reddit and restricted sites
`WEB2MD_API_URL`	Override the API base URL
`WEB2MD_EXTENSION_ID`	Override the Chrome extension ID for `--bridge` mode

Add these to your shell profile (~/.zshrc or ~/.bashrc) so they persist across sessions:

export WEB2MD_API_KEY="w2m_your_key_here"
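To keep the export across sessions without adding a duplicate line each time you run the setup, you can append it only when it is missing. The following is a minimal sketch. persist_export is a hypothetical helper, and it assumes the value contains no double quotes, backslashes, or `$`. Keys in the documented w2m_xxx form meet that assumption.

```shell
# Append `export NAME="VALUE"` to a profile file unless NAME is already exported there,
# so running this twice does not leave duplicate lines.
# Assumes VALUE contains no double quotes, backslashes, or `$`.
persist_export() {
  local profile=$1 name=$2 value=$3
  if grep -q "^export ${name}=" "$profile" 2>/dev/null; then
    echo "$name is already exported in $profile; leaving it unchanged" >&2
  else
    printf 'export %s="%s"\n' "$name" "$value" >> "$profile"
  fi
}
```

For example:

persist_export ~/.zshrc WEB2MD_API_KEY "w2m_your_key_here"

Then open a new terminal or run source ~/.zshrc.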

Usage examples

Basic conversion

npx web2md https://example.com/article

Prints Markdown to stdout.

Pipe to an LLM

npx web2md https://react.dev/learn/thinking-in-react | llm "Summarize this page"

npx web2md https://docs.python.org/3/tutorial/classes.html | claude "Explain the key concepts"

Save to file

npx web2md https://example.com/article -o article.md

npx web2md https://example.com/article --meta -o article.md

The --meta flag prepends YAML frontmatter with title, source URL, word count, token count, reading time, and date.
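Because --meta output is ordinary Markdown with frontmatter, you can read individual fields back with standard tools. The following is a minimal awk sketch. It assumes the frontmatter is delimited by --- lines, uses simple key: value pairs, and names its keys like the JSON metadata fields (title, source, tokenCount, and so on). frontmatter_get is a hypothetical helper.

```shell
# Print the value of one frontmatter key from a Markdown file: frontmatter_get FILE KEY
# Assumes `---`-delimited YAML frontmatter with one `key: value` pair per line.
frontmatter_get() {
  awk -v key="$2" '
    NR == 1 && $0 == "---" { in_fm = 1; next }   # opening delimiter on line 1
    in_fm && $0 == "---"   { exit }              # closing delimiter: key not found
    in_fm && index($0, key ": ") == 1 {
      val = substr($0, length(key) + 3)          # text after "key: "
      gsub(/^"|"$/, "", val)                     # drop surrounding double quotes
      print val
      exit
    }
  ' "$1"
}
```

For example, after saving with --meta -o article.md:

frontmatter_get article.md tokenCount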

Batch from file

Create a file urls.txt:

# Research papers
https://arxiv.org/abs/2301.00001
https://arxiv.org/abs/2301.00002

# Blog posts
https://example.com/blog/post-1
https://example.com/blog/post-2

Then run:

npx web2md --batch urls.txt --output-dir ./research --concurrency 5

Each URL is saved as a separate .md file in the ./research directory.
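Before a large run, it can help to see exactly which lines will be fetched. The following sketch follows the documented file format: one URL per line, with # marking a comment. It assumes that blank lines and lines with whitespace before the # are also skipped. batch_urls is a hypothetical helper.

```shell
# Print the URLs a --batch file would supply: drop lines whose first
# non-space character is `#` (comments) and lines that are empty or all spaces.
batch_urls() {
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' -- "$1"
}
```

For example, the following command counts the pages a run will fetch. That count can help you choose a --concurrency value (maximum 20):

batch_urls urls.txt | wc -l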

Obsidian vault ingestion

npx web2md --batch urls.txt --vault ~/Documents/MyVault

This saves each page to ~/Documents/MyVault/raw/ and updates ~/Documents/MyVault/INDEX.md with links to all converted pages.
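After an ingestion run, you can roughly check that every saved page is linked from the index. This sketch assumes that pages in raw/ are .md files and that INDEX.md mentions each page by its file name. The exact link format Web2MD writes is not documented here. unindexed_pages is a hypothetical helper.

```shell
# List .md files in <vault>/raw/ whose base name does not appear in <vault>/INDEX.md.
# Assumes INDEX.md references each page by its file name (link format not documented).
unindexed_pages() {
  local vault=$1
  find "$vault/raw" -maxdepth 1 -type f -name '*.md' | sort | while IFS= read -r f; do
    name=$(basename "$f" .md)   # e.g. raw/post-1.md -> post-1
    grep -qF -- "$name" "$vault/INDEX.md" || printf '%s\n' "$f"
  done
}
```

For example, the following command prints nothing when every page appears in INDEX.md:

unindexed_pages ~/Documents/MyVault

Because the check matches substrings, a page named post-1 is also counted as indexed when only post-10 is linked.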

Reddit with API key

export WEB2MD_API_KEY="w2m_your_key_here"
npx web2md https://www.reddit.com/r/LocalLLaMA/comments/example

Reddit requires a valid API key. Without one, Reddit URLs will fail due to Reddit’s bot restrictions.
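In scripts, you can check for the key before starting instead of waiting for the Reddit fetch to fail. The following is a minimal sketch. require_w2m_key is a hypothetical helper. The w2m_ prefix check is inferred from the documented key format (w2m_xxx) and is not a Web2MD validation rule.

```shell
# Succeed only if WEB2MD_API_KEY is set and starts with "w2m_"; otherwise explain why.
require_w2m_key() {
  case ${WEB2MD_API_KEY-} in
    w2m_?*) return 0 ;;
    "") echo "WEB2MD_API_KEY is not set; Reddit URLs need an API key" >&2; return 1 ;;
    *)  echo "WEB2MD_API_KEY does not start with w2m_" >&2; return 1 ;;
  esac
}
```

For example:

require_w2m_key && npx web2md https://www.reddit.com/r/LocalLLaMA/comments/example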

Bridge mode

Use the Chrome extension to handle JS-rendered or login-protected pages:

npx web2md --bridge https://app.example.com/dashboard

Bridge mode requires the Web2MD Chrome extension to be installed and Chrome to be running. The CLI communicates with the extension via Chrome’s native messaging protocol.
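If you regularly convert pages from the same login-protected hosts, a small wrapper can add --bridge only where it is needed. This is a hypothetical convenience and not part of Web2MD. W2M_BRIDGE_HOSTS is a variable you define yourself; the CLI does not read it.

```shell
# Hypothetical helper: succeed if the URL's host is listed in W2M_BRIDGE_HOSTS
# (a space-separated list you define yourself; web2md does not read this variable).
needs_bridge() {
  local host=${1#*://}   # drop the scheme, e.g. "https://"
  host=${host%%[/?#]*}   # drop the path, query, or fragment
  host=${host%%:*}       # drop a port such as ":8080"
  case " ${W2M_BRIDGE_HOSTS-} " in
    *" $host "*) return 0 ;;
  esac
  return 1
}

# Wrapper that adds --bridge only for the listed hosts.
w2m() {
  if needs_bridge "$1"; then
    npx web2md --bridge "$1"
  else
    npx web2md "$1"
  fi
}
```

For example, suppose you set W2M_BRIDGE_HOSTS="app.example.com". Then w2m https://app.example.com/dashboard uses bridge mode, and w2m https://example.com/article uses the default mode.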

JSON output

npx web2md --json https://example.com/article

Returns structured output:

{
  "markdown": "# Article Title\n\nContent here...",
  "metadata": {
    "title": "Article Title",
    "source": "https://example.com/article",
    "wordCount": 1250,
    "tokenCount": 1680,
    "readingTime": 5,
    "date": "2026-04-11T10:30:00.000Z"
  }
}

Useful for programmatic consumption or piping to jq:

npx web2md --json https://example.com/article | jq '.metadata.tokenCount'
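You can also use the tokenCount field to skip pages that would overflow a model's context window before you pipe them to an LLM. The following is a minimal sketch that relies on jq's -e flag, which sets the exit status from the boolean result. within_budget and the 8000-token limit below are illustrative assumptions.

```shell
# Read `web2md --json` output on stdin; succeed if metadata.tokenCount <= MAX.
# `jq -e` exits 0 for a true result and 1 for false or null.
# Note: a missing tokenCount is null, which jq sorts below numbers, so it passes.
within_budget() {
  jq -e --argjson max "$1" '.metadata.tokenCount <= $max' > /dev/null
}
```

For example:

json=$(npx web2md --json https://example.com/article)
printf '%s' "$json" | within_budget 8000 && printf '%s' "$json" | jq -r '.markdown' | llm "Summarize this page"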

Optimized sites

Web2MD includes built-in adapters for these sites, producing cleaner output than generic conversion:

Sites with optimized support

Wikipedia — clean article extraction, infobox handling
arXiv — paper abstracts and metadata
Hacker News — threads with comments
GitHub — Issues and Pull Requests
Stack Overflow — questions and answers
dev.to — blog posts
Medium — articles (bypasses paywall preview)
Substack — newsletter posts
OpenAI Docs — documentation pages
Mintlify-based docs — documentation sites built on Mintlify
Reddit — posts and comments (requires API key)

Common workflows

Feed documentation to an AI agent

npx web2md --batch docs-urls.txt --output-dir ./context --quiet

Point your AI agent’s context directory at ./context for grounded answers.
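If the URL list is assembled from several sources, duplicate entries cause the same page to be fetched twice. The following sketch removes repeated URLs but keeps the original order, comment lines, and blank lines. dedupe_batch is a hypothetical helper, and it compares the first whitespace-separated field of each line.

```shell
# Print a batch file with repeated URLs removed, keeping the first occurrence,
# plus every comment line and blank line in its original position.
# Inside the awk program, $1 is the first field of the line (the URL), not the shell argument.
dedupe_batch() {
  awk '/^[[:space:]]*#/ || NF == 0 || !seen[$1]++' "$1"
}
```

For example, run the following command, then pass docs-urls.unique.txt to --batch:

dedupe_batch docs-urls.txt > docs-urls.unique.txt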

Build a research corpus

npx web2md --batch papers.txt --vault ~/Obsidian/Research --meta --concurrency 10

Creates an indexed, searchable research vault in Obsidian.

Strip formatting for LLM input

npx web2md --no-images --no-links https://example.com/article | llm "Analyze this"

Removes images and links to reduce token usage when piping to LLMs.
