Firecrawl Costs Too Much for Hobby RAG — Here's a $9 Alternative That Uses Your Real Browser
Firecrawl earned its reputation. The crawl-then-extract API is genuinely the cleanest way to programmatically turn webpages into LLM-ready Markdown if you're building at scale. But for the majority of people building RAG systems — solo developers, researchers, indie founders, hobbyists — Firecrawl has two problems that aren't going away.
The first is price. The second is access.
This article is about the alternative architecture that solves both: instead of paying a server farm to dodge anti-bot defenses, run the extraction inside the browser you already have logged in. The economics flip and the access problem disappears.
The price problem
Firecrawl's pricing is structured as two separate buckets:
- Crawl — credits for fetching and rendering pages. Hobby is $19/mo for 3k credits; Standard is $99/mo for 16k credits.
- Extract — separate token-bucket for the LLM-based field extraction. Standard is $89/mo, Pro is $379/mo.
If you want both — and most real RAG workflows want both — you're at $188/mo minimum. Hit overage and you're paying more.
For an indie hacker building a research tool over a weekend, that's the difference between shipping and not shipping. The complaint shows up reliably on r/LocalLLaMA and r/RAG: "Firecrawl is egregiously expensive," "$188/mo is impossible for a hobby project," "the dual billing model feels designed to extract maximum revenue."
The access problem
Firecrawl runs on someone else's server. That's the entire architectural premise. The downside is unavoidable: a server-side scraper cannot see anything you've personally signed into.
The list of things this excludes is shockingly long once you start counting:
- Substack newsletters you subscribe to (returns paywalled excerpt)
- X / Twitter long posts (returns broken thread or login wall)
- LinkedIn articles your network shared (returns login wall)
- Medium articles past the metered paywall (returns excerpt)
- Reddit threads, intermittently, when Reddit's anti-bot escalates
- Cloudflare-challenged sites (large fraction of major news outlets)
- Notion pages in your workspace (returns login wall)
- Internal Confluence wikis at work
- Every Chinese social platform — Xiaohongshu, WeChat 公众号, Zhihu, Bilibili — where the anti-bot signing rotates on a monthly cadence
You can hack around individual cases by injecting cookies into Firecrawl, but that breaks the moment the target site rotates auth. People who spend more time maintaining cookie injections than building their RAG eventually give up and look for a different architecture.
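To make the cookie-injection workaround concrete, here is a hedged sketch. The `headers` field that forwards a `Cookie` value is an assumption about Firecrawl's scrape API surface (check its current docs before relying on it), and the endpoint URL and response shape are likewise assumptions:

```python
import json
import urllib.request

FIRECRAWL_SCRAPE = "https://api.firecrawl.dev/v1/scrape"  # assumed endpoint

def cookie_scrape_payload(url, cookie_header):
    """Build a scrape request that forwards your own session cookie.
    The `headers` field is an assumption about Firecrawl's API surface."""
    return {
        "url": url,
        "formats": ["markdown"],
        "headers": {"Cookie": cookie_header},  # breaks when the site rotates auth
    }

def scrape(url, cookie_header, api_key):
    """Fire the request (requires a live API key; not exercised here)."""
    req = urllib.request.Request(
        FIRECRAWL_SCRAPE,
        data=json.dumps(cookie_scrape_payload(url, cookie_header)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["data"]["markdown"]
```

The fragile part is exactly what the paragraph above describes: the moment the target site rotates its session cookie, `cookie_header` goes stale and every scrape silently returns the logged-out page.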
The flip: extract from the browser, not the server
Here's the inversion. When you, a real human, open a page in Chrome:
- Your cookies are valid
- Your TLS fingerprint is genuine
- Cloudflare's anti-bot challenge already passed
- The signed API requests fire correctly because you have a session
- JavaScript runs to completion and the DOM hydrates fully
The page you see is the completed extraction. The only question is how to read it programmatically without breaking the browser metaphor.
The answer is a Chrome extension with a local MCP server. The architecture:
- The AI agent (Claude Code, Cursor, or whatever you use) calls `agent_convert(url)` or `agent_batch_convert([url1, url2, …])` via MCP.
- The MCP server passes the call to a local native messaging host.
- The host opens the URL in a real Chrome tab, in your already-logged-in browser.
- After the page hydrates, the extension reads the rendered DOM, applies site-specific extractors where relevant, and returns clean Markdown.
- The agent receives the Markdown and continues.
This is what Web2MD does. The agent sees an interface that's structurally identical to Firecrawl — agent_batch_convert returns Markdown for up to 50 URLs in a single call. The implementation underneath is fundamentally different.
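The native-messaging hop in the second step is less exotic than it sounds: Chrome's native messaging protocol is just length-prefixed JSON over stdin/stdout. A minimal host loop (function names are mine, not Web2MD's) looks like:

```python
import json
import struct
import sys

def read_message(stream=sys.stdin.buffer):
    """Read one native-messaging frame: a 4-byte native-order length
    prefix, then that many bytes of UTF-8 JSON."""
    raw_len = stream.read(4)
    if len(raw_len) < 4:
        return None  # the browser closed the pipe
    (msg_len,) = struct.unpack("=I", raw_len)
    return json.loads(stream.read(msg_len).decode("utf-8"))

def write_message(msg, stream=sys.stdout.buffer):
    """Write one frame back to the extension."""
    data = json.dumps(msg).encode("utf-8")
    stream.write(struct.pack("=I", len(data)))
    stream.write(data)
    stream.flush()
```

The host's job is then a loop: read a `{url: …}` frame from the MCP server, ask the extension to open and extract it, and write the Markdown back as another frame.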
Cost comparison
| Workload | Firecrawl | Web2MD |
|---|---|---|
| 1k URLs/mo, no Extract | $19/mo (Hobby) | $9/mo (flat) |
| 5k URLs/mo with Extract | $188/mo (Standard + Extract) | $9/mo |
| 50k URLs/mo with Extract | $478/mo (Pro + Extract Pro) | $9/mo |
| Login-walled sites | Custom cookie injection, breaks monthly | Works (uses your session) |
| Cloudflare-challenged sites | Often blocked | Works (you already passed the challenge) |
| Chinese platforms | Doesn't work | Dedicated extractors for 4 platforms |
The catch: Web2MD scales with your browser, not your wallet. If you need overnight unattended scraping of 100k URLs, this is the wrong tool — Firecrawl or open-source crawlers like Crawl4AI are still the right choice. Web2MD is for the case where the user is in the loop or where the agent is running on the user's machine alongside their browser.
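For workloads that do fit the tool, the 50-URL cap on `agent_batch_convert` means chunking longer URL lists client-side. A trivial helper (the name is mine, for illustration):

```python
def batches(urls, size=50):
    """Chunk a URL list to respect the 50-URL cap per batch call."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

# e.g. results = [agent_batch_convert(chunk) for chunk in batches(all_urls)]
```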
What you give up
Three things, honestly:
1. Pure headless operation. A real Chrome tab opens and closes for each URL. On a fast machine and a fast connection this is sub-second per URL, but it's not the same as a server-side crawl that can hit thousands of URLs in parallel.
2. Pure open-source. The MCP server is open-source; the extension is closed-source. If purity matters, Crawl4AI is a great open-source alternative: you trade ease-of-use for full transparency.
3. Server-friendly deployment. Web2MD assumes you have Chrome on the same machine as your agent. If your agent runs on Lambda or in a Kubernetes pod, this won't work.
What you get
Three things, also honestly:
1. 20× cheaper. Flat $9/mo vs $188/mo means a hobbyist can actually run a RAG ingestion job without hitting credit limits. For solo builders, this is the difference between viable and not.
2. Anti-bot becomes a non-problem. You solve Cloudflare once, in your browser, by being a human. The scraper never sees the challenge.
3. Sites that are off-limits to every server-side scraper. The four Chinese platforms above. LinkedIn. Subscribed Substacks. Internal company tools.
Setup: three steps
For Claude Code or Cursor:
```shell
# 1. Install the extension from the Chrome Web Store (one-time, manual)

# 2. Install the native messaging host
npx web2md-mcp-server install
```

Then add the server to your MCP config (Claude Code: `~/.config/claude-code/mcp.json`):

```json
{
  "mcpServers": {
    "web2md": {
      "command": "npx",
      "args": ["-y", "web2md-mcp-server"]
    }
  }
}
```
Then your agent can call agent_convert or agent_batch_convert like any other MCP tool.
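What you do with the returned Markdown is up to your pipeline; a common next step for RAG is heading-aware chunking before embedding. A minimal sketch (the function and the 1,500-character limit are my own illustration, not part of Web2MD):

```python
import re

def markdown_chunks(md, max_chars=1500):
    """Split Markdown into heading-delimited chunks for embedding.
    Sections longer than max_chars are further split on blank lines."""
    sections = re.split(r"(?m)^(?=#{1,3} )", md)  # split before each H1-H3
    chunks = []
    for sec in sections:
        sec = sec.strip()
        if not sec:
            continue
        while len(sec) > max_chars:
            cut = sec.rfind("\n\n", 0, max_chars)
            if cut <= 0:
                cut = max_chars  # no paragraph break: hard cut
            chunks.append(sec[:cut].strip())
            sec = sec[cut:].strip()
        if sec:
            chunks.append(sec)
    return chunks
```

Each chunk keeps its heading as context, which tends to help retrieval quality more than fixed-size windows over raw text.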
When to use which
Use Firecrawl when:
- You need server-side, headless, parallel scraping at scale
- You're scraping pages that don't require login
- You're at scale where $188/mo is reasonable
Use Crawl4AI when:
- You want full open-source
- You're comfortable running infrastructure
- You don't mind maintaining selector configs
Use Web2MD when:
- You're a solo developer, researcher, or hobbyist building a RAG system
- Your sources include login-walled, Cloudflare-protected, or Chinese-platform pages
- You'd rather pay a flat $9/mo than manage credit buckets
The three tools serve different points on the cost-vs-control curve. Pick what matches your constraints.