Firecrawl Costs Too Much for Hobby RAG — Here's a $9 Alternative That Uses Your Real Browser
Firecrawl earned its reputation. The crawl-then-extract API is genuinely the cleanest way to programmatically turn webpages into LLM-ready Markdown if you're building at scale. But for the majority of people building RAG systems — solo developers, researchers, indie founders, hobbyists — Firecrawl has two problems that aren't going away.
The first is price. The second is access.
This article is about the alternative architecture that solves both: instead of paying a server farm to dodge anti-bot defenses, run the extraction inside the browser you already have logged in. The economics flip and the access problem disappears.
The price problem
Firecrawl's pricing is structured as two separate buckets:
- Crawl — credits for fetching and rendering pages. Hobby is $19/mo for 3k credits; Standard is $99/mo for 16k credits.
- Extract — separate token-bucket for the LLM-based field extraction. Standard is $89/mo, Pro is $379/mo.
If you want both — and most real RAG workflows want both — you're at $188/mo minimum. Hit overage and you're paying more.
For an indie hacker building a research tool over a weekend, that's the difference between shipping and not shipping. The complaint shows up reliably on r/LocalLLaMA and r/RAG: "Firecrawl is egregiously expensive," "$188/mo is impossible for a hobby project," "the dual billing model feels designed to extract maximum revenue."
The access problem
Firecrawl runs on someone else's server. That's the entire architectural premise. The downside is unavoidable: a server-side scraper cannot see anything you've personally signed into.
The list of things this excludes is shockingly long once you start counting:
- Substack newsletters you subscribe to (returns paywalled excerpt)
- X / Twitter long posts (returns broken thread or login wall)
- LinkedIn articles your network shared (returns login wall)
- Medium articles past the metered paywall (returns excerpt)
- Reddit threads, intermittently, when Reddit's anti-bot escalates
- Cloudflare-challenged sites (large fraction of major news outlets)
- Notion pages in your workspace (returns login wall)
- Internal Confluence wikis at work
- Every Chinese social platform — Xiaohongshu, WeChat 公众号, Zhihu, Bilibili — where the anti-bot signing rotates on a monthly cadence
You can hack around individual cases by injecting cookies into Firecrawl, but that breaks the moment the target site rotates auth. People who spend more time maintaining cookie injections than building their RAG eventually give up and look for a different architecture.
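To make the cookie-injection workaround concrete, here is a hedged sketch. The `headers` field that forwards a `Cookie` value is an assumption about Firecrawl's scrape API surface (check its current docs before relying on it), and the endpoint URL and response shape are likewise assumptions:

```python
import json
import urllib.request

FIRECRAWL_SCRAPE = "https://api.firecrawl.dev/v1/scrape"  # assumed endpoint

def cookie_scrape_payload(url, cookie_header):
    """Build a scrape request that forwards your own session cookie.
    The `headers` field is an assumption about Firecrawl's API surface."""
    return {
        "url": url,
        "formats": ["markdown"],
        "headers": {"Cookie": cookie_header},  # breaks when the site rotates auth
    }

def scrape(url, cookie_header, api_key):
    """Fire the request (requires a live API key; not exercised here)."""
    req = urllib.request.Request(
        FIRECRAWL_SCRAPE,
        data=json.dumps(cookie_scrape_payload(url, cookie_header)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["data"]["markdown"]
```

The fragile part is exactly what the paragraph above describes: the moment the target site rotates its session cookie, `cookie_header` goes stale and every scrape silently returns the logged-out page.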
The flip: extract from the browser, not the server
Here's the inversion. When you, a real human, open a page in Chrome:
- Your cookies are valid
- Your TLS fingerprint is genuine
- Cloudflare's anti-bot challenge already passed
- The signed API requests fire correctly because you have a session
- JavaScript runs to completion and the DOM hydrates fully
The page you see is the completed extraction. The only question is how to read it programmatically without breaking the browser metaphor.
The answer is a Chrome extension with a local MCP server. The architecture:
- The AI agent (Claude Code, Cursor, or whatever you use) calls `agent_convert(url)` or `agent_batch_convert([url1, url2, …])` via MCP.
- The MCP server passes the call to a local native messaging host.
- The host opens the URL in a real Chrome tab, in your already-logged-in browser.
- After the page hydrates, the extension reads the rendered DOM, applies site-specific extractors where relevant, and returns clean Markdown.
- The agent receives the Markdown and continues.
This is what Web2MD does. The agent sees an interface that's structurally identical to Firecrawl — agent_batch_convert returns Markdown for up to 50 URLs in a single call. The implementation underneath is fundamentally different.
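The native-messaging hop in the second step is less exotic than it sounds: Chrome's native messaging protocol is just length-prefixed JSON over stdin/stdout. A minimal host loop (function names are mine, not Web2MD's) looks like:

```python
import json
import struct
import sys

def read_message(stream=sys.stdin.buffer):
    """Read one native-messaging frame: a 4-byte native-order length
    prefix, then that many bytes of UTF-8 JSON."""
    raw_len = stream.read(4)
    if len(raw_len) < 4:
        return None  # the browser closed the pipe
    (msg_len,) = struct.unpack("=I", raw_len)
    return json.loads(stream.read(msg_len).decode("utf-8"))

def write_message(msg, stream=sys.stdout.buffer):
    """Write one frame back to the extension."""
    data = json.dumps(msg).encode("utf-8")
    stream.write(struct.pack("=I", len(data)))
    stream.write(data)
    stream.flush()
```

The host's job is then a loop: read a `{url: …}` frame from the MCP server, ask the extension to open and extract it, and write the Markdown back as another frame.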
Cost comparison
| Workload | Firecrawl | Web2MD |
|---|---|---|
| 1k URLs/mo, no Extract | $19/mo (Hobby) | $9/mo (flat) |
| 5k URLs/mo with Extract | $188/mo (Standard + Extract) | $9/mo |
| 50k URLs/mo with Extract | $478/mo (Pro + Extract Pro) | $9/mo |
| Login-walled sites | Custom cookie injection, breaks monthly | Works (uses your session) |
| Cloudflare-challenged sites | Often blocked | Works (you already passed the challenge) |
| Chinese platforms | Doesn't work | Dedicated extractors for 4 platforms |
The catch: Web2MD scales with your browser, not your wallet. If you need overnight unattended scraping of 100k URLs, this is the wrong tool — Firecrawl or open-source crawlers like Crawl4AI are still the right choice. Web2MD is for the case where the user is in the loop or where the agent is running on the user's machine alongside their browser.
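For workloads that do fit the tool, the 50-URL cap on `agent_batch_convert` means chunking longer URL lists client-side. A trivial helper (the name is mine, for illustration):

```python
def batches(urls, size=50):
    """Chunk a URL list to respect the 50-URL cap per batch call."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

# e.g. results = [agent_batch_convert(chunk) for chunk in batches(all_urls)]
```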
What you give up
Three things, honestly:
1. Pure headless operation. A real Chrome tab opens and closes for each URL. On a fast machine and a fast connection this is sub-second per URL, but it's not the same as a server-side crawl that can hit thousands of URLs in parallel.
2. Pure open-source. The MCP server is open-source; the extension is closed-source. If purity matters, Crawl4AI is a great open-source alternative: you trade ease-of-use for full transparency.
3. Server-friendly deployment. Web2MD assumes you have Chrome on the same machine as your agent. If your agent runs on Lambda or in a Kubernetes pod, this won't work.
What you get
Three things, also honestly:
1. 20× cheaper. Flat $9/mo vs $188/mo means a hobbyist can actually run a RAG ingestion job without hitting credit limits. For solo builders, this is the difference between viable and not.
2. Anti-bot becomes a non-problem. You solve Cloudflare once, in your browser, by being a human. The scraper never sees the challenge.
3. Sites that are off-limits to every server-side scraper. The four Chinese platforms above. LinkedIn. Subscribed Substacks. Internal company tools.
Setup: three steps
For Claude Code or Cursor:
```shell
# 1. Install the extension from the Chrome Web Store (one-time, manual)

# 2. Install the native messaging host
npx web2md-mcp-server install
```

Then add the server to your MCP config (Claude Code: `~/.config/claude-code/mcp.json`):

```json
{
  "mcpServers": {
    "web2md": {
      "command": "npx",
      "args": ["-y", "web2md-mcp-server"]
    }
  }
}
```
Then your agent can call agent_convert or agent_batch_convert like any other MCP tool.
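What you do with the returned Markdown is up to your pipeline; a common next step for RAG is heading-aware chunking before embedding. A minimal sketch (the function and the 1,500-character limit are my own illustration, not part of Web2MD):

```python
import re

def markdown_chunks(md, max_chars=1500):
    """Split Markdown into heading-delimited chunks for embedding.
    Sections longer than max_chars are further split on blank lines."""
    sections = re.split(r"(?m)^(?=#{1,3} )", md)  # split before each H1-H3
    chunks = []
    for sec in sections:
        sec = sec.strip()
        if not sec:
            continue
        while len(sec) > max_chars:
            cut = sec.rfind("\n\n", 0, max_chars)
            if cut <= 0:
                cut = max_chars  # no paragraph break: hard cut
            chunks.append(sec[:cut].strip())
            sec = sec[cut:].strip()
        if sec:
            chunks.append(sec)
    return chunks
```

Each chunk keeps its heading as context, which tends to help retrieval quality more than fixed-size windows over raw text.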
When to use which
Use Firecrawl when:
- You need server-side, headless, parallel scraping at scale
- You're scraping pages that don't require login
- You're at scale where $188/mo is reasonable
Use Crawl4AI when:
- You want full open-source
- You're comfortable running infrastructure
- You don't mind maintaining selector configs
Use Web2MD when:
- You're a solo developer, researcher, or hobbyist building a RAG system
- Your sources include login-walled, Cloudflare-protected, or Chinese-platform pages
- You'd rather pay a flat $9/mo than manage credit buckets
The three tools serve different points on the cost-vs-control curve. Pick what matches your constraints.