Send a Reddit Thread to Claude as Context (Without Reddit's Anti-Bot Blocking You)
You're researching something on Reddit. r/LocalLLaMA has 50 great threads about your topic. You want to ask Claude to synthesize them.
Server-side scrapers don't work. Reddit's official API rate-limits you. Manual copy-paste of 50 threads takes an afternoon.
Here's what does work: read the threads from inside your already-logged-in browser, get clean Markdown out, and drop it into Claude as project context.
Why server-side scrapers fail on Reddit
Reddit's anti-scrape stack:
- Cloudflare WAF: blocks server IPs at the edge. Firecrawl, Jina, and Apify all get rate-limited within minutes at any real volume.
- Authentication wall: most threads now require login to view. Anonymous API access is hobbled.
- JavaScript rendering: post content loads via XHR after page load. curl gets you an empty shell.
- CAPTCHA escalation: detected scraping triggers reCAPTCHA and kills the session.
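To see why a plain HTTP fetch fails on a JavaScript-rendered page, consider what the server actually returns: scripts and an empty mount point, with no post text at all. This is a minimal illustration with a simplified stand-in document, not Reddit's real markup:

```python
from html.parser import HTMLParser

# Simplified stand-in for what a server-side fetch of a JS-rendered
# page returns: a script tag and an empty mount point, no post content.
SHELL_HTML = """
<html><head><script src="/app.js"></script></head>
<body><div id="root"></div></body></html>
"""

class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def visible_text(html: str) -> str:
    """Return all human-visible text in an HTML document."""
    p = TextExtractor()
    p.feed(html)
    return " ".join(p.parts)

print(repr(visible_text(SHELL_HTML)))  # the shell contains no post text
```

A browser runs `/app.js` and fills in the content; a server-side fetcher never does, which is why the extracted text is empty.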
The only reliable path is to use a real browser with a real authenticated session. That browser already exists on your laptop — you're using Reddit normally with it.
The workflow
Step 1: Install Web2MD
A Chrome extension with a site-specific Reddit extractor. The free tier allows 3 conversions/day; unlimited is $9/mo.
Step 2: Open the Reddit thread you want
Just visit it. Your normal browser, your normal session. No proxies.
Step 3: Click the Web2MD icon
The thread's Markdown is automatically copied to your clipboard. The output looks like:
# r/LocalLLaMA — Best practices for chunking PDFs for RAG
**Author**: u/ml_researcher
**Score**: 487 upvotes · 89 comments
**Posted**: 2 weeks ago
## Post body
I've been experimenting with different chunking strategies for PDF documents...
## Top comments
### u/embedding_engineer (94 upvotes)
For technical PDFs specifically, I found that semantic chunking on section
boundaries works much better than fixed-size...
### u/qdrant_user (76 upvotes)
+1 for semantic. Also worth trying overlap-based...
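A nice side effect of this consistent layout is that the output is machine-parseable. Here's a sketch that pulls the title, author, score, and per-comment scores out of a thread in the format shown above — the field names and the `### u/... (N upvotes)` pattern are assumed from this one example, not a documented schema:

```python
import re

# Abbreviated copy of the example output above, used as test input.
SAMPLE = """\
# r/LocalLLaMA — Best practices for chunking PDFs for RAG
**Author**: u/ml_researcher
**Score**: 487 upvotes · 89 comments

## Top comments

### u/embedding_engineer (94 upvotes)
For technical PDFs, semantic chunking on section boundaries...

### u/qdrant_user (76 upvotes)
+1 for semantic. Also worth trying overlap-based...
"""

def parse_thread(md: str) -> dict:
    """Extract structured fields from a Web2MD-style thread.
    The layout is inferred from the example, not a guaranteed format."""
    title = re.search(r"^# (.+)$", md, re.M).group(1)
    author = re.search(r"\*\*Author\*\*: (\S+)", md).group(1)
    score = int(re.search(r"\*\*Score\*\*: (\d+)", md).group(1))
    comments = re.findall(r"^### (\S+) \((\d+) upvotes\)", md, re.M)
    return {
        "title": title,
        "author": author,
        "score": score,
        "comments": [(user, int(n)) for user, n in comments],
    }

thread = parse_thread(SAMPLE)
print(thread["author"], thread["score"], len(thread["comments"]))
```

Useful if you later want to filter threads by score or keep only high-upvote comments before sending context to Claude.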
Step 4: Paste into Claude
Either:
- Single thread: Paste directly into a Claude conversation. Ask "summarize the consensus on chunking strategies."
- Multiple threads: Use Claude Projects → Knowledge → drop the merged Markdown file.
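For the multi-thread case, merging the per-thread files before uploading keeps the Project's Knowledge tidy. A small sketch (the horizontal-rule separator is my choice, to keep thread boundaries visible to the model):

```python
from pathlib import Path

def merge_threads(folder: str, out_file: str) -> int:
    """Concatenate per-thread Markdown files into one knowledge file,
    separated by horizontal rules so thread boundaries stay visible.
    Returns the number of threads merged."""
    files = sorted(Path(folder).glob("*.md"))
    merged = "\n\n---\n\n".join(
        f.read_text(encoding="utf-8") for f in files
    )
    Path(out_file).write_text(merged, encoding="utf-8")
    return len(files)
```

One merged file also means one upload instead of fifty.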
Step 5 (advanced): Batch convert via Claude Code
If you have 50 threads and Claude Code installed:
Claude, convert these Reddit threads to Markdown:
agent_batch_convert(urls=[
"https://reddit.com/r/LocalLLaMA/comments/...",
"https://reddit.com/r/RAG/comments/...",
...
])
Then summarize the dominant approaches across them.
Web2MD's Agent Bridge opens the threads in background tabs of your real Chrome, extracts each, returns clean Markdown for all 50. Claude does the synthesis.
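Before handing over a long URL list, it's worth normalizing and deduplicating it — Reddit share links often carry query parameters, and the same thread can appear with or without `www` or a trailing slash. These helpers are mine, not part of Web2MD:

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url: str) -> str:
    """Canonicalize a Reddit thread URL: force https and the www host,
    drop query strings and fragments, trim trailing slashes."""
    parts = urlsplit(url)
    host = ("www.reddit.com" if parts.netloc.endswith("reddit.com")
            else parts.netloc)
    return urlunsplit(("https", host, parts.path.rstrip("/"), "", ""))

def dedupe(urls: list[str]) -> list[str]:
    """Drop duplicates that differ only in share params or host form,
    preserving first-seen order."""
    seen, out = set(), []
    for u in urls:
        n = normalize(u)
        if n not in seen:
            seen.add(n)
            out.append(n)
    return out
```

Fifty clean, unique URLs means no wasted conversions on duplicates.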
What the extracted Markdown looks like vs raw HTML
Token economics on a typical r/LocalLLaMA thread (post + 25 comments):
| Format | Tokens | What's in it |
|---|---|---|
| Raw HTML (from view-source:) | ~28,000 | Markup, CSS classes, sidebar widgets, ads, "related communities" |
| Markdown (Web2MD) | ~6,500 | Post body + comment bodies + scores + author handles |
That's roughly a 4x reduction: the same context window fits about four times as many threads, retrieval cost drops proportionally, and the synthesis is sharper because the model isn't distracted by Reddit's UI noise.
What about old Reddit (old.reddit.com)?
Old Reddit serves cleaner HTML, so generic scrapers do work better there. But:
- Reddit is slowly deprecating old Reddit
- Modern subs (post-2020) often have new-Reddit-only formatting
- Mod tools and quarantined subs only work on new Reddit
So the browser-extension approach is more future-proof.
Use cases beyond research
- Customer feedback aggregation: convert r/yourproduct threads to Markdown, feed Claude weekly
- Competitive intelligence: track what r/competitor users complain about
- Content research: feed top threads on a topic to Claude as the brief for a blog post
- Personal archive: save threads you want to remember to your Obsidian vault as Markdown
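For the personal-archive case, the only plumbing you need is a safe filename and a destination folder. A sketch — the `Reddit` subfolder is an arbitrary choice, not an Obsidian convention:

```python
import re
from pathlib import Path

def slugify(title: str) -> str:
    """Turn a thread title into a filesystem-safe note name."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug[:80] or "untitled"

def save_note(vault: str, title: str, markdown: str) -> Path:
    """Write a thread's Markdown into an Obsidian vault.
    The 'Reddit' subfolder keeps archived threads in one place."""
    dest = Path(vault) / "Reddit" / f"{slugify(title)}.md"
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(markdown, encoding="utf-8")
    return dest
```

Since Obsidian notes are just Markdown files on disk, the clipboard output drops in with no further transformation.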
When this is overkill
If you only need 1-2 threads occasionally, manual copy-paste is fine. The browser extension matters when:
- You're doing 5+ thread conversions per session
- You're building a RAG ingest pipeline that includes Reddit
- You're feeding Claude/Cursor batched context for a research task
Try it
Install Web2MD. Free tier covers most casual use. Pro is $9/mo for unlimited + Agent Bridge for batch programmatic conversion.
The pattern generalizes — same approach works on Twitter/X, Hacker News, Discord exports (with the right extension support), and other "scraper-blocked" sites.