Why use Hacker News threads instead of Reddit for AI research?

HN threads have a higher signal-to-noise ratio on technical topics. Comments are typically longer, more often backed by citations, and less prone to drive-by takes. For specific topics (system design, startup mechanics, language-runtime trade-offs), HN consensus is often the most useful single source on the open web.

Can ChatGPT browse or Claude WebFetch read HN threads directly?

Front pages, partially. Comment trees, mostly not. HN's thread pages are server-rendered, but long threads are split across pages behind a "More" link, and standard fetchers grab only the first page (in practice, often just the first ~30 comments). For threads with 200+ comments (where the substance lives), the bulk of the content is invisible to these tools.

What does a clean HN thread Markdown look like?

Submission title and URL at the top, OP comment if there is one, then the full comment tree as nested bullets with author handles (HN does not publish per-comment point counts, but the API returns replies in ranked display order), [dead] and [flagged] comments marked, and parent-child indentation preserved up to 5 levels. A typical 300- to 400-comment thread comes out as roughly 25-30k tokens of clean Markdown.

Can I bulk-extract 20 HN threads for a research synthesis?

Yes — that's the practical use case. Web2MD's queue feature lets you stack threads as you read, then bulk-export as one Markdown file. 20 substantial HN threads is ~150-300k tokens. Claude Opus 4.7's 1M context easily holds it for cross-thread synthesis ('what does the HN consensus say about X across these 20 threads?').
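To check whether a bundle like that fits, a common rough rule of thumb for English text is about 4 characters per token. That ratio is an assumption here, not Claude's actual tokenizer, so treat the results as estimates. A minimal sketch that packs exported threads into a 1M-token window while leaving headroom for the prompt and the answer (the 80% headroom figure is also an arbitrary choice):

```python
# Rough heuristic: ~4 characters per token for English prose.
# This is an assumption; real tokenizer counts will differ.
CHARS_PER_TOKEN = 4
CONTEXT_BUDGET = 1_000_000  # tokens
HEADROOM = 0.8  # leave room for the prompt and the model's answer


def estimate_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN


def pack_threads(threads: list[str], budget: int = CONTEXT_BUDGET) -> tuple[str, int]:
    """Join threads in order until the estimated budget is used up.

    Returns the combined corpus and how many threads were included.
    """
    limit = int(budget * HEADROOM)
    kept, used = [], 0
    for md in threads:
        cost = estimate_tokens(md)
        if used + cost > limit:
            break  # leave the rest for a second batch
        kept.append(md)
        used += cost
    return "\n\n---\n\n".join(kept), len(kept)


# Example: 20 threads of ~15k estimated tokens each (~300k total)
threads = ["x" * 60_000 for _ in range(20)]
corpus, n = pack_threads(threads)
print(n, estimate_tokens(corpus))
```

Threads that don't fit can go into a second batch, or be trimmed before export.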

What about the HN API for programmatic access?

HN's Firebase API (https://hacker-news.firebaseio.com/v0/) is excellent: public, no auth required, and its documentation states there is currently no rate limit. For programmatic batch extraction, hit the API directly: fetch the story item, follow its "kids" IDs recursively (each comment is a separate request), and format the result as Markdown with a 30-line script. For interactive use, Web2MD's HN extractor does this for you.
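Each API call returns a single item, so a "comment tree" is really the story item plus a walk over its `kids` IDs, which the API documentation describes as being in ranked display order. A minimal sketch of that walk using only the standard library. It is iterative so very deep threads don't hit Python's recursion limit, and `fetch` is passed in so the logic can be exercised offline; `fetch_item` and `walk` are names chosen here for illustration:

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional

ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{}.json"


def fetch_item(item_id: int) -> Optional[dict]:
    # One HTTP GET per item; the API returns JSON null for missing items
    with urllib.request.urlopen(ITEM_URL.format(item_id), timeout=10) as resp:
        return json.load(resp)


def walk(story_id: int,
         fetch: Callable[[int], Optional[dict]] = fetch_item) -> Iterator[tuple[int, dict]]:
    """Yield (depth, item) for every comment under a story, in display order."""
    story = fetch(story_id) or {}
    # Push kids reversed so the top-ranked one is popped first (depth-first)
    stack = [(0, kid) for kid in reversed(story.get("kids", []))]
    while stack:
        depth, item_id = stack.pop()
        item = fetch(item_id)
        if item is None:
            continue
        yield depth, item
        stack.extend((depth + 1, kid) for kid in reversed(item.get("kids", [])))


# Offline example with a fake in-memory API
fake = {
    1: {"id": 1, "type": "story", "descendants": 3, "kids": [2, 4]},
    2: {"id": 2, "by": "a", "kids": [3]},
    3: {"id": 3, "by": "b"},
    4: {"id": 4, "by": "c"},
}
seen = list(walk(1, fetch=fake.get))
print([(d, item["id"]) for d, item in seen])  # (depth, id) in display order
print(len(seen) == fake[1]["descendants"])  # completeness check
```

Comparing the number of walked comments with the story's `descendants` field is a cheap completeness check, though dead or deleted comments can make the two differ slightly.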

What about Show HN and Ask HN threads specifically?

Both are perfect for this workflow. Show HN threads contain user feedback and bug reports useful for competitive research. Ask HN threads aggregate community wisdom on specific questions and produce excellent Claude synthesis input. Web2MD's HN extractor handles all thread types identically.

Hacker News Thread to Markdown for Claude Research (2026)

Hacker News is the highest-signal technical discussion forum on the open web. A 400-comment thread on system design or a runtime quirk often contains more useful wisdom than any single blog post or documentation page. The problem: getting that thread into Claude or ChatGPT in a form they can actually reason over.

This post is the workflow.

Why HN threads beat almost everything else for research synthesis

I have run the same research question across Reddit, X, LinkedIn, and HN multiple times. HN consistently wins for technical synthesis because:

Higher signal density: comments average 3-5 substantive sentences, not 1-line reactions
Cited claims: experienced commenters link papers, RFCs, source code
Self-correction: incorrect claims get pushed back on within hours, not days
Vote signal: HN hides per-comment scores, but it ranks replies partly by votes, and that ordering roughly tracks usefulness for technical content
Less promotional content than LinkedIn, less casual chatter than Reddit

For "what does the senior engineer community think about X?" — HN is the canonical first source.

What standard fetchers see

HN's thread page (news.ycombinator.com/item?id=...) is plain server-rendered HTML, but long threads are paginated: the first page carries only part of the comment tree, and the rest sits behind a "More" link at the bottom. ChatGPT browse and Claude WebFetch typically get:

Submission title, URL, points
Top-level comments (~10-30)
A [more] link for the rest

For threads under 50 comments this is fine. For 200+ comment threads — where the actual substance is in deeper branches — it's almost useless. You get the visible 15% of the discussion.

What clean HN Markdown looks like

After running through an HN-aware extractor:

```markdown
# What if your build system was just a few hundred lines of code?

**Source**: https://news.ycombinator.com/item?id=12345678
**Submitted by**: user42 · **Points**: 489 · **Comments**: 312
**Posted**: 2026-05-15

## OP comment

I built a small build system in Go that's about 600 lines total. Here's
what it does differently from Bazel/Buck...

## Top thread

- **bcantrill**: "This is the right direction. The complexity of
  Bazel is a tax most projects pay for features they never use..."
  - **user123**: "Counter-point: Bazel's remote caching is the
    whole point. A small local build tool can't replicate that..."
    - **bcantrill**: "Fair, but you can layer caching on top of a
      simpler core. The Buck folks tried this with [link to paper]..."
  - **another_user**: "Also worth noting: the simpler approach
    breaks down at ~500 targets. Below that it's clearly better."

- **drnewman**: "Your benchmarks compare against Bazel cold-start
  but Bazel's actual production cost is incremental rebuilds..."

[continues for full thread]

## [dead] and [flagged] markers preserved
```

About 25-30k tokens for a 300-comment thread. Authors, parent-child relationships, reply order, and dead/flagged states are all preserved. Claude reads this and produces synthesis grounded in specific, attributable comments from the top-ranked branches.

The workflow

Three paths:

Path 1: Web2MD HN extractor (interactive)

Open the HN thread in Chrome. Click Web2MD. The HN-specific extractor:

Hits HN's Firebase API behind the scenes to get the full comment tree
Preserves nesting up to 5 levels with proper indentation
Captures author handles and posted timestamps, plus the submission's point count (HN does not publish per-comment points)
Marks [dead], [flagged], and [downvoted] comments
Formats as clean Markdown ready to paste into Claude or save

End-to-end: ~6 seconds per thread including HN API roundtrip.
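The nesting and marker rules in the list above can be sketched as a small renderer over an already-fetched tree. This is not Web2MD's implementation: the nested-dict input (API fields plus a hypothetical `children` list) and flattening anything deeper than five levels onto the fifth level are assumptions made for illustration. Only [dead] and [deleted] are handled, since those are the flags the public API exposes.

```python
MAX_DEPTH = 5  # assumed cap: deeper replies are flattened onto the fifth level


def render(node: dict, depth: int = 0) -> str:
    """Render one comment and its replies as nested Markdown bullets.

    `node` is a hypothetical pre-fetched shape: API fields (by, text, dead,
    deleted) plus a `children` list of nested nodes in display order.
    """
    indent = "  " * min(depth, MAX_DEPTH - 1)  # depths 0-4 = five levels
    author = node.get("by", "unknown")
    if node.get("deleted"):
        line = f"{indent}- [deleted]"
    elif node.get("dead"):
        line = f"{indent}- [dead] **{author}**"
    else:
        # Indent continuation lines so multi-line text stays inside the bullet
        text = node.get("text", "").replace("\n", f"\n{indent}  ")
        line = f"{indent}- **{author}**: {text}"
    lines = [line]
    for child in node.get("children", []):  # replies keep their display order
        lines.append(render(child, depth + 1))
    return "\n".join(lines)


tree = {"by": "alice", "text": "Top comment", "children": [
    {"by": "bob", "dead": True},
    {"deleted": True, "children": [{"by": "carol", "text": "Reply survives"}]},
]}
print(render(tree))
```

Replies under a deleted comment are still rendered, which keeps useful sub-threads from disappearing along with their parent.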

Path 2: HN Firebase API + 30-line script

For developers who want batch extraction:

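```python
import html
import re

import requests

ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{}.json"


def hn_to_markdown(item_id):
    def fetch(iid):
        # One request per item; the API returns null for missing items
        return requests.get(ITEM_URL.format(iid), timeout=10).json()

    def to_text(raw):
        # Item text is HTML: <p> separates paragraphs, links are <a> tags,
        # and quotes etc. are entity-escaped
        raw = raw.replace("<p>", "\n\n")
        return html.unescape(re.sub(r"<[^>]+>", "", raw)).strip()

    def render_comment(c, depth=0):
        indent = "  " * depth
        if not c:
            return f"{indent}- [missing]\n"
        if c.get("dead") or c.get("deleted"):
            marker = "[dead]" if c.get("dead") else "[deleted]"
            md = f"{indent}- {marker}\n"
        else:
            author = c.get("by", "unknown")
            text = to_text(c.get("text", "")).replace("\n", f"\n{indent}  ")
            md = f"{indent}- **{author}**: {text}\n"
        # Keep walking under dead/deleted comments so their replies survive
        for kid_id in c.get("kids", []):
            md += render_comment(fetch(kid_id), depth + 1)
        return md

    root = fetch(item_id)
    md = f"# {root.get('title', '')}\n\n**URL**: {root.get('url', 'self post')}\n"
    md += f"**Points**: {root.get('score', 0)} · **By**: {root.get('by')}\n\n"
    if root.get("text"):  # Ask HN / Show HN body text
        md += to_text(root["text"]) + "\n\n"
    for kid_id in root.get("kids", []):  # kids arrive in ranked display order
        md += render_comment(fetch(kid_id))
    return md
```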

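A few dozen lines, and it handles the full tree. HN's API documentation states there is currently no rate limit; the practical cost is one request per comment, so a 300-comment thread means roughly 300 sequential HTTP calls.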
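Because each comment is its own request, wall-clock time is dominated by round trips. One way to cut it (a choice made here, not something the HN docs prescribe) is to fetch one tree level at a time with a small thread pool. `fetch_tree` and the `max_workers=8` default are illustrative, and `fetch` is passed in so the sketch runs offline; in real use it would be any function that GETs https://hacker-news.firebaseio.com/v0/item/{id}.json and returns the parsed JSON:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def fetch_tree(story_id: int, fetch: Callable[[int], Optional[dict]],
               max_workers: int = 8) -> dict[int, dict]:
    """Fetch a story and all its comments breadth-first, one level at a time.

    Returns {item_id: item}; missing (null) items are skipped.
    """
    items: dict[int, dict] = {}
    level = [story_id]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while level:
            # pool.map preserves input order, so results line up with `level`
            results = list(pool.map(fetch, level))
            next_level = []
            for item_id, item in zip(level, results):
                if item is None:
                    continue
                items[item_id] = item
                next_level.extend(item.get("kids", []))
            level = next_level
    return items


# Offline example with a fake in-memory API
fake = {1: {"kids": [2, 3]}, 2: {"kids": [4]}, 3: {}, 4: {}}
tree = fetch_tree(1, fetch=fake.get)
print(sorted(tree))
```

Each item's `kids` list still records reply order, so the result can be rendered depth-first afterwards. Even without a documented rate limit, a modest pool size keeps the load polite.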

Path 3: Bulk HN research corpus

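```python
import requests

# Identify HN threads via Algolia search (it returns 20 hits unless told otherwise)
resp = requests.get("https://hn.algolia.com/api/v1/search",
                    params={"query": "your topic", "tags": "story", "hitsPerPage": 30},
                    timeout=10)
thread_ids = [hit["objectID"] for hit in resp.json()["hits"]]
# hn_to_markdown() is the Path 2 function
corpus = "\n\n---\n\n".join(hn_to_markdown(tid) for tid in thread_ids)
# Now paste corpus into Claude
```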

30 threads on one topic, automatically. Combined corpus typically ~500k-1M tokens for substantial discussions.
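If 30 threads overshoot the window, one way to trim (an assumption here, not part of the workflow above) is to keep only each thread's first K top-level branches. The API returns a story's `kids` in ranked display order, so those are the branches HN itself ranks highest. A minimal sketch over an `{item_id: item}` mapping of already-fetched API items; `keep_top_branches` and `k=15` are hypothetical:

```python
def keep_top_branches(items: dict[int, dict], story_id: int, k: int = 15) -> dict[int, dict]:
    """Return a copy of the tree keeping only the first k top-level branches.

    `items` maps item id -> API item; `kids` lists are in ranked display order.
    """
    story = dict(items[story_id])  # shallow copy so the input is not mutated
    story["kids"] = story.get("kids", [])[:k]
    kept = {story_id: story}
    stack = list(story["kids"])
    while stack:  # copy every descendant of the kept branches
        item_id = stack.pop()
        item = items.get(item_id)
        if item is None:
            continue
        kept[item_id] = item
        stack.extend(item.get("kids", []))
    return kept


items = {1: {"kids": [2, 3, 4]}, 2: {"kids": [5]}, 3: {}, 4: {}, 5: {}}
trimmed = keep_top_branches(items, 1, k=2)
print(sorted(trimmed))
```

Cutting whole branches rather than individual comments keeps every surviving reply attached to its parent.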

A real research session

I needed to understand "what's the consensus on monolith vs microservices for early-stage startups in 2026?"

Used HN Algolia search for relevant threads from the past 18 months
Selected 18 substantive threads (each with 100+ comments)
Web2MD queue + bulk export: ~25 minutes including skim-reading
Combined corpus: ~340k tokens
Pasted into Claude Opus 4.7 with the prompt: "These are 18 HN threads on monolith vs microservices for startups. What are the 5 most-upvoted arguments for each side, and where does HN actually agree vs disagree? Cite specific comment authors and threads."

Output: an 8-page synthesis with specific citations (user42 in thread X argued...) and identified consensus zones vs disagreement zones. Total time: ~70 minutes. The manual version would have been a full week of reading.
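The first two steps of that session (Algolia search over the past 18 months, threads with 100+ comments) can be scripted. HN's Algolia API documents numeric filters on `created_at_i` (a Unix timestamp) and `num_comments`; approximating 18 months as 548 days, the 50-hit page size, and `build_search_url` itself are choices made here for illustration:

```python
import time
from typing import Optional
from urllib.parse import urlencode

SEARCH_URL = "https://hn.algolia.com/api/v1/search"  # relevance-ranked search


def build_search_url(query: str, days: int = 548, min_comments: int = 100,
                     hits: int = 50, now: Optional[float] = None) -> str:
    """URL for stories matching `query`, newer than `days` days,
    with at least `min_comments` comments."""
    now = time.time() if now is None else now
    since = int(now - days * 86_400)  # created_at_i is in Unix seconds
    params = {
        "query": query,
        "tags": "story",
        # Comma-separated numeric filters are combined with AND
        "numericFilters": f"created_at_i>{since},num_comments>={min_comments}",
        "hitsPerPage": hits,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


# Fixed `now` so the example is reproducible
url = build_search_url("monolith vs microservices", now=1_700_000_000)
print(url)
```

Fetching that URL with any HTTP client returns JSON whose `hits[].objectID` values are the thread IDs to export; picking the substantive ones remains a manual skim.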

What HN is not good for

Honest about the limits:

Recent breaking news: HN front page shifts daily. For ongoing events, the snapshot becomes stale fast.
Non-technical topics: HN's comment quality varies widely outside its core competencies (tech, startups, programming language design). For consumer product discussion, Reddit is better.
Original research data: HN comments cite primary sources; they aren't primary sources themselves. Follow the cited links for load-bearing claims.
Bias awareness: HN skews male, US-coastal, infrastructure-engineering. The "consensus" reflects that demographic.

Pairing with other workflows

HN content composes well with:

Reddit-to-Claude: Reddit for consumer perspective, HN for engineering perspective
Wikipedia-to-Markdown: Wikipedia for established knowledge, HN for current practitioner opinion
Fill Claude 1M context: HN's API makes building a large multi-thread corpus a short scripted job (at ~25-30k tokens per substantial thread, 30-40 threads already fill the window)
LinkedIn posts: HN's argumentative culture complements LinkedIn's promotional culture

Quick wins

If you already use Web2MD, open any HN thread and click the extension. The dedicated HN extractor (part of Pro) produces what's shown above; the free tier handles 3 conversions/day.

For dev workflows, the HN Firebase API (above) + 30 lines of Python gets you the full pipeline. HN's API needs no auth and, per its documentation, currently has no rate limit, so it suits exactly this kind of access.

Install

Web2MD on the Chrome Web Store →

Free tier: 3 conversions/day. Pro at $9/mo unlocks unlimited + queue + bulk export + dedicated HN extractor that hits the Firebase API for full comment trees.
