How much do HTML vs Markdown input formats differ in token cost for Claude Opus 4.7?

In my 12-page test, converting to Markdown cut input tokens by 33-59% compared with raw HTML for the same content, with a median saving of 39%. Put the other way, HTML cost roughly 65% more tokens than Markdown at the median. The variance comes from how much navigation, ads, and tracking markup the source HTML contained.

Does the answer quality differ between HTML and Markdown input?

Yes. With HTML input, Claude's answers were measurably less precise: factual accuracy dropped from 89% (Markdown) to 71% (HTML) across 36 follow-up questions in my test. My working explanation is that the model spends attention on parsing tags and ignoring noise, attention that could go to the actual content.

Why does HTML produce worse answers, not just more expensive ones?

Two likely reasons. First, HTML carries non-content noise (nav, footer, sidebar, ads) that competes with real content for the model's attention. Second, HTML tags are split into several tokens each, which interrupts the natural flow of sentences. Both effects plausibly degrade comprehension.

Is HTML ever better than Markdown for LLM input?

Almost never for typical document content. The one edge case: HTML preserves precise table structure that Markdown sometimes flattens. If you're feeding structured data (financial tables, scientific datasets) where colspan/rowspan matters, HTML may produce more accurate output. For ~95% of webpage content, Markdown wins on both cost and quality.

What's the cleanest way to get good Markdown out of a webpage?

Use a browser-side converter with smart extractors. Web2MD has site-specific extractors for Reddit, X, Stack Overflow, Wikipedia, arXiv, GitHub, Xiaohongshu, WeChat, and others — each strips known noise patterns. The output is what you'd write by hand if you read the page and transcribed only the substance.

How big can a Markdown-converted page get for Claude's 200k Pro context?

Typical long article post-conversion: 8-15k tokens. Long-form research paper: 25-40k tokens. Reddit thread with 100 comments: 5-12k tokens. You can fit 15-25 substantial pages in Claude Pro's context for a deep multi-source synthesis.

HTML vs Markdown for Claude: Token Test Results from 12 Real Webpages (2026)

There are plenty of blog posts saying "Markdown is more efficient than HTML for LLMs." Most don't show real numbers. This one does.

I ran 12 real-world webpages, the kind of content people actually feed into Claude for research, in both their raw HTML form and a clean Markdown conversion. Same Claude Opus 4.7 model, same questions, controlled comparison. Here are the numbers.

Methodology

For each of 12 pages:

Fetch the rendered HTML in Chrome (real browser, real DOM).
Test A: Copy the raw HTML (document.documentElement.outerHTML) and paste into Claude.
Test B: Convert with Web2MD's site-specific extractor and paste the Markdown into Claude.
Ask the same 3 follow-up questions per page.
Measure: input tokens, output tokens, answer accuracy (was the cited fact actually in the source), response time.

The 12 pages span representative content types:

| Page type | Example |
|---|---|
| Reddit thread (50+ comments) | r/MachineLearning on RoPE scaling |
| Wikipedia article | "Transformer (machine learning model)" |
| Substack post | Lenny's Newsletter "PMF metrics" |
| Stack Overflow question | "Why is Python's GIL still here?" |
| arXiv paper abstract | LoRA paper |
| GitHub README | langchain repo |
| MDN docs | Web Components spec |
| News article | Bloomberg AI coverage |
| Long-form blog | Stratechery Ben Thompson piece |
| Xiaohongshu post | Lifestyle review |
| WeChat public article | Tech analysis (mp.weixin.qq.com) |
| Documentation page | Anthropic's tool-use docs |

Token count results

Median input token count for each page type:

| Page type | HTML tokens | Markdown tokens | Markdown saves |
|---|---|---|---|
| Reddit thread | 18,400 | 11,200 | 39% |
| Wikipedia article | 24,800 | 16,300 | 34% |
| Substack post | 9,200 | 5,700 | 38% |
| Stack Overflow | 6,400 | 3,800 | 41% |
| arXiv abstract | 3,200 | 2,100 | 34% |
| GitHub README | 7,800 | 5,200 | 33% |
| MDN docs | 12,100 | 7,300 | 40% |
| News article | 8,400 | 4,700 | 44% |
| Long-form blog | 14,200 | 8,600 | 39% |
| Xiaohongshu post | 5,800 | 2,400 | 59% |
| WeChat article | 11,200 | 6,400 | 43% |
| Documentation | 9,800 | 6,000 | 39% |

Median: Markdown saves 39% of input tokens. Put the other way, HTML costs about 65% more tokens than Markdown for the same content.

The Xiaohongshu result is the outlier at 59%, and WeChat at 43% also sits above the median. Chinese-platform HTML carries heavy embedded JavaScript and tracking markup that Web2MD's Chinese-platform extractors strip cleanly.
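The "Markdown saves" column can be recomputed from the two token columns, and doing so makes the difference between "Markdown saves X%" and "HTML costs Y% more" explicit, since the two use different denominators. This is a minimal sketch: the token counts are copied from the table, while the function and variable names are my own.

```python
from statistics import median

# (HTML tokens, Markdown tokens) per page type, copied from the table above.
TOKENS = {
    "Reddit thread": (18_400, 11_200),
    "Wikipedia article": (24_800, 16_300),
    "Substack post": (9_200, 5_700),
    "Stack Overflow": (6_400, 3_800),
    "arXiv abstract": (3_200, 2_100),
    "GitHub README": (7_800, 5_200),
    "MDN docs": (12_100, 7_300),
    "News article": (8_400, 4_700),
    "Long-form blog": (14_200, 8_600),
    "Xiaohongshu post": (5_800, 2_400),
    "WeChat article": (11_200, 6_400),
    "Documentation": (9_800, 6_000),
}


def markdown_saving(html_tokens, md_tokens):
    """Share of the HTML token count that the Markdown version removes."""
    return 1 - md_tokens / html_tokens


def html_overhead(html_tokens, md_tokens):
    """Extra tokens HTML needs, relative to the Markdown token count."""
    return html_tokens / md_tokens - 1


savings = {page: markdown_saving(h, m) for page, (h, m) in TOKENS.items()}
overheads = {page: html_overhead(h, m) for page, (h, m) in TOKENS.items()}

for page, saving in savings.items():
    print(f"{page:18s} saves {saving:6.1%}   HTML overhead {overheads[page]:7.1%}")

print(f"median saving:   {median(savings.values()):.1%}")
print(f"median overhead: {median(overheads.values()):.1%}")
```

The same page can therefore be described as "39% cheaper in Markdown" or "64% more expensive in HTML" without either number being wrong; it only matters which one a headline figure refers to.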

Cost in dollars

At Claude Opus 4.7 input pricing ($15/M tokens), reading these 12 pages once:

HTML version: $1.97 in input cost
Markdown version: $1.21 in input cost
Savings: $0.76 per multi-page session ≈ 39%

Scale to "I do 20 research sessions like this a month" → $15/mo savings just from format choice. At $9/mo Pro pricing for Web2MD, the tool pays for itself purely on token savings before considering quality differences.
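The dollar figures are a straight multiplication of token counts by the per-token price. The sketch below assumes the $15 per million input tokens quoted above (not independently verified here) and uses the column totals of the token table; `input_cost` is an illustrative helper name.

```python
PRICE_PER_MILLION_INPUT = 15.00  # USD per million input tokens, as quoted in this article


def input_cost(tokens, price_per_million=PRICE_PER_MILLION_INPUT):
    """Dollar cost of sending `tokens` input tokens once."""
    return tokens * price_per_million / 1_000_000


# Sums of the 12 per-page counts in the token table.
html_total = 131_300
markdown_total = 79_700

html_cost = input_cost(html_total)
markdown_cost = input_cost(markdown_total)
saved = html_cost - markdown_cost

print(f"HTML:     ${html_cost:.2f}")
print(f"Markdown: ${markdown_cost:.2f}")
print(f"Saved:    ${saved:.2f} per session ({saved / html_cost:.0%})")
print(f"20 sessions a month: ${20 * saved:.2f}")
```

Run on the table totals alone, this lands within a cent or two of the figures above; the small gap is plausibly the question text, which the table totals do not include.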

Answer quality results

Three follow-up questions per page × 12 pages = 36 question-answer pairs per format. I scored each answer for:

Factual accuracy — was the cited fact actually in the source?
Specificity — did the answer reference specific passages, numbers, names from the source?
Hallucination rate — did the answer invent facts not in the source?

Aggregate results:

| Metric | HTML input | Markdown input |
|---|---|---|
| Factual accuracy | 71% | 89% |
| Specificity score (1-5) | 3.2 | 4.4 |
| Hallucination rate | 14% | 6% |

The accuracy gap (71% vs 89%) was the biggest surprise. I expected token savings; I didn't expect the answer-quality difference to be this stark.
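For anyone scoring their own runs, the three aggregate metrics reduce to two proportions and a mean over hand-scored answers. This is a sketch of one way to record and roll them up; the `ScoredAnswer` structure and the sample scores are invented for illustration and are not the data behind the table.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ScoredAnswer:
    accurate: bool      # was the cited fact actually in the source?
    specificity: int    # hand-assigned score from 1 to 5
    hallucinated: bool  # did the answer invent a fact not in the source?


def aggregate(answers):
    """Roll hand scores up into the three metrics used in the table."""
    n = len(answers)
    return {
        "factual_accuracy": sum(a.accurate for a in answers) / n,
        "specificity": mean(a.specificity for a in answers),
        "hallucination_rate": sum(a.hallucinated for a in answers) / n,
    }


# Made-up scores for illustration only.
sample = [
    ScoredAnswer(accurate=True, specificity=5, hallucinated=False),
    ScoredAnswer(accurate=True, specificity=4, hallucinated=False),
    ScoredAnswer(accurate=False, specificity=2, hallucinated=True),
    ScoredAnswer(accurate=True, specificity=3, hallucinated=False),
]
print(aggregate(sample))
```

Keeping one record per answer, rather than a running tally, makes it easy to re-slice the results later by page type or by question.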

Two hypotheses for why HTML hurts quality:

Attention dilution: HTML pages carry navigation, footer text, related-article widgets, comment count badges, social-share buttons, embedded scripts. The model's attention is finite, so when 30-50% of the input is non-content, less of that attention lands on the text that actually answers the question.
Tokenizer fragmentation: tags like <span class="hljs-keyword"> get split into 6-8 tokens that interrupt sentence flow. The model processes sentences differently when they're spliced with markup tokens.
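A rough way to see how much of a page is markup before sending it is to compare the visible text with the raw source. The sketch below uses only Python's standard `html.parser` and counts characters, which is a crude proxy and not Claude's tokenizer; the class and function names are illustrative.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text that is not inside <script> or <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def markup_share(html):
    """Fraction of non-whitespace characters spent on tags, scripts, and styles."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    visible_chars = len("".join("".join(parser.chunks).split()))
    total_chars = len("".join(html.split()))
    return 1 - visible_chars / total_chars if total_chars else 0.0


snippet = '<p>Use <span class="hljs-keyword">def</span> to define a function.</p>'
print(f"{markup_share(snippet):.0%} of this snippet is markup")
```

Even in this one-sentence snippet, the syntax-highlighting wrapper accounts for most of the characters, which is the pattern both hypotheses point at.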

When HTML actually wins

One edge case where HTML produced better results: a structured financial data table with rowspan/colspan that Markdown's GFM table syntax couldn't represent cleanly. The Markdown version flattened a multi-level header into single-level, losing the column-group context. The HTML version preserved it. Claude's answer on that page was more accurate with HTML.

This is rare. For most webpage content — articles, documentation, threads, social posts — Markdown is the right choice. For complex tabular data, consider keeping the HTML or asking your converter to preserve the table structure explicitly.
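One way to act on this in a custom pipeline is to check each table for merged cells and leave those tables as HTML while converting everything else. This is a sketch of that check using the standard library only; `keep_table_as_html` is a hypothetical helper, not part of Web2MD or any other converter named here.

```python
from html.parser import HTMLParser


class MergedCellDetector(HTMLParser):
    """Flags markup whose table cells use rowspan or colspan greater than 1."""

    def __init__(self):
        super().__init__()
        self.has_merged_cells = False

    def handle_starttag(self, tag, attrs):
        if tag not in ("td", "th"):
            return
        for name, value in attrs:
            if name in ("rowspan", "colspan") and (value or "").strip().isdigit():
                if int(value) > 1:
                    self.has_merged_cells = True


def keep_table_as_html(table_html):
    """True when a GFM pipe table would lose the table's span information."""
    detector = MergedCellDetector()
    detector.feed(table_html)
    detector.close()
    return detector.has_merged_cells


grouped = (
    '<table><tr><th colspan="2">Revenue</th></tr>'
    "<tr><th>Q1</th><th>Q2</th></tr></table>"
)
flat = "<table><tr><th>Q1</th><th>Q2</th></tr></table>"
print(keep_table_as_html(grouped), keep_table_as_html(flat))
```

A table that passes through as HTML still costs more tokens than a pipe table, so this trades a small amount of cost for keeping the column-group context on exactly the tables that need it.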

What about other formats

I also tested:

Plain text (stripped of all formatting): ~10% smaller than Markdown but accuracy dropped to 76% — losing structure hurts comprehension.
JSON (page content serialized as {"title": "...", "body": "..."}): roughly same tokens as Markdown, accuracy similar. Useful if you're building structured pipelines, but no clear win over Markdown for raw context.

The takeaway: structure helps comprehension, and syntax doesn't have to cost tokens. Markdown is the sweet spot.
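To make the three formats concrete, here is the same small page serialized each way. The page content is placeholder text and the helper names are my own; the sketch only shows what each format keeps or drops and makes no claim about token counts.

```python
import json

# Placeholder content, not taken from any of the tested pages.
page = {
    "title": "Example page",
    "sections": [
        ("Background", "First paragraph."),
        ("Details", "Second paragraph."),
    ],
}


def as_plain_text(page):
    """Title, headings, and body as bare lines: no structural markers survive."""
    parts = [page["title"]]
    for heading, body in page["sections"]:
        parts += [heading, body]
    return "\n".join(parts)


def as_markdown(page):
    """Headings keep their level through a short '#' prefix."""
    parts = [f"# {page['title']}"]
    for heading, body in page["sections"]:
        parts += [f"## {heading}", body]
    return "\n\n".join(parts)


def as_json(page):
    """The {"title": ..., "body": ...} shape described above."""
    body = "\n\n".join(f"{heading}\n{text}" for heading, text in page["sections"])
    return json.dumps({"title": page["title"], "body": body}, ensure_ascii=False)


for render in (as_plain_text, as_markdown, as_json):
    text = render(page)
    print(f"--- {render.__name__} ({len(text)} chars)\n{text}\n")
```

In the plain-text form a heading is indistinguishable from a short sentence, which is one plausible reading of why stripping all formatting cost accuracy.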

Reproducing the test

The harness is straightforward if you want to validate on your own content:

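```python
import anthropic

# Reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()


def test_page(html_source, markdown_source, question):
    """Ask the same question against both formats and print token usage."""
    for label, source in [("HTML", html_source), ("Markdown", markdown_source)]:
        r = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=1000,
            messages=[
                # Page content first, then a separator, then the question.
                {"role": "user", "content": f"{source}\n\n---\n\n{question}"}
            ],
        )
        print(
            f"{label}: input_tokens={r.usage.input_tokens}, "
            f"output_tokens={r.usage.output_tokens}, "
            f"response={r.content[0].text[:200]}"
        )
```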

The HTML side comes from your browser's document.documentElement.outerHTML. The Markdown side comes from a clipper or a converter — Web2MD, Jina Reader, or a custom Pandoc/Turndown pipeline. Save both per page, run the loop, score by hand for a sample of 5-10 questions per format.
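For hand scoring it helps to write every run to a file instead of printing it. The sketch below extends the loop above to log one CSV row per call, including output tokens and wall-clock time. The function name, the CSV columns, and the choice to pass the client in as an argument are my own assumptions; it relies only on the response fields the harness already reads, plus `usage.output_tokens`.

```python
import csv
import time


def run_and_log(client, pages, questions, out_path, model="claude-opus-4-7"):
    """Ask every question against both formats and write one CSV row per call.

    `client` is anything exposing messages.create(...), e.g. anthropic.Anthropic().
    `pages` maps a page name to {"HTML": html_source, "Markdown": markdown_source}.
    """
    fields = ["page", "format", "question", "input_tokens",
              "output_tokens", "seconds", "answer"]
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for name, formats in pages.items():
            for label, source in formats.items():
                for question in questions:
                    start = time.perf_counter()
                    r = client.messages.create(
                        model=model,
                        max_tokens=1000,
                        messages=[{"role": "user",
                                   "content": f"{source}\n\n---\n\n{question}"}],
                    )
                    writer.writerow({
                        "page": name,
                        "format": label,
                        "question": question,
                        "input_tokens": r.usage.input_tokens,
                        "output_tokens": r.usage.output_tokens,
                        # Wall-clock time for the call, network latency included.
                        "seconds": round(time.perf_counter() - start, 2),
                        "answer": r.content[0].text,
                    })
```

Passing the client in keeps the function easy to dry-run with a stub before spending tokens, and the resulting CSV opens directly in a spreadsheet where the accuracy, specificity, and hallucination columns can be filled in by hand.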

Practical takeaways

Default to Markdown for Claude, not HTML. A 39% median token saving plus an 18-point accuracy gain (71% to 89%) is not marginal.
The clipper choice matters. A clipper that leaves syntax-highlight residue and navigation noise eats half the benefit. Use one with strong site-specific extractors.
For Chinese content the gap can widen. WeChat / Xiaohongshu HTML is roughly half noise: Markdown conversion saved 43% and 59% respectively in this test.
For multi-page research sessions, savings compound. A 20-page corpus that costs $4 in HTML costs $2.40 in Markdown — and produces sharper synthesis.

Install

Web2MD on the Chrome Web Store →

Free tier: 3 conversions/day. Pro at $9/mo unlocks unlimited + queue + token estimates + bulk export.
