Why do code blocks lose their formatting when I paste into ChatGPT?

When you copy from a webpage, your OS captures the rendered HTML, including the syntax-highlighting span tags from Highlight.js, Prism, or Shiki. Pasting into ChatGPT discards both the styling and the markup, so the code arrives as unfenced plain text with no language hint, and ChatGPT has to guess whether 'def foo()' is Python or just text starting with 'def'. If you instead run that rich-text clipboard through a naive HTML-to-Markdown converter, the per-token spans leak through as noise. The fix is to convert the page to Markdown first: proper fenced blocks with language tags survive intact.

Why do tables collapse into prose when I paste from a webpage?

The browser clipboard holds a table as tab-separated text, not as Markdown table syntax. ChatGPT and Claude can't reliably reconstruct an N×M grid from 'Cell1 Cell2 Cell3 Row2Cell1...'. Small tables sometimes survive; nested or large ones usually break. Converting to GFM Markdown tables before pasting preserves the grid.

Why does the AI's web search not see paywalled or login-walled pages?

ChatGPT, Claude, and Perplexity browse from server IPs without your cookies. Subscribed Substacks return paywalled excerpts. X long posts hit a login wall. Cloudflare-challenged news sites block them. Internal Confluence wikis return 401. The pages you see logged in are not the pages the AI sees. A browser extension that reads the rendered DOM after you've authenticated solves this for any page you can open yourself.

Does Markdown actually save tokens versus raw HTML?

Yes — typically 60-80% reduction. Raw HTML pages carry navigation, ads, scripts, styling, and structural noise that LLMs spend tokens parsing but never reference in their output. Clean Markdown carries only the content. For long articles fed into Claude or ChatGPT, the savings translate directly to lower API costs and more room for actual analysis.
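If you want to check the savings on your own pages, a rough sketch like the one below compares an estimated token count for the raw HTML against the converted Markdown. It assumes about four characters per token, a common rule of thumb for English text, so it will not reproduce any exact percentage; the function names are illustrative, and for real numbers you would run the model's own tokenizer instead.

```python
import math

# Heuristic, not a real tokenizer: English text averages roughly 4 characters
# per token with common BPE tokenizers. Swap in the model's tokenizer for exact counts.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate the token count of `text` from its length, rounding up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimated_savings(raw_html: str, markdown: str) -> float:
    """Fraction of estimated tokens saved by sending Markdown instead of raw HTML."""
    html_tokens = estimate_tokens(raw_html)
    if html_tokens == 0:
        return 0.0
    return 1 - estimate_tokens(markdown) / html_tokens


if __name__ == "__main__":
    # Toy page: a script, a style block, a nav bar, and one sentence of content.
    page = (
        '<html><head><script src="/analytics.js"></script>'
        "<style>.nav{display:flex;gap:1rem}</style></head><body>"
        '<nav class="nav"><a href="/">Home</a><a href="/blog">Blog</a></nav>'
        "<article><p>Markdown keeps only the content.</p></article></body></html>"
    )
    md = "Markdown keeps only the content.\n"
    print(f"{estimate_tokens(page)} -> {estimate_tokens(md)} tokens, "
          f"about {estimated_savings(page, md):.0%} saved")
```

Feed it the saved HTML of a real article and the Markdown produced from it; the toy page only shows the mechanics.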

What about LaTeX math — does it survive the conversion?

Yes, when extracted properly. KaTeX renders LaTeX as styled HTML in the page; copying gives you Unicode glyphs that drop the original TeX source. A Markdown converter that knows about KaTeX/MathJax pulls the original TeX from the application/x-tex annotation element and emits proper $...$ or $$...$$ syntax. Web2MD v1.1.0 added this for Wikipedia, arXiv, and Stack Overflow specifically.
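A minimal sketch of that extraction, assuming KaTeX's usual markup: each formula carries an <annotation encoding="application/x-tex"> element in its hidden MathML layer, and display-mode formulas sit inside a span with class katex-display. It uses only Python's standard-library HTMLParser; the class and function names are hypothetical, this is not Web2MD's implementation, and MathJax pages would need a separate path.

```python
from html.parser import HTMLParser


class KatexTexExtractor(HTMLParser):
    """Collect the original TeX that KaTeX keeps in its hidden MathML layer.

    Assumes KaTeX's usual markup: every formula contains
    <annotation encoding="application/x-tex">...</annotation>, and display-mode
    formulas sit inside <span class="katex-display">. The visible glyphs in
    <span class="katex-html"> are ignored.
    """

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.formulas: list[str] = []
        self._span_stack: list[bool] = []  # one entry per open <span>; True = display wrapper
        self._in_tex = False
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "span":
            classes = (attr_map.get("class") or "").split()
            self._span_stack.append("katex-display" in classes)
        elif tag == "annotation" and attr_map.get("encoding") == "application/x-tex":
            self._in_tex = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "span" and self._span_stack:
            self._span_stack.pop()
        elif tag == "annotation" and self._in_tex:
            self._in_tex = False
            tex = "".join(self._buf).strip()
            # Inside a katex-display wrapper -> block math, otherwise inline math.
            delim = "$$" if any(self._span_stack) else "$"
            self.formulas.append(f"{delim}{tex}{delim}")

    def handle_data(self, data):
        if self._in_tex:
            self._buf.append(data)


def extract_tex(html: str) -> list[str]:
    """Return each KaTeX formula in document order as $...$ or $$...$$ source."""
    parser = KatexTexExtractor()
    parser.feed(html)
    parser.close()
    return parser.formulas


if __name__ == "__main__":
    snippet = (
        '<span class="katex-display"><span class="katex"><span class="katex-mathml">'
        '<math display="block"><semantics><annotation encoding="application/x-tex">'
        r"\nabla \cdot \vec{E} = \frac{\rho}{\epsilon_0}"
        "</annotation></semantics></math></span></span></span>"
    )
    print(extract_tex(snippet))
```

Reading the annotation rather than the rendered glyphs is what keeps the TeX exact: the source string is stored verbatim, so nothing has to be reverse-engineered from Unicode.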

Why Copy-Pasting Webpages into ChatGPT Looks Terrible — and How to Fix It

If you have ever pasted a webpage into ChatGPT or Claude, you already know the moment. The page in your browser is beautifully formatted — code blocks, tables, nested bullet lists, syntax highlighting. You select-all, copy, paste into the chat box, and what comes out the other side is a wall of plain text that the model can't reason about.

Then you spend ten minutes trying to manually reconstruct what the structure was. Or you give up and screenshot the page. Or you paste the URL and hope the model's web search can find it (it often can't, especially for paywalled or login-walled content).

This is one of the most viral complaints in the AI workflow space — the Hacker News thread "Tell HN: Copying and pasting from ChatGPT unsolicited sucks" and the Dev.to post "I was tired of copy-pasting to ChatGPT, so I built a Chrome extension" keep showing up because the underlying problem hasn't been solved cleanly. Yet.

This article explains exactly why the copy-paste workflow breaks, and the only architecture that survives.

What's actually breaking

When you select text on a webpage and copy it, what lands on the clipboard is an HTML fragment of the rendered page (plus a plain-text fallback), not Markdown source. The webpage is a tree of styled elements. Your browser renders that tree visually, but copy doesn't preserve the structure in any form a chat box keeps; it preserves the visual content in a format that pastes cleanly into Word and Google Docs.

ChatGPT and Claude don't read Word format. They read text. So the rich-text clipboard content gets stripped down to plain text on paste, and three specific kinds of structure die in the process:

1. Code blocks lose their identity

A <pre><code class="language-python">def foo():\n    return 42</code></pre> element renders as a syntax-highlighted block in the browser. When you copy it, you get the indented text without any indication that it's code. Paste it into Claude, and Claude has to guess whether def foo() is Python code or just text that happens to start with "def." The guess is usually right but not always, and the syntax highlighting is gone forever.
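Recovering the language is mostly a matter of reading that class attribute. Here is a minimal sketch, assuming the language-X convention (Prism and Highlight.js both read it, and the HTML spec suggests it) plus the shorter lang-X variant; the helper names are hypothetical. It wraps already-extracted code text in a GitHub-Flavored Markdown fence tagged with the language.

```python
import re
from typing import Optional

# Class conventions that carry the language: "language-python" and the
# shorter "lang-python". Other sites may use other prefixes (an assumption).
LANG_CLASS = re.compile(r"^(?:language|lang)-([\w+#.-]+)$")


def language_from_class(class_attr: Optional[str]) -> str:
    """Return the language named in a <code> element's class list, or '' if none."""
    for cls in (class_attr or "").split():
        match = LANG_CLASS.match(cls)
        if match:
            return match.group(1).lower()
    return ""


def to_fenced_block(code: str, class_attr: Optional[str]) -> str:
    """Wrap extracted code text in a GFM fence tagged with its language."""
    fence = "`" * 3
    # Lengthen the fence until no run of backticks in the code can close it early.
    while fence in code:
        fence += "`"
    return f"{fence}{language_from_class(class_attr)}\n{code.rstrip()}\n{fence}"


if __name__ == "__main__":
    print(to_fenced_block("def foo():\n    return 42\n", "hljs language-python"))
```

The fence grows whenever the code contains its own run of backticks, so a snippet that itself shows a Markdown fence can't terminate the block early.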

Worse: when the page uses Highlight.js, Prism, or Shiki — the three most common syntax-highlighting libraries — the code is wrapped in dozens of nested <span> elements, one per token. Copying that produces a clean visual paste, but if you then convert the rich-text clipboard to Markdown for an AI, the conversion inherits the spans and the model sees noise.
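The way around the span noise is to ignore all markup inside <pre> and keep only the text nodes, which still hold every original character, whitespace and newlines included. A minimal sketch with Python's standard-library HTMLParser, assuming the highlighter keeps all source text inside a <pre> element; the names are illustrative, not any converter's actual API.

```python
from html.parser import HTMLParser


class CodeTextExtractor(HTMLParser):
    """Recover plain source text from highlighted <pre> blocks.

    Highlighters wrap each token in a <span>; collecting only the text nodes
    inside <pre> drops that markup while keeping the original characters.
    """

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turns &lt; back into <
        self.blocks: list[str] = []
        self._pre_depth = 0
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            if self._pre_depth == 0:
                self._buf = []
            self._pre_depth += 1
        elif tag == "br" and self._pre_depth:
            self._buf.append("\n")  # some highlighters emit <br> instead of newlines

    def handle_endtag(self, tag):
        if tag == "pre" and self._pre_depth:
            self._pre_depth -= 1
            if self._pre_depth == 0:
                self.blocks.append("".join(self._buf))

    def handle_data(self, data):
        if self._pre_depth:
            self._buf.append(data)


def code_blocks_text(html: str) -> list[str]:
    """Return the plain text of every top-level <pre> block, spans removed."""
    parser = CodeTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


if __name__ == "__main__":
    highlighted = (
        '<pre><code class="language-python"><span class="hljs-keyword">def</span> '
        '<span class="hljs-title function_">foo</span>():\n    '
        '<span class="hljs-keyword">return</span> <span class="hljs-number">42</span></code></pre>'
    )
    print(code_blocks_text(highlighted))
```

Pairing this text with the language read from the <code> element's class gives a clean fenced block instead of span residue.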

2. Tables collapse into prose

A simple HTML table renders as a grid in the browser. When you copy it, the clipboard has tab-separated text. Paste into ChatGPT and you get something like:

Column1 Column2 Column3 Row1Cell1 Row1Cell2 Row1Cell3 Row2Cell1 Row2Cell2 Row2Cell3

The grid structure is gone. The model sees nine consecutive words and has to reverse-engineer that they were originally a 3×3 grid. With small tables this works; with larger or nested tables, it fails.
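If the tabs and line breaks are still on the clipboard, the grid is recoverable before it ever reaches the model. A minimal sketch, assuming tab-separated cells, one row per line, and a header in the first row; the function name is hypothetical.

```python
def tsv_to_gfm(tsv: str) -> str:
    """Convert tab-separated clipboard text (first row = header) to a GFM table.

    Assumes tabs between cells and newlines between rows survived the copy.
    """
    rows = [line.split("\t") for line in tsv.strip("\n").splitlines()]
    if not rows:
        return ""
    width = max(len(r) for r in rows)

    def cell(text: str) -> str:
        # A bare pipe would end a GFM cell early, so escape it.
        return text.strip().replace("|", "\\|")

    def fmt(row: list[str]) -> str:
        padded = row + [""] * (width - len(row))  # keep every row the same width
        return "| " + " | ".join(cell(c) for c in padded) + " |"

    header, body = rows[0], rows[1:]
    lines = [fmt(header), "|" + " --- |" * width]  # delimiter row required by GFM
    lines.extend(fmt(r) for r in body)
    return "\n".join(lines)


if __name__ == "__main__":
    clipboard = "Column1\tColumn2\tColumn3\nRow1Cell1\tRow1Cell2\tRow1Cell3\n"
    print(tsv_to_gfm(clipboard))
```

Once the tabs have been collapsed to spaces, as in the pasted example above, no converter can tell which word belongs to which cell, which is why the conversion has to happen before the paste.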

3. LaTeX math becomes Unicode soup

A page using KaTeX or MathJax renders $$\nabla \cdot \vec{E} = \frac{\rho}{\epsilon_0}$$ as a beautiful equation. When you copy it, you get the rendered Unicode glyphs — the actual mathematical symbols, but no longer in TeX form. The model can read them, sometimes, but if you ask it to manipulate the equation symbolically (rearrange, integrate, solve), it has to first reverse-engineer the TeX from the Unicode, and that goes wrong frequently.

The naive fix: paste the URL

You'd think the URL would be enough. ChatGPT can browse, Claude can browse, Perplexity is built on live web search. Why not just paste the link?

Because the AI's web access is a different browser than yours. A different cookie jar. A different IP. A different session.

What this means in practice:

Subscribed Substacks → the AI sees the paywall, you see the article
X / Twitter long posts → the AI sees the login wall, you see the post
Subscribed newsletters / paywalled news → AI sees the excerpt
Cloudflare-challenged sites → AI gets blocked, you got through hours ago
Internal company docs → AI gets a 401 Unauthorized
Reddit threads (intermittently, when Reddit's anti-bot measures escalate) → AI sees a stub, you see the thread
Chinese social platforms (Xiaohongshu, WeChat Official Accounts, Zhihu, Bilibili) → server-side scrapers routinely hit a wall, you can read the page just fine

The URL-paste workflow only works for content that's truly public and that the AI's specific scraper happens to handle well. That's a smaller set than people expect.

The actual fix

The fix is to convert the page in your own browser, where you've already passed every authentication and anti-bot check, and feed the AI clean Markdown.

A Chrome extension is the right shape for this because it has access to:

The fully rendered DOM after JavaScript hydration
Your cookies and session
The <annotation encoding="application/x-tex"> element where KaTeX hides the original TeX source
The language-X class on code blocks where the language is preserved
Your actual visual reading state — what you see is what gets converted

Web2MD does exactly this. The extension reads the rendered page, applies preprocessing to strip Highlight.js / Prism / Shiki span residue, unwraps GitHub Gist line-number tables, expands colspan attributes for table fidelity, and converts KaTeX or MathJax back to TeX source via the annotation element.
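As an illustration of the colspan step: GFM tables have no colspan, so one plausible approach is to repeat a spanning cell's text across every column it covers, so each row keeps the same width. This is a sketch under that assumption, not Web2MD's code; it expects explicit closing tags, ignores rowspan and nested tables, and the names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class TableGridParser(HTMLParser):
    """Read simple HTML tables into rows of cell text, expanding colspan.

    A cell with colspan="N" is repeated N times so every row keeps the column
    count a GFM table needs. Assumes explicit </td>, </th> and </tr> tags.
    """

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.rows: list[list[str]] = []
        self._row: Optional[list[str]] = None
        self._cell: Optional[list[str]] = None
        self._span = 1

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
            try:
                self._span = max(1, int(dict(attrs).get("colspan") or 1))
            except ValueError:
                self._span = 1  # malformed colspan: treat as a single column

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            text = " ".join("".join(self._cell).split())  # collapse whitespace
            self._row.extend([text] * self._span)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def expand_table(html: str) -> list[list[str]]:
    """Return table rows with colspan expanded, ready for a Markdown table writer."""
    parser = TableGridParser()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    table = ('<table><tr><th>Name</th><th colspan="2">Score</th></tr>'
             "<tr><td>A</td><td>1</td><td>2</td></tr></table>")
    print(expand_table(table))
```

Repeating the text keeps the header aligned with the data columns; leaving the extra cells empty is an equally valid choice if duplicated labels would confuse the reader.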

The output is clean GitHub-Flavored Markdown that pastes into ChatGPT or Claude with formatting intact. Code blocks have their language tag. Tables stay grids. LaTeX comes through as $...$ and $$...$$ source.

It also has a "Send to AI" button that opens a new ChatGPT or Claude tab and injects the Markdown directly into the prompt box, so the manual paste step disappears entirely.

What about other tools

Reader View / Readability (built into Firefox and Safari): cleans the page for reading, but copy-paste from Reader View has the same fundamental problem — you're copying rich text, not Markdown source. Code blocks still flatten.

MarkDownload: was the standard solution, but it was removed from the Chrome Web Store in 2025 and has been unmaintained for 2+ years. Its code-block handling has the residual <span> problem documented in GitHub issues #395 and #371.

Obsidian Web Clipper: solid for general articles, but has known silent data loss bugs and breaks on Reddit threads, paywalled content, and YouTube transcripts.

Jina Reader (r.jina.ai/): server-side, so all the URL-paste limitations apply. Plus rate limits.

Manual paste with cleanup: works, eats 5-10 minutes per article, doesn't scale.

Try it

Web2MD on the Chrome Web Store — free tier, 3 conversions a day. The "code block / table / LaTeX fidelity" preprocessing is in v1.1.0 and applies to every conversion in both free and Pro tiers.

The simplest test: open a Wikipedia article with a complex table or an arXiv paper with LaTeX math, press Ctrl+M, paste into Claude. Compare against the same content pasted via plain copy-paste. The difference is the article you can reason about versus the wall of words you can't.
