Overview
While Web2MD works on any webpage, certain sites benefit from specialized extraction logic. Site adapters use each platform’s API or DOM structure to produce better Markdown output than generic extraction.
Supported sites
Reddit
Extracts posts and comments including:
- Post title, author, subreddit, and score
- Self-text or link content
- Up to 30 comments with author and score
- Image and media links
Uses Reddit's JSON endpoint (/.json) for reliable extraction.
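As a rough illustration of the `/.json` approach for Reddit posts, the sketch below builds the JSON URL for a post and reads the listed fields out of an already-decoded response. The function names `reddit_json_url` and `parse_thread` are hypothetical, not Web2MD internals; the payload shape (a two-element list holding the post listing and the comment listing) follows Reddit's public JSON output.

```python
from urllib.parse import urlsplit, urlunsplit


def reddit_json_url(post_url: str) -> str:
    """Return the JSON variant of a Reddit post URL by appending /.json to its path."""
    parts = urlsplit(post_url)
    path = parts.path.rstrip("/") + "/.json"
    # Query string and fragment are dropped; they are not needed to fetch the thread.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def parse_thread(payload: list, max_comments: int = 30) -> dict:
    """Extract post fields and top-level comments from a decoded thread payload.

    payload[0] is the post listing, payload[1] is the comment listing.
    """
    post = payload[0]["data"]["children"][0]["data"]
    comments = []
    for child in payload[1]["data"]["children"]:
        if child.get("kind") != "t1":  # skip "more" placeholders, keep real comments
            continue
        data = child["data"]
        comments.append(
            {
                "author": data.get("author"),
                "score": data.get("score"),
                "body": data.get("body", ""),
            }
        )
        if len(comments) >= max_comments:
            break
    return {
        "title": post.get("title"),
        "author": post.get("author"),
        "subreddit": post.get("subreddit"),
        "score": post.get("score"),
        "selftext": post.get("selftext", ""),
        "comments": comments,
    }
```

Fetching the URL is left out on purpose: in a browser extension that step would be a normal HTTP request from the page or background context, and only the parsing logic is shown here.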
GitHub
Extracts Issues and Pull Requests, including:
- Title, status (open/closed/merged), and labels
- Original description/body
- Up to 30 comments with author info
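This page does not say whether the GitHub adapter reads the REST API or the page DOM. Assuming the REST API, a minimal sketch of mapping an Issue or Pull Request URL to the endpoints that carry the fields above might look like this. `GitHubThread`, `parse_github_url`, and `api_urls` are hypothetical names; the endpoint paths and the `per_page` parameter are part of GitHub's public REST API.

```python
import re
from typing import NamedTuple, Optional


class GitHubThread(NamedTuple):
    owner: str
    repo: str
    kind: str  # "issues" or "pull", as it appears in the page URL
    number: int


_THREAD_RE = re.compile(r"^https?://github\.com/([^/]+)/([^/]+)/(issues|pull)/(\d+)")


def parse_github_url(url: str) -> Optional[GitHubThread]:
    """Return the thread identity for an Issue or Pull Request URL, else None."""
    match = _THREAD_RE.match(url)
    if match is None:
        return None
    owner, repo, kind, number = match.groups()
    return GitHubThread(owner, repo, kind, int(number))


def api_urls(thread: GitHubThread, max_comments: int = 30) -> dict:
    """Build REST API URLs for the thread body, its comments, and (for PRs) merge state."""
    base = f"https://api.github.com/repos/{thread.owner}/{thread.repo}"
    urls = {
        # Pull requests are also issues, so title, labels, and body live here for both.
        "thread": f"{base}/issues/{thread.number}",
        "comments": f"{base}/issues/{thread.number}/comments?per_page={max_comments}",
    }
    if thread.kind == "pull":
        # The pulls endpoint reports whether the pull request was merged.
        urls["pull"] = f"{base}/pulls/{thread.number}"
    return urls
```

Requesting only the first page of 30 comments is one simple way to honor the comment cap without pagination.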
YouTube
Extracts video metadata:
- Video title and channel name
- Duration and view count
- English subtitles/captions (when available)
Stack Overflow
Extracts Q&A content:
- Question title, body, tags, and vote count
- Accepted answer (highlighted)
- Top answers with vote counts
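One plausible way to get the "accepted answer first, then top answers" ordering is a single sort on two keys. The sketch below is an assumption about the ordering logic, not Web2MD's actual code: the field names `is_accepted` and `score` mirror the Stack Exchange API, while `top_n` and the Markdown heading layout are made up for illustration.

```python
def order_answers(answers: list[dict], top_n: int = 5) -> list[dict]:
    """Put the accepted answer first, then sort the rest by descending vote count."""
    ranked = sorted(
        answers,
        # False sorts before True, so "not accepted" pushes the accepted answer to the front.
        key=lambda a: (not a.get("is_accepted", False), -a.get("score", 0)),
    )
    return ranked[:top_n]


def render_answers(answers: list[dict]) -> str:
    """Render answers as Markdown sections, labelling the accepted one."""
    lines = []
    for answer in answers:
        label = "Accepted answer" if answer.get("is_accepted") else "Answer"
        lines.append(f"### {label} ({answer.get('score', 0)} votes)")
        lines.append("")
        lines.append(answer.get("body", ""))
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```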
Twitter / X
Extracts tweet content:
- Tweet text and media
- Author info and engagement metrics
Notion
Extracts content from public and logged-in Notion pages (notion.so, notion.site):
- Dual extraction: block-level DOM parsing (data-block-id) with container fallback
- Headings, text, code blocks, lists, toggles, callouts, quotes
- Images, tables, bookmarks, and embeds
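The dual-extraction idea for Notion pages can be sketched outside the browser with Python's standard `html.parser`. This is a simplified stand-in for DOM queries, assuming well-formed markup: text is grouped by the nearest enclosing element that carries a `data-block-id` attribute, and if no such blocks are found, all text in the document is returned as one chunk (the "container fallback"). `BlockCollector` and `extract_blocks` are hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class BlockCollector(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.blocks: dict[str, list[str]] = {}  # block id -> text fragments, in document order
        self.all_text: list[str] = []           # every text fragment, used by the fallback
        self._open: list = []                   # one entry per open element: its block id or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        block_id = dict(attrs).get("data-block-id")
        if block_id is not None:
            self.blocks.setdefault(block_id, [])
        self._open.append(block_id)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        self.all_text.append(text)
        # Attribute the text to the innermost open element that is a block.
        for block_id in reversed(self._open):
            if block_id is not None:
                self.blocks[block_id].append(text)
                break


def extract_blocks(html: str) -> list[str]:
    """Return one string per block, or the whole container text if no blocks exist."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    paragraphs = [" ".join(parts) for parts in collector.blocks.values() if parts]
    if paragraphs:
        return paragraphs
    joined = " ".join(collector.all_text)
    return [joined] if joined else []
```

A real adapter would also map each block's type to Markdown (headings, toggles, callouts, and so on); this sketch only shows the block-first, container-second control flow.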
Feishu / Lark
Extracts content from Feishu Docx and Wiki pages (feishu.cn, larksuite.com):
- Dual extraction: PageMain block model with DOM fallback
- Headings, text, code blocks, lists, tables, images
- Callouts and checkboxes
How adapters work
When you convert a page, Web2MD automatically:
- Detects the site from the URL
- Selects the best adapter if one exists
- Extracts content using the site-specific method
- Falls back to generic extraction if the adapter fails
Site adapters are available to all users (Free and Pro). They run before the main conversion pipeline, so they work regardless of your plan.
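The detect, select, extract, and fall back sequence described in this section can be sketched as a small dispatcher. Everything here is illustrative: `ADAPTER_HOSTS`, `detect_adapter`, and `convert` are hypothetical names, and apart from the Notion and Feishu domains listed on this page, the hostnames are assumptions about what the adapters match.

```python
from typing import Callable, Optional
from urllib.parse import urlsplit

Extractor = Callable[[str], str]  # takes a page URL, returns Markdown

# Hostname suffix -> adapter name (assumed mapping, for illustration only).
ADAPTER_HOSTS = {
    "reddit.com": "reddit",
    "github.com": "github",
    "youtube.com": "youtube",
    "stackoverflow.com": "stackoverflow",
    "twitter.com": "twitter",
    "x.com": "twitter",
    "notion.so": "notion",
    "notion.site": "notion",
    "feishu.cn": "feishu",
    "larksuite.com": "feishu",
}


def detect_adapter(url: str) -> Optional[str]:
    """Return the adapter name for a URL, matching the host or any of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    for suffix, name in ADAPTER_HOSTS.items():
        if host == suffix or host.endswith("." + suffix):
            return name
    return None


def convert(url: str, adapters: dict[str, Extractor], generic: Extractor) -> str:
    """Try the site adapter first; fall back to generic extraction if it is missing or fails."""
    name = detect_adapter(url)
    adapter = adapters.get(name) if name else None
    if adapter is not None:
        try:
            return adapter(url)
        except Exception:
            # Adapter failed: fall through to generic extraction below.
            pass
    return generic(url)
```

Catching every exception keeps a broken adapter from blocking the conversion; a production version would likely log the failure before falling back.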
Comparison
| Feature | Generic extraction | Site adapter |
|---|---|---|
| Content accuracy | Good for articles | Optimized for site structure |
| Comments/replies | Not extracted | Included (Reddit, GitHub, Stack Overflow) |
| Metadata | Basic (title, URL) | Rich (author, score, status) |
| Media | Images only | Videos, captions, embeds |
| Reliability | Depends on page structure | Uses platform APIs or known DOM structure |