What are the current input and output prices per million tokens for the top models?

As of June 2026: Claude Opus 4.7 = $15 input / $75 output. GPT-5.5 = $12 input / $60 output. DeepSeek R2 = $0.50 input / $2 output. Gemini 2 Pro = $7 input / $35 output. Kimi K2 = $0.60 input / $2.50 output. DeepSeek and Kimi are roughly 25-40x cheaper than Claude Opus 4.7 on a pure per-token basis, depending on whether you compare input or output rates.

Should I always use DeepSeek because it's cheapest?

No. The per-token math hides reasoning quality differences. For English-language deep reasoning tasks (math, code generation, multi-step planning), Claude Opus 4.7 and GPT-5.5 still meaningfully outperform DeepSeek on output quality. The right framing is 'cheap enough for which task' — not 'always cheap.'

How much does prompt caching save in practice?

Anthropic and OpenAI both offer prompt caching at ~10% of the standard input rate on cache hits. For workflows where the same context (a 200k-token research corpus, a long system prompt) is reused across multiple turns within ~5 minutes, expect roughly 70-85% lower input cost over the session, since the first turn still pays the full rate and not every later turn hits the cache.
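As a rough check on that range, the input-side saving can be worked out directly. The sketch below assumes the whole context is re-sent on every turn, the first turn pays the full input rate, and every later turn is a cache hit billed at 10% of it. The function name and the turn counts are illustrative, and cache-write surcharges and output tokens are ignored.

```python
def cached_input_saving(turns: int, hit_price_ratio: float = 0.10) -> float:
    """Fraction of input cost saved versus re-sending the context uncached.

    Assumption: turn 1 pays the full input rate and every later turn is a
    cache hit billed at `hit_price_ratio` of that rate (~10% here).
    """
    if turns < 1:
        raise ValueError("turns must be at least 1")
    uncached = float(turns)                        # every turn at full price
    cached = 1.0 + hit_price_ratio * (turns - 1)   # first turn full, rest discounted
    return 1.0 - cached / uncached


for turns in (4, 8, 12):
    print(f"{turns} turns: {cached_input_saving(turns):.0%} saved on input")
```

Under these assumptions an 8-turn session saves a little under 79% of its input cost, and shorter sessions save less because the uncached first turn weighs more heavily.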

What's the typical cost of a multi-document research session?

A research session loading 30 webpages (~200k tokens of context) and asking 8 synthesis questions: Claude Opus 4.7 ≈ $4-6 / GPT-5.5 ≈ $3.50-5 / DeepSeek R2 ≈ $0.30-0.35. Claude Pro/Max and ChatGPT Plus subscriptions bundle this usage into a flat monthly price instead.

When does subscription pricing (Claude Pro, ChatGPT Plus) beat API pricing?

If you run more than 3-5 multi-document research sessions per day, subscription pricing wins by a wide margin. For occasional use (1-2 sessions a week, mostly short chats), API + pay-as-you-go is cheaper. Most knowledge workers cross into 'subscription wins' territory by month 1 of serious AI-assisted research.

How does the Markdown vs HTML format choice affect these costs?

Markdown input runs ~40% fewer tokens than HTML across all models (verified in my [token test](/blog/html-vs-markdown-claude-token-test-2026)). So 'Claude Opus 4.7 with Markdown input' costs roughly the same as 'GPT-5.5 with HTML input' — format choice cancels out one model's price advantage.

Claude vs GPT-5.5 vs DeepSeek R2 Token Costs: Real Numbers for Research Workflows (June 2026)

The pricing tables on each model's website are easy to find. What's harder to find: what these prices actually mean for the workflows knowledge workers run every day.

This post is the practical-cost version. Real workflows, real numbers, real takeaways.

Current input/output pricing (June 2026)

| Model | Input ($/M tokens) | Output ($/M tokens) | Cache hit input ($/M tokens) |
|---|---|---|---|
| Claude Opus 4.7 | $15.00 | $75.00 | $1.50 |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 |
| Claude Haiku 4.5 | $0.80 | $4.00 | $0.08 |
| GPT-5.5 | $12.00 | $60.00 | $1.20 |
| GPT-5.5 Mini | $1.50 | $7.50 | $0.15 |
| Gemini 2 Pro | $7.00 | $35.00 | $0.70 |
| Gemini 2 Flash | $0.20 | $1.00 | $0.02 |
| DeepSeek R2 | $0.50 | $2.00 | n/a (no cache yet) |
| DeepSeek V3 | $0.27 | $1.10 | n/a |
| Kimi K2 | $0.60 | $2.50 | n/a |
| Qwen 3 Max | $1.50 | $6.00 | $0.15 |

Cache hit prices assume Anthropic-style "ephemeral" caching with a 5-minute TTL. Real-world cache hit rates for repeated multi-question sessions are typically 60-85%.
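To make the table concrete, here is a minimal sketch of how the three price columns combine into a session estimate. It assumes the full context is sent on every turn, the first turn pays the standard input rate, and every later turn is a cache hit (or full price where the table lists no cache). `session_cost` and the 100k-token example are illustrative names and numbers, and per-question tokens and cache-write surcharges are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Price:
    """USD per million tokens, copied from the pricing table."""
    input_usd: float
    output_usd: float
    cache_hit_usd: Optional[float]  # None where the table lists no cache price


PRICES = {
    "Claude Opus 4.7": Price(15.00, 75.00, 1.50),
    "Claude Sonnet 4.6": Price(3.00, 15.00, 0.30),
    "GPT-5.5": Price(12.00, 60.00, 1.20),
    "DeepSeek R2": Price(0.50, 2.00, None),
}


def session_cost(model: str, context_tokens: int, turns: int, output_tokens: int) -> float:
    """Estimate one session's cost in USD under the assumptions stated above."""
    if turns < 1:
        raise ValueError("turns must be at least 1")
    price = PRICES[model]
    per_million = 1_000_000
    # Turn 1 always pays the standard input rate.
    first_turn = context_tokens * price.input_usd / per_million
    # Later turns pay the cache-hit rate, or full price if there is no cache.
    repeat_usd = price.cache_hit_usd if price.cache_hit_usd is not None else price.input_usd
    later_turns = (turns - 1) * context_tokens * repeat_usd / per_million
    output = output_tokens * price.output_usd / per_million
    return first_turn + later_turns + output


for name in PRICES:
    print(f"{name}: ${session_cost(name, 100_000, 5, 10_000):.2f}")
```

For a 100k-token context, 5 turns, and 10k output tokens this works out to about $2.85 on Claude Opus 4.7 and $0.27 on DeepSeek R2. Real sessions depend on actual hit rates and how much context each turn re-reads, so treat this as a planning estimate rather than a reproduction of measured totals.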

Cost per research session (real numbers)

I measured 5 representative workflows. Workflows 1-4 follow the same pattern: load a research corpus, ask 2-8 questions across multiple follow-up turns, and get a final synthesis. Workflow 5 compares subscription and API pricing. All measurements use Markdown input (not HTML; see the token comparison) and prompt caching where available.

Workflow 1: Light research (5 webpages, 4 questions)

Roughly 25k input tokens cached + 8k output total.

| Model | Total cost | Notes |
|---|---|---|
| Claude Opus 4.7 | $0.65 | Best output quality |
| Claude Sonnet 4.6 | $0.18 | 80% of Opus quality on simple synthesis |
| GPT-5.5 | $0.51 | |
| Gemini 2 Pro | $0.32 | Built-in long context |
| DeepSeek R2 | $0.04 | Cheapest by far |
| Kimi K2 | $0.05 | Chinese-language reasoning competitive |

Workflow 2: Deep research (30 webpages, 8 questions)

Roughly 200k input tokens cached + 25k output.

| Model | Total cost | Notes |
|---|---|---|
| Claude Opus 4.7 | $5.40 | Quality matters here |
| Claude Sonnet 4.6 | $1.45 | Acceptable for most synthesis |
| GPT-5.5 | $4.20 | |
| Gemini 2 Pro | $2.85 | |
| DeepSeek R2 | $0.32 | Run 16 of these for one Opus session |
| Kimi K2 | $0.36 | |

Workflow 3: Chinese-content research (40 articles from 小红书/微信/知乎)

Roughly 280k input tokens (Western models' tokenizers need ~1.5x as many tokens for Chinese as for equivalent English text) + 30k output.

| Model | Total cost | Notes |
|---|---|---|
| Claude Opus 4.7 | $9.60 | Chinese tokens cost more |
| GPT-5.5 | $7.30 | |
| Gemini 2 Pro | $4.95 | |
| DeepSeek R2 | $0.42 | Best Chinese tokenizer + cheapest |
| Kimi K2 | $0.48 | Tied with DeepSeek for Chinese |

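DeepSeek's combination of an efficient Chinese tokenizer and a low per-token price makes this Chinese-content workflow roughly 12-23x cheaper than on the Western frontier models: about 23x versus Claude Opus 4.7, 17x versus GPT-5.5, and 12x versus Gemini 2 Pro.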
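The size of that gap depends on which model DeepSeek R2 is compared against. A small sketch, using only the Workflow 3 totals from the table above (the helper name `times_cheaper` is illustrative):

```python
# Workflow 3 totals in USD, copied from the table above.
workflow3_cost = {
    "Claude Opus 4.7": 9.60,
    "GPT-5.5": 7.30,
    "Gemini 2 Pro": 4.95,
    "DeepSeek R2": 0.42,
    "Kimi K2": 0.48,
}


def times_cheaper(costs: dict, baseline: str) -> dict:
    """Return each model's cost as a multiple of the baseline model's cost."""
    base = costs[baseline]
    return {model: cost / base for model, cost in costs.items() if model != baseline}


ratios = times_cheaper(workflow3_cost, "DeepSeek R2")
# Print the most expensive comparison first.
for model, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{model}: {ratio:.1f}x the DeepSeek R2 cost")
```

Claude Opus 4.7 comes out near 23x, GPT-5.5 near 17x, and Gemini 2 Pro near 12x, while Kimi K2 stays within about 15% of DeepSeek R2.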

Workflow 4: Daily monitoring (run automated, 1 session per day for 30 days)

20 articles per session, light synthesis, 2 questions each. Monthly cost is 30 sessions × the per-session cost, which the totals below put at about $3.60 for Claude Opus 4.7 and $0.15 for DeepSeek R2.

| Model | Monthly cost |
|---|---|
| Claude Opus 4.7 | $108 |
| Claude Sonnet 4.6 | $36 |
| GPT-5.5 | $84 |
| Gemini 2 Pro | $54 |
| DeepSeek R2 | $4.50 |

At "$4.50/month for daily automated monitoring," DeepSeek R2 makes recurring workflows viable that were prohibitively expensive at frontier prices.
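Dividing the monthly totals by 30 gives the per-session cost each model implies, which is the number to plug in when sizing a different schedule. A minimal sketch with the table's figures (the function names and the $10 budget are illustrative assumptions):

```python
DAYS = 30  # one automated session per day, as in Workflow 4

# Monthly totals in USD, copied from the Workflow 4 table.
monthly_cost = {
    "Claude Opus 4.7": 108.00,
    "Claude Sonnet 4.6": 36.00,
    "GPT-5.5": 84.00,
    "Gemini 2 Pro": 54.00,
    "DeepSeek R2": 4.50,
}


def per_session(monthly_usd: float, sessions: int = DAYS) -> float:
    """Per-session cost implied by a monthly total."""
    return monthly_usd / sessions


def sessions_within_budget(budget_usd: float, monthly_usd: float, sessions: int = DAYS) -> int:
    """Whole sessions per month that a budget covers at the implied per-session cost."""
    return int(budget_usd // per_session(monthly_usd, sessions))


for model, total in monthly_cost.items():
    print(f"{model}: ${per_session(total):.2f}/session, "
          f"{sessions_within_budget(10.00, total)} sessions for $10")
```

On these figures a $10 monthly budget covers 2 monitoring sessions on Claude Opus 4.7 and 66 on DeepSeek R2.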

Workflow 5: Subscription vs API breakeven

Both Claude Pro ($20/mo) and ChatGPT Plus ($20/mo) bundle generous monthly usage. Breakeven analysis vs API pricing:

| Usage pattern | Claude Pro vs Claude API | ChatGPT Plus vs GPT-5.5 API |
|---|---|---|
| 5 light sessions/week | Pro wins by ~3x | Plus wins by ~3x |
| 20 light sessions/week | Pro wins by ~12x | Plus wins by ~12x |
| 5 deep sessions/week | Pro wins by ~5x | Plus wins by ~5x |
| 20 deep sessions/week | Pro wins by ~25x | Plus wins by ~25x |

For anything beyond casual use, subscription pricing dominates API pricing for ChatGPT and Claude. For DeepSeek / Kimi / Qwen, API stays cheap enough that subscriptions are less of a slam-dunk.
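A breakeven of this kind can be sketched from the per-session API costs in Workflows 1 and 2. The version below is a simplification under stated assumptions: it prices the API side at Claude Opus 4.7 and GPT-5.5 rates, converts weekly usage with 52/12 weeks per month, counts only the research sessions themselves, and ignores subscription usage caps. The function names are illustrative.

```python
WEEKS_PER_MONTH = 52 / 12   # ~4.33; an assumption for converting weekly usage
SUBSCRIPTION_USD = 20.00    # Claude Pro and ChatGPT Plus monthly price from the text

# Per-session API cost in USD from Workflow 1 (light) and Workflow 2 (deep).
API_SESSION_COST = {
    ("Claude Opus 4.7", "light"): 0.65,
    ("Claude Opus 4.7", "deep"): 5.40,
    ("GPT-5.5", "light"): 0.51,
    ("GPT-5.5", "deep"): 4.20,
}


def monthly_api_cost(model: str, kind: str, sessions_per_week: float) -> float:
    """API spend per month for a weekly usage pattern."""
    return API_SESSION_COST[(model, kind)] * sessions_per_week * WEEKS_PER_MONTH


def breakeven_sessions_per_month(model: str, kind: str) -> float:
    """Sessions per month at which API spend equals the subscription fee."""
    return SUBSCRIPTION_USD / API_SESSION_COST[(model, kind)]


for (model, kind) in API_SESSION_COST:
    api = monthly_api_cost(model, kind, 5)
    print(f"{model}, 5 {kind}/week: API ${api:.2f}/mo, "
          f"breakeven at {breakeven_sessions_per_month(model, kind):.1f} sessions/mo")
```

With these inputs, 5 deep sessions a week costs about $117 a month on the Claude Opus 4.7 API, close to six times the subscription fee, and the breakeven sits under 4 deep sessions a month. Light sessions alone reach the $20 fee at roughly 31 a month on Claude Opus 4.7, so for light usage the comparison depends heavily on how much everyday chat the same subscription also covers.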

Where each model is the right choice

After all the cost math:

Default to Claude (Opus or Sonnet)

English-language deep reasoning
Multi-step planning, code generation, judgment calls
Anything where output quality dominates input cost
MCP and Skills ecosystem matters

Default to DeepSeek R2

Chinese-language source material
High-volume monitoring / batch jobs
Cost-sensitive personal research
Anywhere reasoning quality is "good enough" not "frontier"

Default to GPT-5.5

Heavy browse / Deep Research usage
ChatGPT Plus subscription you already pay for
OpenAI ecosystem tools (Code Interpreter, DALL-E)
Quick conversational tasks

Default to Gemini 2

Need long context at lower cost than Claude
Already in Google Cloud ecosystem
NotebookLM workflows

Default to Kimi K2 / Qwen 3

Chinese-language workflows where DeepSeek doesn't fit (latency, quotas)
Code generation in Chinese-context environments

Input format choice cancels out price differences

A surprising practical finding: choosing HTML or Markdown input swings input-token cost by ~40%. So:

Claude Opus 4.7 with Markdown input ≈ GPT-5.5 with HTML input (roughly the same dollar cost on the input side)
DeepSeek R2 with HTML input buys about 40% fewer queries per input dollar than DeepSeek R2 with Markdown input

The takeaway: before optimizing model choice, optimize input format. Feeding HTML to Claude is paying Claude prices for a GPT-5.5 result. Feeding Markdown to GPT-5.5 is paying GPT-5.5 prices for near-Claude results.
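The input-side arithmetic behind that comparison is short enough to sketch. This assumes the ~40% token reduction applies uniformly and looks at input cost only, since output tokens are unaffected by input format. The constant and function names are illustrative.

```python
MARKDOWN_TOKEN_RATIO = 0.60  # Markdown uses ~40% fewer tokens than HTML (figure from this post)

# Input prices in USD per million tokens, from the pricing table.
INPUT_PRICE = {
    "Claude Opus 4.7": 15.00,
    "GPT-5.5": 12.00,
    "DeepSeek R2": 0.50,
}


def input_cost(model: str, html_tokens: int, as_markdown: bool) -> float:
    """Input cost in USD for content that would be `html_tokens` long as HTML."""
    ratio = MARKDOWN_TOKEN_RATIO if as_markdown else 1.0
    return html_tokens * ratio * INPUT_PRICE[model] / 1_000_000


html_tokens = 1_000_000  # hypothetical corpus size, measured as HTML
for model in INPUT_PRICE:
    html = input_cost(model, html_tokens, as_markdown=False)
    markdown = input_cost(model, html_tokens, as_markdown=True)
    print(f"{model}: HTML ${html:.2f} vs Markdown ${markdown:.2f}")
```

For a corpus that is 1M tokens as HTML, Claude Opus 4.7 with Markdown comes to about $9.00 of input against $12.00 for GPT-5.5 with HTML, so on input alone the format switch more than offsets the list-price gap. Output pricing still favors GPT-5.5, which is why the totals end up closer together.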

See the [HTML vs Markdown token test](/blog/html-vs-markdown-claude-token-test-2026) for the controlled comparison and the Markdown tokenization deep dive for the why.

A practical multi-model setup

What I actually run:

Claude Code subscription for development work and most interactive research (Sonnet for most things, Opus when stuck)
DeepSeek R2 API for automated Chinese-content monitoring (weekly cron, ~$2-3/mo)
ChatGPT Plus for casual conversation + DALL-E + Code Interpreter (legacy habit, ~$20/mo)
No Gemini (I don't use NotebookLM enough to justify)

Total monthly cost: ~$60. Replaces maybe $300-500/mo of pure API usage if I tried to run everything via pay-as-you-go.

The honest summary

For most knowledge workers in 2026:

Pick a subscription model (Claude Pro/Max or ChatGPT Plus) for daily interactive work
Add DeepSeek R2 API for Chinese-source workflows and cost-sensitive batch jobs
Use Markdown input religiously: that single choice cuts input tokens, and therefore input cost, by roughly 40%
Don't chase per-token savings if it costs you a Claude/GPT subscription's quality on critical tasks

The cheapest model is rarely the right answer. The right model + right input format + right caching is.

Install

Web2MD on the Chrome Web Store →

Free tier: 3 conversions/day. Pro at $9/mo unlocks unlimited + bulk export + token estimates so you can budget per session.
