Free Tool · No signup · Updated 2026-04-28

AI API Cost Calculator

Compare monthly API spend across Claude 4.6 Opus / Sonnet, Claude Haiku 4.5, GPT-5 / GPT-5 Mini / GPT-4.1, Gemini 2.5 Pro / Flash, Llama 4 Maverick, and DeepSeek V3.5. Includes prompt-caching savings.

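Inputs: input tokens per call, output tokens per call, number of calls per month, and cache hit rate (%). The comparison below corresponds to 2,000 input and 500 output tokens per call, 1,000 calls per month, and a 0% cache hit rate.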

Across the 10 frontier models, monthly cost runs from $0.965 (Llama 4 Maverick) to $67.50 (Claude 4.6 Opus): a range of $66.54/month, or roughly 70× between the cheapest and the most expensive.

Model	Provider	Context	Input ($/1M tokens)	Output ($/1M tokens)	Per call	Monthly total
Llama 4 Maverick	Meta (via Together)	1M	$0.27	$0.85	$0.00096	$0.965
DeepSeek V3.5	DeepSeek	128K	$0.27	$1.10	$0.00109	$1.09
Claude Haiku 4.5	Anthropic	200K	$0.25	$1.25	$0.00113	$1.13
GPT-5 Mini	OpenAI	1M	$0.30	$1.20	$0.00120	$1.20
Gemini 2.5 Flash	Google	1M	$0.30	$2.50	$0.00185	$1.85
Gemini 2.5 Pro	Google	2M	$1.25	$10.00	$0.00750	$7.50
GPT-4.1	OpenAI	1M	$2.00	$8.00	$0.00800	$8.00
Claude 4.6 Sonnet	Anthropic	1M	$3.00	$15.00	$0.0135	$13.50
GPT-5	OpenAI	1M	$5.00	$20.00	$0.02	$20.00
Claude 4.6 Opus	Anthropic	1M	$15.00	$75.00	$0.0675	$67.50
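The per-call and monthly figures in the table follow from simple arithmetic: per-call cost = (input tokens × input price + output tokens × output price) / 1,000,000, and monthly cost = per-call cost × calls per month. Below is a minimal Python sketch under the inputs that reproduce the table (2,000 input and 500 output tokens per call, 1,000 calls per month, no caching). The `PRICES` dict copies the table's list prices; the function names are hypothetical.

```python
# List prices in USD per 1M tokens, copied from the table: (input, output).
PRICES = {
    "Llama 4 Maverick": (0.27, 0.85),
    "DeepSeek V3.5": (0.27, 1.10),
    "Claude Haiku 4.5": (0.25, 1.25),
    "GPT-5 Mini": (0.30, 1.20),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "GPT-4.1": (2.00, 8.00),
    "Claude 4.6 Sonnet": (3.00, 15.00),
    "GPT-5": (5.00, 20.00),
    "Claude 4.6 Opus": (15.00, 75.00),
}


def cost_per_call(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """USD cost of one call, with prices given per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def monthly_costs(input_tokens: int, output_tokens: int,
                  calls_per_month: int) -> dict[str, float]:
    """Monthly USD cost for every model in PRICES, ignoring caching."""
    return {
        model: cost_per_call(input_tokens, output_tokens, inp, out) * calls_per_month
        for model, (inp, out) in PRICES.items()
    }


if __name__ == "__main__":
    totals = monthly_costs(input_tokens=2_000, output_tokens=500, calls_per_month=1_000)
    cheapest = min(totals, key=totals.get)
    priciest = max(totals, key=totals.get)
    # Print models from cheapest to most expensive, then the spread.
    for model, total in sorted(totals.items(), key=lambda kv: kv[1]):
        print(f"{model:<18} ${total:,.3f}")
    print(f"{cheapest} to {priciest}: range ${totals[priciest] - totals[cheapest]:.2f}/month")
```

Because cost is linear in every input, doubling calls or tokens doubles the bill; the ranking between models only shifts when the input/output token ratio changes.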

Pricing verified 2026-04-28 from each vendor's public API page. Cache pricing applies only to repeated input tokens within the cache TTL (typically 5 min – 1 hr). Reasoning tokens (Claude extended thinking, OpenAI o-series) bill as output.
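The table's figures assume no caching. One way to model the cache hit rate input is to bill the cached share of input tokens at a discounted rate and the remainder at list price, while output tokens stay at full price. Below is a minimal Python sketch. The function name is hypothetical, and the 10% cache-read multiplier is an assumed default: vendors differ, so check each one's cached-input price. The sketch also ignores cache-write surcharges and TTL expiry.

```python
def cost_per_call_with_cache(
    input_tokens: int,
    output_tokens: int,
    input_price: float,
    output_price: float,
    cache_hit_rate: float,
    cache_read_multiplier: float = 0.1,  # ASSUMPTION: cached input billed at 10% of list price
) -> float:
    """USD cost of one call; prices per 1M tokens, cache_hit_rate in [0, 1]."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached = input_tokens * cache_hit_rate     # input tokens served from cache
    uncached = input_tokens - cached           # input tokens billed at list price
    input_cost = (uncached + cached * cache_read_multiplier) * input_price
    # Caching never discounts output tokens.
    return (input_cost + output_tokens * output_price) / 1_000_000


# Claude 4.6 Sonnet list prices from the table, 2,000 in / 500 out per call.
no_cache = cost_per_call_with_cache(2_000, 500, 3.00, 15.00, cache_hit_rate=0.0)
mostly_cached = cost_per_call_with_cache(2_000, 500, 3.00, 15.00, cache_hit_rate=0.8)
```

With a 0% hit rate this reduces to the uncached formula ($0.0135 per call for Claude 4.6 Sonnet). At an 80% hit rate and the assumed 10% read price, it falls to about $0.0092. The saving is smaller than 80% because output tokens, which caching does not discount, make up most of the cost.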

How to estimate your real-world tokens

Input tokens: roughly 1,300 tokens per 1,000 English words (about 0.75 words per token). System prompts and retrieved context typically dominate.
Output tokens: usually 10-30% of input for chat, can be 200%+ for code generation or extended-thinking modes.
Cache hit rate: the share of input tokens served from cache. Near 0% if every prompt is unique with no shared prefix; 50-90% if you reuse a system prompt or document across many calls.
Calls per month: daily active users × calls per user per day × 30. Typical SaaS chat apps see 100-1,000 calls per active user per month.
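The rules of thumb above can be turned into rough calculator inputs. Below is a minimal Python sketch. The 0.75 words-per-token ratio is an approximation for English text, not an exact tokenizer count. The function names and the example figures (a 1,500-word prompt, a 20% output ratio, 200 daily users making 5 calls each) are hypothetical.

```python
WORDS_PER_TOKEN = 0.75  # rough rule of thumb for English text, not a real tokenizer


def tokens_from_words(words: int) -> int:
    """Approximate token count for a given number of English words."""
    return round(words / WORDS_PER_TOKEN)


def output_tokens_from_ratio(input_tokens: int, output_ratio: float) -> int:
    """output_ratio: e.g. 0.1-0.3 for chat, 2.0 or more for code generation or extended thinking."""
    return round(input_tokens * output_ratio)


def calls_per_month(daily_active_users: int, calls_per_user_per_day: float,
                    days: int = 30) -> int:
    """Daily active users × calls per user per day × days in the month."""
    return round(daily_active_users * calls_per_user_per_day * days)


# Hypothetical example: a 1,500-word prompt, chat-style output, 200 daily users.
input_tokens = tokens_from_words(1_500)
output_tokens = output_tokens_from_ratio(input_tokens, 0.2)
monthly_calls = calls_per_month(200, 5)
print(input_tokens, output_tokens, monthly_calls)
```

For real workloads, counting tokens with the vendor's own tokenizer is more accurate than the word ratio, especially for code or non-English text.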

Pricing notes

Cache prices apply only inside each vendor's cache TTL (typically 5 min, longer for some).
Reasoning tokens (Claude extended thinking, OpenAI o-series, DeepSeek-R1) bill as output. Add 30-300% to output budget for reasoning workflows.
Some models charge more above a context-length threshold (e.g., Gemini 2.5 Pro's input price doubles for prompts over 200K tokens). Check vendor docs.
Open-source models (Llama, DeepSeek, Qwen) are priced via third-party hosts such as Together, Groq, or Fireworks. Host pricing varies; we use the cheapest.
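Two of these notes change the inputs rather than the basic formula: reasoning tokens inflate the billed output count, and context thresholds raise the per-token price. Below is a minimal Python sketch of both adjustments. The function names are hypothetical. The tiering rule, where the higher rate applies to the whole prompt once it crosses the threshold, is an assumption to check against each vendor's docs.

```python
def output_budget_with_reasoning(visible_output_tokens: int,
                                 reasoning_overhead: float) -> int:
    """Billed output tokens when reasoning tokens are billed as output.

    reasoning_overhead is a fraction: 0.3 to 3.0 corresponds to adding 30-300%.
    """
    return round(visible_output_tokens * (1 + reasoning_overhead))


def tiered_input_cost(input_tokens: int, base_price: float,
                      threshold: int, multiplier: float) -> float:
    """USD input cost with a higher rate for long prompts; prices per 1M tokens.

    ASSUMPTION: once the prompt exceeds `threshold`, every input token is
    billed at base_price * multiplier (not just the tokens past the threshold).
    """
    price = base_price * multiplier if input_tokens > threshold else base_price
    return input_tokens * price / 1_000_000


# 500 visible output tokens plus 100% reasoning overhead -> 1,000 billed output tokens.
billed_output = output_budget_with_reasoning(500, 1.0)
# A 300K-token prompt at a $1.25/M base rate that doubles past 200K tokens.
long_prompt_cost = tiered_input_cost(300_000, 1.25, threshold=200_000, multiplier=2.0)
```

With the table's $1.25/M Gemini 2.5 Pro input rate doubled past 200K tokens, a 300K-token prompt costs $0.75 in input instead of $0.375. Feed the adjusted output count and input cost into the per-call formula in place of the raw values.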
