Free Tool · No signup · Updated 2026-04-28

AI API Cost Calculator

Compare monthly API spend across Claude 4.6 Opus and Sonnet, Claude Haiku 4.5, GPT-5 / GPT-5 Mini / GPT-4.1, Gemini 2.5 Pro / Flash, Llama 4 Maverick, and DeepSeek V3.5. Includes prompt-caching savings.

Every row below assumes the same example workload: 2,000 input tokens and 500 output tokens per call, at 1,000 calls per month. Across the 10 frontier models: cheapest = $0.965/month (Llama 4 Maverick), most expensive = $67.50/month (Claude 4.6 Opus), a spread of $66.54/month.
| Model | Provider · Context | Input ($/M) | Output ($/M) | Per call | Monthly total |
|---|---|---|---|---|---|
| Llama 4 Maverick | Meta (via Together) · 1M | $0.27 | $0.85 | $0.00096 | $0.965 |
| DeepSeek V3.5 | DeepSeek · 128K | $0.27 | $1.10 | $0.00109 | $1.09 |
| Claude Haiku 4.5 | Anthropic · 200K | $0.25 | $1.25 | $0.00113 | $1.13 |
| GPT-5 Mini | OpenAI · 1M | $0.30 | $1.20 | $0.00120 | $1.20 |
| Gemini 2.5 Flash | Google · 1M | $0.30 | $2.50 | $0.00185 | $1.85 |
| Gemini 2.5 Pro | Google · 2M | $1.25 | $10.00 | $0.00750 | $7.50 |
| GPT-4.1 | OpenAI · 1M | $2.00 | $8.00 | $0.00800 | $8.00 |
| Claude 4.6 Sonnet | Anthropic · 1M | $3.00 | $15.00 | $0.0135 | $13.50 |
| GPT-5 | OpenAI · 1M | $5.00 | $20.00 | $0.02 | $20.00 |
| Claude 4.6 Opus | Anthropic · 1M | $15.00 | $75.00 | $0.0675 | $67.50 |


How to estimate your real-world tokens

  • Input tokens: ~750 tokens per 1000 English words. System prompts + retrieved context typically dominate.
  • Output tokens: usually 10-30% of input for chat, can be 200%+ for code generation or extended-thinking modes.
  • Cache hit rate: 0% if every request is unique (chat). 50-90% if you reuse a system prompt or document across many calls.
  • Calls per month: daily active users × interactions per user per day × 30. Typical SaaS chat: 100–1,000 calls per active user per month.
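The rules of thumb above can be combined into a rough estimator. All defaults below are illustrative placeholders, not measurements of any particular workload:

```python
def estimate_monthly_tokens(daily_users, interactions_per_user,
                            words_in, output_ratio=0.2, cache_hit_rate=0.5):
    """Rough monthly token budget from the rules of thumb above.

    words_in:        average English words sent per call (prompt + context)
    output_ratio:    output tokens as a fraction of input (~0.1-0.3 for chat,
                     2.0+ for code generation or extended thinking)
    cache_hit_rate:  fraction of input tokens served from the prompt cache
    """
    calls = daily_users * interactions_per_user * 30
    input_tokens = words_in * 750 / 1000 * calls   # ~750 tokens per 1,000 words
    cached = input_tokens * cache_hit_rate
    return {
        "calls": calls,
        "input": input_tokens - cached,
        "cached_input": cached,
        "output": input_tokens * output_ratio,
    }

# 200 daily users, 10 chats each, ~1,500 words of prompt + context per call
print(estimate_monthly_tokens(200, 10, 1_500))
```

Feed the resulting token counts into the per-million prices above to get a monthly spend per model.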

Pricing notes

  • Prices verified 2026-04-28 against each vendor's public API pricing page.
  • Cache prices apply only inside each vendor's cache TTL (typically 5 min, up to 1 hr for some).
  • Reasoning tokens (Claude extended thinking, OpenAI o-series, DeepSeek-R1) bill as output. Add 30-300% to output budget for reasoning workflows.
  • Some models charge differently above context thresholds (e.g., Gemini doubles past 200K). Check vendor docs.
  • Open-source models (Llama, DeepSeek, Qwen) are priced via hosts such as Together, Groq, or Fireworks — host pricing varies, so the table uses the cheapest listed host.
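Caching changes the effective input price, not the list price. A sketch of the blended rate, assuming the common pattern where cache reads are billed at a steep discount to the base input rate — the 90% discount below is a placeholder; check your vendor's actual cache-read pricing:

```python
def effective_input_price(base_per_m, cache_hit_rate, cache_read_discount=0.9):
    """Blended $/M input price once caching is factored in.

    cache_read_discount=0.9 means cached reads bill at 10% of the base
    rate -- an assumed figure; substitute your vendor's real multiplier.
    """
    cached_rate = base_per_m * (1 - cache_read_discount)
    return cache_hit_rate * cached_rate + (1 - cache_hit_rate) * base_per_m

# $3.00/M base input at an 80% cache hit rate -> blended ~$0.84/M
print(round(effective_input_price(3.00, 0.8), 2))
```

At high hit rates the blended input price approaches the cache-read rate, which is why reusing a long system prompt or document across calls dominates the savings.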

Related reading