Free Tool · No signup · Updated 2026-04-28

AI API Cost Calculator

Compare monthly API spend across Claude 4.6 Opus / Sonnet / Haiku, GPT-5 / GPT-5 Mini / GPT-4.1, Gemini 2.5 Pro / Flash, Llama 4 Maverick, and DeepSeek V3.5. Includes prompt caching savings.

Spread across 10 frontier models: cheapest = $0.965 (Llama 4 Maverick), most expensive = $67.50 (Claude 4.6 Opus), range = $66.54/month.
| Model | Provider | Context | Input | Output | Per call | Monthly total |
|---|---|---|---|---|---|---|
| Llama 4 Maverick | Meta (via Together) | 1M | $0.27/M | $0.85/M | $0.00096 | $0.965 |
| DeepSeek V3.5 | DeepSeek | 128K | $0.27/M | $1.10/M | $0.00109 | $1.09 |
| Claude Haiku 4.5 | Anthropic | 200K | $0.25/M | $1.25/M | $0.00113 | $1.13 |
| GPT-5 Mini | OpenAI | 1M | $0.30/M | $1.20/M | $0.00120 | $1.20 |
| Gemini 2.5 Flash | Google | 1M | $0.30/M | $2.50/M | $0.00185 | $1.85 |
| Gemini 2.5 Pro | Google | 2M | $1.25/M | $10.00/M | $0.00750 | $7.50 |
| GPT-4.1 | OpenAI | 1M | $2.00/M | $8.00/M | $0.00800 | $8.00 |
| Claude 4.6 Sonnet | Anthropic | 1M | $3.00/M | $15.00/M | $0.0135 | $13.50 |
| GPT-5 | OpenAI | 1M | $5.00/M | $20.00/M | $0.02 | $20.00 |
| Claude 4.6 Opus | Anthropic | 1M | $15.00/M | $75.00/M | $0.0675 | $67.50 |

Pricing verified 2026-04-28 from each vendor's public API page. Cache pricing applies only to repeated input tokens within the cache TTL (typically 5 min – 1 hr). Reasoning tokens (Claude extended thinking, OpenAI o-series) bill as output.
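
The per-call and monthly figures in the table follow from simple arithmetic on the per-million-token rates. The sketch below reproduces three rows. The workload of 2,000 input tokens, 500 output tokens, and 1,000 calls per month is inferred from the table's numbers rather than stated on the page, and the function names are illustrative, not the calculator's own code.

```python
# Rates are USD per million tokens, copied from the table above.
RATES = {
    "Llama 4 Maverick": (0.27, 0.85),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "Claude 4.6 Opus": (15.00, 75.00),
}


def cost_per_call(input_rate, output_rate, input_tokens, output_tokens):
    """Cost of one request in USD, given per-million-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def monthly_cost(input_rate, output_rate, input_tokens, output_tokens, calls):
    """Monthly cost in USD for `calls` identical requests."""
    return cost_per_call(input_rate, output_rate, input_tokens, output_tokens) * calls


# Assumed workload (inferred from the table, not stated by the calculator).
INPUT_TOKENS, OUTPUT_TOKENS, CALLS = 2_000, 500, 1_000

for model, (in_rate, out_rate) in RATES.items():
    total = monthly_cost(in_rate, out_rate, INPUT_TOKENS, OUTPUT_TOKENS, CALLS)
    print(f"{model}: ${total:.3f}/month")
```

Because cost is linear in every input, doubling the call volume or the token counts doubles the corresponding part of the bill, which makes it easy to rescale the table to your own workload.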

How to estimate your real-world tokens

  • Input tokens: ~1,300 tokens per 1,000 English words (one token is roughly 0.75 of a word). System prompts + retrieved context typically dominate.
  • Output tokens: usually 10-30% of input for chat, but can exceed 200% for code generation or extended-thinking modes.
  • Cache hit rate: 0% if every request is unique (chat). 50-90% if you reuse a system prompt or document across many calls.
  • Calls per month: daily active users × interactions per user per day × 30. Typical SaaS chat = 100-1000 calls per active user per month.
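
The four estimates above can be combined into a rough usage profile. This is a minimal sketch, assuming about 0.75 English words per token. The function and parameter names are hypothetical, and `cached_rate_ratio` is a placeholder for the vendor's discounted cache-read price, which this page does not list.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb for English text; varies by tokenizer


def words_to_tokens(words):
    """Approximate token count for a given number of English words."""
    return round(words / WORDS_PER_TOKEN)


def calls_per_month(daily_users, interactions_per_user_per_day, days=30):
    """Monthly call volume from daily usage."""
    return round(daily_users * interactions_per_user_per_day * days)


def effective_input_rate(input_rate, cache_hit_rate, cached_rate_ratio):
    """Blended USD per million input tokens when a share of input hits the cache.

    cached_rate_ratio is the cached price as a fraction of the full input
    price (hypothetical value; check the vendor's pricing page). Any
    cache-write surcharge is ignored in this sketch.
    """
    uncached_share = 1 - cache_hit_rate
    return input_rate * (uncached_share + cache_hit_rate * cached_rate_ratio)


# Example: a 1,500-word prompt, output at 25% of input, 200 daily users
# making 5 calls a day each.
input_tokens = words_to_tokens(1_500)
output_tokens = round(input_tokens * 0.25)
calls = calls_per_month(200, 5)

# An 80% cache hit rate with cached tokens billed at an assumed 10% of list price.
blended = effective_input_rate(3.00, cache_hit_rate=0.8, cached_rate_ratio=0.1)
print(input_tokens, output_tokens, calls, round(blended, 2))
```

With a 0% hit rate the blended rate equals the list input rate, so the same function covers the "every request is unique" case.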

Pricing notes

  • Prices verified 2026-04-28 against each vendor's public API pricing page.
  • Cache prices apply only to repeated input tokens inside each vendor's cache TTL (typically 5 min to 1 hr).
  • Reasoning tokens (Claude extended thinking, OpenAI o-series, DeepSeek-R1) bill as output. Add 30-300% to output budget for reasoning workflows.
  • Some models charge higher rates above a context threshold (e.g., Gemini's rate doubles for prompts past 200K tokens). Check vendor docs.
  • Open-source models (Llama, DeepSeek, Qwen) are priced via Together, Groq, or Fireworks. Pricing varies by host; we use the cheapest.
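
Two of the notes above change the arithmetic: reasoning tokens inflate the billed output, and some models switch rates past a context threshold. A hedged sketch of both adjustments follows. The function names are illustrative, the threshold and multiplier passed to `tiered_input_cost` are placeholders to be replaced with values from the vendor's documentation, and the sketch assumes the higher rate applies to the whole prompt rather than only to the tokens past the threshold.

```python
def output_budget_range(visible_output_tokens, low=0.30, high=3.00):
    """Billed output tokens after adding 30-300% reasoning overhead.

    Returns a (low, high) pair so a budget can be planned as a range.
    """
    return (
        round(visible_output_tokens * (1 + low)),
        round(visible_output_tokens * (1 + high)),
    )


def tiered_input_cost(input_tokens, base_rate, threshold, multiplier):
    """Input cost in USD for one call under threshold-based pricing.

    Assumption: once the prompt exceeds `threshold` tokens, every input
    token is billed at base_rate * multiplier (USD per million tokens).
    """
    rate = base_rate * multiplier if input_tokens > threshold else base_rate
    return input_tokens * rate / 1_000_000


# A 500-token visible answer could bill as 650 to 2,000 output tokens.
print(output_budget_range(500))

# Placeholder tier: $1.25/M base rate, doubling past 200K tokens.
print(tiered_input_cost(100_000, 1.25, 200_000, 2.0))
print(tiered_input_cost(250_000, 1.25, 200_000, 2.0))
```

Planning with the upper end of the reasoning range is the safer default when a workflow's thinking length is not yet measured.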

Frequently Asked Questions

How does the AI cost calculator work?
Enter the number of input and output tokens for a typical request, plus your monthly call volume, and the calculator estimates per-request and monthly cost from each model’s current published API pricing. All math happens in your browser.
Which models does it support?
Major LLM API providers: OpenAI (GPT-5, GPT-5 Mini, GPT-4.1), Anthropic (Claude 4.6 Opus, Claude 4.6 Sonnet, Claude Haiku 4.5), Google (Gemini 2.5 Pro, Gemini 2.5 Flash), and open-weight options (Llama 4 Maverick, DeepSeek V3.5). Pricing is updated regularly; always verify against the provider’s official pricing page before procurement.
What is a token in the context of LLMs?
A token is roughly 0.75 of an English word. A typical paragraph (about 75 words) is around 100 tokens. Most LLM APIs price per million input tokens and per million output tokens, often at different rates.
Are the pricing numbers up to date?
We update pricing regularly but provider rates change. Always verify against the provider’s official pricing page before making procurement decisions. Many providers also offer volume discounts not reflected in public list pricing.