AI API Cost Calculator
Compare monthly API spend across Claude 4.6 Opus / Sonnet, Claude Haiku 4.5, GPT-5 / GPT-5 Mini / GPT-4.1, Gemini 2.5 Pro / Flash, Llama 4 Maverick, and DeepSeek V3.5. Includes prompt caching savings.
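Spread across 10 frontier models: cheapest = $0.965/month (Llama 4 Maverick), most expensive = $67.50/month (Claude 4.6 Opus), range = $66.54/month. All figures below are for 2,000 input + 500 output tokens per call, 1,000 calls per month, and no cache hits.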
| Model | Provider | Context | Input | Output | Per call | Monthly total |
|---|---|---|---|---|---|---|
| Llama 4 Maverick | Meta (via Together) | 1M | $0.27/M | $0.85/M | $0.00096 | $0.965 |
| DeepSeek V3.5 | DeepSeek | 128K | $0.27/M | $1.10/M | $0.00109 | $1.09 |
| Claude Haiku 4.5 | Anthropic | 200K | $0.25/M | $1.25/M | $0.00113 | $1.13 |
| GPT-5 Mini | OpenAI | 1M | $0.30/M | $1.20/M | $0.00120 | $1.20 |
| Gemini 2.5 Flash | Google | 1M | $0.30/M | $2.50/M | $0.00185 | $1.85 |
| Gemini 2.5 Pro | Google | 2M | $1.25/M | $10.00/M | $0.00750 | $7.50 |
| GPT-4.1 | OpenAI | 1M | $2.00/M | $8.00/M | $0.00800 | $8.00 |
| Claude 4.6 Sonnet | Anthropic | 1M | $3.00/M | $15.00/M | $0.0135 | $13.50 |
| GPT-5 | OpenAI | 1M | $5.00/M | $20.00/M | $0.0200 | $20.00 |
| Claude 4.6 Opus | Anthropic | 1M | $15.00/M | $75.00/M | $0.0675 | $67.50 |
Pricing verified 2026-04-28 from each vendor's public API page. Cache pricing applies only to repeated input tokens within the cache TTL (typically 5 min – 1 hr). Reasoning tokens (Claude extended thinking, OpenAI o-series) bill as output.
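Every number in the table follows from one formula: per-call cost = (input tokens × input price + output tokens × output price) / 1,000,000, and monthly total = per-call cost × calls per month. The minimal Python sketch below reproduces the table. The workload (2,000 input tokens, 500 output tokens, 1,000 calls/month, no caching) is inferred from the table's figures rather than documented by the calculator, and the dictionary and function names are illustrative, not part of any real API.

```python
# Sketch of the per-call / monthly arithmetic behind the table.
# Assumed workload (inferred from the table's figures, not documented):
# 2,000 input + 500 output tokens per call, 1,000 calls/month, no cache hits.

PRICES_PER_M = {
    # model: (input $/M tokens, output $/M tokens), copied from the table
    "Llama 4 Maverick": (0.27, 0.85),
    "DeepSeek V3.5": (0.27, 1.10),
    "Claude Haiku 4.5": (0.25, 1.25),
    "GPT-5 Mini": (0.30, 1.20),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "GPT-4.1": (2.00, 8.00),
    "Claude 4.6 Sonnet": (3.00, 15.00),
    "GPT-5": (5.00, 20.00),
    "Claude 4.6 Opus": (15.00, 75.00),
}


def cost_per_call(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Dollar cost of one call; prices are quoted per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def monthly_costs(input_tokens: int, output_tokens: int, calls: int) -> dict[str, float]:
    """Monthly spend per model for a fixed per-call workload."""
    return {
        model: cost_per_call(input_tokens, output_tokens, inp, out) * calls
        for model, (inp, out) in PRICES_PER_M.items()
    }


if __name__ == "__main__":
    totals = monthly_costs(input_tokens=2_000, output_tokens=500, calls=1_000)
    # Print cheapest first, like the table.
    for model, total in sorted(totals.items(), key=lambda kv: kv[1]):
        print(f"{model:<20} ${total:,.3f}")
    spread = max(totals.values()) - min(totals.values())
    print(f"range = ${spread:,.2f}/month")
```

Changing the three workload arguments is enough to re-rank the models for your own traffic; cached input is left out here because it bills at a separate, vendor-specific rate.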
How to estimate your real-world tokens
- Input tokens: ~1,300 tokens per 1,000 English words (roughly 0.75 words per token). System prompts and retrieved context typically dominate.
- Output tokens: usually 10-30% of input for chat; can exceed 200% for code generation or extended-thinking modes.
- Cache hit rate: 0% if every request is unique (chat). 50-90% if you reuse a system prompt or document across many calls.
- Calls per month: daily active users × interactions per user per day × 30. Typical SaaS chat = 100-1,000 calls per active user per month.
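The rules of thumb above can be turned into inputs for the cost formula with a few helpers. This is a minimal sketch under stated assumptions: it uses ~1.33 tokens per English word (the common 0.75-words-per-token heuristic), treats the cache hit rate as the fraction of input tokens billed at a cached-read price, and ignores any cache-write surcharge. The function names, the workload, and the $3.00/M and $0.30/M prices in the example are hypothetical.

```python
# Rough workload estimates from the rules of thumb above.
# All defaults and example values are illustrative assumptions, not vendor data.

def words_to_tokens(words: int, tokens_per_word: float = 4 / 3) -> int:
    """English text runs ~0.75 words per token, i.e. ~1.33 tokens per word."""
    return round(words * tokens_per_word)


def calls_per_month(daily_active_users: int, interactions_per_user_per_day: float,
                    days: int = 30) -> int:
    """Daily active users x interactions per user per day x days."""
    return round(daily_active_users * interactions_per_user_per_day * days)


def blended_input_price(input_price: float, cache_read_price: float,
                        cache_hit_rate: float) -> float:
    """Average $/M input price when a fraction of input tokens hit the cache.

    Assumes hits land inside the cache TTL and ignores cache-write surcharges.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return cache_hit_rate * cache_read_price + (1 - cache_hit_rate) * input_price


if __name__ == "__main__":
    prompt_words = 1_200                         # hypothetical system prompt + context
    input_tokens = words_to_tokens(prompt_words)
    output_tokens = round(input_tokens * 0.2)    # chat-style: ~20% of input
    calls = calls_per_month(daily_active_users=500, interactions_per_user_per_day=5)
    # Hypothetical prices: $3.00/M uncached input, $0.30/M cached reads, 80% hit rate.
    price = blended_input_price(3.00, 0.30, cache_hit_rate=0.8)
    print(input_tokens, output_tokens, calls, round(price, 2))
```

Feeding `price` in place of the list price for input tokens gives a cache-adjusted estimate; at a 0% hit rate it falls back to the plain input price.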
Pricing notes
- Reasoning tokens (Claude extended thinking, OpenAI o-series, DeepSeek-R1) bill as output. Add 30-300% to the output budget for reasoning workflows.
- Some models charge more above a context threshold (e.g., Gemini 2.5 Pro's input price doubles for prompts over 200K tokens). Check vendor docs.
- Open-source models (Llama, DeepSeek, Qwen) are priced via Together, Groq, or Fireworks. Pricing varies by host; we use the cheapest.
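Two of the notes above change the estimate itself: reasoning tokens inflate the output count, and some models raise the input price past a context threshold. The Python sketch below adjusts for both. The 200K threshold and 2× multiplier mirror the Gemini example in the notes; billing the whole prompt at the higher rate once it crosses the threshold is an assumption, and the function names are hypothetical.

```python
# Adjusting a base estimate for two caveats in the pricing notes:
# reasoning tokens billed as output, and prices that rise above a context threshold.

def output_with_reasoning(visible_output_tokens: int, reasoning_overhead: float) -> int:
    """Pad the output budget; 0.3 to 3.0 covers the 30-300% range in the notes."""
    return round(visible_output_tokens * (1 + reasoning_overhead))


def tiered_input_price(prompt_tokens: int, base_price: float,
                       threshold: int = 200_000, multiplier: float = 2.0) -> float:
    """$/M input price when long prompts are billed at a higher rate.

    Assumes the whole prompt is billed at the higher rate once it exceeds the
    threshold; check each vendor's docs for the exact rule.
    """
    return base_price * multiplier if prompt_tokens > threshold else base_price


if __name__ == "__main__":
    # Hypothetical: 500 visible output tokens with 100% reasoning overhead.
    print(output_with_reasoning(500, 1.0))
    # Gemini 2.5 Pro's listed $1.25/M input, applied to a 250K-token prompt.
    print(tiered_input_price(250_000, 1.25))
```

Both adjustments plug directly into the per-call formula: use the padded output count with the output price, and the tiered price in place of the flat input price.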