LLM API pricing & cost calculator.
Enter your token counts and request volume. See cost per request, monthly spend, output speed, and context window side-by-side across 24 frontier and open-weight models from OpenAI, Anthropic, Google, xAI, DeepSeek, and more. Sort, filter, and compare before you commit to an API.
Tier
Provider
Gemma 4 31B
$0.0004/req · $10.65/mo
Gemini 3 Flash
300 tok/s · ~1.7s for 500 tokens
| Model | Context | Cost / mo | ||||
|---|---|---|---|---|---|---|
Gemma 4 31BOSS | $0.120 | $0.350 | 262K | 180t/s ~2.8s gen | $0.0004 | $10.65 |
Llama 4 MaverickOSS Meta | $0.150 | $0.600 | 1.0M | 140t/s ~3.6s gen | $0.0005 | $15.75 |
DeepSeek V3OSS DeepSeek | $0.200 | $0.770 | 164K | 55t/s ~9.1s gen | $0.0007 | $20.55 |
MiniMax M2.7OSS MiniMax | $0.250 | $1.00 | 205K | 100t/s ~5.0s gen | $0.0009 | $26.25 |
MiniMax M3OSS MiniMax | $0.300 | $1.20 | 1M | 59t/s ~8.5s gen | $0.0010 | $31.50 |
DeepSeek V4 ProOSS DeepSeek | $0.435 | $0.870 | 1.0M | 60t/s ~8.3s gen | $0.0011 | $32.63 |
Kimi K2.5OSS Moonshot | $0.375 | $2.02 | 262K | 70t/s ~7.1s gen | $0.0016 | $47.25 |
Gemini 3 Flash | $0.500 | $3.00 | 1.0M | 300t/s ~1.7s gen | $0.0023 | $67.50 |
GLM-5OSS Zhipu | $1.00 | $3.20 | 200K | 90t/s ~5.6s gen | $0.0031 | $93.00 |
Grok 4.3 xAI | $1.25 | $2.50 | 1M | 110t/s ~4.5s gen | $0.0031 | $93.75 |
Qwen 3.7 MaxOSS Alibaba | $1.25 | $3.75 | 1M | 60t/s ~8.3s gen | $0.0037 | $112.50 |
OpenAI o4-mini OpenAI | $1.10 | $4.40 | 200K | 120t/s ~4.2s gen | $0.0039 | $115.50 |
OpenAI o3 OpenAI | $2.00 | $8.00 | 200K | 40t/s ~12.5s gen | $0.0070 | $210.00 |
GPT-4o OpenAI | $2.50 | $10.00 | 128K | 80t/s ~6.3s gen | $0.0088 | $262.50 |
Gemini 3.1 Pro | $2.00 | $12.00 | 1.0M | 120t/s ~4.2s gen | $0.0090 | $270.00 |
Gemini 3 Pro | $2.00 | $12.00 | 1M | 130t/s ~3.8s gen | $0.0090 | $270.00 |
GPT-5.2 OpenAI | $1.75 | $14.00 | 400K | 75t/s ~6.7s gen | $0.0096 | $288.75 |
GPT-5.4 OpenAI | $2.50 | $15.00 | 1.1M | 60t/s ~8.3s gen | $0.011 | $337.50 |
Grok 4 xAI | $3.00 | $15.00 | 2M | 75t/s ~6.7s gen | $0.012 | $360.00 |
Claude Sonnet 4.6 Anthropic | $3.00 | $15.00 | 1M | 65t/s ~7.7s gen | $0.012 | $360.00 |
Claude Opus 4.7 Anthropic | $5.00 | $25.00 | 1M | 40t/s ~12.5s gen | $0.020 | $600.00 |
GPT-5.5 OpenAI | $5.00 | $30.00 | 1.1M | 55t/s ~9.1s gen | $0.022 | $675.00 |
Claude Fable 5 Anthropic | $10.00 | $50.00 | 1M | 63t/s ~7.9s gen | $0.040 | $1,200 |
OpenAI o3-pro OpenAI | $20.00 | $80.00 | 200K | 20t/s ~25.0s gen | $0.070 | $2,100 |
Costs are pre-tax list API prices. Caching assumes the full input prefix is eligible. Speeds are approximate sustained throughput for relative comparison only.
How to read these numbers
Cost per request
The headline metric. Mix of input and output token spend for one call at your chosen workload. Sort ascending to find your cheapest option instantly.
Prompt caching
Many providers discount repeated input prefixes 50-90%. Toggle it on if your app reuses system prompts, few-shot examples, or retrieved context.
Output speed
Tokens per second drives time-to-first-token and total gen time. Flash-tier models run 3-6x faster than frontier reasoning models.
Go deeper
AI model benchmarks
Quality scores (MMLU, HumanEval, GPQA) to pair with price.
OpenBest LLM API providers
Curated shortlist of API providers with routing and reliability.
OpenCoding agent comparison
Which coding agent pairs best with each model tier.
OpenOpen-source AI repos
Self-hostable models and agent frameworks.
OpenAI tool finder
Answer a few questions to get a model recommendation.
OpenManaged AI stack setup
Let us wire your production AI stack end-to-end.
OpenLLM pricing FAQ
How is the cost per request calculated?
Cost per request = (input tokens / 1M * input price) + (output tokens / 1M * output price). The monthly estimate multiplies that by requests/day * 30. Toggle prompt caching to swap in the cached input price for providers that support repeated-prompt-prefix discounts.
Which is the cheapest LLM API in 2026?
Among hosted frontier-class models, DeepSeek V4 Pro, Gemma 4, and Llama 4 Maverick lead on raw price per token. But the cheapest model for your workload depends on your input/output ratio, volume, and whether latency or quality matters more than headline price.
Does prompt caching actually lower my bill?
Yes, often dramatically. Anthropic caches long prefixes for roughly 90% off, OpenAI and Google offer 50-75% discounts on repeated input. The calculator lets you toggle caching to see the impact, but only repeated identical prefixes qualify.
Are open-weight models really cheaper?
At the margin yes, but only at scale. Self-hosting removes per-token API markup but you pay for GPUs, ops, and scaling. The hosted open-weight prices shown here are a fair baseline; true self-host cost depends on your utilization.
Why does output cost more than input?
Generating tokens is compute-bound (autoregressive decoding), while reading input is a single forward pass. Providers price output 3-6x higher to reflect the extra GPU time per generated token.