Skip to main content
Dev tool · June 15, 2026

LLM API pricing & cost calculator.

Enter your token counts and request volume. See cost per request, monthly spend, output speed, and context window side-by-side across 24 frontier and open-weight models from OpenAI, Anthropic, Google, xAI, DeepSeek, and more. Sort, filter, and compare before you commit to an API.

24 Models tracked $0.10/1M Cheapest in June 15, 2026 Updated 10 Providers

Tier

Provider

Cheapest per request

Gemma 4 31B

$0.0004/req · $10.65/mo

Fastest output

Gemini 3 Flash

300 tok/s · ~1.7s for 500 tokens

ModelContextCost / mo
Gemma 4 31BOSS

Google

$0.120$0.350262K
180t/s
~2.8s gen
$0.0004$10.65
Llama 4 MaverickOSS

Meta

$0.150$0.6001.0M
140t/s
~3.6s gen
$0.0005$15.75
DeepSeek V3OSS

DeepSeek

$0.200$0.770164K
55t/s
~9.1s gen
$0.0007$20.55
MiniMax M2.7OSS

MiniMax

$0.250$1.00205K
100t/s
~5.0s gen
$0.0009$26.25
MiniMax M3OSS

MiniMax

$0.300$1.201M
59t/s
~8.5s gen
$0.0010$31.50
DeepSeek V4 ProOSS

DeepSeek

$0.435$0.8701.0M
60t/s
~8.3s gen
$0.0011$32.63
Kimi K2.5OSS

Moonshot

$0.375$2.02262K
70t/s
~7.1s gen
$0.0016$47.25
Gemini 3 Flash

Google

$0.500$3.001.0M
300t/s
~1.7s gen
$0.0023$67.50
GLM-5OSS

Zhipu

$1.00$3.20200K
90t/s
~5.6s gen
$0.0031$93.00
Grok 4.3

xAI

$1.25$2.501M
110t/s
~4.5s gen
$0.0031$93.75
Qwen 3.7 MaxOSS

Alibaba

$1.25$3.751M
60t/s
~8.3s gen
$0.0037$112.50
OpenAI o4-mini

OpenAI

$1.10$4.40200K
120t/s
~4.2s gen
$0.0039$115.50
OpenAI o3

OpenAI

$2.00$8.00200K
40t/s
~12.5s gen
$0.0070$210.00
GPT-4o

OpenAI

$2.50$10.00128K
80t/s
~6.3s gen
$0.0088$262.50
Gemini 3.1 Pro

Google

$2.00$12.001.0M
120t/s
~4.2s gen
$0.0090$270.00
Gemini 3 Pro

Google

$2.00$12.001M
130t/s
~3.8s gen
$0.0090$270.00
GPT-5.2

OpenAI

$1.75$14.00400K
75t/s
~6.7s gen
$0.0096$288.75
GPT-5.4

OpenAI

$2.50$15.001.1M
60t/s
~8.3s gen
$0.011$337.50
Grok 4

xAI

$3.00$15.002M
75t/s
~6.7s gen
$0.012$360.00
Claude Sonnet 4.6

Anthropic

$3.00$15.001M
65t/s
~7.7s gen
$0.012$360.00
Claude Opus 4.7

Anthropic

$5.00$25.001M
40t/s
~12.5s gen
$0.020$600.00
GPT-5.5

OpenAI

$5.00$30.001.1M
55t/s
~9.1s gen
$0.022$675.00
Claude Fable 5

Anthropic

$10.00$50.001M
63t/s
~7.9s gen
$0.040$1,200
OpenAI o3-pro

OpenAI

$20.00$80.00200K
20t/s
~25.0s gen
$0.070$2,100

Costs are pre-tax list API prices. Caching assumes the full input prefix is eligible. Speeds are approximate sustained throughput for relative comparison only.

How to read these numbers

Cost per request

The headline metric. Mix of input and output token spend for one call at your chosen workload. Sort ascending to find your cheapest option instantly.

Prompt caching

Many providers discount repeated input prefixes 50-90%. Toggle it on if your app reuses system prompts, few-shot examples, or retrieved context.

Output speed

Tokens per second drives time-to-first-token and total gen time. Flash-tier models run 3-6x faster than frontier reasoning models.

LLM pricing FAQ

How is the cost per request calculated?

Cost per request = (input tokens / 1M * input price) + (output tokens / 1M * output price). The monthly estimate multiplies that by requests/day * 30. Toggle prompt caching to swap in the cached input price for providers that support repeated-prompt-prefix discounts.

Which is the cheapest LLM API in 2026?

Among hosted frontier-class models, DeepSeek V4 Pro, Gemma 4, and Llama 4 Maverick lead on raw price per token. But the cheapest model for your workload depends on your input/output ratio, volume, and whether latency or quality matters more than headline price.

Does prompt caching actually lower my bill?

Yes, often dramatically. Anthropic caches long prefixes for roughly 90% off, OpenAI and Google offer 50-75% discounts on repeated input. The calculator lets you toggle caching to see the impact, but only repeated identical prefixes qualify.

Are open-weight models really cheaper?

At the margin yes, but only at scale. Self-hosting removes per-token API markup but you pay for GPUs, ops, and scaling. The hosted open-weight prices shown here are a fair baseline; true self-host cost depends on your utilization.

Why does output cost more than input?

Generating tokens is compute-bound (autoregressive decoding), while reading input is a single forward pass. Providers price output 3-6x higher to reflect the extra GPU time per generated token.