Dev tool · June 15, 2026

LLM API pricing & cost calculator.

Enter your token counts and request volume. See cost per request, monthly spend, output speed, and context window side-by-side across 24 frontier and open-weight models from OpenAI, Anthropic, Google, xAI, DeepSeek, and more. Sort, filter, and compare before you commit to an API.

24 Models tracked $0.10/1M Cheapest in June 15, 2026 Updated 10 Providers

Input tokens / reqOutput tokens / req

Requests / day

Apply prompt-caching input price

Tier

Provider

Cheapest per request

Gemma 4 31B

$0.0004/req · $10.65/mo

Fastest output

Gemini 3 Flash

300 tok/s · ~1.7s for 500 tokens

Model			Context			Cost / mo
Gemma 4 31BOSS Google	$0.120	$0.350	262K	180t/s ~2.8s gen	$0.0004	$10.65
Llama 4 MaverickOSS Meta	$0.150	$0.600	1.0M	140t/s ~3.6s gen	$0.0005	$15.75
DeepSeek V3OSS DeepSeek	$0.200	$0.770	164K	55t/s ~9.1s gen	$0.0007	$20.55
MiniMax M2.7OSS MiniMax	$0.250	$1.00	205K	100t/s ~5.0s gen	$0.0009	$26.25
MiniMax M3OSS MiniMax	$0.300	$1.20	1M	59t/s ~8.5s gen	$0.0010	$31.50
DeepSeek V4 ProOSS DeepSeek	$0.435	$0.870	1.0M	60t/s ~8.3s gen	$0.0011	$32.63
Kimi K2.5OSS Moonshot	$0.375	$2.02	262K	70t/s ~7.1s gen	$0.0016	$47.25
Gemini 3 Flash Google	$0.500	$3.00	1.0M	300t/s ~1.7s gen	$0.0023	$67.50
GLM-5OSS Zhipu	$1.00	$3.20	200K	90t/s ~5.6s gen	$0.0031	$93.00
Grok 4.3 xAI	$1.25	$2.50	1M	110t/s ~4.5s gen	$0.0031	$93.75
Qwen 3.7 MaxOSS Alibaba	$1.25	$3.75	1M	60t/s ~8.3s gen	$0.0037	$112.50
OpenAI o4-mini OpenAI	$1.10	$4.40	200K	120t/s ~4.2s gen	$0.0039	$115.50
OpenAI o3 OpenAI	$2.00	$8.00	200K	40t/s ~12.5s gen	$0.0070	$210.00
GPT-4o OpenAI	$2.50	$10.00	128K	80t/s ~6.3s gen	$0.0088	$262.50
Gemini 3.1 Pro Google	$2.00	$12.00	1.0M	120t/s ~4.2s gen	$0.0090	$270.00
Gemini 3 Pro Google	$2.00	$12.00	1M	130t/s ~3.8s gen	$0.0090	$270.00
GPT-5.2 OpenAI	$1.75	$14.00	400K	75t/s ~6.7s gen	$0.0096	$288.75
GPT-5.4 OpenAI	$2.50	$15.00	1.1M	60t/s ~8.3s gen	$0.011	$337.50
Grok 4 xAI	$3.00	$15.00	2M	75t/s ~6.7s gen	$0.012	$360.00
Claude Sonnet 4.6 Anthropic	$3.00	$15.00	1M	65t/s ~7.7s gen	$0.012	$360.00
Claude Opus 4.7 Anthropic	$5.00	$25.00	1M	40t/s ~12.5s gen	$0.020	$600.00
GPT-5.5 OpenAI	$5.00	$30.00	1.1M	55t/s ~9.1s gen	$0.022	$675.00
Claude Fable 5 Anthropic	$10.00	$50.00	1M	63t/s ~7.9s gen	$0.040	$1,200
OpenAI o3-pro OpenAI	$20.00	$80.00	200K	20t/s ~25.0s gen	$0.070	$2,100

Costs are pre-tax list API prices. Caching assumes the full input prefix is eligible. Speeds are approximate sustained throughput for relative comparison only.

How to read these numbers

Cost per request

The headline metric. Mix of input and output token spend for one call at your chosen workload. Sort ascending to find your cheapest option instantly.

Prompt caching

Many providers discount repeated input prefixes 50-90%. Toggle it on if your app reuses system prompts, few-shot examples, or retrieved context.

Output speed

Tokens per second drives time-to-first-token and total gen time. Flash-tier models run 3-6x faster than frontier reasoning models.

Go deeper

AI model benchmarks

Quality scores (MMLU, HumanEval, GPQA) to pair with price.

Open

Best LLM API providers

Curated shortlist of API providers with routing and reliability.

Open

Coding agent comparison

Which coding agent pairs best with each model tier.

Open

Open-source AI repos

Self-hostable models and agent frameworks.

Open

AI tool finder

Answer a few questions to get a model recommendation.

Open

Managed AI stack setup

Let us wire your production AI stack end-to-end.

Open

LLM pricing FAQ

How is the cost per request calculated?

Cost per request = (input tokens / 1M * input price) + (output tokens / 1M * output price). The monthly estimate multiplies that by requests/day * 30. Toggle prompt caching to swap in the cached input price for providers that support repeated-prompt-prefix discounts.

Which is the cheapest LLM API in 2026?

Among hosted frontier-class models, DeepSeek V4 Pro, Gemma 4, and Llama 4 Maverick lead on raw price per token. But the cheapest model for your workload depends on your input/output ratio, volume, and whether latency or quality matters more than headline price.

Does prompt caching actually lower my bill?

Yes, often dramatically. Anthropic caches long prefixes for roughly 90% off, OpenAI and Google offer 50-75% discounts on repeated input. The calculator lets you toggle caching to see the impact, but only repeated identical prefixes qualify.

Are open-weight models really cheaper?

At the margin yes, but only at scale. Self-hosting removes per-token API markup but you pay for GPUs, ops, and scaling. The hosted open-weight prices shown here are a fair baseline; true self-host cost depends on your utilization.

Why does output cost more than input?

Generating tokens is compute-bound (autoregressive decoding), while reading input is a single forward pass. Providers price output 3-6x higher to reflect the extra GPU time per generated token.