What Is Per-Token Pricing in AI? Definition and Benchmarks

Definition

Per-Token Pricing: A consumption-based pricing model in which AI inference APIs bill per million tokens consumed, with separate rates for input tokens (the prompt and any context fed to the model) and output tokens (what the model generates). Tokens are subword units averaging roughly 4 characters, or 0.75 English words, each. The model is standard across major LLM providers, including OpenAI, Anthropic, and Google.
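The rule of thumb in the definition can be turned into a quick estimator. The sketch below is only an approximation: the function names are hypothetical, and real token counts depend on the provider's tokenizer and on the language of the text.

```python
import math

CHARS_PER_TOKEN = 4      # rule of thumb from the definition above
WORDS_PER_TOKEN = 0.75   # rule of thumb for English text


def estimate_tokens_from_text(text: str) -> int:
    """Approximate token count from character length, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Approximate token count from an English word count, rounded up."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


if __name__ == "__main__":
    # A 750-word document comes out at about 1,000 tokens.
    print(estimate_tokens_from_words(750))
```

For billing reconciliation, the token counts reported in the API response remain the authoritative figure; an estimator like this is only useful for budgeting before a request is sent.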

Per-token pricing is the dominant unit for AI inference because it maps more closely to compute cost than any other metric. Every token, input or output, must be processed through the model's layers. Input tokens are processed together in a single parallel pass over the prompt, while output tokens are generated sequentially, each requiring its own forward pass, which is why output rates typically run 3 to 5 times higher than input rates. The metric exposes the raw cost driver: long prompts and long generations both cost more, but generations cost more per token.
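The billing arithmetic is simple enough to sketch. In the example below, the function name and the rates (3 dollars per million input tokens, 15 dollars per million output tokens, a 5x ratio) are illustrative assumptions, not any provider's published prices.

```python
TOKENS_PER_MILLION = 1_000_000


def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate_per_m: float,
    output_rate_per_m: float,
) -> float:
    """Cost in dollars of one request billed at separate per-million rates."""
    input_cost = input_tokens * input_rate_per_m / TOKENS_PER_MILLION
    output_cost = output_tokens * output_rate_per_m / TOKENS_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical rates: $3 per million input, $15 per million output.
    cost = request_cost(10_000, 1_000, 3.0, 15.0)
    print(f"${cost:.4f}")
```

With these assumed rates, the 1,000 generated tokens cost half as much as the 10,000 prompt tokens despite being a tenth of the volume, which illustrates why generation length tends to dominate the bill.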

The discount mechanics in enterprise AI contracts cluster around three levers. Cached input pricing applies when the same context is fed to the model repeatedly, with cached tokens billed at 50 to 90 percent off standard input rates; the realized saving depends on the cache hit rate. Committed throughput tiers buy 20 to 40 percent off pay-as-you-go rates in exchange for reserved capacity at the API endpoint. Batch APIs, which return results within 24 hours rather than in real time, typically offer 50 percent off standard pricing.
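To see how cache hit rate drives the realized input rate, a blended rate can be computed as a weighted average of cached and uncached pricing. This is a minimal sketch: the function names, the 3 dollar standard rate, and the 80 percent hit rate are assumptions for illustration, and whether caching and batch discounts can be combined depends on the contract.

```python
def blended_input_rate(
    standard_rate: float,
    cache_discount: float,
    cache_hit_rate: float,
) -> float:
    """Effective per-million input rate given a cache discount and hit rate.

    cache_discount and cache_hit_rate are fractions between 0 and 1.
    """
    for name, value in (("cache_discount", cache_discount),
                        ("cache_hit_rate", cache_hit_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    cached_rate = standard_rate * (1.0 - cache_discount)
    return cache_hit_rate * cached_rate + (1.0 - cache_hit_rate) * standard_rate


def batch_rate(standard_rate: float, batch_discount: float = 0.5) -> float:
    """Per-million rate for batch requests, defaulting to 50 percent off."""
    return standard_rate * (1.0 - batch_discount)


if __name__ == "__main__":
    # Hypothetical: $3 standard rate, 90 percent cache discount, 80 percent hits.
    print(round(blended_input_rate(3.0, 0.9, 0.8), 2))
    print(batch_rate(3.0))
```

Under these assumptions, a 90 percent cache discount yields a 72 percent reduction in the effective input rate, not 90 percent, because the uncached fifth of the traffic still pays full price.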

Benchmark your per-token AI spend

We benchmark per-token pricing, committed throughput discounts, and cached input rates across 940 enterprise AI contracts. Send us your usage profile and we return rate intelligence in 48 hours.

Submit Your Proposal →

Negotiation levers on enterprise AI contracts

Enterprise AI contracts with committed throughput above 10 million tokens per minute qualify for deep discount tiers, typically 35 to 55 percent off public pricing. Volume tier breakpoints sit at 1 million, 10 million, and 100 million committed tokens per minute. Cached input pricing should be negotiated as a separate line item, not bundled into the headline rate. Output caps written into the API contract prevent runaway generation costs from prompt injection or buggy agent loops.
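The breakpoints above can be expressed as a simple tier lookup, which is handy when checking which tier a forecast commitment falls into. The sketch below assumes that reaching a breakpoint exactly counts as qualifying for it; the function name and tier numbering are hypothetical, and actual contracts may define the boundaries differently.

```python
from bisect import bisect_right

# Committed tokens-per-minute breakpoints from the text above.
BREAKPOINTS_TPM = [1_000_000, 10_000_000, 100_000_000]


def volume_tier(committed_tpm: int) -> int:
    """Return the number of breakpoints reached (0 = below the first tier).

    Assumption: a commitment equal to a breakpoint qualifies for that tier.
    """
    if committed_tpm < 0:
        raise ValueError("committed_tpm must be non-negative")
    return bisect_right(BREAKPOINTS_TPM, committed_tpm)


if __name__ == "__main__":
    for tpm in (500_000, 1_000_000, 25_000_000, 250_000_000):
        print(f"{tpm:>11,} TPM -> tier {volume_tier(tpm)}")
```

A forecast that lands just under a breakpoint is worth flagging in negotiation, since a small increase in commitment may move the whole contract into the next tier.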

For related vocabulary, see the per-seat pricing definition, the consumption-based pricing definition, and the per-transaction pricing definition. The glossary hub covers the broader pricing vocabulary. For enterprise GenAI cost analysis, see the enterprise GenAI cost benchmark 2026 and the OpenAI vendor profile.

Frequently asked questions

What is per-token pricing?

Per-token pricing bills AI model usage in tokens, where a token averages roughly 0.75 English words. APIs price input tokens and output tokens separately, and output typically costs 3 to 5 times more than input per token. OpenAI, Anthropic, and Google all use this metric.

What is a typical price per million tokens?

In 2026, frontier models price at 2 to 15 dollars per million input tokens and 8 to 75 dollars per million output tokens, depending on tier. Lower-tier models like GPT-4o mini or Claude Haiku price at 0.15 to 1.50 dollars per million input tokens. Volume discounts apply at committed usage above 10 million tokens per minute.

How can enterprises control per-token spend?

Three levers: prompt compression to cut input tokens by 30 to 60 percent, output length caps to prevent runaway generations, and committed throughput discounts that lock in 20 to 40 percent off pay-as-you-go rates. Enterprise contracts can also include cached input pricing at 50 to 90 percent off standard rates for repeated context.
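A rough way to size these levers is to model monthly spend before and after applying them. The sketch below is illustrative only: the function name, the monthly volumes, and the 3 and 15 dollar rates are assumptions, and it applies the committed discount uniformly to input and output, which a real contract may not do.

```python
def monthly_spend(
    input_tokens: int,
    output_tokens: int,
    input_rate_per_m: float,
    output_rate_per_m: float,
    compression: float = 0.0,
    committed_discount: float = 0.0,
) -> float:
    """Monthly cost in dollars after prompt compression and a committed discount.

    compression is the fraction of input tokens removed (0.3 = 30 percent).
    committed_discount is the fraction taken off pay-as-you-go rates.
    """
    effective_input = input_tokens * (1.0 - compression)
    gross = (
        effective_input * input_rate_per_m + output_tokens * output_rate_per_m
    ) / 1_000_000
    return gross * (1.0 - committed_discount)


if __name__ == "__main__":
    # Hypothetical month: 1 billion input tokens, 100 million output tokens.
    baseline = monthly_spend(1_000_000_000, 100_000_000, 3.0, 15.0)
    optimized = monthly_spend(
        1_000_000_000, 100_000_000, 3.0, 15.0,
        compression=0.3, committed_discount=0.2,
    )
    print(f"baseline ${baseline:,.0f}, optimized ${optimized:,.0f}")
```

Because compression only touches the input side, its effect on the total shrinks as the share of output tokens grows; for generation-heavy workloads, output length caps are likely the stronger lever.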
