Per token pricing bills AI model usage by counting tokens consumed, where one token equals roughly 0.75 English words. In 2026, frontier models from OpenAI, Anthropic, and Google price at 2 to 15 dollars per million input tokens and 8 to 75 dollars per million output tokens, with outputs running 3 to 5 times the input rate. Enterprise spend control rests on prompt compression, output caps, and committed throughput tiers that buy 20 to 40 percent off pay as you go, based on our benchmark of 940 AI infrastructure contracts signed in 2025.
Per-Token Pricing: A consumption based pricing model where AI inference APIs bill per million tokens consumed, with separate rates for input tokens (the prompt and any context fed to the model) and output tokens (what the model generates). Tokens are subword units roughly 4 characters or 0.75 words each. Standard across all major LLM providers including OpenAI, Anthropic, and Google.
Per token pricing is the dominant unit for AI inference because it maps closer to compute cost than any other metric. A token represents a discrete forward pass through the model's attention layers. Input tokens require one pass each. Output tokens require sequential generation, which is why output rates run 3 to 5 times higher than input rates. The metric exposes the raw cost driver: long prompts and long generations both cost more, but generations cost more per token.
The discount mechanics in enterprise AI contracts cluster around three levers. Cached input pricing applies when the same context is fed to the model repeatedly, with discounts of 50 to 90 percent off standard input rates depending on cache hit rate. Committed throughput tiers buy 20 to 40 percent off pay as you go in exchange for reserved capacity at the API endpoint. Batch APIs, which return results within 24 hours rather than real time, typically offer 50 percent off standard pricing.
We benchmark per token pricing, committed throughput discounts, and cached input rates across 940 enterprise AI contracts. Send us your usage profile and we return rate intelligence in 48 hours.
Enterprise AI contracts that exceed 10 million tokens per minute committed throughput qualify for the deepest discount tier, typically 35 to 55 percent off public pricing. Volume tier breakpoints sit at 1 million, 10 million, and 100 million committed tokens per minute. Cached input pricing should be negotiated as a separate line item, not bundled into the headline rate. Output caps written into the API contract prevent runaway generation costs from prompt injection or buggy agent loops.
For related vocabulary, see the per seat pricing definition, the consumption based pricing definition, and the per transaction pricing definition. The glossary hub covers the broader pricing vocabulary. For enterprise GenAI cost analysis, see the enterprise GenAI cost benchmark 2026 and the OpenAI vendor profile.
Per token pricing bills AI model usage in tokens, where a token equals roughly 0.75 English words on average. APIs price separately for input tokens and output tokens, and outputs cost 3 to 5 times more than inputs. OpenAI, Anthropic, and Google all use this metric.
In 2026, frontier models price at 2 to 15 dollars per million input tokens and 8 to 75 dollars per million output tokens, depending on tier. Lower tier models like GPT-4o mini or Claude Haiku price at 0.15 to 1.50 dollars per million input. Volume discounts apply at committed usage above 10 million tokens per minute.
Three levers: prompt compression to cut input tokens by 30 to 60 percent, output length caps to prevent runaway generations, and committed throughput discounts that lock in 20 to 40 percent off pay as you go. Enterprise contracts also include cached input pricing at 50 to 90 percent off standard for repeated context.
Send us the proposal or renewal. We return discount, term, and unit price intelligence in 48 hours.