The guide
What is an AI API Cost Calculator?
An AI API cost calculator turns a handful of assumptions (how many people use your product, how often they trigger the AI, and how much text each request moves) into a concrete monthly budget. AI providers bill by the token, not by the request, which makes invoices notoriously hard to predict from a price page alone. The calculator above does the arithmetic for you: requests × (tokens per request ÷ 1,000,000) × price per million tokens, minus cache savings, plus infrastructure and a safety buffer.
It exists because the most common AI budgeting mistake is estimating "a few cents per request, that's nothing" and discovering at scale that output tokens, retries and growing context windows multiplied the real bill several times over.
How AI API pricing works
Almost every LLM provider uses the same billing model: a price per 1 million input tokens and a different, higher price per 1 million output tokens. A token is roughly ¾ of an English word, so 1,000 tokens ≈ 750 words. Your cost per request is:
- (input tokens ÷ 1,000,000) × input price, plus
- (output tokens ÷ 1,000,000) × output price.
Multiply by your monthly request volume and you have your base token bill. On top of that sit optional charges — image generation, audio, fine-tuned model surcharges — and discounts, most importantly prompt caching and batch APIs.
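As a minimal sketch, the two-part formula above can be written as a pair of small Python functions. The function names, prices and volumes here are hypothetical placeholders, not any provider's real rates; substitute the figures from your provider's pricing page.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request when prices are quoted per 1 million tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


def base_token_bill(requests_per_month: int, input_tokens: int, output_tokens: int,
                    input_price_per_m: float, output_price_per_m: float) -> float:
    """Base monthly token bill: per-request cost times monthly request volume."""
    return requests_per_month * request_cost(
        input_tokens, output_tokens, input_price_per_m, output_price_per_m
    )


# Hypothetical rates: $1.00 per 1M input tokens, $4.00 per 1M output tokens.
per_request = request_cost(1_000, 500, 1.00, 4.00)
monthly = base_token_bill(100_000, 1_000, 500, 1.00, 4.00)
print(f"per request: ${per_request:.4f}")  # per request: $0.0030
print(f"per month:   ${monthly:.2f}")      # per month:   $300.00
```

Optional charges and discounts are deliberately left out here; this covers only the base token bill.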
Input tokens vs output tokens
Input tokens are everything you send: the system prompt, user message, retrieved documents and conversation history. Output tokens are everything the model writes back. They are metered separately, and the split matters: a chatbot that carries long conversation history is input-heavy, while a content generator that writes long articles from short briefs is output-heavy. Knowing your ratio tells you which price column on the provider's page actually drives your bill.
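To see which price column drives a given workload, it can help to compute the share of the bill that comes from each side. The sketch below assumes hypothetical prices of $1.00 and $4.00 per million tokens and two made-up request shapes; only the ratio logic is the point.

```python
def cost_split(input_tokens: int, output_tokens: int,
               input_price_per_m: float, output_price_per_m: float) -> tuple[float, float]:
    """Return (input_share, output_share) of the token bill as fractions of 1."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    total = input_cost + output_cost
    if total == 0:
        raise ValueError("at least one token count and price must be non-zero")
    return input_cost / total, output_cost / total


# Hypothetical request shapes, priced at $1.00 in / $4.00 out per 1M tokens.
chatbot = cost_split(4_000, 250, 1.00, 4.00)   # long history, short answers
writer = cost_split(200, 1_500, 1.00, 4.00)    # short brief, long article
print(f"chatbot: {chatbot[0]:.0%} input / {chatbot[1]:.0%} output")  # 80% / 20%
print(f"writer:  {writer[0]:.0%} input / {writer[1]:.0%} output")    # 3% / 97%
```

Note that the split is by dollars, not by token count: a higher output price can make output the dominant cost even when most tokens are input.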
Why output tokens usually cost more
Generating text is more expensive for the provider than reading it. A model can process your whole input in parallel, but it must produce output one token at a time, and each new token requires another forward pass through the network. That sequential work ties up GPUs longer, so providers typically price output tokens 3–5× above input tokens. Practical consequence: capping response length and asking for concise formats is one of the cheapest optimizations available.
How caching can reduce AI costs
Most AI apps resend the same content on every request: the system prompt, tool definitions, a knowledge-base excerpt. Prompt caching lets the provider store that repeated prefix and bill it at a reduced rate — often 50–90% cheaper than regular input tokens. If your requests share a large stable prefix, set the "cached requests" slider in the calculator to your expected cache-hit rate and watch the input cost drop. Two caveats: only the repeated part of the prompt is discounted, and cache entries expire, so sporadic traffic benefits less.
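Both caveats can be made concrete with a rough sketch. It assumes a flat discount on cached tokens and ignores any cache-write surcharge a provider may apply; the function names, the $1.00 price, the 500-token prefix and the 40% hit rate are all hypothetical.

```python
def cached_token_share(prefix_tokens: int, input_tokens: int, hit_rate: float) -> float:
    """Share of all input tokens billed at the cached rate.

    Only the stable prefix is cacheable, and only on requests that hit the cache.
    """
    return (prefix_tokens / input_tokens) * hit_rate


def input_cost_with_cache(total_input_tokens: float, cached_share: float,
                          input_price_per_m: float, cache_discount: float) -> float:
    """Monthly input cost when `cached_share` of tokens get `cache_discount` off."""
    cached = total_input_tokens * cached_share
    uncached = total_input_tokens - cached
    cached_price = input_price_per_m * (1 - cache_discount)
    return (uncached * input_price_per_m + cached * cached_price) / 1_000_000


# Hypothetical: 800-token prompts with a 500-token stable prefix, 40% cache hits.
share = cached_token_share(prefix_tokens=500, input_tokens=800, hit_rate=0.40)
cost = input_cost_with_cache(100_000_000, share, input_price_per_m=1.00,
                             cache_discount=0.90)
print(f"cached share of input tokens: {share:.0%}")  # 25%
print(f"input cost: ${cost:.2f} instead of $100.00")  # $77.50
```

Under these assumptions a 40% hit rate cuts the input bill by 22.5%, not 36%, because only part of each prompt is cacheable.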
Common hidden costs of AI automation
Token prices are the visible tip. Real deployments also pay for:
- Embeddings and a vector database if you use retrieval (RAG): indexing and hosting are billed separately.
- Retries and errors: failed or timed-out calls can still be billed for the tokens they processed, and every retry pays for the full input again.
- Monitoring and logging: observability tools or plain log storage.
- Hosting: servers or serverless functions wrapping the API.
- Workflow tools: Zapier/Make/n8n task quotas add up fast at volume.
- Human review: the payroll time spent checking AI output is a real cost most budgets forget.
- Prompt growth: prompts get longer over time as teams add instructions and context, and they rarely shrink again.
The optional cost fields in the calculator exist precisely so these don't surprise you.
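One way to keep these items visible is to add them up next to the token bill. The sketch below is illustrative only: every dollar figure is made up, the function name is hypothetical, and it assumes each retried call costs as much as the original.

```python
def total_monthly_cost(token_bill: float, retry_rate: float,
                       other_costs: dict[str, float]) -> float:
    """Token bill plus retry overhead plus every non-token line item."""
    # Simplification: each retried call is assumed to cost as much as the original.
    retry_overhead = token_bill * retry_rate
    return token_bill + retry_overhead + sum(other_costs.values())


# All figures are hypothetical monthly dollar amounts.
other_costs = {
    "embeddings_and_vector_db": 70.0,
    "monitoring_and_logging": 30.0,
    "hosting": 40.0,
    "workflow_tools": 50.0,
    "human_review": 400.0,
}
total = total_monthly_cost(token_bill=500.0, retry_rate=0.05, other_costs=other_costs)
print(f"${total:.2f}")  # $1115.00
```

In this made-up case the non-token items are larger than the token bill itself, which is the pattern the list above warns about.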
Example cost estimates
Example: SaaS chatbot cost estimate
Imagine a customer-support chatbot in a B2B SaaS: 2,000 monthly active users, each asking about 6 questions a day, 20 business days a month, for a total of 240,000 requests. With ~800 input tokens (system prompt + history + retrieved help articles) and ~250 output tokens per answer, that's 192M input and 60M output tokens monthly. On a mid-tier model, the token bill lands in the hundreds to low thousands of dollars. If 40% of those input tokens are served from cache at a 90% discount, the input portion drops by about a third (0.40 × 0.90 = 36%); the saving is smaller when only the system prompt is cacheable or the discount is lower. Plug these numbers into the calculator above to see the full breakdown with your own model choice.
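The arithmetic behind this example can be checked in a few lines. The workload figures come from the paragraph above; the two prices are hypothetical stand-ins for a mid-tier model, so the dollar total only illustrates the order of magnitude.

```python
# Workload from the chatbot example.
users = 2_000
questions_per_day = 6
business_days = 20
input_tokens_per_request = 800
output_tokens_per_request = 250

requests = users * questions_per_day * business_days
input_tokens = requests * input_tokens_per_request
output_tokens = requests * output_tokens_per_request

# Hypothetical mid-tier rates in dollars per 1M tokens (not a real price list).
INPUT_PRICE_PER_M = 1.00
OUTPUT_PRICE_PER_M = 4.00

input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M

print(f"{requests:,} requests")                                       # 240,000 requests
print(f"{input_tokens / 1e6:.0f}M input, {output_tokens / 1e6:.0f}M output")  # 192M input, 60M output
print(f"${input_cost:.2f} + ${output_cost:.2f} = ${input_cost + output_cost:.2f}")  # $192.00 + $240.00 = $432.00
```

At these assumed rates the 60M output tokens cost more than the 192M input tokens, even though input is more than three times the volume.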
Example: internal workflow automation cost estimate
Now an internal document-processing workflow: 50 employees, ~20 automated runs each per working day (22 days), summarizing contracts of ~3,000 input tokens into ~400-token summaries. That's only 22,000 requests but 66M input tokens, against just 8.8M output tokens. Input-heavy workloads like this often run beautifully on small, cheap models, since summarization is a task where mini-tier quality is usually sufficient. The same volume on a flagship model could cost 10–20× more for little visible gain. This is the single most common overspend we see.
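The same workload can be priced at two tiers side by side. The volumes below come from the example; the two price points are hypothetical and were deliberately picked 15× apart, so the sketch illustrates the size of the gap rather than proving it.

```python
# Workload from the document-processing example.
employees = 50
runs_per_day = 20
working_days = 22

requests = employees * runs_per_day * working_days
input_tokens = requests * 3_000
output_tokens = requests * 400


def token_bill(input_price_per_m: float, output_price_per_m: float) -> float:
    """Monthly token bill for this workload at the given per-1M-token prices."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)


# Hypothetical price points in dollars per 1M tokens (input, output).
mini = token_bill(0.20, 0.80)
flagship = token_bill(3.00, 12.00)

print(f"{requests:,} requests, {input_tokens / 1e6:.0f}M input tokens")  # 22,000 requests, 66M input tokens
print(f"mini: ${mini:.2f}, flagship: ${flagship:.2f}")                   # mini: $20.24, flagship: $303.60
print(f"flagship costs {flagship / mini:.1f}x more")                     # flagship costs 15.0x more
```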
How to reduce AI API costs
- Right-size the model. Test the cheapest model that might work before defaulting to the flagship.
- Cache aggressively. Stable system prompts and shared context should hit the cache, not full-price input.
- Trim the prompt. Cut redundant instructions; summarize old chat history instead of resending it.
- Cap output length. Set max tokens and ask for concise formats; output tokens are the most expensive ones you pay for.
- Route by difficulty. Send easy requests to a cheap model, escalate hard ones to a premium model.
- Batch non-urgent work. Batch APIs typically discount 50% for jobs that can wait.
- Monitor from day one. Track cost per feature and per customer; spikes you can see are spikes you can fix.
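To gauge what routing by difficulty might save, a blended per-request cost is a useful back-of-the-envelope figure. This sketch assumes you already know what share of traffic is "easy" and ignores the cost of whatever decides that; the function name and the per-request costs are hypothetical.

```python
def blended_cost_per_request(easy_share: float, cheap_cost: float,
                             premium_cost: float) -> float:
    """Average cost per request when `easy_share` of traffic goes to the cheap model."""
    if not 0.0 <= easy_share <= 1.0:
        raise ValueError("easy_share must be between 0 and 1")
    return easy_share * cheap_cost + (1 - easy_share) * premium_cost


# Hypothetical per-request costs in dollars.
cheap, premium = 0.0004, 0.0060
blended = blended_cost_per_request(0.80, cheap, premium)
saving = 1 - blended / premium
print(f"${blended:.5f} per request, {saving:.0%} below all-premium")
# $0.00152 per request, 75% below all-premium
```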
When to choose a cheaper model vs a premium model
Choose a cheaper model when the task is classification, extraction, summarization, routing, or formulaic writing — high-volume tasks with easy-to-verify output. Choose a premium model when errors are expensive: complex reasoning, code generation, legal or medical drafting, agent workflows with many steps, or anything customer-facing where quality is the product. A useful rule: prototype on a premium model to prove the feature works, then walk down the price ladder until quality measurably degrades — and stop one step above that.
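The "walk down the price ladder" rule can be sketched as a short loop. It assumes you have scored each model on your own evaluation set and picked a minimum acceptable score; the model names, scores and threshold below are hypothetical.

```python
from typing import Optional


def pick_model(ladder: list[tuple[str, float]], min_quality: float) -> Optional[str]:
    """Walk from the most to the least expensive model and return the last one
    whose quality score still meets `min_quality`.

    `ladder` must be ordered from most expensive to cheapest.
    Returns None if even the most expensive model falls short.
    """
    chosen = None
    for name, quality in ladder:
        if quality < min_quality:
            break  # quality has degraded: keep the step above this one
        chosen = name
    return chosen


# Hypothetical evaluation scores (0 to 1) on your own test set.
ladder = [("premium", 0.94), ("mid", 0.92), ("small", 0.91), ("nano", 0.78)]
print(pick_model(ladder, min_quality=0.90))  # small
```

The loop stops at the first failing step instead of searching the whole ladder, which mirrors the rule in the text: once quality drops, you stay one step above.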
AI API cost checklist before launching
- Verified current pricing on the provider's official page (not a blog post).
- Measured real token counts from prototype logs, not guesses.
- Added a 20–30% safety margin for retries, spikes and prompt growth.
- Included infrastructure: vector DB, embeddings, hosting, monitoring, automation tools.
- Set billing alerts and hard spending limits in the provider dashboard.
- Calculated your cost per user and per 1,000 requests, and your break-even user count.
- Prepared a fallback plan (smaller model, rate limiting) in case costs exceed projections.
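Several checklist items reduce to a few lines of arithmetic. The sketch below is one possible way to compute them, under a simple assumed definition of break-even (fixed costs divided by what each user pays minus what each user costs in tokens). All names and inputs are hypothetical.

```python
import math


def with_safety_margin(token_bill: float, margin: float) -> float:
    """Token bill padded by a safety margin, e.g. 0.25 for 25%."""
    return token_bill * (1 + margin)


def unit_costs(total_cost: float, users: int, requests: int) -> tuple[float, float]:
    """Return (cost per user, cost per 1,000 requests)."""
    return total_cost / users, total_cost / requests * 1_000


def break_even_users(fixed_costs: float, price_per_user: float,
                     variable_cost_per_user: float) -> int:
    """Smallest user count at which revenue covers fixed plus per-user costs."""
    contribution = price_per_user - variable_cost_per_user
    if contribution <= 0:
        raise ValueError("each user costs more than they pay; no break-even exists")
    return math.ceil(fixed_costs / contribution)


# Hypothetical monthly inputs.
padded_tokens = with_safety_margin(432.0, 0.25)   # 540.0
fixed = 590.0                                     # infrastructure, tools, review
per_user, per_1k = unit_costs(padded_tokens + fixed, users=2_000, requests=240_000)
print(f"${per_user:.3f} per user, ${per_1k:.2f} per 1,000 requests")
# $0.565 per user, $4.71 per 1,000 requests
print(break_even_users(fixed, price_per_user=2.00,
                       variable_cost_per_user=padded_tokens / 2_000))  # 342
```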