AI Cost Calc

Free tool · No signup · Runs in your browser

Know your AI API costs before the invoice

Estimate your monthly spend on OpenAI, Claude, Gemini, Mistral or any LLM API. Model token costs, caching, infrastructure and a break-even check — in one place.

  • 12 models, side-by-side
  • Prompt-caching savings
  • CSV export & shareable link

Estimates are for planning purposes only. Always verify current pricing on each provider's official pricing page.

The guide

What is an AI API Cost Calculator?

An AI API cost calculator turns a handful of assumptions — how many people use your product, how often they trigger the AI, and how much text each request moves — into a concrete monthly budget. AI providers bill by the token, not by the request, which makes invoices notoriously hard to predict from a price page alone. The calculator above does the arithmetic for you: requests × tokens × price per million, minus cache savings, plus infrastructure and a safety buffer.

It exists because the most common AI budgeting mistake is estimating "a few cents per request, that's nothing" and discovering at scale that output tokens, retries and growing context windows multiplied the real bill several times over.

How AI API pricing works

Almost every LLM provider uses the same billing model: a price per 1 million input tokens and a different, higher price per 1 million output tokens. A token is roughly ¾ of an English word, so 1,000 tokens ≈ 750 words. Your cost per request is:

(input tokens × input price per 1M + output tokens × output price per 1M) ÷ 1,000,000

Multiply by your monthly request volume and you have your base token bill. On top of that sit optional charges — image generation, audio, fine-tuned model surcharges — and discounts, most importantly prompt caching and batch APIs.
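The per-request arithmetic described above can be sketched in a few lines of Python. The function name and the prices are placeholders chosen for illustration, not any provider's actual rates.

```python
def cost_per_request(input_tokens, output_tokens,
                     input_price_per_m, output_price_per_m):
    """Dollar cost of one request; both prices are per 1 million tokens."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


# Placeholder prices (assumed for illustration, not real provider rates).
INPUT_PRICE = 1.00   # dollars per 1M input tokens
OUTPUT_PRICE = 4.00  # dollars per 1M output tokens

per_request = cost_per_request(1_000, 300, INPUT_PRICE, OUTPUT_PRICE)
monthly_base = per_request * 100_000  # base token bill for 100,000 requests
print(f"per request: ${per_request:.4f}, per month: ${monthly_base:,.2f}")
```

With these assumed numbers a request costs about $0.0022, so 100,000 requests come to roughly $220 before any surcharges or discounts.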

Input tokens vs output tokens

Input tokens are everything you send: the system prompt, user message, retrieved documents and conversation history. Output tokens are everything the model writes back. They are metered separately, and the split matters: a chatbot that carries long conversation history is input-heavy, while a content generator that writes long articles from short briefs is output-heavy. Knowing your ratio tells you which price column on the provider's page actually drives your bill.
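To see which price column drives a given workload, a rough sketch like the following can help. The function name, token counts and prices are all assumptions made up for the example.

```python
def bill_split(input_tokens, output_tokens,
               input_price_per_m, output_price_per_m):
    """Return (input_share, output_share) of the token bill as fractions."""
    input_cost = input_tokens * input_price_per_m
    output_cost = output_tokens * output_price_per_m
    total = input_cost + output_cost
    if total == 0:
        raise ValueError("no tokens or zero prices: nothing to split")
    return input_cost / total, output_cost / total


# Placeholder prices per 1M tokens (assumed, not real provider rates).
IN_PRICE, OUT_PRICE = 1.00, 4.00

# Hypothetical chatbot carrying long history: 4,000 tokens in, 200 out.
chat_in, chat_out = bill_split(4_000, 200, IN_PRICE, OUT_PRICE)
# Hypothetical article generator: 300 tokens in, 1,500 out.
gen_in, gen_out = bill_split(300, 1_500, IN_PRICE, OUT_PRICE)

print(f"chatbot:   {chat_in:.0%} input / {chat_out:.0%} output")
print(f"generator: {gen_in:.0%} input / {gen_out:.0%} output")
```

Under these assumptions the chatbot's bill is mostly input even though output is priced four times higher, while the generator's bill is almost entirely output.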

Why output tokens usually cost more

Generating text is more expensive for the provider than reading it. A model can process your whole input in parallel, but it must produce output one token at a time, and each new token requires another forward pass through the network. That sequential work ties up GPUs longer, so providers typically price output tokens at roughly 3–5× the input price. Practical consequence: capping response length and asking for concise formats is one of the cheapest optimizations available.

How caching can reduce AI costs

Most AI apps resend the same content on every request: the system prompt, tool definitions, a knowledge-base excerpt. Prompt caching lets the provider store that repeated prefix and bill it at a reduced rate — often 50–90% cheaper than regular input tokens. If your requests share a large stable prefix, set the "cached requests" slider in the calculator to your expected cache-hit rate and watch the input cost drop. Two caveats: only the repeated part of the prompt is discounted, and cache entries expire, so sporadic traffic benefits less.
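A minimal sketch of the caching arithmetic, assuming a simple model in which only the stable prefix is discounted and only on requests that hit the cache. The parameter names are hypothetical, the price is a placeholder, and the sketch ignores any cache-write fee a provider may charge.

```python
def input_cost_with_cache(input_tokens, price_per_m,
                          cacheable_fraction, hit_rate, cache_discount):
    """Monthly input cost when part of each prompt can be served from cache.

    cacheable_fraction: share of input tokens in the stable prefix (0 to 1)
    hit_rate:           share of requests that hit the cache (0 to 1)
    cache_discount:     price reduction on cached tokens (0.9 = 90% cheaper)
    """
    full_price = input_tokens * price_per_m / 1_000_000
    discounted_share = cacheable_fraction * hit_rate
    return full_price * (1 - discounted_share * cache_discount)


# Assumed workload: 100M input tokens a month at a placeholder $1.00 per 1M.
no_cache = input_cost_with_cache(100_000_000, 1.00, 0.5, 0.0, 0.9)
with_cache = input_cost_with_cache(100_000_000, 1.00, 0.5, 0.8, 0.9)
print(f"no cache: ${no_cache:.2f}, with cache: ${with_cache:.2f}")
```

Here half of each prompt is cacheable, 80% of requests hit the cache, and cached tokens are 90% cheaper, so the input bill falls from about $100 to about $64. Lowering the hit rate toward zero, as with sporadic traffic, moves the result back toward full price.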

Common hidden costs of AI automation

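Token prices are the visible tip. Real deployments also pay for:

  • Embeddings and a vector database, if you use retrieval (RAG)
  • Fine-tuning
  • Storage
  • Monitoring and logging tools
  • Server or serverless hosting
  • Workflow automation platforms
  • Human time spent reviewing AI outputs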

The optional cost fields in the calculator exist precisely so these don't surprise you.
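One way to fold these extras into a single monthly figure is sketched below. The cost names and dollar amounts are invented for the example, and applying the safety margin to the whole subtotal (rather than to tokens only) is an assumption, not necessarily how the calculator does it.

```python
def monthly_budget(token_cost, fixed_costs, safety_margin=0.25):
    """Token bill plus fixed monthly costs, padded by a safety margin."""
    subtotal = token_cost + sum(fixed_costs.values())
    return subtotal * (1 + safety_margin)


# Hypothetical monthly figures in dollars (placeholders, not real quotes).
fixed = {
    "vector_database": 70.0,
    "hosting": 50.0,
    "monitoring": 30.0,
}

budget = monthly_budget(token_cost=400.0, fixed_costs=fixed)
print(f"planned monthly budget: ${budget:,.2f}")
```

With a $400 token bill, $150 of fixed costs and a 25% margin, the planning figure is $687.50, noticeably above the token bill alone.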

Example cost estimates

Example: SaaS chatbot cost estimate

Imagine a customer-support chatbot in a B2B SaaS: 2,000 monthly active users, each asking about 6 questions a day, 20 business days a month, for a total of 240,000 requests. With ~800 input tokens (system prompt + history + retrieved help articles) and ~250 output tokens per answer, that's 192M input and 60M output tokens monthly. On a mid-tier model, the token bill lands in the hundreds to low thousands of dollars. A 40% cache-hit rate can cut the input portion by up to about a third; the saving is smaller when only the system prompt is cacheable or the provider's cache discount is modest. Plug these numbers into the calculator above to see the full breakdown with your own model choice.
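The volumes in this example can be reproduced with a short script. The two price points are placeholders picked only to bracket a plausible mid-tier range; they are not real provider rates.

```python
# Assumptions taken from the chatbot example.
users = 2_000
questions_per_day = 6
business_days = 20
input_tokens_per_request = 800
output_tokens_per_request = 250

requests = users * questions_per_day * business_days
input_tokens = requests * input_tokens_per_request
output_tokens = requests * output_tokens_per_request


def token_bill(in_tokens, out_tokens, in_price_per_m, out_price_per_m):
    """Monthly token bill in dollars; prices are per 1 million tokens."""
    return (in_tokens / 1_000_000 * in_price_per_m
            + out_tokens / 1_000_000 * out_price_per_m)


# Placeholder price points (input, output) in dollars per 1M tokens.
low_bill = token_bill(input_tokens, output_tokens, 1.00, 4.00)
high_bill = token_bill(input_tokens, output_tokens, 3.00, 15.00)

print(f"{requests:,} requests, {input_tokens:,} input tokens, "
      f"{output_tokens:,} output tokens")
print(f"token bill between ${low_bill:,.0f} and ${high_bill:,.0f}")
```

At these assumed prices the bill falls between roughly $432 and $1,476 a month, which is the "hundreds to low thousands" range described above.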

Example: internal workflow automation cost estimate

Now an internal document-processing workflow: 50 employees, ~20 automated runs each per working day (22 days), summarizing contracts of ~3,000 input tokens into ~400-token summaries. That's only 22,000 requests but 66M input tokens and 8.8M output tokens. Input-heavy workloads like this often run well on small, cheap models, since summarization is a task where mini-tier quality is usually sufficient. The same volume on a flagship model could cost 10–20× more for little visible gain. Defaulting to a flagship model for work like this is the single most common overspend we see.
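A quick comparison of the same workload on two model tiers might look like this. Both price pairs are hypothetical and chosen only to illustrate the size of the gap.

```python
# Workload from the document-processing example.
requests = 50 * 20 * 22          # employees × runs per day × working days
input_tokens = requests * 3_000
output_tokens = requests * 400

# Hypothetical (input, output) prices in dollars per 1M tokens.
PRICES = {
    "small": (0.20, 0.80),
    "flagship": (3.00, 12.00),
}


def monthly_token_bill(tier):
    """Monthly token bill in dollars for the given hypothetical tier."""
    in_price, out_price = PRICES[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


small = monthly_token_bill("small")
flagship = monthly_token_bill("flagship")
print(f"small: ${small:,.2f}  flagship: ${flagship:,.2f}  "
      f"ratio: {flagship / small:.0f}x")
```

With these placeholder prices the flagship bill is about 15 times the small-model bill for identical volume, so the quality difference has to justify that multiple.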

How to reduce AI API costs

  1. Right-size the model. Test the cheapest model that might work before defaulting to the flagship.
  2. Cache aggressively. Stable system prompts and shared context should hit the cache, not full-price input.
  3. Trim the prompt. Cut redundant instructions; summarize old chat history instead of resending it.
  4. Cap output length. Set max tokens and ask for concise formats; output tokens are the most expensive tokens on your bill.
  5. Route by difficulty. Send easy requests to a cheap model, escalate hard ones to a premium model.
  6. Batch non-urgent work. Batch APIs typically discount 50% for jobs that can wait.
  7. Monitor from day one. Track cost per feature and per customer; spikes you can see are spikes you can fix.
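Steps 5 and 6 lend themselves to a back-of-the-envelope check. The sketch below assumes you already know a per-request cost for each model; the function names, costs and traffic shares are invented for illustration, and the 50% batch discount is the typical figure quoted in the list above.

```python
def blended_cost_per_request(easy_share, cheap_cost, premium_cost):
    """Average cost per request when easy traffic goes to the cheap model."""
    return easy_share * cheap_cost + (1 - easy_share) * premium_cost


def apply_batch_discount(cost, batch_share, discount=0.5):
    """Reduce cost for the share of work that can run through a batch API."""
    return cost * (1 - batch_share * discount)


# Hypothetical per-request costs in dollars.
CHEAP, PREMIUM = 0.0005, 0.01

blended = blended_cost_per_request(0.8, CHEAP, PREMIUM)
with_batch = apply_batch_discount(blended, batch_share=0.3)

print(f"all premium: ${PREMIUM:.4f}")
print(f"routed:      ${blended:.4f}")
print(f"plus batch:  ${with_batch:.5f}")
```

Routing 80% of traffic to the cheap model cuts the average from $0.0100 to about $0.0024 per request under these assumptions, and batching 30% of the work trims it further to about $0.00204.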

When to choose a cheaper model vs a premium model

Choose a cheaper model when the task is classification, extraction, summarization, routing, or formulaic writing — high-volume tasks with easy-to-verify output. Choose a premium model when errors are expensive: complex reasoning, code generation, legal or medical drafting, agent workflows with many steps, or anything customer-facing where quality is the product. A useful rule: prototype on a premium model to prove the feature works, then walk down the price ladder until quality measurably degrades — and stop one step above that.
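The "walk down the price ladder" rule can be expressed as a small selection routine. This is only a sketch: the model names, prices and quality scores are hypothetical, and the scores are assumed to come from your own evaluation set.

```python
def pick_model(ladder, quality_bar):
    """Walk from the most expensive model down and return the name of the
    last model that still meets the quality bar (None if none does)."""
    chosen = None
    for model in sorted(ladder, key=lambda m: m["price_per_m"], reverse=True):
        if model["score"] < quality_bar:
            break  # quality has degraded: stop one step above this model
        chosen = model["name"]
    return chosen


# Hypothetical ladder: placeholder prices and made-up evaluation scores.
LADDER = [
    {"name": "premium", "price_per_m": 10.0, "score": 0.95},
    {"name": "mid", "price_per_m": 3.0, "score": 0.93},
    {"name": "small", "price_per_m": 0.5, "score": 0.91},
    {"name": "tiny", "price_per_m": 0.1, "score": 0.78},
]

print(pick_model(LADDER, quality_bar=0.90))
```

With a quality bar of 0.90 the routine settles on "small", the cheapest model before the score drops below the bar. If even the premium model misses the bar it returns None, which is a signal to rethink the feature rather than the model.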

AI API cost checklist before launching

Recommended stack

Tools that help control AI costs

Some links may be affiliate links. We may earn a commission at no extra cost to you.

FAQ

Frequently asked questions

How do I estimate AI API costs?

Multiply your expected monthly requests (users × prompts per day × days) by the average tokens per request, then apply the provider's price per million input and output tokens. Add infrastructure costs (vector database, hosting, monitoring) and a 20–30% safety margin. This calculator does all of that for you.

What are input and output tokens?

Tokens are the chunks of text AI models read and write — roughly 750 English words per 1,000 tokens. Input tokens are everything you send to the model (instructions, context, chat history). Output tokens are everything the model writes back. Providers bill them at different rates.

Why are AI API costs hard to predict?

Because usage is variable: prompt length grows as you add context, users behave differently than expected, retries and errors add hidden requests, and output length depends on the task. That's why estimating with a safety margin and monitoring real usage from day one matters.

Which AI model is cheapest?

Smaller models (mini/flash/small tiers and hosted open-source 8B-class models) are usually 10–30× cheaper per token than flagship models. The real question is which model is cheapest while still meeting your quality bar — test a small model on your actual task before paying for a premium one.

How can I reduce LLM API costs?

The biggest levers: use a smaller model where quality allows, shorten prompts and trim chat history, use prompt caching for repeated context, cap output length, batch non-urgent work, and route easy requests to cheap models while reserving premium models for hard cases.

Does this calculator use live pricing?

No. Prices are stored in an editable data file and verified periodically against each provider's official pricing page. Every result is an estimate for planning purposes only, not a quote.

Can I use this for OpenAI, Claude, Gemini and Mistral?

Yes. The calculator includes model entries for OpenAI, Anthropic (Claude), Google Gemini, Mistral and hosted open-source models, and a custom pricing mode for any other provider — just enter your own price per million tokens.

What hidden costs should I include?

Beyond tokens: embeddings and a vector database if you use retrieval (RAG), fine-tuning, storage, monitoring and logging tools, server or serverless hosting, workflow automation platforms, and the human time spent reviewing AI outputs. The calculator has optional fields for each.

Is this estimate accurate?

It's as accurate as your assumptions. The math is exact, but real usage always differs from projections — treat results as a planning range, keep the safety margin, and reconcile against your first real invoices. Always verify current pricing on each provider's official page.

Can you help optimize my AI automation costs?

Yes — use the free AI cost review form on this page. Tell us your use case and current spend, and we'll suggest model choices, caching strategies and workflow changes that typically reduce costs by 30–60%.

Free service

Get a Free AI Cost Review

Tell us what you're building and what you spend. We'll reply with concrete suggestions — model choices, caching strategy and workflow design — to build a cost-efficient AI workflow and estimate your AI automation ROI.

We never sell or share your data with third parties without consent.