How to Estimate and Control Your AI API Costs
Learn how to predict your OpenAI, Anthropic, or Google AI spending before it surprises you — with formulas, examples, and monitoring tips.
“How much will this cost?” is the first question every team asks before putting AI into production — and the hardest to answer without real data.
This guide gives you the formulas, examples, and monitoring strategies to predict your AI API spending with confidence.
Understanding Token-Based Pricing
AI APIs charge per token — roughly 4 characters or ¾ of a word in English. Most providers charge differently for input and output tokens:
| Model | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4o-mini | $0.15 | $0.60 |
| Claude Sonnet 4 | $3.00 | $15.00 |
| Claude Haiku 4 | $0.80 | $4.00 |
| Gemini 2.5 Pro | $1.25 | $10.00 |
| Gemini 2.0 Flash | $0.10 | $0.40 |
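The ~4-characters-per-token rule of thumb above can be turned into a quick back-of-envelope estimator. A sketch in JavaScript; this is a heuristic only, and the function name is illustrative. For exact counts, use the provider's tokenizer (e.g. tiktoken for OpenAI models).

```javascript
// Rough token estimate using the ~4 characters per token heuristic.
// Real counts vary by model and language; use the provider's own
// tokenizer when accuracy matters.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// A ~500-character prompt comes out to roughly 125 tokens
console.log(estimateTokens("a".repeat(500))); // 125
```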
The Cost Formula
Cost per request = (input_tokens × input_price) + (output_tokens × output_price)
Monthly cost = cost_per_request × requests_per_day × 30
Example: Customer Support Chatbot
Let’s estimate costs for a chatbot handling 10,000 conversations/day:
- System prompt: ~500 tokens
- Average user message: ~50 tokens
- Average conversation context: ~800 tokens (history)
- Average response: ~200 tokens
Input per request: 500 + 50 + 800 = 1,350 tokens
Output per request: 200 tokens
With GPT-4o:
- Input: 1,350 × $2.50/1M = $0.003375
- Output: 200 × $10.00/1M = $0.002
- Per request: $0.005375
- Monthly: $0.005375 × 10,000 × 30 = $1,612/month
With GPT-4o-mini:
- Input: 1,350 × $0.15/1M = $0.000203
- Output: 200 × $0.60/1M = $0.000120
- Per request: $0.000323
- Monthly: $0.000323 × 10,000 × 30 = $97/month
Same chatbot. Same quality for most support questions. $1,612 vs $97.
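The arithmetic above collapses into two small helpers. A sketch in JavaScript (function names are illustrative; prices are the per-1M-token rates from the pricing table):

```javascript
// Cost formula from above. Prices are dollars per 1M tokens.
function requestCost(inputTokens, outputTokens, inputPrice, outputPrice) {
  return (inputTokens * inputPrice + outputTokens * outputPrice) / 1_000_000;
}

function monthlyCost(perRequest, requestsPerDay, days = 30) {
  return perRequest * requestsPerDay * days;
}

// Chatbot example: 1,350 input tokens, 200 output, 10,000 requests/day
const gpt4o = requestCost(1350, 200, 2.50, 10.00); // 0.005375
const mini  = requestCost(1350, 200, 0.15, 0.60);  // ~0.000323

console.log(monthlyCost(gpt4o, 10_000)); // ~1612.5 per month
console.log(monthlyCost(mini, 10_000));  // ~96.75 per month
```

Plug in your own prompt sizes and volumes before launch; the spread between models is usually the single biggest lever.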
Example: AI Coding Assistant
A coding assistant processing 5,000 requests/day:
- System prompt: ~2,000 tokens (detailed instructions)
- Code context: ~3,000 tokens (file contents, errors)
- User message: ~100 tokens
- Generated code: ~500 tokens
With GPT-4o:
- Input: 5,100 × $2.50/1M = $0.01275
- Output: 500 × $10.00/1M = $0.005
- Per request: $0.01775
- Monthly: $0.01775 × 5,000 × 30 = $2,663/month
The Hidden Costs
Your actual bill will be higher than the formula suggests because of:
1. Retries
Failed requests (timeouts, rate limits, errors) need retries. Budget for 5-15% extra requests.
2. Conversation History
Each message in a multi-turn conversation resends the entire history. A 10-message conversation means message #10 includes all 9 previous messages as input tokens.
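This growth is quadratic, not linear, and it is easy to underestimate. A sketch of the cumulative input-token bill for an N-turn conversation, reusing the chatbot example's numbers (500-token system prompt, ~50-token user messages, ~200-token replies; all defaults are assumptions):

```javascript
// Cumulative input tokens when each turn resends the full history.
// Defaults mirror the chatbot example above; adjust to your workload.
function totalInputTokens(turns, { system = 500, user = 50, reply = 200 } = {}) {
  let history = 0;
  let total = 0;
  for (let t = 0; t < turns; t++) {
    total += system + history + user; // everything sent as input this turn
    history += user + reply;          // this turn joins the history
  }
  return total;
}

console.log(totalInputTokens(1));  // 550
console.log(totalInputTokens(10)); // 16750 — vs 5500 if history weren't resent
```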
3. System Prompt Overhead
Your system prompt is sent with every single request. A 1,000-token system prompt at 10,000 requests/day = 10M input tokens per day just for the system prompt.
4. Development and Testing
Dev environments, testing, CI/CD prompt testing — these add up. A team of 5 developers testing prompts manually can easily generate 20-30% of production volume.
Setting Up Cost Controls
Step 1: Set Budget Alerts
Before going to production, configure alerts at:
- 50% of monthly budget — awareness
- 75% of monthly budget — investigate if trending high
- 90% of monthly budget — action required
- 100% of monthly budget — hard stop or degraded mode
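The tiers above can be encoded as a simple spend check. A minimal sketch; the status strings and thresholds are illustrative, and the actual action (paging, throttling, blocking) would be wired to your own infrastructure:

```javascript
// Map current spend to the alert tiers above.
function budgetStatus(spend, monthlyBudget) {
  const pct = spend / monthlyBudget;
  if (pct >= 1.0)  return "hard-stop";   // 100%: stop or degraded mode
  if (pct >= 0.9)  return "action";      // 90%: action required
  if (pct >= 0.75) return "investigate"; // 75%: investigate the trend
  if (pct >= 0.5)  return "awareness";   // 50%: heads-up
  return "ok";
}

console.log(budgetStatus(920, 1000)); // "action"
```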
Step 2: Implement Per-User Limits
Prevent a single user or API key from consuming a disproportionate share:
```javascript
// Example: 100 requests per minute per user
const rateLimiter = {
  window: '1m',
  maxRequests: 100,
  keyBy: 'userId'
};
```
Step 3: Track Cost Per Feature
Don’t just track total spending. Break it down by:
- Feature/endpoint — Which features cost the most?
- User segment — Are free-tier users costing more than they should?
- Model — Are you accidentally using expensive models for simple tasks?
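One lightweight way to get these breakdowns is to tag every request's cost with its feature, model, and user segment. A sketch with an in-memory map; in production this would write to a metrics store, and the tag names are illustrative:

```javascript
// Accumulate spend per tag so totals can be sliced by feature,
// model, and segment instead of staring at one aggregate number.
const costs = new Map();

function recordCost(tags, dollars) {
  for (const tag of tags) {
    costs.set(tag, (costs.get(tag) ?? 0) + dollars);
  }
}

recordCost(["feature:support-bot", "model:gpt-4o-mini", "tier:free"], 0.000323);
recordCost(["feature:summarizer", "model:gpt-4o", "tier:paid"], 0.005375);
recordCost(["feature:support-bot", "model:gpt-4o-mini", "tier:free"], 0.000323);

console.log(costs.get("feature:support-bot")); // total spend for that feature
```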
Cost Monitoring Dashboard
At minimum, track these metrics daily:
| Metric | Why It Matters |
|---|---|
| Total cost (daily/weekly/monthly) | Trend awareness |
| Cost per request (p50, p95) | Catch expensive outliers |
| Tokens per request (input/output) | Identify bloated prompts |
| Requests by model | Verify model routing |
| Cache hit rate | Measure optimization effectiveness |
| Cost per user | Identify abuse or inefficiency |
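Why track p95 and not just the average? A handful of expensive outliers can dominate your bill while the median looks healthy. A nearest-rank percentile sketch over a day's per-request costs (a toy illustration, not a metrics library):

```javascript
// Nearest-rank percentile on a sorted copy of the values.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[Math.max(0, idx)];
}

const requestCosts = [0.0003, 0.0003, 0.0004, 0.0003, 0.012]; // one outlier
console.log(percentile(requestCosts, 50)); // 0.0003
console.log(percentile(requestCosts, 95)); // 0.012 — the outlier p50 hides
```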
Using an AI Gateway for Cost Control
Building all of this — monitoring, rate limiting, alerts, model routing — from scratch takes significant engineering time.
An AI gateway like Floopy gives you all of this out of the box:
- Real-time cost dashboard with per-request, per-user, and per-model breakdowns
- Budget alerts and hard limits configurable per API key
- Automatic cost logging to ClickHouse for historical analysis
- Smart Cost Routing that automatically picks cheaper models for simple tasks
The estimate you should actually care about is a moving target. A static per-token estimate assumes your model choice is fixed, but in production the cheapest viable model for a given prompt shifts as prompts, traffic mix, and quality bars drift. Floopy's feedback-driven routing keeps that estimate honest: it propagates a single NPS score per session across every routing decision in that session and combines it with LLM-as-judge scores, admin ratings, and public benchmarks, so the "cheapest viable" cutoff is continuously re-fit rather than frozen at setup time. See the walk-through on Smart Cost Routing and session propagation.
You get visibility into exactly where your money is going from day one.
Cost Estimation Cheat Sheet
| Application Type | Typical Volume | Estimated Monthly Cost (GPT-4o-mini) |
|---|---|---|
| Internal chatbot | 1K req/day | $10-30 |
| Customer support bot | 10K req/day | $100-300 |
| Content generation | 5K req/day | $50-200 |
| Code assistant | 5K req/day | $150-500 |
| RAG application | 10K req/day | $200-600 |
| High-volume API | 100K req/day | $1,000-5,000 |
These are estimates using GPT-4o-mini. Multiply by 15-20x for GPT-4o equivalents.
Key Takeaways
- Do the math before going to production — use the cost formula with your actual prompt sizes
- Account for hidden costs — retries, conversation history, system prompt overhead
- Set budget controls on day one — don’t wait for a surprise bill
- Track cost per feature and per user — aggregates hide the real problems
- Start with the cheapest viable model — upgrade only where quality demands it