How to Estimate and Control Your AI API Costs

Learn how to predict your OpenAI, Anthropic, or Google AI spending before it surprises you — with formulas, examples, and monitoring tips.

Floopy Team | 6 min read
cost-estimation api-costs monitoring guides

“How much will this cost?” is the first question every team asks before putting AI into production — and the hardest to answer without real data.

This guide gives you the formulas, examples, and monitoring strategies to predict your AI API spending with confidence.

Understanding Token-Based Pricing

AI APIs charge per token — roughly 4 characters or ¾ of a word in English. Most providers charge differently for input and output tokens:

| Model | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4o-mini | $0.15 | $0.60 |
| Claude Sonnet 4 | $3.00 | $15.00 |
| Claude Haiku 4 | $0.80 | $4.00 |
| Gemini 2.5 Pro | $1.25 | $10.00 |
| Gemini 2.0 Flash | $0.10 | $0.40 |

The Cost Formula

Cost per request = (input_tokens ÷ 1,000,000 × input_price) + (output_tokens ÷ 1,000,000 × output_price)
Monthly cost = cost_per_request × requests_per_day × 30

Here input_price and output_price are the per-1M-token rates from the table above.
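The formula translates directly into code. A minimal TypeScript sketch (the helper names are ours; prices are the per-1M-token rates from the table above):

```typescript
// Prices are per 1M tokens, as in the pricing table.
interface ModelPricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Cost of a single request, in dollars.
function requestCost(
  inputTokens: number,
  outputTokens: number,
  pricing: ModelPricing
): number {
  return (
    (inputTokens * pricing.inputPerMillion +
      outputTokens * pricing.outputPerMillion) / 1_000_000
  );
}

// Projected monthly cost at a steady daily request volume.
function monthlyCost(costPerRequest: number, requestsPerDay: number): number {
  return costPerRequest * requestsPerDay * 30;
}
```

Plug in your own measured token counts rather than guesses; the examples below show the difference the model choice alone makes.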

Example: Customer Support Chatbot

Let’s estimate costs for a chatbot handling 10,000 conversations/day:

  • System prompt: ~500 tokens
  • Average user message: ~50 tokens
  • Average conversation context: ~800 tokens (history)
  • Average response: ~200 tokens

Input per request: 500 + 50 + 800 = 1,350 tokens
Output per request: 200 tokens

With GPT-4o:

  • Input: 1,350 × $2.50/1M = $0.003375
  • Output: 200 × $10.00/1M = $0.002
  • Per request: $0.005375
  • Monthly: $0.005375 × 10,000 × 30 = $1,612/month

With GPT-4o-mini:

  • Input: 1,350 × $0.15/1M = $0.000203
  • Output: 200 × $0.60/1M = $0.000120
  • Per request: $0.000323
  • Monthly: $0.000323 × 10,000 × 30 = $97/month

Same chatbot. Same quality for most support questions. $1,612 vs $97.
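Scripted, the comparison looks like this (prices from the table above; this is just the arithmetic, not an official calculator):

```typescript
// Per-1M-token prices from the pricing table.
const models = {
  "gpt-4o": { input: 2.5, output: 10.0 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
};

const inputTokens = 1_350;  // system prompt + user message + history
const outputTokens = 200;   // average response
const requestsPerDay = 10_000;

for (const [name, p] of Object.entries(models)) {
  const perRequest =
    (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
  const monthly = perRequest * requestsPerDay * 30;
  console.log(`${name}: $${perRequest.toFixed(6)}/req, ~$${Math.round(monthly)}/month`);
}
```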

Example: AI Coding Assistant

A coding assistant processing 5,000 requests/day:

  • System prompt: ~2,000 tokens (detailed instructions)
  • Code context: ~3,000 tokens (file contents, errors)
  • User message: ~100 tokens
  • Generated code: ~500 tokens

With GPT-4o:

  • Input: 5,100 × $2.50/1M = $0.01275
  • Output: 500 × $10.00/1M = $0.005
  • Per request: $0.01775
  • Monthly: $0.01775 × 5,000 × 30 = $2,663/month

The Hidden Costs

Your actual bill will be higher than the formula suggests because of:

1. Retries

Failed requests (timeouts, rate limits, errors) need retries. Budget for 5-15% extra requests.

2. Conversation History

Each message in a multi-turn conversation resends the entire history. A 10-message conversation means message #10 includes all 9 previous messages as input tokens.
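A back-of-the-envelope sketch of how history compounds, using the chatbot numbers from earlier (~500-token system prompt, ~50 tokens per user message, ~200 per assistant reply; all figures illustrative):

```typescript
const systemPrompt = 500;  // tokens, resent on every turn
const userMsg = 50;        // avg tokens per user message
const assistantMsg = 200;  // avg tokens per assistant reply

// Input tokens billed for the Nth user turn: the system prompt,
// plus the full history of all earlier turns, plus the new message.
function inputTokensAtTurn(turn: number): number {
  const history = (turn - 1) * (userMsg + assistantMsg);
  return systemPrompt + history + userMsg;
}

// Total input tokens billed across a whole 10-turn conversation.
let totalInputTokens = 0;
for (let t = 1; t <= 10; t++) totalInputTokens += inputTokensAtTurn(t);
```

Turn 1 bills 550 input tokens; turn 10 bills 2,800. Summed over the conversation, that is 16,750 input tokens, roughly five times what a naive "one message = one request" estimate would predict.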

3. System Prompt Overhead

Your system prompt is sent with every single request. A 1,000-token system prompt at 10,000 requests/day = 10M input tokens just for the system prompt.
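Putting dollars on that overhead, using GPT-4o's $2.50/1M input rate from the table above (arithmetic only):

```typescript
const systemPromptTokens = 1_000;
const requestsPerDay = 10_000;
const inputPricePerMillion = 2.5; // GPT-4o input rate

// 10M tokens/day spent on the system prompt alone.
const dailyPromptTokens = systemPromptTokens * requestsPerDay;

// Monthly cost of just the system prompt, in dollars.
const monthlyPromptCost =
  (dailyPromptTokens / 1_000_000) * inputPricePerMillion * 30;
```

That works out to $750/month before a single user message is processed, which is why trimming the system prompt is often the cheapest optimization available.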

4. Development and Testing

Dev environments, testing, CI/CD prompt testing — these add up. A team of 5 developers testing prompts manually can easily generate 20-30% of production volume.

Setting Up Cost Controls

Step 1: Set Budget Alerts

Before going to production, configure alerts at:

  • 50% of monthly budget — awareness
  • 75% of monthly budget — investigate if trending high
  • 90% of monthly budget — action required
  • 100% of monthly budget — hard stop or degraded mode
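The thresholds above map to a simple check (a sketch; the alerting hook itself is left abstract):

```typescript
type AlertLevel = "ok" | "awareness" | "investigate" | "action" | "stop";

// Map month-to-date spend against the budget thresholds above.
function budgetAlertLevel(spent: number, monthlyBudget: number): AlertLevel {
  const pct = spent / monthlyBudget;
  if (pct >= 1.0) return "stop";         // hard stop or degraded mode
  if (pct >= 0.9) return "action";       // action required
  if (pct >= 0.75) return "investigate"; // investigate if trending high
  if (pct >= 0.5) return "awareness";
  return "ok";
}
```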

Step 2: Implement Per-User Limits

Prevent a single user or API key from consuming a disproportionate share:

```typescript
// Example: 100 requests per minute per user
const rateLimiter = {
  window: '1m',
  maxRequests: 100,
  keyBy: 'userId'
};
```
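One way a config like that could back a limiter, as a fixed-window counter (a sketch under our own naming, not any particular gateway's implementation):

```typescript
// Fixed-window counter: allows up to maxRequests per key per window.
class FixedWindowLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();

  constructor(
    private windowMs: number,
    private maxRequests: number
  ) {}

  // Returns true if the request is allowed, false if over the limit.
  allow(userId: string, now: number = Date.now()): boolean {
    const entry = this.counts.get(userId);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.counts.set(userId, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count < this.maxRequests) {
      entry.count++;
      return true;
    }
    return false; // over limit: reject or queue the request
  }
}

// '1m' window, 100 requests, keyed by userId:
const limiter = new FixedWindowLimiter(60_000, 100);
```

A fixed window is the simplest variant; sliding-window or token-bucket limiters smooth out the burst allowed at window boundaries.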

Step 3: Track Cost Per Feature

Don’t just track total spending. Break it down by:

  • Feature/endpoint — Which features cost the most?
  • User segment — Are free-tier users costing more than they should?
  • Model — Are you accidentally using expensive models for simple tasks?
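The mechanics of that breakdown: tag every request with metadata at call time, then aggregate along any dimension (field names here are ours, for illustration):

```typescript
interface CostRecord {
  feature: string; // e.g. "support-chat", "summarize"
  userId: string;
  model: string;
  costUsd: number;
}

// Sum cost grouped by an arbitrary dimension (feature, user, or model).
function costBy(
  records: CostRecord[],
  key: "feature" | "userId" | "model"
): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    totals.set(r[key], (totals.get(r[key]) ?? 0) + r.costUsd);
  }
  return totals;
}
```

With records logged per request, `costBy(records, "model")` immediately answers the "accidentally expensive model" question.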

Cost Monitoring Dashboard

At minimum, track these metrics daily:

| Metric | Why It Matters |
|---|---|
| Total cost (daily/weekly/monthly) | Trend awareness |
| Cost per request (p50, p95) | Catch expensive outliers |
| Tokens per request (input/output) | Identify bloated prompts |
| Requests by model | Verify model routing |
| Cache hit rate | Measure optimization effectiveness |
| Cost per user | Identify abuse or inefficiency |

Using an AI Gateway for Cost Control

Building all of this — monitoring, rate limiting, alerts, model routing — from scratch takes significant engineering time.

An AI gateway like Floopy gives you all of this out of the box:

  • Real-time cost dashboard with per-request, per-user, and per-model breakdowns
  • Budget alerts and hard limits configurable per API key
  • Automatic cost logging to ClickHouse for historical analysis
  • Smart Cost Routing that automatically picks cheaper models for simple tasks

The estimate you should actually care about. A static per-token estimate assumes your model choice is fixed. In production, the cheapest viable model for a given prompt shifts over time as prompts, traffic mix, and quality bars drift. Floopy's feedback-driven routing keeps the estimate honest: it propagates one NPS score per session across every routing decision in that session and combines it with LLM-as-judge scores, admin ratings, and public benchmarks, so the "cheapest viable" cutoff is continuously re-fit instead of frozen at setup time. Walk-through: Smart Cost Routing and session propagation.

You get visibility into exactly where your money is going from day one.

Cost Estimation Cheat Sheet

| Application Type | Typical Volume | Estimated Monthly Cost (GPT-4o-mini) |
|---|---|---|
| Internal chatbot | 1K req/day | $10-30 |
| Customer support bot | 10K req/day | $100-300 |
| Content generation | 5K req/day | $50-200 |
| Code assistant | 5K req/day | $150-500 |
| RAG application | 10K req/day | $200-600 |
| High-volume API | 100K req/day | $1,000-5,000 |

These are estimates using GPT-4o-mini. Multiply by 15-20x for GPT-4o equivalents.

Key Takeaways

  1. Do the math before going to production — use the cost formula with your actual prompt sizes
  2. Account for hidden costs — retries, conversation history, system prompt overhead
  3. Set budget controls on day one — don’t wait for a surprise bill
  4. Track cost per feature and per user — aggregates hide the real problems
  5. Start with the cheapest viable model — upgrade only where quality demands it