Smart Cost Routing: Cut AI Costs Up to 60%
Smart Cost Routing picks cheaper models for simple prompts, guarded by Floopy's session-level feedback loop. 40-60% savings without compromising quality.
Not every prompt needs GPT-4o.
A prompt like “what’s 2+2?” doesn’t need the same model as “write a distributed systems whitepaper.” Yet most teams route every request to their most expensive model, paying premium prices for trivial tasks.
Today we’re launching Smart Cost Routing, a feature that automatically detects prompt complexity and routes simple requests to cheaper models.
Smart Cost Routing is the most visible lever inside Floopy’s feedback-driven routing loop. The cost model picks a cheaper candidate, and Floopy’s four feedback sources (session NPS, automatic LLM-as-judge scoring, admin ratings, and public benchmarks) decide whether that choice keeps its weight. One rating per session is propagated to every routing decision in that session, so a cheaper-but-worse choice loses weight for every turn of the conversation it degraded, not just for a single logged request.
The Problem
Teams using AI in production typically configure a single model per endpoint. Whether the user asks “translate hello to Spanish” or “debug this complex async Rust code,” the same expensive model handles both.
Our data shows that 40-60% of production prompts are simple — short questions, translations, summaries, classifications. These can be handled by models costing 5-10x less.
How Smart Cost Routing Works
Step 1: Classify Complexity
Every incoming prompt is classified into one of three tiers:
- Simple (score 0-0.3): Short prompts, single-turn, no code, no tools
- Moderate (score 0.3-0.7): Multi-turn conversations, some code, structured output
- Complex (score 0.7-1.0): Long system prompts, code generation, tool use
Classification uses heuristics (zero latency) enhanced by historical similarity matching.
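As a rough illustration of the tiering above, here is a minimal sketch of a heuristic scorer. The `Prompt` fields, the weights, and the 4,000-character length cap are all hypothetical, chosen only to make the example concrete; they are not Floopy's actual heuristics, and the historical similarity matching is left out. Because the published ranges share the boundary values 0.3 and 0.7, the sketch assumes a boundary score falls into the higher tier.

```python
from dataclasses import dataclass


@dataclass
class Prompt:
    text: str
    turns: int = 1            # number of conversation turns so far
    has_code: bool = False    # prompt contains or asks for code
    uses_tools: bool = False  # request declares tool/function calls


def complexity_score(p: Prompt) -> float:
    """Hypothetical heuristic score in [0, 1]; weights are illustrative only."""
    score = min(len(p.text) / 4000, 1.0) * 0.4  # longer prompts score higher
    score += 0.2 if p.turns > 1 else 0.0
    score += 0.2 if p.has_code else 0.0
    score += 0.2 if p.uses_tools else 0.0
    return min(score, 1.0)


def tier(score: float) -> str:
    # Assumption: a score exactly on a boundary belongs to the higher tier.
    if score < 0.3:
        return "simple"
    if score < 0.7:
        return "moderate"
    return "complex"
```

A short single-turn question such as `Prompt("What is 2+2?")` scores close to zero and lands in the simple tier, while a long prompt with code and tool use lands in the complex tier.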
Step 2: Select the Cheapest Viable Model
For simple and moderate prompts, the system picks the cheapest model that historically maintained quality:
- 90% of the time (exploitation): Uses the model with the best performance score
- 10% of the time (exploration): Tests less-explored models to gather data
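The 90/10 split described above resembles an epsilon-greedy strategy. The sketch below shows one way that could look, assuming each candidate carries a historical quality score and a trial count. The `Candidate` type, the `pick_model` function, and the choice of "least-tried model" as the exploration target are assumptions for illustration, not Floopy's implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    score: float  # historical performance score in [0, 1]
    trials: int   # how often this model has been tried so far


def pick_model(candidates, exploration_rate=0.10, rng=random):
    """Epsilon-greedy choice over an already-filtered candidate list."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    if rng.random() < exploration_rate:
        # Exploration: try the least-explored model to gather data.
        return min(candidates, key=lambda c: c.trials)
    # Exploitation: use the model with the best performance score.
    return max(candidates, key=lambda c: c.score)
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible in tests.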
Step 3: Session-Level Safety Guarantees
- Complex prompts always use your default model
- The system never picks a model more expensive than your default
- You set the minimum quality threshold (default 70%)
- Every cheaper-model choice is auto-scored by the LLM-as-judge on four dimensions (accuracy, completeness, safety, format adherence) and tied back to the session’s NPS when it arrives. If the cheaper pick lowered session quality, the router reweights against it on the next turn, not the next day
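The first three guarantees can be read as a filter applied before any model is chosen. The following is a minimal sketch under that reading; the `Model` type and the `viable_candidates` function are hypothetical names, and falling back to the default model when nothing passes the filter is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    cost_per_m: float  # USD per million tokens
    quality: float     # historical quality in [0, 1]


def viable_candidates(tier, default, models, min_quality=0.70):
    """Return the models the router may choose from for this prompt."""
    if tier == "complex":
        return [default]  # complex prompts always use the default model
    cheaper = [
        m for m in models
        if m.cost_per_m < default.cost_per_m  # never more expensive than default
        and m.quality >= min_quality          # respects the quality threshold
    ]
    # Assumption: with no viable cheaper model, stay on the default.
    return cheaper or [default]
```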
This is the piece that matters for people migrating from per-request feedback systems. You don’t have to rate each response to protect quality. One NPS per session covers the full trajectory — multi-turn, tool calls, retries — and the router learns from that signal across every decision it made inside the session.
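To make the session-level propagation concrete, here is a small sketch in which one session rating updates the weight of every routing decision made inside that session. The `SessionFeedback` class, the normalization of ratings to a 0-1 range, the starting weight of 0.5, and the moving-average update are all assumptions for illustration; the post does not describe Floopy's actual update rule.

```python
from collections import defaultdict


class SessionFeedback:
    def __init__(self, learning_rate=0.2):
        self.learning_rate = learning_rate
        self.weights = defaultdict(lambda: 0.5)  # model name -> routing weight
        self.decisions = defaultdict(list)       # session id -> models used

    def record_decision(self, session_id, model):
        """Log which model served one turn of the session."""
        self.decisions[session_id].append(model)

    def apply_session_rating(self, session_id, rating):
        """Apply one rating in [0, 1] to every decision in the session."""
        for model in self.decisions.pop(session_id, []):
            w = self.weights[model]
            # Move the weight a step toward the session rating.
            self.weights[model] = w + self.learning_rate * (rating - w)
```

In this sketch a model that served several turns of a poorly rated session is penalized once per turn, so it loses more weight than a model that served only one.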
Example Savings
| Scenario | Default Model | Smart Route | Savings |
|---|---|---|---|
| “What is 2+2?” | GPT-4o ($2.50/M) | GPT-4o-mini ($0.15/M) | 94% |
| “Translate to Spanish” | Claude 3.5 Sonnet ($3/M) | Claude 3 Haiku ($0.25/M) | 92% |
| “Summarize this paragraph” | Gemini 1.5 Pro ($1.25/M) | Gemini 2.0 Flash ($0.10/M) | 92% |
| Complex code review | GPT-4o | GPT-4o (no change) | 0% |
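With 50% of traffic being simple prompts that each save roughly 92%, overall cost falls by about 46% (assuming simple prompts account for a matching share of spend), which sits inside the 40-60% range.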
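The arithmetic behind these figures can be checked with a few lines. This sketch assumes that the simple-prompt share is measured as a share of spend on the default model, and that only simple prompts are rerouted; both function names are hypothetical. If simple prompts are shorter than average, their share of spend is lower than their share of requests and the blended figure shrinks accordingly.

```python
def per_request_savings(default_price: float, routed_price: float) -> float:
    """Fraction saved when a request moves from the default to the routed model."""
    return 1 - routed_price / default_price


def blended_savings(simple_share: float, savings_on_simple: float) -> float:
    """Overall savings when only the simple share of spend is rerouted."""
    return simple_share * savings_on_simple


# Prices in USD per million tokens, taken from the table above.
gpt_row = per_request_savings(2.50, 0.15)  # about 0.94
overall = blended_savings(0.50, 0.92)      # about 0.46
```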
Getting Started
- Go to Routing in the Floopy dashboard
- Create or edit a routing rule
- Toggle Smart Cost Routing on
- Set your exploration rate and minimum quality
- Watch the savings accumulate in your dashboard
Monitoring Your Savings
Two new dashboard widgets help you track results:
- Smart Cost Savings: Total dollars saved, daily trend, breakdown by model
- Smart Routing Accuracy: Quality comparison showing how much quality you retain versus how much you save, broken down by the four feedback sources so you can see which signal is driving each decision
Availability
Smart Cost Routing is available on the Pro plan ($199.90/month) and on Enterprise.