Smart Cost Routing: Cut AI Costs Up to 60%

Smart Cost Routing picks cheaper models for simple prompts, guarded by Floopy's session-level feedback loop. 40-60% savings without compromising quality.

Floopy Team | 4 min read
cost-optimization routing feedback-driven-routing agent-optimization product

Not every prompt needs GPT-4o.

“What’s 2+2?” doesn’t need the same model as “write a distributed systems whitepaper.” Yet most teams route every request to their most expensive model, paying premium prices for trivial tasks.

Today we’re launching Smart Cost Routing, a feature that automatically detects prompt complexity and routes simple requests to cheaper models.

Smart Cost Routing is the most visible lever inside Floopy’s feedback-driven routing loop. The cost model picks a cheaper candidate; Floopy’s four feedback sources — session NPS, LLM-as-judge (auto), admin ratings, public benchmarks — decide whether the choice stuck. One rating per session is propagated to every routing decision in that session, so a cheaper-but-worse choice loses weight across every turn of the conversation it degraded, not just the one we happened to log.

The Problem

Teams using AI in production typically configure a single model per endpoint. Whether the user asks “translate hello to Spanish” or “debug this complex async Rust code,” the same expensive model handles both.

Our data shows that 40-60% of production prompts are simple — short questions, translations, summaries, classifications. These can be handled by models costing 5-10x less.

How Smart Cost Routing Works

Step 1: Classify Complexity

Every incoming prompt is classified into one of three tiers:

  • Simple (score 0-0.3): Short prompts, single-turn, no code, no tools
  • Moderate (score 0.3-0.7): Multi-turn conversations, some code, structured output
  • Complex (score 0.7-1.0): Long system prompts, code generation, tool use

Classification uses heuristics (zero latency) enhanced by historical similarity matching.
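
To make the tiering concrete, here is a minimal sketch of a heuristic scorer. It is not Floopy's classifier: the `Prompt` fields, the weights, and the 4,000-character normalization are all assumptions chosen for illustration, and the historical similarity matching mentioned above is left out. Only the tier boundaries (0.3 and 0.7) come from the list above.

```python
from dataclasses import dataclass


@dataclass
class Prompt:
    """Hypothetical request shape; field names are illustrative only."""
    text: str
    turns: int = 1                # conversation turns so far
    has_code: bool = False
    uses_tools: bool = False
    system_prompt_chars: int = 0


def complexity_score(p: Prompt) -> float:
    """Return a score in [0, 1]. The weights are made up for this sketch."""
    score = 0.0
    score += min(len(p.text) / 4000, 1.0) * 0.25            # longer prompts score higher
    score += min(p.system_prompt_chars / 4000, 1.0) * 0.15  # long system prompts
    score += 0.15 if p.turns > 1 else 0.0                   # multi-turn conversation
    score += 0.25 if p.has_code else 0.0                    # code present
    score += 0.20 if p.uses_tools else 0.0                  # tool use
    return min(score, 1.0)


def tier(score: float) -> str:
    """Map a score to a tier.

    The published ranges share the boundary values 0.3 and 0.7; this sketch
    assumes a boundary score falls into the higher tier.
    """
    if score < 0.3:
        return "simple"
    if score < 0.7:
        return "moderate"
    return "complex"


if __name__ == "__main__":
    print(tier(complexity_score(Prompt("What is 2+2?"))))
```

Because the scorer only inspects fields already present on the request, it adds no network round trip, which is the property the heuristic approach is meant to preserve.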

Step 2: Select the Cheapest Viable Model

For simple and moderate prompts, the system picks the cheapest model that historically maintained quality:

  • 90% of the time (exploitation): Uses the model with the best performance score
  • 10% of the time (exploration): Tests less-explored models to gather data
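
The 90/10 split above resembles an epsilon-greedy strategy. The sketch below shows one way such a selector could look. The `ModelStats` type, the `pick_model` function, and the rule "explore by picking the model with the fewest trials" are assumptions for illustration, not Floopy's implementation.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ModelStats:
    """Hypothetical per-model history."""
    performance: float  # historical performance score in [0, 1]
    trials: int         # how many times this model has been tried


def pick_model(
    stats: Dict[str, ModelStats],
    exploration_rate: float = 0.10,
    rng: Optional[random.Random] = None,
) -> str:
    """Epsilon-greedy choice among candidate models."""
    rng = rng or random.Random()
    if rng.random() < exploration_rate:
        # Exploration: try the least-explored model to gather data.
        return min(stats, key=lambda name: stats[name].trials)
    # Exploitation: use the model with the best performance score.
    return max(stats, key=lambda name: stats[name].performance)


if __name__ == "__main__":
    history = {
        "model-a": ModelStats(performance=0.91, trials=500),
        "model-b": ModelStats(performance=0.84, trials=12),
    }
    print(pick_model(history))
```

Passing a seeded `random.Random` makes the choice reproducible in tests, while the default draws a fresh generator per call.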

Step 3: Session-Level Safety Guarantees

  • Complex prompts always use your default model
  • The system never picks a model more expensive than your default
  • You set the minimum quality threshold (default 70%)
  • Every cheaper-model choice is auto-scored by the LLM-as-judge on four dimensions (accuracy, completeness, safety, format-adherence) and tied back to the session’s NPS when it arrives. If the cheaper pick dropped session quality, the router reweights against it on the next turn, not the next day.
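
The first three guarantees can be read as a filter applied before any model is chosen. The following is a rough sketch under that reading. The `Candidate` type, the `viable_candidates` function, and the choice to append the default model as a fallback are hypothetical; only the three rules and the 70% default threshold come from the list above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical model record."""
    name: str
    price_per_m: float  # USD per million tokens
    quality: float      # historical quality score in [0, 1]


def viable_candidates(
    tier: str,
    default: Candidate,
    candidates: List[Candidate],
    min_quality: float = 0.70,
) -> List[Candidate]:
    """Return allowed models, cheapest first, with the default last."""
    if tier == "complex":
        # Complex prompts always use the default model.
        return [default]
    viable = [
        c for c in candidates
        if c.price_per_m <= default.price_per_m  # never more expensive than default
        and c.quality >= min_quality             # meets the minimum quality threshold
    ]
    viable.sort(key=lambda c: c.price_per_m)
    # Assumption: the default stays available as a fallback.
    return viable + [default]


if __name__ == "__main__":
    default = Candidate("default-model", 2.50, 0.90)
    pool = [Candidate("cheap-model", 0.15, 0.80), Candidate("pricey-model", 10.0, 0.95)]
    print([c.name for c in viable_candidates("simple", default, pool)])
```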

This is the piece that matters for people migrating from per-request feedback systems. You don’t have to rate each response to protect quality. One NPS per session covers the full trajectory — multi-turn, tool calls, retries — and the router learns from that signal across every decision it made inside the session.
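
As a rough illustration of how one session rating could reach every routing decision made inside that session, consider the sketch below. Everything in it is assumed: the `SessionFeedback` class, the rating normalized to the 0-1 range, the starting weight of 0.5, and the simple moving-average update are stand-ins, not Floopy's actual reweighting logic.

```python
from collections import defaultdict


class SessionFeedback:
    """Hypothetical tracker that propagates one rating per session."""

    def __init__(self, learning_rate: float = 0.2):
        self.learning_rate = learning_rate
        self.weights = defaultdict(lambda: 0.5)  # model name -> routing weight
        self.decisions = defaultdict(list)       # session id -> models chosen

    def record_decision(self, session_id: str, model: str) -> None:
        """Log one routing decision (one turn, tool call, or retry)."""
        self.decisions[session_id].append(model)

    def rate_session(self, session_id: str, rating: float) -> None:
        """Apply a single rating in [0, 1] to every decision in the session."""
        for model in self.decisions.pop(session_id, []):
            current = self.weights[model]
            # Move the weight toward the rating once per logged decision.
            self.weights[model] = current + self.learning_rate * (rating - current)


if __name__ == "__main__":
    fb = SessionFeedback()
    fb.record_decision("s1", "cheap-model")
    fb.record_decision("s1", "cheap-model")
    fb.record_decision("s1", "default-model")
    fb.rate_session("s1", 0.0)  # one poor rating for the whole session
    print(dict(fb.weights))
```

In this sketch a model that handled two turns of a poorly rated session is penalized twice, which mirrors the idea that a degraded choice loses weight across every turn it touched.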

Example Savings

Scenario                    | Default Model              | Smart Route                | Savings
“What is 2+2?”              | GPT-4o ($2.50/M)           | GPT-4o-mini ($0.15/M)      | 94%
“Translate to Spanish”      | Claude 3.5 Sonnet ($3/M)   | Claude 3 Haiku ($0.25/M)   | 92%
“Summarize this paragraph”  | Gemini 1.5 Pro ($1.25/M)   | Gemini 2.0 Flash ($0.10/M) | 92%
Complex code review         | GPT-4o                     | GPT-4o (no change)         | 0%

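If simple prompts account for about half of your spend and each one saves around 92%, that alone works out to roughly a 46% reduction. Routing some moderate prompts to cheaper models as well can push the total toward the top of the 40-60% range.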
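
The arithmetic behind these figures can be checked with a short sketch. The function names are hypothetical, and the calculation assumes each tier's share is measured as a fraction of baseline spend rather than request count. The 30% moderate share and its 40% savings rate are made-up inputs, not measured data.

```python
from typing import Dict


def per_request_savings(default_price: float, routed_price: float) -> float:
    """Fraction saved when a request moves from the default to the routed model."""
    return 1.0 - routed_price / default_price


def blended_savings(spend_share: Dict[str, float], savings_rate: Dict[str, float]) -> float:
    """Overall savings: sum of (share of baseline spend) x (savings rate) per tier."""
    if abs(sum(spend_share.values()) - 1.0) > 1e-9:
        raise ValueError("spend shares must sum to 1")
    return sum(share * savings_rate.get(t, 0.0) for t, share in spend_share.items())


if __name__ == "__main__":
    # Table row: GPT-4o at $2.50/M routed to GPT-4o-mini at $0.15/M.
    print(round(per_request_savings(2.50, 0.15), 2))

    # Only simple prompts rerouted, at the ~92% rate from the table.
    print(round(blended_savings({"simple": 0.5, "complex": 0.5}, {"simple": 0.92}), 2))

    # Hypothetical: moderate prompts also rerouted at a lower rate.
    print(round(blended_savings(
        {"simple": 0.5, "moderate": 0.3, "complex": 0.2},
        {"simple": 0.92, "moderate": 0.40},
    ), 2))
```

Since simple prompts tend to be short, their share of spend is usually smaller than their share of requests, so actual savings depend on your token distribution.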

Getting Started

  1. Go to Routing in the Floopy dashboard
  2. Create or edit a routing rule
  3. Toggle Smart Cost Routing on
  4. Set your exploration rate and minimum quality
  5. Watch the savings accumulate in your dashboard

Monitoring Your Savings

Two new dashboard widgets help you track results:

  • Smart Cost Savings: Total dollars saved, daily trend, breakdown by model
  • Smart Routing Accuracy: Quality comparison showing exactly how much quality you retain vs. how much you save — decomposed into the four feedback sources so you can see which signal is driving the call

Availability

Smart Cost Routing is available on the Pro plan ($199.90/month) and on Enterprise.