Cut LLM costs safely with feedback-driven routing.
Floopy learns from session NPS, LLM-as-judge, admin ratings, and benchmarks to route each request to the lowest-cost model that still meets your quality constraints. Every decision is auditable, explainable, and reversible.
- ✓ OpenAI-compatible gateway
- ✓ Shadow + canary out of the box
- ✓ Exportable decision history
{
"chosen": "claude-haiku-4.5",
"baseline": "gpt-5.4-mini",
"confidence": 0.91,
"signals": {
"session_nps": +0.42,
"llm_judge": +0.31,
"admin_rating": +0.18,
"benchmark": +0.27
},
"constraints_passed": [
"max_regression < 2%",
"min_confidence ≥ 0.85"
],
"reversible": true
}Automatic routing sounds risky.
It should.
Floopy changes which model serves a request. That is a load-bearing decision in your product, and it should not live inside a black box. Every optimization Floopy makes is backed by a trace, a confidence score, a constraint check, and a rollback path.
How Floopy decides.
One pipeline. Six stages. Every stage emits structured output you can read, log, and replay.
- Session NPS
- LLM-as-judge
- Admin ratings
- Benchmarks
- max regression
- max cost increase
- minimum confidence
- and more
On any miss — low confidence, failed constraint, provider error — Floopy serves your default model and records the miss in the trace.
Trust controls.
The primitives that make automatic routing safe to enable, route by route.
Shadow mode
Run Floopy in parallel with your default model. No production traffic affected until you opt in route-by-route.
Decision trace
Per-request JSON: chosen model, baseline, signal contributions, confidence, evidence (samples, score gap, variance, bucketed regressions), and a human-readable explanation rendered in your language. Streamed and queryable.
Constraints
Nine declarative knobs across quality limits, cost limits, and promotion gates. Hard limits — the router will not violate them, and every change is hashed into the audit log.
Regression rollback
Auto-pin a route to its baseline if regressions exceed your threshold within the rolling window. One-line override available.
Export
Decision history exports to S3, BigQuery, or webhook. Bring your own warehouse, your own retention, your own SIEM.
Enterprise isolated learning
Opt out of the shared learning pool. Your routing models train on your traffic only. SOC 2, HIPAA, BAA available.
Start quickly. Validate safely.
Floopy is OpenAI-compatible, so integration can start with a small SDK/client change. Start in shadow mode first, inspect decision traces, then enable live optimization when the baseline comparison proves value.
import OpenAI from 'openai'; const client = new OpenAI({ baseURL: "https://api.floopy.ai/v1", apiKey: process.env.FLOOPY_API_KEY, }); const res = await client.chat.completions.create({ model: 'auto', // let Floopy pick the cheapest that holds quality messages, }); // attach feedback later by response id await fetch("https://api.floopy.ai/v1/feedback", { method: "POST", body: JSON.stringify({ id: res.id, score: 1 }), });
from openai import OpenAI import os, requests client = OpenAI( base_url="https://api.floopy.ai/v1", api_key=os.environ["FLOOPY_API_KEY"], ) res = client.chat.completions.create( model="auto", messages=messages, ) # attach feedback later by response id requests.post( "https://api.floopy.ai/v1/feedback", json={"id": res.id, "score": 1}, )
$ curl https://api.floopy.ai/v1/chat/completions \ -H "Authorization: Bearer $FLOOPY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "auto", "messages": [...] }' # Later: attach feedback by id $ curl https://api.floopy.ai/v1/feedback \ -H "Authorization: Bearer $FLOOPY_API_KEY" \ -d '{ "id": "run_01h...", "score": 1 }'
Make your AI infrastructure queryable by agents.
Connect Floopy to your internal agents and let them inspect costs, audit requests, compare providers, analyze feedback, and explain routing decisions. Floopy turns logs, costs, feedback, and audit trails into living context for your AI agents.
Connected by token, isolated by design.
The MCP server is fronted by TBAC. Your org's data never leaves your tenant boundary, and every tool call is logged for replay.
Verify savings on your own traffic.
Generic benchmarks make for tidy decks. They are not why you adopt a routing layer. Floopy compares actual routing outcomes against your default-model baseline, on your own production traffic, with your own quality signals.
- ✓ Baseline = your current default model, mirrored from real traffic.
- ✓ Quality signal = whatever you already collect — NPS, judge, admin ratings.
- ✓ Promotion to production is a manual flip per route. Never automatic.
| metric | baseline | floopy | delta |
|---|---|---|---|
| Cost / request | $0.0142 | $0.0088 | −38.0% |
| Quality score (judge + NPS blended) | 0.812 | 0.819 | +0.9% |
| p95 latency | 1,840 ms | 1,910 ms | +3.8% |
| Regression events (24h) | — | 0 / 12,418 | within threshold |
OpenAI-compatible gateway.
Works with every major AI provider — 20 supported today through one endpoint.
Three design choices no other router makes.
Plus a published scoring formula you can audit on every decision.
Session propagation
One NPS rating per session propagates to every routing decision in that session. No per-request labeling required.
Multi-source weighting
Four feedback sources combined with weights that adapt as real signal accumulates — benchmarks first, NPS once it arrives.
Managed shared pool
Every Floopy customer's signal improves the shared router. Enterprise can opt out for isolated learning.
40 / 40 / 20 routing formula
Every candidate model is scored on 40% success + 40% feedback + 20% cost. Published, auditable, replayable.
Simple pricing for production LLM optimization.
Start small, prove savings in shadow mode, and upgrade when you need exports, constraints, experiments, and longer retention.
- 50,000 requests / month
- 20+ providers (OpenAI, Anthropic, Gemini…)
- Exact cache + LLM Firewall firewall
- 7-day log retention
- 100k requests / month · 1k rpm
- Feedback API · 500 submissions / mo
- Semantic cache
- 30-day log retention
- Feedback-driven routing
- Smart selectors + A/B testing
- Advanced firewall (LLM Firewall)
- 2-year retention · 10k rpm
- SSO/SAML · SOC 2 · HIPAA
- Dedicated SLA + Slack support
- Opt-out of shared model
- Dedicated tenant isolation
Gateway vs observability vs optimization.
Portkey, Helicone, and LiteLLM solve real problems — gatewaying, logging, observability, provider normalization. Floopy sits a layer above: user outcomes influence which model serves the next request, with constraints and traces around every decision.
Floopy sits comfortably behind a gateway you already run. Bring your own logging stack. Bring your own observability vendor.
Frequently asked.
Common questions teams ask before turning feedback-driven routing on.
Can I run Floopy without affecting production? +
Is Floopy an AI gateway or an optimization layer? +
How does feedback-driven routing work? +
How can I see why a request was routed to a model? +
GET /v1/decisions/{id}.What happens if Floopy makes a bad decision? +
Can I control how aggressive optimization is? +
max_regression, max_cost_increase, min_confidence, and per-route routing constraints. The router cannot pick a model that violates them — violations fall back to your default.Does Floopy train on my data? +
How is Floopy different from Portkey, Helicone and LiteLLM? +
How do I verify savings on my own traffic? +
Can I export my decision data? +
GET /v1/export/decisions with an optional gzip flag and a SHA-256 trailer for verifiability. Pipe it into S3, BigQuery, or your warehouse — the data is yours, with no lock-in.Start in shadow mode.
Verify savings before production.
Point your SDK at Floopy in shadow. Watch the comparison populate against your own baseline. Promote routes one at a time, with constraints you wrote, on a timeline you control.
Shadow mode does not affect production traffic. Decision traces are exportable from day one.