Inspectable · constrained · reversible

Cut LLM costs safely with feedback-driven routing.

Floopy learns from session NPS, LLM-as-judge, admin ratings, and benchmarks to route each request to the lowest-cost model that still meets your quality constraints. Every decision is auditable, explainable, and reversible.

  • OpenAI-compatible gateway
  • Shadow + canary out of the box
  • Exportable decision history
decision_trace.json · req_8f3a2c1
{
  "chosen": "claude-haiku-4.5",
  "baseline": "gpt-5.4-mini",
  "confidence": 0.91,
  "signals": {
    "session_nps":    0.42,
    "llm_judge":      0.31,
    "admin_rating":   0.18,
    "benchmark":      0.27
  },
  "constraints_passed": [
    "max_regression < 2%",
    "min_confidence ≥ 0.85"
  ],
  "reversible": true
}
shadow mode · last 24h · live
cost / request: −38.2%
quality vs baseline: +0.6%
regressions: 0 / 12,418
⚠ honest framing

Automatic routing sounds risky.
It should.

Floopy changes which model serves a request. That is a load-bearing decision in your product, and it should not live inside a black box. Every optimization Floopy makes is backed by a trace, a confidence score, a constraint check, and a rollback path.

Decision traces
Every routed request emits a JSON trace: chosen model, baseline, signal weights, confidence, evidence (samples, score gap, variance, bucketed regressions), and a human-readable explanation in your language.
Confidence gates
Routes only flip when confidence exceeds your threshold. Below it, requests fall back to your default model.
Hard constraints
Nine declarative knobs across quality limits, cost limits, and promotion gates. The router cannot pick a model that violates them.
Shadow + canary
New routing decisions run alongside your default for as long as you want. Promote only when the data convinces you.
Per-route rollback
Pin any route back to your default model in one API call. Pinning is logged and reversible.
how it decides

How Floopy decides.

One pipeline. Six stages. Every stage emits structured output you can read, log, and replay.

step 01
Request
incoming user turn
step 02
Candidates
eligible models for this route
step 03
Signals
  • Session NPS
  • LLM-as-judge
  • Admin ratings
  • Benchmarks
step 04
Constraints
  • max regression
  • max cost increase
  • minimum confidence
  • and more
step 05
Decision trace
signed, exportable
step 06
Route or fallback
default model on miss

On any miss — low confidence, failed constraint, provider error — Floopy serves your default model and records the miss in the trace.
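The fallback rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Floopy's implementation: `Candidate`, `Trace`, `route_or_fallback`, the miss-reason strings, and the `call_provider` callback are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    model: str
    confidence: float      # router's confidence in this pick
    constraints_ok: bool   # True if every hard constraint passed


@dataclass
class Trace:
    served: str
    miss_reason: Optional[str] = None  # None means the candidate was served


def route_or_fallback(
    candidate: Candidate,
    default_model: str,
    min_confidence: float,
    call_provider: Callable[[str], str],
) -> tuple[str, Trace]:
    """Serve the candidate only if it clears every gate; otherwise serve the default."""
    if candidate.confidence < min_confidence:
        reason = "low_confidence"
    elif not candidate.constraints_ok:
        reason = "failed_constraint"
    else:
        try:
            return call_provider(candidate.model), Trace(served=candidate.model)
        except Exception:
            reason = "provider_error"
    # Any miss lands here: serve the default model and record why.
    return call_provider(default_model), Trace(served=default_model, miss_reason=reason)
```

In this sketch the three miss paths converge on one return statement, so the trace always names both the model that served the request and the reason the candidate was skipped.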

trust controls

Trust controls.

The primitives that make automatic routing safe to enable, route by route.

pre-launch

Shadow mode

Run Floopy in parallel with your default model. No production traffic affected until you opt in route-by-route.

every request

Decision trace

Per-request JSON: chosen model, baseline, signal contributions, confidence, evidence (samples, score gap, variance, bucketed regressions), and a human-readable explanation rendered in your language. Streamed and queryable.

declarative

Constraints

Nine declarative knobs across quality limits, cost limits, and promotion gates. Hard limits — the router will not violate them, and every change is hashed into the audit log.

automatic

Regression rollback

Auto-pin a route to its baseline if regressions exceed your threshold within the rolling window. One-line override available.
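A rolling-window check of this kind might look like the sketch below. The class name, the count-based window, and the rule that the window must be full before it can trigger are assumptions made for illustration; the page does not say how Floopy defines the window or the threshold.

```python
from collections import deque


class RegressionMonitor:
    """Tracks the last `window` outcomes for one route and pins it on breach."""

    def __init__(self, window: int, max_regression_rate: float) -> None:
        self.outcomes: deque[bool] = deque(maxlen=window)
        self.max_regression_rate = max_regression_rate
        self.pinned_to_baseline = False

    def record(self, regressed: bool) -> bool:
        """Record one request outcome; return True if the route is pinned."""
        self.outcomes.append(regressed)
        window_full = len(self.outcomes) == self.outcomes.maxlen
        rate = sum(self.outcomes) / len(self.outcomes)
        # Assumption: only act on a full window, so one early regression
        # cannot pin the route on its own.
        if window_full and rate > self.max_regression_rate:
            self.pinned_to_baseline = True
        return self.pinned_to_baseline

    def unpin(self) -> None:
        """Manual override: clear the pin and start a fresh window."""
        self.pinned_to_baseline = False
        self.outcomes.clear()
```

With `window=4` and `max_regression_rate=0.25`, one regression in four requests stays within the limit, while two in four pins the route until `unpin()` is called.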

no lock-in

Export

Decision history exports to S3, BigQuery, or webhook. Bring your own warehouse, your own retention, your own SIEM.

enterprise

Enterprise isolated learning

Opt out of the shared learning pool. Your routing models train on your traffic only. SOC 2, HIPAA, BAA available.

Integration

Start quickly. Validate safely.

Floopy is OpenAI-compatible, so integration can start with a small SDK/client change. Start in shadow mode first, inspect decision traces, then enable live optimization when the baseline comparison proves value.

agent.ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

// `messages` is your usual array of chat messages
const res = await client.chat.completions.create({
  model: "auto", // let Floopy pick the cheapest model that holds quality
  messages,
});

// attach feedback later by response id
await fetch("https://api.floopy.ai/v1/feedback", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.FLOOPY_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ id: res.id, score: 1 }),
});
# Python equivalent
import os

import requests
from openai import OpenAI

client = OpenAI(
    base_url="https://api.floopy.ai/v1",
    api_key=os.environ["FLOOPY_API_KEY"],
)

# `messages` is your usual list of chat messages
res = client.chat.completions.create(
    model="auto",
    messages=messages,
)

# attach feedback later by response id
requests.post(
    "https://api.floopy.ai/v1/feedback",
    headers={"Authorization": f"Bearer {os.environ['FLOOPY_API_KEY']}"},
    json={"id": res.id, "score": 1},
    timeout=10,
)
# curl equivalent
$ curl https://api.floopy.ai/v1/chat/completions \
  -H "Authorization: Bearer $FLOOPY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "auto", "messages": [...] }'

# Later: attach feedback by id
$ curl https://api.floopy.ai/v1/feedback \
  -H "Authorization: Bearer $FLOOPY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "id": "run_01h...", "score": 1 }'
mcp · model context protocol

Make your AI infrastructure queryable by agents.

Connect Floopy to your internal agents and let them inspect costs, audit requests, compare providers, analyze feedback, and explain routing decisions. Floopy turns logs, costs, feedback, and audit trails into living context for your AI agents.

mcp tools your agents can call · full list in the docs

floopy.get_analytics() → AnalyticsWindow
Token + cost breakdown by feature, route, and provider for any time window.

floopy.get_decision() → DecisionAudit
Pull a single request's full audit: signals, constraints, chosen vs baseline, firewall verdict.

floopy.get_verification() → VerificationAggregate
Side-by-side quality and cost across candidate models on real traffic, not benchmarks.

floopy.list_decisions() → Decision[]
Filter recent decisions by session, route, or window to surface low-quality outliers.

floopy.explain_routing() → RoutingDecision
Dry-run a routing decision and inspect the gateway's pick, with no provider call and no log row.

floopy.list_models() → ModelInventory
Inventory of providers and models configured for this organization.

ask in your preferred llm · natural language → tool calls
ask_your_llm.session · mcp connected
01 user: Which model has the lowest cost per resolved session?
02 user: Which feature is wasting the most tokens?
03 user: Find decisions with low quality and high latency.
04 user: Explain why this request routed to GPT-4 instead of the baseline.
routed to → floopy-mcp · your model · your prompt · your data plane
security

Connected by token, isolated by design.

The MCP server is fronted by TBAC. Your org's data never leaves your tenant boundary, and every tool call is logged for replay.

TBAC tokens
Connect via short-lived tokens scoped by tag and resource. Each agent only sees what its token can see.
Org-isolated data
Tool calls hit your tenant only. No cross-org reads, no shared learning, no shared cache. Auditable on every call.
Read-by-default
Inspection tools are read-only. Mutations like create_experiment or update_constraints require separately-scoped, confirm-gated tokens.
audit trail · over mcp
Floopy exposes its own audit trail through MCP. Let your internal agents inspect gateway decisions, routing changes, feedback signals, and cost anomalies — without leaving your IDE or chat.
connect mcp
your traffic, your numbers

Verify savings on your own traffic.

Generic benchmarks make for tidy decks. They are not why you adopt a routing layer. Floopy compares actual routing outcomes against your default-model baseline, on your own production traffic, with your own quality signals.

  • Baseline = your current default model, mirrored from real traffic.
  • Quality signal = whatever you already collect — NPS, judge, admin ratings.
  • Promotion to production is a manual flip per route. Never automatic.
shadow_comparison · last 24h · route: /v1/chat · live

metric                               baseline    floopy       delta
Cost / request                       $0.0142     $0.0088      −38.0%
Quality score (judge + NPS blended)  0.812       0.819        +0.9%
p95 latency                          1,840 ms    1,910 ms     +3.8%
Regression events (24h)                          0 / 12,418   within threshold

example values shown — your dashboard renders your own numbers
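
The delta column is consistent with a plain relative change against baseline. The short check below recomputes it from the example values shown; `pct_delta` and the row keys are names chosen for this sketch, and rounding to one decimal place is an assumption about how the dashboard displays the figure.

```python
def pct_delta(baseline: float, observed: float) -> float:
    """Relative change of `observed` versus `baseline`, in percent."""
    return (observed - baseline) / baseline * 100.0


# (baseline, floopy) pairs copied from the example comparison
rows = {
    "cost_per_request": (0.0142, 0.0088),
    "quality_score": (0.812, 0.819),
    "p95_latency_ms": (1840, 1910),
}

deltas = {name: round(pct_delta(b, f), 1) for name, (b, f) in rows.items()}
print(deltas)
# {'cost_per_request': -38.0, 'quality_score': 0.9, 'p95_latency_ms': 3.8}
```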
Providers

OpenAI-compatible gateway.

Works with every major AI provider — 20 supported today through one endpoint.

OpenAI · Anthropic · Google Gemini · Google Vertex · AWS Bedrock · Azure OpenAI · DeepSeek · Mistral · xAI · Groq · Cerebras · SambaNova · Together · Fireworks · Perplexity · Cohere · AI21 · DeepInfra · Nebius · Novita
See all providers →
What makes Floopy different

Three design choices no other router makes.

Plus a published scoring formula you can audit on every decision.

Session propagation

One NPS rating per session propagates to every routing decision in that session. No per-request labeling required.
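
As a rough sketch of what propagation means in practice: given a list of decisions tagged with a session id and one rating per session, each decision inherits its session's rating. The field names `decision_id` and `session_id` and the function name are hypothetical, chosen only for this example.

```python
def propagate_session_nps(
    decisions: list[dict[str, str]],
    session_ratings: dict[str, float],
) -> dict[str, float]:
    """Attach each session's single rating to every decision made in that session."""
    return {
        d["decision_id"]: session_ratings[d["session_id"]]
        for d in decisions
        if d["session_id"] in session_ratings  # unrated sessions contribute nothing
    }


decisions = [
    {"decision_id": "d1", "session_id": "s1"},
    {"decision_id": "d2", "session_id": "s1"},
    {"decision_id": "d3", "session_id": "s2"},
]
labels = propagate_session_nps(decisions, {"s1": 9.0})
print(labels)  # {'d1': 9.0, 'd2': 9.0}
```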

Multi-source weighting

Four feedback sources combined with weights that adapt as real signal accumulates — benchmarks first, NPS once it arrives.

Managed shared pool

Every Floopy customer's signal improves the shared router. Enterprise can opt out for isolated learning.

40 / 40 / 20 routing formula

Every candidate model is scored on 40% success + 40% feedback + 20% cost. Published, auditable, replayable.
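
Read literally, the formula is a weighted sum. The sketch below assumes each component is already normalized to the range 0 to 1 and that the cost component is a cost-efficiency score where higher means cheaper; neither normalization is specified on this page, and the candidate names and numbers are made up.

```python
WEIGHTS = {"success": 0.40, "feedback": 0.40, "cost": 0.20}


def score(success: float, feedback: float, cost_efficiency: float) -> float:
    """40% success + 40% feedback + 20% cost, all inputs assumed to be in [0, 1]."""
    return (
        WEIGHTS["success"] * success
        + WEIGHTS["feedback"] * feedback
        + WEIGHTS["cost"] * cost_efficiency
    )


# Hypothetical candidates: (success, feedback, cost_efficiency)
candidates = {
    "model_a": (0.98, 0.80, 0.30),
    "model_b": (0.95, 0.78, 0.90),
}
ranked = sorted(candidates, key=lambda m: score(*candidates[m]), reverse=True)
print(ranked)  # ['model_b', 'model_a']
```

Here `model_b` wins despite slightly lower success and feedback, because its cost advantage outweighs the gap at a 20% weight.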

Pricing

Simple pricing for production LLM optimization.

Start small, prove savings in shadow mode, and upgrade when you need exports, constraints, experiments, and longer retention.

Free
$0 /month
Explore Floopy with a limited free plan.
  • 50,000 requests / month
  • 20+ providers (OpenAI, Anthropic, Gemini…)
  • Exact cache + LLM Firewall
  • 7-day log retention
Start free
Starter
$29.90 /month
Build something real with your own feedback signal.
  • 100k requests / month · 1k rpm
  • Feedback API · 500 submissions / mo
  • Semantic cache
  • 30-day log retention
Subscribe
Enterprise
Custom
Compliance, isolation, SLA and dedicated support.
  • SSO/SAML · SOC 2 · HIPAA
  • Dedicated SLA + Slack support
  • Opt-out of shared model
  • Dedicated tenant isolation
Talk to sales
category map

Gateway vs observability vs optimization.

Portkey, Helicone, and LiteLLM solve real problems — gatewaying, logging, observability, provider normalization. Floopy sits a layer above: user outcomes influence which model serves the next request, with constraints and traces around every decision.

gateway
Provider gateway
Portkey · LiteLLM
Provider normalization
Routing config: manual rules
Logging
Feedback-driven routing
Decision traces with signal weights
observability
LLM observability
Helicone · Langfuse
Provider normalization: partial
Routing config
Logging
Feedback-driven routing
Decision traces with signal weights
optimization
Floopy
feedback-driven optimization
Provider normalization
Routing config: learned + constrained
Logging: ✓ + decision trace
Feedback-driven routing
Decision traces with signal weights

Floopy sits comfortably behind a gateway you already run. Bring your own logging stack. Bring your own observability vendor.

FAQ

Frequently asked.

Common questions teams ask before turning feedback-driven routing on.

Can I run Floopy without affecting production?
Yes. Start in shadow mode. Floopy computes routing decisions in parallel while your current provider still serves responses. You inspect what it would have chosen, with full decision traces, before any live traffic moves.
Is Floopy an AI gateway or an optimization layer?
Both, but the core is the optimization layer. Floopy ships an OpenAI-compatible gateway so integration is one base-URL swap, but the product on top is feedback-driven routing: candidates, signals, constraints, decision traces, and reversible rollouts.
How does feedback-driven routing work?
Four signals feed the router: session NPS (one rating propagated across every decision in that session), LLM-as-judge scoring on every request, admin ratings, and public benchmarks. Weights are dynamic per phase — benchmarks dominate at Day 0; once your org has signal, automatic feedback enters; once you log NPS, session NPS becomes primary. Phase and weights are visible in the trace for every decision.
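The phase logic might be sketched as follows. The phase names, the weight values, and the renormalization over available signals are all invented for illustration; the page describes only the ordering (benchmarks first, then automatic feedback, then session NPS as primary), not the numbers.

```python
# Illustrative weights only; the real per-phase values are not published here.
PHASE_WEIGHTS = {
    "day0": {"benchmark": 1.0},
    "feedback": {"benchmark": 0.4, "llm_judge": 0.4, "admin_rating": 0.2},
    "nps": {"session_nps": 0.5, "llm_judge": 0.25, "admin_rating": 0.15, "benchmark": 0.10},
}


def pick_phase(has_org_signal: bool, has_nps: bool) -> str:
    """Choose the weighting phase from what the organization has logged so far."""
    if has_nps:
        return "nps"
    if has_org_signal:
        return "feedback"
    return "day0"


def blend(signals: dict[str, float], phase: str) -> float:
    """Weighted average over the signals that are actually present."""
    used = {k: w for k, w in PHASE_WEIGHTS[phase].items() if k in signals}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no usable signal for this phase")
    return sum(signals[k] * w for k, w in used.items()) / total
```

Renormalizing by the weights in use keeps the blended score on the same scale when a source such as admin ratings is missing for a given request.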
How can I see why a request was routed to a model?
Every request gets a decision trace with candidates considered, weights applied, filtered reasons, the winner, a confidence score, and the constraint check outcomes. Inspect it in the dashboard or via GET /v1/decisions/{id}.
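A minimal way to fetch a trace from Python, using only the standard library. The endpoint path and base URL come from this page and the Bearer header mirrors the curl examples in the integration section; the function names, the use of a request id such as `req_8f3a2c1` as the decision id, and the assumption that the response body is a single JSON object are not confirmed by the source.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.floopy.ai/v1"


def build_decision_request(decision_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for one decision trace."""
    url = f"{BASE_URL}/decisions/{urllib.parse.quote(decision_id, safe='')}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_decision(decision_id: str, api_key: str, timeout: float = 10.0) -> dict:
    """Send the request and parse the JSON body (assumed to be one object)."""
    request = build_decision_request(decision_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the URL and header logic checkable without network access.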
What happens if Floopy makes a bad decision?
Bad decisions can happen in any routing system. Floopy reduces blast radius with hard constraints (max regression, max cost increase, min confidence), confidence thresholds, regression monitoring, canary and shadow experiments, and one-call rollback per route. Bad decisions are bounded, observable, and reversible.
Can I control how aggressive optimization is?
Yes. Set max_regression, max_cost_increase, min_confidence, and per-route routing constraints. The router cannot pick a model that violates them — violations fall back to your default.
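To make the "cannot pick a violating model" rule concrete, here is a small sketch of constraint filtering. The three constraint names come from this page; the dataclasses, the fractional units, the strict and non-strict comparisons (modeled on the `max_regression < 2%` and `min_confidence ≥ 0.85` strings in the sample trace), and the cheapest-eligible selection are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraints:
    max_regression: float     # e.g. 0.02 means under a 2% quality drop vs baseline
    max_cost_increase: float  # e.g. 0.0 means never costlier than baseline
    min_confidence: float     # e.g. 0.85


@dataclass(frozen=True)
class Candidate:
    model: str
    cost: float           # cost per request
    regression: float     # fractional quality drop vs baseline
    cost_increase: float  # fractional cost change vs baseline
    confidence: float


def violations(c: Candidate, k: Constraints) -> list[str]:
    """Names of the constraints this candidate breaks (empty if none)."""
    broken = []
    if c.regression >= k.max_regression:
        broken.append("max_regression")
    if c.cost_increase > k.max_cost_increase:
        broken.append("max_cost_increase")
    if c.confidence < k.min_confidence:
        broken.append("min_confidence")
    return broken


def pick(candidates: list[Candidate], k: Constraints, default_model: str) -> str:
    """Cheapest candidate with no violations, else the default model."""
    eligible = [c for c in candidates if not violations(c, k)]
    if not eligible:
        return default_model
    return min(eligible, key=lambda c: c.cost).model
```

Returning the list of broken constraint names, rather than a boolean, is what would let a trace report why each candidate was filtered out.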
Does Floopy train on my data?
Free and Pro plans use aggregated routing signals to improve shared priors. Raw prompts and completions are never used for shared learning. Enterprise can run isolated learning, with no cross-tenant signal flow.
How is Floopy different from Portkey, Helicone and LiteLLM?
Portkey and LiteLLM are gateways — provider normalization, routing rules, logging. Helicone and Langfuse are observability. Floopy sits a layer above: user outcomes influence which model serves the next request, with constraints and traces around every decision. You can run Floopy behind a gateway you already operate, and ship logs to the observability vendor you already pay for.
How do I verify savings on my own traffic?
Use shadow mode and the baseline-vs-Floopy comparison report. Generic benchmarks make for tidy decks, but the only number that matters is what happens on your traffic, against your own quality signals. Promotion to production is always a manual flip per route.
Can I export my decision data?
Yes. Decision history exports as JSONL via GET /v1/export/decisions with an optional gzip flag and a SHA-256 trailer for verifiability. Pipe it into S3, BigQuery, or your warehouse — the data is yours, with no lock-in.
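Checking an export before loading it could look like the sketch below. The trailer format is not documented on this page, so the layout used here is purely an assumption: the final line is a JSON object of the form `{"sha256": "<hex digest>"}` covering every byte before it. Adjust to the real format before relying on it. For a gzipped export, decompress first (for example with `gzip.decompress`).

```python
import hashlib
import json


def verify_export(payload: bytes) -> list[dict]:
    """Check the assumed SHA-256 trailer, then parse the JSONL records."""
    body, sep, trailer_line = payload.rstrip(b"\n").rpartition(b"\n")
    if not sep:
        raise ValueError("no trailer line found")
    body += sep  # the digest is assumed to cover the records and their newlines
    expected = json.loads(trailer_line)["sha256"]
    actual = hashlib.sha256(body).hexdigest()
    if actual != expected:
        raise ValueError("checksum mismatch: export is truncated or altered")
    return [json.loads(line) for line in body.splitlines() if line]


# Build a tiny export in the assumed layout and verify it.
records = b'{"id": "a"}\n{"id": "b"}\n'
trailer = json.dumps({"sha256": hashlib.sha256(records).hexdigest()}).encode() + b"\n"
print(verify_export(records + trailer))  # [{'id': 'a'}, {'id': 'b'}]
```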
Safe adoption

Start in shadow mode.
Verify savings before production.

Point your SDK at Floopy in shadow. Watch the comparison populate against your own baseline. Promote routes one at a time, with constraints you wrote, on a timeline you control.

Shadow mode does not affect production traffic. Decision traces are exportable from day one.