v2.4 · Feedback-driven routing

AI Agent
Optimization Platform

Floopy sits between your agent and the model providers. Every call gets routed to the cheapest model that still passes your quality bar — learned from real user feedback, not vibes.

Start free See benchmarks

58%

avg cost reduction

<8ms

routing overhead p99

99.99%

uptime last 90d

floopy.router · support-agent benchmark

Demo · benchmark data

Cost per 1k runs

$5.21 / 1k

$12.40 → $5.21

vs GPT-4o baseline

↓ 58%

How it works

Three lines of code.
Feedback does the rest.

Wrap your LLM call, wire up a thumbs-up signal, and Floopy starts learning which model is enough for each route. No prompt rewriting, no eval authoring, no pipeline changes.

Point your client at Floopy

One-line change. Use the OpenAI SDK you already ship — just swap the base URL. Anthropic, Google and Mistral work via the same OpenAI-compatible wire format.

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

Attach a feedback signal

Thumbs up/down, task completion, rewrite count — anything you already log. One metric is enough to bootstrap.

await fetch("https://api.floopy.ai/v1/feedback", {
  method: "POST",
  body: JSON.stringify({
    id: run.id,
    score: 1,
    reason: "resolved",
  }),
});

Ship. Watch cost drop.

Floopy tests cheaper models behind a canary, promotes what passes your quality bar, and rolls back on regression.

// After 48h shadow traffic
gpt-4o      → -62%
claude-sonnet → -41%
quality    → +0.3σ

Benchmarks

Same quality bar.
Half the bill.

Measured on 12,000 production traces from a customer-support agent. Quality held within one standard deviation of the GPT-4o baseline.

Read full methodology →

Configuration

Relative cost per 1k runs

Cost

baseline · gpt-4o

$12.40

manual routing

$9.67

cache-only

$7.94

▶ floopy · auto

$5.21

Platform

Everything between prompt
and production.

Not another observability dashboard. Floopy actively intervenes — routing, caching, fallback, canarying — while giving you the traces and evals to trust what it's doing.

Adaptive routing

Per-route policy learned from feedback. Pins on regression, canaries on drift, rebases on new models the week they ship.

Semantic cache

Fingerprints request + tools + context. Exact and paraphrase-match hits, versioned per route, TTL per signal.

Feedback loops

Thumbs, rewrites, completion, NPS — whatever you already collect. Offline RLHF without a data team.

Eval harness

LLM-as-judge, rubric evals, and golden sets. Runs on every promotion candidate before traffic sees it.

Tracing

Every tool call, token, and judgement. OpenTelemetry-native, export to Datadog, Honeycomb, or S3.

Guardrails

PII redaction, prompt-injection detection, region pinning, and per-tenant budget caps. On by default.

SDKs

Drop-in, everywhere
you already are.

Use the OpenAI SDK you already ship in Node, Python, Go, or Deno. Stream-safe, tool-calling-safe, and compatible with every provider that speaks the OpenAI wire format.

Read the docs View on GitHub →

agent.ts

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

const res = await client.chat.completions.create({
  model: 'auto',   // let Floopy pick the cheapest that holds quality
  messages,
});

// attach feedback later by response id
await fetch("https://api.floopy.ai/v1/feedback", {
  method: "POST",
  body: JSON.stringify({ id: res.id, score: 1 }),
});

from openai import OpenAI
import os, requests

client = OpenAI(
  base_url="https://api.floopy.ai/v1",
  api_key=os.environ["FLOOPY_API_KEY"],
)

res = client.chat.completions.create(
  model="auto",
  messages=messages,
)

# attach feedback later by response id
requests.post(
  "https://api.floopy.ai/v1/feedback",
  json={"id": res.id, "score": 1},
)

$ curl https://api.floopy.ai/v1/chat/completions \
  -H "Authorization: Bearer $FLOOPY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "auto", "messages": [...] }'

# Later: attach feedback by id
$ curl https://api.floopy.ai/v1/feedback \
  -H "Authorization: Bearer $FLOOPY_API_KEY" \
  -d '{ "id": "run_01h...", "score": 1 }'

Providers & integrations

20 providers.
One endpoint.

Bring your own provider keys, or use Floopy's defaults. OpenAI-compatible routing means your client code doesn't change when the best model does.

OpenAI

provider

Anthropic

provider

Gemini

provider

Groq

provider

Mistral

provider

DeepSeek

provider

xAI

provider

Perplexity

provider

Azure

provider

+11 more providers

See all providers →

Pricing

Four plans.
Pay for value captured.

Free until you're seeing real savings. After that, a simple monthly tier or a custom contract for enterprise.

Free

$0 /month

Explore Floopy with a limited free plan.

5,000 requests / month
20+ providers (OpenAI, Anthropic, Gemini…)
Exact cache + LLM Firewall firewall
7-day log retention

Start free

Starter

$29.90 /month

Build something real with your own feedback signal.

100k requests / month · 2k rpm
Feedback API · 500 submissions / mo
Semantic cache
30-day log retention

Pro · most popular

$199.90 /month

Feedback-driven routing at production scale.

Feedback-driven routing
Smart selectors + A/B testing
Advanced firewall (LLM Firewall)
2-year retention · 10k rpm

Start 30-day trial

Enterprise

Custom

Compliance, isolation, SLA and dedicated support.

SSO/SAML · SOC 2 · HIPAA
Dedicated SLA + Slack support
Opt-out of shared model
Dedicated tenant isolation

Talk to sales

FAQ

Frequently asked questions

How feedback-driven routing works, how Floopy differs from gateways and LLMOps tools, and the privacy questions you'll ask.

What is Floopy? +

Floopy is an AI Agent Optimization Platform with a closed feedback loop that uses four signal sources: end-user session NPS (primary), LLM-as-judge scoring across 4 dimensions (relevance, coherence, helpfulness, safety), admin manual ratings, and public benchmarks. These sources are combined with dynamic weights that shift based on data availability — benchmarks dominate on day 0, auto feedback takes over after 10 requests, session NPS becomes primary after 10 sessions with feedback.

How does feedback-driven routing work? +

Every request you send through Floopy carries a floopy-session-id header. You POST one NPS score per session to /v1/feedback, and that score propagates to every routing decision in that session — there's no per-response rating required. The router combines four signals (session NPS, LLM-as-judge, admin ratings, public benchmarks) with dynamic weights: benchmarks dominate on day 0, auto feedback takes over after 10 requests, session NPS becomes primary after 10 sessions with feedback. Lower cost, same quality, no new instrumentation.

Is Floopy an AI Gateway? +

Floopy includes AI gateway capabilities — OpenAI-compatible routing, caching, rate limiting, observability — but the core product is continuous agent optimization through feedback-driven routing. If you're evaluating gateways like Portkey, Helicone, or LiteLLM, the request proxying you get with Floopy is a subset of what we do. See /compare for the category breakdown.

What happens to my data in Free/Pro vs Enterprise? +

Free, Starter, and Pro organizations contribute aggregated routing signal — session NPS, LLM-as-judge scores, benchmark deltas, never raw prompts or completions — to a shared model that improves routing for everyone. Enterprise customers can opt out for isolated learning with no cross-tenant signal flow. Raw request and response logs are never shared across tenants under any plan.

Do I need to instrument anything new to use Floopy's feedback loop? +

No. If you already collect NPS, CSAT, or thumbs-up/down at the end of conversations, you just POST that score to Floopy's /v1/feedback endpoint with the session ID you're already passing in the floopy-session-id header. That's it. If you don't collect user feedback yet, Floopy still improves routing automatically via LLM-as-judge scoring on every request — you get the loop benefit even with zero user input. The loop works with whatever signal you have.

Why one rating per session instead of per response? +

Modern agents don't deliver value one response at a time. They reason, call tools, chain steps. A single response being "good" or "bad" often depends on decisions made three steps earlier. Floopy's router learns from the whole trajectory: when you rate a session 9/10, every routing decision in that session gets credit; when you rate 3/10, every decision gets learning signal about what to do differently. Per-request scoring misses this entirely. Per-request is available as an option if you want it, but it's not how the core optimization works.

Is Floopy an alternative to Portkey? +

Yes, if you're evaluating Portkey primarily for routing, caching, and observability. Floopy handles those the same way through a drop-in OpenAI-compatible endpoint. The difference is what happens after the request: Floopy uses the NPS score you POST to /v1/feedback to close a routing loop — future requests in similar sessions route to cheaper models when quality holds. Portkey is a gateway; Floopy is an optimization platform that includes the gateway.

How is Floopy different from Helicone? +

Helicone is excellent at per-request observability and developer feedback — each call gets its own rating. Floopy takes the opposite stance: one NPS score per session propagates across every routing decision in that session, because agent quality depends on the whole trajectory, not individual responses. If you want fine-grained request-level tracking, Helicone fits. If you want the router to learn from session-level end-user signal you already collect, Floopy fits.

Can I replace LiteLLM with Floopy? +

Yes for the proxying layer. LiteLLM is a best-in-class abstraction over 100+ providers; Floopy supports 20 providers through the same OpenAI-compatible interface and adds managed caching, firewall, rate limiting, and feedback-driven routing. If you're self-hosting LiteLLM purely for provider normalization, Floopy trades self-hosting flexibility for a managed optimization loop that learns from session NPS.

Is Floopy really faster than calling OpenAI directly? +

Yes. Our benchmarks show Floopy is 4.8% faster than direct OpenAI calls even with all features disabled — tested with the OpenAI Node.js SDK, 50 rounds, anti-cache timestamps, and isolated prompts across 10 languages. The Rust gateway's persistent connection pooling eliminates per-request TLS handshakes, saving more time than the gateway spends processing. Speed is table-stakes — the optimization loop on top is what compounds into cost savings over time.

How much memory does the gateway use? +

41MB average, 44MB peak — verified under benchmark load (350 requests across 7 scenarios). The Rust binary includes the LLM firewall and still uses less memory than a typical Python import chain. For comparison, Python-based gateways use 200–400MB at idle.

What about data privacy and PII? +

Request and response logs are automatically scrubbed for PII before storage — emails, CPFs, SSNs, credit cards, phone numbers, and API keys are replaced with redaction markers. Scrubbing runs asynchronously and never blocks your requests. The gateway and dashboard are architecturally separated — a compromised gateway cannot access user accounts or billing data.

Ship cheaper agents, today

Your users won't notice.
Your CFO will.

Start routing in under 10 minutes. Free up to 100k calls per month, no credit card.

Start free Book a 20-min demo

AI AgentOptimization Platform

Three lines of code.Feedback does the rest.

Point your client at Floopy

Attach a feedback signal

Ship. Watch cost drop.

Same quality bar.Half the bill.

Everything between promptand production.

Adaptive routing

Semantic cache

Feedback loops

Eval harness

Tracing

Guardrails

Drop-in, everywhereyou already are.

20 providers.One endpoint.

Four plans.Pay for value captured.

Frequently asked questions

Your users won't notice.Your CFO will.

AI Agent
Optimization Platform

Three lines of code.
Feedback does the rest.

Same quality bar.
Half the bill.

Everything between prompt
and production.

Drop-in, everywhere
you already are.

20 providers.
One endpoint.

Four plans.
Pay for value captured.

Your users won't notice.
Your CFO will.