AI Agent
Optimization Platform
Floopy sits between your agent and the model providers. Every call gets routed to the cheapest model that still passes your quality bar — learned from real user feedback, not vibes.
Three lines of code.
Feedback does the rest.
Wrap your LLM call, wire up a thumbs-up signal, and Floopy starts learning which model is enough for each route. No prompt rewriting, no eval authoring, no pipeline changes.
Point your client at Floopy
One-line change. Use the OpenAI SDK you already ship — just swap the base URL. Anthropic, Google and Mistral work via the same OpenAI-compatible wire format.
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});
Attach a feedback signal
Thumbs up/down, task completion, rewrite count — anything you already log. One metric is enough to bootstrap.
await fetch("https://api.floopy.ai/v1/feedback", {
  method: "POST",
  body: JSON.stringify({
    id: run.id,
    score: 1,
    reason: "resolved",
  }),
});
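If your product has sessions rather than per-response ratings, the same endpoint works at session granularity: tag each request with the `floopy-session-id` header described in the FAQ, then post one score for the whole session. A minimal sketch — the header name and `/v1/feedback` endpoint come from this page, while the session-scoped payload shape (`id`, `score`, `reason`) is an assumption:

```typescript
const FLOOPY_BASE = "https://api.floopy.ai/v1";

// Build the feedback request instead of sending it immediately,
// so the shape is easy to inspect and test.
function feedbackRequest(sessionId: string, score: number, reason?: string) {
  return {
    url: `${FLOOPY_BASE}/feedback`,
    method: "POST" as const,
    headers: {
      "Content-Type": "application/json",
      "floopy-session-id": sessionId, // same id you sent on each chat request
    },
    body: JSON.stringify({ id: sessionId, score, reason }),
  };
}

// Usage: after a session ends, attach the single session-level score.
const req = feedbackRequest("sess_abc123", 1, "resolved");
// await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
```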
Ship. Watch cost drop.
Floopy tests cheaper models behind a canary, promotes what passes your quality bar, and rolls back on regression.
// After 48h shadow traffic
gpt-4o        → -62%
claude-sonnet → -41%
quality       → +0.3σ
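The canary-promote-rollback loop described above can be sketched in a few lines. This is an illustrative model under assumed names and thresholds (5% canary share, a 0.9 mean-score bar, 100 samples), not Floopy's internal policy:

```typescript
// Minimal canary router: send a small share of traffic to a cheaper
// candidate, promote it once enough feedback clears the quality bar,
// stop the canary otherwise.
class CanaryRouter {
  private scores: Record<string, number[]> = {};

  constructor(
    private incumbent: string,
    private candidate: string,
    private canaryShare = 0.05, // fraction of traffic sent to the candidate
    private qualityBar = 0.9,   // min mean feedback score to promote
    private minSamples = 100,   // evidence required before acting
  ) {}

  // Pick a model for one request; `rand` is injectable for testing.
  pick(rand: number = Math.random()): string {
    return rand < this.canaryShare ? this.candidate : this.incumbent;
  }

  // Record one feedback score (e.g. thumbs up = 1, down = 0).
  record(model: string, score: number) {
    (this.scores[model] ??= []).push(score);
    this.maybeActOnCandidate();
  }

  current(): string {
    return this.incumbent;
  }

  private maybeActOnCandidate() {
    const s = this.scores[this.candidate] ?? [];
    if (s.length < this.minSamples) return;
    const mean = s.reduce((a, b) => a + b, 0) / s.length;
    if (mean >= this.qualityBar) {
      this.incumbent = this.candidate; // promote: candidate takes all traffic
    }
    this.canaryShare = 0; // either way, the experiment is over
  }
}
```

Promotion here is a one-way switch once the evidence clears the bar; a production router would also watch for post-promotion regression and pin back to the previous model.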
Same quality bar.
Half the bill.
Measured on 12,000 production traces from a customer-support agent. Quality held within one standard deviation of the GPT-4o baseline.
Read full methodology →
Everything between prompt
and production.
Not another observability dashboard. Floopy actively intervenes — routing, caching, fallback, canarying — while giving you the traces and evals to trust what it's doing.
Adaptive routing
Per-route policy learned from feedback. Pins on regression, canaries on drift, rebases on new models the week they ship.
Semantic cache
Fingerprints request + tools + context. Exact and paraphrase-match hits, versioned per route, TTL per signal.
Feedback loops
Thumbs, rewrites, completion, NPS — whatever you already collect. Offline RLHF without a data team.
Eval harness
LLM-as-judge, rubric evals, and golden sets. Runs on every promotion candidate before traffic sees it.
Tracing
Every tool call, token, and judgement. OpenTelemetry-native, export to Datadog, Honeycomb, or S3.
Guardrails
PII redaction, prompt-injection detection, region pinning, and per-tenant budget caps. On by default.
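To make "PII redaction, on by default" concrete, here is the general shape of such a guardrail: pattern-match sensitive strings and replace them before the prompt leaves your network. The patterns below (emails, loosely formatted phone numbers, US SSNs) are an illustrative minimum, not Floopy's rule set:

```typescript
// Ordered redaction rules: more specific patterns (SSN) run before
// broader ones (phone) so they are not consumed by a looser match.
const PII_PATTERNS: [RegExp, string][] = [
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"],
  [/[\w.+-]+@[\w-]+\.[\w.]+/g, "[EMAIL]"],
  [/\+?\d[\d\s().-]{8,}\d/g, "[PHONE]"],
];

// Apply every rule in order to a prompt or message body.
function redact(text: string): string {
  return PII_PATTERNS.reduce((t, [re, tag]) => t.replace(re, tag), text);
}
```

Production redaction typically pairs patterns like these with ML-based entity detection, since regexes alone miss names, addresses, and free-form identifiers.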
Drop-in, everywhere
you already are.
Use the OpenAI SDK you already ship in Node, Python, Go, or Deno. Stream-safe, tool-calling-safe, and compatible with every provider that speaks the OpenAI wire format.
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

const res = await client.chat.completions.create({
  model: 'auto', // let Floopy pick the cheapest that holds quality
  messages,
});

// attach feedback later by response id
await fetch("https://api.floopy.ai/v1/feedback", {
  method: "POST",
  body: JSON.stringify({ id: res.id, score: 1 }),
});
import os

import requests
from openai import OpenAI

client = OpenAI(
    base_url="https://api.floopy.ai/v1",
    api_key=os.environ["FLOOPY_API_KEY"],
)

res = client.chat.completions.create(
    model="auto",
    messages=messages,
)

# attach feedback later by response id
requests.post(
    "https://api.floopy.ai/v1/feedback",
    json={"id": res.id, "score": 1},
)
$ curl https://api.floopy.ai/v1/chat/completions \
    -H "Authorization: Bearer $FLOOPY_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{ "model": "auto", "messages": [...] }'

# Later: attach feedback by id
$ curl https://api.floopy.ai/v1/feedback \
    -H "Authorization: Bearer $FLOOPY_API_KEY" \
    -d '{ "id": "run_01h...", "score": 1 }'
20 providers.
One endpoint.
Bring your own provider keys, or use Floopy's defaults. OpenAI-compatible routing means your client code doesn't change when the best model does.
Four plans.
Pay for value captured.
Free until you're seeing real savings. After that, a simple monthly tier or a custom contract for enterprise.
- 5,000 requests / month
- 20+ providers (OpenAI, Anthropic, Gemini…)
- Exact cache + LLM Firewall
- 7-day log retention
- 100k requests / month · 2k rpm
- Feedback API · 500 submissions / mo
- Semantic cache
- 30-day log retention
- Feedback-driven routing
- Smart selectors + A/B testing
- Advanced firewall (LLM Firewall)
- 2-year retention · 10k rpm
- SSO/SAML · SOC 2 · HIPAA
- Dedicated SLA + Slack support
- Opt-out of shared model
- Dedicated tenant isolation
Frequently asked questions
How feedback-driven routing works, how Floopy differs from gateways and LLMOps tools, and the privacy questions you'll ask.
What is Floopy? +
How does feedback-driven routing work? +
floopy-session-id header. You POST one NPS score per session to /v1/feedback, and that score propagates to every routing decision in that session — no per-response rating required. The router combines four signals (session NPS, LLM-as-judge, admin ratings, public benchmarks) with dynamic weights: benchmarks dominate on day 0, automatic feedback takes over after 10 requests, and session NPS becomes primary after 10 sessions with feedback. Lower cost, same quality, no new instrumentation.
Is Floopy an AI Gateway? +
What happens to my data in Free/Pro vs Enterprise? +
Do I need to instrument anything new to use Floopy's feedback loop? +
floopy-session-id header. That's it. If you don't collect user feedback yet, Floopy still improves routing automatically via LLM-as-judge scoring on every request — you get the loop benefit even with zero user input. The loop works with whatever signal you have.
Why one rating per session instead of per response? +
Is Floopy an alternative to Portkey? +
/v1/feedback to close a routing loop — future requests in similar sessions route to cheaper models when quality holds. Portkey is a gateway; Floopy is an optimization platform that includes the gateway.
How is Floopy different from Helicone? +
Can I replace LiteLLM with Floopy? +
Is Floopy really faster than calling OpenAI directly? +
How much memory does the gateway use? +
What about data privacy and PII? +
Your users won't notice.
Your CFO will.
Start routing in under 10 minutes. Free up to 100k calls per month, no credit card.