Why Floopy

Floopy isn't an AI Gateway. It's an AI Agent Optimization Platform.

Category matters because it sets expectations. An AI gateway proxies requests; an observability tool logs them; middleware wraps SDKs; a feedback-loop LLMOps platform closes the routing decision against real outcomes. Floopy includes proxy, observability, and middleware features, but the core product is continuous agent optimization — see exactly where we fit and what's different below.

Capability matrix

Category capability matrix

Five categories, not four — feedback-loop LLMOps deserves its own column.

| Capability | AI Gateway | Observability | Middleware | Feedback-loop LLMOps | Floopy |
| --- | --- | --- | --- | --- | --- |
| Static routing rules | ✓ | — | ✓ | ✓ | ✓ |
| Feedback-driven routing | — | — | — | ✓ | ✓ |
| Observability | ✓ | ✓ | — | ✓ | ✓ |
| Rule-based fallback | ✓ | — | ✓ | ✓ | ✓ |
| Learned fallback from production | — | — | — | ✓ | ✓ |
| Feedback sources | Single (binary) | Developer metrics | None | Developer metrics | Four sources with dynamic weights |
| Feedback granularity | Per-request | Per-request or trace | N/A | Per-request or trace | Session-level propagation |
| Architecture | Managed proxy | SDK + backend | SDK wrapper | Self-hosted (TensorZero) | Managed SaaS with opt-out |
| Per-request cost tracking | ✓ | ✓ | — | ✓ | ✓ |
| Per-session ROI measurement | — | — | — | Partial | ✓ |


Named comparisons

Where Floopy stands next to the tools you might be comparing — with strengths acknowledged and category lines drawn clearly.

Portkey

Portkey is a solid AI gateway with prompt management, observability, and request routing — widely used by teams that need unified LLM access and cost tracking. Floopy includes those gateway capabilities, but the product is continuous agent optimization: session-level feedback propagation plus dynamic multi-source weighting decide which model handles each call. If you need proxying and a prompt library, Portkey fits; if you want the gateway to also learn from end-user signal, Floopy is the fit.

Helicone

Helicone is a strong observability layer — per-request logging, caching, and feedback APIs for debugging and fine-tune data collection. Floopy also logs every request and accepts scores, but the unit of learning is different: one NPS per session propagated to every routing decision in that session, instead of per-request thumbs up/down. If your goal is per-request debug data, Helicone works well; if your goal is session-level routing that improves against end-user outcomes, Floopy is built for it.

LiteLLM

LiteLLM is an excellent open-source proxy for unifying multi-provider SDK calls with retry and fallback rules — a natural fit if you run your own infrastructure and want static routing. Floopy is managed SaaS and goes further: the router learns from session NPS, LLM-as-judge scoring, manual ratings, and public benchmarks, with weights that shift as signal accumulates. Use LiteLLM when you want self-hosted proxy ergonomics; use Floopy when you want feedback-driven routing without running the infrastructure.
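To make "weights that shift as signal accumulates" concrete, here is a minimal TypeScript sketch of evidence-weighted blending across the four sources named above. The mechanics and names are assumptions for illustration, not Floopy's documented algorithm.

```ts
// Hypothetical sketch of dynamic multi-source weighting; not Floopy's
// documented algorithm. Each source carries a normalized score plus the
// amount of evidence behind it, and weights follow the evidence.
type Source = "sessionNps" | "llmJudge" | "manualRating" | "benchmark";

interface Signal {
  score: number;   // normalized quality estimate in [0, 1]
  samples: number; // accumulated observations from this source
}

function blendedScore(signals: Record<Source, Signal>): number {
  const entries = Object.values(signals);
  const totalSamples = entries.reduce((sum, s) => sum + s.samples, 0);
  if (totalSamples === 0) return 0;
  // Evidence-proportional weights: benchmarks dominate at cold start,
  // live session NPS takes over as production signal accumulates.
  return entries.reduce(
    (sum, s) => sum + s.score * (s.samples / totalSamples),
    0
  );
}
```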

Maxim

Maxim focuses on evaluation, experimentation, and prompt testing — a helpful tool during development to compare model outputs and measure prompt quality offline. Floopy is a production-time feedback loop: your live session NPS and auto scoring continuously re-rank models so routing improves after deploy, not just before. Maxim and Floopy are complementary — eval pipelines on one side, runtime optimization on the other.

Bifrost

Bifrost is a fast Rust LLM gateway focused on low-latency request proxying. Floopy keeps latency overhead low too (see the benchmark page), but the core difference is what the gateway does with that latency budget: Floopy runs a feedback-driven routing decision per request informed by session-level signal, rather than a purely static proxy. If you need the thinnest possible proxy, Bifrost wins on latency; if you want a gateway that learns, Floopy is designed for it.

TensorZero

TensorZero pioneered the open-source feedback-loop approach in 2024 with excellent engineering and a self-hosted architecture. If your team has the DevOps capacity and wants full infrastructure control, it's a solid choice. Floopy takes a different path: managed SaaS, session-level end-user NPS as the primary signal (rather than developer-defined metrics), and cross-tenant intelligence that improves every customer's routing as the platform grows. Choose based on whether you want to run infrastructure yourself and what feedback source you trust most.

Design principles

Four rules we don't break.

These are the constraints we hold the product to. When something else has to give, these don't.

01 / Quality first

Never trade quality for cost without your say.

Every candidate route ships behind a canary and an eval bar. Regressions roll back in seconds, not sprints.

quality = 0.95 → hard floor
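A sketch of what that gate could look like as configuration. The shape and field names are hypothetical, not a documented Floopy API; they just restate the principle in code form.

```ts
// Hypothetical route-policy shape (illustrative only): a candidate route
// ships behind a canary slice with a hard quality floor and auto-rollback.
const routePolicy = {
  candidate: "claude-haiku",   // cheaper route under evaluation
  canaryTraffic: 0.05,         // only 5% of matching requests at first
  evalBar: { quality: 0.95 },  // the hard floor from above
  onRegression: "rollback",    // automatic, in seconds
};
```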
02 / Zero lock-in

One flag turns us off.

Floopy is a drop-in baseURL for the OpenAI SDK you already ship. Ejecting is a one-line config change, not a migration project.

baseURL: "https://api.openai.com/v1" → passthrough
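In SDK terms, opting in and ejecting are both a baseURL. A minimal sketch with the OpenAI Node SDK; the Floopy endpoint URL below is illustrative, not documented.

```ts
import OpenAI from "openai";

// Opt in: point the SDK you already ship at Floopy's OpenAI-compatible
// endpoint (hypothetical URL, for illustration).
const client = new OpenAI({
  baseURL: "https://api.floopy.example/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

// Eject: the one-line change from the principle above.
// baseURL: "https://api.openai.com/v1" -> plain passthrough
```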
03 / Show your work

Every decision is explainable.

Every routed call has a reason string. Every promotion has a diff. Every rollback has a trace.

span.floopy_reason = "haiku::cached"
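The page surfaces reasons as span attributes (span.floopy_reason above). One way a client could read the same string per call is a response header; the header name and the "auto" model value below are assumptions for illustration, not a documented API.

```ts
// Hypothetical: read a per-call reason string off the response.
// The x-floopy-reason header name is invented for this sketch.
const res = await fetch("https://api.floopy.example/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.FLOOPY_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "auto", // hypothetical: let the router choose
    messages: [{ role: "user", content: "hello" }],
  }),
});
console.log(res.headers.get("x-floopy-reason")); // e.g. "haiku::cached"
```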
04 / Paid on outcomes

If it didn't save, don't charge.

We bill against a measured baseline, not a sticker price. Customers see the line item; so do we.

invoice = max(0, savings × 0.15)
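Worked example with assumed numbers: if the measured baseline for the month is $10,000 and routed spend comes in at $8,000, savings are $2,000 and the invoice is max(0, 2,000 × 0.15) = $300. If spend meets or exceeds the baseline, the max(0, …) clamps the invoice to $0.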
How it sits

Inline, but out of the way.

Floopy is a thin control plane between your app and the providers you already use. It owns routing, caching, and feedback — nothing else. Your prompts, logic, and tools stay in your code.

Overhead p50: 3.1 ms
Overhead p99: 7.8 ms
Streaming: first-token preserved
Zero-retention mode: available
FAQ

Common questions

Deeper answers on how Floopy relates to TensorZero, Portkey, Helicone, and LiteLLM — and on session-level vs per-request feedback.

Is Floopy an alternative to Portkey?
Yes, if you're evaluating Portkey primarily for routing, caching, and observability. Floopy handles those the same way through a drop-in OpenAI-compatible endpoint. The difference is what happens after the request: Floopy uses the NPS score you POST to /v1/feedback to close a routing loop — future requests in similar sessions route to cheaper models when quality holds. Portkey is a gateway; Floopy is an optimization platform that includes the gateway.
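Closing that loop is one POST. A minimal sketch against the /v1/feedback endpoint named above; the payload field names are assumptions for illustration.

```ts
// One end-user score for the whole session; field names are illustrative.
await fetch("https://api.floopy.example/v1/feedback", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.FLOOPY_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    session_id: "sess_abc123", // ties the score to every routed call in it
    nps: 9,                    // the signal future routing learns from
  }),
});
```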
How is Floopy different from Helicone?
Helicone is excellent at per-request observability and developer feedback — each call gets its own rating. Floopy takes the opposite stance: one NPS score per session propagates across every routing decision in that session, because agent quality depends on the whole trajectory, not individual responses. If you want fine-grained request-level tracking, Helicone fits. If you want the router to learn from session-level end-user signal you already collect, Floopy fits.
Can I replace LiteLLM with Floopy?
Yes for the proxying layer. LiteLLM is a best-in-class abstraction over 100+ providers; Floopy supports 20 providers through the same OpenAI-compatible interface and adds managed caching, firewall, rate limiting, and feedback-driven routing. If you're self-hosting LiteLLM purely for provider normalization, Floopy trades self-hosting flexibility for a managed optimization loop that learns from session NPS.
Is Floopy an alternative to TensorZero?
Same space, different design choices. TensorZero is open-source and self-hosted, optimized for teams that want full infrastructure control and are comfortable running their own gateway. Floopy is managed SaaS with cross-tenant learning — your routing gets smarter because every Floopy customer's signal improves the shared model (Enterprise can opt out). Floopy also defaults to end-user session NPS as the primary feedback source, while TensorZero is typically wired to developer-defined metrics and human feedback. Both are valid — the choice depends on whether you want to run infrastructure yourself and on what signal you trust more.
Why one rating per session instead of per response?
Modern agents don't deliver value one response at a time. They reason, call tools, chain steps. A single response being "good" or "bad" often depends on decisions made three steps earlier. Floopy's router learns from the whole trajectory: when you rate a session 9/10, every routing decision in that session gets credit; when you rate 3/10, every decision gets learning signal about what to do differently. Per-request scoring misses this entirely. Per-request is available as an option if you want it, but it's not how the core optimization works.
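As a minimal sketch of what "every decision gets credit" means mechanically (assumed mechanics, not Floopy's documented internals): one session score fans out to each routing decision recorded in that trajectory.

```ts
interface RoutingDecision {
  requestId: string;
  model: string;     // which model handled this step
  feedback?: number; // filled in when the session is scored
}

// One NPS score propagates to every decision in the session, because a
// good final answer can hinge on routing choices made steps earlier.
function propagateSessionScore(
  decisions: RoutingDecision[],
  nps: number
): void {
  for (const d of decisions) d.feedback = nps;
}
```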

Ready to close the loop?

Start on Free or talk to us about Enterprise isolation and custom model ranking.

Start free · Talk to Enterprise