Why Floopy

Floopy isn't an AI Gateway. It's an AI Agent Optimization Platform.

Category matters because it sets expectations. An AI gateway proxies requests; an observability tool logs them; middleware wraps SDKs; a feedback-loop LLMOps platform closes the loop between routing decisions and real outcomes. Floopy includes proxy, observability, and middleware features, but the core product is continuous agent optimization. The matrix and comparisons below show where we fit and what's different.

Capability matrix

Category capability matrix

Five categories, not four — feedback-loop LLMOps deserves its own column.

| Capability | AI Gateway | Observability | Middleware | Feedback-loop LLMOps | Floopy |
| --- | --- | --- | --- | --- | --- |
| Static routing rules | | | | | |
| Feedback-driven routing | | | | | |
| Observability | | | | | |
| Rule-based fallback | | | | | |
| Learned fallback from production | | | | | |
| Feedback sources | Single (binary) | Developer metrics | None | Developer metrics | Four sources with dynamic weights |
| Feedback granularity | Per-request | Per-request or trace | N/A | Per-request or trace | Session-level propagation |
| Architecture | Managed proxy | SDK + backend | SDK wrapper | Self-hosted (TensorZero) | Managed SaaS with opt-out |
| Per-request cost tracking | | | | | |
| Per-session ROI measurement | | | | Partial | |

Named comparisons

Where Floopy stands next to the tools you might be comparing — with strengths acknowledged and category drawn clearly.

Portkey

Portkey is a solid AI gateway with prompt management, observability, and request routing — widely used by teams that need unified LLM access and cost tracking. Floopy includes those gateway capabilities, but the product is continuous agent optimization: session-level feedback propagation plus dynamic multi-source weighting decide which model handles each call. If you need proxying and a prompt library, Portkey fits; if you want the gateway to also learn from end-user signal, Floopy is the fit.

Helicone

Helicone is a strong observability layer — per-request logging, caching, and feedback APIs for debugging and fine-tune data collection. Floopy also logs every request and accepts scores, but the unit of learning is different: one NPS per session propagated to every routing decision in that session, instead of per-request thumbs up/down. If your goal is per-request debug data, Helicone works well; if your goal is session-level routing that improves against end-user outcomes, Floopy is built for it.

LiteLLM

LiteLLM is an excellent open-source proxy for unifying multi-provider SDK calls with retry and fallback rules — a natural fit if you run your own infrastructure and want static routing. Floopy is managed SaaS and goes further: the router learns from session NPS, LLM-as-judge scoring, manual ratings, and public benchmarks, with weights that shift as signal accumulates. Use LiteLLM when you want self-hosted proxy ergonomics; use Floopy when you want feedback-driven routing without running the infrastructure.
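To make "weights that shift as signal accumulates" concrete, here is a minimal sketch of one way four feedback sources could be blended. The weighting rule (each source weighted by samples / (samples + prior_strength)), the `Signal` type, and every number are assumptions for illustration, not Floopy's actual algorithm.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    """One feedback source's view of a model (hypothetical shape)."""
    name: str
    score: float   # mean score normalised to [0, 1]
    samples: int   # number of observations behind the score


def blended_score(signals: list[Signal], prior_strength: float = 20.0) -> float:
    """Blend sources, giving more weight to sources with more observations.

    Assumed rule: weight = samples / (samples + prior_strength), normalised.
    """
    weights = [s.samples / (s.samples + prior_strength) for s in signals]
    total = sum(weights)
    if total == 0:
        raise ValueError("no signal yet")
    return sum(w * s.score for w, s in zip(weights, signals)) / total


signals = [
    Signal("session_nps", 0.8, 180),
    Signal("llm_judge", 0.7, 20),
    Signal("manual_rating", 0.9, 0),   # no ratings yet, so zero weight
    Signal("public_benchmark", 0.6, 20),
]
print(round(blended_score(signals), 3))
```

In this sketch a source with no observations contributes nothing, and a source's influence grows toward a fixed ceiling as its sample count rises.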

Maxim

Maxim focuses on evaluation, experimentation, and prompt testing — a helpful tool during development to compare model outputs and measure prompt quality offline. Floopy is a production-time feedback loop: your live session NPS and auto scoring continuously re-rank models so routing improves after deploy, not just before. Maxim and Floopy are complementary — eval pipelines on one side, runtime optimization on the other.

Bifrost

Bifrost is a fast LLM gateway written in Go, focused on low-latency request proxying. Floopy keeps latency overhead low too (see the benchmark page), but the core difference is what the gateway does with that latency budget: Floopy makes a feedback-driven routing decision on each request, informed by session-level signal, rather than acting as a purely static proxy. If you need the thinnest possible proxy, Bifrost wins on latency; if you want a gateway that learns, Floopy is designed for it.

TensorZero

TensorZero pioneered the open-source feedback-loop approach in 2024 with excellent engineering and a self-hosted architecture. If your team has the DevOps capacity and wants full infrastructure control, it's a solid choice. Floopy takes a different path: managed SaaS, session-level end-user NPS as the primary signal (rather than developer-defined metrics), and cross-tenant intelligence that improves every customer's routing as the platform grows. Choose based on whether you want to run infrastructure yourself and what feedback source you trust most.

Design principles

Four rules we don't break.

These are the constraints we hold the product to. When something else has to give, these don't.

01 / Quality first

Never trade quality for cost without your say.

Every candidate route ships behind a canary and an eval bar. Regressions roll back in seconds, not sprints.
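One way to read the hard floor is as a gate that runs before any cost comparison. The sketch below assumes each candidate route carries an eval score on a 0 to 1 scale and a cost figure; the field names, model names, and prices are hypothetical.

```python
QUALITY_FLOOR = 0.95  # the floor quoted in this principle; the 0-1 scale is an assumption


def pick_route(candidates, incumbent):
    """Return the cheapest candidate at or above the floor, else keep the incumbent."""
    eligible = [c for c in candidates if c["quality"] >= QUALITY_FLOOR]
    if not eligible:
        return incumbent  # nothing clears the eval bar, so nothing changes
    return min(eligible, key=lambda c: c["cost_per_1k"])


incumbent = {"name": "large", "quality": 0.99, "cost_per_1k": 3.0}
candidates = [
    {"name": "small", "quality": 0.91, "cost_per_1k": 0.2},  # cheapest, but below the floor
    {"name": "mid", "quality": 0.96, "cost_per_1k": 0.8},
    incumbent,
]
print(pick_route(candidates, incumbent)["name"])
```

The cheapest model is rejected because it misses the floor, so cost only decides among routes that already pass on quality.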

quality = 0.95 → hard floor

02 / Zero lock-in

One flag turns us off.

Floopy is a drop-in baseURL for the OpenAI SDK you already ship. Ejecting is a one-line config change, not a migration project.
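A sketch of what the one-line eject could look like in application config. The Floopy URL below is a placeholder (this page does not state the real endpoint), and the `USE_FLOOPY` variable name is invented for the example.

```python
import os

OPENAI_BASE_URL = "https://api.openai.com/v1"           # passthrough target named on this page
FLOOPY_BASE_URL = "https://floopy.example.invalid/v1"   # placeholder, not a real endpoint


def resolve_base_url(env=None) -> str:
    """Pick the base URL from a single flag (hypothetical name: USE_FLOOPY)."""
    env = os.environ if env is None else env
    enabled = env.get("USE_FLOOPY", "1").lower() not in {"0", "false", "off"}
    return FLOOPY_BASE_URL if enabled else OPENAI_BASE_URL


print(resolve_base_url({"USE_FLOOPY": "0"}))
```

The returned string would be passed as the client's base URL option (`base_url` in the OpenAI Python SDK, `baseURL` in the Node SDK), so the rest of the calling code stays unchanged either way.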

baseURL: "https://api.openai.com/v1" → passthrough

03 / Show your work

Every decision is explainable.

Every routed call has a reason string. Every promotion has a diff. Every rollback has a trace.

span.floopy_reason = "haiku::cached"

04 / Paid on outcomes

If it didn't save, don't charge.

We bill against a measured baseline, not a sticker price. Customers see the line item; so do we.
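The pricing rule for this principle, invoice = max(0, savings × 0.15), can be worked through in a few lines. The sketch assumes savings means measured baseline cost minus actual cost for the billing period; the dollar amounts are made up.

```python
def invoice(baseline_cost: float, actual_cost: float, rate: float = 0.15) -> float:
    """Charge a share of measured savings, never less than zero."""
    savings = baseline_cost - actual_cost  # assumed definition of "savings"
    return max(0.0, savings * rate)


print(invoice(1000.0, 700.0))   # spend fell by 300, so the fee is 15% of 300
print(invoice(1000.0, 1200.0))  # spend rose, so the fee is clamped to zero
```

The `max` is what makes the rule one-sided: a period with no savings produces a zero invoice rather than a credit or a charge.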

invoice = max(0, savings × 0.15)

How it sits

Inline, but out of the way.

Floopy is a thin control plane between your app and the providers you already use. It owns routing, caching, and feedback — nothing else. Your prompts, logic, and tools stay in your code.

Overhead p50: 3.1 ms
Overhead p99: 7.8 ms
Streaming: first-token preserved
Zero-retention mode: available

FAQ

Common questions

Deeper answers on how Floopy relates to TensorZero, Portkey, Helicone, and LiteLLM — and on session-level vs per-request feedback.

Is Floopy an alternative to Portkey?
Same gateway primitives, different scope. What is the same: OpenAI-compatible endpoint, multi-provider routing, caching, rate limiting, request/response logging, an LLM firewall, an audit dashboard. What differs: Floopy ships a feedback-driven router (session NPS to routing decisions), per-decision confidence on every audit row, a dry-run endpoint (POST /v1/routing/explain), customer-declared optimization constraints (PUT /v1/constraints), shadow-mode experiments (POST /v1/experiments), and a JSONL export of decisions with a SHA-256 trailer (GET /v1/export/decisions). Portkey leans further into prompt management and agent traces; Floopy leans into closing the loop between user-felt outcomes and routing. If you mainly need a configurable gateway, Portkey covers it. If you also want a router that learns from outcomes you already collect, Floopy fits.
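A sketch of how a client might check a decisions export against its SHA-256 trailer. The trailer format is not specified on this page, so the code assumes the final line is a JSON object with a `sha256` key holding the hex digest of every byte before it. Treat that layout, and the sample rows, as hypothetical.

```python
import hashlib
import json


def verify_export(payload: bytes) -> list[dict]:
    """Check the assumed trailer and return the decision rows."""
    body, sep, trailer_line = payload.rstrip(b"\n").rpartition(b"\n")
    if not sep:
        raise ValueError("export has no trailer line")
    body += b"\n"  # assumption: the digest covers every byte before the trailer
    expected = json.loads(trailer_line)["sha256"]
    actual = hashlib.sha256(body).hexdigest()
    if actual != expected:
        raise ValueError("SHA-256 mismatch: export was truncated or altered")
    return [json.loads(line) for line in body.splitlines() if line.strip()]


# Build a small sample export in the assumed layout.
rows = b'{"id": "d1", "model": "a"}\n{"id": "d2", "model": "b"}\n'
trailer = json.dumps({"sha256": hashlib.sha256(rows).hexdigest()}).encode()
payload = rows + trailer + b"\n"
print(len(verify_export(payload)))
```

Checking the digest before parsing the rows means a truncated download fails loudly instead of yielding a partial audit trail.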
How is Floopy different from Helicone?
Helicone is excellent at per-request observability and developer feedback — each call gets its own rating. Floopy takes the opposite stance: one NPS score per session propagates across every routing decision in that session, because agent quality depends on the whole trajectory, not individual responses. If you want fine-grained request-level tracking, Helicone fits. If you want the router to learn from session-level end-user signal you already collect, Floopy fits.
Can I replace LiteLLM with Floopy?
For the proxy layer, mostly. Floopy supports 20 providers — OpenAI, Anthropic, Google Gemini, Google Vertex AI, AWS Bedrock, Azure OpenAI, DeepSeek, Mistral, xAI, Groq, Cerebras, SambaNova, Together AI, Fireworks AI, Perplexity, Cohere, AI21 Labs, DeepInfra, Nebius, Novita — through one OpenAI-compatible endpoint. Streaming, tool calls, and vision inputs are covered for the majors (OpenAI, Anthropic, Gemini, Bedrock); for the others, streaming and tools are supported, vision coverage varies by model. LiteLLM supports more providers (100+) and is open-source if you want to self-host. Floopy trades raw provider count for a managed feedback loop, an LLM firewall, an audit API, and a decision export. If you self-host LiteLLM purely for normalization, both work; if you want the loop, that is what Floopy adds.
Is Floopy an alternative to TensorZero?
Same problem space, different posture on data pooling. TensorZero is open-source and self-hosted, so all signal stays inside your infrastructure. Floopy is managed SaaS with cross-tenant aggregation on a narrow set of fields: aggregated session NPS, LLM-as-judge scores per (provider, model), and benchmark deltas. What is NOT pooled: raw prompts, raw completions, organization identifiers, user identifiers, request bodies, response bodies. Aggregated signals influence the shared priors that warm up the Day-0 phase for new orgs; once your org has its own data, your weights dominate. Enterprise can opt out of pooling entirely. Pick TensorZero if you must run the gateway yourself or cannot share any signal at all. Pick Floopy if you want managed infrastructure and the speed-up of warm priors that come from aggregated, non-PII signal.
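The idea that shared priors matter on Day 0 and fade as an org collects its own data can be illustrated with a standard shrinkage estimate. This is a generic statistical sketch, not Floopy's formula; `prior_strength` and all values are assumptions.

```python
def blended_estimate(prior_mean: float, org_mean: float, org_samples: int,
                     prior_strength: float = 50.0) -> float:
    """Treat the shared prior as `prior_strength` pseudo-observations."""
    n = org_samples
    return (prior_strength * prior_mean + n * org_mean) / (prior_strength + n)


# Day 0: no org data, so the estimate equals the shared prior.
print(blended_estimate(prior_mean=0.6, org_mean=0.0, org_samples=0))
# After 950 org observations, the org's own mean dominates.
print(blended_estimate(prior_mean=0.6, org_mean=0.8, org_samples=950))
```

With these numbers the prior counts as 50 observations, so by 950 org observations it accounts for only 5% of the estimate.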
Why one rating per session instead of per response?
Because agent quality depends on the trajectory, not the response. Tool calls, retries, and reasoning steps inside one session all influence the final outcome the user feels — so one NPS per session credits or penalises every routing decision in that session. Per-request signal is still available: every request gets an LLM-as-judge score across four dimensions (relevance, coherence, helpfulness, safety), recorded on the decision_trace and queryable via GET /v1/decisions/{id}. You can also POST per-request feedback to /v1/feedback with a request id — the router accepts it and weights it under the Auto phase. The default is session NPS because that is what most teams already collect; the option to attach per-request signal is real and shipped, just not the headline path.
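Session-level propagation can be sketched as a simple credit assignment: one rating per session is applied to every routing decision made inside that session. The data shapes, model names, and the 0 to 10 scale are assumptions; the real router is not described at this level of detail here.

```python
from collections import defaultdict


def credit_models(decisions, session_nps):
    """Average, per model, the session ratings of every decision it handled.

    decisions: iterable of (session_id, model) pairs, one per routed call.
    session_nps: {session_id: rating}, one rating per rated session.
    """
    totals = defaultdict(lambda: [0.0, 0])  # model -> [rating sum, decision count]
    for session_id, model in decisions:
        if session_id not in session_nps:
            continue  # unrated sessions contribute nothing
        totals[model][0] += session_nps[session_id]
        totals[model][1] += 1
    return {model: total / count for model, (total, count) in totals.items()}


decisions = [
    ("s1", "model-a"), ("s1", "model-b"), ("s1", "model-a"),
    ("s2", "model-b"),
    ("s3", "model-a"),  # s3 was never rated
]
session_nps = {"s1": 9, "s2": 4}
print(credit_models(decisions, session_nps))
```

Both calls that model-a handled in session s1 receive the same rating, which is the difference from per-request feedback: the session outcome is shared across the whole trajectory.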

Ready to close the loop?

Start on Free or talk to us about Enterprise isolation and custom model ranking.