Credibility, by Design: Floopy Now Exposes Every Routing Decision

We don’t ask you to trust the magic. We ask you to inspect it.

Today we’re shipping the largest credibility surface Floopy has ever had: five new public APIs, a redesigned decision side-panel in the dashboard, a confidence field on every routing decision, declarative constraints you can pin in code, shadow-mode experiments, and a streaming export of your full decision history with a SHA-256 trailer. The Pro canary opens today. Public GA is in 14 days.

The problem we kept hearing

For most of the last year, the most uncomfortable line on a Floopy sales call has been some variation of: “How do I know what your router is actually doing?”

It was a fair question and the answer wasn’t good enough. You could see aggregate cost charts. You could see latency. You could see savings. What you couldn’t see was the per-request decision: which model the router picked, which alternatives it considered, what signals tipped it, and how confident it was. The router shipped good outcomes; it didn’t ship the receipts.

For platform engineers wiring Floopy into a SIEM, ML leads chasing a bad chatbot turn, FinOps leads auditing a sudden cost drop, and security reviewers running vendor risk, the absence of those receipts was a real adoption blocker. So we built them.

What’s new

1. Decisions you can audit

GET /v1/decisions/{id} returns the full decision record for any single request: the chosen model, the alternatives considered, every signal that fed the choice, the constraints that were active, and the confidence the router had at the moment of the call. GET /v1/decisions lists decisions with org-scoped filtering and a stable cursor. The redesigned decision side-panel in app.floopy.ai renders the same record visually — drill in from any row in the request log and see exactly why your traffic went where it went.

The list and filter surface (GET /v1/decisions) is gated on Pro, but GET /v1/decisions/{id} is available on Free as well, because credibility for the Free user is the path to Pro.
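
As a rough illustration, fetching and summarizing a single decision record could look like the sketch below. The base URL, the bearer-token header, and every JSON field name (chosen_model, alternatives, confidence) are assumptions made for this example; the reference pages under /docs/api/ are the authority on the real shapes.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the API host is not stated in this post.
BASE_URL = "https://api.floopy.ai"


def build_decision_request(decision_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for one decision record."""
    safe_id = urllib.parse.quote(decision_id, safe="")
    url = f"{BASE_URL}/v1/decisions/{safe_id}"
    # Assumption: bearer-token auth; check the API reference for the real scheme.
    headers = {"Authorization": f"Bearer {api_key}"}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_decision(decision_id: str, api_key: str) -> dict:
    """Send the request and decode the JSON body."""
    with urllib.request.urlopen(build_decision_request(decision_id, api_key)) as resp:
        return json.load(resp)


def summarize_decision(record: dict) -> str:
    """One-line summary of a decision record (field names are hypothetical)."""
    chosen = record.get("chosen_model", "?")
    alternatives = record.get("alternatives", [])
    confidence = record.get("confidence")
    return f"{chosen} chosen over {len(alternatives)} alternative(s), confidence={confidence}"
```

Keeping request construction separate from the network call makes the sketch easy to unit-test and easy to adapt once you have the real field names in hand.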

2. Confidence you can act on

Every decision now carries a confidence field: a bounded score that captures how much signal mass the router had behind its choice. For a Day-0 candidate (a model the router has never seen in your org), confidence is hard-capped well below the steady-state ceiling, so a brand-new candidate cannot dominate routing on benchmark numbers alone. The Quality×Cost scatter view in the dashboard plots confidence as point opacity: high-confidence decisions are vivid, and the low-confidence tail is faded so you can find it at a glance.

You can also turn confidence into a hard rule (next section).

3. Constraints you can declare

PUT /v1/constraints lets you pin three guardrails in code, per organization:

max_regression — refuse to route to a candidate whose measured quality is more than X percentage points below your current default.
max_cost_increase — refuse to route to a candidate that would raise rolling cost by more than Y%.
confidence_threshold — refuse to route to any candidate whose confidence is below a floor you set.

Each PUT is recorded in the audit_events log with the actor, the previous values, and the new values. The router reads constraints on every routing decision and respects them silently — no surprise upgrades, no surprise downgrades, no opaque trade-offs.
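
To make the three guardrails concrete, here is a minimal sketch of how they could be evaluated against one candidate. It illustrates the rules as described above and is not Floopy's router code; the Candidate fields, the quality scale (percent), and the way rolling cost is compared are all assumptions for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Constraints:
    max_regression: Optional[float] = None        # percentage points of quality
    max_cost_increase: Optional[float] = None     # percent of rolling cost
    confidence_threshold: Optional[float] = None  # floor on confidence


@dataclass
class Candidate:
    # Hypothetical fields, chosen for illustration only.
    quality: float       # measured quality, in percent
    rolling_cost: float  # projected rolling cost if traffic is routed here
    confidence: float


def violations(c: Constraints, cand: Candidate,
               default_quality: float, current_cost: float) -> List[str]:
    """Return the names of the constraints this candidate would break."""
    broken = []
    if c.max_regression is not None:
        if default_quality - cand.quality > c.max_regression:
            broken.append("max_regression")
    if c.max_cost_increase is not None and current_cost > 0:
        increase_pct = (cand.rolling_cost - current_cost) / current_cost * 100.0
        if increase_pct > c.max_cost_increase:
            broken.append("max_cost_increase")
    if c.confidence_threshold is not None:
        if cand.confidence < c.confidence_threshold:
            broken.append("confidence_threshold")
    return broken
```

A candidate is routable under this sketch only when violations(...) returns an empty list; an unset constraint (None) is treated as no limit.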

4. Experiments you can shadow

POST /v1/experiments lets you run a candidate model in shadow mode alongside your production route: the router answers production traffic from the live winner, but also fires the same prompt at the candidate, scores both, and writes the comparison to your decision history. Nothing user-visible changes. You can roll back any active experiment with one call, and rollback is itself an audit event.

Shadow mode is how we recommend you evaluate any new candidate — including new releases from the major providers — before you let it touch real traffic.
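
The shadow pattern itself is simple enough to sketch in a few lines. The version below is a generic illustration of the idea, not the router's implementation: live, candidate, and score are hypothetical callables you would supply, and history stands in for the decision history.

```python
from typing import Any, Callable, Dict, List


def shadow_call(prompt: str,
                live: Callable[[str], str],
                candidate: Callable[[str], str],
                score: Callable[[str, str], float],
                history: List[Dict[str, Any]]) -> str:
    """Answer from the live model; evaluate the candidate on the side."""
    live_answer = live(prompt)
    record: Dict[str, Any] = {
        "prompt": prompt,
        "live_score": score(prompt, live_answer),
    }
    try:
        record["candidate_score"] = score(prompt, candidate(prompt))
    except Exception as exc:
        # A failing candidate must never affect the production response.
        record["candidate_error"] = repr(exc)
    history.append(record)
    # Only the live answer is ever returned to the caller.
    return live_answer
```

The key property is in the last line: whatever the candidate does, including raising, the caller only ever sees the live model's answer. A production version would also run the candidate off the request path so it adds no latency.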

5. Data you can take with you

GET /v1/export/decisions streams your decision history as JSONL with a SHA-256 trailer: the last line of the export is a SHA-256 digest computed over every preceding line, so you can verify the export was complete and unmodified before you load it into your own warehouse. The export is concurrency-limited per org (one stream at a time) and capped on row count to protect both you and us, but inside those limits you can take everything we’ve recorded about your routing, and you should.

There is no proprietary blob, no opaque dump, no “talk to support to export.” Your decisions belong to you.
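
Verifying the trailer takes only a few lines. The sketch below assumes two things this post does not spell out: that the trailer line is a bare hex digest, and that the digest covers the raw bytes of each preceding line, newline included. Check the export reference page for the exact format before relying on it.

```python
import hashlib
from typing import Iterable


def verify_export(lines: Iterable[bytes]) -> bool:
    """Return True if the final line matches the SHA-256 of all preceding lines.

    Assumptions (not confirmed by the post): the trailer is a bare hex digest,
    and the digest covers each preceding line's raw bytes, newline included.
    """
    digest = hashlib.sha256()
    previous = None
    for line in lines:
        # Hold one line back so the trailer itself is never hashed.
        if previous is not None:
            digest.update(previous)
        previous = line
    if previous is None:
        return False  # empty stream: there is no trailer to check
    trailer = previous.strip().decode("ascii", errors="replace").lower()
    return trailer == digest.hexdigest()
```

Because it reads one line at a time, the check works on a saved file opened in binary mode (for example, verify_export(open("decisions.jsonl", "rb"))) without loading the whole export into memory.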

How to verify the math, not just trust it

The methodology pages at floopy.ai/docs/ are the long version of this post:

Confidence: /docs/methodology/confidence — the formula, the Day-0 cap, the shared-information ceiling, and the rationale for each parameter.
Baseline vs Floopy: /docs/methodology/baseline-vs-floopy — how we compute the savings number that shows up on your dashboard, including the prompts we replay, the windowing rules, and the failure modes we explicitly do not paper over.

Every endpoint also has a full reference page with a curl example, a JSON sample, and an error matrix at /docs/api/.

The honest part

We are not claiming “perfect routing.” The router makes mistakes. It will keep making mistakes. The point of this release is not that the mistakes go away — it’s that you can now find them, name them, and bound them.

When the router gets a decision wrong, the audit record makes the wrongness visible in the side panel. When the model mix shifts in a way you didn’t authorize, the constraints API stops it before it ships. When a new candidate looks tempting on benchmarks but flops on your traffic, shadow mode catches it without exposing a single user. When you want to walk away with everything we know about your traffic, the export does that with a checksum.

That’s the shape of the deal. We tell you what the router is doing, you set the rules it has to play by, and you can leave with your data at any time.

How to get it

The new endpoints, the dashboard side-panel, the Quality×Cost scatter, the Baseline-vs-Floopy view, and the FAQ rewrites are live behind a canary flag today. Pro plan customers can opt in to the canary by emailing support@floopy.ai with their organization ID. Public GA is in 14 days, gated on the post-canary review described in the canary rollout runbook.

If you’re evaluating Floopy and “we couldn’t audit the router” was on your blocker list — that one’s closed.