Credibility v2: Now You Can Bound It, And We Will Explain It Back

In our v1 launch we said one thing and meant it: we don’t ask you to trust the magic, we ask you to inspect it. We shipped five APIs, a redesigned decision side-panel, a per-decision confidence number, declarative constraints, shadow-mode experiments, and a streaming export with a SHA-256 trailer. (Read the v1 post.)

Inspection was the first move. Today we’re shipping the second.

v2 says: now you can also bound it, and we’ll explain it back to you in your language.

Seven new features sit on the same audit-first foundation. Pro canary opens today. Public GA is in 14 days, gated on the post-canary review.

What’s new

1. Confidence Evidence — what the router knew, on every decision

Every Feedback-Driven and Smart-Cost decision now carries an evidence block alongside confidence and confidence_reason. Five fields: samples, top2_score_gap, outcome_variance, recent_regressions, last_regression_at. The same numbers that fed the formula, surfaced so a reviewer can audit the inputs, not just the verdict.

recent_regressions is bucketed — Exact{n} for n < 10, AtLeast{10}, AtLeast{50}. Enough signal to investigate, no precise volumes. last_regression_at is rounded to a 5-minute boundary for the same reason. The 7-day rolling window is org-scoped end to end. Read the full Confidence methodology.
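As an illustration only, the bucketing and rounding described above can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes the timestamp is rounded down to the previous 5-minute boundary; the post does not say whether the real implementation rounds down or to the nearest boundary.

```python
from datetime import datetime, timezone


def bucket_regressions(n: int) -> str:
    """Bucket a regression count: exact below 10, then coarse floors at 10 and 50."""
    if n < 0:
        raise ValueError("regression count cannot be negative")
    if n < 10:
        return f"Exact{{{n}}}"
    if n < 50:
        return "AtLeast{10}"
    return "AtLeast{50}"


def round_to_five_minutes(ts: datetime) -> datetime:
    """Assumption: floor to the previous 5-minute boundary (not nearest)."""
    return ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)


if __name__ == "__main__":
    print(bucket_regressions(7))
    print(bucket_regressions(23))
    print(round_to_five_minutes(datetime(2024, 1, 1, 12, 37, 42, tzinfo=timezone.utc)))
```

A reviewer sees that regressions happened and roughly when, without being able to reconstruct exact traffic volumes from the evidence block.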

2. Four new constraint gates

The PUT /v1/constraints body now accepts nine fields, up from three. The four new ones:

min_samples_before_promotion — refuse to let a candidate win below a sample floor.
max_outcome_variance — refuse a candidate whose outcomes are inconsistent.
max_cost_drop_without_validation — refuse a too-good-to-be-true candidate without a passing shadow first.
require_shadow_before_live — refuse any promotion that has not been shadow-validated.

Each gate emits a typed filtered[].reason and is documented end-to-end on the Constraints feature page.
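To make the gate semantics concrete, here is a minimal Python sketch of how the four gates might be evaluated against one candidate. It is not the router's code. The Candidate fields, the comparison directions, and the reason strings are all assumptions; the reason strings simply reuse the gate names as placeholders, and the real FilteredReason values are documented on the Constraints feature page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gates:
    # Field names come from the post; None means "gate not set" (an assumption).
    min_samples_before_promotion: Optional[int] = None
    max_outcome_variance: Optional[float] = None
    max_cost_drop_without_validation: Optional[float] = None
    require_shadow_before_live: bool = False


@dataclass
class Candidate:
    # Hypothetical shape, for illustration only.
    samples: int
    outcome_variance: float
    cost_drop: float      # fraction cheaper than baseline, e.g. 0.40
    shadow_passed: bool


def filtered_reasons(c: Candidate, g: Gates) -> List[str]:
    """Return one placeholder reason per gate the candidate fails."""
    reasons: List[str] = []
    if g.min_samples_before_promotion is not None and c.samples < g.min_samples_before_promotion:
        reasons.append("min_samples_before_promotion")
    if g.max_outcome_variance is not None and c.outcome_variance > g.max_outcome_variance:
        reasons.append("max_outcome_variance")
    if (
        g.max_cost_drop_without_validation is not None
        and c.cost_drop > g.max_cost_drop_without_validation
        and not c.shadow_passed
    ):
        reasons.append("max_cost_drop_without_validation")
    if g.require_shadow_before_live and not c.shadow_passed:
        reasons.append("require_shadow_before_live")
    return reasons
```

An empty list means the candidate clears every configured gate; anything else is a refusal with a reason a reviewer can read.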

3. Constraints CRUD UI

/routing/constraints in the dashboard renders the nine fields in three sections (quality limits, cost limits, promotion gates), each labelled with its FilteredReason, with a save button that diffs the previous state against the new state and links to the audit-log entry. Constraints used to live in code only; now they live in code and in a UI your security reviewer can read.

4. Verified Optimization

A new endpoint, GET /v1/optimization/verification, answers a strict, narrow question on the customer’s own request log: over the last 7 days, is there enough evidence to call this route verified? The verdict is one of verified, not_verified, insufficient_data, or regression_detected, with the underlying baseline and Floopy aggregates returned alongside.

The thresholds are pinned in code: SAMPLE_FLOOR = 100, QUALITY_TOLERANCE = 0.03. The endpoint is Redis-cached at 60 seconds and respects a Cache-Control: max-age=60 HTTP contract. The Verification Status card on the Baseline-vs-Floopy dashboard page reads the same endpoint. Methodology is on the Baseline-vs-Floopy page.
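The post names the two thresholds and the four verdicts but not the exact decision rule, so the following Python sketch is one plausible reading rather than the endpoint's logic. The Aggregate shape, the ordering of the checks, and the "cheaper than baseline" condition for verified are all assumptions made for illustration.

```python
from dataclasses import dataclass

SAMPLE_FLOOR = 100         # value stated in the post
QUALITY_TOLERANCE = 0.03   # value stated in the post


@dataclass
class Aggregate:
    # Hypothetical 7-day aggregate; field names are assumptions.
    samples: int
    quality: float   # assumed score in [0, 1]
    cost: float      # assumed total spend over the window


def verdict(baseline: Aggregate, floopy: Aggregate) -> str:
    """Assumed rule: check evidence first, then quality, then savings."""
    if min(baseline.samples, floopy.samples) < SAMPLE_FLOOR:
        return "insufficient_data"
    if floopy.quality < baseline.quality - QUALITY_TOLERANCE:
        return "regression_detected"
    if floopy.cost < baseline.cost:
        return "verified"
    return "not_verified"
```

Whatever the real rule looks like, the property that matters is the one the sketch preserves: too few samples yields insufficient_data, never an optimistic verified.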

5. Human-readable Explanation

Every decision now carries an explanation.text paragraph rendered in the locale of the request’s Accept-Language (falling back to en). The text is composed from a closed taxonomy of 15 templates and a small bag of typed parameters. It is rendered at read time, never persisted in decision_trace, never references prompt content, capped at 600 characters, and gated against control characters. The resolved locale is echoed via the HTTP Content-Language response header. Currently supported: en and pt.

Two reasons to render at read time. First, security and storage: persisting prose forever would carry a stored-text injection surface forever. Second, internationalisation: a customer who reads the same decision in en today and pt tomorrow gets two different paragraphs from one stored decision, with no migration step. Read the Decision Explanation feature for the full taxonomy.
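A minimal Python sketch of read-time rendering, under stated assumptions: the template id and wording are invented (the real taxonomy has 15 templates), the locale resolver ignores q-values and simply takes the first supported language in header order, and the 600-character cap is applied by truncation. Only the closed (template_id, locale) lookup, the en fallback, the cap, and the control-character gate come from the post.

```python
import unicodedata
from typing import Dict, Tuple

SUPPORTED = ("en", "pt")
MAX_CHARS = 600

# Hypothetical single-template excerpt keyed by (template_id, locale).
TEMPLATES: Dict[Tuple[str, str], str] = {
    ("candidate_won", "en"): "The candidate won after {samples} samples with a score gap of {gap:.2f}.",
    ("candidate_won", "pt"): "A candidata venceu após {samples} amostras, com diferença de pontuação de {gap:.2f}.",
}


def resolve_locale(accept_language: str) -> str:
    """Simplified: first supported primary language tag wins; q-values are ignored."""
    for part in accept_language.split(","):
        primary = part.split(";")[0].strip().lower().split("-")[0]
        if primary in SUPPORTED:
            return primary
    return "en"


def render(template_id: str, accept_language: str, params: Dict[str, float]) -> Tuple[str, str]:
    """Return (text, locale); the locale would be echoed in Content-Language."""
    locale = resolve_locale(accept_language)
    text = TEMPLATES[(template_id, locale)].format(**params)
    if any(unicodedata.category(ch) == "Cc" for ch in text):
        raise ValueError("control character in rendered explanation")
    return text[:MAX_CHARS], locale
```

Because only the template id and typed parameters would need to be stored, the same decision can be rendered in en today and pt tomorrow without touching the stored row.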

6. Experiments UI + Results Endpoint

Routing experiments are now a first-class dashboard surface. /routing/experiments ships a list view, a create form, a detail view with polling-driven baseline-vs-candidate panels, and a one-click rollback dialog. Polling cadence is 20 seconds for the first 10 minutes, 60 seconds afterwards, and pauses entirely on a hidden tab.
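The polling cadence is simple enough to state as a function. This Python sketch is language-neutral pseudologic rather than the dashboard's code; the function name is hypothetical, and it assumes the 10-minute fast phase is measured from when the detail view opened and that the switch to 60 seconds happens at exactly the 10-minute mark.

```python
from typing import Optional

FAST_INTERVAL_S = 20     # first 10 minutes
SLOW_INTERVAL_S = 60     # afterwards
FAST_PHASE_S = 10 * 60


def next_poll_delay(elapsed_s: float, tab_hidden: bool) -> Optional[int]:
    """Seconds until the next poll, or None when polling is paused."""
    if tab_hidden:
        return None  # hidden tab: no polling at all
    return FAST_INTERVAL_S if elapsed_s < FAST_PHASE_S else SLOW_INTERVAL_S
```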

The detail view reads a new endpoint, GET /v1/experiments/{id}/results, which is Redis-cached at 30 seconds and bounded to the experiment’s started_at/ended_at lifetime. Curl-only experiment management still works, but it is no longer the headline path — the dashboard is.

7. Onboarding Shadow Setup

Floopy’s onboarding now offers a one-step shadow setup between “connect a provider” and “first call”. The step uses Floopy’s internal model catalogue to suggest a sensible cheaper alternative for the model you just connected, creates a shadow experiment in one click, and polls its results so you can see baseline-vs-candidate numbers populate without leaving the onboarding flow. On Free plans the step renders a preview that makes zero fetch calls — credibility for the Free user, on day zero.

The shape of v2

v1 was about exposing what the router did. v2 is about bounding what it is allowed to do, and explaining it back to you in your language.

You can pin nine declarative gates and the router will refuse to violate them. You can ask, on your own traffic, whether a route is verified — and get a no when the evidence isn’t there. You can read, in plain prose, in en or pt, why a decision went the way it did, and the prose is composed from a closed template set with no caller-controlled substrings. You can run a shadow experiment from the UI in two clicks. You can do all of it on the same day you sign up.

What we did not do

We did not add LLM-generated explanations. The text comes from a closed match over (template_id, locale), not from a language model. We did not add raw prompt or completion bodies to any endpoint or export; the v1 contract holds. We did not make the 7-day verification window or the confidence formula customer-tunable, and we did not edit the competitor table or change the pricing page.

We did not break any v1 wire shape. Every new field is additive and optional. v1 readers ignore unknown keys; v2 readers tolerate v1 rows.
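For readers writing their own clients, a tolerant reader might look like the Python sketch below. The key layout is an assumption inferred from the names used in this post (an evidence object next to confidence and confidence_reason, and an explanation object with a text field); check the API reference for the real wire shape.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class Decision:
    confidence: Optional[float]
    confidence_reason: Optional[str]
    evidence: Optional[Mapping[str, Any]]   # absent on v1 rows
    explanation_text: Optional[str]         # absent on v1 rows


def parse_decision(row: Mapping[str, Any]) -> Decision:
    """Read known keys with .get(); unknown keys are ignored, missing ones become None."""
    explanation = row.get("explanation") or {}
    return Decision(
        confidence=row.get("confidence"),
        confidence_reason=row.get("confidence_reason"),
        evidence=row.get("evidence"),
        explanation_text=explanation.get("text"),
    )
```

The same function accepts a v1 row, a v2 row, and a row carrying keys added after v2, which is what "additive and optional" buys in practice.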

How to get it

Pro canary opens today. Pro plan customers can opt in by emailing support@floopy.ai with their organization ID. Public GA is in 14 days, gated on the post-canary review.

If you read v1 and the answer was “close — keep going”, this is the keep-going. We told you what the router was doing. Now you can tell the router what it is allowed to do, and the router will tell you back, in your language, what it did.

That’s the shape of v2.