POST /v1/routing/explain

Dry-run the routing selection for a candidate request body. The gateway runs the full strategy chain — including the LLM firewall — and returns the candidates, weights, the model that would have been selected, and the router’s confidence. No provider is called. No decision_trace row is emitted. No rate-limit budget is spent on the chat path.

Designed for CI checks (“does this prompt route the way we expect?”), pre-sale demos, and constraint-tuning experiments.

POST https://api.floopy.ai/v1/routing/explain
Authorization: Bearer <your-floopy-api-key>
Content-Type: application/json
  • Permission: write_permission. The endpoint accepts a JSON payload (hence write), but it is non-mutating — no provider call, no log row, no counter increment.
  • Pro plan (requires Feedback-Driven routing).
  • We are rolling this out per organization while we validate quality. Contact support if your organization is not yet enabled.

Body cap: 64 KiB. Larger payloads return 400 body_too_large.

{
  "request": {
    "model": "gpt-5.4",
    "messages": [
      { "role": "user", "content": "Summarise this incident report in 3 bullets." }
    ],
    "temperature": 0.3
  },
  "headers": {
    "floopy-session-id": "sess_demo_1"
  }
}
| Field | Type | Required | Constraints |
| --- | --- | --- | --- |
| request | object | Yes | A standard OpenAI-compatible chat-completion body (same shape as POST /v1/chat/completions). |
| headers | object | No | A subset of headers to forward to the routing context. Reserved keys are dropped server-side (see Header sanitisation). |

Unknown top-level keys return 400 invalid_body (the request schema is deny_unknown_fields).

Before the headers map is passed into the dry-run engine, entries whose lowercase key matches the following are dropped:

  • exact match: authorization, cookie, host
  • prefix match: x-forwarded-, x-real-ip

This keeps the explain endpoint from being abused as a header-relay or fingerprinting tool.
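The drop rule above is easy to mirror client-side, for example to predict which of your headers will actually reach the routing context. A minimal sketch; the function name is ours, not part of any Floopy SDK:

```python
# Mirror of the documented server-side sanitisation: drop exact matches
# (authorization, cookie, host) and the x-forwarded- / x-real-ip prefixes,
# matching on the lowercased key.
DROPPED_EXACT = {"authorization", "cookie", "host"}
DROPPED_PREFIXES = ("x-forwarded-", "x-real-ip")

def sanitize_headers(headers: dict[str, str]) -> dict[str, str]:
    out = {}
    for key, value in headers.items():
        k = key.lower()
        if k in DROPPED_EXACT or k.startswith(DROPPED_PREFIXES):
            continue  # reserved key: the gateway drops it server-side
        out[key] = value
    return out
```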

{
  "dry_run": true,
  "strategy_id": "feedback_driven",
  "phase": "auto",
  "weights": { "session": 0.5, "auto": 0.3, "manual": 0.1, "benchmark": 0.1 },
  "candidates": [
    { "provider": "openai", "model": "gpt-5.4-mini", "score": 0.81 },
    { "provider": "anthropic", "model": "claude-sonnet-4", "score": 0.78 }
  ],
  "filtered": [
    { "provider": "openai", "model": "gpt-5.4", "reason": "constraint_max_cost_increase", "score": 0.86 }
  ],
  "would_select": { "provider": "openai", "model": "gpt-5.4-mini" },
  "reason": "dispatched",
  "confidence": 0.78,
  "confidence_reason": "ok",
  "exploration_rate_effective": 0.05,
  "used_shared_pool_prior": false,
  "evidence": {
    "samples": 412,
    "top2_score_gap": 0.07,
    "outcome_variance": 0.04,
    "recent_regressions": { "kind": "exact", "exact": 1 },
    "last_regression_at": "2026-05-04T18:25:00Z"
  },
  "explanation": {
    "text": "Floopy would route this request to openai/gpt-5.4-mini based on 412 historical samples and a moderate confidence of 0.78. The next candidate scored within 0.07 points and outcome variance has been stable. One regression was recorded in the last 7 days.",
    "template_id": "feedback_driven_moderate_confidence"
  }
}

| Field | Type | Description |
| --- | --- | --- |
| dry_run | boolean | Always true. |
| strategy_id | string | Strategy that ran (feedback_driven, smart_cost, etc.). |
| phase | string \| null | Feedback-Driven phase. |
| weights | object \| null | Signal weights (Feedback-Driven only). |
| candidates | array | Considered models and their scores. |
| filtered | array | Removed candidates. reason is a closed enum (see GET /v1/decisions/{request_id}). |
| would_select | object \| null | The (provider, model) the gateway would dispatch to. null when the firewall blocked the request. |
| reason | string | dispatched, exhausted, or no_enabled_targets. |
| confidence | number \| null | See Confidence methodology. |
| confidence_reason | string \| null | Closed enum: ok, cap_day0, cap_shared, no_router_invoked, insufficient_samples, single_candidate. |
| exploration_rate_effective | number | Effective exploration rate. |
| used_shared_pool_prior | boolean | Whether cross-tenant priors influenced the score. |
| evidence | object \| null | Inputs that drove the confidence number, surfaced for audit (Feedback-Driven and Smart-Cost only). null when no router was invoked. See Confidence methodology. |
| evidence.samples | integer | Rolling sample count. |
| evidence.top2_score_gap | number | Score gap between winner and runner-up. |
| evidence.outcome_variance | number | Rolling outcome variance for the winner. |
| evidence.recent_regressions | tagged union | Bucketed regression count over the winner's (provider, model) in the last 7 days. kind: "exact" with an exact field for n < 10; kind: "at_least" with an at_least field at the 10 and 50 thresholds. |
| evidence.last_regression_at | ISO 8601 \| null | Most recent regression in the 7-day window, rounded to the nearest 5-minute boundary. null when none. |
| explanation | object \| null | Human-readable explanation of the dry-run decision. See Decision Explanation. |
| explanation.text | string | Plain-prose paragraph (≤ 600 chars). Never references prompt content. |
| explanation.template_id | string | Closed-enum template id. |
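Because evidence.recent_regressions is a tagged union, consumers should branch on its kind field rather than assume a plain integer. A defensive reading sketch; the helper name is ours:

```python
def regressions_lower_bound(recent: dict) -> int:
    """Minimum regression count implied by the documented bucket shapes.

    kind == "exact"    -> the exact count (n < 10)
    kind == "at_least" -> the threshold (10 or 50)
    """
    kind = recent.get("kind")
    if kind == "exact":
        return int(recent["exact"])
    if kind == "at_least":
        return int(recent["at_least"])
    raise ValueError(f"unknown recent_regressions kind: {kind!r}")
```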

The response carries:

  • Content-Language: en or Content-Language: pt — the locale used to render explanation.text. Resolved from the request’s Accept-Language header, falling back to en. See Decision Explanation.

The LLM firewall runs on the dry-run path. If it blocks, the response is still 200, with:

{
  "filtered": [
    { "provider": "...", "model": "...", "reason": "firewall_blocked", "score": 0.0 }
  ],
  "would_select": null,
  "reason": "no_enabled_targets"
}

This is intentional — the dry-run faithfully reflects what would happen in production, including the firewall.
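In a CI check it is worth distinguishing a firewall block from an ordinary selection, since both return 200. Per the shape above, a block shows up as would_select: null plus a firewall_blocked entry in filtered. A small detection sketch (function name is ours):

```python
def firewall_blocked(resp: dict) -> bool:
    # A blocked dry-run still returns HTTP 200; detect it from the body.
    if resp.get("would_select") is not None:
        return False
    return any(f.get("reason") == "firewall_blocked"
               for f in resp.get("filtered", []))
```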

| Status | Error code | When |
| --- | --- | --- |
| 400 | invalid_body | Body fails schema validation or is not JSON. |
| 400 | body_too_large | Body exceeds 64 KiB. |
| 403 | write_permission / plan_required | Permission or plan flag missing. |
| 404 | no_route | No routing rule applies to the org and model. |
| 429 | rate_limited | Exceeded per-org or per-key budget. Carries Retry-After. |
| 5xx | internal | Upstream failure. |
curl -s -X POST \
  -H "Authorization: Bearer $FLOOPY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "request": {
      "model": "gpt-5.4",
      "messages": [{"role": "user", "content": "Summarise this incident report in 3 bullets."}],
      "temperature": 0.3
    },
    "headers": {"floopy-session-id": "sess_demo_1"}
  }' \
  "https://api.floopy.ai/v1/routing/explain" | jq .
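The same call from Python, split into a request builder and a CI-style assertion on the parsed body so the pure parts are testable. Stdlib only; the helper names are ours:

```python
import json
import urllib.request

EXPLAIN_URL = "https://api.floopy.ai/v1/routing/explain"

def build_explain_request(api_key: str, body: dict) -> urllib.request.Request:
    # Same payload and headers as the curl example above.
    return urllib.request.Request(
        EXPLAIN_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def assert_routes_to(resp: dict, provider: str, model: str) -> None:
    # CI check: fail loudly if the dry-run would not pick the expected model.
    selected = resp.get("would_select") or {}
    assert (selected.get("provider"), selected.get("model")) == (provider, model), \
        f"unexpected route: {selected!r}"
```

To execute, pass the built request to urllib.request.urlopen and json-decode the response body before calling assert_routes_to.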

  • 30 requests / minute / organization (floopy_explain_rpm_budget, default).
  • 10 requests / minute / API key.

Both windows are evaluated atomically: if either budget is exhausted, the request is rejected with 429.

The dual cap exists so a leaked key cannot probe routing internals at high cadence.
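Clients that issue many explain calls (for example a CI matrix) should back off on 429 using the Retry-After header the endpoint returns. A minimal sketch with the sender injected so it can be exercised without the network; the names are ours:

```python
import time

def post_with_backoff(send, max_attempts: int = 3, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send() must return (status, headers, body); on 429 we honor the
    Retry-After header (seconds, defaulting to 1) before retrying.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(float(headers.get("Retry-After", 1)))
    return status, body
```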