GET /v1/experiments/{id}/results

Read aggregated results for a single routing experiment. The endpoint runs two aggregations over the request log, one for the experiment’s baseline side and one for its candidate side, both bounded to the experiment’s lifetime. It returns cost, quality, and latency aggregates for each side plus the deltas between them.
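A per-side aggregation can be pictured as follows. This is a minimal sketch, not Floopy’s implementation: the helper name aggregate_side and the row fields (cost_micro_usd, quality, latency_ms) are hypothetical, and it assumes composite_quality is a plain mean over rows that are already scored and already filtered to one side and to the experiment window.

```python
import statistics
from typing import Dict, List


def aggregate_side(rows: List[Dict[str, float]]) -> Dict[str, float]:
    """Hypothetical per-side aggregate over request-log rows.

    Assumes rows are already filtered to one (provider, model) and to the
    experiment window, and that composite_quality is a simple mean.
    """
    if not rows:
        # No rows yet: return zeros, as the endpoint does for a fresh experiment.
        return {"samples": 0, "avg_cost_micro_usd": 0,
                "composite_quality": 0.0, "p50_latency_ms": 0}
    return {
        "samples": len(rows),
        "avg_cost_micro_usd": round(statistics.fmean(r["cost_micro_usd"] for r in rows)),
        "composite_quality": round(statistics.fmean(r["quality"] for r in rows), 3),
        # median() sorts the values and returns the middle one (p50).
        "p50_latency_ms": statistics.median(r["latency_ms"] for r in rows),
    }
```

Running it once over the baseline rows and once over the candidate rows yields the two blocks in the response.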

This is the API behind the experiment detail page in the Floopy dashboard, including the polling loop used by the onboarding shadow-setup step.

```http
GET https://api.floopy.ai/v1/experiments/{id}/results
Authorization: Bearer <your-floopy-api-key>
```
  • Permission: the API key must have read_permission.
  • Plan: Pro.
  • Availability: we are rolling this out per organization while we validate quality. Contact support if your organization is not yet enabled.
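The same request can be made from Python with only the standard library. This is a sketch: the function names are hypothetical, the 10-second client timeout is an arbitrary choice, and it reads the key from a FLOOPY_API_KEY environment variable, mirroring the curl example on this page.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.floopy.ai/v1/experiments"


def build_results_request(experiment_id: str, api_key: str) -> urllib.request.Request:
    # Same URL and Bearer header as the raw request; GET is urllib's default method.
    return urllib.request.Request(
        f"{BASE_URL}/{experiment_id}/results",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def fetch_results(experiment_id: str) -> dict:
    req = build_results_request(experiment_id, os.environ["FLOOPY_API_KEY"])
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

Call fetch_results("5b3f9c1e-7a2b-4c3d-9e1f-0a1b2c3d4e5f") with FLOOPY_API_KEY set. Keeping request construction separate from the network call makes the URL and header easy to check without a live key.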
| Field | Type | Required | Constraints |
|---|---|---|---|
| id | string (UUID v4) | Yes | The experiment id. Validated as a v4 UUID at the path extractor; malformed input returns 400 invalid_experiment_id and no database call is made. |
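A client can apply a similar check before calling, which saves a round trip on obviously bad ids. The helper below is hypothetical (is_uuid_v4 is not part of any Floopy SDK) and assumes the gateway accepts only the canonical hyphenated form; the gateway’s own extractor remains authoritative.

```python
import uuid


def is_uuid_v4(value: str) -> bool:
    """Return True if value is a canonical, hyphenated version-4 UUID."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # uuid.UUID also accepts braces, a "urn:uuid:" prefix, and bare hex;
    # comparing with the canonical string rejects those forms.
    # .version is None unless the variant bits are RFC 4122.
    return parsed.version == 4 and str(parsed) == value.lower()


print(is_uuid_v4("5b3f9c1e-7a2b-4c3d-9e1f-0a1b2c3d4e5f"))  # → True
```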

Before aggregating, the endpoint fetches the experiment record scoped to the caller’s organization and the supplied id. If the experiment does not belong to the caller’s org, the response is a 404 byte-for-byte identical to the one for a non-existent experiment, and the analytics store is not queried.
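The tenant-scoped lookup can be sketched as below, assuming an in-memory dict in place of the real database and a hypothetical error body of {"error": "not_found"} (the actual body shape is not documented on this page). Because the key always includes the caller’s organization, another tenant’s experiment is indistinguishable from a missing one.

```python
import json
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for the experiments table, keyed by
# (organization_id, experiment_id).
EXPERIMENTS: Dict[Tuple[str, str], dict] = {
    ("org-a", "5b3f9c1e-7a2b-4c3d-9e1f-0a1b2c3d4e5f"): {"type": "shadow", "status": "active"},
}

# One pre-serialized body reused for every miss, so cross-tenant and
# non-existent ids yield identical bytes.
NOT_FOUND_BODY = json.dumps({"error": "not_found"}).encode()


def fetch_experiment(org_id: str, experiment_id: str) -> Tuple[int, bytes]:
    record: Optional[dict] = EXPERIMENTS.get((org_id, experiment_id))
    if record is None:
        # Return before any analytics query would run.
        return 404, NOT_FOUND_BODY
    return 200, json.dumps(record).encode()
```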

The aggregation window is bounded by the experiment’s lifetime, not by a customer-supplied window parameter:

  • Lower bound (always applied): request_created_at >= experiment.started_at.
  • Upper bound (only when the experiment is no longer active): request_created_at <= experiment.ended_at.

For an active experiment the upper bound is open: every new request the experiment sees is included in the next uncached read. For a completed or rolled_back experiment, the window is frozen at [started_at, ended_at].

This means polling the same experiment id twice in a row can return different numbers if the experiment is still active and traffic is flowing.
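The two bounds reduce to a per-row predicate. This is an illustration with a hypothetical helper name (in_window); the real filter runs inside the analytics query, not in client code.

```python
from datetime import datetime, timezone
from typing import Optional


def in_window(request_created_at: datetime, started_at: datetime,
              ended_at: Optional[datetime], status: str) -> bool:
    # Lower bound: always applied.
    if request_created_at < started_at:
        return False
    # Upper bound: only once the experiment is no longer active.
    if status != "active" and ended_at is not None:
        return request_created_at <= ended_at
    # Active experiment: open upper bound.
    return True


start = datetime(2026, 5, 1, 9, 0, tzinfo=timezone.utc)
print(in_window(datetime(2026, 5, 10, tzinfo=timezone.utc), start, None, "active"))  # → True
```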

```json
{
  "experiment_id": "5b3f9c1e-7a2b-4c3d-9e1f-0a1b2c3d4e5f",
  "type": "shadow",
  "status": "active",
  "started_at": "2026-05-01T09:00:00Z",
  "ended_at": null,
  "baseline": {
    "samples": 9412,
    "avg_cost_micro_usd": 412,
    "composite_quality": 0.812,
    "p50_latency_ms": 612
  },
  "candidate": {
    "samples": 9412,
    "avg_cost_micro_usd": 226,
    "composite_quality": 0.804,
    "p50_latency_ms": 588
  },
  "delta": {
    "cost_pct": -45.1,
    "quality_abs": -0.008,
    "p50_latency_ms": -24
  }
}
```
| Field | Type | Description |
|---|---|---|
| experiment_id | UUID | Echoes the path id. |
| type | string | canary or shadow. |
| status | string | draft, active, completed, or rolled_back. |
| started_at | ISO 8601 string or null | When the experiment was activated. The lower bound of the aggregation window. |
| ended_at | ISO 8601 string or null | When the experiment ended. The upper bound when present; null for active experiments. |
| baseline | object | Aggregate over the experiment’s baseline (provider, model) for rows in the time-bounded window. |
| candidate | object | Aggregate over the experiment’s candidate (provider, model) for the same window. |
| delta.cost_pct | number | (candidate − baseline) / baseline × 100, computed on avg_cost_micro_usd. Negative means the candidate is cheaper. |
| delta.quality_abs | number | candidate − baseline on composite quality in [0.0, 1.0]. |
| delta.p50_latency_ms | number | candidate − baseline on p50 latency in milliseconds. |
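The delta fields can be recomputed from the two blocks, which is a quick way to sanity-check a response. This is a hypothetical client helper; rounding cost_pct to 1 decimal and quality_abs to 3 is inferred from the example response, not a documented guarantee.

```python
from typing import Optional


def compute_delta(baseline: dict, candidate: dict) -> Optional[dict]:
    base_cost = baseline["avg_cost_micro_usd"]
    if base_cost == 0:
        # A percentage change is undefined without a baseline cost.
        return None
    return {
        "cost_pct": round((candidate["avg_cost_micro_usd"] - base_cost) / base_cost * 100, 1),
        "quality_abs": round(candidate["composite_quality"] - baseline["composite_quality"], 3),
        "p50_latency_ms": candidate["p50_latency_ms"] - baseline["p50_latency_ms"],
    }


baseline = {"avg_cost_micro_usd": 412, "composite_quality": 0.812, "p50_latency_ms": 612}
candidate = {"avg_cost_micro_usd": 226, "composite_quality": 0.804, "p50_latency_ms": 588}
print(compute_delta(baseline, candidate))
# → {'cost_pct': -45.1, 'quality_abs': -0.008, 'p50_latency_ms': -24}
```

Fed the example response’s blocks, it reproduces the example delta: (226 − 412) / 412 × 100 ≈ −45.1.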

For a shadow experiment, candidate rows are scored side by side with the live baseline traffic, but the user-visible response always comes from the baseline. For a canary experiment, both sides serve real traffic, split according to the experiment’s traffic_pct.
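The difference between the two modes comes down to which side serves the response. This sketch assumes traffic_pct is a percentage in [0, 100] and that canary assignment is a deterministic hash of some request key; Floopy’s actual assignment method is not documented on this page, and serving_side is a hypothetical name.

```python
import hashlib


def serving_side(experiment_type: str, traffic_pct: float, request_key: str) -> str:
    if experiment_type == "shadow":
        # Shadow: the candidate is only scored alongside; baseline always serves.
        return "baseline"
    # Canary (assumed hash split): map the key to a bucket in [0.00, 99.99].
    digest = hashlib.sha256(request_key.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000 / 100
    return "candidate" if bucket < traffic_pct else "baseline"
```

Hashing a stable key keeps a given request key on the same side across retries, which is one common way to split canary traffic.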

When neither side has accumulated any rows yet (for example, a freshly activated experiment), the baseline and candidate blocks contain zeros and delta is omitted.
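Clients should treat a missing delta as "no data yet" rather than as zero change. A minimal sketch with hypothetical names (ResultsSummary, summarize):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResultsSummary:
    has_data: bool
    cost_pct: Optional[float]
    quality_abs: Optional[float]


def summarize(body: dict) -> ResultsSummary:
    delta = body.get("delta")
    if delta is None:
        # Both blocks are zeros: show a waiting state instead of
        # rendering 0% changes as if they had been measured.
        return ResultsSummary(has_data=False, cost_pct=None, quality_abs=None)
    return ResultsSummary(True, delta["cost_pct"], delta["quality_abs"])
```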

The endpoint caches the response in Redis per (organization_id, experiment_id) for 30 seconds. No Cache-Control header is returned; the cache is internal to the gateway.

A 30-second TTL is short enough that polling-driven UIs (the onboarding shadow-setup step polls every 20 s) see fresh-enough numbers, and long enough that the analytics store is not hit on every render.
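The caching behavior can be modeled with a small TTL map keyed by (organization_id, experiment_id). This is an in-process stand-in swapped in for Redis purely for illustration; ResultsCache is a hypothetical name, and the clock is injectable so expiry can be exercised without sleeping.

```python
import time
from typing import Callable, Dict, Tuple

CacheKey = Tuple[str, str]  # (organization_id, experiment_id)


class ResultsCache:
    def __init__(self, ttl_s: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._entries: Dict[CacheKey, Tuple[float, bytes]] = {}

    def get_or_compute(self, key: CacheKey, compute: Callable[[], bytes]) -> bytes:
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl_s:
            return hit[1]  # fresh entry: skip the analytics store
        body = compute()  # miss or expired: run the aggregations again
        self._entries[key] = (now, body)
        return body
```

In this sketch, one client polling every 20 s is served from the cache on every second poll: the entry written at t=0 is still fresh at t=20 and expired by t=40, so that poll recomputes.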

| Status | Error code | When |
|---|---|---|
| 400 | invalid_experiment_id | The path id is not a valid UUID v4. No database call is made. |
| 403 | read_permission | Key lacks read_permission. |
| 403 | plan_required | The endpoint is not included on the caller’s plan. |
| 404 | not_found | No experiment record exists in the caller’s org for that id. Bytes are identical to a cross-tenant lookup. |
| 429 | rate_limited | Exceeded 60 req/min/org or 20 req/min/key. Carries Retry-After. |
| 503 | upstream_timeout | The analytics aggregation exceeded the 5 s wall-clock timeout. |
| 5xx | internal | Upstream failure. |
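A client can decide whether to retry from the status alone. The policy below is a hypothetical sketch: it honors Retry-After (delay-seconds form only) on 429, falls back to 60 s when a 429 arrives without a usable Retry-After, backs off exponentially on 503 and other 5xx responses, and never retries 400, 403, or 404, which would fail the same way again.

```python
from typing import Mapping, Optional


def retry_delay_s(status: int, headers: Mapping[str, str], attempt: int) -> Optional[float]:
    """Seconds to wait before retrying, or None if the call should not be retried."""
    if status == 429:
        retry_after = headers.get("Retry-After")
        # Retry-After may also be an HTTP date; this sketch handles seconds only.
        if retry_after is not None and retry_after.isdigit():
            return float(retry_after)
        return 60.0  # assumed fallback: one full rate-limit window
    if status >= 500:
        # Exponential backoff capped at 30 s (arbitrary cap for illustration).
        return min(2.0 ** attempt, 30.0)
    return None  # other 4xx errors will not succeed on retry
```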
```shell
curl -s -H "Authorization: Bearer $FLOOPY_API_KEY" \
  "https://api.floopy.ai/v1/experiments/5b3f9c1e-7a2b-4c3d-9e1f-0a1b2c3d4e5f/results" | jq .
```
Rate limits:
  • 60 requests / minute / organization.
  • 20 requests / minute / API key.

Both windows are evaluated atomically. The dashboard’s polling loop respects these caps by spacing requests at 20 s for the first 10 minutes and 60 s after that, and pausing entirely while the browser tab is hidden.
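Both caps and the polling schedule can be sketched together. This assumes fixed one-minute windows (the page does not say whether the gateway uses fixed or sliding windows), and the class and function names are hypothetical. In a multi-process gateway the check-and-increment would have to run as one atomic operation in the shared store; this single-process version only shows the logic of checking both counters before incrementing either.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

ORG_LIMIT, KEY_LIMIT, WINDOW_S = 60, 20, 60


class DualWindowLimiter:
    def __init__(self) -> None:
        self.counts: Dict[Tuple[str, str, int], int] = defaultdict(int)

    def try_acquire(self, org_id: str, key_id: str, now_s: float) -> bool:
        window = int(now_s // WINDOW_S)
        org_k, key_k = ("org", org_id, window), ("key", key_id, window)
        # Check both caps first so a rejected request consumes neither quota.
        if self.counts[org_k] >= ORG_LIMIT or self.counts[key_k] >= KEY_LIMIT:
            return False
        self.counts[org_k] += 1
        self.counts[key_k] += 1
        return True


def next_poll_delay_s(elapsed_s: float, tab_hidden: bool) -> Optional[float]:
    """Dashboard-style schedule: 20 s for the first 10 minutes, then 60 s."""
    if tab_hidden:
        return None  # pause polling while the tab is hidden
    return 20.0 if elapsed_s < 600 else 60.0
```

At 20 s spacing a single dashboard tab makes 3 requests per minute, well under the 20-per-key cap.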

Every successful response writes an experiment_results_read event to the audit log. Events are throttled to at most one per (organization_id, key_id, experiment_id) every 60 seconds.
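The throttle amounts to remembering the last emit time per key. A hypothetical sketch (AuditThrottle is not a documented component), assuming "throttled for 60 seconds" means at most one event per key per 60-second interval:

```python
from typing import Dict, Tuple

AuditKey = Tuple[str, str, str]  # (organization_id, key_id, experiment_id)


class AuditThrottle:
    def __init__(self, interval_s: float = 60.0) -> None:
        self.interval_s = interval_s
        self._last_emit: Dict[AuditKey, float] = {}

    def should_emit(self, key: AuditKey, now_s: float) -> bool:
        last = self._last_emit.get(key)
        if last is not None and now_s - last < self.interval_s:
            return False  # a read for this key was logged recently
        self._last_emit[key] = now_s
        return True
```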