A/B Testing
Overview
A/B Testing lets you route a percentage of your traffic to different model or prompt variants and compare their performance with real production data. Instead of guessing which configuration works best, you run a controlled test and let the results guide your decision.
Setting Up a Test
- Go to A/B Testing in the dashboard and click Create Test.
- Define your variants. Each variant specifies a model, provider, and optionally a prompt version.
- Set the traffic split — the percentage of requests routed to each variant. For example, 80% to your current configuration and 20% to the challenger.
- Save and activate the test.
Once active, the gateway automatically routes incoming requests according to the configured split. Routing is consistent per session or per user (depending on your configuration), so each individual user gets a coherent experience across requests.
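To illustrate what consistent routing means, the sketch below hashes a stable identifier into a bucket and maps that bucket onto the traffic split, so the same user always lands on the same variant. This is purely illustrative of the idea, not the gateway's actual implementation; the function and variant names are made up.

```typescript
import { createHash } from "node:crypto";

// Illustrative only: deterministic bucketing so a given user always
// sees the same variant. The gateway does this server-side.
function pickVariant(
  userId: string,
  split: { variant: string; percent: number }[],
): string {
  // Hash the stable identifier to a bucket in [0, 100).
  const digest = createHash("sha256").update(userId).digest();
  const bucket = digest.readUInt32BE(0) % 100;

  let cumulative = 0;
  for (const { variant, percent } of split) {
    cumulative += percent;
    if (bucket < cumulative) return variant;
  }
  // Fallback; unreachable if the percentages sum to 100.
  return split[split.length - 1].variant;
}

// An 80/20 split: this user gets the same variant on every request.
console.log(pickVariant("alice@example.com", [
  { variant: "current", percent: 80 },
  { variant: "challenger", percent: 20 },
]));
```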
Tracking Performance
The A/B Testing dashboard shows real-time metrics for each variant:
- Latency — average and percentile response times.
- Cost — average cost per request and total spend.
- Token usage — input and output tokens per request.
- Feedback scores — if you collect feedback, scores are broken down per variant.
- Error rate — percentage of failed requests per variant.
Use these metrics to determine which variant delivers the best balance of quality, cost, and speed.
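If you export raw request data for your own analysis, the comparison can be as simple as grouping records by variant and computing summary statistics. The sketch below assumes a hypothetical record shape (variant, latencyMs, costUsd); it is not an export format defined by the gateway, and the dashboard computes these metrics for you automatically.

```typescript
// Illustrative only: the record shape is hypothetical.
interface RequestRecord {
  variant: string;
  latencyMs: number;
  costUsd: number;
}

// p95: the latency that 95% of requests stay under.
function p95(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95));
  return sorted[idx];
}

function summarize(records: RequestRecord[]): void {
  // Group records by variant.
  const byVariant = new Map<string, RequestRecord[]>();
  for (const r of records) {
    const bucket = byVariant.get(r.variant) ?? [];
    bucket.push(r);
    byVariant.set(r.variant, bucket);
  }
  // Report p95 latency and mean cost per variant.
  for (const [variant, rs] of byVariant) {
    const meanCost = rs.reduce((sum, r) => sum + r.costUsd, 0) / rs.length;
    const p95Latency = p95(rs.map((r) => r.latencyMs));
    console.log(`${variant}: p95 latency ${p95Latency} ms, mean cost $${meanCost.toFixed(5)}`);
  }
}
```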
Applying a Winner
When you have enough data to make a decision, click Apply Winner on the variant you want to keep. This updates your routing configuration to send 100% of traffic to the winning variant and deactivates the test.
Previous test results are preserved in the dashboard for future reference.
Smart Selectors
Smart Selectors go beyond simple percentage splits. They let you define routing rules based on request properties:
- Content-based routing — route requests containing certain keywords or topics to a specific model. For example, send safety-sensitive prompts to a model with stronger guardrails.
- Cost-based routing — route short, simple prompts to a cheaper model and complex prompts to a more capable one.
- User-based routing — route specific user segments to different variants.
Smart Selectors can be combined with percentage-based splits for hybrid routing strategies.
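As a mental model, a Smart Selector encodes decision logic like the sketch below. The rule format, keywords, and model names here are hypothetical; actual selectors are configured in the dashboard, not written as code.

```typescript
// Illustrative only: the kind of decision logic a Smart Selector
// expresses. Keywords and model names are made-up placeholders.
const SAFETY_KEYWORDS = ["self-harm", "medical advice", "weapons"];

function selectVariant(prompt: string): string {
  // Content-based: send safety-sensitive prompts to a guarded model.
  if (SAFETY_KEYWORDS.some((kw) => prompt.toLowerCase().includes(kw))) {
    return "guardrailed-model";
  }
  // Cost-based: short, simple prompts go to a cheaper model.
  if (prompt.length < 200) {
    return "cheap-model";
  }
  return "capable-model";
}
```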
Best Practices
- Test one variable at a time. Changing both the model and the prompt simultaneously makes it hard to attribute differences.
- Run tests long enough. A few hours of data is rarely sufficient; aim for at least a few days so the results reflect variations in traffic patterns.
- Start with a small split. Route 10-20% of traffic to the challenger variant to limit risk while gathering data.
- Use feedback. Automated metrics like latency and cost are useful, but user feedback provides the strongest signal on output quality.
- Review edge cases. Before applying a winner, manually review a sample of responses from each variant to catch quality issues that metrics might miss.
Using A/B Tests and Smart Selectors via Headers
Instead of relying solely on dashboard-configured routing, you can direct a specific request to an A/B test or Smart Selector by passing the appropriate header.
| Header | Description |
|---|---|
| floopy-ab-test | The UUID of the A/B test to use for routing this request |
| floopy-smart-select | The ID of the Smart Selector to use for routing this request |
You can find the test and selector IDs in the A/B Testing section of the dashboard.
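The examples below show requests routed through an A/B test and a Smart Selector using the OpenAI TypeScript SDK, the OpenAI Python SDK, and curl. The test UUID and selector ID shown are placeholders; substitute the values from your dashboard.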
TypeScript:

```typescript
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

// Route through an A/B test
const response = await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: "Explain relativity simply." }],
  },
  {
    headers: {
      "floopy-ab-test": "8a10573a-8e06-4bb0-8c1d-38c973350253",
    },
  },
);

// Route through a Smart Selector
const response2 = await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: "Translate this to French." }],
  },
  {
    headers: {
      "floopy-smart-select": "selector_cost-optimizer",
    },
  },
);

console.log(response.choices[0].message.content);
```

Python:

```python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.floopy.ai/v1",
    api_key=os.environ["FLOOPY_API_KEY"],
)

# Route through an A/B test
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain relativity simply."}],
    extra_headers={
        "floopy-ab-test": "8a10573a-8e06-4bb0-8c1d-38c973350253",
    },
)

# Route through a Smart Selector
response2 = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Translate this to French."}],
    extra_headers={
        "floopy-smart-select": "selector_cost-optimizer",
    },
)

print(response.choices[0].message.content)
```

cURL:

```bash
# Route through an A/B test
curl https://api.floopy.ai/v1/chat/completions \
  -H "Authorization: Bearer $FLOOPY_API_KEY" \
  -H "Content-Type: application/json" \
  -H "floopy-ab-test: 8a10573a-8e06-4bb0-8c1d-38c973350253" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Explain relativity simply."}]
  }'

# Route through a Smart Selector
curl https://api.floopy.ai/v1/chat/completions \
  -H "Authorization: Bearer $FLOOPY_API_KEY" \
  -H "Content-Type: application/json" \
  -H "floopy-smart-select: selector_cost-optimizer" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Translate this to French."}]
  }'
```

Plan Requirements
| Feature | Required Plan Feature |
|---|---|
| A/B Testing | has_test_ab |
| Smart Selectors | has_smart_selector |
Check your current plan under Settings > Billing to see which features are available for your organization.