A/B Testing
Overview
A/B Testing lets you route a percentage of your traffic to different model or prompt variants and compare their performance with real production data. Instead of guessing which configuration works best, you run a controlled test and let the results guide your decision.
Setting Up a Test
- Go to A/B Testing in the dashboard and click Create Test.
- Define your variants. Each variant specifies a model, provider, and optionally a prompt version.
- Set the traffic split — the percentage of requests routed to each variant. For example, 80% to your current configuration and 20% to the challenger.
- Save and activate the test.
Once active, the gateway automatically routes incoming requests according to the configured split. Routing is consistent per session or per user (depending on your configuration), so each individual user gets a coherent experience across requests.
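To illustrate what consistent routing means, the sketch below hashes a stable identifier into a bucket and maps that bucket onto the traffic split, so the same user always lands on the same variant. This is purely illustrative of the idea, not the gateway's actual implementation; the function and variant names are made up.

```typescript
import { createHash } from "node:crypto";

// Illustrative only: deterministic bucketing so a given user always
// sees the same variant. The gateway does this server-side.
function pickVariant(
  userId: string,
  split: { variant: string; percent: number }[],
): string {
  // Hash the stable identifier to a bucket in [0, 100).
  const digest = createHash("sha256").update(userId).digest();
  const bucket = digest.readUInt32BE(0) % 100;

  let cumulative = 0;
  for (const { variant, percent } of split) {
    cumulative += percent;
    if (bucket < cumulative) return variant;
  }
  // Fallback; unreachable if the percentages sum to 100.
  return split[split.length - 1].variant;
}

// An 80/20 split: this user gets the same variant on every request.
console.log(pickVariant("alice@example.com", [
  { variant: "current", percent: 80 },
  { variant: "challenger", percent: 20 },
]));
```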
Tracking Performance
The A/B Testing dashboard shows real-time metrics for each variant:
- Latency — average and percentile response times.
- Cost — average cost per request and total spend.
- Token usage — input and output tokens per request.
- Feedback scores — if you collect feedback, scores are broken down per variant.
- Error rate — percentage of failed requests per variant.
Use these metrics to determine which variant delivers the best balance of quality, cost, and speed.
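If you export raw request data for your own analysis, the comparison can be as simple as grouping records by variant and computing summary statistics. The sketch below assumes a hypothetical record shape (variant, latencyMs, costUsd); it is not an export format defined by the gateway, and the dashboard computes these metrics for you automatically.

```typescript
// Illustrative only: the record shape is hypothetical.
interface RequestRecord {
  variant: string;
  latencyMs: number;
  costUsd: number;
}

// p95: the latency that 95% of requests stay under.
function p95(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95));
  return sorted[idx];
}

function summarize(records: RequestRecord[]): void {
  // Group records by variant.
  const byVariant = new Map<string, RequestRecord[]>();
  for (const r of records) {
    const bucket = byVariant.get(r.variant) ?? [];
    bucket.push(r);
    byVariant.set(r.variant, bucket);
  }
  // Report p95 latency and mean cost per variant.
  for (const [variant, rs] of byVariant) {
    const meanCost = rs.reduce((sum, r) => sum + r.costUsd, 0) / rs.length;
    const p95Latency = p95(rs.map((r) => r.latencyMs));
    console.log(`${variant}: p95 latency ${p95Latency} ms, mean cost $${meanCost.toFixed(5)}`);
  }
}
```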
Applying a Winner
When you have enough data to make a decision, click Apply Winner on the variant you want to keep. This updates your routing configuration to send 100% of traffic to the winning variant and deactivates the test.
Previous test results are preserved in the dashboard for future reference.
Smart Selectors
Smart Selectors go beyond simple percentage splits. They let you define routing rules based on request properties:
- Content-based routing — route requests containing certain keywords or topics to a specific model. For example, send safety-sensitive prompts to a model with stronger guardrails.
- Cost-based routing — route short, simple prompts to a cheaper model and complex prompts to a more capable one.
- User-based routing — route specific user segments to different variants.
Smart Selectors can be combined with percentage-based splits for hybrid routing strategies.
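As a mental model, a Smart Selector encodes decision logic like the sketch below. The rule format, keywords, and model names here are hypothetical; actual selectors are configured in the dashboard, not written as code.

```typescript
// Illustrative only: the kind of decision logic a Smart Selector
// expresses. Keywords and model names are made-up placeholders.
const SAFETY_KEYWORDS = ["self-harm", "medical advice", "weapons"];

function selectVariant(prompt: string): string {
  // Content-based: send safety-sensitive prompts to a guarded model.
  if (SAFETY_KEYWORDS.some((kw) => prompt.toLowerCase().includes(kw))) {
    return "guardrailed-model";
  }
  // Cost-based: short, simple prompts go to a cheaper model.
  if (prompt.length < 200) {
    return "cheap-model";
  }
  return "capable-model";
}
```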
Best Practices
- Test one variable at a time. Changing both the model and the prompt simultaneously makes it hard to attribute differences.
- Run tests long enough. A few hours of data is rarely sufficient; aim for at least a few days so the results reflect variations in traffic patterns.
- Start with a small split. Route 10-20% of traffic to the challenger variant to limit risk while gathering data.
- Use feedback. Automated metrics like latency and cost are useful, but user feedback provides the strongest signal on output quality.
- Review edge cases. Before applying a winner, manually review a sample of responses from each variant to catch quality issues that metrics might miss.
Using A/B Tests and Smart Selectors via Headers
Instead of relying solely on dashboard-configured routing, you can direct a specific request to an A/B test or Smart Selector by passing the appropriate header.
| Header | Description |
|---|---|
| floopy-ab-test | The UUID of the A/B test to use for routing this request |
| floopy-smart-select | The ID of the Smart Selector to use for routing this request |
You can find the test and selector IDs in the A/B Testing section of the dashboard.
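The examples below show requests routed through an A/B test and a Smart Selector using the OpenAI TypeScript SDK, the OpenAI Python SDK, and curl. The test UUID and selector ID shown are placeholders; substitute the values from your dashboard.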
TypeScript:

```typescript
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

// Route through an A/B test
const response = await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: "Explain relativity simply." }],
  },
  {
    headers: {
      "floopy-ab-test": "8a10573a-8e06-4bb0-8c1d-38c973350253",
    },
  },
);

// Route through a Smart Selector
const response2 = await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: "Translate this to French." }],
  },
  {
    headers: {
      "floopy-smart-select": "selector_cost-optimizer",
    },
  },
);

console.log(response.choices[0].message.content);
```

Python:

```python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.floopy.ai/v1",
    api_key=os.environ["FLOOPY_API_KEY"],
)

# Route through an A/B test
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain relativity simply."}],
    extra_headers={
        "floopy-ab-test": "8a10573a-8e06-4bb0-8c1d-38c973350253",
    },
)

# Route through a Smart Selector
response2 = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Translate this to French."}],
    extra_headers={
        "floopy-smart-select": "selector_cost-optimizer",
    },
)

print(response.choices[0].message.content)
```

cURL:

```bash
# Route through an A/B test
curl https://api.floopy.ai/v1/chat/completions \
  -H "Authorization: Bearer $FLOOPY_API_KEY" \
  -H "Content-Type: application/json" \
  -H "floopy-ab-test: 8a10573a-8e06-4bb0-8c1d-38c973350253" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Explain relativity simply."}]
  }'

# Route through a Smart Selector
curl https://api.floopy.ai/v1/chat/completions \
  -H "Authorization: Bearer $FLOOPY_API_KEY" \
  -H "Content-Type: application/json" \
  -H "floopy-smart-select: selector_cost-optimizer" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Translate this to French."}]
  }'
```

Plan Requirements
| Feature | Required Plan Feature |
|---|---|
| A/B Testing | has_test_ab |
| Smart Selectors | has_smart_selector |
Check your current plan under Settings > Billing to see which features are available for your organization.