Smart Selector

What is Smart Selector

Instead of manually choosing which model to use, Smart Selector automatically learns which model performs best for your use case and routes traffic accordingly. It runs a multi-armed bandit (MAB) system that balances exploration (trying different models to gather data) with exploitation (sending traffic to the current best performer).

Over time, Smart Selector converges on the optimal model for your specific workload without requiring manual A/B testing or guesswork. It adapts continuously — if a model degrades or a new one improves, traffic shifts automatically.

How It Works

```mermaid
graph TD
    A[Request with floopy-smart-select header] --> B[Load Variant Scores from ClickHouse]
    B --> C{Min Samples Met?}
    C -->|No| D[Force Exploration: Round-Robin]
    C -->|Yes| E[Apply Algorithm]
    E --> F{Strategy}
    F -->|Epsilon-Greedy| G[Exploit best or explore random]
    F -->|UCB1| H[Balance score + uncertainty]
    F -->|Thompson Sampling| I[Bayesian probability sampling]
    G --> J[Select Variant]
    H --> J
    I --> J
    D --> J
    J --> K[Route to Selected Provider/Model]
```

When a request arrives with the floopy-smart-select header, the gateway loads the current performance scores for each variant from ClickHouse. If any variant has fewer observations than the configured minimum sample count, the system forces exploration using round-robin to gather enough data. Once all variants meet the minimum, the configured algorithm selects the variant.
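
A simplified TypeScript sketch of that decision (the types and the round-robin counter are illustrative, not the gateway's actual internals):

```typescript
interface VariantStats {
  id: string;
  samples: number;        // observations in the evaluation window
  compositeScore: number; // weighted composite, 0-100
}

function selectVariant(
  variants: VariantStats[],
  minSamples: number,
  requestCounter: number,
  algorithm: (variants: VariantStats[]) => VariantStats,
): VariantStats {
  // Any under-sampled variant forces round-robin exploration.
  if (variants.some((v) => v.samples < minSamples)) {
    return variants[requestCounter % variants.length];
  }
  // Once every variant has enough data, the configured bandit algorithm decides.
  return algorithm(variants);
}
```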

Scoring

Each variant is scored across five dimensions derived from user feedback and cost data:

| Dimension | Source | Description |
|---|---|---|
| Relevance | User feedback | How well the response answers the question |
| Coherence | User feedback | Logical consistency and readability |
| Helpfulness | User feedback | Practical usefulness of the response |
| Safety | User feedback | Absence of harmful or inappropriate content |
| Cost Efficiency | Calculated | The cheapest variant scores 100; others are scored proportionally (e.g., a variant at 2x the cost scores 50) |
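
The cost-efficiency rule is simple enough to state as code. A minimal sketch, with a hypothetical function name:

```typescript
// Cheapest variant scores 100; a variant at 2x the cheapest cost scores 50.
function costEfficiencyScore(costPerRequest: number, cheapestCost: number): number {
  if (costPerRequest <= 0) return 100; // guard against bad cost data
  return Math.min(100, (cheapestCost / costPerRequest) * 100);
}

// Example: cheapest variant costs $0.002/request, this one costs $0.004.
costEfficiencyScore(0.004, 0.002); // => 50
```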

The composite score is a weighted sum of all five dimensions. Floopy provides four presets:

| Preset | Relevance | Coherence | Helpfulness | Safety | Cost Efficiency |
|---|---|---|---|---|---|
| Balanced | 0.25 | 0.20 | 0.25 | 0.15 | 0.15 |
| Quality First | 0.30 | 0.25 | 0.30 | 0.10 | 0.05 |
| Cost Optimized | 0.15 | 0.10 | 0.15 | 0.10 | 0.50 |
| Safety Critical | 0.15 | 0.15 | 0.15 | 0.45 | 0.10 |

You can also define custom weights that sum to 1.0.
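
The composite calculation itself is a plain weighted sum. A minimal sketch, assuming each dimension is already on a 0-100 scale (the type names are illustrative):

```typescript
// Five dimension scores on a 0-100 scale.
interface DimensionScores {
  relevance: number;
  coherence: number;
  helpfulness: number;
  safety: number;
  costEfficiency: number;
}

// Weights share the same shape; values must sum to 1.0.
type Weights = DimensionScores;

function compositeScore(scores: DimensionScores, weights: Weights): number {
  return (
    scores.relevance * weights.relevance +
    scores.coherence * weights.coherence +
    scores.helpfulness * weights.helpfulness +
    scores.safety * weights.safety +
    scores.costEfficiency * weights.costEfficiency
  );
}

// The Balanced preset from the table above.
const balanced: Weights = {
  relevance: 0.25, coherence: 0.20, helpfulness: 0.25, safety: 0.15, costEfficiency: 0.15,
};
```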

Benchmark Prior for New Variants

When a new variant (model) is added to a Smart Selector with no historical data, the system uses public benchmark scores as an initial quality estimate instead of starting from zero.

How it works:

  • Benchmark scores are sourced from the same curated model intelligence database used by Smart Cost Routing (MMLU, HumanEval, SWE-bench, GPQA, and 8 other standardized benchmarks)
  • The score is computed using General intent weights: MMLU 30%, GPQA 15%, HumanEval 15%, MATH 15%, IFEval 15%, HellaSwag 10%
  • Scaled to 0–100 to match feedback dimension range: initial_composite = benchmark_score × 100
  • Models without benchmark data start at 50 (neutral) — exploration will test them regardless
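
A sketch of that prior computation, assuming normalized 0-1 benchmark scores (the weights are the General intent weights listed above; the function and the treatment of partially missing benchmarks are illustrative assumptions):

```typescript
// General intent weights from the model intelligence database.
const GENERAL_WEIGHTS = {
  mmlu: 0.30, gpqa: 0.15, humaneval: 0.15, math: 0.15, ifeval: 0.15, hellaswag: 0.10,
};

type BenchmarkName = keyof typeof GENERAL_WEIGHTS;

// benchmarks: normalized 0-1 scores; a variant with no benchmark data
// starts at the neutral prior of 50.
function benchmarkPrior(benchmarks: Partial<Record<BenchmarkName, number>> | null): number {
  if (!benchmarks || Object.keys(benchmarks).length === 0) return 50;
  let score = 0;
  for (const name of Object.keys(GENERAL_WEIGHTS) as BenchmarkName[]) {
    // Assumption: individually missing benchmarks contribute 0.
    score += (benchmarks[name] ?? 0) * GENERAL_WEIGHTS[name];
  }
  return score * 100; // scale to the 0-100 composite range
}

// Example with strong (illustrative) benchmark scores:
benchmarkPrior({ mmlu: 0.88, gpqa: 0.75, humaneval: 0.92, math: 0.85, ifeval: 0.88, hellaswag: 0.87 }); // => 86.1
```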

Why this matters:

  • Without priors, a new variant needs the full under-sampling phase (N × min_samples requests) before competing with established variants
  • With priors, bandit algorithms make informed decisions immediately
  • UCB1 confidence bound: score + √(2 × ln(total) / n) — with a meaningful initial score, the exploration bonus is calibrated correctly
  • Thompson Sampling benefits from a prior that reflects actual model capability

Example: Adding Claude Sonnet 4.5 to a selector:

  1. Day 0: Starts with composite ~85 based on benchmarks (strong MMLU, GPQA, HumanEval)
  2. Under-sampling phase: Gets minimum samples for baseline feedback
  3. Day 2+: Real feedback takes over, benchmark prior fades naturally
  4. The model competes fairly from the start instead of being penalized for being new

Transition from prior to feedback: As ClickHouse accumulates real feedback data, the aggregation query naturally produces feedback-based scores that replace the benchmark prior. No explicit transition logic — the data source simply shifts from static benchmarks to dynamic user feedback.

Algorithms

Smart Selector supports three selection algorithms. Each balances exploration and exploitation differently.

Epsilon-Greedy (default)

The simplest algorithm. With probability exploration_rate it picks a random variant (exploration); otherwise it picks the variant with the highest composite score (exploitation).

An exploration_rate of 0.1 means 10% of requests go to a random variant and 90% go to the current best. Lower values exploit more aggressively; higher values gather more data.
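
As a sketch (types inlined for self-containment):

```typescript
// With probability explorationRate, pick uniformly at random; otherwise
// pick the variant with the highest composite score.
function epsilonGreedy<T extends { compositeScore: number }>(
  variants: T[],
  explorationRate = 0.1,
): T {
  if (Math.random() < explorationRate) {
    return variants[Math.floor(Math.random() * variants.length)]; // explore
  }
  return variants.reduce((best, v) => (v.compositeScore > best.compositeScore ? v : best)); // exploit
}
```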

UCB1 (Upper Confidence Bound)

UCB1 accounts for uncertainty in the score estimates. It selects the variant that maximizes:

UCB = composite_score + sqrt(2 * ln(total_samples) / variant_samples)

Variants with fewer observations get a larger uncertainty bonus, ensuring they are tried enough times to produce reliable estimates. As data accumulates, the bonus shrinks and the algorithm converges on the best variant. UCB1 requires no tuning parameters.
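
The formula translates directly to code. A sketch, assuming the min-samples gate guarantees every variant has at least one observation:

```typescript
function ucb1<T extends { compositeScore: number; samples: number }>(variants: T[]): T {
  const total = variants.reduce((sum, v) => sum + v.samples, 0);
  // Less-sampled variants receive a larger uncertainty bonus.
  const ucb = (v: T) => v.compositeScore + Math.sqrt((2 * Math.log(total)) / v.samples);
  return variants.reduce((best, v) => (ucb(v) > ucb(best) ? v : best));
}
```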

Thompson Sampling

A Bayesian approach that models each variant’s score as a Beta distribution:

Beta(alpha, beta) where:
alpha = score_rate * n + 1
beta = (1 - score_rate) * n + 1

On each request, the algorithm draws a sample from every variant's distribution and selects the variant with the highest draw. Thompson Sampling naturally balances exploration and exploitation and converges faster than Epsilon-Greedy in most scenarios.
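
Below is a sketch of that sampling step. Beta draws use the standard two-gamma construction with a Marsaglia-Tsang Gamma sampler; score_rate is assumed to be the composite score normalized to 0-1 (that normalization is an assumption, not stated above):

```typescript
// Standard normal via Box-Muller.
function gaussian(): number {
  let u = 0;
  while (u === 0) u = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * Math.random());
}

// Marsaglia-Tsang sampler for Gamma(shape, 1); valid for shape >= 1,
// which always holds here because alpha, beta >= 1.
function sampleGamma(shape: number): number {
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    let x: number;
    let v: number;
    do {
      x = gaussian();
      v = 1 + c * x;
    } while (v <= 0);
    v = v * v * v;
    const u = Math.random();
    if (u < 1 - 0.0331 * x ** 4) return d * v;
    if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v;
  }
}

// Beta(alpha, beta) as a ratio of two Gamma draws.
function sampleBeta(alpha: number, beta: number): number {
  const x = sampleGamma(alpha);
  const y = sampleGamma(beta);
  return x / (x + y);
}

// scoreRate: composite score normalized to 0-1; n: feedback sample count.
function thompsonSelect(variants: { scoreRate: number; n: number }[]): number {
  let bestIndex = 0;
  let bestDraw = -Infinity;
  variants.forEach((v, i) => {
    const draw = sampleBeta(v.scoreRate * v.n + 1, (1 - v.scoreRate) * v.n + 1);
    if (draw > bestDraw) {
      bestDraw = draw;
      bestIndex = i;
    }
  });
  return bestIndex; // index of the selected variant
}
```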

Configuration

Create a Smart Selector in the dashboard under Smart Selectors. Each selector defines:

  • Variants — two or more provider/model combinations that compete against each other. Each variant can optionally include a prompt override (useful for testing different system prompts alongside different models).
  • Algorithm — Epsilon-Greedy, UCB1, or Thompson Sampling.
  • Optimization Mode — Composite, Single, or Cost-Aware (see below).
  • Exploration Rate — applies to Epsilon-Greedy only. Default 0.1.
  • Min Samples — minimum observations per variant before the algorithm activates. Default 30.
  • Evaluation Window — time window for score calculation (e.g., last 7 days). Older data is excluded, letting the selector adapt to model changes.
  • Weight Preset — Balanced, Quality First, Cost Optimized, Safety Critical, or Custom.
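
For illustration, a selector definition pulls these options together roughly like this (a hypothetical shape to summarize the settings; selectors are actually created through the dashboard, and the field names here are not an API contract):

```typescript
const selectorConfig = {
  variants: [
    { provider: "openai", model: "gpt-4o" },
    { provider: "anthropic", model: "claude-sonnet-4-5", promptOverride: "You are concise." },
  ],
  algorithm: "thompson_sampling", // or "epsilon_greedy", "ucb1"
  optimizationMode: "composite",  // or "single", "cost_aware"
  explorationRate: 0.1,           // Epsilon-Greedy only
  minSamples: 30,
  evaluationWindow: "7d",
  weightPreset: "balanced",
};
```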

Usage

Add the floopy-smart-select header with your selector ID. The gateway handles variant selection and routing automatically.

```typescript
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

const response = await client.chat.completions.create(
  {
    model: "gpt-4o", // ignored when a smart selector is active
    messages: [{ role: "user", content: "Draft a marketing email for our new feature." }],
  },
  {
    headers: {
      "floopy-smart-select": "sel_abc123",
    },
  },
);

console.log(response.choices[0].message.content);
```

The model field in the request body is ignored when a smart selector is active — the selector chooses the model. The response includes Floopy-Provider and Floopy-Model headers indicating which variant was selected.
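
To inspect those headers with the openai Node SDK, you can use .withResponse() to get the raw HTTP response alongside the parsed body (the header values below are examples):

```typescript
const { data: completion, response: raw } = await client.chat.completions
  .create(
    {
      model: "gpt-4o",
      messages: [{ role: "user", content: "Draft a marketing email for our new feature." }],
    },
    { headers: { "floopy-smart-select": "sel_abc123" } },
  )
  .withResponse();

console.log(raw.headers.get("Floopy-Provider")); // e.g. "anthropic"
console.log(raw.headers.get("Floopy-Model"));    // e.g. "claude-sonnet-4-5"
```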

Optimization Modes

Composite (default)

Optimizes for the weighted multi-objective composite score. All five dimensions contribute according to the configured weights. This is the recommended mode for most use cases.

Single

Optimizes for a single dimension only (e.g., relevance). All other dimensions are ignored during variant selection. Use this when you have one clear priority and do not want other factors influencing routing.

Cost-Aware

Selects the cheapest variant that meets a configurable quality threshold. The algorithm first filters out any variant whose composite score falls below the threshold, then picks the cheapest remaining option. This mode is ideal for high-volume, cost-sensitive workloads where output quality has a known acceptable floor.
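
A sketch of that filter-then-cheapest rule (the fallback when no variant clears the threshold is an assumption here, not documented above):

```typescript
function costAware<T extends { compositeScore: number; costPerRequest: number }>(
  variants: T[],
  qualityThreshold: number,
): T {
  const eligible = variants.filter((v) => v.compositeScore >= qualityThreshold);
  // Assumed fallback: if nothing clears the bar, take the highest-scoring
  // variant rather than failing the request.
  if (eligible.length === 0) {
    return variants.reduce((best, v) => (v.compositeScore > best.compositeScore ? v : best));
  }
  return eligible.reduce((cheapest, v) =>
    v.costPerRequest < cheapest.costPerRequest ? v : cheapest,
  );
}
```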

Monitoring

The dashboard provides real-time visibility into Smart Selector performance under Smart Selectors > [Your Selector]:

  • Winning variant — which variant currently has the highest composite score.
  • Selection distribution — a chart showing what percentage of traffic each variant receives over time. A healthy selector shows traffic concentrating on the best variant while maintaining a small exploration slice.
  • Score trends — per-variant score graphs across all five dimensions, plotted over the evaluation window.
  • Feedback volume — number of feedback submissions per variant, so you can identify variants that need more data.

Plan Requirements

Smart Selector is available on the Pro plan and above. The feature is gated by the has_smart_selector flag on your organization’s plan. Free and Starter plans can view the Smart Selector configuration page but cannot activate selectors.