Routing

Overview

Routing rules control how the Floopy gateway distributes requests across your configured LLM providers. You can create multiple routing rules and assign them to different API keys, giving you fine-grained control over traffic flow.

Changes to routing rules take effect immediately. The gateway cache is invalidated automatically when you update a rule in the dashboard.

Routing Strategies

Fallback

Sends requests to providers in a priority order you define. If the first provider fails or is unavailable, the gateway retries with the next provider in the list.

Use fallback when reliability is your top priority.

Priority	Provider	Behavior
1	OpenAI	Try first
2	Anthropic	If OpenAI fails
3	Google Gemini	If both fail

Round-Robin

Distributes requests evenly across providers in rotation. Each successive request goes to the next provider in the list.

Use round-robin to balance load evenly when providers have similar performance.

Weighted

Distributes requests based on percentage weights you assign. For example, you could send 70% of traffic to OpenAI and 30% to Anthropic.

Use weighted routing for cost optimization, A/B testing models, or gradual migrations.

OpenAI:     70%
Anthropic:  30%

Latency-Based

Automatically favors the providers with the lowest P50 latency, based on recent performance data: the faster a provider has been, the larger the share of requests it receives. The gateway continuously measures response times and adjusts routing in real time.

Use latency-based routing when response speed matters most.

How Routing Strategies Work

This section explains the internal mechanics of each strategy — how requests are distributed, how failures are handled, and what guarantees each strategy provides.

Fallback in Detail

graph TD
    A[Request] --> B[Try Provider 1]
    B -->|Success| C[Return Response]
    B -->|Failure| D[Circuit Breaker Records Failure]
    D --> E[Try Provider 2]
    E -->|Success| C
    E -->|Failure| F[Try Provider 3]
    F -->|Success| C
    F -->|No More Providers| G[502 All Providers Exhausted]

Providers are tried in the priority order you define. If a provider returns a 5xx error, times out, or has an open circuit breaker, the gateway immediately moves to the next provider in the list. The response header Floopy-Fallback-Used: true is set whenever a non-primary provider handles the request, so your application can detect when a fallback occurred.
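The flow above can be approximated with a short simulation. This is a minimal sketch, not the gateway's implementation: `route_with_fallback`, `ProviderError`, `AllProvidersExhausted`, and the `circuit_open` callback are hypothetical names introduced for illustration, and the provider call is a plain Python function standing in for a network request.

```python
from typing import Callable


class ProviderError(Exception):
    """Stand-in for a 5xx response or a timeout from a provider."""


class AllProvidersExhausted(Exception):
    """Stand-in for the gateway's 502 response."""


def route_with_fallback(
    providers: list[str],
    call: Callable[[str], str],
    circuit_open: Callable[[str], bool] = lambda name: False,
) -> tuple[str, dict[str, str]]:
    """Try providers in priority order and return (body, response headers)."""
    attempted: list[str] = []
    for index, name in enumerate(providers):
        if circuit_open(name):
            continue  # open circuit: skip without calling the provider
        attempted.append(name)
        try:
            body = call(name)
        except ProviderError:
            continue  # failure: move on to the next provider in the list
        # The header is set only when a non-primary provider answered.
        headers = {"Floopy-Fallback-Used": "true"} if index > 0 else {}
        return body, headers
    raise AllProvidersExhausted(f"attempted: {attempted}")
```

Calling it with a primary provider that always raises `ProviderError` returns the second provider's body together with the `Floopy-Fallback-Used` header.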

Round-Robin in Detail

Round-robin uses an atomic counter stored in Redis to distribute requests evenly across providers. Each incoming request increments the counter and selects the provider at index counter % num_providers. Because the counter is stored in Redis and incremented atomically, distribution remains even across multiple gateway instances. The counter is scoped per routing rule, so different rules maintain independent rotation.
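The selection step can be sketched as follows. To keep the example self-contained, an in-memory class stands in for Redis; its `incr` method mimics the atomic increment. The class, the function name, and the key format `routing:{rule_id}:counter` are assumptions made for illustration, not the gateway's actual schema.

```python
from collections import defaultdict


class InMemoryCounters:
    """Stand-in for Redis; incr() mimics an atomic increment-and-return."""

    def __init__(self) -> None:
        self._values: defaultdict[str, int] = defaultdict(int)

    def incr(self, key: str) -> int:
        self._values[key] += 1
        return self._values[key]


def pick_round_robin(store: InMemoryCounters, rule_id: str, providers: list[str]) -> str:
    """Increment the rule's counter and select providers[counter % num_providers]."""
    counter = store.incr(f"routing:{rule_id}:counter")  # hypothetical key name
    return providers[counter % len(providers)]
```

Because each rule has its own key, two rules that share the same provider list rotate independently of each other.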

Weighted in Detail

Each provider in a weighted rule has a weight between 0 and 100. When a request arrives, a random number is generated (seeded by the request ID for deterministic replay during debugging) and mapped to the cumulative weight distribution. For example, if OpenAI has weight 70 and Anthropic has weight 30, roughly 70% of requests are routed to OpenAI and 30% to Anthropic. Weights do not need to sum to 100 — the gateway normalizes them automatically.
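A minimal sketch of this selection, assuming Python's `random.Random` as the seeded generator (the gateway's actual generator is not specified here) and a hypothetical function name `pick_weighted`:

```python
import bisect
import random
from itertools import accumulate


def pick_weighted(weights: dict[str, float], request_id: str) -> str:
    """Map a request-ID-seeded random number onto the cumulative weights."""
    names = [name for name, weight in weights.items() if weight > 0]
    if not names:
        raise ValueError("at least one provider needs a positive weight")
    cumulative = list(accumulate(weights[name] for name in names))
    rng = random.Random(request_id)  # same request ID -> same choice
    # Scaling by the total is what normalizes weights that do not sum to 100.
    point = rng.random() * cumulative[-1]
    index = bisect.bisect_right(cumulative, point)
    return names[min(index, len(names) - 1)]  # guard against float rounding
```

Seeding by request ID means replaying the same request selects the same provider, while the split across many distinct request IDs approaches the configured weights.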

Latency-Based in Detail

The gateway tracks P50 (median) latency per provider using recent request data from ClickHouse. Providers with lower latency receive proportionally more traffic using inverse weighting:

weight(provider) = 1 / p50_latency(provider)
selection_probability = weight(provider) / sum(all_weights)

For example, if Provider A has a P50 of 200ms and Provider B has a P50 of 800ms, Provider A's weight is four times Provider B's, so Provider A receives about 80% of traffic and Provider B about 20%. Providers with open circuit breakers are excluded from selection entirely, ensuring that failing providers do not receive traffic regardless of their historical latency.
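The two formulas can be checked with a few lines of Python. The function name and the `open_circuits` parameter are hypothetical; the arithmetic follows the inverse-weighting formulas given in this section.

```python
def selection_probabilities(
    p50_ms: dict[str, float],
    open_circuits: frozenset[str] = frozenset(),
) -> dict[str, float]:
    """Return each provider's selection probability from its P50 latency."""
    # weight(provider) = 1 / p50_latency(provider); open circuits are excluded.
    weights = {
        name: 1.0 / latency
        for name, latency in p50_ms.items()
        if name not in open_circuits
    }
    total = sum(weights.values())
    # selection_probability = weight(provider) / sum(all_weights)
    return {name: weight / total for name, weight in weights.items()}
```

With P50 values of 200ms and 800ms, the weights are 0.005 and 0.00125, which normalize to probabilities of 0.8 and 0.2.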

Circuit Breaker

stateDiagram-v2
    [*] --> Closed
    Closed --> Open : Failure rate > threshold
    Open --> HalfOpen : Cooldown elapsed
    HalfOpen --> Closed : Probe succeeds
    HalfOpen --> Open : Probe fails

Each (organization, provider) pair has its own independent circuit breaker. The circuit breaker tracks failures within a sliding time window and transitions between three states:

Closed — Normal operation. Requests are sent to the provider. The circuit breaker monitors the failure rate (failures / total requests) within the tracking window (default: 60 seconds).
Open — When the failure rate exceeds the threshold (default: 50%), the circuit opens. All requests skip this provider immediately — no network call is made. This prevents wasting latency on a provider that is known to be failing.
Half-Open — After the cooldown period (default: 30 seconds), the circuit transitions to half-open. A single probe request is allowed through. If it succeeds, the circuit closes and the provider is fully re-enabled. If it fails, the circuit re-opens for another cooldown period.

The circuit breaker works with all routing strategies and activates automatically. You do not need to configure it manually.
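The three states can be modeled with a small state machine. This is a sketch built on the defaults quoted above (60-second window, 50% threshold, 30-second cooldown); the class and method names are hypothetical, and the current time is passed in explicitly so the behavior is easy to follow. It applies the threshold from the very first recorded request; whether the gateway requires a minimum sample size is not stated here.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CircuitBreaker:
    window_s: float = 60.0    # tracking window
    threshold: float = 0.5    # failure rate that opens the circuit
    cooldown_s: float = 30.0  # time spent open before a probe is allowed
    state: str = "closed"
    opened_at: float = 0.0
    events: deque = field(default_factory=deque)  # (timestamp, succeeded)

    def allow(self, now: float) -> bool:
        """Return True if a request may be sent to the provider."""
        if self.state == "open" and now - self.opened_at >= self.cooldown_s:
            self.state = "half_open"
            return True  # the single probe request
        return self.state == "closed"

    def record(self, succeeded: bool, now: float) -> None:
        """Record the outcome of a request that was allowed through."""
        if self.state == "open":
            return  # no requests are sent while open
        if self.state == "half_open":
            if succeeded:
                self.state = "closed"
                self.events.clear()
            else:
                self.state, self.opened_at = "open", now
            return
        self.events.append((now, succeeded))
        # Drop outcomes that have slid out of the tracking window.
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()
        failures = sum(1 for _, ok in self.events if not ok)
        if failures / len(self.events) > self.threshold:
            self.state, self.opened_at = "open", now
```

In this sketch one breaker instance would exist per (organization, provider) pair, for example in a dictionary keyed by that pair.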

Creating a Routing Rule

1. Go to Settings > Routing in the dashboard.
2. Click Create routing rule.
3. Enter a name for the rule (e.g., production-fallback, low-latency).
4. Select a strategy (fallback, round-robin, weighted, or latency-based).
5. Add the providers you want to include and configure their order, weights, or priority.
6. Click Save.

Assigning a Routing Rule to an API Key

Once you have created a routing rule, assign it to one or more API keys:

1. Go to Settings > API Keys.
2. Click the API key you want to configure.
3. Select the routing rule from the Routing rule dropdown.
4. Save changes.

All requests made with that API key will now follow the selected routing strategy.

Overriding Routing at Request Time

You can override the routing rule for individual requests using the floopy-routing-rule header. You can also use floopy-ab-test and floopy-smart-select headers for A/B testing and smart model selection.

import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

// Override routing rule
const response = await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello" }],
  },
  {
    headers: {
      "floopy-routing-rule": "low-latency-us-east",
    },
  },
);

// A/B test between models
const abResponse = await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello" }],
  },
  {
    headers: {
      "floopy-ab-test": "experiment-q2-2026",
    },
  },
);

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.floopy.ai/v1",
    api_key=os.environ["FLOOPY_API_KEY"],
)

# Override routing rule
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={
        "floopy-routing-rule": "low-latency-us-east",
    },
)

# A/B test between models
ab_response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={
        "floopy-ab-test": "experiment-q2-2026",
    },
)

# Override routing rule
curl https://api.floopy.ai/v1/chat/completions \
  -H "Authorization: Bearer $FLOOPY_API_KEY" \
  -H "Content-Type: application/json" \
  -H "floopy-routing-rule: low-latency-us-east" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

# A/B test between models
curl https://api.floopy.ai/v1/chat/completions \
  -H "Authorization: Bearer $FLOOPY_API_KEY" \
  -H "Content-Type: application/json" \
  -H "floopy-ab-test: experiment-q2-2026" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Circuit Breaker

The gateway includes a built-in circuit breaker that automatically disables a provider when it fails repeatedly. This prevents cascading failures and wasted latency on providers that are down.

When a provider’s error rate exceeds the threshold, the circuit breaker opens and the gateway skips that provider for a cooldown period. After the cooldown, the gateway sends a test request to check if the provider has recovered. If the test succeeds, the provider is re-enabled.

The circuit breaker works with all routing strategies. You do not need to configure it manually — it activates automatically.

Fallback Behavior

When a provider returns an error (5xx, timeout, or connection failure), the gateway's behavior depends on the routing strategy:

Fallback — Tries the next provider in priority order.
Round-robin / Weighted — Retries with the next available provider.
Latency-based — Falls back to the provider with the next-lowest latency.

If all providers fail, the gateway returns a 502 Bad Gateway response with details about which providers were attempted. The Floopy-Fallback-Used response header is set to "true" when a fallback provider handled the request.
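On the client side, detecting a fallback comes down to reading that header. The helper below is a small sketch; `fallback_used` is a hypothetical name, and the case-insensitive comparison is a defensive assumption, since HTTP libraries differ in how they expose header names.

```python
from typing import Mapping


def fallback_used(headers: Mapping[str, str]) -> bool:
    """Return True if the response headers report that a fallback provider answered."""
    for name, value in headers.items():
        if name.lower() == "floopy-fallback-used":
            return value.strip().lower() == "true"
    return False
```

With the OpenAI Python SDK, response headers are available by calling `client.chat.completions.with_raw_response.create(...)`: the returned object exposes `.headers`, and `.parse()` yields the usual completion object.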

Example Configuration

A common production setup:

Create a routing rule named production with the fallback strategy:
- Priority 1: OpenAI (gpt-4o)
- Priority 2: Anthropic (claude-3-5-sonnet)
- Priority 3: Google Gemini (gemini-2.5-pro)
Assign this rule to your production API key.

If OpenAI experiences an outage, your application automatically falls back to Anthropic, then Gemini — with no code changes and no downtime.

Smart Cost Routing

Combine routing strategies with intelligent cost optimization. When enabled on a routing rule, Smart Cost Routing analyzes prompt complexity and automatically routes simple requests to cheaper models.

Learn more about Smart Cost Routing →