# Fallbacks

## Why Fallbacks Matter
Provider outages happen. OpenAI, Anthropic, and Google all experience downtime, rate limiting, and degraded performance. If your application depends on a single provider, a single outage takes you down with it.
Fallbacks ensure your application stays up by automatically routing to an alternative provider when the primary one fails. Your users never see the difference.
## How Fallbacks Work
When a request fails, the gateway tries the next provider in your fallback chain. Combined with the circuit breaker, providers that are already known to be failing are skipped entirely — no wasted latency on doomed requests.
```mermaid
graph TD
    A[Incoming Request] --> B{Circuit open for Provider A?}
    B -->|Yes| D{Circuit open for Provider B?}
    B -->|No| PA[Provider A]
    PA -->|Success| R[Return Response]
    PA -->|Failure| FA[Record Failure] --> D
    D -->|Yes| F{Circuit open for Provider C?}
    D -->|No| PB[Provider B]
    PB -->|Success| R
    PB -->|Failure| FB[Record Failure] --> F
    F -->|Yes| X[Return Error: all providers exhausted]
    F -->|No| PC[Provider C]
    PC -->|Success| R
    PC -->|Failure| FC[Record Failure] --> X
```

The gateway translates the request format between providers automatically. A request written for `gpt-4o` is adapted to the correct format when it falls back to `claude-sonnet-4-6` or `gemini-2.5-pro`.
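The routing loop above can be sketched in a few lines of Python. This is a simplified illustration of the behavior described here, not the gateway's actual implementation; the `circuit_breaker` and `call_provider` interfaces are hypothetical:

```python
def route_with_fallbacks(request, providers, circuit_breaker, call_provider):
    """Try each provider in order, skipping any whose circuit is open."""
    for provider in providers:
        if circuit_breaker.is_open(provider):
            continue  # provider is known to be failing; spend no latency on it
        try:
            response = call_provider(provider, request)
            circuit_breaker.record_success(provider)
            return response
        except Exception:
            # Feed the failure back into the breaker, then try the next provider
            circuit_breaker.record_failure(provider)
    raise RuntimeError("all providers exhausted")
```

The key property is that an open circuit short-circuits the attempt entirely, so a known-bad provider adds no latency to the chain.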
## Configuring Fallbacks
There are two ways to set up fallbacks.
### Via Routing Rules
Create a fallback routing rule in the dashboard under Routing > Rules. Define a priority-ordered list of provider/model pairs. The gateway tries each in order when the higher-priority provider fails.
This approach is best when you want centralized control and the ability to change fallback behavior without deploying code.
### Via Model String
Pass a comma-separated list of models in the model field of your request:
"model": "gpt-4o,claude-sonnet-4-6,gemini-2.5-pro"The gateway tries each model in order, left to right. This approach is the simplest — no dashboard configuration required.
## Code Examples
```typescript
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

const response = await client.chat.completions.create({
  model: "gpt-4o,claude-sonnet-4-6,gemini-2.5-pro",
  messages: [{ role: "user", content: "Summarize the latest AI news." }],
});

console.log(response.choices[0].message.content);
```

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.floopy.ai/v1",
    api_key=os.environ["FLOOPY_API_KEY"],
)

response = client.chat.completions.create(
    model="gpt-4o,claude-sonnet-4-6,gemini-2.5-pro",
    messages=[{"role": "user", "content": "Summarize the latest AI news."}],
)

print(response.choices[0].message.content)
```

```bash
curl https://api.floopy.ai/v1/chat/completions \
  -H "Authorization: Bearer $FLOOPY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o,claude-sonnet-4-6,gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Summarize the latest AI news."}]
  }'
```

## Circuit Breaker Integration
The gateway tracks the failure rate for each provider using a sliding window. When a provider’s failure rate exceeds the configured threshold, its circuit opens and the provider is skipped automatically for subsequent requests. This avoids wasting latency and tokens on providers that are known to be down.
After a cooldown period, the circuit enters a half-open state and allows a single probe request through. If the probe succeeds, the circuit closes and the provider is back in rotation. If it fails, the circuit stays open for another cooldown interval.
This means fallback chains respond instantly to outages — the first few failures trigger the circuit, and all subsequent requests skip the failing provider with zero added latency.
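The open, half-open, and closed states described above can be sketched as a small state machine. This is an illustrative sketch only: the gateway's real threshold, window, and cooldown are configurable, and a probe flag plus injectable clock are assumptions made here for clarity:

```python
import time

class CircuitBreaker:
    """Illustrative breaker: opens after repeated failures, then lets a
    single probe request through once the cooldown has elapsed."""

    def __init__(self, failure_threshold=5, cooldown=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed
        self.probing = False    # True while a half-open probe is in flight

    def allow_request(self):
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        if self.clock() - self.opened_at >= self.cooldown and not self.probing:
            self.probing = True  # half-open: admit exactly one probe
            return True
        return False  # open: skip this provider

    def record_success(self):
        self.failures = 0
        self.opened_at = None
        self.probing = False  # probe succeeded: close the circuit

    def record_failure(self):
        if self.probing:
            self.opened_at = self.clock()  # probe failed: restart the cooldown
            self.probing = False
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # threshold hit: open the circuit
```

Passing the clock in as a parameter keeps the state machine deterministic and easy to test; the production gateway additionally evaluates failures over a sliding window rather than a simple counter.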
## Detecting Fallbacks
When a fallback provider handles your request, the response includes additional headers:
| Header | Description |
|---|---|
| `Floopy-Fallback-Used` | `true` if the response came from a non-primary provider |
| `Floopy-Provider` | The provider that actually handled the request (e.g., `anthropic`) |
| `Floopy-Model` | The model that actually generated the response (e.g., `claude-sonnet-4-6`) |
Use these headers to log which provider served each request and to monitor your fallback rate over time.
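A small helper can normalize these headers for logging. The helper below is a sketch, not part of any SDK; it assumes the header values shown in the table above and treats header names case-insensitively, as HTTP requires:

```python
def fallback_info(headers):
    """Extract Floopy fallback metadata from a response's headers.

    Accepts any mapping of header name to value; lookups are
    case-insensitive to match HTTP field-name semantics.
    """
    lowered = {k.lower(): v for k, v in dict(headers).items()}
    return {
        "fallback_used": lowered.get("floopy-fallback-used") == "true",
        "provider": lowered.get("floopy-provider"),
        "model": lowered.get("floopy-model"),
    }
```

With the OpenAI Python SDK, raw headers are reachable via `client.chat.completions.with_raw_response.create(...)`: the returned object exposes `.headers`, and `.parse()` yields the usual completion object.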
## Best Practices
- Order by preference. Put your fastest or cheapest provider first. Fallbacks are tried left to right, so the primary provider handles the majority of traffic under normal conditions.
- Mix provider families. Using `gpt-4o,claude-sonnet-4-6,gemini-2.5-pro` gives you true redundancy across OpenAI, Anthropic, and Google. If you list three OpenAI models, a platform-wide OpenAI outage takes out your entire chain.
- Monitor fallback rate. A rising fallback rate in the dashboard is an early signal that your primary provider is degraded, even before a full outage. Set up alerts to catch this.
- Test your chain. Verify that your fallback models produce acceptable output quality for your use case. A fallback that returns poor results is worse than a clear error in many applications.
- Combine with caching. Cached responses are returned before the fallback chain is evaluated. High cache hit rates reduce your exposure to provider failures.
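The caching point can be made concrete with a short sketch. This is illustrative only: `cache` stands for any dict-like store, and `route_fallbacks` stands in for the gateway's fallback routing:

```python
def handle_request(request_key, request, cache, route_fallbacks):
    # The cache is consulted before any provider is contacted,
    # so cache hits are served even during a total provider outage.
    if request_key in cache:
        return cache[request_key]
    response = route_fallbacks(request)
    cache[request_key] = response
    return response
```

Because the lookup precedes routing, every cache hit is one fewer request exposed to provider failures.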