Rate Limiting

Overview

Rate limiting protects your organization from runaway costs, abuse, and accidental traffic spikes. Floopy enforces limits at two levels — the gateway level and per API key — so you have both a global safety net and fine-grained control over individual consumers.

When a request exceeds the configured limit, the gateway returns a 429 Too Many Requests response. The response includes a Retry-After header indicating when the client can try again.
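Clients should honor the Retry-After header before retrying. Per the HTTP spec, the value may be either a delay in seconds or an HTTP-date; a minimal sketch of turning it into a wait time (the helper name is illustrative, not part of Floopy's SDK):

```typescript
// Convert a Retry-After header value into a delay in milliseconds.
// The value is either an integer number of seconds or an HTTP-date.
function retryAfterMs(headerValue: string, now: Date = new Date()): number {
  const seconds = Number(headerValue);
  if (Number.isFinite(seconds)) {
    return Math.max(0, seconds * 1000);
  }
  const date = new Date(headerValue);
  if (!Number.isNaN(date.getTime())) {
    return Math.max(0, date.getTime() - now.getTime());
  }
  return 0; // unparseable value: fall back to your own default backoff
}
```

A client loop would sleep for this duration after a 429 before reissuing the request.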

Gateway-Level Rate Limiting

Gateway-level limits apply to all traffic hitting the Floopy gateway:

  • Unauthenticated requests are rate limited by IP address. This prevents anonymous abuse and scanning.
  • Authenticated requests are rate limited by organization with a configurable requests-per-minute (RPM) cap.
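The two bucketing rules above can be sketched as a key-selection function (the Request shape here is hypothetical; Floopy's internal request model is not documented):

```typescript
// Sketch: choose the identifier a gateway-level limit is keyed on.
// Unauthenticated traffic is bucketed by client IP; authenticated
// traffic is bucketed by organization.
interface Request {
  orgId?: string; // present only for authenticated requests
  clientIp: string;
}

function gatewayBucketKey(req: Request): string {
  return req.orgId ? `org:${req.orgId}` : `ip:${req.clientIp}`;
}
```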

The organization-wide RPM limit acts as a ceiling across all API keys. Even if individual keys have higher limits, the total traffic for the organization cannot exceed this value.

You can configure the organization RPM in Settings > Organization.

Per-API-Key Rate Limiting

Each API key can have its own rate limit, independent of the gateway-level limit. This is useful when you have multiple consumers with different trust levels or usage expectations.

To set a per-key limit:

  1. Go to Settings > API Keys.
  2. Select the key you want to configure.
  3. Set the requests per minute value.

A request is blocked if it would exceed either the key-level limit or the organization-level limit; it passes only when both have remaining capacity.
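The interaction between the two limits reduces to a simple conjunction, sketched below (counter and field names are illustrative, not Floopy internals):

```typescript
// Sketch: a request is admitted only if BOTH the per-key counter and
// the organization-wide counter are under their respective limits.
interface Counters {
  keyCount: number; // requests counted against this API key's window
  orgCount: number; // requests counted against the whole organization's window
}

function isAllowed(c: Counters, keyLimit: number, orgLimit: number): boolean {
  return c.keyCount < keyLimit && c.orgCount < orgLimit;
}
```

Note that a key with a generous individual limit can still be throttled when the organization-wide ceiling is exhausted by other keys.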

How Limits Are Enforced

Floopy uses a sliding window algorithm backed by Redis. Unlike fixed windows (which reset at the top of each minute), sliding windows distribute the limit evenly across time and prevent burst traffic at window boundaries.

This approach is consistent across horizontally scaled gateway instances — all nodes share the same rate limit state, so a client cannot bypass limits by hitting different servers.
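To illustrate the algorithm, here is a single-process, in-memory sketch of a sliding-window (sliding-log variant) limiter. Floopy's real implementation keeps this state in Redis so that all gateway nodes share it; this version only demonstrates the windowing behavior:

```typescript
// In-memory sliding-window rate limiter sketch. Each bucket key keeps
// the timestamps of its recent requests; a request is allowed only if
// fewer than `limit` requests fall inside the trailing window.
class SlidingWindowLimiter {
  private timestamps = new Map<string, number[]>();

  constructor(private limit: number, private windowMs: number) {}

  allow(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Drop entries that have aged out of the trailing window.
    const recent = (this.timestamps.get(key) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.timestamps.set(key, recent);
      return false; // over the limit: the gateway would return 429 here
    }
    recent.push(now);
    this.timestamps.set(key, recent);
    return true;
  }
}
```

Because the window trails the current instant rather than resetting on minute boundaries, a burst at 11:59:59 still counts against requests made at 12:00:01.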

Dashboard Metrics

The Rate Limiting section of the dashboard shows:

  • Rate limit events over time — a chart of how many requests were throttled per time period.
  • Most limited users — which API keys or IP addresses are hitting limits most frequently.

Use these metrics to decide whether to increase limits for legitimate consumers or investigate potential abuse.

Custom Rate Limit Policies

You can define a custom rate limit policy on a per-request basis using the floopy-ratelimit-policy header. This overrides the default rate limits configured for the API key.

The header format is:

<limit>;w=<window>;u=<unit>;s=<segment>
  • limit: maximum number of units allowed in the window.
  • w: window duration in seconds.
  • u: unit type, either request (count requests) or cents (count cost in cents).
  • s: segment, one of global (shared across all users), user (per end-user), or a custom segment ID.

For example, 100;w=60;u=request;s=global means 100 requests per 60 seconds across all users sharing this policy.

To attach the policy, pass the header in the per-request options. For example, using the OpenAI SDK pointed at the Floopy gateway:

```typescript
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

const response = await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello" }],
  },
  {
    headers: {
      "floopy-ratelimit-policy": "100;w=60;u=request;s=global",
    },
  },
);

console.log(response.choices[0].message.content);
```
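Validating the header before sending it can catch malformed policies early. A sketch of a parser for the format above (the interface and function names are illustrative, not part of any Floopy SDK):

```typescript
// Parse a floopy-ratelimit-policy header of the form
// <limit>;w=<window>;u=<unit>;s=<segment> into a structured object,
// rejecting malformed values.
interface RateLimitPolicy {
  limit: number;
  windowSeconds: number;
  unit: "request" | "cents";
  segment: string;
}

function parsePolicy(header: string): RateLimitPolicy {
  const [limitPart, ...params] = header.split(";");
  const limit = Number(limitPart);
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new Error(`invalid limit: ${limitPart}`);
  }
  const fields: Record<string, string> = {};
  for (const p of params) {
    const [k, v] = p.split("=");
    fields[k] = v;
  }
  const windowSeconds = Number(fields["w"]);
  if (!Number.isInteger(windowSeconds) || windowSeconds <= 0) {
    throw new Error(`invalid window: ${fields["w"]}`);
  }
  const unit = fields["u"];
  if (unit !== "request" && unit !== "cents") {
    throw new Error(`invalid unit: ${unit}`);
  }
  const segment = fields["s"];
  if (!segment) {
    throw new Error("missing segment");
  }
  return { limit, windowSeconds, unit, segment };
}
```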

Best Practices

  • Start generous, tighten later. Set initial limits above your expected peak traffic and reduce them as you understand your usage patterns.
  • Use per-key limits for external consumers. If you distribute API keys to third parties, per-key limits prevent any single consumer from monopolizing your quota.
  • Monitor before enforcing. Review the rate limiting dashboard for a few days before lowering limits to avoid disrupting legitimate traffic.
  • Combine with caching. Cached responses do not count against rate limits. Enabling caching effectively increases the throughput your consumers experience.