API Reference
The Floopy gateway exposes an OpenAI-compatible API. Point any OpenAI SDK at the gateway by changing its base URL (baseURL in the JavaScript SDK, base_url in the Python SDK), and the rest of your existing code works without modification.
Endpoint

    POST https://api.floopy.ai/v1/chat/completions

Authentication
Include your Floopy API key in the Authorization header:

    Authorization: Bearer <your-floopy-api-key>

Request Body

    {
      "model": "gpt-4o",
      "messages": [
        { "role": "system", "content": "You are a helpful assistant." },
        { "role": "user", "content": "Hello!" }
      ],
      "temperature": 0.7,
      "max_tokens": 1000,
      "stream": false,
      "inputs": { "name": "Alice" }
    }

Fields
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID (e.g., "gpt-4o"). Comma-separated for fallback: "gpt-4o,claude-sonnet-4-6" |
| messages | array | Yes | Array of message objects with role ("system", "user", "assistant") and content (string) |
| temperature | number | No | 0.0-2.0. Controls randomness. Default: model default |
| max_tokens | integer | No | Maximum tokens in response |
| top_p | number | No | 0.0-1.0. Nucleus sampling threshold |
| frequency_penalty | number | No | -2.0 to 2.0. Reduces token repetition |
| presence_penalty | number | No | -2.0 to 2.0. Encourages new topics |
| stop | array | No | Strings that stop generation when encountered |
| reasoning_effort | string | No | "low", "medium", "high". For o1/o3 models only |
| response_format | object | No | JSON mode: {"type": "json_object"} |
| stream | boolean | No | Enable SSE streaming. Default: false |
| inputs | object | No | Key-value pairs for prompt template variable substitution. Stripped before forwarding to provider |
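
As a rough illustration of how the fields above fit together, the following sketch builds (but does not send) a chat completions request using only the Python standard library. The helper name build_chat_request and the placeholder key are hypothetical and not part of the Floopy API; the endpoint URL, header, and field names come from this page.

```python
import json
import urllib.request

GATEWAY_URL = "https://api.floopy.ai/v1/chat/completions"


def build_chat_request(api_key, models, messages, inputs=None, **options):
    """Build (but do not send) a POST request for the chat completions endpoint.

    Hypothetical helper for illustration only.
    """
    body = {
        # Several model IDs are joined into one comma-separated fallback string.
        "model": ",".join(models),
        "messages": messages,
    }
    # Optional fields (temperature, max_tokens, stream, ...) pass through as given.
    body.update(options)
    if inputs:
        # Template variables; the gateway strips these before forwarding.
        body["inputs"] = inputs
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    api_key="sk-example",  # placeholder, not a real key
    models=["gpt-4o", "claude-sonnet-4-6"],
    messages=[{"role": "user", "content": "Hello!"}],
    inputs={"name": "Alice"},
    temperature=0.7,
)
payload = json.loads(request.data)
print(payload["model"])
```

Passing the resulting object to urllib.request.urlopen would send it; the sketch stops short of that so it can be run without a key.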
Response Body (non-streaming)

    {
      "id": "chatcmpl-abc123",
      "object": "chat.completion",
      "created": 1717000000,
      "model": "gpt-4o",
      "choices": [
        {
          "index": 0,
          "message": { "role": "assistant", "content": "Hello! How can I help?" },
          "finish_reason": "stop"
        }
      ],
      "usage": { "prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20 }
    }

Response Body (streaming)
When stream: true, the gateway returns Server-Sent Events (SSE). Each event contains a chunk:

    data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1717000000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

The stream ends with:

    data: [DONE]

Response Headers
The gateway adds these headers to every response:
| Header | Description | Example |
|---|---|---|
| Floopy-Provider | The provider that handled the request | OpenAI |
| Floopy-Model | The model that processed the request | gpt-4o |
| Floopy-Fallback-Used | Whether a fallback provider was used because the primary was unavailable | true |
| Floopy-Reasoning-Tokens | Number of reasoning tokens used (DeepSeek models) | 150 |
| Floopy-Queue-Time | Time the request spent in the provider queue, in seconds (Groq) | 0.5 |
| Floopy-Prompt-Time | Time spent processing the prompt, in seconds (Groq) | 0.2 |
| Floopy-Completion-Time | Time spent generating the completion, in seconds (Groq) | 1.3 |
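
A client consuming a streamed response typically needs two things: the text carried in the SSE chunks and the metadata in the Floopy-* headers. The sketch below shows one way to extract both from already-received data. The function names are hypothetical, the second sample chunk is invented for the example, and the handling of missing headers is an assumption (the provider-specific headers are treated as optional).

```python
import json


def collect_stream_text(lines):
    """Concatenate delta content from SSE lines until the [DONE] sentinel."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


def gateway_metadata(headers):
    """Pick out the Floopy-* headers and convert the numeric ones."""
    lowered = {name.lower(): value for name, value in headers.items()}
    meta = {
        "provider": lowered.get("floopy-provider"),
        "model": lowered.get("floopy-model"),
        # Assumption: anything other than "true" means no fallback was used.
        "fallback_used": lowered.get("floopy-fallback-used", "").lower() == "true",
    }
    # Assumption: provider-specific headers may be absent, so they are optional here.
    if "floopy-reasoning-tokens" in lowered:
        meta["reasoning_tokens"] = int(lowered["floopy-reasoning-tokens"])
    for key in ("queue_time", "prompt_time", "completion_time"):
        header = "floopy-" + key.replace("_", "-")
        if header in lowered:
            meta[key] = float(lowered[header])
    return meta


# Sample data shaped like the examples on this page.
sample_lines = [
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1717000000,'
    '"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1717000000,'
    '"model":"gpt-4o","choices":[{"index":0,"delta":{"content":" there"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]
sample_headers = {
    "Floopy-Provider": "OpenAI",
    "Floopy-Model": "gpt-4o",
    "Floopy-Fallback-Used": "true",
    "Floopy-Queue-Time": "0.5",
}

print(collect_stream_text(sample_lines))
print(gateway_metadata(sample_headers))
```

Header names are compared case-insensitively because HTTP clients differ in how they normalize them.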
Error Response Format
Standard errors return a JSON body:

    {
      "error": {
        "type": "error",
        "code": "ERROR_CODE",
        "message": "Human-readable description"
      }
    }

Rate limit errors (429) include a Retry-After header with the number of seconds to wait.
Monthly limit errors include additional fields:

    {
      "error": {
        "type": "error",
        "code": "MONTHLY_LIMIT_EXCEEDED",
        "message": "Monthly request limit reached",
        "limit": 100000,
        "used": 100000,
        "reset_at": "2025-02-01T00:00:00Z"
      }
    }

See Troubleshooting for a full list of error codes and how to resolve them.
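
To make the two limit cases above concrete, here is a minimal sketch of how a client might decide how long to wait before retrying. The function name retry_delay_seconds is hypothetical, and the 1-second fallback for a 429 without a Retry-After header is an assumption rather than documented gateway behavior.

```python
from datetime import datetime, timezone


def retry_delay_seconds(status, headers, body, now=None):
    """Return seconds to wait before retrying, or None if waiting will not help.

    Hypothetical helper: status is the HTTP status code, headers a dict of
    response headers, body the parsed JSON error response.
    """
    error = body.get("error", {})
    if error.get("code") == "MONTHLY_LIMIT_EXCEEDED":
        # reset_at is an ISO 8601 timestamp ending in "Z" (UTC).
        reset_at = datetime.fromisoformat(error["reset_at"].replace("Z", "+00:00"))
        now = now or datetime.now(timezone.utc)
        return max(0.0, (reset_at - now).total_seconds())
    if status == 429:
        lowered = {name.lower(): value for name, value in headers.items()}
        # Assumption: fall back to 1 second if the header is unexpectedly missing.
        return float(lowered.get("retry-after", 1))
    return None


monthly_error = {
    "error": {
        "type": "error",
        "code": "MONTHLY_LIMIT_EXCEEDED",
        "message": "Monthly request limit reached",
        "limit": 100000,
        "used": 100000,
        "reset_at": "2025-02-01T00:00:00Z",
    }
}
one_minute_before = datetime(2025, 1, 31, 23, 59, tzinfo=timezone.utc)
print(retry_delay_seconds(429, {}, monthly_error, now=one_minute_before))
```

The error code is checked before the status because this page does not state which HTTP status accompanies a monthly limit error.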
Session Feedback
Submit session-level feedback to improve routing quality. Requests are authenticated with the same API key used for chat requests.

    POST https://api.floopy.ai/v1/feedback

Request Body

    {
      "session_id": "sess_abc123",
      "score": 8,
      "useful": true
    }

Fields
| Field | Type | Required | Description |
|---|---|---|---|
| session_id | string | Yes | The session ID sent in the floopy-session-id header during chat requests |
| score | integer | Yes | NPS-style score from 0 to 10. 0-6 = detractor, 7-8 = passive, 9-10 = promoter |
| useful | boolean | Yes | Whether the conversation was useful to the end user |
Response
Returns 200 OK with an empty JSON object on success.

    {}

See Feedback for details on how feedback drives routing optimization.
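
The score bands in the feedback fields can be checked on the client before submitting. The sketch below validates a score and builds the request body. Both function names are hypothetical, and the client-side validation is an illustration rather than documented gateway behavior.

```python
def nps_category(score):
    """Map a 0-10 score to its NPS band, rejecting anything out of range."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 10:
        raise ValueError("score must be an integer from 0 to 10")
    if score <= 6:
        return "detractor"
    if score <= 8:
        return "passive"
    return "promoter"


def build_feedback(session_id, score, useful):
    """Build the JSON body for POST /v1/feedback (hypothetical helper)."""
    nps_category(score)  # raises ValueError on an invalid score
    return {"session_id": session_id, "score": score, "useful": bool(useful)}


print(nps_category(8))
print(build_feedback("sess_abc123", 8, True))
```

The session_id passed here should be the same value sent in the floopy-session-id header during the chat requests being rated.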
Embeddings
OpenAI-compatible embeddings endpoint. The gateway forwards each request to whichever provider is configured for the requested model. Embeddings currently route through OpenAI-compatible providers (OpenAI, Mistral, Together, Groq, DeepInfra, Nebius, Fireworks, Cohere, Azure, etc.). Anthropic does not expose an embeddings API; Bedrock, Gemini, and Vertex are not yet supported and are reserved for follow-up work.
Endpoint

    POST /v1/embeddings

Authentication
Same Bearer token as chat completions:

    Authorization: Bearer <FLOOPY_API_KEY>

Request Body

    {
      "model": "text-embedding-3-small",
      "input": "The quick brown fox jumps over the lazy dog"
    }

input may be a single string or an array of strings for batch embedding:

    {
      "model": "text-embedding-3-small",
      "input": ["first text", "second text", "third text"]
    }

Fields
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Embedding model ID (e.g., text-embedding-3-small, text-embedding-3-large) |
| input | string or string[] | Yes | Text(s) to embed |
| dimensions | integer | No | Output vector length, when the model supports it (e.g., OpenAI text-embedding-3-*) |
| encoding_format | string | No | float (default) or base64 |
| user | string | No | Free-form caller identifier, forwarded to the provider for analytics |
Response Body

    {
      "object": "list",
      "data": [
        {
          "object": "embedding",
          "embedding": [0.0123, -0.0456, 0.0789, "..."],
          "index": 0
        }
      ],
      "model": "text-embedding-3-small",
      "usage": { "prompt_tokens": 9, "total_tokens": 9 }
    }

For batched inputs, data contains one entry per input in request order.
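
Because each entry in data carries an index, a client can pair vectors with their source texts explicitly instead of relying on list order alone. A minimal sketch follows. The function name vectors_by_input is hypothetical, and the example assumes the default float encoding_format.

```python
def vectors_by_input(inputs, response):
    """Pair each input text with its embedding from a parsed response body."""
    # A single string is treated as a batch of one.
    texts = [inputs] if isinstance(inputs, str) else list(inputs)
    # Sort defensively by index so the pairing does not depend on list order.
    data = sorted(response["data"], key=lambda item: item["index"])
    if len(data) != len(texts):
        raise ValueError("response does not contain one embedding per input")
    return [(text, item["embedding"]) for text, item in zip(texts, data)]


# Invented sample response with shortened vectors, deliberately out of order.
sample_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "embedding": [0.3, 0.4], "index": 1},
        {"object": "embedding", "embedding": [0.1, 0.2], "index": 0},
    ],
    "model": "text-embedding-3-small",
}

for text, vector in vectors_by_input(["first text", "second text"], sample_response):
    print(text, vector)
```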
Code Samples
curl

    curl https://api.floopy.ai/v1/embeddings \
      -H "Authorization: Bearer $FLOOPY_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"model":"text-embedding-3-small","input":"hello"}'

Python (OpenAI SDK)

    import os

    from openai import OpenAI

    # Point the OpenAI SDK at the Floopy gateway.
    client = OpenAI(api_key=os.environ["FLOOPY_API_KEY"], base_url="https://api.floopy.ai/v1")
    resp = client.embeddings.create(model="text-embedding-3-small", input="hello")
    print(resp.data[0].embedding[:5])

JavaScript (OpenAI SDK)

    import OpenAI from "openai";

    const client = new OpenAI({ apiKey: process.env.FLOOPY_API_KEY, baseURL: "https://api.floopy.ai/v1" });
    const resp = await client.embeddings.create({ model: "text-embedding-3-small", input: "hello" });
    console.log(resp.data[0].embedding.slice(0, 5));

Error Response Format
Errors use the OpenAI-compatible error envelope. When no provider configured for the org supports the requested embedding model, the gateway returns 400 with:

    {
      "error": {
        "message": "The model `claude-3-5-sonnet` does not exist or you do not have access to it.",
        "type": "invalid_request_error",
        "code": "model_not_found",
        "param": "model"
      }
    }

Other failure modes (auth, rate limit, provider 5xx) reuse the same error shape used by /v1/chat/completions.
Health Endpoints
Use these endpoints to monitor gateway and dependency status.
| Endpoint | Description |
|---|---|
| GET /health | Liveness probe (always 200) |
| GET /health/ready | Readiness probe (checks Redis, ClickHouse, Qdrant) |
| GET /health/redis | Redis connectivity check |
| GET /health/clickhouse | ClickHouse connectivity check |
| GET /health/qdrant | Qdrant connectivity check |
| GET /health/providers | Circuit breaker state per provider |
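
A monitoring script might poll the per-dependency endpoints and report which ones are failing. The sketch below is one way to do that with the standard library. It assumes the health endpoints are served from https://api.floopy.ai and that a 200 status means healthy while any other status or a connection failure means unhealthy; neither is stated on this page. The function names are hypothetical.

```python
import urllib.error
import urllib.request

BASE_URL = "https://api.floopy.ai"  # assumption: health endpoints share the API host

DEPENDENCY_PATHS = {
    "redis": "/health/redis",
    "clickhouse": "/health/clickhouse",
    "qdrant": "/health/qdrant",
}


def http_status(url, timeout=5.0):
    """Return the HTTP status code for a GET request, including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code


def check_dependencies(get_status=http_status, base_url=BASE_URL):
    """Return {dependency: healthy?}, treating only a 200 status as healthy."""
    results = {}
    for name, path in DEPENDENCY_PATHS.items():
        try:
            results[name] = get_status(base_url + path) == 200
        except OSError:
            # Connection failures and timeouts count as unhealthy.
            results[name] = False
    return results
```

Calling check_dependencies() with no arguments performs real HTTP requests; passing a custom get_status callable makes the logic testable offline.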