API Reference

The Floopy gateway exposes an OpenAI-compatible API. Point any OpenAI SDK at the gateway by changing the baseURL, and your existing code works without modification.

POST https://api.floopy.ai/v1/chat/completions

Include your Floopy API key in the Authorization header:

Authorization: Bearer <your-floopy-api-key>

Request body:

{
"model": "gpt-4o",
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "Hello!" }
],
"temperature": 0.7,
"max_tokens": 1000,
"stream": false,
"inputs": { "name": "Alice" }
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model ID (e.g., "gpt-4o"). Comma-separated for fallback: "gpt-4o,claude-sonnet-4-6" |
| messages | array | Yes | Array of message objects with role ("system", "user", "assistant") and content (string) |
| temperature | number | No | 0.0-2.0. Controls randomness. Default: model default |
| max_tokens | integer | No | Maximum tokens in response |
| top_p | number | No | 0.0-1.0. Nucleus sampling threshold |
| frequency_penalty | number | No | -2.0 to 2.0. Reduces token repetition |
| presence_penalty | number | No | -2.0 to 2.0. Encourages new topics |
| stop | array | No | Strings that stop generation when encountered |
| reasoning_effort | string | No | "low", "medium", "high". For o1/o3 models only |
| response_format | object | No | JSON mode: {"type": "json_object"} |
| stream | boolean | No | Enable SSE streaming. Default: false |
| inputs | object | No | Key-value pairs for prompt template variable substitution. Stripped before forwarding to provider |

Response:

{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1717000000,
"model": "gpt-4o",
"choices": [
{
"index": 0,
"message": { "role": "assistant", "content": "Hello! How can I help?" },
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 12,
"completion_tokens": 8,
"total_tokens": 20
}
}
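
The request and response shapes above can be exercised without an SDK. The following is a minimal sketch using only the Python standard library; the helper names build_chat_request, send, and first_reply are illustrative and not part of the Floopy API. It joins several model IDs with commas to request fallback, as the model field describes.

```python
import json
import urllib.request

FLOOPY_CHAT_URL = "https://api.floopy.ai/v1/chat/completions"


def build_chat_request(api_key, models, messages, **params):
    """Build a POST request for the chat completions endpoint.

    `models` is a list of model IDs. More than one entry is joined with
    commas, which the `model` field documents as the fallback syntax.
    Extra keyword arguments (temperature, max_tokens, ...) are copied
    into the JSON body unchanged.
    """
    body = {"model": ",".join(models), "messages": messages, **params}
    return urllib.request.Request(
        FLOOPY_CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30.0):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


def first_reply(response_body):
    """Return the assistant text from a non-streaming response."""
    return response_body["choices"][0]["message"]["content"]
```

Calling first_reply(send(build_chat_request(key, ["gpt-4o"], messages))) would return the assistant text, assuming the key is valid and the gateway answers with the response shape shown above.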

When stream: true, the gateway returns Server-Sent Events (SSE). Each event contains a chunk:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1717000000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

The stream ends with:

data: [DONE]
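
A client that does not use an SDK has to decode these events itself. The sketch below assumes the stream has already been split into text lines (for example by iterating over the HTTP response); the function name iter_stream_text is illustrative. It yields each content fragment and stops at the [DONE] sentinel.

```python
import json


def iter_stream_text(lines):
    """Yield content fragments from an iterable of SSE text lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Joining the yielded fragments with "".join(...) reconstructs the full assistant message.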

The gateway adds these headers to every response:

| Header | Description | Example |
| --- | --- | --- |
| Floopy-Provider | The provider that handled the request | OpenAI |
| Floopy-Model | The model that processed the request | gpt-4o |
| Floopy-Fallback-Used | Whether a fallback provider was used because the primary was unavailable | true |
| Floopy-Reasoning-Tokens | Number of reasoning tokens used (DeepSeek models) | 150 |
| Floopy-Queue-Time | Time the request spent in the provider queue, in seconds (Groq) | 0.5 |
| Floopy-Prompt-Time | Time spent processing the prompt, in seconds (Groq) | 0.2 |
| Floopy-Completion-Time | Time spent generating the completion, in seconds (Groq) | 1.3 |
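
For logging or metrics it can help to collect these headers into one typed record. The sketch below is an assumption-laden example: the value formats (the literal "true", an integer token count, decimal seconds) are inferred from the Example column, and read_floopy_headers is a hypothetical helper name. Headers missing from a response are simply omitted from the result.

```python
def read_floopy_headers(headers):
    """Collect Floopy-* response headers into a dict with typed values.

    `headers` is any mapping of header name to string value. Names are
    matched case-insensitively, as HTTP header names are.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    # Value formats below are assumed from the documented examples.
    converters = {
        "floopy-provider": str,
        "floopy-model": str,
        "floopy-fallback-used": lambda v: v.strip().lower() == "true",
        "floopy-reasoning-tokens": int,
        "floopy-queue-time": float,
        "floopy-prompt-time": float,
        "floopy-completion-time": float,
    }
    return {
        name: convert(lowered[name])
        for name, convert in converters.items()
        if name in lowered
    }
```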

Standard errors return a JSON body:

{
"error": {
"type": "error",
"code": "ERROR_CODE",
"message": "Human-readable description"
}
}

Rate limit errors (429) include a Retry-After header with the number of seconds to wait.

Monthly limit errors include additional fields:

{
"error": {
"type": "error",
"code": "MONTHLY_LIMIT_EXCEEDED",
"message": "Monthly request limit reached",
"limit": 100000,
"used": 100000,
"reset_at": "2025-02-01T00:00:00Z"
}
}

See Troubleshooting for a full list of error codes and how to resolve them.
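
The error shapes above can be folded into one small parser that tells the caller how long to wait. This is a sketch under stated assumptions: parse_error is a hypothetical helper, Retry-After is read as a plain number of seconds as described above, and the monthly limit case is detected by its code rather than by HTTP status, since the status for that error is not stated here.

```python
import json


def parse_error(status, headers, body_text):
    """Summarize a gateway error response.

    Returns a dict with the error code and message, plus `retry_after`
    (seconds, for 429 responses) and `reset_at` (ISO timestamp, for
    monthly limit errors) when those are present.
    """
    error = json.loads(body_text).get("error", {})
    lowered = {name.lower(): value for name, value in headers.items()}
    info = {
        "status": status,
        "code": error.get("code"),
        "message": error.get("message"),
        "retry_after": None,
        "reset_at": None,
    }
    if status == 429 and "retry-after" in lowered:
        info["retry_after"] = float(lowered["retry-after"])
    if error.get("code") == "MONTHLY_LIMIT_EXCEEDED":
        info["reset_at"] = error.get("reset_at")
    return info
```

A caller would typically sleep for retry_after seconds before retrying a rate-limited request, and stop retrying until reset_at when the monthly limit is reached.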

Submit session-level feedback to improve routing quality. Authenticated with the same API key used for chat requests.

POST https://api.floopy.ai/v1/feedback

Request body:

{
"session_id": "sess_abc123",
"score": 8,
"useful": true
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| session_id | string | Yes | The session ID sent in the floopy-session-id header during chat requests |
| score | integer | Yes | NPS-style score from 0 to 10. 0-6 = detractor, 7-8 = passive, 9-10 = promoter |
| useful | boolean | Yes | Whether the conversation was useful to the end user |

Returns 200 OK with an empty JSON object on success.

{}

See Feedback for details on how feedback drives routing optimization.
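
Validating the score on the client avoids sending a payload the gateway would have to reject. The sketch below builds the feedback request with the standard library and maps a score to its NPS bucket using the ranges from the table; nps_bucket and build_feedback_request are illustrative names, not part of the Floopy API.

```python
import json
import urllib.request

FLOOPY_FEEDBACK_URL = "https://api.floopy.ai/v1/feedback"


def nps_bucket(score):
    """Return the NPS bucket for an integer score from 0 to 10."""
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 10:
        raise ValueError("score must be an integer from 0 to 10")
    if score <= 6:
        return "detractor"
    if score <= 8:
        return "passive"
    return "promoter"


def build_feedback_request(api_key, session_id, score, useful):
    """Build a POST request for the feedback endpoint."""
    nps_bucket(score)  # raises ValueError for an out-of-range score
    body = {"session_id": session_id, "score": score, "useful": bool(useful)}
    return urllib.request.Request(
        FLOOPY_FEEDBACK_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen sends it; the session_id should match the value sent in the floopy-session-id header of the chat requests being rated.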

OpenAI-compatible embeddings endpoint. The gateway forwards each request to whichever provider is configured for the requested model. It currently routes through OpenAI-compatible providers (OpenAI, Mistral, Together, Groq, DeepInfra, Nebius, Fireworks, Cohere, Azure, and others). Anthropic does not expose an embeddings API; Bedrock, Gemini, and Vertex are not yet supported and are reserved for a follow-up.

POST /v1/embeddings

Same Bearer token as chat completions:

Authorization: Bearer <FLOOPY_API_KEY>

Request body:

{
"model": "text-embedding-3-small",
"input": "The quick brown fox jumps over the lazy dog"
}

input may be a single string or an array of strings for batch embedding:

{
"model": "text-embedding-3-small",
"input": ["first text", "second text", "third text"]
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Embedding model ID (e.g., text-embedding-3-small, text-embedding-3-large) |
| input | string or string[] | Yes | Text(s) to embed |
| dimensions | integer | No | Output vector length, when the model supports it (e.g., OpenAI text-embedding-3-*) |
| encoding_format | string | No | float (default) or base64 |
| user | string | No | Free-form caller identifier, forwarded to the provider for analytics |

Response:

{
"object": "list",
"data": [
{
"object": "embedding",
"embedding": [0.0123, -0.0456, 0.0789, "..."],
"index": 0
}
],
"model": "text-embedding-3-small",
"usage": {
"prompt_tokens": 9,
"total_tokens": 9
}
}

For batched inputs, data contains one entry per input in request order.
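
Because each entry carries an index, a batch response can be matched back to its inputs explicitly. The sketch below uses a hypothetical helper, pair_embeddings. It sorts by index as a defensive step even though the entries are documented to arrive in request order, and returns a list of pairs rather than a dict so that duplicate input strings are preserved.

```python
def pair_embeddings(inputs, response_body):
    """Return (text, vector) pairs for a batched embeddings response."""
    items = sorted(response_body["data"], key=lambda item: item["index"])
    if len(items) != len(inputs):
        raise ValueError(
            f"expected {len(inputs)} embeddings, got {len(items)}"
        )
    return [(text, item["embedding"]) for text, item in zip(inputs, items)]
```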

curl

curl https://api.floopy.ai/v1/embeddings \
-H "Authorization: Bearer $FLOOPY_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"text-embedding-3-small","input":"hello"}'

Python (OpenAI SDK)

import os

from openai import OpenAI
client = OpenAI(api_key=os.environ["FLOOPY_API_KEY"], base_url="https://api.floopy.ai/v1")
resp = client.embeddings.create(model="text-embedding-3-small", input="hello")
print(resp.data[0].embedding[:5])

JavaScript (OpenAI SDK)

import OpenAI from "openai";
const client = new OpenAI({ apiKey: process.env.FLOOPY_API_KEY, baseURL: "https://api.floopy.ai/v1" });
const resp = await client.embeddings.create({ model: "text-embedding-3-small", input: "hello" });
console.log(resp.data[0].embedding.slice(0, 5));

Errors use the OpenAI-compatible error envelope. When no provider configured for the organization supports the requested embedding model, the gateway returns 400 with:

{
"error": {
"message": "The model `claude-3-5-sonnet` does not exist or you do not have access to it.",
"type": "invalid_request_error",
"code": "model_not_found",
"param": "model"
}
}

Other failure modes (auth, rate limit, provider 5xx) reuse the same error shape used by /v1/chat/completions.

Use these endpoints to monitor gateway and dependency status.

| Endpoint | Description |
| --- | --- |
| GET /health | Liveness probe (always 200) |
| GET /health/ready | Readiness probe (checks Redis, ClickHouse, Qdrant) |
| GET /health/redis | Redis connectivity check |
| GET /health/clickhouse | ClickHouse connectivity check |
| GET /health/qdrant | Qdrant connectivity check |
| GET /health/providers | Circuit breaker state per provider |
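
A monitoring script can poll these endpoints and report which checks pass. The sketch below rests on assumptions not stated above: that the health endpoints are served from the same host as the API (https://api.floopy.ai), and that any status other than 200 means the check failed. The response bodies are not documented here, so only status codes are inspected. The names http_status and health_report are illustrative.

```python
import urllib.error
import urllib.request

HEALTH_PATHS = [
    "/health",
    "/health/ready",
    "/health/redis",
    "/health/clickhouse",
    "/health/qdrant",
    "/health/providers",
]


def http_status(url, timeout=5.0):
    """Return the HTTP status of a GET, or None if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered with an error status
    except urllib.error.URLError:
        return None  # DNS failure, refused connection, timeout, ...


def health_report(base_url, get_status=http_status):
    """Map each health path to True when it answers 200 (assumed healthy)."""
    root = base_url.rstrip("/")
    return {path: get_status(root + path) == 200 for path in HEALTH_PATHS}
```

The get_status parameter exists so the report logic can be tested with a stub instead of real network calls.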