API Reference
The Floopy gateway exposes an OpenAI-compatible API. Point any OpenAI SDK at the gateway by changing the baseURL, and your existing code works without modification.
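As a minimal sketch using only the standard library, this is the request an OpenAI-compatible client pointed at the gateway would send (the API key value is a placeholder):

```python
import json
import urllib.request

GATEWAY_URL = "https://api.floopy.ai/v1/chat/completions"

def build_chat_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build the HTTP request an OpenAI-compatible client would send to the gateway."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        GATEWAY_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # Floopy API key, not a provider key
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "YOUR_FLOOPY_API_KEY",
    "gpt-4o",
    [{"role": "user", "content": "Hello!"}],
)
```

With the official OpenAI Python SDK, the equivalent change is a single constructor argument, e.g. `OpenAI(base_url="https://api.floopy.ai/v1", api_key="<your-floopy-api-key>")`.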
Endpoint
```
POST https://api.floopy.ai/v1/chat/completions
```
Authentication
Include your Floopy API key in the Authorization header:
```
Authorization: Bearer <your-floopy-api-key>
```
Request Body
```json
{
  "model": "gpt-4o",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello!" }
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false,
  "inputs": { "name": "Alice" }
}
```
Fields
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID (e.g., "gpt-4o"). Comma-separated for fallback: "gpt-4o,claude-sonnet-4-6" |
| messages | array | Yes | Array of message objects with role ("system", "user", "assistant") and content (string) |
| temperature | number | No | 0.0-2.0. Controls randomness. Default: model default |
| max_tokens | integer | No | Maximum number of tokens in the response |
| top_p | number | No | 0.0-1.0. Nucleus sampling threshold |
| frequency_penalty | number | No | -2.0 to 2.0. Reduces token repetition |
| presence_penalty | number | No | -2.0 to 2.0. Encourages new topics |
| stop | array | No | Strings that stop generation when encountered |
| reasoning_effort | string | No | "low", "medium", "high". For o1/o3 models only |
| response_format | object | No | JSON mode: {"type": "json_object"} |
| stream | boolean | No | Enable SSE streaming. Default: false |
| inputs | object | No | Key-value pairs for prompt template variable substitution. Stripped before forwarding to the provider |
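The inputs field drives server-side template substitution before the request is forwarded. The exact placeholder syntax lives in the prompt configuration; as an illustration only (the `{{name}}` style is an assumption, not the documented syntax), the mechanics look like this:

```python
import re

def substitute_inputs(template: str, inputs: dict) -> str:
    """Replace {{key}} placeholders with values from `inputs`.
    Placeholder style is assumed for illustration; unknown keys are left as-is.
    The gateway strips `inputs` from the body before forwarding to the provider."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(inputs.get(m.group(1), m.group(0))),
        template,
    )

prompt = substitute_inputs("Greet {{name}} politely.", {"name": "Alice"})
```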
Response Body (non-streaming)
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1717000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20 }
}
```
Response Body (streaming)
When stream: true, the gateway returns Server-Sent Events (SSE); each event carries a single chunk.
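The chunk format is shown below. As a sketch of how a client might consume it (assuming one JSON chunk per `data:` line and a `[DONE]` sentinel, as documented), the deltas can be accumulated like this:

```python
import json

def accumulate_stream(lines):
    """Concatenate delta.content from SSE `data:` lines until the [DONE] sentinel."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            text.append(delta["content"])
    return "".join(text)

sample = [
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"lo"},"finish_reason":null}]}',
    "data: [DONE]",
]
```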
```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1717000000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
```
The stream ends with:
```
data: [DONE]
```
Response Headers
The gateway adds these headers to every response:
| Header | Description | Example |
|---|---|---|
| Floopy-Provider | The provider that handled the request | OpenAI |
| Floopy-Model | The model that processed the request | gpt-4o |
| Floopy-Fallback-Used | Whether a fallback provider was used because the primary was unavailable | true |
| Floopy-Reasoning-Tokens | Number of reasoning tokens used (DeepSeek models) | 150 |
| Floopy-Queue-Time | Time the request spent in the provider queue, in seconds (Groq) | 0.5 |
| Floopy-Prompt-Time | Time spent processing the prompt, in seconds (Groq) | 0.2 |
| Floopy-Completion-Time | Time spent generating the completion, in seconds (Groq) | 1.3 |
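These headers are useful for logging and metrics. A minimal sketch of pulling them into a structured record (header names from the table above; the response is assumed to expose its headers as a dict):

```python
def gateway_metadata(headers: dict) -> dict:
    """Extract Floopy-* observability headers into typed fields.
    Optional headers (Groq/DeepSeek only) come back as None when absent."""
    return {
        "provider": headers.get("Floopy-Provider"),
        "model": headers.get("Floopy-Model"),
        "fallback_used": headers.get("Floopy-Fallback-Used", "false").lower() == "true",
        "queue_time_s": float(headers["Floopy-Queue-Time"]) if "Floopy-Queue-Time" in headers else None,
    }

meta = gateway_metadata({
    "Floopy-Provider": "OpenAI",
    "Floopy-Model": "gpt-4o",
    "Floopy-Fallback-Used": "true",
    "Floopy-Queue-Time": "0.5",
})
```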
Error Response Format
Standard errors return a JSON body:
```json
{
  "error": {
    "type": "error",
    "code": "ERROR_CODE",
    "message": "Human-readable description"
  }
}
```
Rate limit errors (429) include a Retry-After header with the number of seconds to wait.
Monthly limit errors include additional fields:
```json
{
  "error": {
    "type": "error",
    "code": "MONTHLY_LIMIT_EXCEEDED",
    "message": "Monthly request limit reached",
    "limit": 100000,
    "used": 100000,
    "reset_at": "2025-02-01T00:00:00Z"
  }
}
```
See Troubleshooting for a full list of error codes and how to resolve them.
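Putting the error shapes above together, a client-side handler might look like this sketch (status code, header name, and the MONTHLY_LIMIT_EXCEEDED code come from this section; the retry policy itself is illustrative):

```python
import json

def handle_error(status: int, headers: dict, body: str):
    """Map a gateway error response to a client action tuple."""
    err = json.loads(body)["error"]
    if err.get("code") == "MONTHLY_LIMIT_EXCEEDED":
        # Retrying sooner is pointless; wait for the monthly reset.
        return ("wait_until", err["reset_at"])
    if status == 429:
        # Honor Retry-After (seconds) on rate limit errors.
        return ("retry_after", int(headers.get("Retry-After", "1")))
    return ("fail", err["code"])
```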
Session Feedback
Submit session-level feedback to improve routing quality. Authenticate with the same API key used for chat requests.
```
POST https://api.floopy.ai/v1/feedback
```
Request Body
```json
{
  "session_id": "sess_abc123",
  "score": 8,
  "useful": true
}
```
Fields
| Field | Type | Required | Description |
|---|---|---|---|
| session_id | string | Yes | The session ID sent in the floopy-session-id header during chat requests |
| score | integer | Yes | NPS-style score from 0 to 10. 0-6 = detractor, 7-8 = passive, 9-10 = promoter |
| useful | boolean | Yes | Whether the conversation was useful to the end user |
Response
Returns 200 OK with an empty JSON object on success.
```json
{}
```
See Feedback for details on how feedback drives routing optimization.
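The feedback call can be sketched with the standard library (the API key and session ID values are placeholders; the score bounds come from the fields table above):

```python
import json
import urllib.request

def build_feedback_request(api_key: str, session_id: str, score: int, useful: bool):
    """Build the POST /v1/feedback request; score must be an NPS value 0-10."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    body = json.dumps({"session_id": session_id, "score": score, "useful": useful}).encode("utf-8")
    return urllib.request.Request(
        "https://api.floopy.ai/v1/feedback",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )

req = build_feedback_request("YOUR_FLOOPY_API_KEY", "sess_abc123", 9, True)
```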
Health Endpoints
Use these endpoints to monitor gateway and dependency status.
| Endpoint | Description |
|---|---|
| GET /health | Liveness probe (always 200) |
| GET /health/ready | Readiness probe (checks Redis, ClickHouse, Qdrant) |
| GET /health/redis | Redis connectivity check |
| GET /health/clickhouse | ClickHouse connectivity check |
| GET /health/qdrant | Qdrant connectivity check |
| GET /health/providers | Circuit breaker state per provider |