API Reference

The Floopy gateway exposes an OpenAI-compatible API. Point any OpenAI SDK at the gateway by setting its base URL to https://api.floopy.ai/v1 and supplying your Floopy API key; the rest of your existing code works unchanged.

Endpoint

POST https://api.floopy.ai/v1/chat/completions

Authentication

Include your Floopy API key in the Authorization header:

Authorization: Bearer <your-floopy-api-key>

Request Body

{
  "model": "gpt-4o",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello!" }
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false,
  "inputs": { "name": "Alice" }
}
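
As a rough sketch of how the endpoint, header, and body above fit together, the following builds the request with Python's standard library only. The helper name `build_chat_request` is hypothetical, the `FLOOPY_API_KEY` environment variable is an assumption borrowed from the embeddings samples in this reference, and the `Content-Type` header is assumed to be expected here as it is in the curl sample. Nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
import os
import urllib.request

CHAT_URL = "https://api.floopy.ai/v1/chat/completions"


def build_chat_request(api_key, messages, model="gpt-4o", **options):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    body = {"model": model, "messages": messages, **options}
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    # Falls back to a placeholder so the sketch runs without a real key.
    os.environ.get("FLOOPY_API_KEY", "<your-floopy-api-key>"),
    [{"role": "user", "content": "Hello!"}],
    temperature=0.7,
    max_tokens=1000,
)
# To send it: urllib.request.urlopen(request) returns the HTTP response.
```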

Fields

Field	Type	Required	Description
`model`	string	Yes	Model ID (e.g., `"gpt-4o"`). Comma-separated for fallback: `"gpt-4o,claude-sonnet-4-6"`
`messages`	array	Yes	Array of message objects with `role` (`"system"`, `"user"`, `"assistant"`) and `content` (string)
`temperature`	number	No	0.0-2.0. Controls randomness. Default: model default
`max_tokens`	integer	No	Maximum tokens in response
`top_p`	number	No	0.0-1.0. Nucleus sampling threshold
`frequency_penalty`	number	No	-2.0 to 2.0. Reduces token repetition
`presence_penalty`	number	No	-2.0 to 2.0. Encourages new topics
`stop`	array	No	Strings that stop generation when encountered
`reasoning_effort`	string	No	`"low"`, `"medium"`, `"high"`. For o1/o3 models only
`response_format`	object	No	JSON mode: `{"type": "json_object"}`
`stream`	boolean	No	Enable SSE streaming. Default: `false`
`inputs`	object	No	Key-value pairs for prompt template variable substitution. Stripped before forwarding to provider
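
The documented ranges lend themselves to a quick client-side check before a request is sent. This is a minimal sketch, not part of the gateway: `fallback_model` and `check_params` are hypothetical helpers, and treating the range bounds as inclusive is an assumption.

```python
# Ranges copied from the Fields table: (minimum, maximum).
# Assumption: both bounds are inclusive.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}
REASONING_EFFORTS = {"low", "medium", "high"}


def fallback_model(*model_ids):
    """Join model IDs into the comma-separated fallback form used by `model`."""
    return ",".join(model_ids)


def check_params(params):
    """Return a list of problems; an empty list means no documented limit is violated."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name in params and not low <= params[name] <= high:
            problems.append(f"{name} must be between {low} and {high}")
    effort = params.get("reasoning_effort")
    if effort is not None and effort not in REASONING_EFFORTS:
        problems.append("reasoning_effort must be low, medium, or high")
    return problems


model = fallback_model("gpt-4o", "claude-sonnet-4-6")
problems = check_params({"temperature": 0.7, "top_p": 1.5})
```

Here `model` holds the string from the table's fallback example, and `problems` contains a single entry for the out-of-range `top_p`.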

Response Body (non-streaming)

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1717000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help?" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}
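
Reading the reply out of this structure is mostly dictionary access. The sketch below is illustrative only: `summarize_completion` is a hypothetical helper, and the sample body is the one shown above.

```python
import json

SAMPLE = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1717000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20 }
}
"""


def summarize_completion(response):
    """Pull the first choice's text, its finish reason, and token usage from a response."""
    choice = response["choices"][0]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": response["usage"]["total_tokens"],
    }


summary = summarize_completion(json.loads(SAMPLE))
print(summary["text"])  # prints: Hello! How can I help?
```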

Response Body (streaming)

When `stream` is `true`, the gateway returns Server-Sent Events (SSE). Each event contains a chunk:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1717000000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

The stream ends with:

data: [DONE]
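
A client consumes the stream by reading `data:` lines, stopping at `[DONE]`, and concatenating each chunk's `delta.content`. The following is a minimal sketch of that loop over an iterable of lines; `iter_content` is a hypothetical helper, and the final sample line is invented only to show that reading stops at `[DONE]`.

```python
import json


def iter_content(lines):
    """Yield content fragments from an iterable of SSE lines until [DONE] is seen."""
    for raw in lines:
        # HTTP response objects yield bytes; plain lists of str also work.
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


SAMPLE_LINES = [
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1717000000,'
    '"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
    'data: {"choices":[{"index":0,"delta":{"content":"ignored"}}]}',  # never reached
]

reply = "".join(iter_content(SAMPLE_LINES))
print(reply)  # prints: Hello
```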

Response Headers

The gateway adds these headers to every response:

Header	Description	Example
`Floopy-Provider`	The provider that handled the request	`OpenAI`
`Floopy-Model`	The model that processed the request	`gpt-4o`
`Floopy-Fallback-Used`	Whether a fallback provider was used because the primary was unavailable	`true`
`Floopy-Reasoning-Tokens`	Number of reasoning tokens used (DeepSeek models)	`150`
`Floopy-Queue-Time`	Time the request spent in the provider queue, in seconds (Groq)	`0.5`
`Floopy-Prompt-Time`	Time spent processing the prompt, in seconds (Groq)	`0.2`
`Floopy-Completion-Time`	Time spent generating the completion, in seconds (Groq)	`1.3`
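
Header values arrive as strings, so a client that logs or acts on them will usually convert them first. The sketch below is one way to do that; `parse_floopy_headers` and the result keys are hypothetical names. It treats every header as optional, on the assumption that provider-specific headers may be absent when another provider handles the request.

```python
def parse_floopy_headers(headers):
    """Convert Floopy-* response headers into typed values; absent headers become None."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    def read(name, convert):
        value = lowered.get(name.lower())
        return None if value is None else convert(value)

    return {
        "provider": read("Floopy-Provider", str),
        "model": read("Floopy-Model", str),
        "fallback_used": read("Floopy-Fallback-Used", lambda v: v.strip().lower() == "true"),
        "reasoning_tokens": read("Floopy-Reasoning-Tokens", int),
        "queue_time_s": read("Floopy-Queue-Time", float),
        "prompt_time_s": read("Floopy-Prompt-Time", float),
        "completion_time_s": read("Floopy-Completion-Time", float),
    }


info = parse_floopy_headers({
    "Floopy-Provider": "OpenAI",
    "Floopy-Model": "gpt-4o",
    "Floopy-Fallback-Used": "true",
})
```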

Error Response Format

Standard errors return a JSON body:

{
  "error": {
    "type": "error",
    "code": "ERROR_CODE",
    "message": "Human-readable description"
  }
}

Rate limit errors (429) include a Retry-After header with the number of seconds to wait.

Monthly limit errors include additional fields:

{
  "error": {
    "type": "error",
    "code": "MONTHLY_LIMIT_EXCEEDED",
    "message": "Monthly request limit reached",
    "limit": 100000,
    "used": 100000,
    "reset_at": "2025-02-01T00:00:00Z"
  }
}
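
These two cases call for different client behavior: a rate limit is worth retrying after the advertised delay, while a monthly limit will not clear until `reset_at`. A minimal sketch of that decision follows. `describe_error` is a hypothetical helper, the one-second default when `Retry-After` is missing is an assumption, and the error code is checked before the status because this reference does not state which HTTP status accompanies `MONTHLY_LIMIT_EXCEEDED`.

```python
import json
from datetime import datetime


def describe_error(status, headers, body_text):
    """Classify an error response and say whether retrying makes sense."""
    error = json.loads(body_text).get("error", {})
    code = error.get("code")
    if code == "MONTHLY_LIMIT_EXCEEDED":
        # fromisoformat on older Python versions does not accept a trailing "Z".
        reset_at = datetime.fromisoformat(error["reset_at"].replace("Z", "+00:00"))
        return {
            "retry": False,
            "code": code,
            "reset_at": reset_at,
            "remaining": error["limit"] - error["used"],
        }
    if status == 429:
        # Assumption: wait 1 second if the header is unexpectedly missing.
        wait = float(headers.get("Retry-After", 1))
        return {"retry": True, "code": code, "wait_seconds": wait}
    return {"retry": False, "code": code, "message": error.get("message")}


monthly = describe_error(429, {}, json.dumps({
    "error": {
        "type": "error",
        "code": "MONTHLY_LIMIT_EXCEEDED",
        "message": "Monthly request limit reached",
        "limit": 100000,
        "used": 100000,
        "reset_at": "2025-02-01T00:00:00Z",
    }
}))
```

A caller would sleep for `wait_seconds` and retry when `retry` is true, and otherwise surface the error to the user.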

See Troubleshooting for a full list of error codes and how to resolve them.

Session Feedback

Submit session-level feedback to improve routing quality. Authenticated with the same API key used for chat requests.

POST https://api.floopy.ai/v1/feedback

Request Body

{
  "session_id": "sess_abc123",
  "score": 8,
  "useful": true
}

Fields

Field	Type	Required	Description
`session_id`	string	Yes	The session ID sent in the `floopy-session-id` header during chat requests
`score`	integer	Yes	NPS-style score from 0 to 10. 0-6 = detractor, 7-8 = passive, 9-10 = promoter
`useful`	boolean	Yes	Whether the conversation was useful to the end user

Response

Returns 200 OK with an empty JSON object on success.

{}
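
Since all three fields are required and `score` has a fixed range, it can help to validate the payload before posting it. The sketch below does that and also maps a score to its NPS bucket as defined in the table; `nps_bucket` and `build_feedback` are hypothetical helper names.

```python
def nps_bucket(score):
    """Map a 0-10 score to the NPS category given in the Fields table."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score <= 6:
        return "detractor"
    if score <= 8:
        return "passive"
    return "promoter"


def build_feedback(session_id, score, useful):
    """Return the JSON-ready body for POST /v1/feedback after basic validation."""
    if not session_id:
        raise ValueError("session_id is required")
    nps_bucket(score)  # raises ValueError if the score is out of range
    return {"session_id": session_id, "score": int(score), "useful": bool(useful)}


body = build_feedback("sess_abc123", 8, True)
bucket = nps_bucket(body["score"])  # "passive"
```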

See Feedback for details on how feedback drives routing optimization.

Embeddings

An OpenAI-compatible embeddings endpoint. The gateway forwards each request to whichever provider is configured for the requested model. It currently routes through OpenAI-compatible providers, including OpenAI, Mistral, Together, Groq, DeepInfra, Nebius, Fireworks, Cohere, and Azure. Anthropic does not expose an embeddings API; Bedrock, Gemini, and Vertex are not yet supported and are planned as follow-up work.

Endpoint

POST https://api.floopy.ai/v1/embeddings

Authentication

Same Bearer token as chat completions:

Authorization: Bearer <FLOOPY_API_KEY>

Request Body

{
  "model": "text-embedding-3-small",
  "input": "The quick brown fox jumps over the lazy dog"
}

`input` may be a single string or an array of strings for batch embedding:

{
  "model": "text-embedding-3-small",
  "input": ["first text", "second text", "third text"]
}

Fields

Field	Type	Required	Description
`model`	string	Yes	Embedding model id (e.g., `text-embedding-3-small`, `text-embedding-3-large`)
`input`	string or string[]	Yes	Text(s) to embed
`dimensions`	integer	No	Output vector length, when the model supports it (e.g. OpenAI `text-embedding-3-*`)
`encoding_format`	string	No	`float` (default) or `base64`
`user`	string	No	Free-form caller identifier — forwarded to the provider for analytics
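
When `encoding_format` is `base64`, each `embedding` is returned as a string rather than a list of numbers. The sketch below decodes such a string under the assumption that the gateway passes through the OpenAI convention of packed little-endian 32-bit floats; `decode_embedding` is a hypothetical helper, and the encoding should be verified against a real response before relying on it.

```python
import base64
import struct


def decode_embedding(encoded):
    """Decode a base64 string of packed little-endian float32 values into a list of floats."""
    raw = base64.b64decode(encoded)
    count = len(raw) // 4  # 4 bytes per float32
    # struct.unpack raises struct.error if the byte length is not a multiple of 4.
    return list(struct.unpack(f"<{count}f", raw))


# Round-trip check using values that float32 represents exactly.
encoded = base64.b64encode(struct.pack("<3f", 0.5, -0.25, 1.0)).decode("ascii")
vector = decode_embedding(encoded)
print(vector)  # prints: [0.5, -0.25, 1.0]
```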

Response Body

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.0123, -0.0456, 0.0789, "..."],
      "index": 0
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9
  }
}

For batched inputs, `data` contains one entry per input in request order.
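
Pairing each input with its vector is then a matter of lining the two lists up. The sketch below sorts by `index` first as a defensive step, so it also works if entries ever arrive out of order; `pair_embeddings` is a hypothetical helper, and the two-dimensional vectors are made-up placeholders rather than real model output.

```python
def pair_embeddings(inputs, response):
    """Return (text, vector) pairs, matching each input to the entry with the same index."""
    items = sorted(response["data"], key=lambda item: item["index"])
    if len(items) != len(inputs):
        raise ValueError("response does not contain one embedding per input")
    return [(text, item["embedding"]) for text, item in zip(inputs, items)]


texts = ["first text", "second text"]
# Illustrative response with placeholder vectors, deliberately out of order.
sample_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "embedding": [0.3, 0.4], "index": 1},
        {"object": "embedding", "embedding": [0.1, 0.2], "index": 0},
    ],
    "model": "text-embedding-3-small",
}

pairs = pair_embeddings(texts, sample_response)
```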

Code Samples

curl

curl https://api.floopy.ai/v1/embeddings \
  -H "Authorization: Bearer $FLOOPY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"text-embedding-3-small","input":"hello"}'

Python (OpenAI SDK)

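import os

from openai import OpenAI

# Point the OpenAI SDK at the Floopy gateway by overriding base_url.
client = OpenAI(api_key=os.environ["FLOOPY_API_KEY"], base_url="https://api.floopy.ai/v1")
resp = client.embeddings.create(model="text-embedding-3-small", input="hello")
print(resp.data[0].embedding[:5])  # first five dimensions of the vector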

JavaScript (OpenAI SDK)

import OpenAI from "openai";
const client = new OpenAI({ apiKey: process.env.FLOOPY_API_KEY, baseURL: "https://api.floopy.ai/v1" });
const resp = await client.embeddings.create({ model: "text-embedding-3-small", input: "hello" });
console.log(resp.data[0].embedding.slice(0, 5));

Error Response Format

Errors use the OpenAI-compatible error envelope. When no provider configured for the organization supports the requested embedding model, the gateway returns 400 with:

{
  "error": {
    "message": "The model `claude-3-5-sonnet` does not exist or you do not have access to it.",
    "type": "invalid_request_error",
    "code": "model_not_found",
    "param": "model"
  }
}

Other failure modes (auth, rate limit, provider 5xx) reuse the same error shape used by /v1/chat/completions.

Health Endpoints

Use these endpoints to monitor gateway and dependency status.

Endpoint	Description
`GET /health`	Liveness probe (always 200)
`GET /health/ready`	Readiness probe (checks Redis, ClickHouse, Qdrant)
`GET /health/redis`	Redis connectivity check
`GET /health/clickhouse`	ClickHouse connectivity check
`GET /health/qdrant`	Qdrant connectivity check
`GET /health/providers`	Circuit breaker state per provider
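
A monitoring script can poll the per-dependency endpoints and report which ones are failing. The following is a minimal sketch under two assumptions that this reference does not confirm: that the health paths are served from https://api.floopy.ai, and that any status other than 200 means the dependency is unhealthy. `http_status` and `failing_dependencies` are hypothetical helpers.

```python
import urllib.error
import urllib.request

BASE_URL = "https://api.floopy.ai"  # assumption: same host as the API endpoints
DEPENDENCY_PATHS = ["/health/redis", "/health/clickhouse", "/health/qdrant"]


def http_status(url, timeout=5.0):
    """Return the HTTP status code of a GET request, including 4xx/5xx statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # connection failures (URLError) still propagate


def failing_dependencies(get_status=http_status, base_url=BASE_URL):
    """Return the dependency health paths that did not answer with 200."""
    return [path for path in DEPENDENCY_PATHS if get_status(base_url + path) != 200]


# Offline demonstration with a stub in place of real HTTP calls.
def stub_status(url):
    return 503 if url.endswith("/health/redis") else 200


down = failing_dependencies(get_status=stub_status)
print(down)  # prints: ['/health/redis']
```

Passing the status function as a parameter keeps the check testable without network access; calling `failing_dependencies()` with no arguments performs real requests.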