Headers Reference

Floopy uses custom HTTP headers to control gateway behavior on a per-request basis. Pass these headers alongside your standard `Authorization` and `Content-Type` headers when calling the gateway.
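As a rough sketch of what this looks like from a client, the following builds (but does not send) a request with Python's standard library. The gateway URL, the `Bearer` authorization scheme, and the `build_request` helper are assumptions for illustration and are not taken from this reference.

```python
import json
import urllib.request

# Hypothetical endpoint: substitute the URL of your own gateway deployment.
GATEWAY_URL = "https://gateway.example.com/v1/chat/completions"


def build_request(api_key, body, floopy_headers=None):
    """Build a POST request that combines standard and Floopy headers."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Content-Type": "application/json",
    }
    # Header values are always strings, including booleans and integers.
    headers.update(floopy_headers or {})
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_request(
    "YOUR_API_KEY",
    {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]},
    {"Floopy-Cache-Enabled": "true", "floopy-user-id": "user-alice-001"},
)
# urllib.request.urlopen(request) would send it to the gateway.
```

Because HTTP header names are case-insensitive, the mixed casing used across the tables in this reference should not matter to a compliant server.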

Request Headers

Cache Headers

Control how the gateway caches requests and responses.

Header	Type	Description	Example
`Floopy-Cache-Enabled`	boolean	Enable or disable the exact-match cache for this request.	`true`
`Floopy-Cache-Seed`	string	Isolates exact cache entries by seed value. Semantic and advanced tiers match by embedding similarity and are not affected by the seed.	`deterministic-seed-abc`
`Floopy-Cache-Bucket-Max-Size`	integer	Maximum number of cached responses stored per cache key.	`3`
`Floopy-Cache-Ignore-Keys`	string	Comma-separated list of message keys to ignore when computing the cache key.	`timestamp,request_id`
`floopy-cache-advanced`	boolean	Enable the advanced semantic cache (Qdrant-backed), which matches requests by meaning rather than exact content.	`true`
`cache-control`	string	Standard HTTP cache-control header. Used to override the default TTL.	`max-age=3600`
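
The cache headers in the table above can be assembled with a small helper. This is a sketch only: `build_cache_headers` and its parameter names are hypothetical, and the serialization of booleans as `true`/`false` is inferred from the example column.

```python
def build_cache_headers(enabled=True, seed=None, bucket_max_size=None,
                        ignore_keys=None, advanced=None, max_age=None):
    """Return the cache headers for one request as a dict of strings."""
    headers = {"Floopy-Cache-Enabled": "true" if enabled else "false"}
    if seed is not None:
        headers["Floopy-Cache-Seed"] = seed
    if bucket_max_size is not None:
        headers["Floopy-Cache-Bucket-Max-Size"] = str(bucket_max_size)
    if ignore_keys:
        # Comma-separated list, as described in the table.
        headers["Floopy-Cache-Ignore-Keys"] = ",".join(ignore_keys)
    if advanced is not None:
        headers["floopy-cache-advanced"] = "true" if advanced else "false"
    if max_age is not None:
        # Standard HTTP directive, used here to override the default TTL.
        headers["cache-control"] = f"max-age={max_age}"
    return headers


cache_headers = build_cache_headers(
    seed="deterministic-seed-abc",
    bucket_max_size=3,
    ignore_keys=["timestamp", "request_id"],
    max_age=3600,
)
```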

Prompt Headers

Reference managed prompts from the Floopy prompt library.

Header	Type	Description	Example
`floopy-prompt-id`	string (UUID)	The UUID of a prompt from the prompt library. The gateway resolves and injects it at request time.	`e51a2820-8ab5-4d6a-96a0-cc7bb4759371`
`floopy-prompt-version`	integer	Pin a specific prompt version. If omitted, the gateway uses the latest version.	`2`

Body Field: `inputs`

When using a managed prompt with `floopy-prompt-id`, you can pass an `inputs` object in the JSON request body to fill template variables. The gateway substitutes `{{key}}` placeholders in the prompt template with the corresponding values from `inputs`.

{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "placeholder"}],
  "inputs": {
    "language": "English",
    "topic": "quantum computing"
  }
}

Resolution order: values from `inputs` take priority, followed by template defaults from the prompt config. If neither supplies a value, the placeholder is left as-is. The gateway strips the `inputs` field before the request reaches the LLM provider.
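
The resolution order can be illustrated with a short client-side sketch. This is not the gateway's implementation: `render_prompt` and `strip_inputs` are hypothetical names, and the placeholder pattern assumes keys made of word characters with no surrounding whitespace.

```python
import re

# Assumed placeholder syntax: {{key}} with a word-character key.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_prompt(template, inputs=None, defaults=None):
    """Fill {{key}} placeholders: inputs first, then defaults, else unchanged."""
    inputs = inputs or {}
    defaults = defaults or {}

    def substitute(match):
        key = match.group(1)
        if key in inputs:
            return str(inputs[key])
        if key in defaults:
            return str(defaults[key])
        return match.group(0)  # leave the placeholder as-is

    return PLACEHOLDER.sub(substitute, template)


def strip_inputs(body):
    """Return a copy of the request body without the inputs field."""
    return {key: value for key, value in body.items() if key != "inputs"}


rendered = render_prompt(
    "Explain {{topic}} in {{language}} for a {{audience}} in a {{tone}} tone.",
    inputs={"language": "English", "topic": "quantum computing"},
    defaults={"language": "French", "tone": "friendly"},
)
```

Here `language` comes from `inputs` even though a default exists, `tone` falls back to the default, and `{{audience}}` stays in the text because nothing supplies it.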

See the Prompt Management guide for full examples.

Security Headers

Enable the LLM firewall to scan prompts for injection attacks and unsafe content.

Header	Type	Description	Example
`floopy-llm-security-enabled`	boolean	Run the LLM firewall for this request. The firewall sends the prompt to a safety-tuned model (configured via `FIREWALL_MODEL`) and blocks any prompt classified as `unsafe`. A Qdrant verdict cache short-circuits repeat unsafe prompts.	`true`

Token Handling Headers (Coming Soon)

Control how the gateway handles requests that exceed a model’s context window. This feature is planned and not yet available.

Header	Type	Description	Example
`Floopy-Token-Limit-Exception-Handler`	string	Strategy to apply when the request exceeds the model’s token limit.	`truncate`

The three planned strategies are:

truncate — Removes messages from the beginning of the conversation to fit within the model’s token limit. The system message and the most recent messages are preserved.
middle-out — Keeps the first and last messages in the conversation and removes messages from the middle. Useful when both the initial context and the latest user message are important.
fallback — Switches to a model with a larger context window instead of modifying the messages. The gateway selects an appropriate model from the same provider.
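
Since this feature is not yet available, the following is only an illustrative sketch of how the `truncate` and `middle-out` strategies might behave, not the gateway's algorithm. The whitespace-based `count_tokens` estimate and all function names are assumptions; a real implementation would use the model's tokenizer.

```python
def count_tokens(message):
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return len(message["content"].split())


def total_tokens(messages):
    return sum(count_tokens(message) for message in messages)


def truncate(messages, limit):
    """Drop the oldest non-system messages until the conversation fits."""
    kept = list(messages)
    while total_tokens(kept) > limit:
        droppable = [i for i, m in enumerate(kept) if m["role"] != "system"]
        if len(droppable) <= 1:
            break  # never drop the most recent message
        del kept[droppable[0]]
    return kept


def middle_out(messages, limit):
    """Drop messages from the middle, keeping the first and last."""
    kept = list(messages)
    while total_tokens(kept) > limit and len(kept) > 2:
        del kept[len(kept) // 2]
    return kept


conversation = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "First question here"},
    {"role": "assistant", "content": "First answer here"},
    {"role": "user", "content": "Latest question"},
]
truncated = truncate(conversation, limit=7)
trimmed = middle_out(conversation, limit=7)
```

With this input, `truncate` drops the first user message, while `middle_out` drops the assistant reply and keeps the opening exchange alongside the latest question.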

Routing Headers

Override the default routing behavior for a single request.

Header	Type	Description	Example
`floopy-model-override`	string	Override the model without changing the request body.	`gpt-4o-mini`
`floopy-routing-rule`	string (UUID)	Override the routing rule applied to this request.	`a3f1b2c4-5678-9def-0123-456789abcdef`
`floopy-ab-test`	string (UUID)	A/B test ID. The gateway resolves the assigned variant for this request.	`b7e2c3d4-1234-5678-abcd-ef0123456789`
`floopy-smart-select`	string (UUID)	Smart Selector ID. The gateway picks the best model based on the selector’s configuration.	`c8f3d4e5-2345-6789-bcde-f01234567890`

Rate Limit Headers

Override the default rate limit policy for a single request.

Header	Type	Description	Example
`floopy-ratelimit-policy`	string	Custom rate limit policy for this request.	`100;w=60;u=request;s=global`

The policy format is `<limit>;w=<window>;u=<unit>;s=<segment>`:

limit — The maximum number of allowed units within the window.
w (window) — Time window in seconds. The minimum window is 60 seconds.
u (unit) — The unit to count: request (number of requests) or cents (cost in cents).
s (segment) — How to segment the limit: global (shared across all users), user (per end-user via floopy-user-id), or custom (per custom key).

Example: `100;w=60;u=request;s=global` means 100 requests per 60 seconds, applied globally.
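
To validate a policy string on the client before sending it, a parser along these lines may help. It is a sketch under assumptions: `RateLimitPolicy` and `parse_policy` are hypothetical names, and the parser treats all four fields as required because this reference does not state any defaults.

```python
from dataclasses import dataclass

UNITS = {"request", "cents"}
SEGMENTS = {"global", "user", "custom"}
MIN_WINDOW_SECONDS = 60


@dataclass(frozen=True)
class RateLimitPolicy:
    limit: int
    window: int  # seconds
    unit: str
    segment: str


def parse_policy(value):
    """Parse '<limit>;w=<window>;u=<unit>;s=<segment>' into a policy."""
    limit_part, *param_parts = value.split(";")
    params = {}
    for part in param_parts:
        key, separator, param_value = part.partition("=")
        if not separator:
            raise ValueError(f"malformed parameter: {part!r}")
        params[key] = param_value
    missing = {"w", "u", "s"} - set(params)
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")
    # int() raises ValueError for non-numeric limit or window values.
    policy = RateLimitPolicy(
        int(limit_part), int(params["w"]), params["u"], params["s"]
    )
    if policy.window < MIN_WINDOW_SECONDS:
        raise ValueError("window must be at least 60 seconds")
    if policy.unit not in UNITS:
        raise ValueError(f"unknown unit: {policy.unit!r}")
    if policy.segment not in SEGMENTS:
        raise ValueError(f"unknown segment: {policy.segment!r}")
    return policy


policy = parse_policy("100;w=60;u=request;s=global")
```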

Project Scoping

Segment requests by project for per-project cost tracking and analytics.

Header	Type	Description	Example
`floopy-project-id`	string (UUID)	Project identifier. Tags the request with a project for cost allocation and dashboard filtering. If the API key is hard-locked to a project, this header is optional. A mismatched UUID returns 403.	`a1b2c3d4-5678-9abc-def0-123456789abc`

See the Projects guide for the full fallback chain and environment model.

Session and Property Headers

Attach session metadata and custom properties to requests for tracking and analytics.

Header	Type	Description	Example
`floopy-user-id`	string	End-user identifier. Used for per-user rate limiting and analytics.	`user-alice-001`
`floopy-session-id`	string	Session identifier. Groups related requests together.	`sess-abc123`
`floopy-session-name`	string	Human-readable session name for display in the dashboard.	`math-tutoring`
`floopy-session-path`	string	Session path or location within your application.	`/dashboard/math`
`floopy-property-*`	string	Custom property header. Any suffix after `floopy-property-` becomes the property key.	`floopy-property-usertier: premium`

Custom properties appear in the observability dashboard and can be used to filter and group requests.
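
A helper such as the following could turn session metadata and a dictionary of custom properties into headers. The function name and its parameters are hypothetical; only the header names and the `floopy-property-` prefix come from the table above.

```python
def build_tracking_headers(user_id=None, session_id=None, session_name=None,
                           session_path=None, properties=None):
    """Return session and custom-property headers as a dict of strings."""
    candidates = {
        "floopy-user-id": user_id,
        "floopy-session-id": session_id,
        "floopy-session-name": session_name,
        "floopy-session-path": session_path,
    }
    # Only include headers for which a value was supplied.
    headers = {name: value for name, value in candidates.items()
               if value is not None}
    for key, value in (properties or {}).items():
        # The suffix after the prefix becomes the property key.
        headers[f"floopy-property-{key}"] = str(value)
    return headers


tracking_headers = build_tracking_headers(
    user_id="user-alice-001",
    session_id="sess-abc123",
    session_name="math-tutoring",
    properties={"usertier": "premium"},
)
```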

Response Headers

The gateway adds these headers to responses to describe how the request was processed. Headers whose description names a provider or model family apply to responses from that provider.

Header	Description	Example
`Floopy-Provider`	The provider that handled the request.	`OpenAI`
`Floopy-Model`	The model that processed the request.	`gpt-4o`
`Floopy-Fallback-Used`	Whether a fallback provider was used because the primary was unavailable.	`true`
`Floopy-Reasoning-Tokens`	Number of reasoning tokens used (DeepSeek models).	`150`
`Floopy-Queue-Time`	Time the request spent in the provider queue, in seconds (Groq).	`0.5`
`Floopy-Prompt-Time`	Time spent processing the prompt, in seconds (Groq).	`0.2`
`Floopy-Completion-Time`	Time spent generating the completion, in seconds (Groq).	`1.3`
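
On the client side, these response headers can be converted into typed values. The sketch below assumes that values arrive as plain strings in the formats shown in the example column and that a header which is absent should map to `None`; `parse_floopy_response_headers` is a hypothetical helper.

```python
def parse_floopy_response_headers(headers):
    """Convert Floopy response headers into typed Python values."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    def read(name, convert):
        raw = lowered.get(name.lower())
        return convert(raw) if raw is not None else None

    def to_bool(raw):
        return raw.strip().lower() == "true"

    return {
        "provider": read("Floopy-Provider", str),
        "model": read("Floopy-Model", str),
        "fallback_used": read("Floopy-Fallback-Used", to_bool),
        "reasoning_tokens": read("Floopy-Reasoning-Tokens", int),
        "queue_time": read("Floopy-Queue-Time", float),
        "prompt_time": read("Floopy-Prompt-Time", float),
        "completion_time": read("Floopy-Completion-Time", float),
    }


metadata = parse_floopy_response_headers({
    "Floopy-Provider": "OpenAI",
    "Floopy-Model": "gpt-4o",
    "floopy-fallback-used": "true",
})
```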