
Headers Reference

Floopy uses custom HTTP headers to control gateway behavior on a per-request basis. Pass these headers alongside your standard Authorization and Content-Type headers when calling the gateway.
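As a minimal sketch of what "alongside your standard headers" can look like in practice, the snippet below builds (but does not send) a request that carries two Floopy headers next to Authorization and Content-Type. The gateway URL, the endpoint path, the Bearer scheme, and the helper name `build_request` are assumptions for illustration, not documented values.

```python
import json
import urllib.request

# Hypothetical values: replace with your real gateway URL and API key.
GATEWAY_URL = "https://gateway.example.com/v1/chat/completions"
API_KEY = "YOUR_API_KEY"


def build_request(body: dict, floopy_headers: dict) -> urllib.request.Request:
    """Combine the standard headers with per-request Floopy headers."""
    headers = {
        "Authorization": f"Bearer {API_KEY}",  # Bearer scheme is assumed
        "Content-Type": "application/json",
    }
    headers.update(floopy_headers)
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_request(
    {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]},
    {"Floopy-Cache-Enabled": "true", "floopy-user-id": "user-alice-001"},
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

HTTP header names are case-insensitive, so the mixed casing used across this reference does not matter on the wire.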

Request Headers

Cache Headers

Control how the gateway caches requests and responses.

| Header | Type | Description | Example |
|---|---|---|---|
| Floopy-Cache-Enabled | boolean | Enable or disable exact cache for this request. | true |
| Floopy-Cache-Seed | string | Isolates exact cache entries by seed value. Semantic and advanced tiers match by embedding similarity and are not affected by the seed. | deterministic-seed-abc |
| Floopy-Cache-Bucket-Max-Size | integer | Maximum number of cached responses stored per cache key. | 3 |
| Floopy-Cache-Ignore-Keys | string | Comma-separated list of message keys to ignore when computing the cache key. | timestamp,request_id |
| floopy-cache-advanced | boolean | Enable advanced semantic cache (Qdrant-backed). Matches requests by meaning rather than exact content. | true |
| cache-control | string | Standard HTTP cache-control header. Used to override the default TTL. | max-age=3600 |
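Header values travel as strings, so booleans, integers, and lists have to be serialized before sending. The helper below is one possible way to do that for the cache headers; the function name `build_cache_headers` and its parameters are hypothetical, and only the header names and value formats come from the table above.

```python
from typing import Dict, List, Optional


def build_cache_headers(
    enabled: bool = True,
    seed: Optional[str] = None,
    bucket_max_size: Optional[int] = None,
    ignore_keys: Optional[List[str]] = None,
    max_age_seconds: Optional[int] = None,
) -> Dict[str, str]:
    """Hypothetical helper: turn cache options into header strings."""
    headers = {"Floopy-Cache-Enabled": "true" if enabled else "false"}
    if seed is not None:
        headers["Floopy-Cache-Seed"] = seed
    if bucket_max_size is not None:
        headers["Floopy-Cache-Bucket-Max-Size"] = str(bucket_max_size)
    if ignore_keys:
        # The table documents a comma-separated list of message keys.
        headers["Floopy-Cache-Ignore-Keys"] = ",".join(ignore_keys)
    if max_age_seconds is not None:
        # Standard cache-control syntax, used here to override the default TTL.
        headers["cache-control"] = f"max-age={max_age_seconds}"
    return headers


cache_headers = build_cache_headers(
    seed="deterministic-seed-abc",
    bucket_max_size=3,
    ignore_keys=["timestamp", "request_id"],
    max_age_seconds=3600,
)
```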

Prompt Headers

Reference managed prompts from the Floopy prompt library.

| Header | Type | Description | Example |
|---|---|---|---|
| floopy-prompt-id | string (UUID) | The UUID of a prompt from the prompt library. The gateway resolves and injects it at request time. | e51a2820-8ab5-4d6a-96a0-cc7bb4759371 |
| floopy-prompt-version | integer | Pin a specific prompt version. If omitted, the gateway uses the latest version. | 2 |

Body Field: inputs

When using a managed prompt with floopy-prompt-id, you can pass an inputs object in the JSON request body to fill template variables. The gateway substitutes {{key}} placeholders in the prompt template with the corresponding values from inputs.

{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "placeholder"}],
  "inputs": {
    "language": "English",
    "topic": "quantum computing"
  }
}

Resolution order: values in inputs take priority, followed by the template defaults from the prompt config. If neither supplies a value, the placeholder is left as-is. The inputs field is stripped before the request reaches the LLM provider.
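The resolution order can be illustrated with a short sketch. This is not the gateway's implementation: the function name `resolve_placeholders`, the sample template, and the assumption that keys are plain word characters with no surrounding whitespace are all made up for the example.

```python
import re
from typing import Mapping

# Assumed placeholder syntax: {{key}} with a word-character key and no spaces.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def resolve_placeholders(
    template: str,
    inputs: Mapping[str, str],
    defaults: Mapping[str, str],
) -> str:
    """Fill {{key}} placeholders: inputs first, then defaults, else unchanged."""

    def substitute(match: "re.Match[str]") -> str:
        key = match.group(1)
        if key in inputs:
            return str(inputs[key])
        if key in defaults:
            return str(defaults[key])
        return match.group(0)  # leave the placeholder as-is

    return PLACEHOLDER.sub(substitute, template)


rendered = resolve_placeholders(
    "Explain {{topic}} in {{language}} for a {{audience}} reader, in a {{tone}} tone.",
    inputs={"language": "English", "topic": "quantum computing"},
    defaults={"language": "Portuguese", "tone": "friendly"},
)
# rendered == "Explain quantum computing in English for a {{audience}} reader, in a friendly tone."
```

Here `language` appears in both mappings and the inputs value wins, `tone` falls back to the template default, and `audience` has no value anywhere so it stays untouched.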

See the Prompt Management guide for full examples.

Security Headers

Enable the LLM firewall to scan prompts for injection attacks and unsafe content.

| Header | Type | Description | Example |
|---|---|---|---|
| floopy-llm-security-enabled | boolean | Run the LLM firewall for this request. The firewall sends the prompt to a safety-tuned model (configured via FIREWALL_MODEL) and blocks any prompt classified as unsafe. A Qdrant verdict cache short-circuits repeat unsafe prompts. | true |

Token Handling Headers (Coming Soon)

Control how the gateway handles requests that exceed a model’s context window. This feature is planned and not yet available.

| Header | Type | Description | Example |
|---|---|---|---|
| Floopy-Token-Limit-Exception-Handler | string | Strategy to apply when the request exceeds the model’s token limit. | truncate |

The three planned strategies are:

  • truncate — Removes messages from the beginning of the conversation to fit within the model’s token limit. The system message and the most recent messages are preserved.
  • middle-out — Keeps the first and last messages in the conversation and removes messages from the middle. Useful when both the initial context and the latest user message are important.
  • fallback — Switches to a model with a larger context window instead of modifying the messages. The gateway selects an appropriate model from the same provider.
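Since the feature is not yet available, the following is only a conceptual sketch of how the truncate and middle-out strategies differ, not the gateway's algorithm. It assumes the system message, if present, comes first, and it uses a crude whitespace word count as a stand-in for a real tokenizer; the function names are hypothetical. The fallback strategy is not shown because it changes the model rather than the messages.

```python
from typing import Dict, List

Message = Dict[str, str]


def rough_tokens(messages: List[Message]) -> int:
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return sum(len(m["content"].split()) for m in messages)


def truncate(messages: List[Message], limit: int) -> List[Message]:
    """Drop the oldest non-system messages until the conversation fits."""
    has_system = bool(messages) and messages[0]["role"] == "system"
    system = messages[:1] if has_system else []
    rest = messages[1:] if has_system else list(messages)
    while len(rest) > 1 and rough_tokens(system + rest) > limit:
        rest.pop(0)  # remove from the beginning, keep the most recent
    return system + rest


def middle_out(messages: List[Message], limit: int) -> List[Message]:
    """Drop messages from the middle, always keeping the first and last."""
    kept = list(messages)
    while len(kept) > 2 and rough_tokens(kept) > limit:
        kept.pop(len(kept) // 2)
    return kept


conversation = [
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "one two three four"},
    {"role": "assistant", "content": "five six"},
    {"role": "user", "content": "seven eight nine"},
    {"role": "assistant", "content": "ten"},
    {"role": "user", "content": "eleven twelve"},
]

truncated = truncate(conversation, limit=10)
trimmed = middle_out(conversation, limit=10)
```

With this sample conversation and a limit of 10 rough tokens, truncate keeps the system message plus the three most recent messages, while middle-out keeps the first two and the last two.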

Routing Headers

Override the default routing behavior for a single request.

| Header | Type | Description | Example |
|---|---|---|---|
| floopy-model-override | string | Override the model without changing the request body. | gpt-4o-mini |
| floopy-routing-rule | string (UUID) | Override the routing rule applied to this request. | a3f1b2c4-5678-9def-ghij-klmnopqrstuv |
| floopy-ab-test | string (UUID) | A/B test ID. The gateway resolves the assigned variant for this request. | b7e2c3d4-1234-5678-abcd-ef0123456789 |
| floopy-smart-select | string (UUID) | Smart Selector ID. The gateway picks the best model based on the selector’s configuration. | c8f3d4e5-2345-6789-bcde-f01234567890 |

Rate Limit Headers

Override the default rate limit policy for a single request.

| Header | Type | Description | Example |
|---|---|---|---|
| floopy-ratelimit-policy | string | Custom rate limit policy for this request. | 100;w=60;u=request;s=global |

The policy format is <limit>;w=<window>;u=<unit>;s=<segment>:

  • limit — The maximum number of allowed units within the window.
  • w (window) — Time window in seconds. The minimum window is 60 seconds.
  • u (unit) — The unit to count: request (number of requests) or cents (cost in cents).
  • s (segment) — How to segment the limit: global (shared across all users), user (per end-user via floopy-user-id), or custom (per custom key).

Example: 100;w=60;u=request;s=global means 100 requests per 60 seconds, applied globally.
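To make the format concrete, here is a small client-side parser and validator for policy strings. It is a sketch under stated assumptions: the names `RateLimitPolicy` and `parse_policy` are hypothetical, all three parameters are treated as required, and the segment is checked against the three literal values listed above. The gateway's own validation may differ.

```python
from dataclasses import dataclass

ALLOWED_UNITS = {"request", "cents"}
ALLOWED_SEGMENTS = {"global", "user", "custom"}
MIN_WINDOW_SECONDS = 60


@dataclass(frozen=True)
class RateLimitPolicy:
    limit: int
    window_seconds: int
    unit: str
    segment: str


def parse_policy(policy: str) -> RateLimitPolicy:
    """Parse '<limit>;w=<window>;u=<unit>;s=<segment>' into a typed object."""
    head, *params = policy.split(";")
    # Raises ValueError if a parameter has no '=' separator.
    fields = dict(param.split("=", 1) for param in params)
    missing = {"w", "u", "s"} - fields.keys()
    if missing:
        raise ValueError(f"missing policy fields: {sorted(missing)}")

    parsed = RateLimitPolicy(
        limit=int(head),
        window_seconds=int(fields["w"]),
        unit=fields["u"],
        segment=fields["s"],
    )
    if parsed.window_seconds < MIN_WINDOW_SECONDS:
        raise ValueError("window must be at least 60 seconds")
    if parsed.unit not in ALLOWED_UNITS:
        raise ValueError(f"unknown unit: {parsed.unit}")
    if parsed.segment not in ALLOWED_SEGMENTS:
        raise ValueError(f"unknown segment: {parsed.segment}")
    return parsed


policy = parse_policy("100;w=60;u=request;s=global")
```

Validating on the client catches malformed policies, such as a window shorter than 60 seconds, before the request is sent.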

Project Scoping

Segment requests by project for per-project cost tracking and analytics.

| Header | Type | Description | Example |
|---|---|---|---|
| floopy-project-id | string (UUID) | Project identifier. Tags the request with a project for cost allocation and dashboard filtering. If the API key is hard-locked to a project, this header is optional. A mismatched UUID returns 403. | a1b2c3d4-5678-9abc-def0-123456789abc |

See the Projects guide for the full fallback chain and environment model.

Session and Property Headers

Attach session metadata and custom properties to requests for tracking and analytics.

| Header | Type | Description | Example |
|---|---|---|---|
| floopy-user-id | string | End-user identifier. Used for per-user rate limiting and analytics. | user-alice-001 |
| floopy-session-id | string | Session identifier. Groups related requests together. | sess-abc123 |
| floopy-session-name | string | Human-readable session name for display in the dashboard. | math-tutoring |
| floopy-session-path | string | Session path or location within your application. | /dashboard/math |
| floopy-property-* | string | Custom property header. Any suffix after floopy-property- becomes the property key. | floopy-property-usertier: premium |

Custom properties appear in the observability dashboard and can be used to filter and group requests.
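The wildcard floopy-property-* header can be easier to work with if properties are kept in a dictionary and expanded into headers at request time. The helper below is a hypothetical sketch of that idea; the function name and parameters are not part of Floopy, and only the header names come from the table above.

```python
from typing import Dict, Mapping, Optional


def build_tracking_headers(
    user_id: Optional[str] = None,
    session_id: Optional[str] = None,
    session_name: Optional[str] = None,
    session_path: Optional[str] = None,
    properties: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: expand session fields and custom properties."""
    candidates = {
        "floopy-user-id": user_id,
        "floopy-session-id": session_id,
        "floopy-session-name": session_name,
        "floopy-session-path": session_path,
    }
    # Only send the headers that were actually provided.
    headers = {name: value for name, value in candidates.items() if value is not None}
    for key, value in (properties or {}).items():
        # The suffix after "floopy-property-" becomes the property key.
        headers[f"floopy-property-{key}"] = str(value)
    return headers


tracking = build_tracking_headers(
    user_id="user-alice-001",
    session_id="sess-abc123",
    properties={"usertier": "premium"},
)
```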

Response Headers

The gateway adds these headers to every response. They provide metadata about how the request was processed.

| Header | Description | Example |
|---|---|---|
| Floopy-Provider | The provider that handled the request. | OpenAI |
| Floopy-Model | The model that processed the request. | gpt-4o |
| Floopy-Fallback-Used | Whether a fallback provider was used because the primary was unavailable. | true |
| Floopy-Reasoning-Tokens | Number of reasoning tokens used (DeepSeek models). | 150 |
| Floopy-Queue-Time | Time the request spent in the provider queue, in seconds (Groq). | 0.5 |
| Floopy-Prompt-Time | Time spent processing the prompt, in seconds (Groq). | 0.2 |
| Floopy-Completion-Time | Time spent generating the completion, in seconds (Groq). | 1.3 |
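On the client side, these headers arrive as strings and can be converted into typed values. The sketch below shows one way to do so; the names `GatewayMetadata` and `parse_gateway_metadata` are hypothetical, and treating the provider-specific headers as optional is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class GatewayMetadata:
    provider: Optional[str]
    model: Optional[str]
    fallback_used: bool
    reasoning_tokens: Optional[int]
    queue_time_s: Optional[float]
    prompt_time_s: Optional[float]
    completion_time_s: Optional[float]


def parse_gateway_metadata(headers: Mapping[str, str]) -> GatewayMetadata:
    """Convert Floopy response headers into typed values."""
    # Header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value for name, value in headers.items()}

    def as_float(name: str) -> Optional[float]:
        value = h.get(name)
        return float(value) if value is not None else None

    tokens = h.get("floopy-reasoning-tokens")
    return GatewayMetadata(
        provider=h.get("floopy-provider"),
        model=h.get("floopy-model"),
        fallback_used=h.get("floopy-fallback-used", "false").strip().lower() == "true",
        reasoning_tokens=int(tokens) if tokens is not None else None,
        queue_time_s=as_float("floopy-queue-time"),
        prompt_time_s=as_float("floopy-prompt-time"),
        completion_time_s=as_float("floopy-completion-time"),
    )


metadata = parse_gateway_metadata(
    {
        "Floopy-Provider": "OpenAI",
        "Floopy-Model": "gpt-4o",
        "Floopy-Fallback-Used": "true",
        "Floopy-Queue-Time": "0.5",
    }
)
```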