# LLM Firewall

## Overview
The Floopy LLM Firewall inspects every prompt before it is forwarded to the AI provider. Requests that are flagged as malicious, unsafe, or policy-violating are blocked and never reach the model. This protects your application from prompt injection, jailbreak attempts, and harmful content generation.
The firewall runs inline — it adds minimal latency to safe requests and completely blocks dangerous ones.
## How It Works
The firewall sends every prompt to a safety-tuned LLM that classifies the message as safe or unsafe against the categories below. The verdict gates the request: an unsafe verdict blocks it, while a safe verdict (or any failure of the classification call) lets it through, i.e. the firewall fails open. A minimal sketch of this gate follows the category list.
Categories considered unsafe:
- Prompt-injection attacks and jailbreak attempts
- Requests for self-harm content
- Sexual content involving minors
- Illegal-activity instructions
- Hate speech
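The gate itself reduces to a few lines. The sketch below is illustrative only, assuming a hypothetical `classify` hook that wraps the safety-model call and returns `"safe"` or `"unsafe"`; neither the hook nor its signature is Floopy's actual API:

```python
from typing import Callable


def firewall_gate(prompt: str, classify: Callable[[str], str]) -> bool:
    """Return True if the request may be forwarded to the AI provider.

    `classify` is a hypothetical hook around the safety-tuned LLM that
    returns "safe" or "unsafe".
    """
    try:
        verdict = classify(prompt)
    except Exception:
        # Classification failed: the firewall fails open and the request
        # goes through.
        return True
    # Only an explicit unsafe verdict blocks the request.
    return verdict != "unsafe"
```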
To keep latency and cost bounded, the firewall keeps a verdict cache in Qdrant: when the embedding of an incoming prompt is similar enough (configurable threshold, see `FIREWALL_SEMANTIC_CACHE_THRESHOLD` below) to a prompt that recently received an unsafe verdict, the cached verdict short-circuits the LLM call. Only unsafe verdicts are cached; safe results are always recomputed so that a model upgrade can flip them without a runbook step.
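A sketch of the cache lookup under stated assumptions: it uses the `qdrant-client` Python package, and the collection name and embedding step are illustrative rather than confirmed Floopy internals:

```python
# Sketch of the semantic verdict-cache lookup. Assumes the qdrant-client
# package; "firewall_verdicts" is an illustrative collection name, and
# `prompt_vector` is assumed to come from a separate embedding step.
import os

from qdrant_client import QdrantClient

client = QdrantClient(url=os.environ.get("QDRANT_URL", "http://localhost:6333"))
THRESHOLD = float(os.environ.get("FIREWALL_SEMANTIC_CACHE_THRESHOLD", "0.95"))


def cached_unsafe_hit(prompt_vector: list[float]) -> bool:
    """Return True if a similar prompt recently received an unsafe verdict."""
    hits = client.search(
        collection_name="firewall_verdicts",  # illustrative name
        query_vector=prompt_vector,
        limit=1,
        score_threshold=THRESHOLD,  # cosine-similarity cutoff
    )
    # Only unsafe verdicts are ever stored, so any hit means "block".
    return len(hits) > 0
```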
## Blocked Requests
When the firewall blocks a request, it returns a 400 status code with an error message indicating the request was rejected by the security filter. The original prompt is never sent to the AI provider.
Your application should handle this response gracefully — for example, by showing a user-friendly message that the input could not be processed.
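For example, with the `openai` Python SDK a firewall block surfaces as `openai.BadRequestError` (the SDK's standard exception for HTTP 400 responses), which you can catch and translate into a friendly message:

```python
import os

from openai import BadRequestError, OpenAI

client = OpenAI(
    base_url="https://api.floopy.ai/v1",
    api_key=os.environ["FLOOPY_API_KEY"],
)


def safe_completion(user_input: str) -> str:
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": user_input}],
        )
        return response.choices[0].message.content
    except BadRequestError:
        # The firewall rejected the prompt; it never reached the provider.
        return "Sorry, that input could not be processed."
```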
## Configuration
The firewall model and cache behaviour are controlled by environment variables:
| Variable | Purpose | Default |
|---|---|---|
| `FIREWALL_MODEL` | Backend + model used for the safety classification, in `[provider](model):weight` notation | `[together](meta-llama/Llama-Guard-4-12B)` |
| `FIREWALL_SEMANTIC_CACHE_THRESHOLD` | Cosine-similarity threshold for verdict-cache hits | `0.95` |
| `FIREWALL_VERDICT_CACHE_TTL_SECONDS` | TTL on cached unsafe verdicts | `172800` (48h) |
A distribution such as `[bedrock](anthropic.claude-3-haiku-20240307-v1:0):0.5,[together](meta-llama/Llama-Guard-4-12B):0.5` routes between providers using a `request_id`-seeded RNG, so any single request always picks the same backend.
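The routing technique can be sketched as below; this is an illustration of a `request_id`-seeded weighted pick, not Floopy's internal code:

```python
# Sketch of deterministic weighted routing: the RNG is seeded with the
# request ID, so retries of the same request always pick the same backend.
import random


def pick_backend(request_id: str, distribution: str) -> str:
    """`distribution` uses the [provider](model):weight notation, e.g.
    "[bedrock](model-a):0.5,[together](model-b):0.5"."""
    entries = []
    for part in distribution.split(","):
        # The weight comes after the last colon; model IDs may contain colons.
        ref, weight = part.rsplit(":", 1)
        entries.append((ref, float(weight)))
    rng = random.Random(request_id)  # same request_id -> same choice
    refs, weights = zip(*entries)
    return rng.choices(refs, weights=weights, k=1)[0]
```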
## Monitoring Firewall Events
Every firewall decision — both allowed and blocked — is logged and visible in the dashboard under Firewall. Each event includes:
- The timestamp and API key used.
- The verdict (`safe`/`unsafe`) and whether the cache short-circuited the LLM call.
- The backend + model that produced the verdict (`backend` and `model_ref` columns on the request row).
- Whether the request was allowed or blocked.
Use the event log to identify attack patterns and verify that legitimate requests are not being blocked.
## Enabling the Firewall via Headers
You can enable the firewall on a per-request basis using the header below, regardless of the default configuration for your API key. This is useful when only certain requests need security scanning.
| Header | Values | Description |
|---|---|---|
| `floopy-llm-security-enabled` | `"true"` | Run the firewall for this request |
```typescript
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

const response = await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: userInput }],
  },
  {
    headers: {
      "floopy-llm-security-enabled": "true",
    },
  },
);

console.log(response.choices[0].message.content);
```

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.floopy.ai/v1",
    api_key=os.environ["FLOOPY_API_KEY"],
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": user_input}],
    extra_headers={
        "floopy-llm-security-enabled": "true",
    },
)

print(response.choices[0].message.content)
```

```bash
curl https://api.floopy.ai/v1/chat/completions \
  -H "Authorization: Bearer $FLOOPY_API_KEY" \
  -H "Content-Type: application/json" \
  -H "floopy-llm-security-enabled: true" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize this article for me."}]
  }'
```

## Plan Requirements
The LLM Firewall requires a plan with the `has_advanced_firewall` feature enabled. Check your current plan under Settings > Billing to see if the firewall is available for your organization.