# LLM Firewall

## Overview
The Floopy LLM Firewall inspects every prompt before it is forwarded to the AI provider. Requests that are flagged as malicious, unsafe, or policy-violating are blocked and never reach the model. This protects your application from prompt injection, jailbreak attempts, and harmful content generation.
The firewall runs inline — it adds minimal latency to safe requests and completely blocks dangerous ones.
## How It Works
The firewall sends every prompt to a safety-tuned LLM that classifies the message as safe or unsafe against the categories below. The verdict gates the request: an unsafe verdict blocks it, while a safe verdict (or any failure of the classification call) lets it through, i.e. the firewall fails open. A minimal sketch of this gate follows the category list.
Categories considered unsafe:
- Prompt-injection attacks and jailbreak attempts
- Requests for self-harm content
- Sexual content involving minors
- Illegal-activity instructions
- Hate speech
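The gate itself reduces to a few lines. The sketch below is illustrative only, assuming a hypothetical `classify` hook that wraps the safety-model call and returns `"safe"` or `"unsafe"`; neither the hook nor its signature is Floopy's actual API:

```python
from typing import Callable


def firewall_gate(prompt: str, classify: Callable[[str], str]) -> bool:
    """Return True if the request may be forwarded to the AI provider.

    `classify` is a hypothetical hook around the safety-tuned LLM that
    returns "safe" or "unsafe".
    """
    try:
        verdict = classify(prompt)
    except Exception:
        # Classification failed: the firewall fails open and the request
        # goes through.
        return True
    # Only an explicit unsafe verdict blocks the request.
    return verdict != "unsafe"
```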
To keep latency and cost bounded, the firewall keeps a verdict cache in Qdrant: when the embedding of an incoming prompt is similar enough (configurable threshold, see `FIREWALL_SEMANTIC_CACHE_THRESHOLD` below) to a prompt that recently received an unsafe verdict, the cached verdict short-circuits the LLM call. Only unsafe verdicts are cached; safe results are always recomputed so that a model upgrade can flip them without a runbook step.
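A sketch of the cache lookup under stated assumptions: it uses the `qdrant-client` Python package, and the collection name and embedding step are illustrative rather than confirmed Floopy internals:

```python
# Sketch of the semantic verdict-cache lookup. Assumes the qdrant-client
# package; "firewall_verdicts" is an illustrative collection name, and
# `prompt_vector` is assumed to come from a separate embedding step.
import os

from qdrant_client import QdrantClient

client = QdrantClient(url=os.environ.get("QDRANT_URL", "http://localhost:6333"))
THRESHOLD = float(os.environ.get("FIREWALL_SEMANTIC_CACHE_THRESHOLD", "0.95"))


def cached_unsafe_hit(prompt_vector: list[float]) -> bool:
    """Return True if a similar prompt recently received an unsafe verdict."""
    hits = client.search(
        collection_name="firewall_verdicts",  # illustrative name
        query_vector=prompt_vector,
        limit=1,
        score_threshold=THRESHOLD,  # cosine-similarity cutoff
    )
    # Only unsafe verdicts are ever stored, so any hit means "block".
    return len(hits) > 0
```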
## Blocked Requests
When the firewall blocks a request, it returns a 400 status code with an error message indicating the request was rejected by the security filter. The original prompt is never sent to the AI provider.
Your application should handle this response gracefully — for example, by showing a user-friendly message that the input could not be processed.
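For example, with the `openai` Python SDK a firewall block surfaces as `openai.BadRequestError` (the SDK's standard exception for HTTP 400 responses), which you can catch and translate into a friendly message:

```python
import os

from openai import BadRequestError, OpenAI

client = OpenAI(
    base_url="https://api.floopy.ai/v1",
    api_key=os.environ["FLOOPY_API_KEY"],
)


def safe_completion(user_input: str) -> str:
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": user_input}],
        )
        return response.choices[0].message.content
    except BadRequestError:
        # The firewall rejected the prompt; it never reached the provider.
        return "Sorry, that input could not be processed."
```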
## Configuration
The firewall model and cache behaviour are controlled by environment variables:
| Variable | Purpose | Default |
|---|---|---|
| `FIREWALL_MODEL` | Backend + model used for the safety classification, in `[provider](model):weight` notation | `[together](meta-llama/Llama-Guard-4-12B)` |
| `FIREWALL_SEMANTIC_CACHE_THRESHOLD` | Cosine-similarity threshold for verdict-cache hits | `0.95` |
| `FIREWALL_VERDICT_CACHE_TTL_SECONDS` | TTL on cached unsafe verdicts | `172800` (48h) |
A distribution such as `[bedrock](anthropic.claude-3-haiku-20240307-v1:0):0.5,[together](meta-llama/Llama-Guard-4-12B):0.5` routes between providers using a `request_id`-seeded RNG, so any single request always picks the same backend.
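The routing technique can be sketched as below; this is an illustration of a `request_id`-seeded weighted pick, not Floopy's internal code:

```python
# Sketch of deterministic weighted routing: the RNG is seeded with the
# request ID, so retries of the same request always pick the same backend.
import random


def pick_backend(request_id: str, distribution: str) -> str:
    """`distribution` uses the [provider](model):weight notation, e.g.
    "[bedrock](model-a):0.5,[together](model-b):0.5"."""
    entries = []
    for part in distribution.split(","):
        # The weight comes after the last colon; model IDs may contain colons.
        ref, weight = part.rsplit(":", 1)
        entries.append((ref, float(weight)))
    rng = random.Random(request_id)  # same request_id -> same choice
    refs, weights = zip(*entries)
    return rng.choices(refs, weights=weights, k=1)[0]
```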
## Monitoring Firewall Events
Every firewall decision — both allowed and blocked — is logged and visible in the dashboard under Firewall. Each event includes:
- The timestamp and API key used.
- The verdict (`safe`/`unsafe`) and whether the cache short-circuited the LLM call.
- The backend + model that produced the verdict (`backend` and `model_ref` columns on the request row).
- Whether the request was allowed or blocked.
Use the event log to identify attack patterns and verify that legitimate requests are not being blocked.
## Enabling the Firewall via Headers
You can enable the firewall on a per-request basis using the header below, regardless of the default configuration for your API key. This is useful when only certain requests need security scanning.
| Header | Values | Description |
|---|---|---|
| `floopy-llm-security-enabled` | `"true"` | Run the firewall for this request |
```typescript
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

const response = await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: userInput }],
  },
  {
    headers: {
      "floopy-llm-security-enabled": "true",
    },
  },
);

console.log(response.choices[0].message.content);
```

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.floopy.ai/v1",
    api_key=os.environ["FLOOPY_API_KEY"],
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": user_input}],
    extra_headers={
        "floopy-llm-security-enabled": "true",
    },
)

print(response.choices[0].message.content)
```

```bash
curl https://api.floopy.ai/v1/chat/completions \
  -H "Authorization: Bearer $FLOOPY_API_KEY" \
  -H "Content-Type: application/json" \
  -H "floopy-llm-security-enabled: true" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize this article for me."}]
  }'
```

## Plan Requirements
The LLM Firewall requires a plan with the `has_advanced_firewall` feature enabled. Check your current plan under Settings > Billing to see if the firewall is available for your organization.