
How Floopy Protects Your LLM Traffic

A deep dive into the security layers that protect your data, API keys, and prompts as they flow through the Floopy gateway.

Floopy Team | 7 min read
security encryption firewall engineering

When you route your AI API calls through Floopy, you’re trusting us with sensitive data: API keys, prompts, completions, user information, and billing details. This post walks through every security layer we’ve built — what each one does, why it exists, and how they work together.

Floopy is an AI agent optimization platform — the closed feedback loop is the product, and the gateway layer is the surface that carries your traffic. This post focuses specifically on that traffic-handling surface: the checks, crypto, and sandboxing we apply between your app and the AI providers.

The threat model

Any layer that sits between your app and a third-party LLM provider faces a unique combination of risks:

  • API key theft — your provider keys (OpenAI, Anthropic) are worth thousands of dollars. A leak means someone else runs up your bill.
  • Prompt injection — malicious inputs can hijack your LLM to bypass instructions, exfiltrate data, or produce harmful output.
  • Cross-tenant leakage — if org A’s cached response is served to org B, you have a data breach.
  • PII in logs — users put personal information in prompts. If logs are stored unredacted, a log breach becomes a privacy incident.
  • Denial of wallet — an attacker or runaway loop burns through your token budget in minutes.

Floopy addresses each of these with a dedicated security layer.

Layer 1: The Rust advantage

The gateway is written in Rust — not Python, not Node.js, not Go. This isn’t a performance choice (though it helps — see our benchmarks). It’s a security choice.

Rust’s compiler eliminates entire vulnerability classes at compile time:

  • Buffer overflows — historically among the most common causes of CVEs in C/C++ software. Impossible in safe Rust.
  • Null pointer dereferences — the type system forces you to handle missing values explicitly.
  • Data races — the borrow checker prevents concurrent mutable access. Period.
  • Use-after-free — ownership semantics guarantee memory is valid when accessed.

These aren’t runtime checks. Code that violates memory safety doesn’t compile. The vulnerability never ships.

Layer 2: API key security

Your Floopy API keys use a structured format:

flo_sk_live_<32_random_chars>_<crc32_checksum>

Three design decisions make this format secure:

  1. flo_sk_ prefix — if your key leaks in a GitHub commit, GitGuardian, TruffleHog, and GitHub Advanced Security detect it automatically. Generic hex strings don’t trigger these scanners.

  2. SHA-256 hashing — keys are hashed before storage. We literally cannot retrieve your key from our database. Every lookup compares hashes.

  3. Instant revocation — when you revoke a key in the dashboard, the Redis cache is invalidated immediately. The key stops working within seconds, not minutes.
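The three decisions above can be sketched together. This is an illustrative Python sketch of the format, not the production Rust implementation — the function names and the hex alphabet for the random body are assumptions, but the prefix, the CRC32 checksum suffix, and SHA-256-before-storage follow the design described here:

```python
import binascii
import hashlib
import secrets

PREFIX = "flo_sk_live_"

def generate_key() -> str:
    # 32 random chars (hex here, for illustration) plus a CRC32 checksum suffix.
    body = secrets.token_hex(16)  # 32 hex characters
    checksum = format(binascii.crc32(body.encode()) & 0xFFFFFFFF, "08x")
    return f"{PREFIX}{body}_{checksum}"

def looks_valid(key: str) -> bool:
    # Cheap syntactic check: prefix present and checksum matches.
    # Catches typos and truncated keys before any database lookup.
    if not key.startswith(PREFIX):
        return False
    try:
        body, checksum = key[len(PREFIX):].rsplit("_", 1)
    except ValueError:
        return False
    return format(binascii.crc32(body.encode()) & 0xFFFFFFFF, "08x") == checksum

def storage_hash(key: str) -> str:
    # Only this digest is stored; the key itself is never persisted,
    # so every lookup compares hashes.
    return hashlib.sha256(key.encode()).hexdigest()

key = generate_key()
assert looks_valid(key)
# Flipping one checksum character fails the syntactic check.
assert not looks_valid(key[:-1] + ("0" if key[-1] != "0" else "1"))
```

The checksum is not a security measure — it exists so malformed keys can be rejected before touching the database.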

Layer 3: Provider key encryption

Your OpenAI, Anthropic, and Gemini API keys are encrypted at rest using XChaCha20-Poly1305 envelope encryption.

Why not AES-256-GCM? AES-GCM uses a 96-bit nonce. With random nonce generation, the birthday bound makes collision risk non-negligible after roughly 2^32 encryptions under a single key — which is why NIST caps random-nonce AES-GCM usage around that point. A nonce reuse in AES-GCM catastrophically breaks both confidentiality and authenticity. XChaCha20-Poly1305 uses a 192-bit nonce, making random collisions practically impossible.
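The gap between the two nonce sizes is easy to put numbers on with the standard birthday approximation — a quick back-of-the-envelope check, not part of Floopy's codebase:

```python
from math import exp

def collision_probability(n_messages: float, nonce_bits: int) -> float:
    # Birthday bound: p ≈ 1 - exp(-n^2 / 2^(bits + 1))
    return 1.0 - exp(-(n_messages ** 2) / 2 ** (nonce_bits + 1))

n = 2.0 ** 32  # ~4.3 billion encryptions under a single key

p_gcm = collision_probability(n, 96)       # AES-GCM's 96-bit nonce
p_xchacha = collision_probability(n, 192)  # XChaCha20's 192-bit nonce

# At 2^32 messages, a 96-bit random nonce already carries ~2^-33
# collision probability; a 192-bit nonce is so far below that the
# result underflows in double precision.
assert 1e-11 < p_gcm < 1e-9
assert p_xchacha < 1e-30
```

With a 192-bit nonce, random generation is safe for any realistic number of encryptions per key.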

Each provider key gets its own Data Encryption Key (DEK), encrypted by a KMS-managed Key Encryption Key (KEK). Keys are decrypted only at runtime, only when forwarding to the provider, and never cached in plaintext.

Layer 4: The LLM Firewall

Every prompt passes through an LLM-backed firewall before reaching any provider.

A safety-tuned LLM (configurable via FIREWALL_MODEL, defaulting to Llama Guard 4 on Together) classifies each prompt as safe or unsafe against categories including prompt injection, jailbreak attempts, illegal-activity instructions, and hate speech. Weighted distributions like [bedrock](anthropic.claude-3-haiku-20240307-v1:0):0.5,[together](meta-llama/Llama-Guard-4-12B):0.5 split traffic between providers, and every backend call records backend and model_ref on the row, so you can see exactly which model produced each verdict.

A Qdrant verdict cache sits in front of the LLM call. When an incoming prompt’s embedding is similar enough (configurable threshold, default 0.95) to a recently blocked prompt, the cached unsafe verdict short-circuits without calling the LLM. Only unsafe verdicts are cached — safe results are recomputed on each request, so a model upgrade can flip them without a runbook step. The embedding computed for the response semantic cache is reused here, so repeat prompts pay no extra embedding cost.
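The cache-lookup logic can be sketched in a few lines. This is a simplified Python illustration — production uses Qdrant and real embedding vectors, while this sketch uses an in-memory list and a toy 3-dimensional embedding:

```python
from math import sqrt

SIMILARITY_THRESHOLD = 0.95  # configurable; 0.95 is the default from the post

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# The cache holds ONLY unsafe verdicts: (embedding, verdict label).
# Safe results are never cached, so they get recomputed each request.
unsafe_cache = [([0.9, 0.1, 0.4], "prompt_injection")]

def check_verdict_cache(embedding):
    # Return a cached unsafe verdict if a near-duplicate prompt was
    # already blocked; None means cache miss → call the firewall LLM.
    for cached_embedding, label in unsafe_cache:
        if cosine(embedding, cached_embedding) >= SIMILARITY_THRESHOLD:
            return label
    return None

assert check_verdict_cache([0.9, 0.1, 0.4]) == "prompt_injection"  # exact repeat
assert check_verdict_cache([0.0, 1.0, 0.0]) is None                # unrelated prompt
```

Caching only the unsafe side is the key design choice: a false-positive verdict is sticky only until the cache entry expires, while safe traffic always gets a fresh decision.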

Enable it per request via the floopy-llm-security-enabled: true header; the plan flag has_advanced_firewall gates the feature as a whole.

Blocked requests return 400 PROMPT_THREAT_DETECTED with a clear error — no silent failures. Any LLM failure (network, parse) is fail-open: the request proceeds with a loud log so drift is visible from metrics.

Layer 5: SSRF and header security

The gateway validates every outbound request against a strict provider allowlist: api.openai.com, api.anthropic.com, generativelanguage.googleapis.com, and four others. Customer-supplied URLs never influence the destination.

Resolved IPs are checked against private ranges (RFC 1918, loopback, link-local, CGNAT). An attacker cannot trick the gateway into making internal network requests.
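Both checks together look roughly like this. A Python sketch of the logic (the gateway is Rust); the function name is hypothetical and the allowlist below shows only the three named hosts, not all seven:

```python
import ipaddress

ALLOWED_HOSTS = {
    "api.openai.com",
    "api.anthropic.com",
    "generativelanguage.googleapis.com",
}

# Ranges an egress proxy must never reach: RFC 1918, loopback,
# link-local, and CGNAT shared address space.
BLOCKED_NETWORKS = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
    "127.0.0.0/8", "169.254.0.0/16", "100.64.0.0/10",
)]

def is_safe_destination(host: str, resolved_ip: str) -> bool:
    # Both checks must pass: the hostname is allowlisted AND the IP it
    # resolved to is public. Checking the resolved IP (not just the
    # name) is what defeats DNS-rebinding tricks.
    if host not in ALLOWED_HOSTS:
        return False
    ip = ipaddress.ip_address(resolved_ip)
    return not any(ip in net for net in BLOCKED_NETWORKS)

assert not is_safe_destination("evil.example.com", "1.2.3.4")    # host not allowlisted
assert not is_safe_destination("api.openai.com", "10.0.0.5")     # rebinding to RFC 1918
assert is_safe_destination("api.openai.com", "104.18.6.192")     # public IP, allowed host
```

The IP check runs against the address actually resolved for the connection, so a hostname that passes the allowlist cannot later be pointed at an internal address.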

Client headers are sanitized before forwarding:

  • Stripped: authorization, cookie, proxy-authorization, x-forwarded-for, host
  • Passed through: Everything else, including provider-specific headers
  • Provider auth is injected after stripping — your keys never mix with client headers
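The order of operations matters: strip first, inject second. A minimal Python sketch of that sequence (function name and header shapes are illustrative):

```python
STRIPPED = {"authorization", "cookie", "proxy-authorization", "x-forwarded-for", "host"}

def sanitize_headers(client_headers: dict, provider_key: str) -> dict:
    # 1. Drop credentials and routing headers the client sent.
    forwarded = {
        name: value
        for name, value in client_headers.items()
        if name.lower() not in STRIPPED
    }
    # 2. Inject provider auth AFTER stripping, so a client-supplied
    #    Authorization header can never survive alongside (or replace) it.
    forwarded["authorization"] = f"Bearer {provider_key}"
    return forwarded

out = sanitize_headers(
    {
        "Authorization": "Bearer client-token",   # stripped
        "Host": "attacker.example",               # stripped
        "X-Custom": "keep-me",                    # passed through
    },
    "sk-provider-key",
)
assert out == {"X-Custom": "keep-me", "authorization": "Bearer sk-provider-key"}
```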

Layer 6: PII scrubbing

Before logs are written to ClickHouse, request and response bodies are scanned for PII:

What we detect                Replacement
Email addresses               [REDACTED:email]
CPF / SSN numbers             [REDACTED:cpf] / [REDACTED:ssn]
Credit card numbers           [REDACTED:credit_card]
Phone numbers                 [REDACTED:phone]
API keys (sk-*, flo_sk_*)     [REDACTED:api_key]
Bearer tokens                 [REDACTED:bearer]

This runs in the async logging path — a separate pipeline from the request. It never blocks or slows down your API call.
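A scrubber of this shape reduces to ordered regex substitution. The patterns below are simplified stand-ins for illustration (real CPF/SSN/phone detection needs more careful patterns than shown here), covering a subset of the table above:

```python
import re

# Order matters: scrub API keys and bearer tokens before generic
# patterns, so a token containing digits isn't half-matched as a
# phone or card number.
PATTERNS = [
    (re.compile(r"\b(?:sk|flo_sk)[-_][A-Za-z0-9_-]{8,}"), "[REDACTED:api_key]"),
    (re.compile(r"Bearer\s+[A-Za-z0-9._-]+"), "[REDACTED:bearer]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[REDACTED:email]"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[REDACTED:credit_card]"),
]

def scrub(text: str) -> str:
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text

assert scrub("contact alice@example.com") == "contact [REDACTED:email]"
assert "[REDACTED:api_key]" in scrub("key=sk-abc123def456")
assert "[REDACTED:bearer]" in scrub("Authorization: Bearer eyJhbGciOi.payload")
```

Because the scrubber sits in the async logging pipeline, even a pathological regex input can slow only log writes, never the request itself.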

Layer 7: Rate limiting and denial-of-wallet protection

Three tiers of rate limiting prevent abuse:

  • Anonymous: 20 rpm per IP — blocks unauthenticated probing
  • Authenticated: Per-org limits (not per-IP) — safe behind corporate NATs
  • Per-key: Configurable RPM per API key — granular control per application

Rate limits use atomic Redis sliding windows. Authenticated requests are keyed by organization ID, not by IP address, which prevents false limiting for enterprise customers behind shared egress IPs — a common failure mode in IP-keyed rate limiters.

Monthly request quotas act as a hard cap. If a leaked key or runaway loop starts burning tokens, the gateway returns 429 MONTHLY_LIMIT_EXCEEDED before it can drain your budget.
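The sliding-window logic is the same whether the state lives in Redis or in memory. This Python sketch swaps Redis's atomic operations for an in-memory deque to show the algorithm; the class and method names are illustrative:

```python
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Per-organization sliding window. Production uses atomic Redis
    operations; an in-memory deque demonstrates the same logic."""

    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)  # org_id -> timestamps of recent hits

    def allow(self, org_id: str, now: float) -> bool:
        window = self.hits[org_id]
        # Evict timestamps that have aged out of the window.
        while window and window[0] <= now - self.window:
            window.popleft()
        if len(window) >= self.limit:
            return False  # caller responds 429
        window.append(now)
        return True

limiter = SlidingWindowLimiter(limit=3, window_seconds=60)
assert all(limiter.allow("org_a", t) for t in (0, 1, 2))
assert not limiter.allow("org_a", 3)   # fourth hit inside the window
assert limiter.allow("org_b", 3)       # different org, separate budget
assert limiter.allow("org_a", 61)      # earliest hits have aged out
```

Keying on `org_id` rather than client IP is what keeps many users behind one corporate NAT from exhausting each other's budget.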

Layer 8: Architecture separation

The gateway (Olympus, Rust) and the dashboard (Zeus, Next.js) are completely separate systems. Zeus connects directly to Supabase and ClickHouse — it never proxies through Olympus.

This means:

  • A compromised gateway cannot access dashboard data (user accounts, billing, org settings)
  • A compromised dashboard cannot intercept or modify gateway traffic
  • Each system can be patched, deployed, and scaled independently

The full request flow

Every request passes through 13 security checkpoints, in order:

  1. Body size check (10MB limit)
  2. IP rate limiting
  3. API key extraction and SHA-256 validation
  4. Organization rate limiting
  5. Subscription status check
  6. Monthly usage quota check
  7. Prompt template resolution
  8. Cache lookup
  9. LLM firewall scan (verdict cache + safety-tuned LLM)
  10. SSRF validation
  11. Header sanitization
  12. Provider dispatch with encrypted key decryption
  13. PII-scrubbed async logging

Each layer can reject the request independently. A failure in one layer does not bypass the others.
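The fail-fast structure of the pipeline can be sketched as an ordered list of checks where the first failure wins. This Python illustration implements only two of the thirteen checkpoints, with hypothetical check functions and error codes modeled on the ones the post mentions:

```python
def body_size_ok(req):
    # Checkpoint 1: reject oversized bodies before any other work.
    return len(req.get("body", b"")) <= 10 * 1024 * 1024  # 10MB limit

def key_format_ok(req):
    # Checkpoint 3 (simplified): syntactic key check before hashing.
    return req.get("api_key", "").startswith("flo_sk_")

# Checks run strictly in order; any failure rejects immediately,
# and nothing downstream can un-reject a request.
CHECKPOINTS = [
    ("BODY_TOO_LARGE", body_size_ok),
    ("INVALID_API_KEY", key_format_ok),
]

def run_pipeline(req):
    for error_code, check in CHECKPOINTS:
        if not check(req):
            return ("rejected", error_code)
    return ("forwarded", None)

big = {"body": b"x" * (11 * 1024 * 1024), "api_key": "flo_sk_live_x"}
assert run_pipeline(big) == ("rejected", "BODY_TOO_LARGE")
assert run_pipeline({"body": b"{}", "api_key": "bad"}) == ("rejected", "INVALID_API_KEY")
assert run_pipeline({"body": b"{}", "api_key": "flo_sk_live_x"}) == ("forwarded", None)
```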

What’s next

Security is never done. We’re actively working on:

  • SOC 2 Type II compliance program
  • Customer-managed encryption keys for ClickHouse logs
  • TOTP-based 2FA for dashboard access
  • Audit logging for all security events
  • WAF deployment at the edge

For the full technical details, see our Security documentation.


Run your AI traffic through the most secure agent optimization platform on the market. Sign up free — 5,000 requests/month included, no credit card required.