Floopy
Benchmarks · Q1 2026

Faster than calling OpenAI directly.

Floopy with no features enabled is 4.8% faster than direct API calls. With cache and firewall, it's 58% faster. Tested with the OpenAI Node.js SDK, 50 rounds, isolated prompts across 10 languages.

Speed is the baseline: we ship a fast gateway, but the product is feedback-driven routing. See how session-level feedback propagation cuts cost 30–60% without sacrificing quality.
4.8% · Faster than direct (no features enabled)
10ms · P50 with cache (exact cache hits)
41MB · Memory usage (peak: 44MB)
0ms · Firewall overhead (LLM Firewall, cached)
Results summary

Line-by-line latency.

Scenario                  | Avg (ms) | P50 (ms) | P99 (ms) | vs Direct
OpenAI Direct             | 664      | 633      | 983      | baseline
Floopy (no features)      | 632      | 620      | 879      | -4.8%
Floopy + Exact Cache      | 195      | 10       | 773      | -70.6%
Floopy + Firewall         | 607      | 613      | 826      | -8.6%
Floopy + Cache + Firewall | 277      | 260      | 1,171    | -58.3%
LiteLLM Proxy             | 660      | 665      | 895      | -0.6%
Helicone Proxy            | 680      | 655      | 980      | +2.4%

OpenAI Node.js SDK, gpt-4.1-nano, 50 rounds/scenario, worst outlier excluded, anti-cache timestamps.

Gateway comparison

Head-to-head.

Metric       | Floopy    | LiteLLM     | Helicone
Avg Latency  | 632 ms    | 660 ms      | 680 ms
vs Direct    | -4.8%     | -0.6%       | +2.4%
Written in   | Rust      | Python      | Managed
Memory       | 41 MB     | ~200-400 MB | N/A
Caching      | 3-tier    | Basic Redis | No
LLM Firewall | On-device | External    | No

Of the gateways tested, Floopy is the only one that is measurably faster than calling the provider directly.

Why it's fast

The four things doing the work.

Rust, not Python.

Written in Rust with Axum and Tokio. No interpreter, no garbage collector, no VM warmup. 41MB memory footprint vs 200-400MB for Python gateways.
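A minimal sketch of what that shape looks like, assuming an axum 0.7-style setup; the route, handler, and state names are illustrative, not Floopy's actual source, and auth-header forwarding, streaming, and error handling are omitted:

```rust
// Illustrative pass-through route: Axum for routing, Tokio for async I/O,
// a shared reqwest client for the upstream hop. No interpreter, no GC.
use axum::{body::Bytes, extract::State, http::StatusCode, routing::post, Router};

#[derive(Clone)]
struct AppState {
    upstream: reqwest::Client, // shared, connection-pooled HTTPS client
}

async fn chat_completions(
    State(state): State<AppState>,
    body: Bytes,
) -> Result<Bytes, StatusCode> {
    // Forward the request body to the provider over a warm connection.
    let resp = state
        .upstream
        .post("https://api.openai.com/v1/chat/completions")
        .body(body)
        .send()
        .await
        .map_err(|_| StatusCode::BAD_GATEWAY)?;
    resp.bytes().await.map_err(|_| StatusCode::BAD_GATEWAY)
}

#[tokio::main]
async fn main() {
    let state = AppState { upstream: reqwest::Client::new() };
    let app = Router::new()
        .route("/v1/chat/completions", post(chat_completions))
        .with_state(state);
    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```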

Persistent connection pooling.

Warm HTTPS connections shared across all API keys. Eliminates per-request TLS handshakes — saves 20-50ms, more than the gateway's processing overhead.
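A sketch of the idea: build one long-lived client at startup and clone the handle into every handler, so the TLS handshake is paid once per pooled connection rather than once per request. The pool settings below are illustrative assumptions, not Floopy's configuration.

```rust
use std::time::Duration;

// Construct a single upstream client at startup and share it across requests.
fn build_upstream_client() -> reqwest::Client {
    reqwest::Client::builder()
        // Keep idle connections to the provider warm between requests.
        .pool_idle_timeout(Duration::from_secs(90))
        .pool_max_idle_per_host(64)
        // TCP keep-alive probes stop NATs and load balancers from silently
        // dropping the warm connections.
        .tcp_keepalive(Duration::from_secs(60))
        .timeout(Duration::from_secs(120))
        .build()
        .expect("client construction only fails on invalid TLS config")
}
```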

Firewall verdict cache.

The numbers above were measured with the legacy ONNX firewall. The firewall/classifier paths have since migrated to LLM-backed dispatch via the BackendRouter, with a Qdrant verdict cache that short-circuits repeat unsafe prompts. New benchmarks are pending.
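A rough sketch of the short-circuit under stated assumptions: `Verdict`, `VerdictCache`, and `firewall_check` are hypothetical names, and the real path uses the BackendRouter with a Qdrant-backed store rather than the simple keyed lookup modeled here.

```rust
#[derive(Clone, Copy)]
enum Verdict {
    Allow,
    Block,
}

// Stand-in for the verdict store; the real system does a nearest-neighbor
// lookup over prompt embeddings in Qdrant.
trait VerdictCache {
    fn get(&self, prompt: &str) -> Option<Verdict>;
    fn put(&mut self, prompt: &str, verdict: Verdict);
}

async fn firewall_check(
    prompt: &str,
    cache: &mut impl VerdictCache,
    classify_with_llm: impl std::future::Future<Output = Verdict>,
) -> Verdict {
    if let Some(v) = cache.get(prompt) {
        return v; // cache hit: no classifier call, effectively zero overhead
    }
    let v = classify_with_llm.await; // cache miss: LLM-backed dispatch
    cache.put(prompt, v);
    v
}
```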

Background logging.

Request logs are queued via async channels and batch-inserted to ClickHouse. Logging never touches the response path.
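A sketch of the pattern, assuming a hypothetical `RequestLog` row type and an `insert_batch` placeholder where a real ClickHouse client would go:

```rust
use tokio::sync::mpsc;
use tokio::time::{interval, Duration};

#[derive(Debug)]
struct RequestLog {
    path: String,
    status: u16,
    latency_ms: u64,
}

// Background task: drain the channel and flush batches on size or on a timer.
fn spawn_log_writer(mut rx: mpsc::Receiver<RequestLog>) {
    tokio::spawn(async move {
        let mut batch = Vec::with_capacity(1024);
        let mut tick = interval(Duration::from_millis(500));
        loop {
            tokio::select! {
                maybe_log = rx.recv() => {
                    match maybe_log {
                        Some(log) => {
                            batch.push(log);
                            if batch.len() >= 1024 {
                                insert_batch(&mut batch).await; // flush on size
                            }
                        }
                        None => {
                            insert_batch(&mut batch).await; // drain and exit
                            break;
                        }
                    }
                }
                _ = tick.tick() => {
                    insert_batch(&mut batch).await; // flush on time
                }
            }
        }
    });
}

// Placeholder for the ClickHouse batch INSERT; swap in a real client here.
async fn insert_batch(batch: &mut Vec<RequestLog>) {
    if batch.is_empty() {
        return;
    }
    println!("flushing {} log rows to ClickHouse", batch.len());
    batch.clear();
}
```

On the hot path the handler would hold the matching `mpsc::Sender<RequestLog>` and call `try_send`, which never awaits; if the channel is full, the log row is dropped rather than the response being delayed.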

Methodology

Published numbers, reproducible setup.

  • Client: OpenAI Node.js SDK (same SDK developers use in production)
  • Model: gpt-4.1-nano, 50 rounds per scenario
  • Anti-cache: Timestamp + index injected in every prompt — zero provider cache hits (see the sketch after this list)
  • Prompt isolation: 266 unique prompts across 10 languages, zero overlap between scenarios
  • Outliers: Worst result per scenario excluded
  • Competitors: LiteLLM (Docker, Python proxy), Helicone (managed cloud proxy)
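The anti-cache construction is simple enough to show. The published harness uses the OpenAI Node.js SDK; this Rust sketch with a hypothetical helper only illustrates the idea:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// A millisecond timestamp plus the round index makes every prompt unique,
// so provider-side prompt caching can never serve a hit.
fn anti_cache_prompt(base_prompt: &str, round: usize) -> String {
    let now_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before 1970")
        .as_millis();
    format!("[bench {now_ms}-{round}] {base_prompt}")
}
```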
Full methodology · Read the blog post