Floopy
Benchmarks · Q1 2026

Faster than calling OpenAI directly.

Floopy with no features enabled is 4.8% faster than direct API calls. With cache and firewall, it's 58% faster. Tested with the OpenAI Node.js SDK, 50 rounds, isolated prompts across 10 languages.

Speed is the baseline: we ship a fast gateway, but the product is feedback-driven routing. See how session-level feedback propagation cuts cost 30–60% without sacrificing quality.
4.8% · Faster than direct (no features enabled)
10ms · P50 with cache (exact cache hits)
41MB · Memory usage (peak: 44MB)
0ms · Firewall overhead (LLM Firewall, cached)
Results summary

Line-by-line latency.

Scenario                  | Avg (ms) | P50 (ms) | P99 (ms) | vs Direct
OpenAI Direct             | 664      | 633      | 983      | baseline
Floopy (no features)      | 632      | 620      | 879      | -4.8%
Floopy + Exact Cache      | 195      | 10       | 773      | -70.6%
Floopy + Firewall         | 607      | 613      | 826      | -8.6%
Floopy + Cache + Firewall | 277      | 260      | 1,171    | -58.3%
LiteLLM Proxy             | 660      | 665      | 895      | -0.6%
Helicone Proxy            | 680      | 655      | 980      | +2.4%

OpenAI Node.js SDK, gpt-4.1-nano, 50 rounds/scenario, worst outlier excluded, anti-cache timestamps.

Gateway comparison

Head-to-head.

Metric       | Floopy    | LiteLLM     | Helicone
Avg Latency  | 632 ms    | 660 ms      | 680 ms
vs Direct    | -4.8%     | -0.6%       | +2.4%
Written in   | Rust      | Python      | Managed
Memory       | 41 MB     | ~200-400 MB | N/A
Caching      | 3-tier    | Basic Redis | No
LLM Firewall | On-device | External    | No

Of the gateways tested, Floopy is the only one that is measurably faster than calling the provider directly.

Why it's fast

The four things doing the work.

Rust, not Python.

Written in Rust with Axum and Tokio. No interpreter, no garbage collector, no VM warmup. 41MB memory footprint vs 200-400MB for Python gateways.
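A minimal sketch of what that shape looks like, assuming an axum 0.7-style setup; the route, handler, and state names are illustrative, not Floopy's actual source, and auth-header forwarding, streaming, and error handling are omitted:

```rust
// Illustrative pass-through route: Axum for routing, Tokio for async I/O,
// a shared reqwest client for the upstream hop. No interpreter, no GC.
use axum::{body::Bytes, extract::State, http::StatusCode, routing::post, Router};

#[derive(Clone)]
struct AppState {
    upstream: reqwest::Client, // shared, connection-pooled HTTPS client
}

async fn chat_completions(
    State(state): State<AppState>,
    body: Bytes,
) -> Result<Bytes, StatusCode> {
    // Forward the request body to the provider over a warm connection.
    let resp = state
        .upstream
        .post("https://api.openai.com/v1/chat/completions")
        .body(body)
        .send()
        .await
        .map_err(|_| StatusCode::BAD_GATEWAY)?;
    resp.bytes().await.map_err(|_| StatusCode::BAD_GATEWAY)
}

#[tokio::main]
async fn main() {
    let state = AppState { upstream: reqwest::Client::new() };
    let app = Router::new()
        .route("/v1/chat/completions", post(chat_completions))
        .with_state(state);
    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```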

Persistent connection pooling.

Warm HTTPS connections shared across all API keys. Eliminates per-request TLS handshakes — saves 20-50ms, more than the gateway's processing overhead.
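A sketch of the idea: build one long-lived client at startup and clone the handle into every handler, so the TLS handshake is paid once per pooled connection rather than once per request. The pool settings below are illustrative assumptions, not Floopy's configuration.

```rust
use std::time::Duration;

// Construct a single upstream client at startup and share it across requests.
fn build_upstream_client() -> reqwest::Client {
    reqwest::Client::builder()
        // Keep idle connections to the provider warm between requests.
        .pool_idle_timeout(Duration::from_secs(90))
        .pool_max_idle_per_host(64)
        // TCP keep-alive probes stop NATs and load balancers from silently
        // dropping the warm connections.
        .tcp_keepalive(Duration::from_secs(60))
        .timeout(Duration::from_secs(120))
        .build()
        .expect("client construction only fails on invalid TLS config")
}
```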

Firewall verdict cache.

The numbers above were measured with the legacy ONNX firewall. The firewall/classifier paths have since migrated to LLM-backed dispatch via the BackendRouter, with a Qdrant verdict cache that short-circuits repeat unsafe prompts. New benchmarks are pending.
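A rough sketch of the short-circuit under stated assumptions: `Verdict`, `VerdictCache`, and `firewall_check` are hypothetical names, and the real path uses the BackendRouter with a Qdrant-backed store rather than the simple keyed lookup modeled here.

```rust
#[derive(Clone, Copy)]
enum Verdict {
    Allow,
    Block,
}

// Stand-in for the verdict store; the real system does a nearest-neighbor
// lookup over prompt embeddings in Qdrant.
trait VerdictCache {
    fn get(&self, prompt: &str) -> Option<Verdict>;
    fn put(&mut self, prompt: &str, verdict: Verdict);
}

async fn firewall_check(
    prompt: &str,
    cache: &mut impl VerdictCache,
    classify_with_llm: impl std::future::Future<Output = Verdict>,
) -> Verdict {
    if let Some(v) = cache.get(prompt) {
        return v; // cache hit: no classifier call, effectively zero overhead
    }
    let v = classify_with_llm.await; // cache miss: LLM-backed dispatch
    cache.put(prompt, v);
    v
}
```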

Background logging.

Request logs are queued via async channels and batch-inserted to ClickHouse. Logging never touches the response path.
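A sketch of the pattern, assuming a hypothetical `RequestLog` row type and an `insert_batch` placeholder where a real ClickHouse client would go:

```rust
use tokio::sync::mpsc;
use tokio::time::{interval, Duration};

#[derive(Debug)]
struct RequestLog {
    path: String,
    status: u16,
    latency_ms: u64,
}

// Background task: drain the channel and flush batches on size or on a timer.
fn spawn_log_writer(mut rx: mpsc::Receiver<RequestLog>) {
    tokio::spawn(async move {
        let mut batch = Vec::with_capacity(1024);
        let mut tick = interval(Duration::from_millis(500));
        loop {
            tokio::select! {
                maybe_log = rx.recv() => {
                    match maybe_log {
                        Some(log) => {
                            batch.push(log);
                            if batch.len() >= 1024 {
                                insert_batch(&mut batch).await; // flush on size
                            }
                        }
                        None => {
                            insert_batch(&mut batch).await; // drain and exit
                            break;
                        }
                    }
                }
                _ = tick.tick() => {
                    insert_batch(&mut batch).await; // flush on time
                }
            }
        }
    });
}

// Placeholder for the ClickHouse batch INSERT; swap in a real client here.
async fn insert_batch(batch: &mut Vec<RequestLog>) {
    if batch.is_empty() {
        return;
    }
    println!("flushing {} log rows to ClickHouse", batch.len());
    batch.clear();
}
```

On the hot path the handler would hold the matching `mpsc::Sender<RequestLog>` and call `try_send`, which never awaits; if the channel is full, the log row is dropped rather than the response being delayed.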

Methodology

Published numbers, reproducible setup.

  • Client: OpenAI Node.js SDK (same SDK developers use in production)
  • Model: gpt-4.1-nano, 50 rounds per scenario
  • Anti-cache: Timestamp + index injected in every prompt — zero provider cache hits (see the sketch after this list)
  • Prompt isolation: 266 unique prompts across 10 languages, zero overlap between scenarios
  • Outliers: Worst result per scenario excluded
  • Competitors: LiteLLM (Docker, Python proxy), Helicone (managed cloud proxy)
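The anti-cache construction is simple enough to show. The published harness uses the OpenAI Node.js SDK; this Rust sketch with a hypothetical helper only illustrates the idea:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// A millisecond timestamp plus the round index makes every prompt unique,
// so provider-side prompt caching can never serve a hit.
fn anti_cache_prompt(base_prompt: &str, round: usize) -> String {
    let now_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before 1970")
        .as_millis();
    format!("[bench {now_ms}-{round}] {base_prompt}")
}
```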
Full methodology · Read the blog post