Rust, not Python.
Written in Rust with Axum and Tokio. No interpreter, no garbage collector, no VM warmup. 41MB memory footprint vs 200-400MB for Python gateways.
Floopy with no features enabled is 4.8% faster than direct API calls; with cache and firewall enabled, it's 58% faster. Tested with the OpenAI Node.js SDK: 50 rounds per scenario, isolated prompts across 10 languages.
| Scenario | Avg (ms) | P50 (ms) | P99 (ms) | vs Direct |
|---|---|---|---|---|
| OpenAI Direct | 664 | 633 | 983 | — |
| Floopy (no features) | 632 | 620 | 879 | -4.8% |
| Floopy + Exact Cache | 195 | 10 | 773 | -70.6% |
| Floopy + Firewall | 607 | 613 | 826 | -8.6% |
| Floopy + Cache + Firewall | 277 | 260 | 1,171 | -58.3% |
| LiteLLM Proxy | 660 | 665 | 895 | -0.6% |
| Helicone Proxy | 680 | 655 | 980 | +2.4% |
OpenAI Node.js SDK, gpt-4.1-nano, 50 rounds per scenario, worst outlier excluded, anti-cache timestamps.
| Metric | Floopy | LiteLLM | Helicone |
|---|---|---|---|
| Avg Latency | 632ms | 660ms | 680ms |
| vs Direct | -4.8% | -0.6% | +2.4% |
| Written in | Rust | Python | Managed service |
| Memory | 41 MB | ~200-400 MB | N/A |
| Caching | 3-tier | Basic Redis | No |
| LLM Firewall | On-device | External | No |
Floopy is the only gateway that is measurably faster than calling the provider directly.
Warm HTTPS connections are shared across all API keys, eliminating per-request TLS handshakes. That saves 20-50ms per call, more than the gateway's own processing overhead.
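The warm-connection idea can be sketched in a few lines. This is a toy stand-in, not the gateway's implementation: `Conn` and `Pool` are hypothetical names, and the real pool holds live TLS sessions rather than unit structs.

```rust
use std::sync::Mutex;

// Stand-in for a warm HTTPS connection (hypothetical; the real gateway
// keeps TLS sessions open to the upstream provider).
struct Conn;

// Minimal pool: hand out a warm connection when one is idle,
// otherwise "open" a new one (which in reality costs a 20-50ms handshake).
struct Pool {
    idle: Mutex<Vec<Conn>>,
}

impl Pool {
    fn new() -> Self {
        Pool { idle: Mutex::new(Vec::new()) }
    }

    /// Returns (connection, was_reused).
    fn checkout(&self) -> (Conn, bool) {
        match self.idle.lock().unwrap().pop() {
            Some(conn) => (conn, true), // warm: skip the handshake
            None => (Conn, false),      // cold: pay the handshake once
        }
    }

    /// Return the connection to the pool so later requests stay warm.
    fn checkin(&self, conn: Conn) {
        self.idle.lock().unwrap().push(conn);
    }
}

fn main() {
    let pool = Pool::new();
    let (c1, reused) = pool.checkout();
    assert!(!reused); // first request opens the connection
    pool.checkin(c1);
    let (_c2, reused) = pool.checkout();
    assert!(reused); // later requests reuse the warm connection
    println!("ok");
}
```

Because the pool is shared across all API keys, the handshake is paid once per upstream host rather than once per request.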
The numbers above were measured with the legacy ONNX firewall. The firewall/classifier migration moved these paths to LLM-backed dispatch via the BackendRouter, with a Qdrant verdict cache short-circuiting repeat unsafe prompts. New benchmarks are pending.
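The verdict-cache short-circuit looks roughly like this. Everything here is a simplified sketch: `VerdictCache` is an exact-match `HashMap` stand-in for the Qdrant lookup (which would match on similarity), and the closure stands in for the LLM-backed classifier dispatch.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Debug)]
enum Verdict {
    Safe,
    Unsafe,
}

// Hypothetical in-memory stand-in for the Qdrant verdict cache.
struct VerdictCache {
    entries: HashMap<String, Verdict>,
}

impl VerdictCache {
    fn new() -> Self {
        VerdictCache { entries: HashMap::new() }
    }

    /// Cache hit short-circuits; a miss falls through to the (expensive)
    /// classifier, and the verdict is cached for repeat prompts.
    /// Returns (verdict, was_cache_hit).
    fn check_prompt(
        &mut self,
        prompt: &str,
        classify: impl Fn(&str) -> Verdict,
    ) -> (Verdict, bool) {
        if let Some(&v) = self.entries.get(prompt) {
            return (v, true); // repeat prompt: no classifier call
        }
        let v = classify(prompt);
        self.entries.insert(prompt.to_string(), v);
        (v, false)
    }
}

fn main() {
    let mut cache = VerdictCache::new();
    // Toy rule standing in for LLM-backed classification.
    let classify = |p: &str| {
        if p.contains("ignore previous instructions") {
            Verdict::Unsafe
        } else {
            Verdict::Safe
        }
    };
    let (v, hit) = cache.check_prompt("ignore previous instructions", &classify);
    assert_eq!(v, Verdict::Unsafe);
    assert!(!hit); // first sighting pays for classification
    let (v, hit) = cache.check_prompt("ignore previous instructions", &classify);
    assert_eq!(v, Verdict::Unsafe);
    assert!(hit); // repeat unsafe prompt is rejected from cache
    println!("ok");
}
```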
Request logs are queued via async channels and batch-inserted into ClickHouse. Logging never touches the response path.
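The channel-plus-batching pattern can be sketched with the standard library alone. This is an illustrative sketch, not the gateway's code: the real writer runs on Tokio async channels and the sink is a ClickHouse client, whereas here `spawn_log_writer` and `LogEntry` are hypothetical names and the sink is a plain closure.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical log record; the real gateway batches richer request logs.
struct LogEntry {
    path: String,
    status: u16,
}

// Spawn a background writer that drains the channel and bulk-inserts
// whenever a batch fills, plus one final flush when the channel closes.
fn spawn_log_writer(
    rx: mpsc::Receiver<LogEntry>,
    batch_size: usize,
    sink: impl Fn(&[LogEntry]) + Send + 'static,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let mut batch = Vec::with_capacity(batch_size);
        for entry in rx {
            batch.push(entry);
            if batch.len() >= batch_size {
                sink(&batch); // one bulk insert per batch, off the hot path
                batch.clear();
            }
        }
        if !batch.is_empty() {
            sink(&batch); // flush the tail on shutdown
        }
    })
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let writer = spawn_log_writer(rx, 2, |batch| {
        println!("batch insert of {} rows", batch.len());
    });

    // The request handler only does a cheap send, then returns the response.
    for i in 0..5 {
        tx.send(LogEntry { path: format!("/v1/chat/{i}"), status: 200 }).unwrap();
    }
    drop(tx); // closing the channel triggers the final flush
    writer.join().unwrap();
    println!("ok");
}
```

The handler's only cost is the channel send; the batch insert happens on the writer thread, which is why logging never adds latency to the response path.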