# Cerebras

## Overview
Cerebras provides ultra-fast inference on its custom Wafer-Scale Engine (WSE) hardware, delivering some of the lowest latencies available for open-source models. Floopy proxies requests to Cerebras’ OpenAI-compatible API.
## Supported Models
| Model | Context Window | Notes |
|---|---|---|
| `llama-3.3-70b` | 128K | Llama 3.3 70B on WSE |
| `llama-3.1-8b` | 128K | Llama 3.1 8B on WSE |
| `qwen-3-32b` | 128K | Qwen 3 32B on WSE |
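Because the proxy speaks the OpenAI protocol, you may be able to discover model IDs programmatically instead of hard-coding this table. A minimal sketch, assuming Floopy forwards the standard `/v1/models` endpoint:

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

// List every model ID exposed through the proxy; the Cerebras
// entries use the short names shown in the table above.
for await (const model of client.models.list()) {
  console.log(model.id);
}
```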
## Setup
1. Go to Settings > Providers in the dashboard.
2. Click Add provider and select Cerebras.
3. Paste your Cerebras API key and click Save.
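Once the key is saved, a cheap smoke test confirms the provider is wired up. A minimal sketch, assuming your Floopy key is exported as `FLOOPY_API_KEY` (the one-token cap just keeps the request inexpensive):

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

// A misconfigured key or provider typically surfaces here as a
// 401/403 error rather than a completion.
const response = await client.chat.completions.create({
  model: "llama-3.1-8b",
  messages: [{ role: "user", content: "ping" }],
  max_tokens: 1,
});
console.log("Provider responded:", response.choices[0].finish_reason);
```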
## Usage
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

const response = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [{ role: "user", content: "Explain quantum computing." }],
});
```

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.floopy.ai/v1",
    api_key=os.environ["FLOOPY_API_KEY"],
)

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Explain quantum computing."}],
)
```

```bash
curl https://api.floopy.ai/v1/chat/completions \
  -H "Authorization: Bearer $FLOOPY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.3-70b", "messages": [{"role": "user", "content": "Explain quantum computing."}]}'
```

## Provider-Specific Features
- **Ultra-low latency**: Cerebras WSE hardware delivers inference speeds significantly faster than GPU-based providers (see the streaming sketch below).
- **High throughput**: Ideal for real-time applications and high-volume workloads.
- **Simple model names**: Models use short names (e.g., `llama-3.3-70b`) without org prefixes.
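Streaming is the easiest way to surface that latency in interactive apps. Since Floopy proxies an OpenAI-compatible API, the standard `stream: true` flag should pass through; here is a minimal sketch (untested against Floopy specifically):

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

// Request a streamed response; chunks arrive as server-sent events.
const stream = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [{ role: "user", content: "Explain quantum computing." }],
  stream: true,
});

// Print tokens as they arrive; with Cerebras the first chunk should
// land almost immediately.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```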