
Cerebras

Overview

Cerebras provides ultra-fast inference on its custom Wafer-Scale Engine (WSE) hardware, delivering some of the lowest latencies available for open-source models. Floopy proxies requests to Cerebras' OpenAI-compatible API.

Supported Models

Model           Context Window    Notes
llama-3.3-70b   128K              Llama 3.3 70B on WSE
llama-3.1-8b    128K              Llama 3.1 8B on WSE
qwen-3-32b      128K              Qwen 3 32B on WSE

Setup

  1. Go to Settings > Providers in the dashboard.
  2. Click Add provider and select Cerebras.
  3. Paste your Cerebras API key and click Save.
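Once the key is saved, Cerebras models are routed through your regular Floopy endpoint. One quick way to confirm the provider is wired up is to list the models visible to your key; this sketch assumes Floopy exposes the standard OpenAI-compatible /v1/models route and that FLOOPY_API_KEY is set in your environment.

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

// List every model reachable through Floopy and print its id.
const models = await client.models.list();
for (const model of models.data) {
  console.log(model.id);
}

If the provider was added correctly, Cerebras model names such as llama-3.3-70b should appear in the output.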

Usage

import OpenAI from "openai";

// Point the OpenAI SDK at Floopy's OpenAI-compatible endpoint.
const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

// Request a Cerebras-hosted model by its short name.
const response = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [{ role: "user", content: "Explain quantum computing." }],
});

console.log(response.choices[0].message.content);
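Since Cerebras is built for low latency, streaming the response is often worth enabling so tokens are displayed as soon as they arrive. The sketch below reuses the client from above; the stream option is part of the standard OpenAI SDK, and it is assumed here that Floopy passes streaming through to Cerebras.

const stream = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [{ role: "user", content: "Explain quantum computing." }],
  stream: true,
});

// Print each token delta as it arrives instead of waiting for the full reply.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}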

Provider-Specific Features

  • Ultra-low latency — Cerebras WSE hardware delivers inference speeds significantly faster than GPU-based providers.
  • High throughput — Ideal for real-time applications and high-volume workloads (see the sketch after this list).
  • Simple model names — Models use short names (e.g., llama-3.3-70b) without org prefixes.
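To illustrate the throughput point above, independent requests can be sent concurrently; total wall-clock time is then roughly that of the slowest single request rather than the sum. A minimal sketch reusing the client from the Usage section (the prompts and the choice of llama-3.1-8b are arbitrary):

const prompts = [
  "Summarize the history of the transistor.",
  "Explain quantum computing in one paragraph.",
  "List three common uses of large language models.",
];

// Issue the requests in parallel; each completion is independent.
const responses = await Promise.all(
  prompts.map((content) =>
    client.chat.completions.create({
      model: "llama-3.1-8b",
      messages: [{ role: "user", content }],
    })
  )
);

responses.forEach((response, i) => {
  console.log(`--- ${prompts[i]} ---`);
  console.log(response.choices[0].message.content);
});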