Groq

Overview

Groq provides ultra-low-latency inference on its custom LPU (Language Processing Unit) hardware, hosting open-source models such as Llama and Mixtral. Floopy proxies requests to Groq's OpenAI-compatible API and surfaces Groq's additional latency metrics via response headers.

Supported Models

| Model | Context Window | Notes |
| --- | --- | --- |
| llama-3.3-70b-versatile | 128K | Latest Llama, general purpose |
| llama-3.1-8b-instant | 128K | Fastest Llama model |
| mixtral-8x7b-32768 | 32K | Mixture-of-experts architecture |
| gemma2-9b-it | 8K | Google's open-source Gemma |

Setup

  1. Go to Settings > Providers in the dashboard.
  2. Click Add provider and select Groq.
  3. Paste your Groq API key and click Save.

Usage

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

const response = await client.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: "Explain quantum computing." }],
});
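
Because Groq's time-to-first-token is low, streaming pairs well with it. Below is a minimal sketch using the same SDK, assuming Floopy forwards Groq's streamed responses unchanged:

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

// Request a stream and print tokens as they arrive.
const stream = await client.chat.completions.create({
  model: "llama-3.1-8b-instant",
  messages: [{ role: "user", content: "Explain quantum computing." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}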

Provider-Specific Features

  • Latency headers — Groq responses include additional timing headers forwarded by Floopy (see the example after this list):
    • Floopy-Queue-Time — time spent in Groq’s inference queue (ms)
    • Floopy-Prompt-Time — time to process the prompt (ms)
    • Floopy-Completion-Time — time to generate the completion (ms)
  • Ultra-low latency — Groq’s LPU hardware delivers significantly faster inference than GPU-based providers, making it ideal for real-time applications.
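
To read these headers in code, one option is the OpenAI SDK's .withResponse() helper, which exposes the raw HTTP response alongside the parsed body. A minimal sketch, using the header names listed above:

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

// .withResponse() returns the parsed body together with the raw fetch
// Response, whose headers carry the Floopy-* timing values.
const { data, response } = await client.chat.completions
  .create({
    model: "llama-3.3-70b-versatile",
    messages: [{ role: "user", content: "Hello" }],
  })
  .withResponse();

console.log("queue (ms):", response.headers.get("Floopy-Queue-Time"));
console.log("prompt (ms):", response.headers.get("Floopy-Prompt-Time"));
console.log("completion (ms):", response.headers.get("Floopy-Completion-Time"));
console.log(data.choices[0].message.content);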

Fallback

Route to OpenAI if Groq is unavailable:

curl https://api.floopy.ai/v1/chat/completions \
  -H "Authorization: Bearer $FLOOPY_API_KEY" \
  -H "x-floopy-fallback-provider: openai" \
  -H "x-floopy-fallback-model: gpt-4o-mini" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.3-70b-versatile", "messages": [{"role": "user", "content": "Hello"}]}'