Groq

Overview

Groq provides ultra-low-latency inference on its custom LPU (Language Processing Unit) hardware, hosting open-source models such as Llama and Mixtral. Floopy proxies requests to Groq's OpenAI-compatible API and surfaces Groq's additional latency metrics via response headers.

Supported Models

| Model | Context Window | Notes |
| --- | --- | --- |
| llama-3.3-70b-versatile | 128K | Latest Llama, general purpose |
| llama-3.1-8b-instant | 128K | Fastest Llama model |
| mixtral-8x7b-32768 | 32K | Mixture-of-experts architecture |
| gemma2-9b-it | 8K | Google's open-source Gemma |

Setup

  1. Go to Settings > Providers in the dashboard.
  2. Click Add provider and select Groq.
  3. Paste your Groq API key and click Save.

Usage

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

const response = await client.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: "Explain quantum computing." }],
});
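
Because Groq's time-to-first-token is low, streaming pairs well with it. Below is a minimal sketch using the same SDK, assuming Floopy forwards Groq's streamed responses unchanged:

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

// Request a stream and print tokens as they arrive.
const stream = await client.chat.completions.create({
  model: "llama-3.1-8b-instant",
  messages: [{ role: "user", content: "Explain quantum computing." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}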

Provider-Specific Features

  • Latency headers — Groq responses include additional timing headers forwarded by Floopy (see the example after this list):
    • Floopy-Queue-Time — time spent in Groq’s inference queue (ms)
    • Floopy-Prompt-Time — time to process the prompt (ms)
    • Floopy-Completion-Time — time to generate the completion (ms)
  • Ultra-low latency — Groq’s LPU hardware delivers significantly faster inference than GPU-based providers, making it ideal for real-time applications.
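
To read these headers in code, one option is the OpenAI SDK's .withResponse() helper, which exposes the raw HTTP response alongside the parsed body. A minimal sketch, using the header names listed above:

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

// .withResponse() returns the parsed body together with the raw fetch
// Response, whose headers carry the Floopy-* timing values.
const { data, response } = await client.chat.completions
  .create({
    model: "llama-3.3-70b-versatile",
    messages: [{ role: "user", content: "Hello" }],
  })
  .withResponse();

console.log("queue (ms):", response.headers.get("Floopy-Queue-Time"));
console.log("prompt (ms):", response.headers.get("Floopy-Prompt-Time"));
console.log("completion (ms):", response.headers.get("Floopy-Completion-Time"));
console.log(data.choices[0].message.content);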

Fallback

Route to OpenAI if Groq is unavailable:

curl https://api.floopy.ai/v1/chat/completions \
  -H "Authorization: Bearer $FLOOPY_API_KEY" \
  -H "x-floopy-fallback-provider: openai" \
  -H "x-floopy-fallback-model: gpt-4o-mini" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.3-70b-versatile", "messages": [{"role": "user", "content": "Hello"}]}'