
Batch API

The Floopy Batch API runs large jobs asynchronously at lower cost. It is OpenAI-shaped: point the OpenAI SDK’s baseURL at the gateway and use client.files + client.batches exactly as you would against OpenAI — no new client, no new shapes to learn.

Unlike a raw provider Batch API, every batched request is fully observable: each line lands in your Requests dashboard tagged with its batch_id, counts toward your usage, is metered, and flows through the same feedback/scoring pipeline as a live call.

graph LR
    U[Upload JSONL<br/>POST /v1/files] --> C[Create batch<br/>POST /v1/batches]
    C --> P[Provider batch API<br/>OpenAI / Anthropic]
    P --> R[Reconcile worker]
    R --> CH[(Per-line logs<br/>batch_id)]
    R --> M[Metering + usage]
    R --> O["Output JSONL<br/>GET /v1/files/{id}/content"]

  1. Upload your input as a JSONL file (purpose: "batch"). Each line is a standard chat-completions request with a custom_id.
  2. Create a batch referencing the uploaded file.
  3. Poll the batch until it is completed.
  4. Download the output file — OpenAI-shaped JSONL, one result per custom_id.
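Step 4 yields one result per custom_id, so matching results back to inputs is a dictionary lookup. The sketch below is a minimal illustration that assumes each output line follows OpenAI's batch output layout (a `custom_id`, a `response` object with `status_code` and `body`, and an `error` field); `index_results` and `reply_text` are hypothetical helper names, not part of any SDK.

```python
import json
from typing import Dict, Optional


def index_results(output_jsonl: str) -> Dict[str, dict]:
    """Map custom_id -> parsed output line, skipping blank lines."""
    results: Dict[str, dict] = {}
    for raw in output_jsonl.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        record = json.loads(raw)
        results[record["custom_id"]] = record
    return results


def reply_text(record: dict) -> Optional[str]:
    """Return the assistant text for a successful line, else None.

    Assumes the OpenAI batch output layout; adjust if your output differs.
    """
    response = record.get("response") or {}
    if record.get("error") or response.get("status_code") != 200:
        return None
    return response["body"]["choices"][0]["message"]["content"]
```

Given the text of a downloaded output file, `index_results(text)["req-1"]` would return the record for the input line whose custom_id was `req-1`.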

When the batch finishes, a background worker backfills every request so it is indistinguishable from a live call on the dashboard.

The public surface is always OpenAI-shaped. The upstream provider is resolved the same way as for chat requests, from either:

  • the floopy-provider header (openai or anthropic), or
  • the single provider key configured for your organization.

For Anthropic, the gateway transparently translates your OpenAI-shaped input into Anthropic Message Batches on the way in, and the results back into OpenAI-shaped output — so the same code works against either provider just by changing the header.
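One way to switch providers is to set the floopy-provider header once per client. The sketch below assumes the header is sent through the OpenAI Python SDK's `default_headers` option; `provider_headers` and `make_client` are hypothetical helper names, and the allowed values are the two named above.

```python
ALLOWED_PROVIDERS = {"openai", "anthropic"}


def provider_headers(provider: str) -> dict:
    """Build the header that selects the upstream provider for a client."""
    name = provider.strip().lower()
    if name not in ALLOWED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    return {"floopy-provider": name}


def make_client(api_key: str, provider: str):
    """Create an OpenAI SDK client that targets the gateway for one provider."""
    # Imported lazily so provider_headers() can be used without the SDK installed.
    from openai import OpenAI

    return OpenAI(
        base_url="https://api.floopy.ai/v1",
        api_key=api_key,
        default_headers=provider_headers(provider),
    )
```

With this in place, `make_client("YOUR_FLOOPY_API_KEY", "anthropic")` and `make_client("YOUR_FLOOPY_API_KEY", "openai")` would run the same batch code against either provider.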

There is no plan gate and no separate flag: the Batch API is available whenever the gateway is configured for it.

from openai import OpenAI

# Point the standard OpenAI SDK at the Floopy gateway.
client = OpenAI(
    base_url="https://api.floopy.ai/v1",
    api_key="YOUR_FLOOPY_API_KEY",
)

# 1. Upload the input file (JSONL, one chat request per line).
with open("requests.jsonl", "rb") as fh:
    f = client.files.create(file=fh, purpose="batch")

# 2. Create the batch.
batch = client.batches.create(
    input_file_id=f.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 3. Check the status (repeat this call until the batch is done),
#    then download the output.
batch = client.batches.retrieve(batch.id)
if batch.status == "completed":
    out = client.files.content(batch.output_file_id)
    print(out.text)
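The snippet above checks the status once. In practice step 3 is a loop, and a minimal sketch of one follows. It assumes the terminal statuses match OpenAI's batch statuses (`completed`, `failed`, `expired`, `cancelled`), which this page does not confirm. `wait_for_batch` is a hypothetical helper, and the poll interval and timeout are arbitrary defaults.

```python
import time
from typing import Any, Callable

# Assumed terminal statuses, taken from OpenAI's batch object.
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(
    retrieve: Callable[[str], Any],
    batch_id: str,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 24 * 3600,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Call retrieve(batch_id) until the batch reaches a terminal status."""
    deadline = clock() + timeout_seconds
    while True:
        batch = retrieve(batch_id)
        if batch.status in TERMINAL_STATUSES:
            return batch
        if clock() >= deadline:
            raise TimeoutError(f"batch {batch_id} still {batch.status!r}")
        sleep(poll_seconds)
```

With the client from the example above, this would be called as `batch = wait_for_batch(client.batches.retrieve, batch.id)`. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.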

Each input line is a standard request, with a custom_id you use to match results:

{"custom_id":"req-1","method":"POST","url":"/v1/chat/completions","body":{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}}
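Input files in this shape can be generated rather than written by hand. The following is a minimal sketch that produces one line per prompt with a unique custom_id; `build_batch_lines` and `write_batch_file` are hypothetical helpers, and the `req-N` id scheme and default model are only illustrative.

```python
import json
from typing import Iterable, List


def build_batch_lines(
    prompts: Iterable[str], model: str = "gpt-4o", prefix: str = "req"
) -> List[str]:
    """Return one JSON-encoded chat request per prompt, each with a unique custom_id."""
    lines = []
    for i, prompt in enumerate(prompts, start=1):
        request = {
            "custom_id": f"{prefix}-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        # json.dumps escapes embedded newlines, so each request stays on one line.
        lines.append(json.dumps(request, ensure_ascii=False))
    return lines


def write_batch_file(path: str, prompts: Iterable[str], model: str = "gpt-4o") -> int:
    """Write the JSONL input file and return the number of requests written."""
    lines = build_batch_lines(prompts, model)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    return len(lines)
```

For example, `write_batch_file("requests.jsonl", ["Hello", "Goodbye"])` would produce a two-line file ready to upload with purpose "batch".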

Method  Path                      Purpose
POST    /v1/files                 Upload an input file (purpose: "batch").
POST    /v1/batches               Create a batch from an uploaded file.
GET     /v1/batches               List your batches.
GET     /v1/batches/{id}          Retrieve a batch (refreshes status).
POST    /v1/batches/{id}/cancel   Request cancellation.
GET     /v1/files/{id}/content    Download input or finished output.

Batches are org-scoped: you only ever see your own.
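The list and cancel endpoints can be combined, for example to stop everything still in flight. This is a sketch under assumptions: it takes an OpenAI SDK client pointed at the gateway, uses the SDK's `client.batches.list()` and `client.batches.cancel()`, and treats `validating`, `in_progress`, and `finalizing` as the active statuses, following OpenAI's batch object. `cancel_active_batches` is a hypothetical helper.

```python
from typing import Any, List

# Assumed non-terminal statuses, taken from OpenAI's batch object.
ACTIVE_STATUSES = {"validating", "in_progress", "finalizing"}


def cancel_active_batches(client: Any) -> List[str]:
    """Request cancellation of every batch that is still running.

    Returns the ids of the batches for which cancellation was requested.
    """
    cancelled = []
    for batch in client.batches.list():
        if batch.status in ACTIVE_STATUSES:
            client.batches.cancel(batch.id)
            cancelled.append(batch.id)
    return cancelled
```

Because batches are org-scoped, the listing only ever contains your own organization's batches, so the helper cannot touch anyone else's work.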

Batched requests are not a blind passthrough. Once a batch completes, every line is reconciled into the same pipeline as live traffic:

  • Requests dashboard — one row per request, tagged with batch_id. Filter the Requests view by batch to see exactly what ran, including session id and the headers you sent at submit time.
  • Usage & billing — each batched request counts toward your monthly usage and is metered. Batch traffic counts toward the monthly cap but not your per-minute/second rate limits.
  • Feedback & scoring — successful chat lines flow through the feedback pipeline (heuristic always; the LLM judge at a sampled rate) when your plan includes the feedback feature, so batch results keep improving routing just like live calls.

Caching is intentionally not applied to batch requests.