Batch API
The Floopy Batch API runs large jobs asynchronously at lower cost. It is OpenAI-shaped: point the OpenAI SDK's base URL (`base_url` in the Python SDK, `baseURL` in the Node SDK) at the gateway and use `client.files` and `client.batches` exactly as you would against OpenAI, with no new client and no new shapes to learn.
Unlike a raw provider Batch API, every batched request is fully observable: each line lands in your Requests dashboard tagged with its batch_id, counts toward your usage, is metered, and flows through the same feedback/scoring pipeline as a live call.
How it works

    graph LR
      U[Upload JSONL<br/>POST /v1/files] --> C[Create batch<br/>POST /v1/batches]
      C --> P[Provider batch API<br/>OpenAI / Anthropic]
      P --> R[Reconcile worker]
      R --> CH[(Per-line logs<br/>batch_id)]
      R --> M[Metering + usage]
      R --> O[Output JSONL<br/>GET /v1/files/{id}/content]

- Upload your input as a JSONL file (`purpose: "batch"`). Each line is a standard chat-completions request with a `custom_id`.
- Create a batch referencing the uploaded file.
- Poll the batch until it is `completed`.
- Download the output file: OpenAI-shaped JSONL, one result per `custom_id`.
When the batch finishes, a background worker backfills every request so it is indistinguishable from a live call on the dashboard.
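The upload step needs a JSONL file with one request per line. A minimal sketch of generating one is shown below; the helper names (`build_batch_lines`, `write_batch_file`) and the `req-N` numbering scheme are illustrative assumptions, not part of the Floopy API.

```python
import json
from pathlib import Path


def build_batch_lines(prompts, model="gpt-4o"):
    """Return one JSON string per prompt, each carrying its own custom_id."""
    lines = []
    for i, prompt in enumerate(prompts, start=1):
        request = {
            # Hypothetical naming scheme; the id only needs to be unique
            # so that each result can be matched back to its request.
            "custom_id": f"req-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(request))
    return lines


def write_batch_file(prompts, path="requests.jsonl", model="gpt-4o"):
    """Write the requests to `path`, one JSON object per line."""
    lines = build_batch_lines(prompts, model=model)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Calling `write_batch_file(["Hello", "Summarize this text"])` would produce a `requests.jsonl` ready to upload with `purpose="batch"`.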
Providers
The public surface is always OpenAI-shaped. The provider is resolved exactly as it is for chat requests, from either:
- the `floopy-provider` header (`openai` or `anthropic`), or
- the single provider key configured for your organization.
For Anthropic, the gateway transparently translates your OpenAI-shaped input into Anthropic Message Batches on the way in, and the results back into OpenAI-shaped output — so the same code works against either provider just by changing the header.
There is no plan gate and no separate flag: the Batch API is available whenever the gateway is configured for it.
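As a rough illustration of the header-based selection, the sketch below builds the extra headers for a request. The helper name `provider_headers` and the choice to validate and lowercase the value on the client side are assumptions made for this example; only the header name and its two values come from the text above.

```python
SUPPORTED_PROVIDERS = ("openai", "anthropic")  # the two values listed above


def provider_headers(provider=None):
    """Build extra request headers that select a batch provider.

    Returns an empty dict when no provider is given, which leaves the choice
    to the single provider key configured for the organization.
    """
    if provider is None:
        return {}
    name = provider.strip().lower()
    if name not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    return {"floopy-provider": name}
```

With the OpenAI Python SDK, the returned dict could be passed as `default_headers=provider_headers("anthropic")` when constructing `OpenAI(...)`, or per call through the `extra_headers=` argument of `client.batches.create(...)`.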
Example

Python:

    import time

    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.floopy.ai/v1",
        api_key="YOUR_FLOOPY_API_KEY",
    )

    # 1. Upload the input file (JSONL, one chat request per line).
    with open("requests.jsonl", "rb") as fh:
        f = client.files.create(file=fh, purpose="batch")

    # 2. Create the batch.
    batch = client.batches.create(
        input_file_id=f.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )

    # 3. Poll until the batch reaches a terminal status, then download the output.
    # The terminal statuses are OpenAI's; this assumes the gateway reports the same set.
    while batch.status not in ("completed", "failed", "expired", "cancelled"):
        time.sleep(30)
        batch = client.batches.retrieve(batch.id)

    if batch.status == "completed":
        out = client.files.content(batch.output_file_id)
        print(out.text)

cURL:

    # 1. Upload
    curl https://api.floopy.ai/v1/files \
      -H "Authorization: Bearer $FLOOPY_API_KEY" \
      -F purpose=batch \
      -F file=@requests.jsonl

    # 2. Create (use floopy-provider to pick the provider)
    curl https://api.floopy.ai/v1/batches \
      -H "Authorization: Bearer $FLOOPY_API_KEY" \
      -H "floopy-provider: anthropic" \
      -H "Content-Type: application/json" \
      -d '{"input_file_id":"file-abc","endpoint":"/v1/chat/completions","completion_window":"24h"}'

    # 3. Retrieve / download
    curl https://api.floopy.ai/v1/batches/batch_abc \
      -H "Authorization: Bearer $FLOOPY_API_KEY"

    curl https://api.floopy.ai/v1/files/file-out/content \
      -H "Authorization: Bearer $FLOOPY_API_KEY"

Each input line is a standard request, with a `custom_id` you use to match results:

    {"custom_id":"req-1","method":"POST","url":"/v1/chat/completions","body":{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}}

Endpoints

| Method | Path | Purpose |
|---|---|---|
| `POST` | `/v1/files` | Upload an input file (`purpose: "batch"`). |
| `POST` | `/v1/batches` | Create a batch from an uploaded file. |
| `GET` | `/v1/batches` | List your batches. |
| `GET` | `/v1/batches/{id}` | Retrieve a batch (refreshes status). |
| `POST` | `/v1/batches/{id}/cancel` | Request cancellation. |
| `GET` | `/v1/files/{id}/content` | Download input or finished output. |

Batches are org-scoped: you only ever see your own.
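Once the output file has been downloaded from `GET /v1/files/{id}/content`, results can be matched back to requests by `custom_id`. The sketch below assumes each output line follows OpenAI's batch output layout (`response.body.choices[0].message.content` for successes, `error` for failures); this page only states that the output is OpenAI-shaped, so treat those field names as an assumption. The helper name `index_results` is hypothetical.

```python
import json


def index_results(output_text):
    """Map custom_id -> (assistant_text, error) for each line of a batch output file."""
    results = {}
    for raw in output_text.splitlines():
        if not raw.strip():
            continue  # tolerate blank lines
        record = json.loads(raw)
        # Assumed OpenAI-style layout: a "response" object with a "body",
        # or a null response alongside an "error" object.
        response = record.get("response") or {}
        body = response.get("body") or {}
        choices = body.get("choices") or []
        text = choices[0]["message"]["content"] if choices else None
        results[record["custom_id"]] = (text, record.get("error"))
    return results
```

Given the text of a downloaded output file (for example `out.text` from the OpenAI SDK), `index_results(out.text)["req-1"]` would return the reply and error for the request submitted as `req-1`.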
Observability
Batched requests are not a blind passthrough. Once a batch completes, every line is reconciled into the same pipeline as live traffic:
- Requests dashboard: one row per request, tagged with `batch_id`. Filter the Requests view by batch to see exactly what ran, including the session id and the headers you sent at submit time.
- Usage & billing: each batched request counts toward your monthly usage and is metered. Batch traffic counts toward the monthly cap but not toward your per-minute or per-second rate limits.
- Feedback & scoring: successful chat lines flow through the feedback pipeline (the heuristic always runs; the LLM judge runs at a sampled rate) when your plan includes the feedback feature, so batch results keep improving routing just like live calls.
Caching is intentionally not applied to batch requests.