Batch API

The Floopy Batch API runs large jobs asynchronously at lower cost. It is OpenAI-shaped: point the OpenAI SDK's base URL (base_url in Python, baseURL in Node) at the gateway and use client.files and client.batches exactly as you would against OpenAI. There is no new client and no new request or response shape to learn.

Unlike a raw provider Batch API, every batched request is fully observable: each line lands in your Requests dashboard tagged with its batch_id, counts toward your usage, is metered, and flows through the same feedback/scoring pipeline as a live call.

How it works

graph LR
    U[Upload JSONL<br/>POST /v1/files] --> C[Create batch<br/>POST /v1/batches]
    C --> P[Provider batch API<br/>OpenAI / Anthropic]
    P --> R[Reconcile worker]
    R --> CH[(Per-line logs<br/>batch_id)]
    R --> M[Metering + usage]
    R --> O[Output JSONL<br/>GET /v1/files/&#123;id&#125;/content]

1. Upload your input as a JSONL file (purpose: "batch"). Each line is a standard chat-completions request with a custom_id.
2. Create a batch referencing the uploaded file.
3. Poll the batch until its status is completed.
4. Download the output file: OpenAI-shaped JSONL, one result per custom_id.

When the batch finishes, a background worker backfills every request so it is indistinguishable from a live call on the dashboard.
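
The poll step can be wrapped in a small helper that backs off between checks and gives up after a timeout. This is a minimal sketch, not part of the gateway or the SDK: wait_for_batch and its parameters are hypothetical names, and the set of terminal statuses is assumed to match the OpenAI Batch API.

```python
import time

# Terminal statuses as used by the OpenAI Batch API (assumed to carry over).
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(retrieve, batch_id, interval=5.0, max_interval=60.0,
                   timeout=24 * 3600, sleep=time.sleep, clock=time.monotonic):
    """Poll retrieve(batch_id) until the batch is terminal or the timeout passes.

    `retrieve` is any callable returning an object with a `.status` attribute,
    for example `client.batches.retrieve` from the OpenAI SDK.
    """
    deadline = clock() + timeout
    while True:
        batch = retrieve(batch_id)
        if batch.status in TERMINAL_STATUSES:
            return batch
        if clock() >= deadline:
            raise TimeoutError(
                f"batch {batch_id} still {batch.status!r} after {timeout}s"
            )
        sleep(interval)
        interval = min(interval * 2, max_interval)  # back off between polls
```

With an SDK client, a call would look like wait_for_batch(client.batches.retrieve, batch.id). Passing the retrieve function in, rather than the client, keeps the helper easy to exercise without network access.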

Providers

The public surface is always OpenAI-shaped. The provider is resolved exactly as it is for chat requests, from either:

the floopy-provider header (openai or anthropic), or
the single provider key configured for your organization.

For Anthropic, the gateway transparently translates your OpenAI-shaped input into Anthropic Message Batches on the way in, and the results back into OpenAI-shaped output — so the same code works against either provider just by changing the header.
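
With the OpenAI Python SDK, one way to send the floopy-provider header is the per-request extra_headers argument. The sketch below assumes a client already pointed at the gateway; create_batch and ALLOWED_PROVIDERS are hypothetical names used only for illustration.

```python
ALLOWED_PROVIDERS = ("openai", "anthropic")


def create_batch(client, input_file_id, provider=None):
    """Create a batch, optionally pinning the provider via floopy-provider.

    `client` is an OpenAI SDK client whose base_url points at the gateway.
    With provider=None no header is sent, so the gateway falls back to the
    provider key configured for the organization.
    """
    if provider is not None and provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    extra_headers = {"floopy-provider": provider} if provider else None
    return client.batches.create(
        input_file_id=input_file_id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
        extra_headers=extra_headers,  # per-request headers, an OpenAI SDK option
    )
```

Switching providers is then a one-argument change: create_batch(client, f.id, provider="anthropic").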

There is no plan gate and no separate flag: the Batch API is available whenever the gateway is configured for it.

import time

from openai import OpenAI

client = OpenAI(
    base_url="https://api.floopy.ai/v1",
    api_key="YOUR_FLOOPY_API_KEY",
)

# 1. Upload the input file (JSONL, one chat request per line).
with open("requests.jsonl", "rb") as fh:
    f = client.files.create(file=fh, purpose="batch")

# 2. Create the batch.
batch = client.batches.create(
    input_file_id=f.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 3. Poll until the batch reaches a terminal status, then download the output.
# The terminal statuses listed here are the OpenAI Batch API ones.
while batch.status not in ("completed", "failed", "expired", "cancelled"):
    time.sleep(30)
    batch = client.batches.retrieve(batch.id)

if batch.status == "completed":
    out = client.files.content(batch.output_file_id)
    print(out.text)
else:
    print(f"Batch ended with status: {batch.status}")

# 1. Upload
curl https://api.floopy.ai/v1/files \
  -H "Authorization: Bearer $FLOOPY_API_KEY" \
  -F purpose=batch \
  -F file=@requests.jsonl

# 2. Create (use floopy-provider to pick the provider)
curl https://api.floopy.ai/v1/batches \
  -H "Authorization: Bearer $FLOOPY_API_KEY" \
  -H "floopy-provider: anthropic" \
  -H "Content-Type: application/json" \
  -d '{"input_file_id":"file-abc","endpoint":"/v1/chat/completions","completion_window":"24h"}'

# 3. Retrieve / download
curl https://api.floopy.ai/v1/batches/batch_abc \
  -H "Authorization: Bearer $FLOOPY_API_KEY"
curl https://api.floopy.ai/v1/files/file-out/content \
  -H "Authorization: Bearer $FLOOPY_API_KEY"

Each input line is a standard request, with a custom_id you use to match results:

{"custom_id":"req-1","method":"POST","url":"/v1/chat/completions","body":{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}}
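
As a rough sketch of the round trip, the helpers below generate input lines in the shape shown above and index an output file by custom_id. The function names and the req-<n> numbering scheme are illustrative assumptions; the only output field relied on is custom_id.

```python
import json


def build_batch_lines(prompts, model="gpt-4o"):
    """Return one JSONL line per prompt, keyed by custom_id 'req-<n>'."""
    lines = []
    for n, prompt in enumerate(prompts, start=1):
        lines.append(json.dumps({
            "custom_id": f"req-{n}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }))
    return lines


def index_results(output_jsonl):
    """Map custom_id -> parsed output line, skipping blank lines."""
    results = {}
    for raw in output_jsonl.splitlines():
        if raw.strip():
            record = json.loads(raw)
            results[record["custom_id"]] = record
    return results
```

Writing "\n".join(build_batch_lines(prompts)) + "\n" to requests.jsonl gives a file ready for upload. After download, index_results(out.text)["req-1"] returns the result for the first prompt regardless of the order in which lines appear in the output.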

Endpoints

Method	Path	Purpose
`POST`	`/v1/files`	Upload an input file (`purpose: "batch"`).
`POST`	`/v1/batches`	Create a batch from an uploaded file.
`GET`	`/v1/batches`	List your batches.
`GET`	`/v1/batches/{id}`	Retrieve a batch (refreshes status).
`POST`	`/v1/batches/{id}/cancel`	Request cancellation.
`GET`	`/v1/files/{id}/content`	Download input or finished output.

Batches are org-scoped: you only ever see your own.
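
The list and cancel endpoints map to client.batches.list and client.batches.cancel in the OpenAI Python SDK. The sketch below requests cancellation for every batch that has not finished; cancel_unfinished is a hypothetical helper, and the set of cancellable statuses is an assumption taken from the OpenAI Batch API.

```python
# Statuses treated as "not finished yet" (assumed from the OpenAI Batch API).
CANCELLABLE_STATUSES = {"validating", "in_progress"}


def cancel_unfinished(client, limit=20):
    """Request cancellation for every listed batch that is still running.

    Returns the ids for which cancellation was requested. `limit` is the
    page size; iterating the SDK's list result walks through all pages.
    """
    cancelled = []
    for batch in client.batches.list(limit=limit):  # GET /v1/batches
        if batch.status in CANCELLABLE_STATUSES:
            client.batches.cancel(batch.id)  # POST /v1/batches/{id}/cancel
            cancelled.append(batch.id)
    return cancelled
```

Because batches are org-scoped, this only ever touches batches belonging to the organization behind the API key.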

Observability

Batched requests are not a blind passthrough. Once a batch completes, every line is reconciled into the same pipeline as live traffic:

Requests dashboard: one row per request, tagged with batch_id. Filter the Requests view by batch to see exactly what ran, including the session id and the headers you sent at submit time.
Usage & billing: each batched request is metered and counts toward your monthly usage cap. Batch traffic does not count toward your per-minute or per-second rate limits.
Feedback & scoring: when your plan includes the feedback feature, successful chat lines flow through the feedback pipeline (the heuristic scorer always runs; the LLM judge runs at a sampled rate), so batch results keep improving routing just like live calls.

Caching is intentionally not applied to batch requests.