Observability
Overview
Every request that flows through the Floopy gateway is logged automatically. You get full visibility into what happened, how long it took, how much it cost, and whether caching or the firewall was involved. All data is stored in a high-performance analytics engine designed for fast querying over large volumes of log data.
No additional instrumentation or SDK is required. Logging is built into the gateway.
What Gets Logged
Each request record includes:
- Request and response — the full prompt and completion (redactable if needed).
- Latency — end-to-end response time, broken down by stage.
- Token usage — input and output tokens consumed.
- Cost — calculated cost based on the provider’s pricing.
- Cache status — whether the response was served from cache and which tier matched.
- Firewall events — threat scores and whether the request was flagged or blocked.
- Model and provider — which model handled the request and through which provider.
- Custom properties — any metadata your application attached to the request.
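Put together, a single record might look like the sketch below. The field names here are illustrative only — the gateway's actual log schema is not specified in this document.

```python
# Illustrative shape of one request record, covering the fields listed
# above. All field names are hypothetical, not the real schema.
record = {
    "request": {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]},
    "response": {"content": "Hi! How can I help?"},
    "latency_ms": {"total": 812, "cache_check": 4, "firewall": 23, "provider": 771},
    "tokens": {"input": 9, "output": 12},
    "cost_usd": 0.00021,
    "cache": {"hit": False, "tier": None},
    "firewall": {"threat_score": 0.02, "flagged": False, "blocked": False},
    "provider": "openai",
    "properties": {"usertier": "premium"},
}

# Token and cost fields make per-request accounting straightforward:
total_tokens = record["tokens"]["input"] + record["tokens"]["output"]
print(total_tokens)  # 21
```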
Dashboard Pages
Requests
The main log view. Browse all requests with filters for time range, model, provider, status code, cache hit, and more. Click any row to open a detail panel showing the full request and response, token breakdown, latency stages, and associated events.
Sessions
Requests grouped by session ID. When your application sends a session identifier with requests, Floopy groups them into conversations so you can follow multi-turn interactions from start to finish.
Users
Per end-user analytics. If your application passes a user identifier, Floopy tracks usage, cost, and request volume per user. Useful for identifying power users, detecting abuse, and understanding usage patterns.
Properties
Segment your traffic by custom properties. Tag requests with metadata like environment, feature name, or customer tier, then filter and aggregate by those dimensions. This lets you answer questions like “how much does the summarization feature cost per day?” or “which customer tier generates the most tokens?”
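For example, answering "which customer tier generates the most tokens?" is a group-by over a custom property. A toy sketch with hypothetical records:

```python
from collections import defaultdict

# Hypothetical request records tagged with a custom property.
records = [
    {"properties": {"usertier": "premium"}, "tokens": 130},
    {"properties": {"usertier": "free"}, "tokens": 40},
    {"properties": {"usertier": "premium"}, "tokens": 55},
]

# Aggregate token usage per tier -- the kind of query the Properties
# page runs for you.
tokens_by_tier = defaultdict(int)
for r in records:
    tokens_by_tier[r["properties"]["usertier"]] += r["tokens"]

top_tier = max(tokens_by_tier, key=tokens_by_tier.get)
print(top_tier, tokens_by_tier[top_tier])  # premium 185
```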
Distributed Tracing
Each request generates a trace that shows the full lifecycle through the gateway:
- Authentication — API key validation and organization lookup.
- Cache check — exact, semantic, and advanced cache lookups with timing.
- Firewall — Qdrant verdict-cache lookup followed by an LLM safety classification (`backend` and `model_ref` recorded per call).
- Routing — provider selection and request transformation.
- Provider dispatch — the call to the upstream AI provider with response time.
The trace view displays these stages as spans on a timeline, so you can see exactly where time is spent and identify bottlenecks.
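As a sketch of how such a timeline pinpoints a bottleneck (the stage names mirror the lifecycle above, but the timings are hypothetical):

```python
# Hypothetical spans for one request, as (stage, start_ms, end_ms)
# relative to request start.
spans = [
    ("authentication", 0, 3),
    ("cache_check", 3, 9),
    ("firewall", 9, 31),
    ("routing", 31, 33),
    ("provider_dispatch", 33, 812),
]

# The bottleneck is simply the span with the largest duration.
bottleneck = max(spans, key=lambda s: s[2] - s[1])
print(bottleneck[0])  # provider_dispatch
```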
Custom Properties and Session Tracking
Attach metadata to requests using headers so you can filter, segment, and analyze traffic in the dashboard.
| Header | Description |
|---|---|
| `floopy-user-id` | Associate the request with a specific end user |
| `floopy-session-id` | Group requests into a conversation session |
| `floopy-session-name` | Human-readable name for the session |
| `floopy-session-path` | Path or location identifier for the session |
| `floopy-property-<name>` | Each header matching this pattern becomes a custom property. For example, `floopy-property-usertier` creates a property called `usertier` |
Each custom property is sent as its own header. There is no combined JSON properties header — each property gets an individual floopy-property-<name> header.
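Server-side, the property pattern amounts to stripping the prefix from each matching header. A minimal sketch (not the gateway's actual parsing code):

```python
PROPERTY_PREFIX = "floopy-property-"

def extract_custom_properties(headers: dict[str, str]) -> dict[str, str]:
    """Collect every floopy-property-<name> header into a property map.

    A sketch of the pattern described above; the gateway's real
    implementation may differ.
    """
    return {
        name[len(PROPERTY_PREFIX):]: value
        for name, value in headers.items()
        if name.lower().startswith(PROPERTY_PREFIX)
    }

props = extract_custom_properties({
    "floopy-user-id": "user-alice-001",
    "floopy-property-usertier": "premium",
    "floopy-property-industry": "education",
})
print(props)  # {'usertier': 'premium', 'industry': 'education'}
```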
```typescript
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
});

const response = await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello" }],
  },
  {
    headers: {
      "floopy-user-id": "user-alice-001",
      "floopy-session-id": "sess-alice-a1b2c3",
      "floopy-session-name": "math-tutoring",
      "floopy-session-path": "/dashboard/math",
      "floopy-property-usertier": "premium",
      "floopy-property-usertype": "power-user",
      "floopy-property-industry": "education",
    },
  },
);

console.log(response.choices[0].message.content);
```

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.floopy.ai/v1",
    api_key=os.environ["FLOOPY_API_KEY"],
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={
        "floopy-user-id": "user-alice-001",
        "floopy-session-id": "sess-alice-a1b2c3",
        "floopy-session-name": "math-tutoring",
        "floopy-session-path": "/dashboard/math",
        "floopy-property-usertier": "premium",
        "floopy-property-usertype": "power-user",
        "floopy-property-industry": "education",
    },
)

print(response.choices[0].message.content)
```

```bash
curl https://api.floopy.ai/v1/chat/completions \
  -H "Authorization: Bearer $FLOOPY_API_KEY" \
  -H "Content-Type: application/json" \
  -H "floopy-user-id: user-alice-001" \
  -H "floopy-session-id: sess-alice-a1b2c3" \
  -H "floopy-session-name: math-tutoring" \
  -H "floopy-session-path: /dashboard/math" \
  -H "floopy-property-usertier: premium" \
  -H "floopy-property-usertype: power-user" \
  -H "floopy-property-industry: education" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

Reliability: Fallback and Replay
The gateway never drops observability data when the analytics backend is unavailable. Logs, circuit-breaker events, and trace spans are written by dedicated batch workers with a disk-based fallback:
- Disk fallback. If ClickHouse is unreachable, each worker writes pending batches to a local JSONL fallback file (one per sink). Your requests keep flowing — observability is never on the hot path.
- Bounded rotation. Fallback files are capped at 100 MiB and rotated through three generations (`.jsonl`, `.jsonl.1`, `.jsonl.2`), so a prolonged outage cannot exhaust disk. Older generations are dropped when rotation advances.
- Automatic replay on recovery. A background task monitors ClickHouse availability. When connectivity is restored, the worker re-reads pending fallback files, replays them in batches, and deletes each file only after the insert is confirmed.
- Concurrent-safe replay. A `.replaying` lock file prevents multiple gateway instances from double-inserting the same rows during replay.
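The bounded-rotation scheme above can be sketched as follows. This is a simplified illustration of the cap-and-rotate idea; the gateway's real worker code may differ in detail:

```python
import os

MAX_BYTES = 100 * 1024 * 1024  # 100 MiB cap per fallback file
GENERATIONS = 3                # .jsonl, .jsonl.1, .jsonl.2

def rotate_if_full(path: str, max_bytes: int = MAX_BYTES) -> None:
    """Rotate path -> path.1 -> path.2 when path hits the size cap.

    The oldest generation is dropped, so total disk use stays bounded
    even during a prolonged outage.
    """
    if not os.path.exists(path) or os.path.getsize(path) < max_bytes:
        return
    oldest = f"{path}.{GENERATIONS - 1}"
    if os.path.exists(oldest):
        os.remove(oldest)  # drop the oldest generation
    for gen in range(GENERATIONS - 2, 0, -1):
        src = f"{path}.{gen}"
        if os.path.exists(src):
            os.replace(src, f"{path}.{gen + 1}")
    os.replace(path, f"{path}.1")  # current file becomes generation 1
```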
This applies to request logs, circuit-breaker events, and distributed trace spans. If you run a multi-region deployment and your analytics store has a brief incident, you will see logs stream back in as replay completes rather than losing the window entirely.
Retention and Performance
Logs are retained according to your plan’s data retention policy. Queries over millions of records return in seconds, so you can search and filter without waiting.
For high-volume applications, Floopy aggregates metrics (cost, token usage, latency percentiles) into pre-computed rollups that power the dashboard charts without scanning raw logs.
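Conceptually, a rollup trades a scan of raw logs for one pre-aggregated row. A toy sketch with hypothetical records:

```python
from statistics import median

# Hypothetical raw request records for one time bucket.
raw = [
    {"cost": 0.0002, "tokens": 21, "latency_ms": 812},
    {"cost": 0.0004, "tokens": 55, "latency_ms": 640},
    {"cost": 0.0001, "tokens": 12, "latency_ms": 95},
    {"cost": 0.0009, "tokens": 130, "latency_ms": 1420},
]

# The rollup pre-computes the aggregates the dashboard charts need,
# so they read one small row instead of scanning every raw log line.
rollup = {
    "requests": len(raw),
    "total_cost": sum(r["cost"] for r in raw),
    "total_tokens": sum(r["tokens"] for r in raw),
    "latency_p50": median(r["latency_ms"] for r in raw),
}
print(rollup["requests"], rollup["total_tokens"], rollup["latency_p50"])
```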