
Agent loop (chat completions)

When an organization has at least one enabled outbound MCP server (configured under MCP → Servers), every /v1/chat/completions request automatically runs through the agent loop. The decision is made server-side from the mcp_agent_active flag on organization_profiles:

mcp_agent_active = has_mcp_outbound AND EXISTS(SELECT 1 FROM mcp_outbound_servers s WHERE s.org_id = ... AND s.enabled = true)

Disable every server (or remove has_mcp_outbound from the org’s plan) and the next chat request flows through the standard pass-through path with zero behavior change.
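The activation check reduces to a single boolean. A minimal sketch (the struct and function names here are illustrative, not Floopy's actual types):

```rust
// Illustrative sketch of the server-side activation decision: the flag is
// true only when the org's plan grants has_mcp_outbound AND at least one
// outbound MCP server is enabled.
struct OutboundServer {
    enabled: bool,
}

fn mcp_agent_active(has_mcp_outbound: bool, servers: &[OutboundServer]) -> bool {
    has_mcp_outbound && servers.iter().any(|s| s.enabled)
}
```

Disabling every server or dropping the plan flag makes the function return false, which is exactly the "zero behavior change" pass-through path.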

A client whose org has the agent loop active can still run a pure pass-through chat completion by sending the floopy-mcp-disabled header:

floopy-mcp-disabled: true

Truthy values (true / 1 / yes, case-insensitive) skip the agent path for that single request. Useful when the caller knows the prompt needs no tools and wants to skip the agent’s per-round overhead.
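The truthy check described above can be sketched as (helper name is illustrative):

```rust
// Truthy is "true" / "1" / "yes", case-insensitive, per the header contract.
// Anything else -- including an absent header -- keeps the agent path active.
fn mcp_disabled_header(value: Option<&str>) -> bool {
    matches!(
        value.map(|v| v.trim().to_ascii_lowercase()).as_deref(),
        Some("true") | Some("1") | Some("yes")
    )
}
```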


client → /v1/chat/completions
├─ org_ctx.mcp_agent_active = true AND no opt-out header
└─ agent::agent_loop::run
│ for each round:
│ 1. budget.check_round (max_rounds, wall-clock)
│ 2. LLM call via execute_strategies ← per-round span
│ • cancellable via tokio::select!
│ • dispatches to provider (OpenAI/Anthropic/Gemini/Bedrock)
│ 3. if no tool_calls → return final assistant message
│ 4. else dispatch_round (parallel, allowlist-checked)
│ 4a. cache hit? reuse value
│ 4b. miss? resolve secret_ref, call upstream, cache result
│ 4c. SSRF guard checks resolved IP — block aborts the
│ full request, not just the tool call (see Security)
│ 5. emit audit_log.agent.tool_call per dispatch
│ 6. append tool messages, loop

Tool discovery (auto-discovered, not declared)


Floopy follows the market-standard MCP model: clients reference servers, not individual tools. The agent loop calls tools/list on each enabled server in parallel (cache TTL 10 minutes), aggregates the results, and exposes the union to the LLM. Tools are deduplicated by name with first-server (configuration order) winning on collisions.

Strict input-schema validation: tools whose inputSchema is missing, null, or not a JSON object are dropped from the catalog. The drop is logged as mcp_tool_invalid_schema; if a server has zero valid tools after validation, a server-level mcp_server_no_valid_tools warning fires. The MCP spec mandates inputSchema, so a server hitting this in production is misconfigured.
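The catalog build described above (validate, then deduplicate with first-server wins) can be sketched as follows. `SchemaKind` is a stand-in for a parsed JSON value, and the names are illustrative:

```rust
use std::collections::HashSet;

// Servers are iterated in configuration order. Tools whose inputSchema is
// missing or not a JSON object are dropped (the real loop logs
// mcp_tool_invalid_schema here); name collisions keep the first server's tool.
#[derive(Clone, PartialEq)]
enum SchemaKind { Object, NonObject }

#[derive(Clone)]
struct Tool { name: String, input_schema: Option<SchemaKind> }

fn build_catalog(servers: &[Vec<Tool>]) -> Vec<Tool> {
    let mut seen = HashSet::new();
    let mut catalog = Vec::new();
    for tools in servers {                       // configuration order
        for tool in tools {
            if tool.input_schema != Some(SchemaKind::Object) {
                continue;                        // dropped: invalid schema
            }
            if seen.insert(tool.name.clone()) {  // first server wins
                catalog.push(tool.clone());
            }
        }
    }
    catalog
}
```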


Three independent ceilings, enforced inside the loop:

  • Rounds — capped at runtime.max_rounds (1..50, default 10), and additionally at MAX_ROUNDS_CAP = 50 regardless of the per-org config.
  • Wall clock — 120s deadline sealed at acquisition. Each round races the LLM call and the tool dispatch against tokio::time::sleep_until(deadline) via tokio::select!, so a slow upstream cannot push the run past the deadline.
  • Concurrency — 16 simultaneous agent runs per org, enforced by an atomic Redis counter (Lua GET-and-conditional-INCR). Excess returns 429 with retry_after_secs = 60.

Exhausting rounds or wall-clock emits one audit_log.agent.budget_exhausted row and returns the partial messages with finish_reason = "length".
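The rounds and wall-clock ceilings can be sketched as follows (the `Budget` struct is illustrative; the Redis-backed concurrency counter is omitted since it lives outside the process):

```rust
use std::time::{Duration, Instant};

const MAX_ROUNDS_CAP: u32 = 50;                 // global cap, per the text
const WALL_CLOCK: Duration = Duration::from_secs(120);

struct Budget { max_rounds: u32, deadline: Instant }

impl Budget {
    // Per-org max_rounds is clamped to 1..=50, then capped globally;
    // the wall-clock deadline is sealed once, at acquisition.
    fn new(org_max_rounds: u32) -> Self {
        Budget {
            max_rounds: org_max_rounds.clamp(1, 50).min(MAX_ROUNDS_CAP),
            deadline: Instant::now() + WALL_CLOCK,
        }
    }

    // Called at the top of each round; false means budget_exhausted.
    fn check_round(&self, round: u32) -> bool {
        round < self.max_rounds && Instant::now() < self.deadline
    }
}
```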


Identical tool calls (same (org_id, server_id, tool_name, canonical_args)) within runtime.tool_cache_ttl_seconds reuse the previous result. The cache value is HMAC-signed with FLOOPY_AGENT_CACHE_PEPPER so:

  • A Redis writer cannot inject a forged result.
  • A value from one (org, server, tool) cannot be migrated to another — the cache key is part of the MAC input.
  • A pepper rotation invalidates every cached entry wholesale.

tool_cache_ttl_seconds = 0 disables the cache entirely.
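The identity part of the cache key can be sketched as below. The key format is hypothetical and the HMAC step with FLOOPY_AGENT_CACHE_PEPPER is elided; the point shown is that arguments are canonicalized (here via a sorted `BTreeMap`, standing in for JSON canonicalization) and that the full `(org, server, tool, args)` identity is baked into the key:

```rust
use std::collections::BTreeMap;

// BTreeMap iterates in sorted key order, so two semantically identical
// argument sets produce the same canonical string regardless of insertion
// order. The real implementation then MACs the value under this key, so a
// cached result cannot be replayed under a different identity.
fn cache_key(org_id: &str, server_id: &str, tool: &str, args: &BTreeMap<String, String>) -> String {
    let canonical: Vec<String> = args.iter().map(|(k, v)| format!("{k}={v}")).collect();
    format!("agent_cache:{org_id}:{server_id}:{tool}:{}", canonical.join("&"))
}
```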


runtime.stream_mode is the org-level switch between two modes:

  • final_only — when the client requests stream: true, the loop runs to completion (intermediate rounds buffered server-side), then the final assistant message is split into multiple ~64-byte delta.content SSE frames so clients see the answer progressively. UTF-8 boundaries are preserved. Intermediate-round text (e.g. “I’ll check the weather.”) is not streamed live in this version — it’s reserved for a follow-up that drives each round through the upstream provider as a stream.
  • disabled — even if the client requests stream: true, the response is a single non-streaming chat.completion JSON. Useful for integrations that cannot consume SSE.

When the client doesn’t request streaming, both modes return JSON.
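The UTF-8-safe ~64-byte split used by final_only can be sketched with std only (function name is illustrative):

```rust
// Split text into chunks of at most max_bytes, backing up to the nearest
// char boundary so no multi-byte UTF-8 sequence is cut. A chunk may
// therefore be a few bytes short of max_bytes.
fn split_deltas(text: &str, max_bytes: usize) -> Vec<&str> {
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < text.len() {
        let mut end = (start + max_bytes).min(text.len());
        while !text.is_char_boundary(end) {
            end -= 1;
        }
        chunks.push(&text[start..end]);
        start = end;
    }
    chunks
}
```

Because each chunk is a `&str`, every frame is valid UTF-8 by construction; concatenating the chunks reproduces the original message exactly.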


Tools, tool_use / tool_calls, and tool_result round-trip through every provider Floopy supports:

Provider                 Tool field name                 Result field name                  Stop reason
OpenAI / OpenAI-compat   tools[].function.parameters     tool_calls[]                       tool_calls
Anthropic                tools[].input_schema            content[].tool_use                 tool_use
Gemini                   tools[].functionDeclarations    parts[].functionCall               (inferred from part type)
Bedrock Converse         toolConfig.tools[].toolSpec     output.message.content[].toolUse   tool_use

Floopy translates between these on the wire — clients always speak the OpenAI shape.
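One side of that translation can be sketched as a lookup keyed by provider (the enum and function are illustrative; the string values mirror the table above):

```rust
// Wire-level location of the tool schema for each supported provider.
enum Provider { OpenAi, Anthropic, Gemini, BedrockConverse }

fn tool_schema_field(p: &Provider) -> &'static str {
    match p {
        Provider::OpenAi => "tools[].function.parameters",
        Provider::Anthropic => "tools[].input_schema",
        Provider::Gemini => "tools[].functionDeclarations",
        Provider::BedrockConverse => "toolConfig.tools[].toolSpec",
    }
}
```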


Every outbound HTTP request from the agent loop (and from webhook delivery) passes through a cluster-wide SSRF policy loaded once from environment variables at startup. Default behavior matches the pre-PR-2 hardcoded blocks; operators can adjust without code changes:

FLOOPY_OUTBOUND_SSRF_BLOCK_LOOPBACK=true
FLOOPY_OUTBOUND_SSRF_BLOCK_PRIVATE=true # RFC1918
FLOOPY_OUTBOUND_SSRF_BLOCK_LINK_LOCAL=true # incl. 169.254.169.254 cloud metadata
FLOOPY_OUTBOUND_SSRF_BLOCK_MULTICAST=true
FLOOPY_OUTBOUND_SSRF_BLOCK_CGNAT=true # 100.64/10
FLOOPY_OUTBOUND_SSRF_BLOCK_ULA_V6=true # fc00::/7
FLOOPY_OUTBOUND_SSRF_EXTRA_BLOCKED_CIDRS= # CSV of v4 CIDRs
FLOOPY_OUTBOUND_SSRF_ALLOWLIST_CIDRS= # CSV; overrides every block

A block during the agent loop aborts the entire request with HTTP 403 (outbound_ssrf_blocked: ssrf_blocked:{reason}:{ip}). It does not feed the error back to the LLM as a tool result, because that would let an attacker who controls an MCP server URL probe Floopy’s internal network one tool call at a time.
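The resolved-IP checks behind those toggles can be sketched with std only (function and reason names are illustrative; allowlist and extra-CIDR handling are omitted). CGNAT (100.64.0.0/10) and ULA v6 (fc00::/7) need manual masks because std has no stable helpers for them:

```rust
use std::net::IpAddr;

// Returns the block reason for the resolved IP, or None if it may be dialed.
fn blocked_reason(ip: IpAddr) -> Option<&'static str> {
    match ip {
        IpAddr::V4(v4) => {
            if v4.is_loopback() { Some("loopback") }
            else if v4.is_private() { Some("private") }        // RFC1918
            else if v4.is_link_local() { Some("link_local") }  // incl. 169.254.169.254
            else if v4.is_multicast() { Some("multicast") }
            // 100.64.0.0/10: first octet 100, top two bits of the second = 01
            else if v4.octets()[0] == 100 && (v4.octets()[1] & 0xC0) == 64 { Some("cgnat") }
            else { None }
        }
        IpAddr::V6(v6) => {
            if v6.is_loopback() { Some("loopback") }
            // fc00::/7: top seven bits of the first segment are 1111110
            else if (v6.segments()[0] & 0xFE00) == 0xFC00 { Some("ula_v6") }
            else { None }
        }
    }
}
```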


Every tool call produced by the LLM is re-validated against the inputSchema returned by the upstream’s tools/list before Floopy dispatches it. A schema violation surfaces as a tool error fed back to the LLM rather than a hard crash — the loop continues, and the model gets a chance to retry with corrected arguments. SSRF blocks (above) are the only fatal class.


Each round writes to ClickHouse request_response_rmt with:

  • surface = 'agent'
  • agent_run_id — UUID; equals the request_id of round 0 and is reused across rounds.
  • round_index — zero-based.
  • tool_call_index — NULL on the LLM-call row; carried separately in audit rows for per-tool dispatches.

Each tool call writes one audit_log.agent.tool_call row with (agent_run_id, round_index, tool_call_index, server_id, tool, status, latency_ms).

The orchestrator opens a routing_execute span around the whole agent invocation, with N child agent_round_{idx} spans (one per LLM dispatch) carrying round_index and agent_run_id attributes. Join all of these by agent_run_id for end-to-end traceability.


  • Per-request: send floopy-mcp-disabled: true header.
  • Per-server: toggle the Switch on the Server card.
  • Per-org: remove has_mcp_outbound from the plan, or disable every server.
  • Per-runtime tweak: set tool_cache_ttl_seconds = 0 to disable just the cache, or max_rounds = 1 to force single-shot behavior.

The loop’s overhead when inactive is one boolean check on OrgContext.mcp_agent_active plus one optional header read.