
Agent loop (chat completions)

When an organization has at least one enabled outbound MCP server (configured under MCP → Servers), every /v1/chat/completions request automatically runs through the agent loop. The decision is made server-side from the mcp_agent_active flag on organization_profiles:

mcp_agent_active = has_mcp_outbound AND EXISTS(SELECT 1 FROM mcp_outbound_servers s WHERE s.org_id = ... AND s.enabled = true)

Disable every server (or remove has_mcp_outbound from the org’s plan) and the next chat request flows through the standard pass-through path with zero behavior change.
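The activation check reduces to a single boolean. A minimal sketch (the struct and function names here are illustrative, not Floopy's actual types):

```rust
// Illustrative sketch of the server-side activation decision: the flag is
// true only when the org's plan grants has_mcp_outbound AND at least one
// outbound MCP server is enabled.
struct OutboundServer {
    enabled: bool,
}

fn mcp_agent_active(has_mcp_outbound: bool, servers: &[OutboundServer]) -> bool {
    has_mcp_outbound && servers.iter().any(|s| s.enabled)
}
```

Disabling every server or dropping the plan flag makes the function return false, which is exactly the "zero behavior change" pass-through path.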

A client whose org has the agent loop active can still run a pure pass-through chat completion by sending the floopy-mcp-disabled header:

floopy-mcp-disabled: true

Truthy values (true / 1 / yes, case-insensitive) skip the agent path for that single request. Useful when the caller knows the prompt needs no tools and wants to skip the agent’s per-round overhead.
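The truthy check described above can be sketched as (helper name is illustrative):

```rust
// Truthy is "true" / "1" / "yes", case-insensitive, per the header contract.
// Anything else -- including an absent header -- keeps the agent path active.
fn mcp_disabled_header(value: Option<&str>) -> bool {
    matches!(
        value.map(|v| v.trim().to_ascii_lowercase()).as_deref(),
        Some("true") | Some("1") | Some("yes")
    )
}
```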


client → /v1/chat/completions
├─ org_ctx.mcp_agent_active = true AND no opt-out header
└─ agent::agent_loop::run
│ for each round:
│ 1. budget.check_round (max_rounds, wall-clock)
│ 2. LLM call via execute_strategies ← per-round span
│ • cancellable via tokio::select!
│ • dispatches to provider (OpenAI/Anthropic/Gemini/Bedrock)
│ 3. if no tool_calls → return final assistant message
│ 4. else dispatch_round (parallel, allowlist-checked)
│ 4a. cache hit? reuse value
│ 4b. miss? resolve secret_ref, call upstream, cache result
│ 4c. SSRF guard checks resolved IP — block aborts the
│ full request, not just the tool call (see Security)
│ 5. emit audit_log.agent.tool_call per dispatch
│ 6. append tool messages, loop

Tool discovery (auto-discovered, not declared)


Floopy follows the market-standard MCP model: clients reference servers, not individual tools. The agent loop calls tools/list on each enabled server in parallel (cache TTL 10 minutes), aggregates the results, and exposes the union to the LLM. Tools are deduplicated by name with first-server (configuration order) winning on collisions.

Strict input-schema validation: tools whose inputSchema is missing, null, or not a JSON object are dropped from the catalog. The drop is logged as mcp_tool_invalid_schema; if a server has zero valid tools after validation, a server-level mcp_server_no_valid_tools warning fires. The MCP spec mandates inputSchema, so a server hitting this in production is misconfigured.
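The catalog build described above (validate, then deduplicate with first-server wins) can be sketched as follows. `SchemaKind` is a stand-in for a parsed JSON value, and the names are illustrative:

```rust
use std::collections::HashSet;

// Servers are iterated in configuration order. Tools whose inputSchema is
// missing or not a JSON object are dropped (the real loop logs
// mcp_tool_invalid_schema here); name collisions keep the first server's tool.
#[derive(Clone, PartialEq)]
enum SchemaKind { Object, NonObject }

#[derive(Clone)]
struct Tool { name: String, input_schema: Option<SchemaKind> }

fn build_catalog(servers: &[Vec<Tool>]) -> Vec<Tool> {
    let mut seen = HashSet::new();
    let mut catalog = Vec::new();
    for tools in servers {                       // configuration order
        for tool in tools {
            if tool.input_schema != Some(SchemaKind::Object) {
                continue;                        // dropped: invalid schema
            }
            if seen.insert(tool.name.clone()) {  // first server wins
                catalog.push(tool.clone());
            }
        }
    }
    catalog
}
```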


Three independent ceilings, enforced inside the loop:

  • Rounds — capped at runtime.max_rounds (1..50, default 10), and additionally at MAX_ROUNDS_CAP = 50 regardless of the per-org config.
  • Wall clock — 120s deadline sealed at acquisition. Each round races the LLM call and the tool dispatch against tokio::time::sleep_until(deadline) via tokio::select!, so a slow upstream cannot push the run past the deadline.
  • Concurrency — 16 simultaneous agent runs per org, enforced by an atomic Redis counter (Lua GET-and-conditional-INCR). Excess returns 429 with retry_after_secs = 60.

Exhausting rounds or wall-clock emits one audit_log.agent.budget_exhausted row and returns the partial messages with finish_reason = "length".
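The rounds and wall-clock ceilings can be sketched as follows (the `Budget` struct is illustrative; the Redis-backed concurrency counter is omitted since it lives outside the process):

```rust
use std::time::{Duration, Instant};

const MAX_ROUNDS_CAP: u32 = 50;                 // global cap, per the text
const WALL_CLOCK: Duration = Duration::from_secs(120);

struct Budget { max_rounds: u32, deadline: Instant }

impl Budget {
    // Per-org max_rounds is clamped to 1..=50, then capped globally;
    // the wall-clock deadline is sealed once, at acquisition.
    fn new(org_max_rounds: u32) -> Self {
        Budget {
            max_rounds: org_max_rounds.clamp(1, 50).min(MAX_ROUNDS_CAP),
            deadline: Instant::now() + WALL_CLOCK,
        }
    }

    // Called at the top of each round; false means budget_exhausted.
    fn check_round(&self, round: u32) -> bool {
        round < self.max_rounds && Instant::now() < self.deadline
    }
}
```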


Identical tool calls (same (org_id, server_id, tool_name, canonical_args)) within runtime.tool_cache_ttl_seconds reuse the previous result. The cache value is HMAC-signed with FLOOPY_AGENT_CACHE_PEPPER so:

  • A Redis writer cannot inject a forged result.
  • A value from one (org, server, tool) cannot be migrated to another — the cache key is part of the MAC input.
  • A pepper rotation invalidates every cached entry wholesale.

tool_cache_ttl_seconds = 0 disables the cache entirely.
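The identity part of the cache key can be sketched as below. The key format is hypothetical and the HMAC step with FLOOPY_AGENT_CACHE_PEPPER is elided; the point shown is that arguments are canonicalized (here via a sorted `BTreeMap`, standing in for JSON canonicalization) and that the full `(org, server, tool, args)` identity is baked into the key:

```rust
use std::collections::BTreeMap;

// BTreeMap iterates in sorted key order, so two semantically identical
// argument sets produce the same canonical string regardless of insertion
// order. The real implementation then MACs the value under this key, so a
// cached result cannot be replayed under a different identity.
fn cache_key(org_id: &str, server_id: &str, tool: &str, args: &BTreeMap<String, String>) -> String {
    let canonical: Vec<String> = args.iter().map(|(k, v)| format!("{k}={v}")).collect();
    format!("agent_cache:{org_id}:{server_id}:{tool}:{}", canonical.join("&"))
}
```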


runtime.stream_mode is the org-level switch between two modes:

  • final_only — when the client requests stream: true, the loop runs to completion (intermediate rounds buffered server-side), then the final assistant message is split into multiple ~64-byte delta.content SSE frames so clients see the answer progressively. UTF-8 boundaries are preserved. Intermediate-round text (e.g. “I’ll check the weather.”) is not streamed live in this version — it’s reserved for a follow-up that drives each round through the upstream provider as a stream.
  • disabled — even if the client requests stream: true, the response is a single non-streaming chat.completion JSON. Useful for integrations that cannot consume SSE.

When the client doesn’t request streaming, both modes return JSON.
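The UTF-8-safe ~64-byte split used by final_only can be sketched with std only (function name is illustrative):

```rust
// Split text into chunks of at most max_bytes, backing up to the nearest
// char boundary so no multi-byte UTF-8 sequence is cut. A chunk may
// therefore be a few bytes short of max_bytes.
fn split_deltas(text: &str, max_bytes: usize) -> Vec<&str> {
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < text.len() {
        let mut end = (start + max_bytes).min(text.len());
        while !text.is_char_boundary(end) {
            end -= 1;
        }
        chunks.push(&text[start..end]);
        start = end;
    }
    chunks
}
```

Because each chunk is a `&str`, every frame is valid UTF-8 by construction; concatenating the chunks reproduces the original message exactly.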


Tools, tool_use / tool_calls, and tool_result round-trip through every provider Floopy supports:

Provider                 Tool field name                 Result field name                  Stop reason
OpenAI / OpenAI-compat   tools[].function.parameters     tool_calls[]                       tool_calls
Anthropic                tools[].input_schema            content[].tool_use                 tool_use
Gemini                   tools[].functionDeclarations    parts[].functionCall               (inferred from part type)
Bedrock Converse         toolConfig.tools[].toolSpec     output.message.content[].toolUse   tool_use

Floopy translates between these on the wire — clients always speak the OpenAI shape.
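One side of that translation can be sketched as a lookup keyed by provider (the enum and function are illustrative; the string values mirror the table above):

```rust
// Wire-level location of the tool schema for each supported provider.
enum Provider { OpenAi, Anthropic, Gemini, BedrockConverse }

fn tool_schema_field(p: &Provider) -> &'static str {
    match p {
        Provider::OpenAi => "tools[].function.parameters",
        Provider::Anthropic => "tools[].input_schema",
        Provider::Gemini => "tools[].functionDeclarations",
        Provider::BedrockConverse => "toolConfig.tools[].toolSpec",
    }
}
```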


Every outbound HTTP request from the agent loop (and from webhook delivery) passes through a cluster-wide SSRF policy loaded once from environment variables at startup. Default behavior matches the pre-PR-2 hardcoded blocks; operators can adjust without code changes:

FLOOPY_OUTBOUND_SSRF_BLOCK_LOOPBACK=true
FLOOPY_OUTBOUND_SSRF_BLOCK_PRIVATE=true # RFC1918
FLOOPY_OUTBOUND_SSRF_BLOCK_LINK_LOCAL=true # incl. 169.254.169.254 cloud metadata
FLOOPY_OUTBOUND_SSRF_BLOCK_MULTICAST=true
FLOOPY_OUTBOUND_SSRF_BLOCK_CGNAT=true # 100.64/10
FLOOPY_OUTBOUND_SSRF_BLOCK_ULA_V6=true # fc00::/7
FLOOPY_OUTBOUND_SSRF_EXTRA_BLOCKED_CIDRS= # CSV of v4 CIDRs
FLOOPY_OUTBOUND_SSRF_ALLOWLIST_CIDRS= # CSV; overrides every block

A block during the agent loop aborts the entire request with HTTP 403 (outbound_ssrf_blocked: ssrf_blocked:{reason}:{ip}). It does not feed the error back to the LLM as a tool result, because that would let an attacker who controls an MCP server URL probe Floopy’s internal network one tool call at a time.
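The resolved-IP checks behind those toggles can be sketched with std only (function and reason names are illustrative; allowlist and extra-CIDR handling are omitted). CGNAT (100.64.0.0/10) and ULA v6 (fc00::/7) need manual masks because std has no stable helpers for them:

```rust
use std::net::IpAddr;

// Returns the block reason for the resolved IP, or None if it may be dialed.
fn blocked_reason(ip: IpAddr) -> Option<&'static str> {
    match ip {
        IpAddr::V4(v4) => {
            if v4.is_loopback() { Some("loopback") }
            else if v4.is_private() { Some("private") }        // RFC1918
            else if v4.is_link_local() { Some("link_local") }  // incl. 169.254.169.254
            else if v4.is_multicast() { Some("multicast") }
            // 100.64.0.0/10: first octet 100, top two bits of the second = 01
            else if v4.octets()[0] == 100 && (v4.octets()[1] & 0xC0) == 64 { Some("cgnat") }
            else { None }
        }
        IpAddr::V6(v6) => {
            if v6.is_loopback() { Some("loopback") }
            // fc00::/7: top seven bits of the first segment are 1111110
            else if (v6.segments()[0] & 0xFE00) == 0xFC00 { Some("ula_v6") }
            else { None }
        }
    }
}
```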


Every tool call produced by the LLM is re-validated against the inputSchema returned by the upstream’s tools/list before Floopy dispatches it. A schema violation surfaces as a tool error fed back to the LLM rather than a hard crash — the loop continues, and the model gets a chance to retry with corrected arguments. SSRF blocks (above) are the only fatal class.


Each round writes to ClickHouse request_response_rmt with:

  • surface = 'agent'
  • agent_run_id — UUID; equals the request_id of round 0 and is reused across rounds.
  • round_index — zero-based.
  • tool_call_index — NULL on the LLM-call row; carried separately in audit rows for per-tool dispatches.

Each tool call writes one audit_log.agent.tool_call row with (agent_run_id, round_index, tool_call_index, server_id, tool, status, latency_ms).

The orchestrator opens a routing_execute span around the whole agent invocation, with N child agent_round_{idx} spans (one per LLM dispatch) carrying round_index and agent_run_id attributes. Join all of these by agent_run_id for end-to-end traceability.


  • Per-request: send floopy-mcp-disabled: true header.
  • Per-server: toggle the Switch on the Server card.
  • Per-org: remove has_mcp_outbound from the plan, or disable every server.
  • Per-runtime tweak: set tool_cache_ttl_seconds = 0 to disable just the cache, or max_rounds = 1 to force single-shot behavior.

The loop’s overhead when inactive is one boolean check on OrgContext.mcp_agent_active plus one optional header read.