Agent loop (chat completions)
Auto-activation
When an organization has at least one enabled outbound MCP server (configured under MCP → Servers), every /v1/chat/completions request automatically runs through the agent loop. The decision is made server-side from the mcp_agent_active flag on organization_profiles:
```text
mcp_agent_active = has_mcp_outbound
                   AND EXISTS(SELECT 1 FROM mcp_outbound_servers s
                              WHERE s.org_id = ... AND s.enabled = true)
```

Disable every server (or remove has_mcp_outbound from the org’s plan) and the next chat request flows through the standard pass-through path, behaving exactly as if the agent loop did not exist.
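Read as plain logic, the flag is a conjunction of the plan feature and an existence check over the org’s servers. A minimal Rust sketch, assuming a hypothetical OutboundServer struct that stands in for a row of mcp_outbound_servers (the real decision is computed server-side, not by this function):

```rust
/// Hypothetical projection of an mcp_outbound_servers row.
pub struct OutboundServer {
    pub org_id: String,
    pub enabled: bool,
}

/// Mirrors the flag: the plan grants has_mcp_outbound AND the org owns
/// at least one enabled outbound server.
pub fn mcp_agent_active(org_id: &str, has_mcp_outbound: bool, servers: &[OutboundServer]) -> bool {
    has_mcp_outbound && servers.iter().any(|s| s.org_id == org_id && s.enabled)
}
```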
Per-request opt-out
A client whose org has the agent loop active can still run a pure pass-through chat completion by sending the floopy-mcp-disabled header:
```text
floopy-mcp-disabled: true
```

Truthy values (true / 1 / yes, case-insensitive) skip the agent path for that single request. This is useful when the caller knows the prompt needs no tools and wants to avoid the agent’s per-round overhead.
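A sketch of the opt-out test in Rust. mcp_opt_out and use_agent_path are hypothetical names; accepting only true / 1 / yes case-insensitively follows the text above, while trimming surrounding whitespace is an assumption of this sketch:

```rust
/// Returns true when the floopy-mcp-disabled header carries a truthy value:
/// true / 1 / yes, compared case-insensitively.
/// Trimming surrounding whitespace is an assumption of this sketch.
pub fn mcp_opt_out(header_value: Option<&str>) -> bool {
    match header_value.map(str::trim) {
        Some(v) => v == "1" || v.eq_ignore_ascii_case("true") || v.eq_ignore_ascii_case("yes"),
        None => false,
    }
}

/// The agent path runs only when the org flag is on and the caller did not opt out.
pub fn use_agent_path(mcp_agent_active: bool, header_value: Option<&str>) -> bool {
    mcp_agent_active && !mcp_opt_out(header_value)
}
```

Any other value (for example false or 0) leaves the agent path enabled for that request.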
What happens per round
```text
client → /v1/chat/completions
   │
   ├─ org_ctx.mcp_agent_active = true AND no opt-out header
   │
   ▼
agent::agent_loop::run
   │
   │  for each round:
   │   1. budget.check_round (max_rounds, wall-clock)
   │   2. LLM call via execute_strategies          ← per-round span
   │        • cancellable via tokio::select!
   │        • dispatches to provider (OpenAI/Anthropic/Gemini/Bedrock)
   │   3. if no tool_calls → return final assistant message
   │   4. else dispatch_round (parallel, allowlist-checked)
   │        4a. cache hit? reuse value
   │        4b. miss? resolve secret_ref, call upstream, cache result
   │        4c. SSRF guard checks resolved IP; a block aborts the
   │            full request, not just the tool call (see Security)
   │   5. emit audit_log.agent.tool_call per dispatch
   │   6. append tool messages, loop
```

Tool discovery (auto-discovered, not declared)
Floopy follows the market-standard MCP model: clients reference servers, not individual tools. The agent loop calls tools/list on each enabled server in parallel (cache TTL 10 minutes), aggregates the results, and exposes the union to the LLM. Tools are deduplicated by name; on a collision, the tool from the server that comes first in configuration order wins.
Strict input-schema validation: tools whose inputSchema is missing, null, or not a JSON object are dropped from the catalog. The drop is logged as mcp_tool_invalid_schema; if a server has zero valid tools after validation, a server-level mcp_server_no_valid_tools warning fires. The MCP spec mandates inputSchema, so a server hitting this in production is misconfigured.
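The discovery rules (strict schema validation, then first-server-wins deduplication by name) can be sketched as below. The Json enum is a hypothetical stand-in for a real JSON value type such as serde_json::Value, eprintln! stands in for structured logging, and the parallel tools/list fetch and its 10-minute cache are left out:

```rust
use std::collections::HashSet;

/// Hypothetical minimal JSON value; a real implementation would use a JSON crate.
#[derive(Clone, Debug, PartialEq)]
pub enum Json {
    Null,
    Object(Vec<(String, Json)>),
    Other(String),
}

/// A tool as returned by tools/list (only the fields these rules need).
#[derive(Clone, Debug, PartialEq)]
pub struct Tool {
    pub name: String,
    pub input_schema: Option<Json>,
}

pub struct Catalog {
    /// (server_id, tool) pairs exposed to the LLM.
    pub tools: Vec<(String, Tool)>,
    /// Servers that triggered mcp_server_no_valid_tools.
    pub servers_without_valid_tools: Vec<String>,
}

/// `servers` must be in configuration order: (server_id, tools/list result).
pub fn build_catalog(servers: Vec<(String, Vec<Tool>)>) -> Catalog {
    let mut seen: HashSet<String> = HashSet::new();
    let mut tools = Vec::new();
    let mut servers_without_valid_tools = Vec::new();
    for (server_id, listed) in servers {
        let mut valid = 0;
        for tool in listed {
            // Strict validation: inputSchema must be present and a JSON object.
            if !matches!(tool.input_schema, Some(Json::Object(_))) {
                eprintln!("mcp_tool_invalid_schema server={server_id} tool={}", tool.name);
                continue;
            }
            valid += 1;
            // Dedup by name: the first server in configuration order wins.
            if seen.insert(tool.name.clone()) {
                tools.push((server_id.clone(), tool));
            }
        }
        if valid == 0 {
            eprintln!("mcp_server_no_valid_tools server={server_id}");
            servers_without_valid_tools.push(server_id);
        }
    }
    Catalog { tools, servers_without_valid_tools }
}
```

Validating before deduplicating means an invalid tool on an earlier server cannot shadow a valid tool of the same name on a later one. The text above does not say which order Floopy uses, so treat this as a design choice of the sketch.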
Budgets
Three independent ceilings are enforced inside the loop:
- Rounds: capped at runtime.max_rounds (1..50, default 10), and additionally at MAX_ROUNDS_CAP = 50 regardless of the per-org config.
- Wall clock: a 120 s deadline, sealed at acquisition. Each round races the LLM call and the tool dispatch against tokio::time::sleep_until(deadline) via tokio::select!, so a slow upstream cannot push the run past the deadline.
- Concurrency: at most 16 simultaneous agent runs per org, enforced by an atomic Redis counter (a Lua GET-and-conditional-INCR). Requests over the limit receive 429 with retry_after_secs = 60.
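The round cap and the per-org concurrency limit reduce to a clamp and a bounded counter. In this Rust sketch an AtomicU32 compare-and-swap loop stands in for the Redis Lua GET-and-conditional-INCR (same read-check-increment semantics, but in-process only). The function names and the release step are assumptions, since the text above does not describe how a slot is freed:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

pub const MAX_ROUNDS_CAP: u32 = 50;
pub const MAX_CONCURRENT_RUNS: u32 = 16;
pub const RETRY_AFTER_SECS: u64 = 60;

/// runtime.max_rounds clamped into 1..=50 (reading "1..50" as inclusive is an
/// assumption), then capped again by MAX_ROUNDS_CAP whatever the org configured.
pub fn effective_max_rounds(configured: u32) -> u32 {
    configured.clamp(1, 50).min(MAX_ROUNDS_CAP)
}

/// Stand-in for GET-and-conditional-INCR: increment only if below `limit`,
/// as one atomic step. Err(retry_after_secs) maps to HTTP 429.
pub fn try_acquire_run_slot(counter: &AtomicU32, limit: u32) -> Result<(), u64> {
    let mut current = counter.load(Ordering::Acquire);
    loop {
        if current >= limit {
            return Err(RETRY_AFTER_SECS);
        }
        match counter.compare_exchange_weak(current, current + 1, Ordering::AcqRel, Ordering::Acquire) {
            Ok(_) => return Ok(()),
            // Lost a race (or a spurious failure): re-check with the fresh value.
            Err(actual) => current = actual,
        }
    }
}

/// Assumed counterpart that frees the slot when a run finishes.
pub fn release_run_slot(counter: &AtomicU32) {
    counter.fetch_sub(1, Ordering::AcqRel);
}
```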
Exhausting either the round budget or the wall-clock budget emits one audit_log.agent.budget_exhausted row and returns the partial messages with finish_reason = "length".
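Putting the per-round steps and the budget check together, the control flow of the loop can be sketched as follows. This is a synchronous, dependency-free simplification: the LLM call and tool dispatch are injected closures, tool calls run sequentially rather than in parallel, and caching, SSRF checks, auditing, and the tokio::select! race are omitted. All type and function names are hypothetical:

```rust
use std::time::Instant;

/// A tool call requested by the model (hypothetical, simplified shape).
#[derive(Clone, Debug, PartialEq)]
pub struct ToolCall {
    pub name: String,
    pub args: String,
}

/// What one LLM round returns (hypothetical).
pub enum LlmReply {
    Final(String),
    ToolCalls(Vec<ToolCall>),
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// The model answered without tool calls.
    Final(String),
    /// Budget hit; the real loop returns these partial messages with finish_reason = "length".
    BudgetExhausted { rounds_run: u32, messages: Vec<String> },
}

pub fn run_agent_loop(
    max_rounds: u32,
    deadline: Instant,
    mut call_llm: impl FnMut(&[String]) -> LlmReply,
    mut dispatch_tool: impl FnMut(&ToolCall) -> String,
) -> Outcome {
    let mut messages: Vec<String> = Vec::new();
    let mut round = 0;
    loop {
        // 1. Budget check before every round: round cap and wall clock.
        if round >= max_rounds || Instant::now() >= deadline {
            return Outcome::BudgetExhausted { rounds_run: round, messages };
        }
        // 2. One LLM call per round.
        match call_llm(&messages) {
            // 3. No tool calls: this is the final assistant message.
            LlmReply::Final(text) => return Outcome::Final(text),
            // 4-6. Dispatch each call, append its result as a tool message, loop.
            LlmReply::ToolCalls(calls) => {
                for call in &calls {
                    messages.push(dispatch_tool(call));
                }
            }
        }
        round += 1;
    }
}
```

Because the budget check runs before the LLM call, max_rounds = 1 allows exactly one LLM call per request in this sketch.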
Tool result cache
Identical tool calls (same (org_id, server_id, tool_name, canonical_args)) within runtime.tool_cache_ttl_seconds reuse the previous result. The cache value is HMAC-signed with FLOOPY_AGENT_CACHE_PEPPER, so:
- A Redis writer cannot inject a forged result.
- A value from one (org, server, tool) cannot be migrated to another, because the cache key is part of the MAC input.
- A pepper rotation invalidates every cached entry wholesale.
tool_cache_ttl_seconds = 0 disables the cache entirely.
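The key binding described in the list above can be sketched in Rust. The MAC primitive is injected as a closure to keep the sketch free of crypto dependencies; in production that closure is where the HMAC keyed with FLOOPY_AGENT_CACHE_PEPPER would go. cache_key, sign, lookup, and the length-prefixed key encoding are hypothetical, and TTL expiry itself is assumed to be left to Redis:

```rust
/// Length-prefixes each component of (org_id, server_id, tool_name, canonical_args)
/// so different tuples can never produce the same key string.
pub fn cache_key(org_id: &str, server_id: &str, tool_name: &str, canonical_args: &str) -> String {
    [org_id, server_id, tool_name, canonical_args]
        .iter()
        .map(|part| format!("{}:{}", part.len(), part))
        .collect::<Vec<_>>()
        .join("|")
}

/// A cached tool result plus its MAC tag.
pub struct SignedEntry {
    pub value: Vec<u8>,
    pub tag: Vec<u8>,
}

/// MAC input = len(key) || key || value, binding the value to its key.
fn mac_input(key: &str, value: &[u8]) -> Vec<u8> {
    let mut buf = (key.len() as u64).to_be_bytes().to_vec();
    buf.extend_from_slice(key.as_bytes());
    buf.extend_from_slice(value);
    buf
}

pub fn sign<F: Fn(&[u8]) -> Vec<u8>>(mac: &F, key: &str, value: &[u8]) -> SignedEntry {
    SignedEntry { value: value.to_vec(), tag: mac(mac_input(key, value).as_slice()) }
}

/// Returns the cached value only if its tag verifies under `key` and the
/// current pepper. Production code should compare tags in constant time.
pub fn lookup<'a, F: Fn(&[u8]) -> Vec<u8>>(
    ttl_secs: u64,
    mac: &F,
    key: &str,
    stored: Option<&'a SignedEntry>,
) -> Option<&'a [u8]> {
    if ttl_secs == 0 {
        return None; // tool_cache_ttl_seconds = 0 disables the cache entirely
    }
    let entry = stored?;
    if mac(mac_input(key, &entry.value).as_slice()) == entry.tag {
        Some(entry.value.as_slice())
    } else {
        None // forged, migrated to another key, or signed under a rotated pepper
    }
}
```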
Stream modes
runtime.stream_mode is the org-level switch between two modes:
- final_only: when the client requests stream: true, the loop runs to completion (intermediate rounds are buffered server-side), then the final assistant message is split into ~64-byte delta.content SSE frames so clients see the answer progressively. Frames never split a UTF-8 character. Intermediate-round text (e.g. “I’ll check the weather.”) is not streamed live in this version; that is reserved for a follow-up that drives each round through the upstream provider as a stream.
- disabled: even if the client requests stream: true, the response is a single non-streaming chat.completion JSON. Useful for integrations that cannot consume SSE.
When the client doesn’t request streaming, both modes return JSON.
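The final_only chunking can be sketched as a byte-bounded split that only cuts at character boundaries. A Rust sketch with a hypothetical split_for_sse function; the 64-byte limit is a parameter, and wrapping each piece in an SSE frame is left out:

```rust
/// Splits `text` into pieces of at most `max_bytes` bytes without ever cutting
/// a UTF-8 character in half. Each piece would become one delta.content frame.
pub fn split_for_sse(text: &str, max_bytes: usize) -> Vec<&str> {
    // A UTF-8 character is at most 4 bytes, so every chunk makes progress.
    assert!(max_bytes >= 4, "max_bytes must fit any single UTF-8 character");
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < text.len() {
        let mut end = (start + max_bytes).min(text.len());
        // Walk back to the nearest character boundary.
        while !text.is_char_boundary(end) {
            end -= 1;
        }
        chunks.push(&text[start..end]);
        start = end;
    }
    chunks
}
```

Walking back to a boundary makes some frames slightly shorter than the limit, which is why the frame size is only approximately 64 bytes.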
Provider compatibility
Tools, tool_use / tool_calls, and tool_result round-trip through every provider Floopy supports:
| Provider | Tool definition field | Tool-call field (response) | Stop reason |
|---|---|---|---|
| OpenAI / OpenAI-compat | tools[].function.parameters | tool_calls[] | tool_calls |
| Anthropic | tools[].input_schema | content[].tool_use | tool_use |
| Gemini | tools[].functionDeclarations | parts[].functionCall | (inferred from part type) |
| Bedrock Converse | toolConfig.tools[].toolSpec | output.message.content[].toolUse | tool_use |
Floopy translates between these on the wire — clients always speak the OpenAI shape.
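As an illustration of the translation, this Rust sketch maps an OpenAI-shaped tool definition to the Anthropic shape and detects a tool-use stop per provider. The structs keep the JSON Schema as raw text to stay dependency-free; all names are hypothetical, and only the field names and stop reasons in the table come from the providers:

```rust
/// OpenAI-shaped tool definition as clients send it (schema kept as raw JSON text).
#[derive(Clone, Debug, PartialEq)]
pub struct OpenAiTool {
    pub name: String,
    pub description: String,
    pub parameters: String,
}

/// Anthropic carries the same schema under `input_schema`.
#[derive(Clone, Debug, PartialEq)]
pub struct AnthropicTool {
    pub name: String,
    pub description: String,
    pub input_schema: String,
}

pub fn to_anthropic(tool: &OpenAiTool) -> AnthropicTool {
    AnthropicTool {
        name: tool.name.clone(),
        description: tool.description.clone(),
        input_schema: tool.parameters.clone(),
    }
}

#[derive(Clone, Copy, Debug)]
pub enum Provider {
    OpenAi,
    Anthropic,
    Gemini,
    Bedrock,
}

/// True when the provider's response means "the model wants tools", which the
/// OpenAI-shaped response reports as finish_reason = "tool_calls". Gemini has
/// no dedicated stop reason, so the caller passes whether any part is a functionCall.
pub fn is_tool_stop(provider: Provider, stop_reason: &str, has_function_call_part: bool) -> bool {
    match provider {
        Provider::OpenAi => stop_reason == "tool_calls",
        Provider::Anthropic | Provider::Bedrock => stop_reason == "tool_use",
        Provider::Gemini => has_function_call_part,
    }
}
```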
Security: outbound SSRF guard
Every outbound HTTP request from the agent loop (and from webhook delivery) passes through a cluster-wide SSRF policy loaded once from environment variables at startup. The defaults match the blocks that were hardcoded before PR-2; operators can adjust them without code changes:
```sh
FLOOPY_OUTBOUND_SSRF_BLOCK_LOOPBACK=true
FLOOPY_OUTBOUND_SSRF_BLOCK_PRIVATE=true       # RFC 1918
FLOOPY_OUTBOUND_SSRF_BLOCK_LINK_LOCAL=true    # incl. 169.254.169.254 cloud metadata
FLOOPY_OUTBOUND_SSRF_BLOCK_MULTICAST=true
FLOOPY_OUTBOUND_SSRF_BLOCK_CGNAT=true         # 100.64.0.0/10
FLOOPY_OUTBOUND_SSRF_BLOCK_ULA_V6=true        # fc00::/7
FLOOPY_OUTBOUND_SSRF_EXTRA_BLOCKED_CIDRS=     # CSV of IPv4 CIDRs
FLOOPY_OUTBOUND_SSRF_ALLOWLIST_CIDRS=         # CSV; overrides every block
```

A block during the agent loop aborts the entire request with HTTP 403 (outbound_ssrf_blocked: ssrf_blocked:{reason}:{ip}). It does not feed the error back to the LLM as a tool result, because that would let an attacker who controls an MCP server URL probe Floopy’s internal network one tool call at a time.
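The policy can be sketched as a check on the resolved address. In this Rust sketch SsrfPolicy and its fields are hypothetical mirrors of the variables above, the reason strings (loopback, private, and so on) are assumptions, and the allowlist is IPv4-only for brevity. It also unwraps IPv4-mapped IPv6 addresses so ::ffff:10.0.0.1 cannot bypass the IPv4 blocks; whether Floopy does the same is not stated above:

```rust
use std::net::{IpAddr, Ipv4Addr};

/// Hypothetical in-memory form of the FLOOPY_OUTBOUND_SSRF_* variables.
pub struct SsrfPolicy {
    pub block_loopback: bool,
    pub block_private: bool,
    pub block_link_local: bool,
    pub block_multicast: bool,
    pub block_cgnat: bool,
    pub block_ula_v6: bool,
    pub extra_blocked: Vec<(Ipv4Addr, u8)>,
    pub allowlist: Vec<(Ipv4Addr, u8)>, // IPv4-only in this sketch
}

/// Parses "10.0.0.0/8" into (network, prefix length).
pub fn parse_cidr_v4(s: &str) -> Option<(Ipv4Addr, u8)> {
    let (addr, len) = s.trim().split_once('/')?;
    let len: u8 = len.parse().ok()?;
    if len > 32 {
        return None;
    }
    Some((addr.parse().ok()?, len))
}

fn in_cidr(ip: Ipv4Addr, (net, len): (Ipv4Addr, u8)) -> bool {
    let mask = if len == 0 { 0 } else { u32::MAX << (32 - len) };
    (u32::from(ip) & mask) == (u32::from(net) & mask)
}

impl SsrfPolicy {
    /// Defaults: every block on, no extra CIDRs, empty allowlist.
    pub fn defaults() -> Self {
        SsrfPolicy {
            block_loopback: true, block_private: true, block_link_local: true,
            block_multicast: true, block_cgnat: true, block_ula_v6: true,
            extra_blocked: Vec::new(), allowlist: Vec::new(),
        }
    }

    /// Checks a resolved address. Err carries "ssrf_blocked:{reason}:{ip}".
    pub fn check(&self, ip: IpAddr) -> Result<(), String> {
        // Treat ::ffff:a.b.c.d as the IPv4 address it wraps.
        let ip = match ip {
            IpAddr::V6(v6) => v6.to_ipv4_mapped().map(IpAddr::V4).unwrap_or(ip),
            v4 => v4,
        };
        let reason = match ip {
            IpAddr::V4(v4) => {
                let o = v4.octets();
                if self.allowlist.iter().any(|c| in_cidr(v4, *c)) {
                    None // the allowlist overrides every block
                } else if self.block_loopback && v4.is_loopback() {
                    Some("loopback")
                } else if self.block_private && v4.is_private() {
                    Some("private")
                } else if self.block_link_local && v4.is_link_local() {
                    Some("link_local")
                } else if self.block_multicast && v4.is_multicast() {
                    Some("multicast")
                } else if self.block_cgnat && o[0] == 100 && (o[1] & 0xc0) == 64 {
                    Some("cgnat") // 100.64.0.0/10
                } else if self.extra_blocked.iter().any(|c| in_cidr(v4, *c)) {
                    Some("extra_cidr")
                } else {
                    None
                }
            }
            IpAddr::V6(v6) => {
                let s0 = v6.segments()[0];
                if self.block_loopback && v6.is_loopback() {
                    Some("loopback")
                } else if self.block_link_local && (s0 & 0xffc0) == 0xfe80 {
                    Some("link_local") // fe80::/10
                } else if self.block_multicast && v6.is_multicast() {
                    Some("multicast")
                } else if self.block_ula_v6 && (s0 & 0xfe00) == 0xfc00 {
                    Some("ula_v6") // fc00::/7
                } else {
                    None
                }
            }
        };
        match reason {
            Some(r) => Err(format!("ssrf_blocked:{r}:{ip}")),
            None => Ok(()),
        }
    }
}
```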
Tool argument validation
Every tool call produced by the LLM is re-validated against the inputSchema returned by the upstream’s tools/list before Floopy dispatches it. A schema violation is fed back to the LLM as a tool error rather than causing a hard crash: the loop continues, and the model gets a chance to retry with corrected arguments. SSRF blocks (above) are the only fatal class.
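The split between recoverable and fatal errors can be sketched as a small match. DispatchError, Next, and the error text fed back to the model are hypothetical; treating upstream failures as recoverable follows from SSRF being the only fatal class, and the 403 body format is the one given in the security section:

```rust
#[derive(Debug)]
pub enum DispatchError {
    /// Arguments failed validation against the tool's inputSchema.
    SchemaViolation(String),
    /// The upstream MCP server returned an error.
    Upstream(String),
    /// The SSRF guard rejected the resolved IP.
    SsrfBlocked { reason: String, ip: String },
}

#[derive(Debug, PartialEq)]
pub enum Next {
    /// Append as a tool message and keep looping; the model may retry.
    ToolMessage(String),
    /// Stop the whole request; nothing is fed back to the model.
    AbortRequest { status: u16, body: String },
}

pub fn on_dispatch_result(result: Result<String, DispatchError>) -> Next {
    match result {
        Ok(output) => Next::ToolMessage(output),
        Err(DispatchError::SchemaViolation(msg)) => {
            Next::ToolMessage(format!("error: invalid arguments: {msg}"))
        }
        Err(DispatchError::Upstream(msg)) => Next::ToolMessage(format!("error: {msg}")),
        // The only fatal class: abort so the model cannot probe the network.
        Err(DispatchError::SsrfBlocked { reason, ip }) => Next::AbortRequest {
            status: 403,
            body: format!("outbound_ssrf_blocked: ssrf_blocked:{reason}:{ip}"),
        },
    }
}
```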
Observability
Each round writes to ClickHouse request_response_rmt with:
- surface = 'agent'
- agent_run_id: UUID; equals the request_id of round 0 and is reused across rounds.
- round_index: zero-based.
- tool_call_index: NULL on the LLM-call row; per-tool dispatches carry it separately in audit rows.
Each tool call writes one audit_log.agent.tool_call row with (agent_run_id, round_index, tool_call_index, server_id, tool, status, latency_ms).
The orchestrator opens a routing_execute span around the whole agent invocation, with N child agent_round_{idx} spans (one per LLM dispatch) carrying round_index and agent_run_id attributes. Join all of these by agent_run_id for end-to-end traceability.
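The join keys can be sketched as plain row types. These Rust structs are hypothetical mirrors of the columns named above, not the actual table schemas, and round_row and round_span_name are hypothetical helpers:

```rust
/// Agent-specific columns of one LLM-call row in request_response_rmt.
#[derive(Debug, PartialEq)]
pub struct AgentRoundRow {
    pub surface: &'static str,
    pub agent_run_id: String,
    pub round_index: u32,
    pub tool_call_index: Option<u32>, // NULL on LLM-call rows
}

/// One audit_log.agent.tool_call row.
#[derive(Debug, PartialEq)]
pub struct ToolCallAudit {
    pub agent_run_id: String,
    pub round_index: u32,
    pub tool_call_index: u32,
    pub server_id: String,
    pub tool: String,
    pub status: String,
    pub latency_ms: u64,
}

/// agent_run_id is the request_id of round 0, reused for every later round.
pub fn round_row(round0_request_id: &str, round_index: u32) -> AgentRoundRow {
    AgentRoundRow {
        surface: "agent",
        agent_run_id: round0_request_id.to_string(),
        round_index,
        tool_call_index: None,
    }
}

/// Name of the child span opened for each LLM dispatch.
pub fn round_span_name(round_index: u32) -> String {
    format!("agent_round_{round_index}")
}
```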
Disabling
- Per-request: send the floopy-mcp-disabled: true header.
- Per-server: toggle the Switch on the Server card.
- Per-org: remove has_mcp_outbound from the plan, or disable every server.
- Per-runtime tweak: set tool_cache_ttl_seconds = 0 to disable just the cache, or max_rounds = 1 to force single-shot behavior.
The loop’s overhead when inactive is one boolean check on OrgContext.mcp_agent_active plus one optional header read.