# MCP Client (Agentic Loop)

## Overview
Floopy can act as an MCP client: it connects to external MCP servers on behalf of your agents and injects their tools into the conversation. When the LLM decides to call a tool, Floopy executes it, appends the result to the conversation, and loops back to the model — all transparently.
This is the agentic loop: the model reasons, calls tools, observes results, and reasons again until it reaches a final answer.
## The Agent Loop

```mermaid
flowchart TD
    A[User message] --> B[LLM call]
    B --> C{Tool call requested?}
    C -- No --> D[Return final response]
    C -- Yes --> E[Execute tools via MCP server]
    E --> F[Tool result post-processing]
    F --> G[Append tool results to messages]
    G --> H{Round < max_rounds?}
    H -- Yes --> B
    H -- No --> I[Return last model response]
```

The loop is bounded by `max_rounds` to prevent infinite execution. When the limit is reached, Floopy returns the last model response. A global timeout (default 120s) also applies to the entire loop.
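The bounded loop above can be sketched in Python. This is a simplified illustration, not Floopy's implementation; `call_llm` and `execute_tool` are hypothetical stand-ins for the actual model and MCP calls, injected as parameters:

```python
def agent_loop(messages, call_llm, execute_tool, max_rounds=5):
    """Sketch of a bounded agentic loop: reason, call tools, observe, repeat."""
    response = None
    for _ in range(max_rounds):
        response = call_llm(messages)            # one LLM round
        if not response.get("tool_calls"):       # no tools requested: final answer
            return response
        for call in response["tool_calls"]:      # execute each requested tool
            result = execute_tool(call)
            messages.append({"role": "tool", "content": result})
    return response                              # max_rounds reached: last response
```

Note that when `max_rounds` is exhausted mid-reasoning, the caller receives whatever the model produced in the final round, mirroring the behavior described above.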
## Plugin YAML Schema

Configure the agentic loop with a plugin YAML attached to your routing rule or sent as a request header (`floopy-mcp-plugin`).
### Full Example

```yaml
version: "1"

mcp_servers:
  - id: web_search
    url: "https://mcp.example.com/search"
    auth:
      type: bearer
      secret_ref: "secret.mcp_search_api_key"   # resolved from Floopy Vault
    tools:
      - search_web
      - fetch_page
    timeout_ms: 5000
    max_retries: 2

  - id: code_interpreter
    url: "https://mcp.example.com/code"
    auth:
      type: api_key
      header: "X-Api-Key"
      secret_ref: "secret.mcp_code_api_key"
    tools: "*"                     # expose all tools from this server
    timeout_ms: 15000

agent:
  max_rounds: 10
  stream_mode: final_only          # final_only | disabled
  tool_call_parallel: true         # execute independent tool calls in parallel
  tool_cache_ttl_seconds: 300      # cache tool results (0 = disabled)
  # prompt_guard_on_tool_output was the pre-migration ONNX-based scan
  # of tool results — currently a no-op while the firewall sync→async
  # interface is reworked. Field accepted for backwards compatibility.
```

### Field Reference
#### `mcp_servers[]`

| Field | Type | Required | Description |
|---|---|---|---|
| `id` | string | yes | Unique identifier for this server within the plugin |
| `url` | string | yes | HTTP(S) endpoint of the MCP server (must pass the SSRF validator) |
| `auth` | object | no | Authentication to use when calling the server |
| `tools` | string[] or `"*"` | no | Tools to expose. Defaults to `"*"` (all) |
| `timeout_ms` | integer | no | Per-request timeout. Default: 5000 |
| `max_retries` | integer | no | Retry attempts on transient errors. Default: 1 |
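Together, `timeout_ms` and `max_retries` bound each individual tool call. A minimal sketch of retry-on-transient-error behavior, assuming a simple linear backoff (Floopy's actual backoff policy is not specified here, and `TransientError` is a hypothetical stand-in for a retryable network or server error):

```python
import time

class TransientError(Exception):
    """Hypothetical stand-in for a retryable network/server error."""

def call_with_retries(fn, max_retries=1, base_delay=0.1):
    """Invoke fn, retrying on transient errors up to max_retries extra attempts."""
    last_err = None
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TransientError as err:
            last_err = err
            time.sleep(base_delay * (attempt + 1))  # simple linear backoff
    raise last_err  # all attempts exhausted: surface the last error
```

Non-transient errors (for example, authentication failures) would propagate immediately rather than being retried.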
#### `auth`

| Auth type | Fields | Description |
|---|---|---|
| `bearer` | `secret_ref` | Sends `Authorization: Bearer <secret>` |
| `api_key` | `header`, `secret_ref` | Sends the secret in a custom header |
| `hmac` | `secret_ref`, `algorithm` | Signs the request body (SHA-256 by default) |
| `none` | — | No authentication |
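For `hmac` auth, the request body is signed with the referenced secret. A sketch of an equivalent SHA-256 signature computation; the exact header name and encoding Floopy uses are not documented here, so hex encoding is an assumption:

```python
import hashlib
import hmac

def sign_body(secret: bytes, body: bytes) -> str:
    """Compute a hex-encoded HMAC-SHA256 signature over the request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Server-side check; compare_digest avoids timing side channels."""
    return hmac.compare_digest(sign_body(secret, body), signature)
```

A server receiving the request recomputes the signature over the raw body and compares it in constant time.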
#### `agent`

| Field | Type | Default | Description |
|---|---|---|---|
| `max_rounds` | integer | 5 | Maximum tool-call iterations before returning |
| `stream_mode` | enum | `final_only` | When to stream: `final_only` or `disabled` |
| `tool_call_parallel` | boolean | true | Execute non-dependent tool calls in parallel |
| `tool_cache_ttl_seconds` | integer | 0 | Cache identical tool calls (keyed by a hash of the arguments) |
| `prompt_guard_on_tool_output` | boolean | false | Pre-migration ONNX scan of tool outputs. Currently a no-op while the sync Validator interface is reworked; still accepted for backwards compatibility. |
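`tool_cache_ttl_seconds` caches results keyed by a hash of the tool name and arguments. A sketch of such a cache, assuming canonical-JSON plus SHA-256 for the key (Floopy's exact key derivation is not specified):

```python
import hashlib
import json
import time

class ToolResultCache:
    """TTL cache keyed by a hash of (tool name, canonicalized arguments)."""

    def __init__(self, ttl_seconds: int):
        self.ttl = ttl_seconds
        self._store = {}

    @staticmethod
    def _key(tool: str, args: dict) -> str:
        # sort_keys makes the key independent of argument order
        canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(f"{tool}:{canonical}".encode()).hexdigest()

    def get(self, tool: str, args: dict):
        if self.ttl == 0:                        # ttl of 0 disables caching
            return None
        entry = self._store.get(self._key(tool, args))
        if entry and time.monotonic() - entry[0] < self.ttl:
            return entry[1]
        return None

    def put(self, tool: str, args: dict, result):
        if self.ttl > 0:
            self._store[self._key(tool, args)] = (time.monotonic(), result)
```

Canonicalizing the arguments means a repeated call with the same parameters in a different order still hits the cache.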
## Secret Management

Never put API keys directly in the YAML. Store them in Floopy Vault and reference them by name.
### Storing a Secret

1. Go to Settings > Secrets in the dashboard
2. Click Add Secret
3. Enter the name (e.g., `mcp_search_api_key`) and value
4. Click Save

The secret is encrypted at rest (AES-256) and injected at runtime — it is never logged or returned in API responses.
### Referencing a Secret

Use the `secret.` prefix followed by the name you stored:

```yaml
auth:
  type: bearer
  secret_ref: "secret.mcp_search_api_key"
```

The format is always `secret.<name>`, where `<name>` matches exactly what you stored in the dashboard. Only alphanumeric characters, hyphens, and underscores are allowed (max 64 characters). Characters like `:`, `/`, and `.` (beyond the prefix) are rejected for security reasons.

Internally, each secret is isolated per organization — your secrets are never accessible to other tenants.
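The naming rule can be expressed as a small validator. This is a sketch of the documented constraints; the server-side implementation may differ:

```python
import re

# secret.<name>, where <name> is 1-64 chars of [A-Za-z0-9_-]
_SECRET_REF = re.compile(r"secret\.([A-Za-z0-9_-]{1,64})")

def parse_secret_ref(ref: str) -> str:
    """Return the secret name from a 'secret.<name>' reference, or raise."""
    match = _SECRET_REF.fullmatch(ref)
    if not match:
        raise ValueError(f"invalid secret_ref: {ref!r}")
    return match.group(1)
```

Validating references client-side gives a faster failure than waiting for the gateway to reject the plugin.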
## Streaming Modes

| Mode | Behavior |
|---|---|
| `final_only` | Streams the final LLM response after all tool calls complete. Intermediate tool calls are not streamed. |
| `disabled` | Returns the complete response as a single JSON object when the loop finishes. |

Note: intermediate tool-call steps are always available in the request log under Observability, regardless of streaming mode.
## Loop Limits and Timeouts

Set `max_rounds` to a value appropriate for your use case:
| Use case | Recommended `max_rounds` |
|---|---|
| Single-tool lookup | 2–3 |
| Multi-step research | 5–8 |
| Complex autonomous agent | 10–15 |
Each round adds LLM latency plus tool execution time. Keep `timeout_ms` per server low to avoid stalling the loop.
A hard gateway timeout of 120 seconds applies to the entire agentic loop. Requests exceeding this limit are terminated and the partial response is returned with a timeout finish reason.
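Because the 120-second gateway timeout covers LLM rounds plus tool execution, it helps to budget each tool call against the time remaining. A sketch of that arithmetic; Floopy's internal scheduling is not documented, so this is purely illustrative:

```python
import time

def remaining_budget_ms(start: float, total_timeout_s: float = 120.0) -> int:
    """Milliseconds left before the gateway deadline; 0 once expired."""
    left = total_timeout_s - (time.monotonic() - start)
    return max(0, int(left * 1000))

def effective_tool_timeout_ms(start: float, server_timeout_ms: int) -> int:
    """Clamp a server's per-request timeout to the remaining loop budget."""
    return min(server_timeout_ms, remaining_budget_ms(start))
```

For example, with `max_rounds: 10` and a 15-second tool timeout, a worst-case run could exceed the 120-second ceiling, so later rounds would see a smaller effective budget.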
## Sending the Plugin via Header

Instead of attaching the plugin to a routing rule, you can send it inline per request using the `floopy-mcp-plugin` header with a base64-encoded YAML value:
```typescript
import { OpenAI } from "openai";
import { Buffer } from "buffer";

const plugin = `version: "1"
mcp_servers:
  - id: search
    url: "https://mcp.example.com/search"
    auth:
      type: bearer
      secret_ref: "secret.mcp_search_api_key"
agent:
  max_rounds: 5`;

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
  defaultHeaders: {
    "floopy-mcp-plugin": Buffer.from(plugin).toString("base64"),
  },
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "What is the current price of BTC?" }],
});
```

```python
import base64, os
from openai import OpenAI

plugin = """version: "1"
mcp_servers:
  - id: search
    url: "https://mcp.example.com/search"
    auth:
      type: bearer
      secret_ref: "secret.mcp_search_api_key"
agent:
  max_rounds: 5"""

client = OpenAI(
    base_url="https://api.floopy.ai/v1",
    api_key=os.environ["FLOOPY_API_KEY"],
    default_headers={
        "floopy-mcp-plugin": base64.b64encode(plugin.encode()).decode(),
    },
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the current price of BTC?"}],
)
```

## End-to-End Example
The following example wires a web search MCP server to a GPT-4o agent that answers research questions.
Plugin YAML (attached to routing rule “Research Agent”):
```yaml
version: "1"

mcp_servers:
  - id: brave_search
    url: "https://mcp.brave.com/search"
    auth:
      type: bearer
      secret_ref: "secret.brave_api_key"
    tools:
      - web_search
    timeout_ms: 8000

agent:
  max_rounds: 6
  stream_mode: final_only
  tool_call_parallel: false
  # prompt_guard_on_tool_output is currently a no-op (see Field Reference table)
```

Request:

```typescript
const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    {
      role: "user",
      content:
        "What are the three most cited papers on transformer attention published in 2024?",
    },
  ],
});

console.log(response.choices[0].message.content);
// The model searched the web, read results, and synthesized a final answer.
```

What happened internally:

1. GPT-4o called `web_search("transformer attention papers 2024")`
2. Floopy executed the tool via the Brave MCP server
3. Results were appended to the conversation
4. GPT-4o called `web_search("citation counts transformer 2024")` as a follow-up
5. Floopy returned the synthesized final answer after round 2
## Observability
Every agentic loop execution is logged in full:
- Tool calls made (name, arguments, duration)
- Tool results (sanitized — secrets redacted)
- Number of rounds completed
- Total tokens consumed across all rounds
- Whether the loop hit `max_rounds`
View logs under Observability > Requests in the dashboard. Filter by `has_tool_calls: true` to isolate agentic sessions.