MCP Client (Agentic Loop)

Overview

Floopy can act as an MCP client: it connects to external MCP servers on behalf of your agents and injects their tools into the conversation. When the LLM decides to call a tool, Floopy executes it, appends the result to the conversation, and loops back to the model — all transparently.

This is the agentic loop: the model reasons, calls tools, observes results, and reasons again until it reaches a final answer.

The Agent Loop

flowchart TD
    A[User message] --> B[LLM call]
    B --> C{Tool call requested?}
    C -- No --> D[Return final response]
    C -- Yes --> E[Execute tools via MCP server]
    E --> F[Tool result post-processing]
    F --> G[Append tool results to messages]
    G --> H{Round < max_rounds?}
    H -- Yes --> B
    H -- No --> I[Return last model response]

The loop is bounded by max_rounds to prevent infinite execution. When the limit is reached, Floopy returns the last model response. A gateway timeout of 120 seconds also applies to the entire loop, independently of max_rounds.
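
The control flow in the flowchart can be sketched in a few lines of Python. This is a minimal illustration, not Floopy's implementation: `run_agent_loop`, `fake_llm`, and `fake_tool` are hypothetical names, and the flat tool-call shape (`id`, `name`, `arguments`) is a simplification of the OpenAI-style message format.

```python
import time
from typing import Callable


def run_agent_loop(
    messages: list,
    call_llm: Callable[[list], dict],
    execute_tool: Callable[[str, dict], str],
    max_rounds: int = 5,
    timeout_s: float = 120.0,
) -> dict:
    """Bounded reason/act/observe loop mirroring the flowchart (illustrative only)."""
    deadline = time.monotonic() + timeout_s
    rounds = 0
    while True:
        reply = call_llm(messages)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply  # no tool requested: this is the final response
        messages.append(reply)
        for call in tool_calls:
            result = execute_tool(call["name"], call["arguments"])
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": result}
            )
        rounds += 1
        if rounds >= max_rounds or time.monotonic() >= deadline:
            return reply  # limit reached: return the last model response


# Stubs standing in for the model and the MCP server (assumptions for the demo).
def fake_llm(messages: list) -> dict:
    if any(m["role"] == "tool" for m in messages):
        return {"role": "assistant", "content": "final answer"}
    call = {"id": "call_1", "name": "search_web", "arguments": {"q": "btc price"}}
    return {"role": "assistant", "content": None, "tool_calls": [call]}


def fake_tool(name: str, arguments: dict) -> str:
    return f"{name} result"


history = [{"role": "user", "content": "What is the current price of BTC?"}]
answer = run_agent_loop(history, fake_llm, fake_tool, max_rounds=5)
print(answer["content"])  # prints "final answer"
```

The round counter is checked after tool results are appended, which matches the order of the decision nodes in the flowchart.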

Plugin YAML Schema

Configure the agentic loop with a plugin YAML attached to your routing rule or sent as a request header (floopy-mcp-plugin).

Full Example

version: "1"

mcp_servers:
  - id: web_search
    url: "https://mcp.example.com/search"
    auth:
      type: bearer
      secret_ref: "secret.mcp_search_api_key"      # resolved from Floopy Vault
    tools:
      - search_web
      - fetch_page
    timeout_ms: 5000
    max_retries: 2

  - id: code_interpreter
    url: "https://mcp.example.com/code"
    auth:
      type: api_key
      header: "X-Api-Key"
      secret_ref: "secret.mcp_code_api_key"
    tools: "*"                               # expose all tools from this server
    timeout_ms: 15000

agent:
  max_rounds: 10
  stream_mode: final_only                    # final_only | disabled
  tool_call_parallel: true                   # execute independent tool calls in parallel
  tool_cache_ttl_seconds: 300               # cache tool results (0 = disabled)
  # prompt_guard_on_tool_output was the pre-migration ONNX-based scan
  # of tool results — currently a no-op while the firewall sync→async
  # interface is reworked. Field accepted for backwards compatibility.

Field Reference

`mcp_servers[]`

Field	Type	Required	Description
`id`	string	yes	Unique identifier for this server within the plugin
`url`	string	yes	HTTP(S) endpoint of the MCP server (must pass SSRF validator)
`auth`	object	no	Authentication to use when calling the server
`tools`	string[] or `"*"`	no	Tools to expose. Defaults to `"*"` (all)
`timeout_ms`	integer	no	Per-request timeout. Default: `5000`
`max_retries`	integer	no	Retry attempts on transient errors. Default: `1`
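
To make the defaults in this table concrete, here is a small sketch of how a server entry could be normalized after the YAML is parsed. `McpServerConfig` and `parse_server` are hypothetical names used only for illustration, and the URL check is a simple scheme test, not the SSRF validator mentioned above.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class McpServerConfig:
    id: str
    url: str
    auth: Optional[dict] = None
    tools: Union[List[str], str] = "*"  # "*" exposes every tool
    timeout_ms: int = 5000
    max_retries: int = 1


def parse_server(raw: dict) -> McpServerConfig:
    """Check required fields and fill in the documented defaults."""
    missing = [key for key in ("id", "url") if not raw.get(key)]
    if missing:
        raise ValueError("missing required field(s): " + ", ".join(missing))
    if not raw["url"].startswith(("http://", "https://")):
        raise ValueError("url must be an HTTP(S) endpoint")
    return McpServerConfig(**raw)


server = parse_server({"id": "code_interpreter", "url": "https://mcp.example.com/code"})
print(server.tools, server.timeout_ms, server.max_retries)  # prints: * 5000 1
```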

`auth`

Auth type	Fields	Description
`bearer`	`secret_ref`	Sends `Authorization: Bearer <secret>`
`api_key`	`header`, `secret_ref`	Sends the secret in a custom header
`hmac`	`secret_ref`, `algorithm`	Signs the request body with an HMAC keyed by the secret (`algorithm` defaults to SHA-256)
`none`	—	No authentication
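
The mapping from auth type to outgoing headers might look roughly like the sketch below. Several details are assumptions rather than documented behavior: the function name `build_auth_headers`, the `X-Signature` header used for the HMAC digest, the `"sha256"` spelling of the `algorithm` value, and treating a missing `auth` block as `none`.

```python
import hmac
from typing import Optional


def build_auth_headers(auth: Optional[dict], secret: str, body: bytes = b"") -> dict:
    """Return the headers implied by an auth block (illustrative only)."""
    kind = (auth or {}).get("type", "none")
    if kind == "none":
        return {}
    if kind == "bearer":
        return {"Authorization": f"Bearer {secret}"}
    if kind == "api_key":
        return {auth["header"]: secret}
    if kind == "hmac":
        algorithm = auth.get("algorithm", "sha256")  # assumed value format
        digest = hmac.new(secret.encode(), body, algorithm).hexdigest()
        return {"X-Signature": digest}  # hypothetical header name
    raise ValueError(f"unknown auth type: {kind!r}")


print(build_auth_headers({"type": "api_key", "header": "X-Api-Key"}, "s3cret"))
# prints {'X-Api-Key': 's3cret'}
```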

`agent`

Field	Type	Default	Description
`max_rounds`	integer	`5`	Maximum tool-call iterations before returning
`stream_mode`	enum	`final_only`	When to stream: `final_only` or `disabled`
`tool_call_parallel`	boolean	`true`	Execute non-dependent tool calls in parallel
`tool_cache_ttl_seconds`	integer	`0`	How long, in seconds, to cache the results of identical tool calls (matched by args hash). `0` disables caching
`prompt_guard_on_tool_output`	boolean	`false`	Pre-migration ONNX scan of tool outputs. Currently a no-op while the sync `Validator` interface is reworked. Field still accepted for backwards compatibility.
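
As an illustration of what caching "by args hash" with a TTL can mean, the sketch below keys each entry on the tool name plus a canonical JSON encoding of its arguments. The `ToolCache` class, the SHA-256 key, and the in-memory store are assumptions made for the example; the document does not specify how Floopy computes the hash or stores entries.

```python
import hashlib
import json
import time


class ToolCache:
    """In-memory TTL cache keyed by tool name and argument hash (illustrative)."""

    def __init__(self, ttl_seconds: int, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    @staticmethod
    def key(tool: str, args: dict) -> str:
        # Sorting keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} hash identically.
        canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(f"{tool}:{canonical}".encode()).hexdigest()

    def get(self, tool: str, args: dict):
        if self.ttl <= 0:
            return None  # 0 disables caching
        entry = self._store.get(self.key(tool, args))
        if entry is None:
            return None
        expires_at, value = entry
        return value if self.clock() < expires_at else None

    def put(self, tool: str, args: dict, value: str) -> None:
        if self.ttl > 0:
            self._store[self.key(tool, args)] = (self.clock() + self.ttl, value)


cache = ToolCache(ttl_seconds=300)
cache.put("search_web", {"q": "btc price", "lang": "en"}, "cached result")
print(cache.get("search_web", {"lang": "en", "q": "btc price"}))  # prints "cached result"
```

Canonicalizing the arguments before hashing is the design choice that matters here: without it, two calls that differ only in key order would miss the cache.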

Secret Management

Never put API keys directly in the YAML. Store them in Floopy Vault and reference them by name.

Storing a Secret

1. Go to Settings > Secrets in the dashboard.
2. Click Add Secret.
3. Enter the name (e.g., mcp_search_api_key) and value.
4. Click Save.

The secret is encrypted at rest (AES-256) and injected at runtime — it is never logged or returned in API responses.

Referencing a Secret

Use the secret. prefix followed by the name you stored:

auth:
  type: bearer
  secret_ref: "secret.mcp_search_api_key"

The format is always secret.<name> where <name> matches exactly what you stored in the dashboard. Only alphanumeric characters, hyphens, and underscores are allowed (max 64 characters). Characters like :, /, and . (beyond the prefix) are rejected for security reasons.
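
The naming rules above translate directly into a regular expression, which can be handy for checking a plugin before sending it. This is a client-side sketch only: `secret_name` and `is_valid_secret_ref` are hypothetical helpers, and the sketch assumes the 64-character limit applies to the name, not to the whole reference.

```python
import re

# "secret." prefix, then 1-64 alphanumerics, hyphens, or underscores.
SECRET_REF = re.compile(r"secret\.([A-Za-z0-9_-]{1,64})")


def is_valid_secret_ref(ref: str) -> bool:
    return SECRET_REF.fullmatch(ref) is not None


def secret_name(ref: str) -> str:
    """Extract the stored name from a secret_ref, or raise ValueError."""
    match = SECRET_REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"invalid secret_ref: {ref!r}")
    return match.group(1)


print(secret_name("secret.mcp_search_api_key"))  # prints "mcp_search_api_key"
print(is_valid_secret_ref("secret.team/key"))    # prints "False"
```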

Internally, each secret is isolated per organization — your secrets are never accessible by other tenants.

Streaming Modes

Mode	Behavior
`final_only`	Streams the final LLM response after all tool calls complete. Intermediate tool calls are not streamed.
`disabled`	Returns the complete response as a single JSON object when the loop finishes.

Note: intermediate tool call steps are always available in the request log under Observability, regardless of streaming mode.

Loop Limits and Timeouts

Set max_rounds to a value appropriate for your use case:

Use case	Recommended `max_rounds`
Single-tool lookup	2–3
Multi-step research	5–8
Complex autonomous agent	10–15

Each round adds LLM latency plus tool execution time. Keep timeout_ms per server low to avoid stalling the loop.

A hard gateway timeout of 120 seconds applies to the entire agentic loop. Requests exceeding this limit are terminated and the partial response is returned with a timeout finish reason.
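
A rough worst-case estimate helps when choosing max_rounds and timeout_ms against the 120-second limit. The sketch below rests on simplifying assumptions that are not taken from the product: one tool call per round (or several run in parallel), every attempt and retry consuming the full timeout_ms, no backoff between retries, and a fixed LLM latency per round. `worst_case_loop_seconds` is a hypothetical helper.

```python
def worst_case_loop_seconds(
    max_rounds: int,
    llm_latency_s: float,
    timeout_ms: int,
    max_retries: int = 1,
) -> float:
    """Upper bound on loop duration under the stated assumptions."""
    # One initial attempt plus max_retries retries, each hitting the timeout.
    tool_s = (timeout_ms / 1000) * (1 + max_retries)
    return max_rounds * (llm_latency_s + tool_s)


# 10 rounds, 3 s per LLM call, 5 s tool timeout, 2 retries: over the limit.
print(worst_case_loop_seconds(10, 3.0, 5000, 2))  # prints 180.0
# 6 rounds, 3 s per LLM call, 8 s tool timeout, 1 retry: just under it.
print(worst_case_loop_seconds(6, 3.0, 8000, 1))   # prints 114.0
```

If the estimate exceeds 120 seconds, lowering timeout_ms or max_retries usually buys more headroom than lowering max_rounds, because tool time is multiplied by the retry count.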

Sending the Plugin via Header

Instead of attaching the plugin to a routing rule, you can send it inline per-request using the floopy-mcp-plugin header with a base64-encoded YAML value:

The Node.js example is shown first, followed by the Python equivalent.

import { OpenAI } from "openai";
import { Buffer } from "buffer";

const plugin = `
version: "1"
mcp_servers:
  - id: search
    url: "https://mcp.example.com/search"
    auth:
      type: bearer
      secret_ref: "secret.mcp_search_api_key"
agent:
  max_rounds: 5
`;

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
  defaultHeaders: {
    "floopy-mcp-plugin": Buffer.from(plugin).toString("base64"),
  },
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "What is the current price of BTC?" }],
});

import base64, os
from openai import OpenAI

plugin = """
version: "1"
mcp_servers:
  - id: search
    url: "https://mcp.example.com/search"
    auth:
      type: bearer
      secret_ref: "secret.mcp_search_api_key"
agent:
  max_rounds: 5
"""

client = OpenAI(
    base_url="https://api.floopy.ai/v1",
    api_key=os.environ["FLOOPY_API_KEY"],
    default_headers={
        "floopy-mcp-plugin": base64.b64encode(plugin.encode()).decode(),
    },
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the current price of BTC?"}],
)
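
When a header-supplied plugin does not behave as expected, it can help to confirm that the encoded value decodes back to the YAML you intended. The helpers below are a hypothetical local check using standard (not URL-safe) base64, the same encoding that both examples above produce; they say nothing about how Floopy parses the header.

```python
import base64


def encode_plugin(yaml_text: str) -> str:
    """Encode plugin YAML for the floopy-mcp-plugin header."""
    return base64.b64encode(yaml_text.encode("utf-8")).decode("ascii")


def decode_plugin(header_value: str) -> str:
    """Reverse encode_plugin; raises if the value is not valid base64."""
    return base64.b64decode(header_value, validate=True).decode("utf-8")


plugin_yaml = 'version: "1"\nagent:\n  max_rounds: 5\n'
header_value = encode_plugin(plugin_yaml)
print(decode_plugin(header_value) == plugin_yaml)  # prints "True"
```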

End-to-End Example

The following example wires a web search MCP server to a GPT-4o agent that answers research questions.

Plugin YAML (attached to routing rule “Research Agent”):

version: "1"

mcp_servers:
  - id: brave_search
    url: "https://mcp.brave.com/search"
    auth:
      type: bearer
      secret_ref: "secret.brave_api_key"
    tools:
      - web_search
    timeout_ms: 8000

agent:
  max_rounds: 6
  stream_mode: final_only
  tool_call_parallel: false
  # prompt_guard_on_tool_output is currently a no-op (see Field Reference table)

Request:

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    {
      role: "user",
      content: "What are the three most cited papers on transformer attention published in 2024?",
    },
  ],
});

console.log(response.choices[0].message.content);
// The model searched the web, read results, and synthesized a final answer.

What happened internally:

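1. GPT-4o called web_search("transformer attention papers 2024").
2. Floopy executed the tool via the Brave MCP server.
3. The results were appended to the conversation.
4. GPT-4o called web_search("citation counts transformer 2024") as a follow-up, and Floopy executed it and appended the results the same way.
5. With both rounds of results in context, GPT-4o produced the synthesized final answer, which Floopy returned after round 2.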
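
For orientation, the conversation at the end of this run would look roughly like the list below if it follows the OpenAI-style chat format. The call IDs, the `query` argument name, and the placeholder contents are hypothetical; real tool results and the final answer are not reproduced here.

```python
import json


def tool_call(call_id: str, query: str) -> dict:
    # The "query" argument name is an assumption; the tool's schema is not shown.
    return {
        "id": call_id,
        "type": "function",
        "function": {"name": "web_search", "arguments": json.dumps({"query": query})},
    }


transcript = [
    {"role": "user", "content": "What are the three most cited papers on "
                                "transformer attention published in 2024?"},
    {"role": "assistant", "content": None,
     "tool_calls": [tool_call("call_1", "transformer attention papers 2024")]},
    {"role": "tool", "tool_call_id": "call_1", "content": "<search results>"},
    {"role": "assistant", "content": None,
     "tool_calls": [tool_call("call_2", "citation counts transformer 2024")]},
    {"role": "tool", "tool_call_id": "call_2", "content": "<search results>"},
    {"role": "assistant", "content": "<synthesized final answer>"},
]

# A round is one assistant turn that requested tools.
rounds = sum(1 for m in transcript if m["role"] == "assistant" and m.get("tool_calls"))
print(rounds)  # prints 2
```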

Observability

Every agentic loop execution is logged in full:

Tool calls made (name, arguments, duration)
Tool results (sanitized — secrets redacted)
Number of rounds completed
Total tokens consumed across all rounds
Whether the loop hit max_rounds

View logs under Observability > Requests in the dashboard. Filter by has_tool_calls: true to isolate agentic sessions.