
MCP Client (Agentic Loop)

Overview

Floopy can act as an MCP client: it connects to external MCP servers on behalf of your agents and injects their tools into the conversation. When the LLM decides to call a tool, Floopy executes it, appends the result to the conversation, and loops back to the model — all transparently.

This is the agentic loop: the model reasons, calls tools, observes results, and reasons again until it reaches a final answer.


The Agent Loop

flowchart TD
    A[User message] --> B[LLM call]
    B --> C{Tool call requested?}
    C -- No --> D[Return final response]
    C -- Yes --> E[Execute tools via MCP server]
    E --> F[Tool result post-processing]
    F --> G[Append tool results to messages]
    G --> H{Round < max_rounds?}
    H -- Yes --> B
    H -- No --> I[Return last model response]

The loop is bounded by max_rounds to prevent infinite execution. When the limit is reached, Floopy returns the last model response. A global timeout (default 120s) also applies to the entire loop.
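The control flow in the flowchart can be sketched roughly as follows. This is an illustration only, not Floopy's implementation: `runAgentLoop`, `CallModel`, `ExecuteTool`, and the message shapes are hypothetical names, the post-processing step is omitted, and the sketch assumes a round is counted each time a batch of tool calls is executed.

```typescript
// Hypothetical types for illustration; none of these names come from Floopy.
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type ModelReply = { content: string; toolCalls: ToolCall[] };
type Message =
  | { role: "user"; content: string }
  | { role: "assistant"; content: string; toolCalls: ToolCall[] }
  | { role: "tool"; toolCallId: string; content: string };
type CallModel = (messages: Message[]) => Promise<ModelReply>;
type ExecuteTool = (call: ToolCall) => Promise<string>;
type LoopOutcome = { content: string; rounds: number; hitMaxRounds: boolean };

async function runAgentLoop(
  userMessage: string,
  callModel: CallModel,
  executeTool: ExecuteTool,
  maxRounds = 5, // mirrors the documented max_rounds default
): Promise<LoopOutcome> {
  const messages: Message[] = [{ role: "user", content: userMessage }];
  let rounds = 0;

  while (true) {
    const reply = await callModel(messages);
    messages.push({ role: "assistant", content: reply.content, toolCalls: reply.toolCalls });

    // No tool call requested: this is the final response.
    if (reply.toolCalls.length === 0) {
      return { content: reply.content, rounds, hitMaxRounds: false };
    }

    // Execute the requested tools (in parallel here) and append their results.
    rounds += 1;
    const toolMessages = await Promise.all(
      reply.toolCalls.map(
        async (call): Promise<Message> => ({
          role: "tool",
          toolCallId: call.id,
          content: await executeTool(call),
        }),
      ),
    );
    messages.push(...toolMessages);

    // Limit reached: return the last model response instead of looping again.
    if (rounds >= maxRounds) {
      return { content: reply.content, rounds, hitMaxRounds: true };
    }
  }
}
```

Passing the model and tool executor as functions keeps the sketch independent of any particular SDK or MCP transport.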


Plugin YAML Schema

Configure the agentic loop with a plugin YAML attached to your routing rule or sent as a request header (floopy-mcp-plugin).

Full Example

version: "1"
mcp_servers:
  - id: web_search
    url: "https://mcp.example.com/search"
    auth:
      type: bearer
      secret_ref: "secret.mcp_search_api_key" # resolved from Floopy Vault
    tools:
      - search_web
      - fetch_page
    timeout_ms: 5000
    max_retries: 2
  - id: code_interpreter
    url: "https://mcp.example.com/code"
    auth:
      type: api_key
      header: "X-Api-Key"
      secret_ref: "secret.mcp_code_api_key"
    tools: "*" # expose all tools from this server
    timeout_ms: 15000
agent:
  max_rounds: 10
  stream_mode: final_only # final_only | disabled
  tool_call_parallel: true # execute independent tool calls in parallel
  tool_cache_ttl_seconds: 300 # cache tool results (0 = disabled)
  # prompt_guard_on_tool_output was the pre-migration ONNX-based scan
  # of tool results — currently a no-op while the firewall sync→async
  # interface is reworked. Field accepted for backwards compatibility.

Field Reference

mcp_servers[]

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | Unique identifier for this server within the plugin |
| url | string | yes | HTTP(S) endpoint of the MCP server (must pass SSRF validator) |
| auth | object | no | Authentication to use when calling the server |
| tools | string[] or "*" | no | Tools to expose. Defaults to "*" (all) |
| timeout_ms | integer | no | Per-request timeout in milliseconds. Default: 5000 |
| max_retries | integer | no | Retry attempts on transient errors. Default: 1 |
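To make the interaction between timeout_ms and max_retries concrete, here is a minimal sketch. It assumes that max_retries counts retries after the first attempt and that the caller decides which errors are transient; `callWithRetry` and `isTransient` are hypothetical names, not part of Floopy.

```typescript
// Hypothetical helper; defaults mirror the table above (timeout_ms 5000, max_retries 1).
async function callWithRetry<T>(
  attempt: (signal: AbortSignal) => Promise<T>,
  isTransient: (error: unknown) => boolean,
  timeoutMs = 5000,
  maxRetries = 1,
): Promise<T> {
  let lastError: unknown;
  // One initial attempt plus up to maxRetries retries (an assumption about the semantics).
  for (let tryIndex = 0; tryIndex <= maxRetries; tryIndex++) {
    const controller = new AbortController();
    // Abort this attempt once the per-request timeout elapses.
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      return await attempt(controller.signal);
    } catch (error) {
      lastError = error;
      // Non-transient errors are rethrown immediately without retrying.
      if (!isTransient(error)) throw error;
    } finally {
      clearTimeout(timer);
    }
  }
  throw lastError;
}
```

The `attempt` callback is expected to pass the signal to whatever performs the request (for example the `signal` option of `fetch`), so a timed-out attempt is actually cancelled.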

auth

| Auth type | Fields | Description |
| --- | --- | --- |
| bearer | secret_ref | Sends Authorization: Bearer <secret> |
| api_key | header, secret_ref | Sends the secret in a custom header |
| hmac | secret_ref, algorithm | Signs the request body (SHA-256 default) |
| none | - | No authentication |
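As a rough illustration of what each auth type might produce on the wire, the sketch below builds request headers from an already-resolved secret. The `Auth` type and `buildAuthHeaders` are hypothetical, and for hmac the `X-Signature` header name and hex encoding are assumptions: the table only states that the body is signed with SHA-256 by default.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical shape: the secret has already been resolved from its secret_ref.
type Auth =
  | { type: "none" }
  | { type: "bearer"; secret: string }
  | { type: "api_key"; header: string; secret: string }
  | { type: "hmac"; secret: string; algorithm?: string };

function buildAuthHeaders(auth: Auth, body: string): Record<string, string> {
  switch (auth.type) {
    case "none":
      return {};
    case "bearer":
      return { Authorization: `Bearer ${auth.secret}` };
    case "api_key":
      // The secret is sent verbatim in the configured custom header.
      return { [auth.header]: auth.secret };
    case "hmac": {
      // Sign the raw request body; SHA-256 is the documented default algorithm.
      const signature = createHmac(auth.algorithm ?? "sha256", auth.secret)
        .update(body)
        .digest("hex");
      // Assumption: header name and encoding are illustrative only.
      return { "X-Signature": signature };
    }
  }
}
```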

agent

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| max_rounds | integer | 5 | Maximum tool-call iterations before returning |
| stream_mode | enum | final_only | When to stream: final_only or disabled |
| tool_call_parallel | boolean | true | Execute non-dependent tool calls in parallel |
| tool_cache_ttl_seconds | integer | 0 | Cache identical tool calls (by args hash) |
| prompt_guard_on_tool_output | boolean | false | Pre-migration ONNX scan of tool outputs. Currently a no-op while the sync Validator interface is reworked. Field still accepted for backwards compatibility. |
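One way to picture caching "by args hash" with a TTL is the sketch below. It is not Floopy's implementation: `cacheKey`, `canonicalize`, and `ToolResultCache` are hypothetical, and including the server id and tool name in the key, as well as using SHA-256 over key-sorted JSON, are assumptions.

```typescript
import { createHash } from "node:crypto";

// Sort object keys recursively so {a, b} and {b, a} serialize identically.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(record).sort()) {
      sorted[key] = canonicalize(record[key]);
    }
    return sorted;
  }
  return value;
}

// Assumption: the key covers server id, tool name, and canonicalized arguments.
function cacheKey(serverId: string, tool: string, args: unknown): string {
  const payload = JSON.stringify([serverId, tool, canonicalize(args)]);
  return createHash("sha256").update(payload).digest("hex");
}

class ToolResultCache {
  private readonly entries = new Map<string, { value: string; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(ttlSeconds: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlSeconds * 1000;
    this.now = now;
  }

  get(key: string): string | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: string): void {
    if (this.ttlMs <= 0) return; // a TTL of 0 disables caching
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

Canonicalizing the arguments before hashing matters because two semantically identical calls can otherwise serialize with different key orders and miss the cache.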

Secret Management

Never put API keys directly in the YAML. Store them in Floopy Vault and reference them by name.

Storing a Secret

  1. Go to Settings > Secrets in the dashboard
  2. Click Add Secret
  3. Enter the name (e.g., mcp_search_api_key) and value
  4. Click Save

The secret is encrypted at rest (AES-256) and injected at runtime — it is never logged or returned in API responses.

Referencing a Secret

Use the secret. prefix followed by the name you stored:

auth:
  type: bearer
  secret_ref: "secret.mcp_search_api_key"

The format is always secret.<name> where <name> matches exactly what you stored in the dashboard. Only alphanumeric characters, hyphens, and underscores are allowed (max 64 characters). Characters like :, /, and . (beyond the prefix) are rejected for security reasons.
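The naming rule can be checked client-side before a plugin is submitted. The following is a minimal sketch under stated assumptions: `parseSecretRef` is a hypothetical helper, and the minimum name length of one character is assumed, since only the 64-character maximum is documented.

```typescript
// secret.<name>, where <name> is 1-64 characters of [A-Za-z0-9_-].
// Assumption: the 1-character minimum is not stated in the documentation.
const SECRET_REF_PATTERN = /^secret\.([A-Za-z0-9_-]{1,64})$/;

// Returns the secret name, or null when the reference is malformed.
function parseSecretRef(ref: string): string | null {
  const match = SECRET_REF_PATTERN.exec(ref);
  return match?.[1] ?? null;
}
```

For example, `parseSecretRef("secret.mcp_search_api_key")` yields the name, while references containing `:`, `/`, or an extra `.` yield null.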

Internally, each secret is isolated per organization — your secrets are never accessible by other tenants.


Streaming Modes

| Mode | Behavior |
| --- | --- |
| final_only | Streams the final LLM response after all tool calls complete. Intermediate tool calls are not streamed. |
| disabled | Returns the complete response as a single JSON object when the loop finishes. |

Note: intermediate tool call steps are always available in the request log under Observability, regardless of streaming mode.


Loop Limits and Timeouts

Set max_rounds to a value appropriate for your use case:

| Use case | Recommended max_rounds |
| --- | --- |
| Single-tool lookup | 2–3 |
| Multi-step research | 5–8 |
| Complex autonomous agent | 10–15 |

Each round adds LLM latency plus tool execution time. Keep timeout_ms per server low to avoid stalling the loop.

A hard gateway timeout of 120 seconds applies to the entire agentic loop. Requests exceeding this limit are terminated and the partial response is returned with a timeout finish reason.
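The deadline behavior described above can be approximated with a race between the loop and a timer. This sketch is illustrative only: `runWithDeadline`, the `partial` holder, and the `finishReason` field name are hypothetical, and how Floopy actually tracks the partial response is not documented here.

```typescript
type DeadlineResult = { content: string; finishReason: "stop" | "timeout" };

// Hypothetical wrapper: `run` updates `partial.content` as it makes progress,
// so something can be returned if the deadline fires first.
async function runWithDeadline(
  run: (signal: AbortSignal, partial: { content: string }) => Promise<string>,
  timeoutMs = 120_000, // the documented 120-second limit
): Promise<DeadlineResult> {
  const controller = new AbortController();
  const partial = { content: "" };
  let timer: ReturnType<typeof setTimeout> | undefined;

  const deadline = new Promise<DeadlineResult>((resolve) => {
    timer = setTimeout(() => {
      controller.abort(); // signal in-flight work to stop
      resolve({ content: partial.content, finishReason: "timeout" });
    }, timeoutMs);
  });

  const finished = run(controller.signal, partial).then(
    (content): DeadlineResult => ({ content, finishReason: "stop" }),
  );

  try {
    // Whichever settles first wins: the completed loop or the deadline.
    return await Promise.race([finished, deadline]);
  } finally {
    clearTimeout(timer);
  }
}
```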


Sending the Plugin via Header

Instead of attaching the plugin to a routing rule, you can send it inline per-request using the floopy-mcp-plugin header with a base64-encoded YAML value:

import { OpenAI } from "openai";
import { Buffer } from "buffer";

// Plugin YAML. Indentation is significant, so keep it intact inside the template literal.
const plugin = `
version: "1"
mcp_servers:
  - id: search
    url: "https://mcp.example.com/search"
    auth:
      type: bearer
      secret_ref: "secret.mcp_search_api_key"
agent:
  max_rounds: 5
`;

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
  // defaultHeaders attaches the plugin to every request made by this client.
  defaultHeaders: {
    "floopy-mcp-plugin": Buffer.from(plugin).toString("base64"),
  },
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "What is the current price of BTC?" }],
});

End-to-End Example

The following example wires a web search MCP server to a GPT-4o agent that answers research questions.

Plugin YAML (attached to routing rule “Research Agent”):

version: "1"
mcp_servers:
  - id: brave_search
    url: "https://mcp.brave.com/search"
    auth:
      type: bearer
      secret_ref: "secret.brave_api_key"
    tools:
      - web_search
    timeout_ms: 8000
agent:
  max_rounds: 6
  stream_mode: final_only
  tool_call_parallel: false
  # prompt_guard_on_tool_output is currently a no-op (see Field Reference table)

Request:

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    {
      role: "user",
      content: "What are the three most cited papers on transformer attention published in 2024?",
    },
  ],
});
console.log(response.choices[0].message.content);
// The model searched the web, read results, and synthesized a final answer.

What happened internally:

  1. GPT-4o called web_search("transformer attention papers 2024")
  2. Floopy executed the tool via the Brave MCP server
  3. Results were appended to the conversation
  4. GPT-4o called web_search("citation counts transformer 2024") for follow-up
  5. Floopy executed the second search and appended its results; GPT-4o then synthesized the final answer, which Floopy returned after round 2

Observability

Every agentic loop execution is logged in full:

  • Tool calls made (name, arguments, duration)
  • Tool results (sanitized — secrets redacted)
  • Number of rounds completed
  • Total tokens consumed across all rounds
  • Whether the loop hit max_rounds

View logs under Observability > Requests in the dashboard. Filter by has_tool_calls: true to isolate agentic sessions.