
How to Build an Agentic Workflow with Floopy's MCP Integration

Build a production agentic loop with Floopy: plugin YAML config, MCP server connection, secret management, and full workflow testing.

Floopy Team | 6 min read
mcp agentic-loop ai-gateway tool-calling tutorial guides

This tutorial walks through building a real agentic workflow: a GPT-4o agent that can search the web to answer research questions. We’ll use Floopy as the AI gateway and the Brave Search MCP server as the tool provider.

By the end, you’ll have a working agentic loop where the model decides when to search, executes queries, reads results, and synthesizes a final answer — all through Floopy’s gateway.

What We’re Building

User: "What are the key differences between Rust's async runtimes Tokio and async-std?"

Agent:
  Round 1 → calls web_search("Rust Tokio vs async-std comparison 2025")
          ← tool result: [search snippets]
  Round 2 → calls web_search("async-std maintenance status 2025")
          ← tool result: [search snippets]
  Round 3 → synthesizes final answer from gathered information
          → returns response to user
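Behind the scenes, the gateway runs a control loop of roughly this shape. The sketch below is a simplified, self-contained model of that loop and is not Floopy's implementation. Several parts are assumptions:

- `callModel` and `runTool` are hypothetical stand-ins for the model request and the MCP tool call.
- The message types are trimmed down.
- A "round" here means one model call, so the final synthesis counts as a round, just as Round 3 does above.

```typescript
// Simplified shapes for illustration; these are not Floopy's or OpenAI's types.
interface ToolCall {
  id: string;
  name: string;
  args: string;
}

interface ModelTurn {
  finishReason: "tool_calls" | "stop";
  content?: string;
  toolCalls?: ToolCall[];
}

type Message =
  | { role: "user"; content: string }
  | { role: "assistant"; content: string | null; toolCalls?: ToolCall[] }
  | { role: "tool"; toolCallId: string; content: string };

interface LoopResult {
  answer: string | null;
  rounds: number;
  finishReason: "stop" | "max_rounds";
}

// Call the model, run any requested tools, feed the results back, and repeat
// until the model stops or the round budget (max_rounds) is used up.
async function runAgentLoop(
  question: string,
  callModel: (history: Message[]) => Promise<ModelTurn>,
  runTool: (call: ToolCall) => Promise<string>,
  maxRounds: number,
): Promise<LoopResult> {
  const history: Message[] = [{ role: "user", content: question }];

  for (let round = 1; round <= maxRounds; round++) {
    const turn = await callModel(history);

    if (turn.finishReason === "stop") {
      return { answer: turn.content ?? "", rounds: round, finishReason: "stop" };
    }

    // Record the assistant's tool request, then execute each call in order
    // (sequential, as with tool_call_parallel: false).
    const calls = turn.toolCalls ?? [];
    history.push({ role: "assistant", content: null, toolCalls: calls });
    for (const call of calls) {
      const result = await runTool(call);
      history.push({ role: "tool", toolCallId: call.id, content: result });
    }
  }

  // No final answer within the budget: report finish_reason "max_rounds".
  return { answer: null, rounds: maxRounds, finishReason: "max_rounds" };
}
```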

Prerequisites

  • A Floopy account with the Pro plan
  • A Brave Search API key (free tier available at brave.com/search/api)
  • Node.js 18+ for testing

Step 1: Store Your Secret

Never put API keys directly in configuration files. Start by storing your Brave Search API key in Floopy Vault.

  1. Open the Floopy dashboard
  2. Go to Settings > Secrets
  3. Click Add Secret
  4. Name: brave_search_api_key
  5. Value: your Brave Search API key
  6. Click Save

The secret is now encrypted at rest and will be injected at runtime. It will never appear in logs or API responses.
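Floopy performs this resolution for you. To make the mechanism concrete, here is a minimal sketch of how a `secret.<name>` reference could be resolved at request time, injected into an outbound header, and masked before anything is logged. The `secret.<name>` form matches the `secret_ref` format used in Step 3. The `Map` standing in for Floopy Vault and all function names are hypothetical.

```typescript
// Hypothetical in-memory stand-in for the vault; the real store is managed by Floopy.
type SecretStore = Map<string, string>;

const SECRET_PREFIX = "secret.";

// Resolve a reference like "secret.brave_search_api_key" to its stored value.
function resolveSecretRef(ref: string, store: SecretStore): string {
  if (!ref.startsWith(SECRET_PREFIX)) {
    throw new Error(`Not a secret reference: ${ref}`);
  }
  const name = ref.slice(SECRET_PREFIX.length);
  const value = store.get(name);
  if (value === undefined) {
    throw new Error(`Unknown secret: ${name}`);
  }
  return value;
}

// Build the auth header at request time; the config only ever holds the reference.
function buildAuthHeaders(header: string, ref: string, store: SecretStore): Record<string, string> {
  return { [header]: resolveSecretRef(ref, store) };
}

// Mask every header value before the headers are written to a log.
function redactHeaders(headers: Record<string, string>): Record<string, string> {
  const masked: Record<string, string> = {};
  for (const key of Object.keys(headers)) {
    masked[key] = "[REDACTED]";
  }
  return masked;
}
```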


Step 2: Create a Routing Rule

Agentic plugin configurations are attached to routing rules.

  1. Go to Routing in the dashboard
  2. Click New Rule
  3. Name it research-agent
  4. Set the default model to gpt-4o
  5. Leave other settings at defaults for now
  6. Click Save

Step 3: Write the Plugin YAML

The plugin YAML tells Floopy which MCP servers to connect to and how to run the agentic loop.

Create a file called research-agent.yaml:

version: "1"
mcp_servers:
  - id: brave_search
    url: "https://api.search.brave.com/mcp"
    auth:
      type: api_key
      header: "X-Subscription-Token"
      secret_ref: "secret.brave_search_api_key"
    tools:
      - web_search
    timeout_ms: 8000
    max_retries: 2
agent:
  max_rounds: 6
  stream_mode: final_only
  tool_call_parallel: false
  tool_cache_ttl_seconds: 300
  prompt_guard_on_tool_output: true

What each field does:

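  • secret_ref: "secret.brave_search_api_key" references the secret you stored in Step 1
  • tools: [web_search] exposes only this one tool from the Brave server
  • timeout_ms: 8000 gives each tool call up to 8 seconds before it fails
  • max_retries: 2 retries a failed tool call up to twice
  • max_rounds: 6 allows up to 6 tool call iterations
  • stream_mode: final_only streams only the final answer, not the intermediate tool calls
  • tool_call_parallel: false runs tool calls one at a time instead of in parallel
  • tool_cache_ttl_seconds: 300 caches identical search queries for 5 minutes
  • prompt_guard_on_tool_output: true scans search results for prompt injection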
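Together, timeout_ms and max_retries bound how long a single tool call can take. The back-of-the-envelope check below works out the worst case for this config. It rests on two assumptions that are not documented Floopy behavior:

- Each attempt gets its own full timeout_ms, and retries run back to back.
- There is at most one tool call per round.

```typescript
// Worst-case wall time for one tool call: the first attempt plus every retry,
// each allowed to run for the full timeout (assumed semantics).
function worstCaseToolCallMs(timeoutMs: number, maxRetries: number): number {
  return timeoutMs * (1 + maxRetries);
}

// Worst-case tool time for a whole session, assuming one tool call per round
// and sequential execution (tool_call_parallel: false).
function worstCaseSessionToolMs(timeoutMs: number, maxRetries: number, maxRounds: number): number {
  return worstCaseToolCallMs(timeoutMs, maxRetries) * maxRounds;
}

const perCall = worstCaseToolCallMs(8000, 2); // 8000 * 3 = 24000 ms
const perSession = worstCaseSessionToolMs(8000, 2, 6); // 24000 * 6 = 144000 ms
console.log(`per call: ${perCall / 1000}s, per session: ${perSession / 1000}s`);
```

Under these assumptions, an unresponsive search server could add up to 24 seconds per call and 144 seconds per session on top of model latency. That is the reason to revisit these values if tool calls turn out to be slow.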

Step 4: Attach the Plugin to Your Routing Rule

  1. Go to Routing > research-agent in the dashboard
  2. Click MCP Plugin
  3. Paste the YAML content
  4. Click Save

Floopy validates the YAML schema and checks that the secret reference exists. If validation fails, the error message will tell you exactly what’s wrong.
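If you want to catch mistakes before pasting, you can run a similar pre-check locally. The sketch below mirrors the two checks described here, required fields and an existing secret reference, on a config object that has already been parsed from YAML. The `PluginConfig` type, the specific rules, and the error messages are assumptions based on the Step 3 example. They are not Floopy's published schema.

```typescript
// Shape of the plugin config from Step 3 (inferred from the example, not an official schema).
interface McpServerConfig {
  id: string;
  url: string;
  auth?: { type: string; header: string; secret_ref: string };
  tools?: string[];
  timeout_ms?: number;
  max_retries?: number;
}

interface PluginConfig {
  version: string;
  mcp_servers: McpServerConfig[];
  agent: { max_rounds: number; stream_mode: string; tool_cache_ttl_seconds?: number };
}

// Returns human-readable problems; an empty list means the config passed.
function validatePlugin(config: PluginConfig, knownSecrets: Set<string>): string[] {
  const errors: string[] = [];
  if (config.version !== "1") errors.push(`unsupported version "${config.version}"`);
  if (config.mcp_servers.length === 0) errors.push("mcp_servers must not be empty");

  for (const server of config.mcp_servers) {
    const ref = server.auth?.secret_ref;
    if (ref !== undefined) {
      // The reference must use the secret.<name> form and point at a stored secret.
      const name = ref.replace(/^secret\./, "");
      if (!ref.startsWith("secret.") || !knownSecrets.has(name)) {
        errors.push(`${server.id}: secret reference "${ref}" does not exist`);
      }
    }
  }

  if (!Number.isInteger(config.agent.max_rounds) || config.agent.max_rounds < 1) {
    errors.push("agent.max_rounds must be a positive integer");
  }
  return errors;
}
```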


Step 5: Test in the Playground

Before writing code, test the setup in the Floopy Playground.

  1. Go to Playground in the dashboard
  2. Select routing rule: research-agent
  3. Enter this message:
     What are the main performance differences between PostgreSQL and ClickHouse for analytical workloads?
  4. Click Send

Watch the response stream in. In the Trace panel on the right, you’ll see each tool call logged in real time: the search query sent, the results received, and which round of the loop you’re in.

If you see finish_reason: tool_calls, followed by the tool results and then a final finish_reason: stop, the loop is working correctly.
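If you copy the finish reasons out of the Trace panel, the healthy pattern is easy to check mechanically. You are looking for at least one tool_calls response followed by exactly one final stop. The helper below is a hypothetical convenience that works on a plain array of strings.

```typescript
// True when at least one tool_calls round is followed by a single final stop.
function isHealthyLoop(finishReasons: string[]): boolean {
  if (finishReasons.length < 2) return false; // need a tool round and a final answer
  const last = finishReasons[finishReasons.length - 1];
  const earlier = finishReasons.slice(0, -1);
  return last === "stop" && earlier.every((reason) => reason === "tool_calls");
}

console.log(isHealthyLoop(["tool_calls", "tool_calls", "stop"])); // true
console.log(isHealthyLoop(["tool_calls", "max_rounds"])); // false
```

A lone stop also returns false. In that case the model answered without searching, so the tool path was not exercised.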


Step 6: Call from Your Application

Once tested, call it from your application using the standard OpenAI SDK:

import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
  defaultHeaders: {
    "floopy-routing-rule": "research-agent", // activate your routing rule
  },
});

async function researchQuestion(question: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content:
          "You are a research assistant. Use web search to find accurate, up-to-date information before answering. Always cite your sources.",
      },
      {
        role: "user",
        content: question,
      },
    ],
    stream: true,
  });

  // With stream_mode: final_only, these chunks carry only the final answer.
  let result = "";
  for await (const chunk of response) {
    const delta = chunk.choices[0]?.delta?.content;
    if (delta) {
      result += delta;
      process.stdout.write(delta); // stream to console
    }
  }
  return result;
}

// Top-level await requires an ES module context (e.g. "type": "module" in package.json).
const answer = await researchQuestion(
  "What are the key differences between Rust's Tokio and async-std runtimes?"
);

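The routing rule activates your MCP plugin. When GPT-4o decides to call web_search, Floopy executes the call against the Brave Search MCP server, feeds the results back to the model, and continues the loop until a final answer is ready. Your application code never needs to know that tool calls are happening.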
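The `for await` loop in `researchQuestion` is the only part of the client that touches the stream. To unit-test it without calling the gateway, you can pull it out into a helper that accepts any async iterable of chunks. Some caveats about the sketch below:

- `ChunkLike` declares only the fields the loop reads (`choices[0].delta.content`). That is a minimal assumption about the chunk shape, not the SDK's full type.
- `collectDeltas` and `fakeStream` are hypothetical names.
- The SDK's streamed response is an async iterable, so in principle you can pass it straight in.

```typescript
// Minimal slice of a streamed chat completion chunk (only the fields read below).
interface ChunkLike {
  choices: { delta?: { content?: string | null } }[];
}

// Concatenate content deltas, optionally forwarding each one (e.g. to stdout).
async function collectDeltas(
  stream: AsyncIterable<ChunkLike>,
  onDelta: (text: string) => void = () => {},
): Promise<string> {
  let result = "";
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content;
    if (delta) {
      result += delta;
      onDelta(delta);
    }
  }
  return result;
}

// A fake stream standing in for the SDK response during tests.
async function* fakeStream(parts: string[]): AsyncGenerator<ChunkLike> {
  for (const part of parts) {
    yield { choices: [{ delta: { content: part } }] };
  }
  yield { choices: [] }; // a trailing chunk with no choices is skipped safely
}
```

Inside `researchQuestion`, the loop would then reduce to `return collectDeltas(response, (d) => process.stdout.write(d));`.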


Step 7: Send the Plugin Inline (Alternative)

If you want to use the agentic loop without a routing rule — useful for development or per-request customization — send the plugin YAML inline via a request header:

import { readFileSync } from "fs";
import { OpenAI } from "openai";

// Read the plugin YAML and base64-encode it for the request header.
const pluginYaml = readFileSync("./research-agent.yaml", "utf-8");
const pluginBase64 = Buffer.from(pluginYaml).toString("base64");

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
  defaultHeaders: {
    "floopy-mcp-plugin": pluginBase64, // inline plugin instead of a routing rule
  },
});
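The whole YAML document travels in one header, so it is worth confirming that the encoding round-trips before debugging anything else. Here is a minimal sketch that uses Node's `Buffer`, as the code above does. The helper names are hypothetical.

```typescript
// Encode plugin YAML for the floopy-mcp-plugin header (UTF-8 bytes -> base64).
function encodePluginHeader(yaml: string): string {
  return Buffer.from(yaml, "utf-8").toString("base64");
}

// Decode a header value back to YAML text, e.g. to inspect what was sent.
function decodePluginHeader(value: string): string {
  return Buffer.from(value, "base64").toString("utf-8");
}

const sample = 'version: "1"\nagent:\n  max_rounds: 6\n';
const header = encodePluginHeader(sample);
console.log(decodePluginHeader(header) === sample); // true
```

For per-request customization, you do not need a separate client. The OpenAI Node SDK accepts per-request options as a second argument to `create`, including a `headers` field, so each call can carry a different `floopy-mcp-plugin` value.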

Step 8: Monitor in Observability

Every agentic session is fully logged. To inspect your sessions:

  1. Go to Observability > Requests in the dashboard
  2. Filter by has_tool_calls: true
  3. Click any request to expand the full trace

For each session you’ll see:

Field                  Example
Rounds completed       3
Total tokens           4,821
Tool calls             web_search × 2
Tool execution time    1.2s avg
Total latency          8.4s
Cache hits             0 / 2
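If you export traces into your own dashboards, the tool-related fields above are simple aggregates over per-tool-call records. The sketch below assumes a hypothetical record shape (`ToolCallRecord`) with one entry per tool call; it is not Floopy's export format. The sample values are made up to mirror the example table.

```typescript
// One record per tool call in a session (hypothetical export shape).
interface ToolCallRecord {
  tool: string;
  durationMs: number;
  cacheHit: boolean;
}

interface SessionSummary {
  toolCalls: Record<string, number>; // e.g. { web_search: 2 }
  avgToolMs: number;
  cacheHits: string; // formatted like the dashboard, e.g. "0 / 2"
}

function summarizeSession(records: ToolCallRecord[]): SessionSummary {
  const toolCalls: Record<string, number> = {};
  let totalMs = 0;
  let hits = 0;
  for (const r of records) {
    toolCalls[r.tool] = (toolCalls[r.tool] ?? 0) + 1;
    totalMs += r.durationMs;
    if (r.cacheHit) hits++;
  }
  return {
    toolCalls,
    avgToolMs: records.length > 0 ? totalMs / records.length : 0,
    cacheHits: `${hits} / ${records.length}`,
  };
}

// Illustrative sample values chosen to mirror the example table.
const example = summarizeSession([
  { tool: "web_search", durationMs: 1100, cacheHit: false },
  { tool: "web_search", durationMs: 1300, cacheHit: false },
]);
```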

If a session hit max_rounds, the log shows finish_reason: max_rounds on the last response.


Tuning the Configuration

The agent is searching too many times

Reduce max_rounds or add a more explicit system prompt:

You are a research assistant. Search at most twice before synthesizing your answer.

Tool calls are slow

Lower timeout_ms to fail fast on slow servers, and reduce max_retries from 2 to 1 so that a failing call is retried only once before failing.
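To make the trade-off concrete, here is a minimal sketch of timeout-plus-retry semantics. It assumes each attempt gets its own timeout_ms and that max_retries counts attempts after the first. It illustrates the general pattern only. Floopy's actual retry policy (backoff, which errors are retried) is not described here, and both function names are hypothetical.

```typescript
// Reject if `promise` does not settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Try once, then retry up to maxRetries more times; rethrow the last error.
async function withTimeoutAndRetries<T>(
  attempt: () => Promise<T>,
  timeoutMs: number,
  maxRetries: number,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i <= maxRetries; i++) {
    try {
      return await withTimeout(attempt(), timeoutMs);
    } catch (err) {
      lastError = err; // timed out or failed; fall through to the next attempt
    }
  }
  throw lastError;
}
```

`Promise.race` only stops waiting. The timed-out attempt keeps running in the background, and a production implementation would cancel it, for example with an `AbortController`.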

Identical searches are being repeated

tool_cache_ttl_seconds: 300 caches results for 5 minutes. If the agent sends the same query twice in a session, it hits the cache instead of calling the MCP server again.
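Conceptually, the cache keys each result on the tool name plus its arguments and serves it until the TTL expires. The sketch below is a minimal TTL cache built on that idea. It assumes exact-match keys on serialized arguments; Floopy's real keying and cache scope are not specified here. The class name is hypothetical, and the injectable clock exists only to make expiry testable.

```typescript
// Caches tool results by tool name + serialized arguments for a fixed TTL.
class ToolResultCache {
  private entries = new Map<string, { value: string; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlSeconds: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlSeconds * 1000;
    this.now = now;
  }

  // Only byte-for-byte identical argument serializations share a key.
  private key(tool: string, args: unknown): string {
    return `${tool}:${JSON.stringify(args)}`;
  }

  get(tool: string, args: unknown): string | undefined {
    const k = this.key(tool, args);
    const entry = this.entries.get(k);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(k); // expired: the next lookup goes to the MCP server
      return undefined;
    }
    return entry.value;
  }

  set(tool: string, args: unknown, value: string): void {
    this.entries.set(this.key(tool, args), { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

Because the key is the serialized arguments, near-duplicates such as "tokio vs async-std" and "Tokio vs async-std" would be cached as separate entries in this sketch.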

You want to see intermediate tool calls streamed

This isn't supported yet. Only the final response can be streamed (stream_mode: final_only), and intermediate tool calls are not sent to the client. The full trace, including every intermediate tool call, is always available in Observability.


What to Build Next

With this foundation, you can build more complex agentic systems:

Multi-server agent: Add a code interpreter MCP server alongside web search. The agent can search for solutions and then run code to verify them.

Domain-specific agent: Replace web search with a company knowledge base MCP server. The agent retrieves internal documentation to answer employee questions.

Cost-aware agent: Give the agent an estimate_cost tool and instruct it in your system prompt to call that tool first. The agent can then decide whether a question is worth multiple search rounds or should be answered directly.

Full reference docs: MCP Client · MCP Server · MCP Tokens