How to Build an Agentic Workflow with Floopy's MCP Integration
Build a production agentic loop with Floopy: plugin YAML config, MCP server connection, secret management, and full workflow testing.
This tutorial walks through building a real agentic workflow: a GPT-4o agent that can search the web to answer research questions. We’ll use Floopy as the AI gateway and the Brave Search MCP server as the tool provider.
By the end, you’ll have a working agentic loop where the model decides when to search, executes queries, reads results, and synthesizes a final answer — all through Floopy’s gateway.
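Here is a minimal, self-contained simulation of what the gateway does on your behalf, so the control flow is concrete before we wire up the real pieces. It is not Floopy's implementation. `fakeModel`, `fakeSearch`, `runAgentLoop`, and the types are hypothetical names. The model and the search tool are stubs, so the loop runs without network access. The loop ends when the model returns a `stop` turn or when it exhausts a round budget, mirroring the `max_rounds` setting configured later.

```typescript
// Hypothetical, offline simulation of an agentic tool-calling loop.
// The model and the search tool are stubs; nothing here calls Floopy or OpenAI.

type ToolCall = { name: "web_search"; query: string };

type ModelTurn =
  | { finishReason: "tool_calls"; toolCalls: ToolCall[] }
  | { finishReason: "stop"; content: string };

type Message = { role: "user" | "assistant" | "tool"; content: string };

type LoopResult = { answer: string; rounds: number; finishReason: "stop" | "max_rounds" };

// Stub model: requests two searches, then answers from the tool results it has seen.
function fakeModel(history: Message[]): ModelTurn {
  const seen = history.filter((m) => m.role === "tool").length;
  if (seen === 0) {
    return { finishReason: "tool_calls", toolCalls: [{ name: "web_search", query: "Tokio vs async-std" }] };
  }
  if (seen === 1) {
    return { finishReason: "tool_calls", toolCalls: [{ name: "web_search", query: "async-std maintenance status" }] };
  }
  return { finishReason: "stop", content: `Answer based on ${seen} searches.` };
}

// Stub tool: returns a placeholder instead of calling a search API.
function fakeSearch(query: string): string {
  return `[snippets for "${query}"]`;
}

function runAgentLoop(question: string, maxRounds: number): LoopResult {
  const history: Message[] = [{ role: "user", content: question }];
  for (let round = 1; round <= maxRounds; round++) {
    const turn = fakeModel(history);
    if (turn.finishReason === "stop") {
      return { answer: turn.content, rounds: round, finishReason: "stop" };
    }
    // Run each requested tool call and append the result so the model can read it.
    for (const call of turn.toolCalls) {
      history.push({ role: "assistant", content: `${call.name}(${call.query})` });
      history.push({ role: "tool", content: fakeSearch(call.query) });
    }
  }
  // Round budget used up before the model produced a final answer.
  return { answer: "", rounds: maxRounds, finishReason: "max_rounds" };
}

console.log(runAgentLoop("Tokio vs async-std?", 6)); // answers in round 3
console.log(runAgentLoop("Tokio vs async-std?", 2)); // stops with finishReason "max_rounds"
```

With a budget of 6 the stub finishes in three rounds, matching the shape of the example below. With a budget of 2 it runs out before synthesizing an answer.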
What We’re Building
```
User: "What are the key differences between Rust's async runtimes Tokio and async-std?"

Agent:
  Round 1 → calls web_search("Rust Tokio vs async-std comparison 2025")
          ← tool result: [search snippets]
  Round 2 → calls web_search("async-std maintenance status 2025")
          ← tool result: [search snippets]
  Round 3 → synthesizes final answer from gathered information
          → returns response to user
```

Prerequisites
- A Floopy account with the Pro plan
- A Brave Search API key (free tier available at brave.com/search/api)
- Node.js 18+ for testing
Step 1: Store Your Secret
Never put API keys directly in configuration files. Start by storing your Brave Search API key in Floopy Vault.
- Open the Floopy dashboard
- Go to Settings > Secrets
- Click Add Secret
- Name: `brave_search_api_key`
- Value: your Brave Search API key
- Click Save
The secret is now encrypted at rest and will be injected at runtime. It will never appear in logs or API responses.
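Conceptually, "injected at runtime" means the real value replaces the reference only when the outbound request to the MCP server is built, and it is scrubbed from anything that gets logged. The sketch below shows one way a gateway could do this. It is an assumption for illustration, not Floopy's code: `resolveAuthHeaders`, `redact`, and the in-memory `vault` map are hypothetical. The `secret.` prefix handling simply follows the `secret_ref` format used in Step 3.

```typescript
// Hypothetical illustration of runtime secret injection and log redaction.
// Not Floopy's implementation; the "vault" here is a plain in-memory map.

interface AuthConfig {
  type: "api_key";
  header: string;     // e.g. "X-Subscription-Token"
  secret_ref: string; // e.g. "secret.brave_search_api_key"
}

function resolveAuthHeaders(auth: AuthConfig, vault: Map<string, string>): Record<string, string> {
  const prefix = "secret.";
  if (!auth.secret_ref.startsWith(prefix)) {
    throw new Error(`secret_ref must start with "${prefix}"`);
  }
  const secretName = auth.secret_ref.slice(prefix.length);
  const value = vault.get(secretName);
  if (value === undefined) {
    throw new Error(`unknown secret: ${secretName}`); // report the name, never a value
  }
  return { [auth.header]: value };
}

// Replace every occurrence of each secret value before a line is logged.
function redact(line: string, secretValues: readonly string[]): string {
  let out = line;
  for (const v of secretValues) {
    if (v.length > 0) out = out.split(v).join("[REDACTED]");
  }
  return out;
}

const vault = new Map<string, string>([["brave_search_api_key", "example-not-a-real-key"]]);
const headers = resolveAuthHeaders(
  { type: "api_key", header: "X-Subscription-Token", secret_ref: "secret.brave_search_api_key" },
  vault,
);
console.log(redact(`outbound headers: ${JSON.stringify(headers)}`, Array.from(vault.values())));
// prints: outbound headers: {"X-Subscription-Token":"[REDACTED]"}
```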
Step 2: Create a Routing Rule
Agentic plugin configurations are attached to routing rules.
- Go to Routing in the dashboard
- Click New Rule
- Name it `research-agent`
- Set the default model to `gpt-4o`
- Leave other settings at defaults for now
- Click Save
Step 3: Write the Plugin YAML
The plugin YAML tells Floopy which MCP servers to connect to and how to run the agentic loop.
Create a file called research-agent.yaml:
```yaml
version: "1"

mcp_servers:
  - id: brave_search
    url: "https://api.search.brave.com/mcp"
    auth:
      type: api_key
      header: "X-Subscription-Token"
      secret_ref: "secret.brave_search_api_key"
    tools:
      - web_search
    timeout_ms: 8000
    max_retries: 2

agent:
  max_rounds: 6
  stream_mode: final_only
  tool_call_parallel: false
  tool_cache_ttl_seconds: 300
  prompt_guard_on_tool_output: true
```

What each field does:
- `secret_ref: "secret.brave_search_api_key"` references the secret you stored in Step 1
- `tools: [web_search]` exposes only this one tool from the Brave server
- `max_rounds: 6` allows up to 6 tool call iterations
- `tool_cache_ttl_seconds: 300` caches identical search queries for 5 minutes
- `prompt_guard_on_tool_output: true` scans search results for prompt injection
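The `tools` allowlist means the model only sees the tools you name, even if the MCP server advertises more. Below is a minimal sketch of that filtering step. It assumes the server's advertised tools arrive as `{ name, description, inputSchema }` objects, loosely following an MCP `tools/list` result. `filterTools` and the extra `local_search` tool are hypothetical, and this is an illustration rather than Floopy's code. As a design choice of this example, it also rejects allowlisted names the server does not offer, so typos surface early.

```typescript
// Hypothetical allowlist filter: only tools named in the plugin's `tools`
// list are exposed to the model. Shapes loosely follow MCP's tools/list result.

interface McpTool {
  name: string;
  description?: string;
  inputSchema?: unknown;
}

function filterTools(advertised: McpTool[], allowlist: string[]): McpTool[] {
  // Fail on allowlisted names the server does not offer (likely a typo).
  const missing = allowlist.filter((n) => !advertised.some((t) => t.name === n));
  if (missing.length > 0) {
    throw new Error(`tools not offered by server: ${missing.join(", ")}`);
  }
  const allowed = new Set(allowlist);
  return advertised.filter((t) => allowed.has(t.name));
}

// "local_search" is a made-up extra tool that a server might advertise.
const advertised: McpTool[] = [
  { name: "web_search", description: "Search the web" },
  { name: "local_search", description: "Search local businesses" },
];
console.log(filterTools(advertised, ["web_search"]).map((t) => t.name)); // [ 'web_search' ]
```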
Step 4: Attach the Plugin to Your Routing Rule
- Go to Routing > research-agent in the dashboard
- Click MCP Plugin
- Paste the YAML content
- Click Save
Floopy validates the YAML schema and checks that the secret reference exists. If validation fails, the error message will tell you exactly what’s wrong.
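If you want to catch mistakes before pasting, you can run a similar check locally. The sketch below assumes the YAML has already been parsed into a plain object (for example with a YAML library, not shown). It checks a few things this tutorial relies on: every server has an `id`, a `url`, and at least one tool; `max_rounds` is a positive integer; and every `secret_ref` points at a secret you have stored. `validatePlugin`, `PluginConfig`, and the error messages are hypothetical and will not match Floopy's own validator.

```typescript
// Hypothetical pre-flight check for a parsed plugin config.
// Field names mirror research-agent.yaml; the rules are illustrative only.

interface PluginConfig {
  version: string;
  mcp_servers: Array<{
    id: string;
    url: string;
    auth?: { type: string; header: string; secret_ref: string };
    tools: string[];
    timeout_ms?: number;
    max_retries?: number;
  }>;
  agent: { max_rounds: number; [key: string]: unknown };
}

function validatePlugin(cfg: PluginConfig, storedSecrets: Set<string>): string[] {
  const errors: string[] = [];
  if (!Number.isInteger(cfg.agent.max_rounds) || cfg.agent.max_rounds < 1) {
    errors.push("agent.max_rounds must be a positive integer");
  }
  for (const s of cfg.mcp_servers) {
    if (!s.id || !s.url) errors.push("each server needs an id and a url");
    if (s.tools.length === 0) errors.push(`server ${s.id}: tools must not be empty`);
    const ref = s.auth?.secret_ref;
    if (ref !== undefined) {
      const secretName = ref.replace(/^secret\./, "");
      if (!storedSecrets.has(secretName)) errors.push(`server ${s.id}: secret "${secretName}" not found`);
    }
  }
  return errors;
}

const config: PluginConfig = {
  version: "1",
  mcp_servers: [{
    id: "brave_search",
    url: "https://api.search.brave.com/mcp",
    auth: { type: "api_key", header: "X-Subscription-Token", secret_ref: "secret.brave_search_api_key" },
    tools: ["web_search"],
    timeout_ms: 8000,
    max_retries: 2,
  }],
  agent: { max_rounds: 6, stream_mode: "final_only" },
};
console.log(validatePlugin(config, new Set<string>(["brave_search_api_key"]))); // []
```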
Step 5: Test in the Playground
Before writing code, test the setup in the Floopy Playground.
- Go to Playground in the dashboard
- Select routing rule: `research-agent`
- Enter this message: What are the main performance differences between PostgreSQL and ClickHouse for analytical workloads?
- Click Send
Watch the response stream in. In the Trace panel on the right, you’ll see each tool call logged in real time: the search query sent, the results received, and which round of the loop you’re in.
If the trace shows `finish_reason: tool_calls`, then the tool results, and finally `finish_reason: stop`, the loop is working correctly.
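If you copy the sequence of `finish_reason` values out of a trace, the healthy pattern can be checked mechanically: one or more `tool_calls` rounds followed by a single final `stop`. The helper below is a hypothetical sketch that works on a plain array of strings. It does not parse Floopy's trace format, which this tutorial does not document. The `max_rounds` case corresponds to the finish reason mentioned under Observability.

```typescript
// Hypothetical check of a finish_reason sequence from one agentic session:
// healthy = at least one "tool_calls" round, then exactly one final "stop".

type Diagnosis = "ok" | "no_tool_calls" | "hit_max_rounds" | "unexpected";

function diagnoseTrace(finishReasons: string[]): Diagnosis {
  const last = finishReasons[finishReasons.length - 1];
  const earlier = finishReasons.slice(0, -1);
  if (last === "max_rounds") return "hit_max_rounds";
  if (last !== "stop" || !earlier.every((r) => r === "tool_calls")) return "unexpected";
  // A lone "stop" means the model answered without searching.
  return earlier.length > 0 ? "ok" : "no_tool_calls";
}

console.log(diagnoseTrace(["tool_calls", "tool_calls", "stop"])); // ok
console.log(diagnoseTrace(["stop"])); // no_tool_calls
```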
Step 6: Call from Your Application
Once tested, call it from your application using the standard OpenAI SDK:
```typescript
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
  defaultHeaders: {
    "floopy-routing-rule": "research-agent", // activate your routing rule
  },
});

async function researchQuestion(question: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content:
          "You are a research assistant. Use web search to find accurate, up-to-date information before answering. Always cite your sources.",
      },
      { role: "user", content: question },
    ],
    stream: true,
  });

  let result = "";
  for await (const chunk of response) {
    const delta = chunk.choices[0]?.delta?.content;
    if (delta) {
      result += delta;
      process.stdout.write(delta); // stream to console
    }
  }
  return result;
}

// Top-level await requires running this file as an ES module.
const answer = await researchQuestion(
  "What are the key differences between Rust's Tokio and async-std runtimes?",
);
```

The routing rule activates your MCP plugin. When GPT-4o decides to call web_search, Floopy handles it, so your application code never needs to know that tool calls are happening.
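Since only the final answer is streamed (`stream_mode: final_only`), the client's job reduces to concatenating `delta.content` fragments, exactly as `researchQuestion` does. That logic can be pulled into a small helper and tested offline against a fake stream. `collectStream`, `fakeStream`, and `ChunkLike` are hypothetical names. `ChunkLike` declares only the fields the loop reads from OpenAI-style chat-completion chunks, so a real SDK stream should also satisfy it structurally.

```typescript
// Minimal structural type: only the fields the accumulator reads from
// OpenAI-style chat.completion.chunk objects.
interface ChunkLike {
  choices: Array<{ delta?: { content?: string | null } }>;
}

// Concatenate streamed text fragments; chunks without content are skipped.
async function collectStream(
  stream: AsyncIterable<ChunkLike>,
  onDelta: (text: string) => void = () => {},
): Promise<string> {
  let result = "";
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content;
    if (delta) {
      result += delta;
      onDelta(delta);
    }
  }
  return result;
}

// Offline stand-in for client.chat.completions.create({ ..., stream: true }).
async function* fakeStream(): AsyncGenerator<ChunkLike> {
  yield { choices: [{ delta: { content: "Final " } }] };
  yield { choices: [{ delta: {} }] }; // a chunk carrying no text
  yield { choices: [{ delta: { content: "answer." } }] };
}

collectStream(fakeStream()).then((text) => console.log(text)); // Final answer.
```

In `researchQuestion`, the loop could then become `return collectStream(response, (d) => process.stdout.write(d));`.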
Step 7: Send the Plugin Inline (Alternative)
If you want to use the agentic loop without a routing rule — useful for development or per-request customization — send the plugin YAML inline via a request header:
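Because the plugin travels as a header value, it is worth confirming that the encoded string decodes back to exactly the YAML you wrote and is not unreasonably large. Proxies and servers commonly cap request-header size, but Floopy's limit is not stated in this tutorial. The 8 KB constant and the `encodePluginHeader` helper below are assumptions for illustration. The sketch uses Node's global `Buffer`, as the original snippet does.

```typescript
// Hypothetical helper: base64-encode plugin YAML for the floopy-mcp-plugin
// header, verify the round trip, and reject suspiciously large payloads.
const ASSUMED_MAX_HEADER_BYTES = 8 * 1024; // assumption, not a documented Floopy limit

function encodePluginHeader(yaml: string): string {
  const encoded = Buffer.from(yaml, "utf-8").toString("base64");
  const decoded = Buffer.from(encoded, "base64").toString("utf-8");
  if (decoded !== yaml) throw new Error("base64 round trip changed the plugin YAML");
  // Base64 output is ASCII, so string length equals byte length.
  if (encoded.length > ASSUMED_MAX_HEADER_BYTES) {
    throw new Error(`encoded plugin is ${encoded.length} bytes; consider attaching it to a routing rule`);
  }
  return encoded;
}

const sample = 'version: "1"\nagent:\n  max_rounds: 6\n';
console.log(encodePluginHeader(sample));
```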
```typescript
import { readFileSync } from "fs";
import { OpenAI } from "openai";

// Encode the plugin YAML so it can travel in a single request header.
const pluginYaml = readFileSync("./research-agent.yaml", "utf-8");
const pluginBase64 = Buffer.from(pluginYaml).toString("base64");

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
  defaultHeaders: {
    "floopy-mcp-plugin": pluginBase64, // inline plugin instead of a routing rule
  },
});
```

Step 8: Monitor in Observability
Every agentic session is fully logged. To inspect your sessions:
- Go to Observability > Requests in the dashboard
- Filter by `has_tool_calls: true`
- Click any request to expand the full trace
For each session you’ll see:
| Field | Example |
|---|---|
| Rounds completed | 3 |
| Total tokens | 4,821 |
| Tool calls | web_search × 2 |
| Tool execution time | 1.2s avg |
| Total latency | 8.4s |
| Cache hits | 0 / 2 |
If a session hit max_rounds, the log shows finish_reason: max_rounds on the last response.
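If you export session data for your own dashboards, the fields in the table are simple aggregates over per-round records. The record shapes (`RoundRecord`, `ToolCallRecord`) and `summarizeSession` below are hypothetical, because Floopy's export format is not described here. Map your real fields onto them. The sample values are made up for the example.

```typescript
// Hypothetical per-round records for one agentic session.
interface ToolCallRecord {
  tool: string;
  durationMs: number;
  cacheHit: boolean;
}

interface RoundRecord {
  tokens: number;
  toolCalls: ToolCallRecord[];
}

interface SessionSummary {
  rounds: number;
  totalTokens: number;
  toolCalls: Record<string, number>; // tool name -> call count
  avgToolMs: number;
  cacheHits: string; // "hits / calls", as in the table
}

function summarizeSession(rounds: RoundRecord[]): SessionSummary {
  const calls = rounds.flatMap((r) => r.toolCalls);
  const toolCalls: Record<string, number> = {};
  for (const c of calls) toolCalls[c.tool] = (toolCalls[c.tool] ?? 0) + 1;
  const totalMs = calls.reduce((sum, c) => sum + c.durationMs, 0);
  return {
    rounds: rounds.length,
    totalTokens: rounds.reduce((sum, r) => sum + r.tokens, 0),
    toolCalls,
    avgToolMs: calls.length > 0 ? totalMs / calls.length : 0,
    cacheHits: `${calls.filter((c) => c.cacheHit).length} / ${calls.length}`,
  };
}

// Made-up sample: two search rounds, then a final round with no tool calls.
const summary = summarizeSession([
  { tokens: 900, toolCalls: [{ tool: "web_search", durationMs: 1100, cacheHit: false }] },
  { tokens: 1500, toolCalls: [{ tool: "web_search", durationMs: 1300, cacheHit: false }] },
  { tokens: 2000, toolCalls: [] },
]);
console.log(summary);
```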
Tuning the Configuration
The agent is searching too many times
Reduce max_rounds or add a more explicit system prompt:
```
You are a research assistant. Search at most twice before synthesizing your answer.
```

Tool calls are slow
Lower timeout_ms to fail fast on slow servers, and reduce max_retries (for example from 2 to 1) so a failing call is retried only once before giving up.
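To see why both settings matter, bound the worst case. Assume each attempt can run for the full `timeout_ms` and retries start immediately with no backoff (Floopy's exact retry timing is not specified here). Then one tool call can block for `timeout_ms × (1 + max_retries)`. With the Step 3 values that is 8000 × 3 = 24,000 ms per call, and `max_retries: 1` alone brings it down to 16,000 ms. The helpers below are hypothetical. The total-time bound also assumes tool calls run one after another (`tool_call_parallel: false`), and it ignores model latency.

```typescript
// Hypothetical worst-case latency bounds for MCP tool calls.
// Assumes every attempt can run for the full timeout and retries start
// immediately (no backoff); model latency is ignored.

function worstCaseToolMs(timeoutMs: number, maxRetries: number): number {
  const attempts = 1 + maxRetries; // first try plus retries
  return timeoutMs * attempts;
}

// With tool_call_parallel: false, sequential tool calls add up.
function worstCaseToolTimeMs(timeoutMs: number, maxRetries: number, toolCalls: number): number {
  return toolCalls * worstCaseToolMs(timeoutMs, maxRetries);
}

console.log(worstCaseToolMs(8000, 2)); // 24000 (Step 3 settings)
console.log(worstCaseToolMs(8000, 1)); // 16000
console.log(worstCaseToolTimeMs(4000, 1, 2)); // 16000 (two calls, tighter timeout)
```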
Identical searches are being repeated
tool_cache_ttl_seconds: 300 caches results for 5 minutes. If the agent sends the same query twice in a session, it hits the cache instead of calling the MCP server again.
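A TTL cache of this kind is easy to picture. Results are keyed by tool name plus arguments and expire `tool_cache_ttl_seconds` after they are stored. The class below is a hypothetical sketch of that idea, not Floopy's cache. It takes an injectable clock so expiry can be tested, and it keys on `JSON.stringify(args)`. "Identical" therefore means the same keys in the same order with the same values, so queries that differ only in casing or whitespace would miss.

```typescript
// Hypothetical TTL cache for tool results, keyed by tool name + arguments.
class ToolResultCache {
  private readonly entries = new Map<string, { value: string; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlSeconds: number, now: () => number = Date.now) {
    this.ttlMs = ttlSeconds * 1000;
    this.now = now; // injectable clock, in milliseconds
  }

  private key(tool: string, args: unknown): string {
    return `${tool}:${JSON.stringify(args)}`;
  }

  get(tool: string, args: unknown): string | undefined {
    const k = this.key(tool, args);
    const entry = this.entries.get(k);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(k); // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(tool: string, args: unknown, value: string): void {
    this.entries.set(this.key(tool, args), { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Usage with a fake clock: a hit inside 300 s, a miss once 300 s have passed.
let clock = 0;
const cache = new ToolResultCache(300, () => clock);
cache.set("web_search", { query: "tokio vs async-std" }, "[snippets]");
clock = 299000;
console.log(cache.get("web_search", { query: "tokio vs async-std" })); // [snippets]
clock = 300000;
console.log(cache.get("web_search", { query: "tokio vs async-std" })); // undefined
```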
You want to see intermediate tool calls streamed
This is not possible yet. Intermediate streaming is not available, so keep stream_mode set to final_only: only the final response is streamed. The full trace of tool calls is always available in Observability.
What to Build Next
With this foundation, you can build more complex agentic systems:
Multi-server agent: Add a code interpreter MCP server alongside web search. The agent can search for solutions and then run code to verify them.
Domain-specific agent: Replace web search with a company knowledge base MCP server. The agent retrieves internal documentation to answer employee questions.
Cost-aware agent: Add estimate_cost calls in your system prompt. The agent can decide whether a question is worth multiple search rounds or should be answered directly.
Full reference docs: MCP Client · MCP Server · MCP Tokens