Build an Agentic Workflow with MCP and Floopy AI Gateway

This tutorial walks through building a real agentic workflow: a GPT-4o agent that can search the web to answer research questions. We’ll use Floopy as the AI gateway and the Brave Search MCP server as the tool provider.

By the end, you’ll have a working agentic loop where the model decides when to search, executes queries, reads results, and synthesizes a final answer — all through Floopy’s gateway.

What We’re Building

User: "What are the key differences between Rust's async runtimes Tokio and async-std?"

Agent:
  Round 1 → calls web_search("Rust Tokio vs async-std comparison 2025")
           ← tool result: [search snippets]
  Round 2 → calls web_search("async-std maintenance status 2025")
           ← tool result: [search snippets]
  Round 3 → synthesizes final answer from gathered information
           → returns response to user
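Floopy runs this loop on the server, so you never write it yourself, but it helps to see its shape. The TypeScript below is a minimal, self-contained sketch under stated assumptions. The model and the search tool are hypothetical stand-ins: there are no network calls and no Floopy or OpenAI APIs. Every model turn counts as one round, matching the diagram above, where two search rounds are followed by a synthesis round.

```typescript
// Sketch of an agentic loop. Model and Tool are hypothetical stand-ins;
// this is not Floopy's implementation.

type ToolCall = { name: string; args: { query: string } };

// One model turn either requests more tool calls or returns a final answer.
type ModelTurn =
  | { finishReason: "tool_calls"; toolCalls: ToolCall[] }
  | { finishReason: "stop"; content: string };

type Message = { role: "user" | "assistant" | "tool"; content: string };
type Model = (history: Message[]) => ModelTurn;
type Tool = (args: { query: string }) => string;

interface LoopResult {
  answer: string | null; // null when the loop was cut off
  rounds: number; // model turns taken, including the final one
  finishReason: "stop" | "max_rounds";
}

function runAgentLoop(
  model: Model,
  tools: Record<string, Tool>,
  question: string,
  maxRounds: number,
): LoopResult {
  const history: Message[] = [{ role: "user", content: question }];
  let round = 0;
  while (round < maxRounds) {
    round++;
    const turn = model(history);
    if (turn.finishReason === "stop") {
      return { answer: turn.content, rounds: round, finishReason: "stop" };
    }
    // Run each requested tool call in order and append the result to the
    // history so the next model turn can read it.
    for (const call of turn.toolCalls) {
      const tool: Tool | undefined = tools[call.name];
      const result = tool ? tool(call.args) : `error: unknown tool ${call.name}`;
      history.push({ role: "assistant", content: `${call.name}(${JSON.stringify(call.args)})` });
      history.push({ role: "tool", content: result });
    }
  }
  // The round budget ran out before the model produced a final answer.
  return { answer: null, rounds: round, finishReason: "max_rounds" };
}

// Usage: a scripted model that searches twice, then answers.
const queries = [
  "Rust Tokio vs async-std comparison 2025",
  "async-std maintenance status 2025",
];
const scriptedModel: Model = (history) => {
  const searchesDone = history.filter((m) => m.role === "tool").length;
  if (searchesDone < queries.length) {
    return {
      finishReason: "tool_calls",
      toolCalls: [{ name: "web_search", args: { query: queries[searchesDone] } }],
    };
  }
  return { finishReason: "stop", content: `Answer based on ${searchesDone} searches` };
};
const fakeSearch: Tool = ({ query }) => `[snippets for "${query}"]`;

const result = runAgentLoop(scriptedModel, { web_search: fakeSearch }, "Tokio vs async-std?", 6);
console.log(result.rounds, result.finishReason); // 3 stop
```

If the same scripted model is given a budget of 2 rounds, the loop ends with finish reason max_rounds and no answer. This is the situation a real session is in when it hits the max_rounds limit.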

Prerequisites

A Floopy account with the Pro plan
A Brave Search API key (free tier available at brave.com/search/api)
Node.js 18+ and the openai npm package (v4 or later) to run the application code

Step 1: Store Your Secret

Never put API keys directly in configuration files. Start by storing your Brave Search API key in Floopy Vault.

Open the Floopy dashboard
Go to Settings > Secrets
Click Add Secret
Name: brave_search_api_key
Value: your Brave Search API key
Click Save

The secret is now encrypted at rest and will be injected at runtime. It will never appear in logs or API responses.

Step 2: Create a Routing Rule

Agentic plugin configurations are attached to routing rules.

Go to Routing in the dashboard
Click New Rule
Name it research-agent
Set the default model to gpt-4o
Leave other settings at defaults for now
Click Save

Step 3: Write the Plugin YAML

The plugin YAML tells Floopy which MCP servers to connect to and how to run the agentic loop.

Create a file called research-agent.yaml:

version: "1"

mcp_servers:
  - id: brave_search
    url: "https://api.search.brave.com/mcp"
    auth:
      type: api_key
      header: "X-Subscription-Token"
      secret_ref: "secret.brave_search_api_key"
    tools:
      - web_search
    timeout_ms: 8000
    max_retries: 2

agent:
  max_rounds: 6
  stream_mode: final_only
  tool_call_parallel: false
  tool_cache_ttl_seconds: 300
  prompt_guard_on_tool_output: true

What each field does:

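secret_ref: "secret.brave_search_api_key" references the secret you stored in Step 1.
tools: [web_search] exposes only this one tool from the Brave server.
timeout_ms: 8000 sets an 8-second timeout for calls to this server.
max_retries: 2 retries a failed tool call up to twice.
max_rounds: 6 allows up to 6 tool call iterations.
stream_mode: final_only streams only the final response, not the intermediate tool calls.
tool_call_parallel: false runs tool calls one at a time instead of concurrently.
tool_cache_ttl_seconds: 300 caches identical search queries for 5 minutes.
prompt_guard_on_tool_output: true scans search results for prompt injection.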

Step 4: Attach the Plugin to Your Routing Rule

Go to Routing > research-agent in the dashboard
Click MCP Plugin
Paste the YAML content
Click Save

Floopy validates the YAML schema and checks that the secret reference exists. If validation fails, the error message will tell you exactly what’s wrong.
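Floopy runs its checks when you save. If you also keep the plugin in version control, a local check can catch the same kinds of mistakes earlier. The TypeScript below is a hypothetical sketch, not Floopy's validator, with these limits:

- It inspects an already-parsed config object. Parsing the YAML itself would need a library such as yaml, which is omitted to keep the example self-contained.
- Its interfaces cover only the fields used in this tutorial.
- Apart from the secret. prefix taken from the YAML above, its rules (a parseable URL and a positive max_rounds) are assumptions.

```typescript
// Hypothetical pre-flight check; Floopy's real validation rules may differ.

interface McpServerConfig {
  id: string;
  url: string;
  auth?: { type: string; header: string; secret_ref: string };
  tools?: string[];
  timeout_ms?: number;
  max_retries?: number;
}

interface PluginConfig {
  version: string;
  mcp_servers: McpServerConfig[];
  agent: { max_rounds: number; tool_cache_ttl_seconds?: number };
}

const SECRET_PREFIX = "secret.";

// Returns a list of human-readable problems; an empty list means "looks OK".
function preflight(config: PluginConfig, knownSecrets: Set<string>): string[] {
  const errors: string[] = [];
  if (config.mcp_servers.length === 0) errors.push("mcp_servers is empty");
  for (const server of config.mcp_servers) {
    try {
      new URL(server.url); // throws on malformed URLs
    } catch {
      errors.push(`${server.id}: invalid url ${server.url}`);
    }
    const ref = server.auth?.secret_ref;
    if (ref === undefined) continue;
    if (!ref.startsWith(SECRET_PREFIX)) {
      errors.push(`${server.id}: secret_ref must start with "${SECRET_PREFIX}"`);
    } else if (!knownSecrets.has(ref.slice(SECRET_PREFIX.length))) {
      errors.push(`${server.id}: secret ${ref} does not exist`);
    }
  }
  const rounds = config.agent.max_rounds;
  if (!Number.isInteger(rounds) || rounds < 1) {
    errors.push("agent.max_rounds must be a positive integer");
  }
  return errors;
}

// Usage: the research-agent.yaml settings as a plain object.
const researchAgent: PluginConfig = {
  version: "1",
  mcp_servers: [
    {
      id: "brave_search",
      url: "https://api.search.brave.com/mcp",
      auth: { type: "api_key", header: "X-Subscription-Token", secret_ref: "secret.brave_search_api_key" },
      tools: ["web_search"],
      timeout_ms: 8000,
      max_retries: 2,
    },
  ],
  agent: { max_rounds: 6, tool_cache_ttl_seconds: 300 },
};

console.log(preflight(researchAgent, new Set(["brave_search_api_key"]))); // []
console.log(preflight(researchAgent, new Set<string>())); // reports the missing secret
```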

Step 5: Test in the Playground

Before writing code, test the setup in the Floopy Playground.

Go to Playground in the dashboard
Select routing rule: research-agent
Enter this message:

What are the main performance differences between PostgreSQL and ClickHouse for analytical workloads?

Click Send

Watch the response stream in. In the Trace panel on the right, you’ll see each tool call logged in real time: the search query sent, the results received, and which round of the loop you’re in.

If you see finish_reason: tool_calls, followed by the tool results and finally finish_reason: stop, the loop is working correctly.

Step 6: Call from Your Application

Once tested, call it from your application using the standard OpenAI SDK:

import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
  defaultHeaders: {
    "floopy-routing-rule": "research-agent",  // activate your routing rule
  },
});

async function researchQuestion(question: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content:
          "You are a research assistant. Use web search to find accurate, up-to-date information before answering. Always cite your sources.",
      },
      {
        role: "user",
        content: question,
      },
    ],
    stream: true,
  });

  let result = "";
  for await (const chunk of response) {
    const delta = chunk.choices[0]?.delta?.content;
    if (delta) {
      result += delta;
      process.stdout.write(delta);  // stream to console
    }
  }
  return result;
}

// Top-level await requires an ES module (e.g. "type": "module" in package.json).
const answer = await researchQuestion(
  "What are the key differences between Rust's Tokio and async-std runtimes?"
);

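The routing rule activates your MCP plugin. When GPT-4o decides to call web_search, Floopy executes the call against the Brave Search MCP server, feeds the result back to the model, and continues the loop until a final answer is ready. Your application code never sees the intermediate tool calls; it only receives the final response.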

Step 7: Send the Plugin Inline (Alternative)

If you want to use the agentic loop without a routing rule, for example during development or to customize a single request, send the plugin YAML inline as a base64-encoded floopy-mcp-plugin request header:

import { readFileSync } from "fs";
import { OpenAI } from "openai";

// Read the plugin file and base64-encode it so it can travel in an HTTP header.
const pluginYaml = readFileSync("./research-agent.yaml", "utf-8");
const pluginBase64 = Buffer.from(pluginYaml).toString("base64");

const client = new OpenAI({
  baseURL: "https://api.floopy.ai/v1",
  apiKey: process.env.FLOOPY_API_KEY,
  defaultHeaders: {
    "floopy-mcp-plugin": pluginBase64,  // inline plugin instead of a routing rule
  },
});
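Base64 turns every 3 bytes of YAML into 4 characters, so the header is about a third larger than the file. Servers and proxies also frequently cap header size, typically somewhere in the 8 to 16 KB range.

The sketch below is a minimal encoder that fails early when the plugin grows too large. The 8 KB budget is an assumption, not a documented Floopy limit, and encodePluginHeader and decodePluginHeader are hypothetical helpers.

```typescript
import { Buffer } from "node:buffer";

// Assumed budget for one header value; adjust to your actual infrastructure.
const ASSUMED_MAX_HEADER_BYTES = 8 * 1024;

// Encode plugin YAML for the floopy-mcp-plugin header, rejecting oversized values.
function encodePluginHeader(pluginYaml: string): string {
  const encoded = Buffer.from(pluginYaml, "utf-8").toString("base64");
  // Base64 output is pure ASCII, so string length equals byte length.
  if (encoded.length > ASSUMED_MAX_HEADER_BYTES) {
    throw new Error(
      `plugin header is ${encoded.length} bytes, over the assumed ${ASSUMED_MAX_HEADER_BYTES}-byte budget`,
    );
  }
  return encoded;
}

// Reverse the encoding, e.g. to log exactly which plugin a request carried.
function decodePluginHeader(headerValue: string): string {
  return Buffer.from(headerValue, "base64").toString("utf-8");
}

const sampleYaml = 'version: "1"\n';
const header = encodePluginHeader(sampleYaml);
console.log(decodePluginHeader(header) === sampleYaml); // true
```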

Step 8: Monitor in Observability

Every agentic session is fully logged. To inspect your sessions:

Go to Observability > Requests in the dashboard
Filter by has_tool_calls: true
Click any request to expand the full trace

For each session you’ll see:

Field	Example
Rounds completed	3
Total tokens	4,821
Tool calls	`web_search` × 2
Tool execution time	1.2s avg
Total latency	8.4s
Cache hits	0 / 2
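Several of these fields are simple aggregates over the individual tool calls. The sketch below assumes a hypothetical ToolCallRecord shape, since Floopy's actual log schema is not described here. The two sample records are made-up values chosen to reproduce the example row above.

```typescript
// Hypothetical per-call log record; Floopy's actual schema may differ.
interface ToolCallRecord {
  tool: string;
  durationMs: number;
  cacheHit: boolean;
}

interface SessionSummary {
  toolCalls: Record<string, number>; // number of calls per tool
  avgToolMs: number; // mean execution time per call
  cacheHits: string; // "hits / total", as in the table
}

function summarize(records: ToolCallRecord[]): SessionSummary {
  const toolCalls: Record<string, number> = {};
  let totalMs = 0;
  let hits = 0;
  for (const r of records) {
    toolCalls[r.tool] = (toolCalls[r.tool] ?? 0) + 1;
    totalMs += r.durationMs;
    if (r.cacheHit) hits++;
  }
  return {
    toolCalls,
    avgToolMs: records.length === 0 ? 0 : totalMs / records.length,
    cacheHits: `${hits} / ${records.length}`,
  };
}

// Made-up records chosen to reproduce the example row.
const summary = summarize([
  { tool: "web_search", durationMs: 1100, cacheHit: false },
  { tool: "web_search", durationMs: 1300, cacheHit: false },
]);
console.log(summary.toolCalls, summary.avgToolMs, summary.cacheHits); // { web_search: 2 } 1200 0 / 2
```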

If a session hit max_rounds, the log shows finish_reason: max_rounds on the last response.
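If you export session logs, a small classifier over each session's finish_reason sequence separates healthy runs from cut-off ones. A healthy session is one or more tool_calls responses followed by stop; a session that ran out of rounds ends in max_rounds.

The sketch below assumes the sequence is available as a list of strings. classifySession and its outcome labels are hypothetical.

```typescript
type FinishReason = "tool_calls" | "stop" | "max_rounds";
type SessionOutcome = "completed" | "answered_without_tools" | "hit_max_rounds" | "incomplete";

// Classify a session by the finish_reason of each response, in order.
function classifySession(reasons: FinishReason[]): SessionOutcome {
  const last = reasons[reasons.length - 1];
  if (last === "max_rounds") return "hit_max_rounds";
  if (last !== "stop") return "incomplete"; // empty, or still waiting on tools
  return reasons.includes("tool_calls") ? "completed" : "answered_without_tools";
}

const sessions: FinishReason[][] = [
  ["tool_calls", "tool_calls", "stop"],
  ["stop"],
  ["tool_calls", "tool_calls", "tool_calls", "max_rounds"],
];
for (const s of sessions) console.log(s.join(", "), "=>", classifySession(s));
```

Sessions classified as hit_max_rounds are the ones to review when tuning max_rounds or the system prompt.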

Tuning the Configuration

The agent is searching too many times

Reduce max_rounds or add a more explicit system prompt:

You are a research assistant. Search at most twice before synthesizing your answer.

Tool calls are slow

Lower timeout_ms (8000 in this config) to fail fast on slow servers, and reduce max_retries from 2 to 1 so a failed call is retried only once before giving up.
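To see how these two settings interact, here is a back-of-the-envelope bound. If every attempt runs until timeout_ms and retries start immediately, one tool call can take at most timeout_ms × (1 + max_retries).

The sketch below assumes exactly that: no backoff between attempts and one tool call per round, as with tool_call_parallel: false. Floopy's actual retry timing is not documented here, and both helpers are hypothetical.

```typescript
// Upper bound on one tool call, assuming every attempt times out and
// retries start immediately (no backoff).
function worstCaseCallMs(timeoutMs: number, maxRetries: number): number {
  return timeoutMs * (1 + maxRetries);
}

// Upper bound for a session whose tool calls run one after another.
function worstCaseSessionToolMs(timeoutMs: number, maxRetries: number, toolCalls: number): number {
  return worstCaseCallMs(timeoutMs, maxRetries) * toolCalls;
}

console.log(worstCaseCallMs(8000, 2)); // 24000: the research-agent.yaml settings
console.log(worstCaseCallMs(4000, 1)); // 8000: a tighter configuration
console.log(worstCaseSessionToolMs(8000, 2, 2)); // 48000: two searches, as in the example session
```

Under these assumptions, halving the timeout and dropping one retry cuts the worst case per call from 24 seconds to 8.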

Identical searches are being repeated

Keep tool_cache_ttl_seconds set. With a value of 300, results are cached for 5 minutes: if the agent sends the same query twice in a session, the second call hits the cache instead of calling the MCP server again.
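Conceptually, the cache maps a tool name plus its arguments to a result that expires after the TTL. The sketch below illustrates that idea and is not Floopy's implementation:

- ToolResultCache and getOrCall are hypothetical names.
- Keys sort the top-level argument names, so argument order does not matter.
- The clock is injectable, so expiry can be tested without waiting five minutes.

```typescript
type Clock = () => number; // current time in milliseconds

// Hypothetical TTL cache for tool results, keyed by tool name + arguments.
class ToolResultCache {
  private readonly entries = new Map<string, { value: string; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: Clock;

  constructor(ttlSeconds: number, now: Clock = Date.now) {
    this.ttlMs = ttlSeconds * 1000;
    this.now = now;
  }

  // Sort top-level keys so {query, lang} and {lang, query} share one entry.
  private key(tool: string, args: Record<string, unknown>): string {
    const pairs = Object.keys(args).sort().map((k) => [k, args[k]]);
    return `${tool}:${JSON.stringify(pairs)}`;
  }

  // Return a fresh cached value, or run `call` and cache its result.
  getOrCall(
    tool: string,
    args: Record<string, unknown>,
    call: () => string,
  ): { value: string; cacheHit: boolean } {
    const k = this.key(tool, args);
    const entry = this.entries.get(k);
    if (entry && this.now() < entry.expiresAt) {
      return { value: entry.value, cacheHit: true };
    }
    const value = call();
    this.entries.set(k, { value, expiresAt: this.now() + this.ttlMs });
    return { value, cacheHit: false };
  }
}

// Usage with a fake clock and a counting stand-in for the MCP server.
let fakeNow = 0;
let serverCalls = 0;
const cache = new ToolResultCache(300, () => fakeNow);
const search = () => `result #${++serverCalls}`;

console.log(cache.getOrCall("web_search", { query: "tokio" }, search).cacheHit); // false
fakeNow = 299_000;
console.log(cache.getOrCall("web_search", { query: "tokio" }, search).cacheHit); // true
fakeNow = 300_000;
console.log(cache.getOrCall("web_search", { query: "tokio" }, search).cacheHit); // false (expired)
```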

You want to see intermediate tool calls streamed

Intermediate streaming is not available yet: only the final response can be streamed, so leave stream_mode set to final_only. The full trace, including every tool call and its result, is always available in Observability.

What to Build Next

With this foundation, you can build more complex agentic systems:

Multi-server agent: Add a code interpreter MCP server alongside web search. The agent can search for solutions and then run code to verify them.
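The sketch below shows what a two-server plugin could look like, written as a TypeScript object so it can be generated or checked in code. The brave_search entry mirrors research-agent.yaml. The code_interpreter entry, its URL, its run_code tool, and its timeout and retry values are hypothetical placeholders for whichever server you pick; auth is omitted for it only for brevity. YAML 1.2 is a superset of JSON, so the JSON.stringify output should parse as YAML. Confirm that Floopy's parser accepts flow-style YAML before relying on it.

```typescript
// Hypothetical two-server plugin: brave_search mirrors research-agent.yaml,
// code_interpreter is a placeholder.
const multiServerPlugin = {
  version: "1",
  mcp_servers: [
    {
      id: "brave_search",
      url: "https://api.search.brave.com/mcp",
      auth: {
        type: "api_key",
        header: "X-Subscription-Token",
        secret_ref: "secret.brave_search_api_key",
      },
      tools: ["web_search"],
      timeout_ms: 8000,
      max_retries: 2,
    },
    {
      id: "code_interpreter", // hypothetical server
      url: "https://code-interpreter.example.com/mcp", // placeholder URL
      tools: ["run_code"], // hypothetical tool name
      timeout_ms: 30000, // assumption: running code may take longer than a search
      max_retries: 0, // assumption: avoid re-running code that has side effects
    },
  ],
  agent: {
    max_rounds: 6,
    stream_mode: "final_only",
    tool_call_parallel: false,
    tool_cache_ttl_seconds: 300,
    prompt_guard_on_tool_output: true,
  },
};

// Every tool the model can see, across all servers.
const exposedTools = multiServerPlugin.mcp_servers.flatMap((s) => s.tools);
console.log(exposedTools.join(", ")); // web_search, run_code

// JSON text that should also parse as YAML 1.2 (see the caveat above).
const pluginText = JSON.stringify(multiServerPlugin, null, 2);
```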

Domain-specific agent: Replace web search with a company knowledge base MCP server. The agent retrieves internal documentation to answer employee questions.

Cost-aware agent: Tell the agent in your system prompt to use estimate_cost calls, so it can decide whether a question is worth multiple search rounds or should be answered directly.

Full reference docs: MCP Client · MCP Server · MCP Tokens