MCP Builder

Turn an API into tools an agent can actually use.

An MCP server’s quality isn’t its endpoint count — it’s whether an LLM can finish a real task with it. A thin wrapper over every REST route often leaves the agent worse off than no tools at all. MCP Builder is a four-phase guide that designs tools around tasks, types every input and output, and proves the result with an evaluation suite before you ship.

Anthropic · TypeScript SDK · FastMCP · Zod / Pydantic · MCP Inspector

Four phases

Research & plan  →  Implement  →  Review & test  →  Evaluate

Research & plan — study the MCP spec and the target API; decide tool coverage.
Implement — shared API client, typed tools, response formatting, pagination.
Review & test — build, lint, and probe with the MCP Inspector.
Evaluate — 10 realistic questions that prove an LLM can do real work with the server.

Key concept — tools around tasks, not endpoints

The design question is never “what endpoints exist?” but “what will an agent try to do?”

Workflow tools bundle a multi-step task into one call, while comprehensive coverage of individual operations gives the agent room to compose its own sequences. When uncertain, prefer coverage.
Discoverable names — consistent, action-oriented prefixes like github_create_issue, github_list_repos.
Actionable errors — every error names a likely cause and a next step, so the agent can recover instead of stalling.
Focused results — concise descriptions, filtering and pagination, so a tool call doesn’t flood the context window.
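
The last two principles can be sketched in a few lines. The following is a minimal, dependency-free illustration, not part of any SDK: the helper names (`paginate`, `actionableError`), the cap of 100 items, and the message wording are all assumptions made for the example.

```typescript
// Hypothetical helpers: names, limits, and messages are illustrative only.

interface Page<T> {
  items: T[];
  total: number;
  hasMore: boolean;
  nextOffset: number | null;
}

// Focused results: return one bounded slice plus the offset of the next one,
// so a single tool call cannot flood the context window.
function paginate<T>(all: T[], offset: number, limit: number): Page<T> {
  const start = Math.max(0, offset);
  const size = Math.min(Math.max(1, limit), 100); // assumed hard cap
  const items = all.slice(start, start + size);
  const end = start + items.length;
  const hasMore = end < all.length;
  return { items, total: all.length, hasMore, nextOffset: hasMore ? end : null };
}

// Actionable errors: each message names a likely cause and a next step.
function actionableError(status: number, resource: string): string {
  switch (status) {
    case 401:
      return `Error: authentication failed for ${resource}. The API token is probably missing or expired. Check the token configuration and retry.`;
    case 404:
      return `Error: ${resource} was not found. The name may be misspelled or the resource may be private. Try github_list_repos to find valid names.`;
    case 429:
      return `Error: rate limit reached while requesting ${resource}. Wait before retrying, or narrow the query with filters.`;
    default:
      return `Error: request for ${resource} failed with status ${status}. Retry once; if it fails again, report the status code.`;
  }
}

const page = paginate(["a", "b", "c", "d", "e"], 0, 2);
console.log(page.items, page.hasMore, page.nextOffset);
console.log(actionableError(404, "repository octocat/demo"));
```

Returning `nextOffset` alongside `hasMore` lets the agent request the following page without guessing, and the error text points at another tool it can call to recover.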

Worked example — a GitHub server

Plan tools around tasks: github_create_issue, github_list_repos, github_search_code.
Type the input with Zod (TS) or Pydantic (Python) — constraints, descriptions, an example per field.
Annotate each tool: readOnlyHint, destructiveHint, idempotentHint, openWorldHint.
Test with npx @modelcontextprotocol/inspector.
Evaluate — write 10 read-only questions that each need several tool calls, solve them yourself, and store the answers for verification.
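
As a rough illustration of the first three steps, the sketch below writes one tool definition as a plain object. It assumes the MCP tool shape of `name`, `description`, `inputSchema` (JSON Schema), and `annotations`. To stay dependency-free it spells the schema out by hand instead of using Zod, and the field constraints, examples, and the `followsNaming` helper are hypothetical.

```typescript
// Sketch only: a hand-written JSON Schema stands in for what a Zod schema
// would describe. Constraint values and examples are illustrative.

interface ToolAnnotations {
  readOnlyHint: boolean;
  destructiveHint: boolean;
  idempotentHint: boolean;
  openWorldHint: boolean;
}

interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>; // JSON Schema object
  annotations: ToolAnnotations;
}

const createIssue: ToolDefinition = {
  name: "github_create_issue",
  description: "Create an issue in a repository. Returns the new issue number and URL.",
  inputSchema: {
    type: "object",
    properties: {
      repo: {
        type: "string",
        pattern: "^[^/]+/[^/]+$",
        description: "Repository in owner/name form",
        examples: ["octocat/hello-world"],
      },
      title: {
        type: "string",
        minLength: 1,
        maxLength: 256, // assumed limit for the example
        description: "Short issue title",
        examples: ["Crash on startup"],
      },
    },
    required: ["repo", "title"],
    additionalProperties: false,
  },
  // Creating an issue writes data, is additive rather than destructive,
  // creates a duplicate if repeated, and talks to an external service.
  annotations: {
    readOnlyHint: false,
    destructiveHint: false,
    idempotentHint: false,
    openWorldHint: true,
  },
};

// Hypothetical lint: names should be snake_case and share the service prefix.
function followsNaming(tool: ToolDefinition, prefix: string): boolean {
  return /^[a-z]+(_[a-z]+)+$/.test(tool.name) && tool.name.startsWith(prefix + "_");
}

console.log(followsNaming(createIssue, "github"));
```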

Under the hood

Per-tool checklist

Element	What to provide
Input schema	Zod / Pydantic, with constraints and field-level examples
Output schema	Define `outputSchema`; return `structuredContent` (TS SDK)
Description	Concise summary, parameter docs, return shape
Implementation	Async I/O, pagination, actionable error messages
Annotations	`readOnlyHint` · `destructiveHint` · `idempotentHint` · `openWorldHint`
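
A handler that covers the output and implementation rows might look like the sketch below. It assumes the MCP tool-result shape of a `content` array plus optional `structuredContent` and `isError`. `fetchRepos` is a hypothetical stand-in for a real API client, and the handler is written without the SDK so it runs on its own.

```typescript
// Sketch only: fetchRepos fakes the upstream API; no SDK is involved.

interface RepoSummary {
  name: string;
  stars: number;
}

interface ToolResult {
  content: { type: "text"; text: string }[];
  structuredContent?: Record<string, unknown>;
  isError?: boolean;
}

// Hypothetical API client call, stubbed with fixed data.
async function fetchRepos(owner: string): Promise<RepoSummary[]> {
  return [
    { name: `${owner}/alpha`, stars: 12 },
    { name: `${owner}/beta`, stars: 3 },
  ];
}

async function listReposHandler(args: { owner: string; limit: number }): Promise<ToolResult> {
  if (args.owner.trim() === "") {
    // Actionable error: likely cause plus a next step.
    return {
      content: [
        {
          type: "text",
          text: 'Error: owner is empty. Pass a GitHub user or organization name, for example "octocat".',
        },
      ],
      isError: true,
    };
  }
  const repos = (await fetchRepos(args.owner)).slice(0, args.limit);
  const structured = { owner: args.owner, count: repos.length, repos };
  return {
    // Text copy for clients that only read content.
    content: [{ type: "text", text: JSON.stringify(structured) }],
    // Typed copy that should match the tool's declared outputSchema.
    structuredContent: structured,
  };
}

listReposHandler({ owner: "octocat", limit: 1 }).then((result) => {
  console.log(result.content.map((c) => c.text).join("\n"));
});
```

Returning the same data as both text and `structuredContent` keeps the tool usable by clients that ignore structured output.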

Recommended stack

Language — TypeScript: strong SDK, good in execution environments like MCPB, and models generate well-typed, lintable TS reliably. Python via FastMCP is fully supported too.
Transport — streamable HTTP with stateless JSON for remote servers (simpler to scale than stateful sessions); stdio for local servers.
Reference files ship with the skill: mcp_best_practices.md, node_mcp_server.md, python_mcp_server.md, evaluation.md — loaded only as needed.

Evaluation format

Ten questions, each independent · read-only · complex · realistic · verifiable · stable — a single answer that string-compares cleanly and won’t drift over time.

<evaluation>
  <qa_pair>
    <question>Find discussions about AI model launches with animal codenames. One needed an ASL-X safety designation. What number X was set for the model named after a spotted wild cat?</question>
    <answer>3</answer>
  </qa_pair>
</evaluation>
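
One way to check answers against such a file is sketched below. It is a simplified, assumption-laden example rather than the skill's own harness: the regex reader only handles the flat `<qa_pair>` layout shown above (no nested tags or CDATA), and `parseEvaluation` and `score` are hypothetical names.

```typescript
// Sketch only: a regex reader for the flat <qa_pair> layout. A real harness
// would more likely use a proper XML parser.

interface QaPair {
  question: string;
  answer: string;
}

function parseEvaluation(xml: string): QaPair[] {
  const pairs: QaPair[] = [];
  const re =
    /<qa_pair>\s*<question>([\s\S]*?)<\/question>\s*<answer>([\s\S]*?)<\/answer>\s*<\/qa_pair>/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(xml)) !== null) {
    pairs.push({ question: m[1].trim(), answer: m[2].trim() });
  }
  return pairs;
}

// Count answers that match exactly after trimming surrounding whitespace.
// Anything looser (case folding, numeric parsing) is deliberately left out.
function score(pairs: QaPair[], given: string[]): number {
  let correct = 0;
  pairs.forEach((pair, i) => {
    if ((given[i] ?? "").trim() === pair.answer) correct += 1;
  });
  return correct;
}

const sample = `
<evaluation>
  <qa_pair>
    <question>What number X was set for the model named after a spotted wild cat?</question>
    <answer>3</answer>
  </qa_pair>
</evaluation>`;

const pairs = parseEvaluation(sample);
console.log(`${score(pairs, ["3"])} of ${pairs.length} correct`);
```

Exact comparison is the reason each question needs a single, stable answer: a free-form or time-dependent answer would fail the check even when the agent did the work correctly.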