MCP Builder
Turn an API into tools an agent can actually use.
An MCP server’s quality isn’t its endpoint count — it’s whether an LLM can finish a real task with it. A thin wrapper over every REST route often leaves the agent worse off than no tools at all. MCP Builder is a four-phase guide that designs tools around tasks, types every input and output, and proves the result with an evaluation suite before you ship.
Four phases
Research & plan → Implement → Review & test → Evaluate
- Research & plan: study the MCP spec and the target API; decide tool coverage.
- Implement: shared API client, typed tools, response formatting, pagination.
- Review & test: build, lint, and probe with the MCP Inspector.
- Evaluate: 10 realistic questions that prove an LLM can do real work with the server.
Key concept — tools around tasks, not endpoints
The design question is never “what endpoints exist?” but “what will an agent try to do?”
- Workflow tools vs. coverage: workflow tools bundle a multi-step task into one call, while comprehensive API coverage gives the agent room to compose its own workflows. When uncertain, prefer coverage.
- Discoverable names: consistent, action-oriented prefixes such as github_create_issue and github_list_repos.
- Actionable errors: every error names a likely cause and a next step, so the agent can recover instead of stalling.
- Focused results: concise descriptions, plus filtering and pagination, so a single tool call doesn't flood the context window.
Worked example — a GitHub server
- Plan tools around tasks: github_create_issue, github_list_repos, github_search_code.
- Type the input with Zod (TS) or Pydantic (Python): constraints, descriptions, and an example per field.
- Annotate each tool: readOnlyHint, destructiveHint, idempotentHint, openWorldHint.
- Test with npx @modelcontextprotocol/inspector.
- Evaluate: write 10 read-only questions that each need several tool calls, solve them yourself, and store the answers for verification.
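For the worked example, the dictionary below sketches what a hypothetical github_list_repos definition might look like as an entry in a tools/list response: an inputSchema with constraints, a description, and an example per field, plus the four annotations. In a real server Zod or Pydantic would generate the schema. The field names, pattern, and limits are illustrative assumptions, and checklist_problems is a hypothetical lint helper, not part of any SDK.

```python
from typing import Any

# Hypothetical tool definition, shaped like one entry in a tools/list response.
# In a real server, Zod or Pydantic would generate inputSchema from typed code.
LIST_REPOS_TOOL: dict[str, Any] = {
    "name": "github_list_repos",
    "description": "List repositories owned by a GitHub user or organization, one page at a time.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "owner": {
                "type": "string",
                "pattern": "^[A-Za-z0-9-]+$",  # illustrative constraint
                "description": "User or organization login.",
                "examples": ["octocat"],
            },
            "visibility": {
                "type": "string",
                "enum": ["all", "public", "private"],
                "description": "Which repositories to include.",
                "examples": ["public"],
            },
            "limit": {
                "type": "integer",
                "minimum": 1,
                "maximum": 50,
                "description": "Maximum number of repositories per page.",
                "examples": [20],
            },
        },
        "required": ["owner"],
    },
    "annotations": {
        "readOnlyHint": True,  # the tool does not modify its environment
        # The spec treats the next two as meaningful only when readOnlyHint is false;
        # they are set anyway so the lint below sees every hint.
        "destructiveHint": False,
        "idempotentHint": True,
        "openWorldHint": True,  # it talks to an external service (GitHub)
    },
}

ANNOTATION_KEYS = ("readOnlyHint", "destructiveHint", "idempotentHint", "openWorldHint")


def checklist_problems(tool: dict[str, Any]) -> list[str]:
    """Hypothetical lint: list gaps against the per-tool checklist."""
    problems = []
    if not tool.get("description"):
        problems.append("tool: missing description")
    for field, spec in tool.get("inputSchema", {}).get("properties", {}).items():
        if not spec.get("description"):
            problems.append(f"{field}: missing description")
        if not spec.get("examples"):
            problems.append(f"{field}: missing example")
    annotations = tool.get("annotations", {})
    problems += [f"annotation {key} not set" for key in ANNOTATION_KEYS if key not in annotations]
    return problems
```

Run against a bare definition, the lint returns one line per gap, which makes it usable as an extra check during Review & test.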
Under the hood
Per-tool checklist
| Element | What to provide |
|---|---|
| Input schema | Zod / Pydantic, with constraints and field-level examples |
| Output schema | Define outputSchema; return structuredContent (TS SDK) |
| Description | Concise summary, parameter docs, return shape |
| Implementation | Async I/O, pagination, actionable error messages |
| Annotations | readOnlyHint · destructiveHint · idempotentHint · openWorldHint |
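The Output schema and Implementation rows can be sketched as two hypothetical helpers that build tools/call results by hand. Under the MCP spec, structured results go in structuredContent and, for backwards compatibility, should also be serialized into a text content block. Tool-level failures are reported inside the result with isError set, so the model can read them and recover. SEARCH_CODE_OUTPUT_SCHEMA, tool_result, and tool_error are assumptions for illustration, and checking only the required keys is a deliberate simplification of full JSON Schema validation.

```python
import json
from typing import Any

# Hypothetical outputSchema for github_search_code.
SEARCH_CODE_OUTPUT_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "total": {"type": "integer"},
        "matches": {"type": "array", "items": {"type": "object"}},
    },
    "required": ["total", "matches"],
}


def tool_result(data: dict[str, Any], output_schema: dict[str, Any]) -> dict[str, Any]:
    """Build a tools/call result carrying structured data and a JSON text copy."""
    # Simplified check: only required top-level keys, not full JSON Schema validation.
    missing = [key for key in output_schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"result does not match outputSchema; missing {missing}")
    return {
        "content": [{"type": "text", "text": json.dumps(data)}],  # for older clients
        "structuredContent": data,
    }


def tool_error(cause: str, next_step: str) -> dict[str, Any]:
    """Report a failure inside the result (isError) so the model can read and recover."""
    return {
        "content": [{"type": "text", "text": f"{cause} Next step: {next_step}"}],
        "isError": True,
    }
```

Raising on a schema mismatch treats it as a server bug to catch during development. A failure the agent can act on, such as a missing repository, goes through tool_error instead.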
Recommended stack
- Language: TypeScript, for its strong official SDK and good support in execution environments such as MCPB, and because models reliably generate well-typed, lintable TypeScript. Python via FastMCP is fully supported too.
- Transport: streamable HTTP with stateless JSON for remote servers (simpler to scale than stateful sessions); stdio for local servers.
- Reference files ship with the skill and are loaded only as needed: mcp_best_practices.md, node_mcp_server.md, python_mcp_server.md, evaluation.md.
Evaluation format
Ten questions, each independent, read-only, complex, realistic, verifiable, and stable: every question has a single answer that string-compares cleanly and won't drift over time.
<evaluation>
  <qa_pair>
    <question>Find discussions about AI model launches with animal codenames. One needed an ASL-X safety designation. What number X was set for the model named after a spotted wild cat?</question>
    <answer>3</answer>
  </qa_pair>
</evaluation>
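A suite in this format can be checked mechanically. The sketch below uses only the Python standard library to parse the XML, flag a suite that doesn't have exactly ten pairs or has an empty answer, and score a list of model answers by string comparison. The function names are hypothetical, and the whitespace normalization in normalize is an assumption; drop it if answers must match byte for byte.

```python
import xml.etree.ElementTree as ET


def load_qa_pairs(xml_text: str) -> list[tuple[str, str]]:
    """Parse <evaluation><qa_pair><question/><answer/></qa_pair>...</evaluation>."""
    root = ET.fromstring(xml_text)
    return [
        ((qa.findtext("question") or "").strip(), (qa.findtext("answer") or "").strip())
        for qa in root.iter("qa_pair")
    ]


def check_suite(pairs: list[tuple[str, str]]) -> list[str]:
    """List problems with the suite itself before any model is scored."""
    problems = []
    if len(pairs) != 10:
        problems.append(f"expected 10 questions, found {len(pairs)}")
    problems += [f"question {i + 1} has no answer" for i, (_, a) in enumerate(pairs) if not a]
    return problems


def normalize(text: str) -> str:
    # Assumption: trim and collapse internal whitespace; case is left untouched.
    return " ".join(text.split())


def score(pairs: list[tuple[str, str]], model_answers: list[str]) -> float:
    """Fraction of questions whose model answer matches the stored answer."""
    if not pairs or len(model_answers) != len(pairs):
        raise ValueError("need a non-empty suite and exactly one model answer per question")
    correct = sum(
        normalize(got) == normalize(expected)
        for (_, expected), got in zip(pairs, model_answers)
    )
    return correct / len(pairs)
```

On the single-pair example above, check_suite reports that the suite has 1 question instead of 10, which is the point: the example shows the format, not a complete suite.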