Short answer: To build an MCP server, you wrap a system or data source in the Model Context Protocol so any compliant AI agent can call it. The work is the same six steps every time: pick the tools, resources, and prompts the server will expose; scaffold a server with the official SDK; implement each tool as a typed function with a clear schema; choose a transport (stdio for local, Streamable HTTP for remote); add authentication and authorization if it is remote; then test it with a real client before you ship. The protocol itself is small and standardized — MCP crossed 97 million monthly SDK downloads by March 2026 and is now governed by the Linux Foundation with OpenAI, Google, and Microsoft as co-sponsors. The hard part is not the protocol. It is everything around it: scoping the tools so an agent can actually use them, securing a remote server correctly, and proving the thing works under load before you connect it to a customer-facing agent.
Exposing one tool over stdio is a genuinely small task — the official quickstart gets you there in an afternoon. Exposing a server that real agents hit thousands of times a day, that handles auth, that fails safely, and that you can debug when a tool starts returning garbage, is an engineering project. Below is what an MCP server actually is, how to build one step by step, how to choose a transport, how to secure a remote server, and the mistakes that keep MCP servers stuck in a demo.
What an MCP server actually is
An MCP server is a program that exposes capabilities to AI agents through a single standard interface. Before MCP, every team that wanted an agent to use a tool wrote a custom, one-off integration; MCP replaces that N-by-M mess with one protocol, which is exactly why it spread so fast. The same agent that talks to your server can talk to any of the roughly 9,650 servers in the official registry, and any of the ~15,900 GitHub repositories carrying the topic as of May 2026. For the conceptual background — why the protocol exists and what problem it solves — see our explainer on what the Model Context Protocol is. This guide is the build.
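A server exposes three kinds of capability, and getting the distinction right is most of the design work:
- Tools: functions an agent calls to take action.
- Resources: file-like data an agent can read.
- Prompts: reusable task templates.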
A useful mental model: tools are verbs, resources are nouns, prompts are recipes. Most servers lean heavily on tools, add a few resources, and use prompts sparingly. The mistake beginners make is dumping every API endpoint they have into tools and calling it done. A good MCP server is curated, not exhaustive.
Before you write code: scope the server
The single biggest determinant of whether an MCP server is useful is the quality of its tool design, and that is a decision you make before writing a line of code. An agent does not read your documentation. It reads your tool names, descriptions, and parameter schemas, and it decides — in one shot, with no human in the loop — which tool to call and what to pass. Vague names and fuzzy schemas produce an agent that calls the wrong tool or hallucinates arguments.
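To make that concrete, here is a minimal sketch of what a model has to work with for one tool. The `name`, `description`, and `inputSchema` keys follow the shape of an MCP tool listing. The tools themselves (`query`, `get_order_status`), their parameters, and the `lint_tool` helper are hypothetical, and the lint thresholds are arbitrary choices for illustration.

```python
# Hypothetical tool descriptors, shaped like entries in an MCP tool listing.
VAGUE_TOOL = {
    "name": "query",
    "description": "Runs a query.",
    "inputSchema": {"type": "object", "properties": {"q": {"type": "string"}}},
}

PRECISE_TOOL = {
    "name": "get_order_status",
    "description": "Look up the current fulfilment status of one order by its numeric order ID.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "integer",
                "description": "Numeric ID of the order to look up.",
            },
        },
        "required": ["order_id"],
        "additionalProperties": False,
    },
}


def lint_tool(tool: dict) -> list[str]:
    """Return warnings about a descriptor a model is likely to misread."""
    warnings = []
    if "_" not in tool["name"]:
        warnings.append("name is a single generic word; prefer verb_noun")
    if len(tool["description"].split()) < 6:
        warnings.append("description is too short to disambiguate the tool")
    schema = tool["inputSchema"]
    if not schema.get("required"):
        warnings.append("no required parameters declared")
    for param, spec in schema.get("properties", {}).items():
        if "description" not in spec:
            warnings.append(f"parameter '{param}' has no description")
    return warnings
```

Running `lint_tool` over every descriptor in a review step is one cheap way to catch fuzzy names before an agent ever sees them. It is a heuristic and does not replace testing with a real client.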
Three questions to answer first. What does the agent actually need to do? Start from the agent's job, not your API surface. A support agent needs a handful of task-level tools, not a one-to-one mirror of forty REST endpoints. Which tools write, and which only read? Write actions (refunds, deletes, sends) need different guardrails than reads, and you want them clearly separated so you can require confirmation or stricter auth on the destructive ones. What is the smallest set that delivers value? A server with six well-named, well-described tools beats one with sixty that the agent cannot reliably choose between. This is harness engineering applied to the server side: the action space you expose is part of the agent's reasoning, not a neutral pipe.
How to build an MCP server, step by step
With the scope settled, the build follows a predictable path. The official SDKs (Python and TypeScript are the most mature) handle the protocol plumbing, so most of your code is the actual logic.
- Scaffold the server with the official SDK. Install the SDK for your language, create a server instance, and give it a name and version. The SDK manages the JSON-RPC message handling, capability negotiation, and lifecycle so you do not implement the wire protocol yourself.
- Define each tool with a strict schema. For every tool, declare a precise name, a one-sentence description written for the model (not for a human reader), and a typed input schema. The schema is your contract: it is what stops the agent from passing a string where you need an integer, and what makes the tool self-documenting to the client.
- Implement the tool logic. Inside each tool handler, do the real work — call your database, hit your internal API, run the computation — and return a structured result. Keep handlers thin: validate inputs, call your existing service layer, format the output. Do not bury business logic the rest of your system also needs inside an MCP handler.
- Add resources and prompts where they help. Expose read-only context as resources (a customer record, a policy document) and package any repeatable multi-step task as a prompt. Skip these if your server is purely action-oriented; not every server needs all three primitives.
- Handle errors as data, not crashes. When a tool fails — bad input, a downstream timeout, a permission problem — return a clear, structured error the agent can read and recover from, rather than throwing an unhandled exception that kills the connection. Agents reason over error messages; a good one ("order not found — check the order ID") lets the agent self-correct, while a stack trace does not.
- Choose and wire up a transport. Decide whether the server runs locally next to the client (stdio) or as a networked service many clients reach (Streamable HTTP). This choice drives everything about deployment and security, so it gets its own section below.
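The schema, thin-handler, and errors-as-data steps above can be sketched together in plain Python. This does not use the official SDK, which supplies its own registration and validation. The `content` / `isError` result shape follows the protocol's tool results, while `lookup_order`, the order data, and the tool name are hypothetical stand-ins.

```python
from typing import Any, Callable

# Stand-in for an existing service layer; the data is made up.
_ORDERS = {1001: "shipped", 1002: "processing"}


def lookup_order(order_id: int) -> str:
    """Business logic lives here, outside the MCP handler."""
    return _ORDERS[order_id]  # raises KeyError for unknown orders


def ok(text: str) -> dict[str, Any]:
    return {"content": [{"type": "text", "text": text}], "isError": False}


def error(text: str) -> dict[str, Any]:
    return {"content": [{"type": "text", "text": text}], "isError": True}


def get_order_status(arguments: dict[str, Any]) -> dict[str, Any]:
    """Thin handler: validate, delegate, format."""
    order_id = arguments.get("order_id")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(order_id, int) or isinstance(order_id, bool):
        return error("order_id must be an integer, for example 1001")
    try:
        status = lookup_order(order_id)
    except KeyError:
        return error(f"order {order_id} not found; check the order ID")
    return ok(f"order {order_id} is {status}")


TOOLS: dict[str, Callable[[dict[str, Any]], dict[str, Any]]] = {
    "get_order_status": get_order_status,
}


def call_tool(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Dispatch a tool call to its handler."""
    if name not in TOOLS:
        # An unknown tool is a protocol-level problem, not a tool result,
        # so this sketch raises and leaves the mapping to the caller.
        raise LookupError(f"unknown tool '{name}'")
    return TOOLS[name](arguments)
```

A call such as `call_tool("get_order_status", {"order_id": 9})` returns a readable error result instead of raising, which gives the agent something to correct against.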
That is a working MCP server. What separates it from a production one is the next three sections: the transport decision, the security layer, and the testing.
stdio vs Streamable HTTP: choosing a transport
MCP defines two transports, and the choice is not cosmetic — it determines your security model, your deployment, and how far the server can scale.
stdio runs the server as a local subprocess of the client, communicating over standard input and output. There is no network and no authentication between client and server, because the client launches the server directly and passes secrets through environment variables. This is the right choice for personal tooling, developer utilities, and anything that runs on the same machine as the agent. It is simple and fast, and it does not scale beyond one user, because there is no network for a second user to connect over.
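As a rough illustration of what the SDK does for you on this transport, the sketch below reads one JSON-RPC message per line and writes one reply per line. It is deliberately incomplete: it skips the initialization handshake and capability negotiation, and the `serve_stdio` function and the method table are hypothetical names. The demo uses in-memory streams in place of real standard input and output.

```python
import io
import json
from typing import Any, Callable, TextIO

Handler = Callable[[dict[str, Any]], Any]


def serve_stdio(methods: dict[str, Handler], stdin: TextIO, stdout: TextIO) -> None:
    """Read one JSON-RPC message per line and answer requests on stdout."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        if "id" not in message:
            continue  # notifications never get a response
        handler = methods.get(message.get("method"))
        if handler is None:
            reply = {"jsonrpc": "2.0", "id": message["id"],
                     "error": {"code": -32601, "message": "Method not found"}}
        else:
            reply = {"jsonrpc": "2.0", "id": message["id"],
                     "result": handler(message.get("params", {}))}
        # json.dumps emits no raw newlines, so each reply stays on one line.
        stdout.write(json.dumps(reply) + "\n")
        stdout.flush()


# Demo with in-memory streams; a real server would pass sys.stdin and sys.stdout.
incoming = io.StringIO(
    '{"jsonrpc": "2.0", "id": 1, "method": "ping"}\n'
    '{"jsonrpc": "2.0", "method": "notifications/initialized"}\n'
    '{"jsonrpc": "2.0", "id": 2, "method": "nope"}\n'
)
outgoing = io.StringIO()
serve_stdio({"ping": lambda params: {}}, incoming, outgoing)
replies = [json.loads(line) for line in outgoing.getvalue().splitlines()]
```

Because stdout carries the protocol stream, anything else a stdio server prints there corrupts it. Diagnostic logging belongs on stderr.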
Streamable HTTP, introduced in the 2025-03-26 revision of the spec, runs the server as a networked service that many clients reach over HTTP. It replaced the older HTTP+SSE transport and is now the default for anything that is not purely local. A server using this transport is a remote MCP server, and the ecosystem is clearly converging on it: remote, network-reachable servers are what real products ship. The cost of that reach is that you now own an authentication and authorization problem you did not have with stdio.
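The core of the HTTP side can be sketched as a pure function that maps one POSTed JSON-RPC message to a status, headers, and body. This is an illustration and not the SDK's implementation: `handle_post` is a hypothetical name, and the sketch leaves out streaming responses, sessions, Origin checks, and authorization, all of which a real remote server needs.

```python
import json
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], Any]
JSON_HEADERS = {"Content-Type": "application/json"}


def _rpc_error(request_id: Any, code: int, message: str) -> bytes:
    payload = {"jsonrpc": "2.0", "id": request_id,
               "error": {"code": code, "message": message}}
    return json.dumps(payload).encode()


def handle_post(body: bytes, methods: dict[str, Handler]) -> tuple[int, dict[str, str], bytes]:
    """Turn one POSTed JSON-RPC message into (status, headers, body)."""
    try:
        message = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return 400, JSON_HEADERS, _rpc_error(None, -32700, "Parse error")
    if not isinstance(message, dict):
        return 400, JSON_HEADERS, _rpc_error(None, -32600, "Invalid Request")
    if "id" not in message:
        return 202, {}, b""  # notification: accepted, nothing to send back
    handler = methods.get(message.get("method"))
    if handler is None:
        return 200, JSON_HEADERS, _rpc_error(message["id"], -32601, "Method not found")
    payload = {"jsonrpc": "2.0", "id": message["id"],
               "result": handler(message.get("params", {}))}
    return 200, JSON_HEADERS, json.dumps(payload).encode()
```

Keeping this logic separate from the web framework means the same function can sit behind `http.server`, an ASGI app, or a test harness without change.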
The decision rule is simple. Building a tool for yourself or a single developer machine? Use stdio. Building a server that a product, a team, or many agents will call over a network? Build it on Streamable HTTP from the start, and treat the security work below as part of the build, not a later phase. Retrofitting auth onto a server that assumed a trusted local caller is more painful than designing for it on day one.
Securing a remote MCP server
The moment your server is reachable over a network, it is an attack surface, and the spec is specific about how to defend it. The 2025-03-26 revision made OAuth 2.1 the basis of MCP authorization, and the 2025-06-18 revision tightened it by treating the MCP server as an OAuth resource server. A correctly secured remote MCP server mandates PKCE for all clients, uses RFC 9728 Protected Resource Metadata so clients can discover how to authenticate, uses RFC 8414 for authorization-server metadata, and uses RFC 8707 resource indicators so that an issued token is bound to your specific server and cannot be replayed against another.
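PKCE is the easiest of these pieces to show in a few lines. The sketch below covers the S256 method: the client creates a verifier and sends its hashed challenge, and the authorization server later checks the verifier against the stored challenge. The function names are hypothetical, and in practice an OAuth library on each side would normally do this. The MCP server itself acts as the resource server and does not run this exchange.

```python
import base64
import hashlib
import secrets


def make_code_verifier() -> str:
    """Random URL-safe verifier; 32 random bytes encode to 43 characters."""
    return secrets.token_urlsafe(32)


def code_challenge_s256(verifier: str) -> str:
    """BASE64URL(SHA256(verifier)) without padding, as the S256 method defines."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_pkce(verifier: str, stored_challenge: str) -> bool:
    """Authorization-server check performed at token exchange."""
    return secrets.compare_digest(code_challenge_s256(verifier), stored_challenge)
```

The client sends the challenge with the authorization request and the verifier with the token request, so an intercepted authorization code is useless without the verifier.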
The practical guidance from the security community is consistent: implement OAuth 2.1 with resource indicators from day one rather than deferring authorization to "later." The attack surface is well understood, and the spec hands you a clear implementation path through dynamic client registration and PKCE. Deferring it means rebuilding the server's trust model after it is already deployed.
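On the server side, resource binding comes down to two small things: publishing metadata that names your authorization server, and rejecting any token not issued for you. The sketch below assumes the token's signature, issuer, and expiry were already verified by a proper library, and only shows the audience check. The URLs and scope names are hypothetical.

```python
from typing import Any

# Hypothetical URLs, for illustration only.
RESOURCE = "https://mcp.example.com/mcp"
AUTH_SERVER = "https://auth.example.com"

# Served at /.well-known/oauth-protected-resource so clients can discover
# which authorization server issues tokens for this resource (RFC 9728).
PROTECTED_RESOURCE_METADATA = {
    "resource": RESOURCE,
    "authorization_servers": [AUTH_SERVER],
    "scopes_supported": ["orders:read", "orders:write"],
}


def token_is_for_this_server(claims: dict[str, Any]) -> bool:
    """Accept only tokens whose audience names this server.

    `claims` is assumed to be the payload of a token that has already
    passed signature, issuer, and expiry validation elsewhere.
    """
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    return RESOURCE in audiences
```

A token minted for a different resource fails this check even if it is otherwise valid, which is the replay protection that resource indicators are meant to provide.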
Authentication is only half the job. An MCP server is, by design, a way to let a language model take real actions in your systems, which means it inherits the entire prompt injection problem. If an agent calling your server can be manipulated by malicious content into invoking a destructive tool, the cleanest OAuth setup in the world will not save you. Production servers scope tokens narrowly, separate read tools from write tools, require explicit confirmation or stricter checks on destructive actions, and log every tool call so a bad sequence can be traced and stopped. Security here is not a single OAuth flow; it is the whole chain from token to tool to side effect.
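One way to enforce that chain is a single gate that every tool call passes through before its handler runs. The sketch below is an assumption about how such a gate might look and is not something the protocol prescribes. The tool names, scope strings, and the `confirmed` flag are hypothetical.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("mcp.audit")


@dataclass(frozen=True)
class ToolPolicy:
    required_scope: str
    destructive: bool = False


# Hypothetical tools and scopes.
POLICIES = {
    "get_order_status": ToolPolicy("orders:read"),
    "refund_order": ToolPolicy("orders:write", destructive=True),
}


def authorize_call(tool: str, token_scopes: set[str], confirmed: bool = False) -> tuple[bool, str]:
    """Decide whether a tool call may run, and log the decision either way."""
    policy = POLICIES.get(tool)
    if policy is None:
        decision = (False, "unknown tool")
    elif policy.required_scope not in token_scopes:
        decision = (False, f"token lacks scope {policy.required_scope}")
    elif policy.destructive and not confirmed:
        decision = (False, "destructive action requires explicit confirmation")
    else:
        decision = (True, "allowed")
    logger.info("tool=%s allowed=%s reason=%s", tool, *decision)
    return decision
```

With this in place, a read-only token cannot reach `refund_order` at all, and a write token still needs the confirmation step, so a single injected instruction is not enough to trigger a side effect.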
Testing and evaluating your MCP server
A server that returns the right answer when you call it by hand is not tested — it is demoed. The reason is that the consumer is not a human running clean inputs; it is a non-deterministic model that will call your tools in orders you did not anticipate, with arguments you did not expect, in response to inputs you never wrote.
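A cheap way to approximate that unpredictability before involving a model is to throw plausible-but-wrong arguments at each handler and check that it always answers with a structured result. The sketch below is one possible probe. The argument list, both handlers, and the simplified result shape are hypothetical.

```python
from typing import Any, Callable

# Arguments a model might plausibly send instead of {"order_id": 1001}.
ODD_ARGUMENTS: list[dict[str, Any]] = [
    {},
    {"order_id": None},
    {"order_id": "1001"},
    {"order_id": -1},
    {"order_id": 1001, "extra": "unexpected"},
    {"orderId": 1001},
]


def probe(handler: Callable[[dict[str, Any]], Any]) -> list[str]:
    """Call the handler with each odd input and report contract violations."""
    problems = []
    for arguments in ODD_ARGUMENTS:
        try:
            result = handler(arguments)
        except Exception as exc:  # in production this would surface as a crash
            problems.append(f"{arguments!r} raised {type(exc).__name__}")
            continue
        if not isinstance(result, dict) or "isError" not in result:
            problems.append(f"{arguments!r} returned an unstructured result")
    return problems


def fragile_handler(arguments: dict[str, Any]) -> dict[str, Any]:
    """Assumes clean input, as a hand-run demo would."""
    return {"isError": False, "text": f"order {arguments['order_id'] + 0}"}


def careful_handler(arguments: dict[str, Any]) -> dict[str, Any]:
    """Validates first and reports bad input as data."""
    order_id = arguments.get("order_id")
    if not isinstance(order_id, int) or isinstance(order_id, bool) or order_id <= 0:
        return {"isError": True, "text": "order_id must be a positive integer"}
    return {"isError": False, "text": f"order {order_id}"}
```

Here `probe(careful_handler)` reports no problems, while `probe(fragile_handler)` flags every input that is missing or mistyped.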
Test at three levels. First, unit-test each tool handler the ordinary way: given an input, does it return the right structured output and the right structured error on failure? Second, test the server against a real MCP client — connect it to an actual agent and watch whether the agent can discover your tools, pick the right one, and pass valid arguments based only on your names and descriptions. This is where vague tool design gets exposed: if the agent keeps choosing the wrong tool, the fix is your description, not the model. Third, run evals on the end-to-end behavior — a graded set of realistic tasks the agent must complete using your server — so you catch regressions when you add a tool or change a schema. Pair that with observability: trace every tool call, its arguments, its latency, and its outcome, because when a server starts failing in production it fails quietly, one bad tool call at a time, exactly the way agent systems tend to.
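The observability half of that advice can start as a small decorator around each handler. This is a minimal sketch that records to an in-memory list. A real deployment would export to a tracing backend and would likely redact sensitive arguments. The `traced` decorator, the `TRACE` list, and the `echo` tool are hypothetical.

```python
import functools
import time
from typing import Any, Callable

TRACE: list[dict[str, Any]] = []  # in-memory sink, for illustration only

Handler = Callable[[dict[str, Any]], dict[str, Any]]


def traced(tool_name: str) -> Callable[[Handler], Handler]:
    """Record arguments, latency, and outcome for every call to a handler."""
    def decorator(handler: Handler) -> Handler:
        @functools.wraps(handler)
        def wrapper(arguments: dict[str, Any]) -> dict[str, Any]:
            start = time.perf_counter()
            outcome = "exception"  # overwritten unless the handler raises
            try:
                result = handler(arguments)
                outcome = "error" if result.get("isError") else "ok"
                return result
            finally:
                TRACE.append({
                    "tool": tool_name,
                    "arguments": arguments,
                    "latency_ms": (time.perf_counter() - start) * 1000,
                    "outcome": outcome,
                })
        return wrapper
    return decorator


@traced("echo")
def echo(arguments: dict[str, Any]) -> dict[str, Any]:
    if "text" not in arguments:
        return {"isError": True, "text": "missing 'text'"}
    return {"isError": False, "text": arguments["text"]}
```

Counting `outcome` values per tool over time is often enough to spot the quiet failure mode described above, where one tool's error rate creeps up without anything crashing.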
Common mistakes that keep MCP servers out of production
The failures repeat across teams, and almost none of them are about the protocol.
How Moai Team approaches this
We treat an MCP server as part of an agent's reasoning surface, not as a thin wrapper over an API. That means we start from the agent's job and design a small, sharply described set of tools, separating reads from writes so the destructive actions can carry their own guardrails. For remote servers we build on Streamable HTTP with OAuth 2.1, PKCE, and resource-bound tokens from the first commit, because retrofitting authorization is more expensive than designing it in. We handle every tool failure as structured data the agent can recover from, scope tokens and confirmations around the prompt-injection reality that any tool-calling system inherits, and we do not consider a server done until it has evals and tracing around it. The goal is not a server that works when we call it by hand. It is one that an agent can use correctly, thousands of times a day, without quietly going wrong.
Frequently Asked Questions
What is an MCP server?
An MCP server is a program that exposes capabilities to AI agents through the Model Context Protocol, a single standard interface that replaced the custom, one-off integrations teams used to write for every tool. A server can expose three things: tools (functions an agent calls to take action), resources (file-like data an agent can read), and prompts (reusable task templates). Because the protocol is standardized — and now governed by the Linux Foundation with cross-vendor support from Anthropic, OpenAI, Google, Microsoft, and others — any compliant agent can use any compliant server.
How do I build an MCP server?
Use the official SDK (Python or TypeScript are the most mature) and follow six steps: scope the tools, resources, and prompts the agent actually needs; scaffold a server with the SDK; define each tool with a strict typed schema and a model-facing description; implement the tool logic against your existing services; handle errors as structured data rather than crashes; and choose a transport — stdio for local use, Streamable HTTP for remote. The SDK handles the protocol plumbing, so most of your code is the actual tool logic and the design decisions around it.
What is the difference between stdio and Streamable HTTP transports?
stdio runs the server as a local subprocess of the client with no network and no authentication. It is ideal for personal tools and developer utilities on one machine, but it does not scale to multiple users. Streamable HTTP, introduced in the 2025-03-26 spec, runs the server as a networked service many clients can reach, and it is the default for anything that ships as a product. The trade-off is that a remote server requires an authentication and authorization layer (OAuth 2.1 with PKCE and resource indicators) that a local stdio server does not.
How do I secure a remote MCP server?
Build OAuth 2.1 in from day one. The 2025-06-18 MCP spec mandates PKCE for all clients, RFC 9728 Protected Resource Metadata for discovery, RFC 8414 authorization-server metadata, and RFC 8707 resource indicators so tokens are bound to your specific server. Beyond authentication, treat the server as a tool-calling surface that inherits the prompt-injection problem: scope tokens narrowly, separate read tools from write tools, require confirmation on destructive actions, and log every tool call so a malicious or malformed sequence can be traced and stopped.
Moai Team builds MCP servers and the agents that use them the honest way: curated tools designed for how a model actually reasons, OAuth 2.1 and least-privilege security wired in from the first commit, and evals and observability around the whole thing so it holds up in production, not just in a demo.