Embedded MCP Servers in Production: AI Agents Inside Your TypeScript Backend

TL;DR

I’ve shipped two production MCP servers in TypeScript using the @modelcontextprotocol/sdk:

Heartbeat PIM — an embedded MCP server, running inside the same Express backend that serves the catalog’s REST API. A Claude agent connects over stdio and makes direct tool calls against the product database. No HTTP overhead, no public surface.
Rorsa Tools — a standalone MCP server, distributed as a Claude Code plugin. The same TypeScript codebase doubles as a CLI binary so developers can use the toolkit from the terminal too.

The “embedded vs standalone” decision determines almost everything else about the build. This article unpacks both shapes, the actual architecture, the trade-offs, and when to pick which.

What an MCP server actually is

The Model Context Protocol (MCP) is an open protocol published by Anthropic. It carries JSON-RPC 2.0 messages over stdio or HTTP / SSE and lets an AI agent (Claude, Claude Code, Continue, etc.) call tools you define and read resources you expose. From the agent’s point of view, your tools look just like the built-in ones.

A minimal TypeScript MCP server looks like this:

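import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';
// `db` is the application's existing data-access layer (import omitted here).

const server = new McpServer({ name: 'pim', version: '1.0.0' });

// Tool arguments are declared as a Zod shape; the SDK validates them before the handler runs.
server.tool(
  'get_product',
  { sku: z.string() },
  async ({ sku }) => {
    const product = await db.products.findOne({ where: { sku } });
    return { content: [{ type: 'text', text: JSON.stringify(product) }] };
  }
);

// Serve over stdio: the agent's MCP client talks to this process via stdin/stdout.
await server.connect(new StdioServerTransport());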

That’s the whole “API.” A Claude agent on the other end gets a tool called get_product it can invoke with a SKU.
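To make the "JSON-RPC over stdio" part concrete, here is a rough sketch of what one get_product call could look like on the wire. The envelope fields follow the MCP tools/call shape; the helper names, the SKU, and the payload are made up for illustration and are not part of the SDK.

```typescript
// Shapes follow the JSON-RPC 2.0 envelope that MCP uses for tool calls.
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

interface TextContent {
  type: 'text';
  text: string;
}

interface ToolCallResponse {
  jsonrpc: '2.0';
  id: number;
  result: { content: TextContent[]; isError?: boolean };
}

// Hypothetical helper: build the request an agent would send to invoke a tool.
function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: toolName, arguments: args },
  };
}

// Hypothetical helper: build the matching response. The id echoes the request id.
function buildToolResult(req: ToolCallRequest, text: string): ToolCallResponse {
  return {
    jsonrpc: '2.0',
    id: req.id,
    result: { content: [{ type: 'text', text }] },
  };
}

// Made-up SKU and payload, for illustration only.
const exampleRequest = buildToolCall(1, 'get_product', { sku: 'ABC-123' });
const exampleResponse = buildToolResult(
  exampleRequest,
  JSON.stringify({ sku: 'ABC-123', name: 'Example product' })
);

// The stdio transport sends each message as a single line of JSON.
const wireLine = JSON.stringify(exampleRequest) + '\n';
```

In practice the SDK builds and parses these envelopes for you; the sketch is only meant to show how little there is underneath.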

The standalone shape: Rorsa Tools

Rorsa Tools is a published MCP toolkit (and Claude Code plugin) that exposes 14 image-generation, editing, and processing tools to AI agents. It’s the toolkit that generated every pixel-art image on this very portfolio.

The architecture is:

CLI binary (vertex-img) — a Commander-driven CLI for developers who want to run the same tools from the terminal
MCP server (vertex-image-mcp) — exposes the same 14 tools to Claude agents
One codebase, two interfaces — the central architectural decision
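A minimal sketch of the "one codebase, two interfaces" idea, assuming a shared tool registry with a thin adapter per interface. Everything here (ToolDef, runFromCli, runFromMcp, the stand-in resize tool) is hypothetical and simplified; the real toolkit uses Commander for the CLI and the MCP SDK for the server.

```typescript
// A tool definition that knows nothing about how it gets invoked.
interface ToolDef {
  name: string;
  description: string;
  run: (args: Record<string, string>) => string;
}

interface TextResult {
  content: { type: 'text'; text: string }[];
}

// Hypothetical stand-in tool; a real one would call Sharp or Vertex AI.
const resizeTool: ToolDef = {
  name: 'resize_images',
  description: 'Describe a resize of one image to a target width',
  run: (args) => `resize ${args['input']} to ${args['width']}w`,
};

const toolRegistry: ToolDef[] = [resizeTool];

function findTool(toolName: string): ToolDef {
  for (const tool of toolRegistry) {
    if (tool.name === toolName) return tool;
  }
  throw new Error(`Unknown tool: ${toolName}`);
}

// Interface 1: CLI adapter. Parses `tool --key value --key value` style argv.
function runFromCli(argv: string[]): string {
  const toolName = String(argv[0]);
  const flags = argv.slice(1);
  const args: Record<string, string> = {};
  for (let i = 0; i + 1 < flags.length; i += 2) {
    const key = String(flags[i]).replace(/^--/, '');
    args[key] = String(flags[i + 1]);
  }
  return findTool(toolName).run(args);
}

// Interface 2: MCP adapter. Wraps the same result as MCP text content.
function runFromMcp(toolName: string, args: Record<string, string>): TextResult {
  return { content: [{ type: 'text', text: findTool(toolName).run(args) }] };
}
```

Both adapters end up in the same run function, so a bug fix or a new tool lands in the CLI and the MCP server at once.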

Tool surface examples:

generate_image — Vertex AI Nano Banana or Gemini Flash, depending on a text-density heuristic
remove_background — runs against a Python sidecar process (RMBG-2.0 model) because the model is PyTorch-only
generate_responsive_sets — Sharp pipeline, produces 640w/960w/1280w/1920w from a source
compress_images, convert_image_formats, resize_images, get_image_metadata — Sharp again
generate_favicon — png-to-ico for the full favicon bundle

A Claude Code agent can chain generate_image → remove_background → generate_responsive_sets → compress_images in a single conversation, no script needed.

Standalone is the right shape when:

The tools are useful across multiple projects
You want versioned distribution (Claude Code plugin manifest, npm package)
The tool surface is portable and doesn’t need your private data

The embedded shape: Heartbeat PIM

The second MCP shipment lives inside the Heartbeat product information management backend. The PIM is an Express 5 + TypeORM service that:

Hosts the REST API the React admin uses for catalog browsing, editing, and order ops
Also runs an @modelcontextprotocol/sdk server in the same Node process, talking to the same Postgres database

The Claude agent connects over stdio (local IPC, no HTTP) and can:

get_product(sku) — read product details
update_description(sku, description) — rewrite a description after AI enrichment
enrich_batch(skus, strategy) — push a list of SKUs through the enrichment pipeline

The point is that the agent has the same data view as the REST API. There’s no separate microservice, no public network egress, no API key to rotate.

Shared state, shared transactions

The hard part of an embedded MCP server is transaction scoping. Both the REST handlers and the MCP tools use the same TypeORM data source. If a tool opens a transaction to update a product and a REST request touches the same row mid-call, neither side may see or overwrite the other’s half-finished work.

Two rules made it tractable:

Read-only tools get a read-only data source. get_product, list_products, search_catalog all use a Postgres role that has no write privileges. The agent literally cannot mutate state through these.
Write tools take explicit transactions. update_description wraps its work in a dataSource.transaction(async (tx) => {...}) block. If the REST layer is updating the same row concurrently, Postgres row locks make the second write wait until the first transaction finishes. If a write fails, the transaction rolls back and the MCP response surfaces the error to the agent.
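The two rules can be sketched without a database. The FakeDataSource below is a hypothetical, synchronous in-memory stand-in for the TypeORM data source (the real dataSource.transaction is async and backed by Postgres), and updateDescriptionsTool is an illustrative batch variant of update_description so the rollback is visible. The read-only tool receives a type that only exposes find, a compile-time stand-in for the read-only Postgres role.

```typescript
interface Product {
  sku: string;
  description: string;
}

interface ToolResult {
  content: { type: 'text'; text: string }[];
  isError?: boolean;
}

interface Tx {
  update(sku: string, description: string): void;
}

// Read-only tools only ever see this narrower interface.
interface ReadOnlySource {
  find(sku: string): Product | undefined;
}

// Hypothetical in-memory stand-in for the shared data source.
class FakeDataSource implements ReadOnlySource {
  private rows: Product[];

  constructor(rows: Product[]) {
    this.rows = rows.map((r) => ({ ...r }));
  }

  find(sku: string): Product | undefined {
    for (const row of this.rows) {
      if (row.sku === sku) return { ...row };
    }
    return undefined;
  }

  // Snapshot the rows, run the work, and restore the snapshot if it throws.
  transaction<T>(work: (tx: Tx) => T): T {
    const snapshot = this.rows.map((r) => ({ ...r }));
    const tx: Tx = {
      update: (sku, description) => {
        const row = this.rows.filter((r) => r.sku === sku)[0];
        if (!row) throw new Error(`No product with SKU ${sku}`);
        row.description = description;
      },
    };
    try {
      return work(tx);
    } catch (err) {
      this.rows = snapshot; // roll back every change made inside the callback
      throw err;
    }
  }
}

// Rule 1: a read-only tool cannot reach transaction() at all.
function getProductTool(source: ReadOnlySource, sku: string): ToolResult {
  const product = source.find(sku);
  if (!product) {
    return { content: [{ type: 'text', text: `No product with SKU ${sku}` }], isError: true };
  }
  return { content: [{ type: 'text', text: JSON.stringify(product) }] };
}

// Rule 2: a write tool does all its work in one transaction and
// reports a failure to the agent as an error result instead of throwing.
function updateDescriptionsTool(ds: FakeDataSource, updates: Product[]): ToolResult {
  try {
    ds.transaction((tx) => {
      for (const u of updates) tx.update(u.sku, u.description);
    });
    return { content: [{ type: 'text', text: `Updated ${updates.length} product(s)` }] };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { content: [{ type: 'text', text: `Update failed: ${message}` }], isError: true };
  }
}
```

Returning isError: true instead of letting the exception escape gives the agent a readable reason for the failure, which it can act on by retrying or asking for a corrected SKU.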

Why not just hit the REST API?

This is the question I get asked most. Why doesn’t the Claude agent just call the existing REST endpoints? Three reasons:

No HTTP overhead. Stdio is local IPC. A 1 ms call replaces a 30 ms HTTP round-trip.
No public surface. The MCP server isn’t network-exposed. There’s no port to firewall, no API key to manage, no rate limit to tune.
Tool semantics > endpoint semantics. A REST endpoint is shaped around HTTP verbs and resource paths. An MCP tool is shaped around what the AI agent needs to do. enrich_batch(skus, strategy: 'descriptions' | 'keywords' | 'all') is a verb. POST /api/products/bulk-enrich is a route. They’re not the same level of abstraction.
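As a rough illustration of the "tool as a verb" point, here is how an enrich_batch call might expand into per-SKU work. The strategy values come from the signature above; parseStrategy, planEnrichBatch, and the EnrichJob shape are hypothetical names for this sketch, not the production pipeline.

```typescript
type Strategy = 'descriptions' | 'keywords' | 'all';
type EnrichStep = 'descriptions' | 'keywords';

interface EnrichJob {
  sku: string;
  step: EnrichStep;
}

// Tool arguments arrive as untyped JSON, so narrow the value at runtime.
function parseStrategy(value: unknown): Strategy {
  if (value === 'descriptions' || value === 'keywords' || value === 'all') {
    return value;
  }
  throw new Error(`Unknown strategy: ${String(value)}`);
}

// Expand one agent-level verb into the per-SKU jobs a pipeline could run.
function planEnrichBatch(skus: string[], strategy: Strategy): EnrichJob[] {
  const steps: EnrichStep[] =
    strategy === 'all' ? ['descriptions', 'keywords'] : [strategy];
  const jobs: EnrichJob[] = [];
  for (const sku of skus) {
    for (const step of steps) {
      jobs.push({ sku, step });
    }
  }
  return jobs;
}
```

The agent states its intent once ("enrich these SKUs with this strategy"), and the fan-out into individual steps stays on the server side.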

Embedded is the right shape when:

The data is private and shouldn’t leave the backend’s process boundary
The agent needs a high-frequency tool surface (think enrichment over thousands of products)
The tools are coupled to your business data model in a way that wouldn’t make sense as a generic toolkit

Two production decisions worth stealing

1. Don’t pass JSON blobs as tool args

The temptation is to make a generic query_product(filter: object) tool that takes an arbitrary filter. Don’t. Define narrow, named tools — find_by_sku, find_by_vendor, list_low_stock — each with explicit args. Claude is dramatically better at picking the right tool than at constructing the right filter blob.
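A small sketch of what "narrow, named tools" can look like in code, using an in-memory array in place of the product table. The sample rows, the CatalogItem fields, and the threshold argument on list_low_stock are assumptions made for the example.

```typescript
interface CatalogItem {
  sku: string;
  vendor: string;
  stock: number;
}

// Hypothetical sample data standing in for the product table.
const sampleCatalog: CatalogItem[] = [
  { sku: 'A-1', vendor: 'acme', stock: 3 },
  { sku: 'A-2', vendor: 'acme', stock: 40 },
  { sku: 'B-1', vendor: 'bolt', stock: 0 },
];

// find_by_sku: exactly one required argument, at most one result.
function findBySku(sku: string): CatalogItem | undefined {
  return sampleCatalog.filter((item) => item.sku === sku)[0];
}

// find_by_vendor: one required argument, zero or more results.
function findByVendor(vendor: string): CatalogItem[] {
  return sampleCatalog.filter((item) => item.vendor === vendor);
}

// list_low_stock: items strictly below the given threshold.
function listLowStock(threshold: number): CatalogItem[] {
  return sampleCatalog.filter((item) => item.stock < threshold);
}

// What the agent sees when choosing: a tool name and a short, fixed argument list.
const narrowToolSignatures: Record<string, string[]> = {
  find_by_sku: ['sku'],
  find_by_vendor: ['vendor'],
  list_low_stock: ['threshold'],
};
```

Each function has one obvious way to call it, so the agent's choice reduces to picking a name rather than assembling a filter object.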

2. Return text content, not raw JSON

Tools return content via { content: [{ type: 'text', text: '...' }] }. The agent reads this text. If you return only raw JSON, the agent has to parse it before it learns anything. If you return a one-line summary followed by the JSON, it can read the summary quickly and parse the detail only when needed:

return {
  content: [
    { type: 'text', text: `Product ${product.sku}: ${product.name} — ${product.stock} in stock, last updated ${product.updatedAt}` },
    { type: 'text', text: JSON.stringify(product, null, 2) },
  ],
};

This is the single biggest tool-quality win I’ve measured. The first text block is a token-cheap summary; the second is the raw data the agent can fall back on.

When standalone, when embedded

	Standalone (Rorsa Tools shape)	Embedded (Heartbeat PIM shape)
Distribution	npm / Claude Code plugin	None — internal to the host backend
Data	Public / no secrets	Private — your business data
Tool design	Generic, reusable	Tied to your domain model
Transport	stdio + optional HTTP	stdio only
Auth	Sometimes a token	Process boundary IS the auth
Cost to ship	Versioned package + docs	Just another module in your backend

If you’re building agentic features for your own product, embed. If you’re shipping tools that other teams should use, go standalone.

Heartbeat Pharmacy Platform — full case study for the embedded MCP server build
Rorsa Tools — full case study for the standalone MCP toolkit