Open Source Maintained 2025–present

Rorsa Tools

Maintainer

An MCP toolkit that lets Claude (and other AI agents) generate, edit, and process images at production scale. 14 tools, batch processing, GPU-accelerated background removal. The toolkit that built this very portfolio's image set.

Stack

TypeScript
Node.js 20+
Vertex AI Nano Banana
Gemini SDK
MCP SDK 1.29
Sharp + png-to-ico
Python + RMBG-2.0
Vitest

The system

Rorsa Tools (repo currently named vertex-img, rebrand in flight) is a published MCP toolkit that lets AI agents (primarily Claude) generate, edit, and process images at production scale. It exposes 14 tools through both an MCP server (so a Claude agent can call them directly) and a CLI binary (so a developer can use the same toolkit from the terminal). Image generation runs on Vertex AI Nano Banana for high-quality and text-heavy work, with Gemini Flash as a faster fallback for thumbnails and simple edits; smart model selection routes each job to the right backend automatically. Image processing uses Sharp for resize, compression, and format conversion. Background removal runs in a Python sidecar process with the RMBG-2.0 model, GPU-accelerated locally. The repository stands at 30 commits and 18 test files. The toolkit generated every pixel-art image on this portfolio.

Architecture

CLI binary: vertex-img (in the repo) / rorsa-img (in the rebrand), Commander-driven
MCP server: vertex-image-mcp binary, exposes 14 tools to AI agents
Image generation: @google/genai 1.50, Vertex AI Nano Banana (pro), Gemini fallback (flash) with smart model selection
Image processing: Sharp 0.34.5 for resize/compress/format conversion, png-to-ico for favicon bundles
Background removal: Python sidecar worker (scripts/bg-removal/worker.py) with RMBG-2.0 (HuggingFace gated model, CC-BY-NC-4.0)
Batch concurrency: Per-item error recovery, retry logic, async batch runner
Distribution: GitHub source + Claude Code plugin manifest (plugin.json)
Tests: 18 test files with Vitest
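
The batch concurrency entry above (per-item error recovery, retry logic, async runner) can be illustrated with a minimal sketch. This is not the code as it ships: runBatch, BatchOptions, and BatchResult are hypothetical names, and the retry policy (a fixed number of immediate attempts, no backoff) is an assumption made to keep the example short.

```typescript
// Hypothetical names throughout; an illustration of the pattern, not the toolkit's API.
type BatchResult<T, R> =
  | { item: T; ok: true; value: R; attempts: number }
  | { item: T; ok: false; error: string; attempts: number };

interface BatchOptions {
  concurrency: number; // maximum number of items in flight at once
  maxAttempts: number; // total tries per item, including the first
}

async function runBatch<T, R>(
  items: readonly T[],
  task: (item: T) => Promise<R>,
  { concurrency, maxAttempts }: BatchOptions,
): Promise<BatchResult<T, R>[]> {
  const results = new Array<BatchResult<T, R>>(items.length);
  const tries = Math.max(1, maxAttempts);
  let next = 0;

  // Each lane claims the next unprocessed index until the queue is empty.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      const item = items[index];
      for (let attempt = 1; ; attempt++) {
        try {
          const value = await task(item);
          results[index] = { item, ok: true, value, attempts: attempt };
          break;
        } catch (err) {
          if (attempt >= tries) {
            // Record the failure for this item only; other items keep running.
            const error = err instanceof Error ? err.message : String(err);
            results[index] = { item, ok: false, error, attempts: attempt };
            break;
          }
        }
      }
    }
  }

  const laneCount = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results; // same order as the input, one entry per item
}
```

Because failures are captured as values rather than thrown, the caller always gets one result per input item and can report the failed ones at the end of the run.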

My contribution

I designed and built the entire toolkit as sole maintainer. The central architectural decision was exposing the same tool surface as both a CLI binary and an MCP server: one codebase, two interfaces. A Claude Code agent calls generate_image directly as an MCP tool; a developer runs the same command from the terminal. The 14-tool MCP surface was designed for composability: generate_image → remove_background → generate_responsive_sets → compress_images is a natural pipeline a Claude agent can execute autonomously without needing me to script it. The Python sidecar for background removal was a forced choice (the RMBG-2.0 model only runs in Python + PyTorch), so I wrote a JSON-over-stdio IPC protocol between the Node.js MCP server and a long-running Python worker, with process health monitoring and automatic respawn if the worker crashes from an OOM on a large image. The batch runner has per-item error recovery, so a single failed image does not kill the whole batch.
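
The "one codebase, two interfaces" idea can be sketched without either real framework. Everything below is a stand-in: the handlers are stubs that only derive file names, callFromMcp and callFromCli are hypothetical adapters (the actual toolkit uses the MCP SDK and Commander), and the argument names prompt and input are assumptions for the example.

```typescript
// Stand-in types; neither Commander nor the MCP SDK appears in this sketch.
type ToolArgs = Record<string, string>;
type ToolHandler = (args: ToolArgs) => Promise<string>; // resolves to an output path

// Stub handlers: they only compute a file name so the sketch runs without any backend.
const tools: Record<string, ToolHandler> = {
  generate_image: async ({ prompt }) => `${slug(prompt ?? "image")}.png`,
  remove_background: async ({ input }) => (input ?? "").replace(/\.png$/, ".nobg.png"),
};

function slug(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-|-$/g, "");
}

// Single dispatch point shared by both interfaces.
async function dispatch(name: string, args: ToolArgs): Promise<string> {
  const handler = tools[name];
  if (!handler) throw new Error(`unknown tool: ${name}`);
  return handler(args);
}

// Interface 1: an MCP-style request object (shape assumed for illustration).
function callFromMcp(request: { name: string; arguments: ToolArgs }): Promise<string> {
  return dispatch(request.name, request.arguments);
}

// Interface 2: CLI-style argv, e.g. ["generate_image", "--prompt", "Pixel cat"].
function callFromCli(argv: string[]): Promise<string> {
  const [name = "", ...rest] = argv;
  const args: ToolArgs = {};
  for (let i = 0; i + 1 < rest.length; i += 2) {
    args[rest[i].replace(/^--/, "")] = rest[i + 1];
  }
  return dispatch(name, args);
}
```

Since both adapters end in the same dispatch call, a step an agent chains (generate_image, then remove_background on its output) behaves identically when a developer runs it by hand.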

Stack details

The smart model selection logic is worth explaining. Vertex AI Nano Banana (the pro model) costs more but gives better quality for text rendering and high-resolution output; Gemini Flash is faster and cheaper for simple edits or thumbnails. The rule: if the image is larger than 1024px or the prompt contains text requirements, use pro; otherwise use flash. For background removal, the Node.js MCP server spawns a Python process that loads RMBG-2.0 from the HuggingFace cache, and the two communicate via a stdin/stdout JSON protocol. The RMBG-2.0 license (CC-BY-NC-4.0) restricts the model to non-commercial use, which is documented in the README. The 18 Vitest test files cover model selection logic, batch runner behaviour, and CLI argument parsing.
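
The model selection rule described above fits in a few lines. This is a sketch under stated assumptions rather than the shipped logic: selectModel, GenerationRequest, and the TEXT_HINTS keyword list are hypothetical, "larger than 1024px" is read as the longest side exceeding 1024, and "text requirements" are approximated with a simple keyword-and-quote heuristic.

```typescript
type ModelTier = "pro" | "flash";

interface GenerationRequest {
  prompt: string;
  width: number; // pixels
  height: number; // pixels
}

// Assumption: a prompt "contains text requirements" if it mentions one of these
// words or includes a quoted string. The real detection may differ.
const TEXT_HINTS = /\b(text|caption|label|logo|headline|typography|lettering)\b|["“”]/i;

function selectModel(req: GenerationRequest): ModelTier {
  // Assumption: size is judged by the longest side.
  const isLarge = Math.max(req.width, req.height) > 1024;
  const needsText = TEXT_HINTS.test(req.prompt);
  return isLarge || needsText ? "pro" : "flash";
}
```

For example, a 512×512 request for "pixel-art cat" would route to flash under these assumptions, while the same prompt at 2048×2048, or any prompt asking for a headline, would route to pro.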

Outcomes

14 MCP tools made directly available to Claude Code agents
Production-grade batch processing with retry logic + per-item error recovery
GPU-accelerated background removal via Python sidecar (RMBG-2.0)
Combined Vertex AI Nano Banana + Gemini with smart model selection

The challenge

The most complex technical challenge was the Python sidecar IPC for GPU background removal. The RMBG-2.0 model runs in Python + PyTorch and there is no Node.js binding, so the MCP server (Node.js) has to communicate with a Python process. The solution is a persistent Python worker process, spawned at MCP server startup, with a JSON-over-stdio protocol for requests and responses. Each remove_background call sends {"image_path": "...", "output_path": "..."} to the Python process’s stdin, and the worker responds with {"success": true} or {"error": "..."}. The tricky part: if the Python process crashes (e.g. OOM from a large image), the MCP server must detect this and respawn the worker without crashing itself. I implemented process health monitoring with automatic respawn and graceful degradation (the call returns a meaningful error instead of an unhandled rejection).
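
The request/response flow and the respawn behaviour can be sketched as follows. The two message shapes come from the description above; everything else is an assumption for illustration: BgWorkerClient and WorkerHandle are hypothetical names, messages are assumed to be newline-delimited, only one request is in flight at a time, and the child process is hidden behind a small interface so the sketch needs no Node.js imports.

```typescript
// Message shapes as described above.
interface RemoveBgRequest { image_path: string; output_path: string }
type RemoveBgResponse = { success: true } | { error: string };

// Hypothetical minimal view of the Python child process (stdin, stdout lines, exit).
interface WorkerHandle {
  write(line: string): void;
  onLine(cb: (line: string) => void): void;
  onExit(cb: (reason: string) => void): void;
}

function parseResponse(line: string): RemoveBgResponse {
  try {
    const msg = JSON.parse(line);
    if (msg && msg.success === true) return { success: true };
    if (msg && typeof msg.error === "string") return { error: msg.error };
  } catch {
    // Not valid JSON: fall through to the generic error below.
  }
  return { error: `malformed worker response: ${line}` };
}

class BgWorkerClient {
  private worker!: WorkerHandle;
  private pending: ((res: RemoveBgResponse) => void) | null = null;
  respawns = 0;

  constructor(private readonly spawnWorker: () => WorkerHandle) {
    this.start();
  }

  private start(): void {
    const worker = this.spawnWorker();
    this.worker = worker;
    worker.onLine((line) => this.settle(parseResponse(line)));
    worker.onExit((reason) => {
      // Fail the in-flight call with a readable error, then start a fresh worker.
      // A production version would likely add backoff to avoid crash loops.
      this.settle({ error: `background-removal worker exited: ${reason}` });
      this.respawns++;
      this.start();
    });
  }

  private settle(res: RemoveBgResponse): void {
    const resolve = this.pending;
    this.pending = null;
    resolve?.(res);
  }

  // Resolves with an error value instead of rejecting, so callers never see
  // an unhandled rejection.
  removeBackground(req: RemoveBgRequest): Promise<RemoveBgResponse> {
    if (this.pending) return Promise.resolve({ error: "worker busy" });
    return new Promise((resolve) => {
      this.pending = resolve;
      try {
        this.worker.write(JSON.stringify(req) + "\n");
      } catch (err) {
        this.settle({ error: `failed to write to worker: ${String(err)}` });
      }
    });
  }
}
```

In a real server the WorkerHandle would wrap a spawned Python process, with the exit callback fed by the process's exit event; injecting it as a factory also makes the crash path easy to exercise in tests with a fake worker.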
