gpt-image-2-mcp

An MCP server that exposes OpenAI's gpt-image-2 (released 2026-04-21) to any MCP client — Claude Desktop, Claude Code, Cursor, MCP Inspector, etc.

Six tools:

Tool	What it does
`generate_image`	text → image
`edit_image`	1–8 reference images (+ optional mask) → image
`start_edit_session`	begin an iterative multi-turn edit
`continue_edit_session`	apply another refinement turn — previous output becomes the new input
`end_edit_session`	release a session
`list_edit_sessions`	show active sessions

Every generated image is saved to disk and returned inline so the calling model sees it.

Requirements

Node.js ≥ 20
An OpenAI API key on an org with gpt-image-2 access (Organization Verification may be required)

Install

pnpm install
pnpm run build

This produces build/index.js, which is the server entry point.

Configure a client

Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "gpt-image-2": {
      "command": "node",
      "args": ["/absolute/path/to/gpt_image_2_mcp/build/index.js"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

Claude Code

Either add the server to ~/.claude.json under mcpServers, using the same shape as the Claude Desktop entry, or place an .mcp.json file in your project root:

{
  "mcpServers": {
    "gpt-image-2": {
      "command": "node",
      "args": ["/absolute/path/to/gpt_image_2_mcp/build/index.js"],
      "env": { "OPENAI_API_KEY": "sk-..." }
    }
  }
}

MCP Inspector (interactive testing)

pnpm run inspect

Launches the official inspector UI pointed at your local build.

Environment variables

Var	Required	Purpose
`OPENAI_API_KEY`	✅	Auth
`OPENAI_BASE_URL`		Override for proxies / enterprise routes
`OPENAI_ORG_ID`		Forwarded as `organization`
`OPENAI_PROJECT_ID`		Forwarded as `project`
`GPT_IMAGE_2_OUTPUT_DIR`		Global default for where images are saved. Absolute paths are used as-is; relative paths are resolved from the current working directory.
`GPT_IMAGE_2_MCP_DEBUG`		Set to `1` to emit verbose debug logs on stderr.
`GPT_IMAGE_2_SESSION_MAX`		Max concurrent in-memory edit sessions, LRU-evicted beyond this (default 20; `0` = no cap).
`GPT_IMAGE_2_SESSION_TTL_MS`		Idle TTL before an edit session is swept (default 3600000 = 1h; `0` = never expire).
`OPENAI_FORCE_RESPONSES_EDITS`		Set to `1` to pin edits to the Responses-API fallback route instead of `/v1/images/edits`. See Edit routing below.
`OPENAI_RESPONSES_EDIT_MODEL`		Host model used by the Responses-API fallback edit route (default `gpt-4.1-mini`). See Edit routing below.
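
As a rough illustration of how the optional variables and their defaults fit together, here is a minimal sketch. The function name `loadConfig` and the returned field names are hypothetical and not taken from the server's source; falling back to the default on unparseable or negative numbers is also an assumption. Only the variable names and default values come from the table above.

```typescript
// Hypothetical config loader: names and fallback behavior are illustrative only.
type Env = Record<string, string | undefined>;

function nonNegativeIntOr(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  // Assumption: invalid or negative values fall back to the documented default.
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : fallback;
}

function loadConfig(env: Env) {
  return {
    debug: env.GPT_IMAGE_2_MCP_DEBUG === "1",
    // 0 means "no cap" for sessionMax and "never expire" for sessionTtlMs.
    sessionMax: nonNegativeIntOr(env.GPT_IMAGE_2_SESSION_MAX, 20),
    sessionTtlMs: nonNegativeIntOr(env.GPT_IMAGE_2_SESSION_TTL_MS, 3_600_000),
    forceResponsesEdits: env.OPENAI_FORCE_RESPONSES_EDITS === "1",
    responsesEditModel: env.OPENAI_RESPONSES_EDIT_MODEL ?? "gpt-4.1-mini",
  };
}
```

Calling `loadConfig(process.env)` once at startup would keep the rest of the code free of direct environment lookups.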

Where images go

Unless overridden, each tool writes to:

<OS config dir>/gpt-image-2-mcp/output/<project-name>-<hash>/

macOS/Linux: ~/.config/gpt-image-2-mcp/output/<project>-<hash>/
Windows: %APPDATA%\gpt-image-2-mcp\output\<project>-<hash>\

<project>-<hash> is derived from the git root (if any) or the current working directory — each project gets its own folder so generations don't collide.

Per-call override: pass output_dir: "/some/path" to any tool.

Filenames look like image-20260422-150301-a1b2c3.png. If you pass filename_prefix: "hero-banner", it becomes image-20260422-150301-a1b2c3-hero-banner.png.
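
The folder and file naming described above can be sketched as follows. This is an illustration under stated assumptions, not the server's actual code: `projectFolderName` and `buildFilename` are hypothetical helpers, the hash algorithm (SHA-256) and its 8-character length are guesses, the timestamp is assumed to use local time, and git-root detection is left out (the caller passes whichever root directory applies).

```typescript
import { createHash } from "node:crypto";
import * as path from "node:path";

// Hypothetical: folder name = sanitized directory name + short hash of the full path,
// so two projects with the same directory name still get separate folders.
function projectFolderName(rootDir: string): string {
  const resolved = path.resolve(rootDir);
  const base = path.basename(resolved).replace(/[^A-Za-z0-9._-]/g, "_") || "root";
  const hash = createHash("sha256").update(resolved).digest("hex").slice(0, 8);
  return `${base}-${hash}`;
}

// Hypothetical: image-<YYYYMMDD>-<HHMMSS>-<id>[-<filename_prefix>].<ext>
// Note that the filename_prefix value lands after the id, as in the example above.
function buildFilename(now: Date, id: string, prefix?: string, ext = "png"): string {
  const pad = (n: number): string => String(n).padStart(2, "0");
  const date = `${now.getFullYear()}${pad(now.getMonth() + 1)}${pad(now.getDate())}`;
  const time = `${pad(now.getHours())}${pad(now.getMinutes())}${pad(now.getSeconds())}`;
  const suffix = prefix ? `-${prefix}` : "";
  return `image-${date}-${time}-${id}${suffix}.${ext}`;
}
```

Hashing the full path rather than the directory name alone is what keeps `/work/a/site` and `/work/b/site` from sharing an output folder.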

What the tools return

Every tool result contains:

An inline ImageContent block per generated image (so the LLM sees the image)
A text summary: applied settings, file path, token usage, estimated cost
structuredContent for programmatic consumers:

{
  "model": "gpt-image-2",
  "prompt": "…",
  "requested": { "size": "auto", "quality": "auto", "n": 1, "format": "png" },
  "applied":   { "size": "1024x1024", "quality": "high", "background": "opaque", "output_format": "png" },
  "images": [ { "file_path": "…", "filename": "…", "size_bytes": 123456, "mime_type": "image/png" } ],
  "usage":   { "input_tokens": …, "output_tokens": …, "total_tokens": …, "input_tokens_details": { … } },
  "cost_usd_estimated": 0.2112
}

Session tools additionally return session_id and turn.
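
For programmatic consumers, the shape above could be typed roughly as below. The interface names and the `summarize` helper are hypothetical, and which fields are optional is an assumption inferred from the example payload rather than from the server's schema.

```typescript
// Hypothetical typings inferred from the example payload; optionality is assumed.
interface SavedImage {
  file_path: string;
  filename: string;
  size_bytes: number;
  mime_type: string;
}

interface ImageToolResult {
  model: string;
  prompt: string;
  requested: Record<string, unknown>;
  applied: Record<string, unknown>;
  images: SavedImage[];
  usage?: { input_tokens: number; output_tokens: number; total_tokens: number };
  cost_usd_estimated?: number;
  session_id?: string; // session tools only
  turn?: number;       // session tools only
}

// Build a one-line summary from a parsed structuredContent object.
function summarize(result: ImageToolResult): string {
  const paths = result.images.map((image) => image.file_path).join(", ");
  const cost =
    result.cost_usd_estimated === undefined
      ? "unknown"
      : `$${result.cost_usd_estimated.toFixed(4)}`;
  return `${result.images.length} image(s): ${paths} (est. ${cost})`;
}
```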

Sizes

Default is auto (the model picks). You can pass:

A preset: 1024x1024, 1536x1024, 1024x1536
Any custom WxH where:
- Both edges are multiples of 16
- Max edge ≤ 3840px (outputs above 2K are beta)
- Aspect ratio between 1:3 and 3:1
- Total pixels between 655,360 and 8,294,400

Invalid sizes fail before the API call with a clear error — no wasted requests.
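
The size rules above translate into a small pre-flight check. The sketch below is one possible way to write it; `validateSize` and its error messages are hypothetical, and treating the limits as inclusive is an assumption. It returns `null` for a valid size and an error string otherwise.

```typescript
const SIZE_PRESETS = new Set(["auto", "1024x1024", "1536x1024", "1024x1536"]);

// Hypothetical validator mirroring the documented constraints.
function validateSize(size: string): string | null {
  if (SIZE_PRESETS.has(size)) return null;

  const match = /^(\d+)x(\d+)$/.exec(size);
  if (!match) return `size must be "auto" or WxH, got "${size}"`;

  const width = Number(match[1]);
  const height = Number(match[2]);

  if (width % 16 !== 0 || height % 16 !== 0) {
    return "both edges must be multiples of 16";
  }
  if (Math.max(width, height) > 3840) {
    return "longest edge must be at most 3840px";
  }
  // Integer comparison avoids floating-point issues at exactly 1:3 and 3:1.
  if (width > 3 * height || height > 3 * width) {
    return "aspect ratio must be between 1:3 and 3:1";
  }
  const pixels = width * height;
  if (pixels < 655_360 || pixels > 8_294_400) {
    return "total pixels must be between 655,360 and 8,294,400";
  }
  return null;
}
```

For instance, `3840x2160` passes every rule (it sits exactly on the pixel ceiling), while `512x512` fails on the pixel floor and `1000x1000` fails the multiple-of-16 rule.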

background: "transparent" is NOT supported by gpt-image-2. Use a model that supports it if you need alpha.

Iterative editing example

start_edit_session    prompt: "A coastal lighthouse at dawn, photorealistic", images: ["./sketch.png"]
  → session_id: edit-1761149123-a1b2c3d4, turn 1, saved to …/session-…-turn1.png

continue_edit_session session_id: "edit-…-a1b2c3d4", prompt: "Make the sky more orange. Keep everything else the same."
  → turn 2

continue_edit_session session_id: "edit-…-a1b2c3d4", prompt: "Add a small boat on the horizon."
  → turn 3

end_edit_session      session_id: "edit-…-a1b2c3d4"

Sessions are in-memory only and discarded on server restart — this is intentional (keeps the server stateless on the wire) and mirrors the Gemini MCP pattern.
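
To make the in-memory behavior concrete, here is a minimal sketch of a session store with an LRU cap and an idle TTL, matching the defaults of `GPT_IMAGE_2_SESSION_MAX` and `GPT_IMAGE_2_SESSION_TTL_MS`. The class and method names are hypothetical, and the real server may track more state per session (such as the previous output image).

```typescript
interface EditSession {
  id: string;
  turn: number;
  lastUsed: number; // epoch milliseconds
}

// Hypothetical store: a Map keeps insertion order, which doubles as LRU order.
class SessionStore {
  private readonly sessions = new Map<string, EditSession>();
  private readonly max: number;
  private readonly ttlMs: number;

  constructor(max = 20, ttlMs = 3_600_000) {
    this.max = max;     // 0 = no cap
    this.ttlMs = ttlMs; // 0 = never expire
  }

  // Create the session on first use, otherwise advance its turn counter.
  touch(id: string, now: number = Date.now()): EditSession {
    this.sweep(now);
    const previous = this.sessions.get(id);
    const session: EditSession = { id, turn: (previous?.turn ?? 0) + 1, lastUsed: now };
    // Re-inserting moves the key to the end, so the first key is the least recently used.
    this.sessions.delete(id);
    this.sessions.set(id, session);
    while (this.max > 0 && this.sessions.size > this.max) {
      const oldest = this.sessions.keys().next().value;
      if (oldest === undefined) break;
      this.sessions.delete(oldest);
    }
    return session;
  }

  // Drop sessions that have been idle longer than the TTL.
  sweep(now: number = Date.now()): void {
    if (this.ttlMs === 0) return;
    for (const [id, session] of this.sessions) {
      if (now - session.lastUsed > this.ttlMs) this.sessions.delete(id);
    }
  }

  has(id: string): boolean {
    return this.sessions.has(id);
  }

  get size(): number {
    return this.sessions.size;
  }
}
```

Because nothing is written to disk, restarting the process simply starts with an empty map.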

Image inputs for `edit_image` and `start_edit_session`

Accepts any mix of:

Absolute path: /Users/me/photo.png
Relative path: ./photo.png (resolved from CWD)
file:///Users/me/photo.png
https://example.com/photo.png (downloaded, size-capped)
data:image/png;base64,iVBOR…

Up to 8 images per call, each ≤ 50MB. Supported formats: PNG, WebP, and JPEG.
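
The accepted input forms can be told apart with a small classifier before any file is read or downloaded. The sketch below is illustrative only: `classifyImageInput` and the `ImageSource` type are hypothetical names, and the actual server may order or validate these cases differently. Size caps and format checks are left out.

```typescript
import * as path from "node:path";
import { fileURLToPath } from "node:url";

type ImageSource =
  | { kind: "data"; mimeType: string; base64: string }
  | { kind: "remote"; url: string }
  | { kind: "file"; path: string };

// Hypothetical classifier for the five documented input forms.
function classifyImageInput(input: string, cwd: string = process.cwd()): ImageSource {
  const dataMatch = /^data:(image\/[a-z0-9.+-]+);base64,(.+)$/i.exec(input);
  if (dataMatch) {
    const [, mimeType = "", base64 = ""] = dataMatch;
    return { kind: "data", mimeType, base64 };
  }
  if (/^https?:\/\//i.test(input)) {
    return { kind: "remote", url: input };
  }
  if (input.startsWith("file://")) {
    return { kind: "file", path: fileURLToPath(input) };
  }
  // Absolute paths pass through unchanged; relative paths resolve against cwd.
  return { kind: "file", path: path.resolve(cwd, input) };
}
```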

Cost guardrails

The server ships no hard spending limits — you should watch your OpenAI usage dashboard. Each tool result includes an estimated cost in USD computed from the token usage returned by the API, plus an approximate pre-flight estimate logged to stderr.

Rough per-image cost at common sizes:

Quality	1024×1024	1024×1536 / 1536×1024
low	~$0.006	~$0.005
medium	~$0.053	~$0.041
high	~$0.211	~$0.165

Custom sizes scale with pixel count. Edit calls additionally tokenize input images at high fidelity — large reference images are expensive.

Edit routing

edit_image, start_edit_session, and continue_edit_session call POST /v1/images/edits directly. This is the canonical endpoint: it supports n > 1, masks, and returns accurate per-call token usage for cost estimation.

History: at launch (2026-04-21) the endpoint rejected gpt-image-2 (and gpt-image-1.5) with 400 Invalid value: 'gpt-image-2'. Value must be 'dall-e-2'. — an OpenAI-side bug. Versions ≤ 0.2.0 of this server therefore routed edits through the Responses API by default. OpenAI fixed the endpoint silently in early May 2026 (verified live 2026-06-11), and since 0.3.0 the direct endpoint is the default again.

The Responses-API workaround is kept as a fallback (src/utils/edit-via-responses.ts):

It engages automatically if the direct endpoint ever returns the launch-era 400 again. The error is matched narrowly, and the rejection is remembered for 10 minutes, so only the first call in that window pays for the failed attempt. Once the window expires, the direct endpoint is re-probed.
Set OPENAI_FORCE_RESPONSES_EDITS=1 to pin it explicitly.
The legacy OPENAI_USE_DIRECT_EDITS toggle from 0.2.0 is deprecated and ignored (its only meaningful setting was 1 — opt into the direct endpoint, which is now the default).
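
The route selection described in this list can be sketched as a tiny state machine. Everything here is an illustration: `EditRouter`, its method names, and the exact substring used to recognize the launch-era error are assumptions, not the contents of src/utils/edit-via-responses.ts.

```typescript
type EditRoute = "direct" | "responses";

const REJECTION_MEMORY_MS = 10 * 60 * 1000; // remember the 400 for 10 minutes

// Hypothetical router: decides which edit route the next call should use.
class EditRouter {
  private rejectedAt: number | null = null;
  private readonly forceResponses: boolean;

  constructor(forceResponses = false) {
    this.forceResponses = forceResponses; // e.g. OPENAI_FORCE_RESPONSES_EDITS=1
  }

  pick(now: number = Date.now()): EditRoute {
    if (this.forceResponses) return "responses";
    const remembered =
      this.rejectedAt !== null && now - this.rejectedAt < REJECTION_MEMORY_MS;
    // After the window expires, "direct" is returned again, which re-probes the endpoint.
    return remembered ? "responses" : "direct";
  }

  // Returns true if the failure is the launch-era rejection and the caller should
  // retry via the fallback. Assumption: matching on status 400 plus this substring.
  recordFailure(status: number, message: string, now: number = Date.now()): boolean {
    const isLaunchEraRejection =
      status === 400 && message.includes("Value must be 'dall-e-2'");
    if (isLaunchEraRejection) this.rejectedAt = now;
    return isLaunchEraRejection;
  }
}
```

Matching narrowly matters here: an unrelated 400 (for example, a bad size) should surface to the caller rather than silently switch routes.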

Fallback mechanics: input images are uploaded via the Files API (purpose: "vision"), a cheap host model (default gpt-4.1-mini, override with OPENAI_RESPONSES_EDIT_MODEL) is forced to invoke the image_generation tool, the base64 result is extracted, and uploaded files are deleted afterwards.

Fallback trade-offs versus the direct endpoint. These apply only when the fallback is active; in that case the tool result carries route: "responses" and a note:

n > 1 is not supported — the Responses path returns one image per call.
Cost accounting undercounts — usage only reports the host chat model's text tokens; the image tool is billed separately (~$0.04–0.05 extra for a 1024×1536 medium edit).
Masks still work — uploaded and referenced via input_image_mask.file_id.

Troubleshooting

"OPENAI_API_KEY is not set" — add it to the env block of your MCP config.
403 / organization verification — gpt-image-2 may require Organization Verification on your OpenAI org. Check the dashboard.
429 — you hit the IPM (images per minute) cap for your tier. Lower n, or wait.
Image doesn't appear in the client — check the file path in the text block; the image is saved regardless of inline display.
Protocol disconnects silently — something printed to stdout. Check src/**/*.ts — all logs must use utils/logger.ts (stderr). This is the single biggest MCP footgun.
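
To illustrate the last point: on a stdio transport, stdout carries the protocol messages, so diagnostics have to go to stderr. The sketch below shows the general idea only; it is not the project's utils/logger.ts, and the `log` and `formatLog` names and the line format are hypothetical.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

function formatLog(level: LogLevel, message: string): string {
  return `[gpt-image-2-mcp] ${level.toUpperCase()} ${message}`;
}

// Writes to stderr only. Returns false when a debug line is suppressed.
function log(
  level: LogLevel,
  message: string,
  env: Record<string, string | undefined> = process.env,
): boolean {
  if (level === "debug" && env.GPT_IMAGE_2_MCP_DEBUG !== "1") return false;
  // Never console.log here: anything on stdout can corrupt the protocol stream.
  process.stderr.write(formatLog(level, message) + "\n");
  return true;
}
```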

Development

pnpm run dev         # tsx watch
pnpm run typecheck   # tsc --noEmit
pnpm run build       # compile to build/
pnpm run inspect     # launch MCP Inspector

License

MIT