omni-ai-mcp

The complete AI bridge for Claude Code — Gemini's exclusive capabilities (video, TTS, 1M context, RAG, Deep Research) plus 400+ models via OpenRouter. One MCP server, every AI model, zero friction.

Why This Exists

Claude is exceptional at reasoning and code generation. But sometimes you need more:

A second opinion from a different AI model (GPT-4o, Llama, Mistral, Claude via OpenRouter)
Real-time web search with Google grounding and source citations
Autonomous deep research that runs for minutes and produces structured reports from 40+ sources
Video generation with Veo 3.1, including natively generated audio (dialogue, effects, ambient sound)
Image generation with Gemini 3 Pro Image (Nano Banana Pro) up to 4K resolution
Text-to-speech with 30 natural voices and multi-speaker support
RAG for querying your own documents with citations
Large codebase analysis with Gemini's 1M token context window
Multi-turn conversations with cloud persistence (55-day retention, resume from any device)
Access to 400+ models through one unified interface

omni-ai-mcp bridges Claude Code with Google Gemini and OpenRouter, enabling Claude to orchestrate any AI model as a tool.

What's New in v4.4.0

All model defaults are now aligned with the latest Gemini IDs, verified live against the Gemini API:

Text Flash → gemini-3.5-flash · Flash-Lite → gemini-3.1-flash-lite
Image → gemini-3-pro-image (Nano Banana Pro) / gemini-3.1-flash-image (Nano Banana 2)
TTS → gemini-3.1-flash-tts-preview
Deep Research → deep-research-preview-04-2026 (fixes the previous 404 NOT_FOUND agent)

Every default stays overridable via the GEMINI_MODEL_* environment variables.

Multi-Provider: Gemini + OpenRouter

Quick start

# Ask any of 400+ models — auto-routes from model name
ask_model("Explain quantum computing", model="openai/gpt-4o")
ask_model("Write a poem", model="meta-llama/llama-3.3-70b-instruct")
ask_model("Review this code", model="gemini-3.1-pro-preview")  # auto-routes to Gemini native API

# If no Gemini key but OpenRouter key exists, Gemini models route via OpenRouter automatically
ask_model("Summarize this", model="gemini-3.5-flash")  # -> google/ prefix on OpenRouter

# Discover all available models
gemini_list_models()

Dynamic Model Registry

Model IDs are no longer hardcoded. The server discovers the available models at runtime and resolves each alias to the newest one it finds. If a model is deprecated, the registry falls back to the next available option.

# Override via env vars if needed:
export GEMINI_MODEL_PRO=gemini-3.1-pro-preview
export OPENROUTER_DEFAULT_MODEL=openai/gpt-4o
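The resolution order described above (environment override first, then the first candidate that is still available) can be sketched as follows. This is a minimal illustration, not the server's registry code: `resolve_model`, the candidate list, and the `available` set are hypothetical names, and only the `GEMINI_MODEL_*` variable naming comes from this README.

```python
import os
from typing import Iterable, Mapping, Optional, Set


def resolve_model(
    alias: str,
    candidates: Iterable[str],
    available: Set[str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick a model ID for an alias such as "pro" or "flash" (hypothetical helper)."""
    env = os.environ if env is None else env

    # 1. An explicit GEMINI_MODEL_<ALIAS> override always wins.
    override = env.get(f"GEMINI_MODEL_{alias.upper()}")
    if override:
        return override

    # 2. Otherwise take the first preferred candidate that is still available.
    for model_id in candidates:
        if model_id in available:
            return model_id

    raise LookupError(f"no available model for alias {alias!r}")


# Example: the preferred Pro model is missing from the discovered set,
# so resolution falls back to the next candidate in the list.
discovered = {"gemini-3.5-flash", "gemini-3.1-flash-lite"}
fallback = resolve_model(
    "pro", ["gemini-3.1-pro-preview", "gemini-3.5-flash"], discovered, env={}
)
```

Passing `env` explicitly keeps the function easy to test; in normal use it would read the process environment.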

Smart Routing Rules

Explicit Gemini model + GEMINI_API_KEY -> always Gemini native API (fastest, cheapest)
Gemini model + no Gemini key + OPENROUTER_API_KEY -> OpenRouter google/ prefix (automatic fallback)
veo-*, imagen-*, deep-research-* models -> Gemini native only (no OpenRouter equivalent)
OpenRouter model (openai/, meta-llama/, etc.) -> OpenRouter (requires OPENROUTER_API_KEY)
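The four rules above can be read as a small decision function. The sketch below is an illustration under stated assumptions: `route_model` and its return shape are hypothetical, and the real server may apply further checks (for example, how it treats a `google/` model ID when a Gemini key is present).

```python
from typing import Optional, Tuple

# Model families the rules list as Gemini-native only.
NATIVE_ONLY_PREFIXES = ("veo-", "imagen-", "deep-research-")


def route_model(
    model: str, gemini_key: Optional[str], openrouter_key: Optional[str]
) -> Tuple[str, str]:
    """Return (provider, model_id_to_send) following the routing rules above."""
    if model.startswith(NATIVE_ONLY_PREFIXES):
        # No OpenRouter equivalent exists, so a Gemini key is mandatory.
        if not gemini_key:
            raise RuntimeError(f"{model} requires GEMINI_API_KEY")
        return ("gemini", model)

    if model.startswith("gemini-"):
        if gemini_key:
            return ("gemini", model)
        if openrouter_key:
            # Automatic fallback: same model, addressed with the google/ prefix.
            return ("openrouter", "google/" + model)
        raise RuntimeError("no API key available for a Gemini model")

    if "/" in model:
        # Provider-prefixed IDs such as openai/... or meta-llama/...
        if not openrouter_key:
            raise RuntimeError(f"{model} requires OPENROUTER_API_KEY")
        return ("openrouter", model)

    raise ValueError(f"cannot determine a provider for {model!r}")
```

With only an OpenRouter key set, `route_model("gemini-3.5-flash", None, "or-key")` yields `("openrouter", "google/gemini-3.5-flash")`, which matches the fallback shown in the quick start.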

PyPI Distribution

pip install omni-ai-mcp
omni-ai-mcp-setup  # interactive setup wizard

Claude Desktop Extension (.dxt)

Install with one click — no Python setup required:

Download omni-ai-mcp-vX.Y.Z.dxt from GitHub Releases
Double-click the file (macOS/Windows) or drag it into Claude Desktop
Enter your Gemini API key when prompted (OpenRouter key is optional)
Done — all 20 tools are immediately available in Claude Desktop

The .dxt bundle includes all Python dependencies — users don't need to install anything else.

20 Tools

Multi-Provider

Tool	Description
`ask_model`	Ask any AI: Gemini or 400+ models via OpenRouter — auto-routes from model name
`gemini_list_models`	Live model discovery: Gemini registry + OpenRouter catalog, deprecation warnings

Text & Reasoning

Tool	Description	Model
`ask_gemini`	Text generation with thinking mode, multi-turn, dual storage (local/cloud)	Gemini 3.1 Pro
`gemini_code_review`	Security, performance, and quality analysis	Gemini 3.1 Pro
`gemini_brainstorm`	Creative ideation with 6 methodologies (SCAMPER, TRIZ, etc.)	Gemini 3.1 Pro
`gemini_challenge`	Devil's advocate — find flaws in ideas, plans, and code	Gemini 3.1 Pro

Code

Tool	Description	Model
`gemini_analyze_codebase`	Whole-codebase analysis up to 1M tokens / 5MB	Gemini 3.1 Pro
`gemini_generate_code`	Structured code generation with dry-run preview and XML output	Gemini 3.1 Pro

Research & Web

Tool	Description	Model
`gemini_web_search`	Real-time search with Google grounding & citations	Gemini 3.5 Flash
`gemini_deep_research`	Autonomous 5-60 min research, 40+ sources, structured report	Deep Research Agent

RAG

Tool	Description
`gemini_file_search`	Query documents with citations
`gemini_create_file_store`	Create document stores
`gemini_upload_file`	Upload PDF, DOCX, code, etc.
`gemini_list_file_stores`	List available stores

Media (Gemini exclusive)

Tool	Description	Model
`gemini_analyze_image`	Vision: describe, OCR, Q&A on images	Gemini 3.5 Flash
`gemini_generate_image`	Image generation, up to 4K resolution	Gemini 3 Pro Image
`gemini_generate_video`	Veo 3.1 — 4-8s with native audio (dialogue, effects, ambient)	Veo 3.1
`gemini_text_to_speech`	30 natural voices, multi-speaker dialogue	Gemini 3.1 Flash TTS

Conversation

Tool	Description
`gemini_list_conversations`	List history: title, mode, turns, last activity
`gemini_delete_conversation`	Delete by ID or title (partial match)

Quick Start

Prerequisites

Python 3.9+
Claude Code CLI
Gemini API key (free from Google AI Studio)
(Optional) OpenRouter API key — openrouter.ai for 400+ models

Install from PyPI (Recommended)

pip install omni-ai-mcp
omni-ai-mcp-setup

The setup wizard configures Claude Code automatically.

Install from Source

git clone https://github.com/marmyx77/omni-ai-mcp.git
cd omni-ai-mcp

# Gemini only
./setup.sh YOUR_GEMINI_API_KEY

# Gemini + OpenRouter (400+ models)
./setup.sh YOUR_GEMINI_API_KEY YOUR_OPENROUTER_KEY

Restart Claude Code. Verify:

claude mcp list
# omni-ai-mcp: Connected

Manual Install

pip install 'mcp[cli]>=1.0.0' 'google-genai>=2.0.0' pydantic defusedxml filelock

mkdir -p ~/.claude-mcp-servers/omni-ai-mcp
cp -r app/ run.py pyproject.toml ~/.claude-mcp-servers/omni-ai-mcp/

claude mcp add omni-ai-mcp --scope user \
  -e GEMINI_API_KEY=YOUR_KEY \
  -e OPENROUTER_API_KEY=YOUR_OR_KEY \
  -- python3 ~/.claude-mcp-servers/omni-ai-mcp/run.py

Usage Examples

Multi-Model AI Orchestration

"Ask GPT-4o to review this authentication function"
-> ask_model(model="openai/gpt-4o", prompt="review this auth function...")

"Compare how Gemini and Llama respond to this design question"
-> ask_model(model="gemini-3.1-pro-preview", ...)
-> ask_model(model="meta-llama/llama-3.3-70b-instruct", ...)

"Get a Mistral opinion on this French legal document"
-> ask_model(model="mistralai/mistral-large-2512", ...)

Conversations with Memory

Gemini remembers previous context across calls via continuation_id:

# First turn
"Ask Gemini to analyze @src/auth.py for security issues"
# Returns: continuation_id: abc-123

# Follow-up — Gemini remembers the previous analysis
"Ask Gemini (continuation_id: abc-123) how to fix the SQL injection"

Dual Storage Mode

Mode	Storage	Retention	Use
`local` (default)	SQLite	3h (configurable)	Development, quick chats
`cloud`	Google Interactions API	55 days	Long projects, cross-device
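To make the local-mode limits concrete, here is a rough sketch of how a 3-hour expiry and a 50-turn cap could be enforced on a conversation thread. The `Thread` class is hypothetical and kept in memory for brevity; the actual server stores local conversations in SQLite, and only the two default values are taken from this README.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List

TTL_HOURS = 3    # mirrors the GEMINI_CONVERSATION_TTL_HOURS default
MAX_TURNS = 50   # mirrors the GEMINI_CONVERSATION_MAX_TURNS default


@dataclass
class Thread:
    """Hypothetical in-memory stand-in for a locally stored conversation."""

    continuation_id: str
    last_activity: datetime
    turns: List[str] = field(default_factory=list)

    def is_expired(self, now: datetime) -> bool:
        # A thread expires once it has been idle for longer than the TTL.
        return now - self.last_activity > timedelta(hours=TTL_HOURS)

    def add_turn(self, text: str, now: datetime) -> None:
        if self.is_expired(now):
            raise ValueError("conversation expired; start a new one")
        if len(self.turns) >= MAX_TURNS:
            raise ValueError("turn limit reached")
        self.turns.append(text)
        self.last_activity = now  # each turn resets the idle clock
```

In this sketch the expiry is measured from the last activity rather than from creation; whether the server does the same is an assumption.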

# Start a named cloud conversation
"Ask Gemini (mode=cloud, title='Architecture Review'): Analyze my microservices design"
# Returns: continuation_id: int_v1_abc123...

# Resume from any device within 55 days
"Ask Gemini (continuation_id: int_v1_abc123...): What about the database layer?"

Deep Research

Autonomous research agent that runs 5-60 minutes:

"Deep research: Compare AI agent frameworks in 2025 — LangGraph, AutoGen, CrewAI"

The agent will:

Plan a comprehensive research strategy
Execute multiple targeted web searches
Synthesize findings from 40+ sources
Produce a structured report with citations

Use cases: market research, competitive analysis, technical deep dives, trend analysis, literature reviews.

Codebase Analysis

Leverage Gemini's 1M token context to analyze entire codebases at once:

"Analyze codebase src/**/*.py with focus on security"
"Analyze codebase ['app/', 'tests/'] — find architecture issues"

Analysis types: architecture, security, refactoring, documentation, dependencies, general

@File References

Include file contents directly in prompts:

"Ask Gemini to review @src/auth.py for security issues"
"Brainstorm improvements for @README.md"
"Code review @*.py with focus on performance"
"Analyze codebase @src/**/*.ts"

Supported patterns: @file.py, @src/main.py, @*.py, @src/**/*.ts, @. (directory listing)
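As an illustration of how the patterns above might be expanded, the sketch below finds `@` references in a prompt and resolves them with `pathlib` globbing. `expand_refs` is a hypothetical helper, not the server's implementation; in particular it does no sandbox or size checks and would also match things like e-mail addresses.

```python
import re
from pathlib import Path
from typing import List

# "@" followed by a run of non-whitespace characters, e.g. @src/**/*.ts
AT_REF = re.compile(r"@(\S+)")


def expand_refs(prompt: str, root: Path) -> List[Path]:
    """Return the paths referenced by @patterns in the prompt, relative to root."""
    found: List[Path] = []
    for pattern in AT_REF.findall(prompt):
        if pattern == ".":
            # "@." stands for a listing of the root directory itself.
            hits = sorted(root.iterdir())
        else:
            # Path.glob handles plain paths, "*" wildcards and recursive "**".
            hits = sorted(p for p in root.glob(pattern) if p.is_file())
        for path in hits:
            if path not in found:  # keep first occurrence, drop duplicates
                found.append(path)
    return found
```

A caller would then read each returned file and splice its contents into the prompt, subject to the size and sandbox limits listed under Configuration.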

Video Generation

"Generate a video of ocean waves at sunset, seagulls flying, sound of waves and wind"

Duration: 4-8 seconds
Resolution: 720p or 1080p (1080p requires an 8-second clip)
Native audio: dialogue, sound effects, ambient sounds
For dialogue: use quotes ("Hello," she said)
For sounds: describe explicitly (engine roaring, birds chirping)
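The duration and resolution constraints listed above can be checked before a request is sent. This is a small sketch with a hypothetical function name; the tool's real parameter names and error messages may differ.

```python
def validate_video_request(duration_s: int, resolution: str = "720p") -> None:
    """Raise ValueError if the request violates the limits listed above."""
    if not 4 <= duration_s <= 8:
        raise ValueError("duration must be between 4 and 8 seconds")
    if resolution not in ("720p", "1080p"):
        raise ValueError("resolution must be 720p or 1080p")
    if resolution == "1080p" and duration_s != 8:
        raise ValueError("1080p is only available for 8-second clips")


validate_video_request(6, "720p")    # accepted
validate_video_request(8, "1080p")   # accepted
```

Failing fast on the client side avoids waiting on a generation job that would be rejected anyway.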

Image Generation

"Generate an image of a futuristic Tokyo street at night, neon lights reflecting on wet pavement,
cinematic, shot on 35mm lens"

Resolution: up to 4K with Pro model
Aspect ratios: 1:1, 16:9, 9:16, 3:2, 4:5, and more
Use descriptive sentences, not keyword lists

Text-to-Speech

"Convert this article to speech using the Charon voice (informative, neutral)"

30 available voices — Bright: Zephyr, Autonoe / Upbeat: Puck, Laomedeia / Informative: Charon, Rasalgethi / Warm: Sulafat, Vindemiatrix / and 22 more.

Multi-speaker dialogue:

gemini_text_to_speech(
    text="Host: Welcome!\nGuest: Thanks for having me!",
    speakers=[
        {"name": "Host", "voice": "Charon"},
        {"name": "Guest", "voice": "Aoede"}
    ]
)
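A quick pre-flight check can catch transcript lines whose speaker has no configured voice. The helper below is a sketch with a hypothetical name (`assign_voices`); it assumes each line has the form `Name: text`, as in the call above, and says nothing about how the server itself parses the transcript.

```python
from typing import Dict, List, Tuple


def assign_voices(text: str, speakers: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Map each 'Name: line' of the transcript to (voice, spoken text)."""
    voice_by_name = {s["name"]: s["voice"] for s in speakers}
    turns: List[Tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines between turns
        name, sep, spoken = line.partition(":")
        if not sep or name.strip() not in voice_by_name:
            raise ValueError(f"no voice configured for line: {line!r}")
        turns.append((voice_by_name[name.strip()], spoken.strip()))
    return turns


turns = assign_voices(
    "Host: Welcome!\nGuest: Thanks for having me!",
    [{"name": "Host", "voice": "Charon"}, {"name": "Guest", "voice": "Aoede"}],
)
# turns == [("Charon", "Welcome!"), ("Aoede", "Thanks for having me!")]
```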

Image Analysis

"Analyze this screenshot and extract all visible text: @screenshot.png"
"Describe what's in this diagram and explain the architecture: @diagram.png"

Supported formats: PNG, JPG, JPEG, GIF, WEBP

RAG (Document Search)

# 1. Create a store
"Create a Gemini file store called 'project-docs'"

# 2. Upload files
"Upload the API specification PDF to the project-docs store"

# 3. Query
"Search the project-docs store: What are the rate limits?"

Challenge Tool

Get critical analysis before implementing — find flaws early:

"Challenge this plan with focus on security:
We'll store user sessions in localStorage and use MD5 for passwords"

The tool acts as a Devil's Advocate — it will NOT agree with you.
Focus areas: general, security, performance, maintainability, scalability, cost

Code Generation

"Generate a Python FastAPI endpoint for JWT authentication with refresh tokens"

Output is structured XML that Claude can apply directly:

<GENERATED_CODE>
<FILE action="create" path="src/auth.py">
# Complete implementation here...
</FILE>
</GENERATED_CODE>

Options: dry_run=true to preview without writing, language, style (production/prototype/minimal), output_dir
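For readers who want to consume this output themselves, the sketch below parses the XML shape shown above into (action, path, content) tuples. It uses the standard-library parser only to stay self-contained; the project lists `defusedxml` as a dependency and sanitizes LLM output before parsing, so treat this as an illustration for trusted input. `parse_generated_code` is a hypothetical name, and real generated code containing `<` or `&` would need escaping or CDATA handling that is not shown here.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def parse_generated_code(xml_text: str) -> List[Tuple[str, str, str]]:
    """Extract (action, path, content) for every FILE element."""
    root = ET.fromstring(xml_text)
    if root.tag != "GENERATED_CODE":
        raise ValueError("expected a <GENERATED_CODE> root element")
    files = []
    for node in root.findall("FILE"):
        files.append(
            (node.get("action", ""), node.get("path", ""), (node.text or "").strip())
        )
    return files


sample = """<GENERATED_CODE>
<FILE action="create" path="src/auth.py">
# Complete implementation here...
</FILE>
</GENERATED_CODE>"""

files = parse_generated_code(sample)
# files == [("create", "src/auth.py", "# Complete implementation here...")]
```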

Thinking Mode

"Ask Gemini with high thinking level:
Design an optimal database schema for a social media platform at scale"

Levels: off (default), low (fast reasoning), high (deep analysis)

Model Selection

Text Models

Alias	Resolved Model	Best For
`pro`	`gemini-3.1-pro-preview`	Complex reasoning, coding, analysis
`flash`	`gemini-3.5-flash`	Balanced speed/quality
`fast` / `flash-lite`	`gemini-3.1-flash-lite`	High-volume, simple tasks

Models are resolved dynamically at runtime — if a model is deprecated, the registry automatically falls back to the next available option.

OpenRouter Models (via `ask_model`)

Provider	Example Model ID	Notes
OpenAI	`openai/gpt-4o`	GPT-4o, o3, o4-mini
Meta	`meta-llama/llama-3.3-70b-instruct`	Open source, fast
Anthropic	`anthropic/claude-3.5-sonnet`	Claude via OpenRouter
Mistral	`mistralai/mistral-large-2512`	Strong on EU languages
Google	`google/gemini-3.1-pro-preview`	Gemini via OpenRouter (fallback)
340+ more	—	`gemini_list_models()` to browse

Configuration

All settings via environment variables:

Variable	Default	Description
`GEMINI_API_KEY`	required	Google Gemini API key
`OPENROUTER_API_KEY`	—	OpenRouter key (enables `ask_model` for 400+ models)
`GEMINI_MODEL_PRO`	`gemini-3.1-pro-preview`	Override Pro text model
`GEMINI_MODEL_FLASH`	`gemini-3.5-flash`	Override Flash text model (static fallback)
`GEMINI_MODEL_DEEP_RESEARCH`	`deep-research-preview-04-2026`	Override research agent
`OPENROUTER_DEFAULT_MODEL`	`openai/gpt-4o`	Default OpenRouter model
`OPENROUTER_TIMEOUT`	`120`	OpenRouter generation timeout in seconds (raise for search models like `perplexity/sonar-deep-research`)
`GEMINI_SANDBOX_ROOT`	cwd	Root directory for file access
`GEMINI_SANDBOX_ENABLED`	`true`	Enable path sandboxing
`GEMINI_MAX_FILE_SIZE`	`102400`	Max file size in bytes (100KB)
`GEMINI_CONVERSATION_TTL_HOURS`	`3`	Local conversation expiry
`GEMINI_CONVERSATION_MAX_TURNS`	`50`	Max turns per thread
`GEMINI_LOG_DIR`	`~/.omni-ai-mcp`	Log & DB directory
`GEMINI_LOG_FORMAT`	`text`	`json` or `text`
`GEMINI_DISABLED_TOOLS`	—	Comma-separated tool names to disable
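As a rough picture of how these variables might be consumed, the sketch below reads a handful of them into a typed settings object using the defaults from the table. The `Settings` class and `load_settings` function are hypothetical, the boolean parsing rule is an assumption, and the sketch does not enforce which keys are required.

```python
import os
from dataclasses import dataclass
from typing import FrozenSet, Mapping, Optional


@dataclass(frozen=True)
class Settings:
    gemini_api_key: Optional[str]
    openrouter_api_key: Optional[str]
    max_file_size: int
    conversation_ttl_hours: int
    sandbox_enabled: bool
    disabled_tools: FrozenSet[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, applying the table defaults."""
    env = os.environ if env is None else env
    # GEMINI_DISABLED_TOOLS is a comma-separated list; blanks are ignored.
    disabled = frozenset(
        name.strip()
        for name in env.get("GEMINI_DISABLED_TOOLS", "").split(",")
        if name.strip()
    )
    return Settings(
        gemini_api_key=env.get("GEMINI_API_KEY") or None,
        openrouter_api_key=env.get("OPENROUTER_API_KEY") or None,
        max_file_size=int(env.get("GEMINI_MAX_FILE_SIZE", "102400")),
        conversation_ttl_hours=int(env.get("GEMINI_CONVERSATION_TTL_HOURS", "3")),
        # Assumption: only the literal string "true" (any case) enables the sandbox.
        sandbox_enabled=env.get("GEMINI_SANDBOX_ENABLED", "true").lower() == "true",
        disabled_tools=disabled,
    )
```

For example, setting `GEMINI_DISABLED_TOOLS=gemini_generate_video,gemini_text_to_speech` would yield a two-element `disabled_tools` set while every other field keeps its default.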

Claude Code Plugin

Slash Commands

Included in .claude/commands/ (auto-available in Claude Code when working inside this repo, or copy to ~/.claude/commands/ for global access):

Command	Action
`/gemini <prompt>`	Ask Gemini Pro anything
`/gemini-research <topic>`	Autonomous deep research (40+ sources, 5-60 min)
`/gemini-review <file>`	Code review focused on bugs, security, performance
`/gemini-challenge <idea>`	Devil's Advocate — find flaws in a plan or architecture
`/gemini-analyze <path>`	Codebase analysis with 1M token context window
`/gemini-brainstorm <topic>`	Structured brainstorming (6 methodologies)
`/gemini-models`	List available models (Gemini + OpenRouter)
`/ask-model [model] <prompt>`	Ask any model: GPT-4o, Llama, Mistral, Gemini, etc.
`/cowork <task>`	Claude + Gemini working in parallel on the same task

Subagents

Included in .claude/agents/ — Claude Code activates these automatically based on context:

Agent	Trigger	Capability
`gemini-researcher`	"research X", "investigate Y", "find sources on Z"	Deep Research Agent, 40+ sources
`gemini-analyzer`	"analyze codebase", "security audit", "review architecture"	1M token context window
`model-orchestrator`	"ask GPT-4o", "compare models", "use Llama for this"	Routes to 400+ models via OpenRouter
`cowork`	"second opinion", "verify with Gemini", "stress test this"	Claude + Gemini parallel analysis with synthesis

Install globally (all projects)

cp -r .claude/commands/* ~/.claude/commands/
cp -r .claude/agents/* ~/.claude/agents/

Multi-Model Architecture

omni-ai-mcp uses Claude as the orchestrator with other models as tools:

User -> Claude Code
            (orchestrates)
       omni-ai-mcp tools
       +-- ask_model("openai/gpt-4o")       -> OpenRouter -> GPT-4o
       +-- ask_model("meta-llama/...")       -> OpenRouter -> Llama 3
       +-- ask_model("gemini-3.1-pro...")    -> Gemini API (native)
       +-- ask_gemini(...)                   -> Gemini API -> Gemini Pro
       +-- gemini_deep_research(...)         -> Gemini API -> Deep Research Agent
       +-- gemini_generate_video(...)        -> Gemini API -> Veo 3.1

This differs from provider-replacement tools such as claude-code-router, which replace Claude's backend entirely. omni-ai-mcp keeps Claude in control while giving it access to every AI model as a tool.

Architecture

omni-ai-mcp/
+-- app/
|   +-- server.py              # FastMCP -- 20 @mcp.tool() registrations
|   +-- core/                  # Config, structured logging, security
|   +-- services/
|   |   +-- gemini.py          # Gemini client + generate_with_fallback()
|   |   +-- model_registry.py  # Dynamic model discovery (API + cache)
|   |   +-- openrouter.py      # OpenRouter client (OpenAI-compatible)
|   |   +-- persistence.py     # SQLite conversation storage + index
|   +-- tools/                 # Tool implementations by domain
|   |   +-- text/              # ask_gemini, ask_model, models, brainstorm, etc.
|   |   +-- code/              # analyze_codebase, generate_code
|   |   +-- media/             # image, video, TTS
|   |   +-- web/               # web_search, deep_research
|   |   +-- rag/               # file_store, file_search, upload
|   +-- schemas/               # Pydantic v2 validation
|   +-- utils/                 # @file expansion, token estimation
+-- tests/                     # 174+ tests (unit + integration)
+-- .claude/
|   +-- commands/              # Slash commands
|   +-- agents/                # Subagents
+-- setup.sh                   # One-command install
+-- manifest.json              # DXT extension manifest (Claude Desktop)
+-- pyproject.toml

Security

Path sandboxing: all file access restricted to GEMINI_SANDBOX_ROOT
Secrets sanitization: API keys masked in logs (Google, AWS, GitHub, OpenAI, Anthropic, Slack, JWT, Bearer tokens)
XML sanitization: LLM output sanitized before parsing to prevent injection
Atomic file writes: automatic backups before any file modification
Input validation: Pydantic v2 schemas at all tool boundaries
Rate limiting: via provider-side quotas
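Path sandboxing of the kind described in the first item is commonly implemented by resolving the requested path and checking that it still lies under the sandbox root. The sketch below shows that general technique with a hypothetical helper name; it is not taken from the project's source.

```python
from pathlib import Path


def resolve_in_sandbox(candidate: str, sandbox_root: Path) -> Path:
    """Resolve candidate against the root and reject anything that escapes it."""
    root = sandbox_root.resolve()
    # resolve() collapses ".." segments and follows symlinks, so traversal
    # tricks are normalised before the containment check. An absolute
    # candidate replaces the root entirely and is rejected the same way.
    target = (root / candidate).resolve()
    if target != root and root not in target.parents:
        raise PermissionError(f"{candidate!r} is outside the sandbox root")
    return target
```

Calling `resolve_in_sandbox("../secrets.txt", Path.cwd())` raises `PermissionError`, while a path such as `src/app.py` resolves normally even if the file does not exist yet.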

Docker Deployment

# Build and run
docker-compose up -d

# With log viewer (port 8080)
docker-compose --profile monitoring up -d

Docker features: non-root user, read-only filesystem with tmpfs, health check every 30s, resource limits (2 CPU, 2GB RAM), log rotation.

Troubleshooting

MCP not connecting

claude mcp list           # check registration
claude mcp remove omni-ai-mcp
./setup.sh YOUR_API_KEY   # re-register
# Restart Claude Code

Import or syntax errors

python3 -c "from app.core.config import config; print(f'v{config.version}')"
python3 -m pytest tests/unit/ -q

Video / Image generation timeouts

Video generation can take 1-6 minutes — this is normal
Large images (4K) take longer
Check your Gemini API quota at AI Studio

OpenRouter errors

Verify OPENROUTER_API_KEY is set and has credit
Check the model ID with gemini_list_models(include_openrouter=True)
Use the exact model ID from the list

API key update

claude mcp remove omni-ai-mcp
claude mcp add omni-ai-mcp --scope user \
  -e GEMINI_API_KEY=NEW_KEY \
  -e OPENROUTER_API_KEY=NEW_OR_KEY \
  -- python3 ~/.claude-mcp-servers/omni-ai-mcp/run.py

API Costs

Feature	Notes
Text generation	Free tier available · $0.075-0.30 per 1M tokens
Web Search	~$14 per 1000 queries
File Search indexing	$0.15 per 1M tokens (one-time)
Image generation	Varies by resolution and model
Video generation	Varies by duration/resolution
Text-to-speech	Varies by character count
OpenRouter	Per-model pricing — see openrouter.ai/models

See Google AI pricing for Gemini rates.
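For a back-of-the-envelope estimate, the two fixed rates in the table can be combined as below. This is arithmetic on the figures quoted above only; the function name is hypothetical and actual billing should be checked against the providers' pricing pages.

```python
WEB_SEARCH_USD_PER_1000_QUERIES = 14.00   # "~$14 per 1000 queries"
FILE_INDEX_USD_PER_1M_TOKENS = 0.15       # one-time File Search indexing


def estimate_usd(web_searches: int = 0, indexed_tokens: int = 0) -> float:
    """Rough cost of grounded searches plus one-time document indexing."""
    search_cost = web_searches / 1000 * WEB_SEARCH_USD_PER_1000_QUERIES
    index_cost = indexed_tokens / 1_000_000 * FILE_INDEX_USD_PER_1M_TOKENS
    return round(search_cost + index_cost, 4)


# 250 searches (3.50) plus indexing 2M tokens (0.30) comes to about 3.80 USD.
monthly = estimate_usd(web_searches=250, indexed_tokens=2_000_000)
```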

Development

# Run tests
python -m pytest tests/unit/ -v
python -m pytest tests/integration/ -v  # requires GEMINI_API_KEY

# Quick import check
python -c "from app.core.config import config; print(f'v{config.version}')"

# Reinstall after changes
rsync -a app/ ~/.claude-mcp-servers/omni-ai-mcp/app/
# Restart Claude Code

Adding a New Tool

Create app/tools/<domain>/my_tool.py with @tool(name="...", description="...", input_schema=...)
Import in app/tools/<domain>/__init__.py
Add Pydantic schema in app/schemas/inputs.py
Write tests in tests/unit/

See CLAUDE.md for the full development guide.

Changelog

v4.4.0

Updated all model defaults to the latest Gemini IDs (verified live): flash → gemini-3.5-flash, flash-lite → gemini-3.1-flash-lite, image → gemini-3-pro-image / gemini-3.1-flash-image, TTS → gemini-3.1-flash-tts-preview
Fixed deep research: deep-research-pro-preview returned 404 NOT_FOUND → now deep-research-preview-04-2026
Registry fallbacks refreshed (imagen-3.0 → imagen-4.0, added veo-3.1-lite)

v4.2.0

/gemini-illustrate and /gemini-image commands; richer gemini_generate_image tool description

v4.1.0

gemini_generate_image image editing (input_images); gemini_analyze_image multi-image + media_resolution

v4.0.1

Fixed routing: Gemini models always use native API when key available (even if provider=openrouter)
Added OpenRouter fallback for Gemini text models when no Gemini key (google/ prefix)
Fixed Python 3.11 f-string syntax in challenge tool
Fixed stale unit test imports (103 -> 174 passing tests)
Fixed model registry: corrected flash model names (gemini-3-flash-preview, gemini-3.1-flash-lite-preview)

v4.0.0

ask_model: 400+ models via OpenRouter — auto-routes from model name
gemini_list_models: live model discovery with deprecation warnings
Dynamic model registry: no more hardcoded model IDs
PyPI distribution: pip install omni-ai-mcp
Claude Code plugin: slash commands + subagents
GitHub Actions CI/CD with Trusted Publishing

v3.3.0

Dual storage mode for ask_gemini: local (SQLite) or cloud (Interactions API, 55-day retention)
gemini_list_conversations, gemini_delete_conversation
Cross-platform file locking

v3.2.0

gemini_deep_research: autonomous multi-step research (5-60 min, 40+ sources)

v3.0.0

FastMCP migration, SQLite persistence, security hardening

Contributing

Contributions are welcome! Open an issue or pull request on GitHub.

License

MIT — see LICENSE