🪟 Windows MCP Server

Windows automation that actually works. Uses the Windows UI Automation API to find buttons by name, not pixels. Tested with real AI models before every release.

Why This Exists

Screenshot-based automation doesn't work reliably. Vision models guess wrong, coordinates break when windows move or DPI changes, and you burn through thousands of tokens on retry loops. We tried it (check the commit history) — it failed too often to be useful.

Windows MCP Server asks Windows directly: "What buttons exist in this window?" Windows knows. It's deterministic.
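To make the DPI problem concrete, here is a small arithmetic sketch. It assumes the usual Windows convention that 96 DPI corresponds to 100% scaling and that a window's layout scales roughly uniformly. The helper name `scale_point` is illustrative and not part of this project.

```python
BASELINE_DPI = 96  # Windows treats 96 DPI as 100% scaling


def scale_point(x: int, y: int, dpi: int) -> tuple:
    """Map a point captured at 96 DPI to its approximate position at another DPI."""
    factor = dpi / BASELINE_DPI
    return round(x * factor), round(y * factor)


# A button recorded at (450, 300) on a 100% display...
print(scale_point(450, 300, 96))
# ...lands somewhere else at 150% scaling (144 DPI).
print(scale_point(450, 300, 144))
```

A click replayed at the recorded coordinates would miss the button on the second display, whereas a lookup by name is unaffected by the scale factor.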

How It Works

# 1. Find the window
window_management(action='find', title='Notepad') → handle='123456'

# 2. Click elements by name
ui_click(windowHandle='123456', nameContains='Save')

# 3. Type into fields
ui_type(windowHandle='123456', controlType='Edit', text='Hello World')

# 4. Fallback for games/canvas — screenshot + mouse
screenshot_control(windowHandle='123456') → element coordinates
mouse_control(action='click', x=450, y=300)

The semantic commands (steps 1 to 3) work the same every time: any machine, any DPI, any theme. Only the coordinate fallback in step 4 depends on screen layout.

Browsers follow the same semantic flow: launch msedge.exe or chrome.exe, then use ui_find, ui_click, and ui_type on links, buttons, and fields exposed through UIA names and ARIA labels.
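The find, click, and type steps above are ordinary MCP tool calls. As a rough sketch, this is how a client might encode them as JSON-RPC 2.0 `tools/call` requests. The tool and argument names are copied from the examples above; the `tool_call` helper is hypothetical, and in practice an MCP client or SDK builds these messages for you.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids: 1, 2, 3, ...


def tool_call(name: str, **arguments) -> dict:
    """Build an MCP `tools/call` request (JSON-RPC 2.0) for one tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Placeholder handle; a real client would read it from the `find` response.
handle = "123456"
requests = [
    tool_call("window_management", action="find", title="Notepad"),
    tool_call("ui_click", windowHandle=handle, nameContains="Save"),
    tool_call("ui_type", windowHandle=handle, controlType="Edit", text="Hello World"),
]

for request in requests:
    print(json.dumps(request))
```

None of the three requests carries a coordinate, which is why the same sequence can be replayed on a different display or theme.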

Key Features

🧠 Semantic UI — Find elements by name, not coordinates. Works regardless of DPI, theme, or window position.
🖥️ Multi-Monitor — Full support for multiple displays with per-monitor DPI scaling.
🧪 LLM-Tested — 54 tests with real AI models (GPT-4.1, GPT-5.2). 100% pass rate required for release.
💻 Broad App Support — Tested against classic Windows apps, modern Windows 11 apps, and Electron apps (VS Code, Teams, Slack). Chromium browser pages follow the same ARIA-driven pattern, but browser chrome remains best-effort.
🔄 Full Fallback — Screenshot + mouse + keyboard for games and custom controls.
🪙 Token Optimized — Short property names, JPEG screenshots, auto-scaling. ~60% fewer tokens than standard JSON.
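To illustrate why short property names matter, the sketch below compares two encodings of the same element record. Both schemas are invented for this example and are not the server's actual output format, and character count is only a rough proxy for token count.

```python
import json

# Hypothetical element record with descriptive keys (not the server's real schema).
verbose = {
    "elementName": "Save",
    "controlType": "Button",
    "boundingRectangle": {"left": 410, "top": 280, "width": 80, "height": 40},
    "isEnabled": True,
}

# The same information with short keys and a flat rectangle (also hypothetical).
compact = {"n": "Save", "t": "Button", "r": [410, 280, 80, 40], "e": 1}

verbose_text = json.dumps(verbose, indent=2)
compact_text = json.dumps(compact, separators=(",", ":"))

saving = 1 - len(compact_text) / len(verbose_text)
print(f"{len(verbose_text)} -> {len(compact_text)} characters ({saving:.0%} smaller)")
```

The saving compounds when a window exposes hundreds of elements, since every record repeats the same keys.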

Installation

VS Code Extension — Install from Marketplace. Works with GitHub Copilot automatically.

Plugin (GitHub Copilot CLI / Claude Code) — Install the shared plugin bundle in plugin/:

copilot plugin install sbroenne/mcp-windows:plugin

For local Claude Code development:

claude --plugin-dir .\plugin

On first use, the plugin downloads the current standalone release into plugin\bin\.

Standalone — Download from Releases. Add to your MCP config:

{ "servers": { "windows": { "command": "path\\to\\Sbroenne.WindowsMcp.exe" } } }
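Backslashes in the path must be doubled in JSON, which is easy to get wrong by hand. One option is to generate the entry, as in this minimal sketch. The install folder `C:\Tools\WindowsMcp` is a made-up example; substitute wherever you extracted the release.

```python
import json
from pathlib import PureWindowsPath

# Hypothetical install location; replace with your own extraction folder.
exe = PureWindowsPath(r"C:\Tools\WindowsMcp") / "Sbroenne.WindowsMcp.exe"

config = {"servers": {"windows": {"command": str(exe)}}}

# json.dumps escapes each backslash as "\\", which JSON requires.
print(json.dumps(config, indent=2))
```

Paste the printed object into your MCP config file, merging it with any servers already listed there.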

Tools

Tool	Purpose
`ui_click`	Click buttons, checkboxes, menu items by name
`ui_type`	Type into text fields
`ui_find`	Discover elements in a window (with timeout/retry)
`ui_read`	Read text from elements (with OCR fallback)
`file_save`	Save files via Save As dialog
`screenshot_control`	Get element metadata (image optional)
`window_management`	Find, activate, move, resize windows
`mouse_control`	Coordinate-based clicks (fallback for games)
`keyboard_control`	Hotkeys and key sequences
`app`	Launch applications

Full reference: FEATURES.md

⚠️ Caution

This MCP server controls your Windows desktop. Use responsibly.

Known Limitations

UAC & Elevated Processes — Windows security prevents any non-elevated process from interacting with UAC prompts or elevated (Administrator) windows. This is a fundamental Windows security boundary, not an MCP limitation.

Scenario	What Happens	Workaround
`winget install` triggers UAC	AI cannot click the UAC prompt	Run terminal as Administrator first
App running as Administrator	UI automation tools return `ElevatedWindowActive` error	Run MCP server elevated, or use the app non-elevated
UAC prompt appears	AI cannot interact with secure desktop	User must manually approve

See FEATURES.md for details.
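Before starting an automation session, it can help to check whether the launching process is elevated. The sketch below uses the Shell32 function `IsUserAnAdmin` through `ctypes`. The `is_elevated` helper is illustrative and not part of this project, and it reports on the process that runs the script, so run it from the same terminal that starts the MCP server.

```python
import ctypes
import sys
from typing import Optional


def is_elevated() -> Optional[bool]:
    """Return True/False on Windows, or None where the check does not apply."""
    if sys.platform != "win32":
        return None
    try:
        # IsUserAnAdmin returns non-zero when the process has administrator rights.
        return bool(ctypes.windll.shell32.IsUserAnAdmin())
    except (AttributeError, OSError):
        return None


state = is_elevated()
if state is None:
    print("Elevation check unavailable on this platform.")
elif state:
    print("Elevated: a server started from here is elevated too.")
else:
    print("Not elevated: expect ElevatedWindowActive for Administrator windows.")
```

Even an elevated server cannot interact with the UAC prompt itself, because that prompt is shown on the secure desktop.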

Testing

dotnet test                                      # All tests
dotnet test --filter "FullyQualifiedName~Unit"   # Unit only

Framework coverage: By default, tests run against WinForms, WinUI 3, and Electron apps, plus real Chromium browser windows. The Chromium smoke stack exercises both Edge and Chrome (when installed) against a deterministic local page and a required public-web slice (demo.playwright.dev/todomvc). It uses the same semantic-first UI Automation model, with isolated browser state and cleanup limited to browser windows.

dotnet test .\tests\Sbroenne.WindowsMcp.Tests\Sbroenne.WindowsMcp.Tests.csproj --filter "FullyQualifiedName~ChromiumBrowser"

LLM tests: 54 tests with real AI models (GPT-4.1, GPT-5.2). 100% pass rate required for release.

cd tests/Sbroenne.WindowsMcp.LLM.Tests
uv run pytest -v

Requires Azure OpenAI access. See LLM Tests README.

Related Projects

pytest-aitest — LLM agent testing framework (powers our integration tests)
Excel MCP Server — AI-powered Excel automation
OBS Studio MCP Server — AI-powered streaming control

Documentation

Document	Description
FEATURES.md	Complete tool reference — all actions, parameters, examples
CONTRIBUTING.md	Build instructions, coding guidelines, PR process
LLM Tests README	How to run LLM integration tests
Release Setup	Azure OIDC and GitHub Actions configuration

License

MIT — see LICENSE

Contributing

See CONTRIBUTING.md