mcp-windows

SUMMARY

Let AI agents control Windows applications. Click buttons, type text, toggle settings — all by name, not coordinates. Uses the Windows UI Automation API to find UI elements reliably, regardless of DPI, theme, resolution, or window position.

README.md

🪟 Windows MCP Server

Windows automation that actually works. Uses the Windows UI Automation API to find buttons by name, not pixels. Tested with real AI models before every release.

Why This Exists

Screenshot-based automation doesn't work reliably. Vision models guess wrong, coordinates break when windows move or DPI changes, and you burn through thousands of tokens on retry loops. We tried it (check the commit history) — it failed too often to be useful.

Windows MCP Server asks Windows directly: "What buttons exist in this window?" Windows knows. It's deterministic.

How It Works

# 1. Find the window
window_management(action='find', title='Notepad') → handle='123456'

# 2. Click elements by name
ui_click(windowHandle='123456', nameContains='Save')

# 3. Type into fields
ui_type(windowHandle='123456', controlType='Edit', text='Hello World')

# 4. Fallback for games/canvas — screenshot + mouse
screenshot_control(windowHandle='123456') → element coordinates
mouse_control(action='click', x=450, y=300)

Same command works every time. Any machine. Any DPI. Any theme.
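Under the MCP protocol, each step above travels to the server as a JSON-RPC 2.0 `tools/call` request. The sketch below shows how a client might frame the first three steps. The tool and argument names are taken from the example above; the `tool_call` helper and the request ids are illustrative only, and the transport and response parsing are omitted.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids; any unique values would do


def tool_call(name: str, **arguments) -> dict:
    """Build an MCP `tools/call` request (hypothetical helper, not part of the server)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Assumption: in a real session the handle is read from the `find` response.
handle = "123456"

requests = [
    tool_call("window_management", action="find", title="Notepad"),
    tool_call("ui_click", windowHandle=handle, nameContains="Save"),
    tool_call("ui_type", windowHandle=handle, controlType="Edit", text="Hello World"),
]

for request in requests:
    print(json.dumps(request))
```

In practice an MCP client library or the AI host builds these messages for you; the point is only that each call carries names, not coordinates.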

Browsers follow the same semantic flow: launch msedge.exe or chrome.exe, then use ui_find, ui_click, and ui_type on links, buttons, and fields exposed through UIA names and ARIA labels.

Key Features

  • 🧠 Semantic UI — Find elements by name, not coordinates. Works regardless of DPI, theme, or window position.
  • 🖥️ Multi-Monitor — Full support for multiple displays with per-monitor DPI scaling.
  • 🧪 LLM-Tested — 54 tests with real AI models (GPT-4.1, GPT-5.2). 100% pass rate required for release.
  • 💻 Broad App Support — Tested against classic Windows apps, modern Windows 11 apps, and Electron apps (VS Code, Teams, Slack). Chromium browser pages follow the same ARIA-driven pattern, but browser chrome remains best-effort.
  • 🔄 Full Fallback — Screenshot + mouse + keyboard for games and custom controls.
  • 🪙 Token Optimized — Short property names, JPEG screenshots, auto-scaling. ~60% fewer tokens than standard JSON.
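To give a feel for why short property names matter, the sketch below serializes the same element record twice and compares sizes. Both sets of field names are invented for this illustration and are not the server's actual output schema, and character count is only a rough proxy for token count.

```python
import json

# Hypothetical element record with descriptive keys (not the server's real schema).
verbose = {
    "elementName": "Save",
    "controlType": "Button",
    "boundingRectangle": {"x": 10, "y": 20, "width": 80, "height": 24},
    "isEnabled": True,
}

# The same information with short keys and a flat rectangle (also hypothetical).
compact = {"n": "Save", "t": "Button", "r": [10, 20, 80, 24], "e": 1}

verbose_json = json.dumps(verbose, indent=2)
compact_json = json.dumps(compact, separators=(",", ":"))  # no extra whitespace

saving = 1 - len(compact_json) / len(verbose_json)
print(f"{len(verbose_json)} -> {len(compact_json)} characters ({saving:.0%} smaller)")
```

The actual saving depends on the tokenizer and the payload, so treat the printed figure as indicative rather than as a reproduction of the ~60% claim.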

Installation

VS Code Extension: Install from Marketplace. Works with GitHub Copilot automatically.

Plugin (GitHub Copilot CLI / Claude Code) — Install the shared plugin bundle in plugin/:

copilot plugin install sbroenne/mcp-windows:plugin

For local Claude Code development:

claude --plugin-dir .\plugin

On first use, the plugin downloads the current standalone release into plugin\bin\.

Standalone: Download from Releases. Add to your MCP config:

{ "servers": { "windows": { "command": "path\\to\\Sbroenne.WindowsMcp.exe" } } }
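Windows paths need each backslash doubled inside JSON strings, which is easy to get wrong by hand. A minimal sketch that generates the config above, assuming a placeholder install folder (`C:\Tools\WindowsMcp` is made up; use wherever you extracted the release):

```python
import json
from pathlib import PureWindowsPath

# Placeholder install location; substitute your own extraction folder.
exe_path = PureWindowsPath(r"C:\Tools\WindowsMcp") / "Sbroenne.WindowsMcp.exe"

# Same structure as the config shown above.
config = {"servers": {"windows": {"command": str(exe_path)}}}

# json.dumps escapes every backslash as \\ so the result is valid JSON.
config_text = json.dumps(config, indent=2)
print(config_text)
```

Where the resulting JSON goes depends on your MCP client, so check its documentation for the config file location.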

Tools

| Tool | Purpose |
| --- | --- |
| ui_click | Click buttons, checkboxes, menu items by name |
| ui_type | Type into text fields |
| ui_find | Discover elements in a window (with timeout/retry) |
| ui_read | Read text from elements (with OCR fallback) |
| file_save | Save files via Save As dialog |
| screenshot_control | Get element metadata (image optional) |
| window_management | Find, activate, move, resize windows |
| mouse_control | Coordinate-based clicks (fallback for games) |
| keyboard_control | Hotkeys and key sequences |
| app | Launch applications |

Full reference: FEATURES.md
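The timeout/retry behavior mentioned for ui_find follows a common polling pattern: keep looking until the element appears or a deadline passes. The sketch below illustrates that pattern in general terms. It is not the server's implementation, and `find_with_retry`, its parameters, and `fake_find` are hypothetical names.

```python
import time
from typing import Callable, Optional


def find_with_retry(
    find: Callable[[], Optional[dict]],
    timeout_s: float = 2.0,
    interval_s: float = 0.1,
) -> Optional[dict]:
    """Call `find` until it returns an element or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        element = find()
        if element is not None:
            return element
        if time.monotonic() >= deadline:
            return None  # give up; the caller decides what to do next
        time.sleep(interval_s)


# Stand-in for a real lookup: the element "appears" on the third attempt.
attempts = {"n": 0}


def fake_find() -> Optional[dict]:
    attempts["n"] += 1
    return {"name": "Save"} if attempts["n"] >= 3 else None


result = find_with_retry(fake_find, timeout_s=1.0, interval_s=0.01)
print(result, "after", attempts["n"], "attempts")
```

Polling with a deadline suits UI work because dialogs and controls often appear a moment after the action that triggers them.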

⚠️ Caution

This MCP server controls your Windows desktop. Use responsibly.

Known Limitations

UAC & Elevated Processes — Windows security prevents any non-elevated process from interacting with UAC prompts or elevated (Administrator) windows. This is a fundamental Windows security boundary, not an MCP limitation.

| Scenario | What Happens | Workaround |
| --- | --- | --- |
| winget install triggers UAC | AI cannot click the UAC prompt | Run terminal as Administrator first |
| App running as Administrator | UI automation tools return ElevatedWindowActive error | Run MCP server elevated, or use the app non-elevated |
| UAC prompt appears | AI cannot interact with secure desktop | User must manually approve |

See FEATURES.md for details.
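A client can turn the ElevatedWindowActive error into actionable guidance instead of retrying blindly. The sketch below assumes the error name appears somewhere in the text of a failed tool result. The exact result format is not shown here, so the `elevation_hint` helper and the sample error strings are illustrative.

```python
from typing import Optional

ELEVATED_ERROR = "ElevatedWindowActive"  # error name from the limitations above


def elevation_hint(tool_error: str) -> Optional[str]:
    """Return a workaround if the error points at an elevated window, else None."""
    if ELEVATED_ERROR in tool_error:
        return (
            "Target window is elevated: run the MCP server elevated, "
            "or start the app without Administrator rights."
        )
    return None


# Sample strings are made up; only the error name comes from the README.
print(elevation_hint("ElevatedWindowActive: cannot automate this window"))
print(elevation_hint("some other error"))
```

Retrying does not help in this case, because the restriction is enforced by Windows rather than by timing.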

Testing

dotnet test                                      # All tests
dotnet test --filter "FullyQualifiedName~Unit"   # Unit only

Framework coverage: By default, tests run against WinForms, WinUI 3, and Electron apps, as well as real Chromium browser windows. The Chromium smoke stack exercises both Edge and Chrome (when installed) against a deterministic local page and a required public-web slice (demo.playwright.dev/todomvc). It uses the same semantic-first UI Automation model, with isolated browser state and cleanup limited to browser windows.

dotnet test .\tests\Sbroenne.WindowsMcp.Tests\Sbroenne.WindowsMcp.Tests.csproj --filter "FullyQualifiedName~ChromiumBrowser"

LLM tests: 54 tests with real AI models (GPT-4.1, GPT-5.2). 100% pass rate required for release.

cd tests/Sbroenne.WindowsMcp.LLM.Tests
uv run pytest -v

Requires Azure OpenAI access. See LLM Tests README.

Documentation

| Document | Description |
| --- | --- |
| FEATURES.md | Complete tool reference: all actions, parameters, examples |
| CONTRIBUTING.md | Build instructions, coding guidelines, PR process |
| LLM Tests README | How to run LLM integration tests |
| Release Setup | Azure OIDC and GitHub Actions configuration |

License

MIT — see LICENSE

Contributing

See CONTRIBUTING.md
