mcp-windows
Health: Passed
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Community trust — 24 GitHub stars
Code: Warning
- fs module — File system access in .github/workflows/squad-heartbeat.yml
- fs module — File system access in .github/workflows/squad-issue-assign.yml
- fs module — File system access in .github/workflows/squad-triage.yml
- fs module — File system access in .github/workflows/sync-squad-labels.yml
Permissions: Passed
- Permissions — No dangerous permissions requested
Let AI agents control Windows applications. Click buttons, type text, toggle settings — all by name, not coordinates. Uses the Windows UI Automation API to find UI elements reliably, regardless of DPI, theme, resolution, or window position.
🪟 Windows MCP Server
Windows automation that actually works. Uses the Windows UI Automation API to find buttons by name, not pixels. Tested with real AI models before every release.
Why This Exists
Screenshot-based automation doesn't work reliably. Vision models guess wrong, coordinates break when windows move or DPI changes, and you burn through thousands of tokens on retry loops. We tried it (check the commit history) — it failed too often to be useful.
Windows MCP Server asks Windows directly: "What buttons exist in this window?" Windows knows. It's deterministic.
How It Works
# 1. Find the window
window_management(action='find', title='Notepad') → handle='123456'
# 2. Click elements by name
ui_click(windowHandle='123456', nameContains='Save')
# 3. Type into fields
ui_type(windowHandle='123456', controlType='Edit', text='Hello World')
# 4. Fallback for games/canvas — screenshot + mouse
screenshot_control(windowHandle='123456') → element coordinates
mouse_control(action='click', x=450, y=300)
Same command works every time. Any machine. Any DPI. Any theme.
Browsers follow the same semantic flow: launch msedge.exe or chrome.exe, then use ui_find, ui_click, and ui_type on links, buttons, and fields exposed through UIA names and ARIA labels.
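As a rough illustration of what a client sends for the first three steps of the walkthrough, the sketch below builds MCP `tools/call` JSON-RPC requests using the tool and argument names shown above. The `tool_call` helper is hypothetical, the handle is the same placeholder as in the walkthrough, and the transport and response parsing are left out entirely.

```python
import json
from itertools import count

_ids = count(1)  # simple request id generator


def tool_call(name: str, **arguments) -> dict:
    """Build an MCP `tools/call` JSON-RPC request (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Placeholder handle; a real client would read it from the 'find' response.
handle = "123456"

requests = [
    tool_call("window_management", action="find", title="Notepad"),
    tool_call("ui_click", windowHandle=handle, nameContains="Save"),
    tool_call("ui_type", windowHandle=handle, controlType="Edit", text="Hello World"),
]

for request in requests:
    print(json.dumps(request))
```

None of the requests carries a coordinate; the element is identified by window handle plus a name or control type, which is why the same payload can be replayed on a different machine or DPI setting.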
Key Features
- 🧠 Semantic UI — Find elements by name, not coordinates. Works regardless of DPI, theme, or window position.
- Multi-Monitor — Full support for multiple displays with per-monitor DPI scaling.
- 🧪 LLM-Tested — 54 tests with real AI models (GPT-4.1, GPT-5.2). 100% pass rate required for release.
- 💻 Broad App Support — Tested against classic Windows apps, modern Windows 11 apps, and Electron apps (VS Code, Teams, Slack). Chromium browser pages follow the same ARIA-driven pattern, but browser chrome remains best-effort.
- 🔄 Full Fallback — Screenshot + mouse + keyboard for games and custom controls.
- 🪙 Token Optimized — Short property names, JPEG screenshots, auto-scaling. ~60% fewer tokens than standard JSON.
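To give a feel for why short property names reduce payload size, here is a minimal sketch that serializes the same element list twice: once with verbose keys and once with short keys and a flattened rectangle. Every key name here is made up for illustration and is not the server's actual schema. Character count is only a crude stand-in for token count, so the printed saving is not expected to match the figure quoted above.

```python
import json

# Hypothetical element records; the real server's property names may differ.
verbose = [
    {
        "automationId": f"btn{i}",
        "controlType": "Button",
        "name": f"Button {i}",
        "boundingRectangle": {"x": 10 * i, "y": 20, "width": 80, "height": 24},
    }
    for i in range(20)
]

# Same data with short keys and the rectangle flattened to [x, y, w, h].
compact = [
    {
        "id": e["automationId"],
        "t": e["controlType"],
        "n": e["name"],
        "r": [e["boundingRectangle"][k] for k in ("x", "y", "width", "height")],
    }
    for e in verbose
]

verbose_len = len(json.dumps(verbose))
compact_len = len(json.dumps(compact, separators=(",", ":")))
saving = 1 - compact_len / verbose_len

print(f"verbose={verbose_len} chars, compact={compact_len} chars, saving={saving:.0%}")
```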
Installation
VS Code Extension — Install from Marketplace. Works with GitHub Copilot automatically.
Plugin (GitHub Copilot CLI / Claude Code) — Install the shared plugin bundle in plugin/:
copilot plugin install sbroenne/mcp-windows:plugin
For local Claude Code development:
claude --plugin-dir .\plugin
On first use, the plugin downloads the current standalone release into plugin\bin\.
Standalone — Download from Releases. Add to your MCP config:
{ "servers": { "windows": { "command": "path\\to\\Sbroenne.WindowsMcp.exe" } } }
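Windows paths need their backslashes doubled inside JSON, which is easy to get wrong by hand. One way to avoid that is to generate the fragment with a JSON library, as in this small sketch. The path is the same placeholder as in the snippet above, and where the resulting file should be saved depends on your MCP client.

```python
import json

# Placeholder path from the snippet above; replace with the real install location.
exe_path = r"path\to\Sbroenne.WindowsMcp.exe"

config = {"servers": {"windows": {"command": exe_path}}}

# json.dumps escapes each backslash, so the output is valid JSON.
text = json.dumps(config, indent=2)
print(text)
```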
Tools
| Tool | Purpose |
|---|---|
| ui_click | Click buttons, checkboxes, menu items by name |
| ui_type | Type into text fields |
| ui_find | Discover elements in a window (with timeout/retry) |
| ui_read | Read text from elements (with OCR fallback) |
| file_save | Save files via Save As dialog |
| screenshot_control | Get element metadata (image optional) |
| window_management | Find, activate, move, resize windows |
| mouse_control | Coordinate-based clicks (fallback for games) |
| keyboard_control | Hotkeys and key sequences |
| app | Launch applications |
Full reference: FEATURES.md
⚠️ Caution
This MCP server controls your Windows desktop. Use responsibly.
Known Limitations
UAC & Elevated Processes — Windows security prevents any non-elevated process from interacting with UAC prompts or elevated (Administrator) windows. This is a fundamental Windows security boundary, not an MCP limitation.
| Scenario | What Happens | Workaround |
|---|---|---|
| winget install triggers UAC | AI cannot click the UAC prompt | Run terminal as Administrator first |
| App running as Administrator | UI automation tools return ElevatedWindowActive error | Run MCP server elevated, or use the app non-elevated |
| UAC prompt appears | AI cannot interact with secure desktop | User must manually approve |
See FEATURES.md for details.
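Because retrying cannot cross the elevation boundary, a client may want to treat the ElevatedWindowActive error differently from ordinary failures. The sketch below assumes the server reports the failure as an MCP tool result with `isError` set and the error name somewhere in a text content item. That result shape is an assumption, not confirmed by this README, and `is_elevation_error` and `next_step` are hypothetical helpers.

```python
def is_elevation_error(result: dict) -> bool:
    """Return True if a tool result looks like an ElevatedWindowActive failure.

    Assumes the error name appears in a text content item of the result.
    """
    if not result.get("isError"):
        return False
    texts = (
        item.get("text", "")
        for item in result.get("content", [])
        if item.get("type") == "text"
    )
    return any("ElevatedWindowActive" in text for text in texts)


def next_step(result: dict) -> str:
    """Pick a follow-up action for the agent loop (hypothetical policy)."""
    if is_elevation_error(result):
        return "ask_user"  # retrying cannot cross the elevation boundary
    if result.get("isError"):
        return "retry"
    return "continue"


# Made-up sample result for illustration only.
sample = {
    "isError": True,
    "content": [{"type": "text", "text": "ElevatedWindowActive: target window is elevated"}],
}
print(next_step(sample))
```

Handing control back to the user matches the workarounds in the table above: either the user approves the prompt manually, or the server is restarted elevated.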
Testing
dotnet test # All tests
dotnet test --filter "FullyQualifiedName~Unit" # Unit only
Framework coverage: By default, tests run against WinForms, WinUI 3, Electron apps, and real Chromium browser app windows. The Chromium smoke stack exercises both Edge and Chrome (when installed) against a deterministic local page and a required public-web slice (demo.playwright.dev/todomvc). It uses the same semantic-first UI Automation model, with isolated browser state and cleanup limited to browser windows.
dotnet test .\tests\Sbroenne.WindowsMcp.Tests\Sbroenne.WindowsMcp.Tests.csproj --filter "FullyQualifiedName~ChromiumBrowser"
LLM tests: 54 tests with real AI models (GPT-4.1, GPT-5.2). 100% pass rate required for release.
cd tests/Sbroenne.WindowsMcp.LLM.Tests
uv run pytest -v
Requires Azure OpenAI access. See LLM Tests README.
Related Projects
- pytest-aitest — LLM agent testing framework (powers our integration tests)
- Excel MCP Server — AI-powered Excel automation
- OBS Studio MCP Server — AI-powered streaming control
Documentation
| Document | Description |
|---|---|
| FEATURES.md | Complete tool reference — all actions, parameters, examples |
| CONTRIBUTING.md | Build instructions, coding guidelines, PR process |
| LLM Tests README | How to run LLM integration tests |
| Release Setup | Azure OIDC and GitHub Actions configuration |
License
MIT — see LICENSE
Contributing
See CONTRIBUTING.md