pywinauto-mcp
Health Pass
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Community trust — 16 GitHub stars
Code Pass
- Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions Pass
- Permissions — No dangerous permissions requested
MCP 3.1 Server for Windows Automation with associated webapp. Wraps window/UI analysis, mouse, keyboard, screenshots, OCR, and optional face recog
PyWinAuto MCP
Let an AI assistant control real Windows apps through a single MCP server that wraps window, UI, mouse, keyboard, screenshots, OCR, and optional face checks behind a small set of portmanteau tools (many operations, few entry points so models stay focused).
Stack: v0.4.2 · FastMCP 3.2+ · Python 3.12+ · Windows 10/11
Web dashboard (optional): This repo ships web_sota — a local browser UI (Vite; default http://localhost:10788) that talks to the same backend as the REST API (http://127.0.0.1:10789). Use it for a tools hub, safety/help pages, local LLM chat (Ollama or LM Studio), camera selection, biometrics, and overview — run web_sota/start.ps1. You do not need the webapp for normal MCP stdio use in an IDE; it is an extra operator surface.
Important: This is not a browser sandbox. It runs in your desktop session and can move the real cursor, type into real windows, and drive the same UI you see. Read docs/SAFETY.md before you wire it into an IDE. For why hooks and full desktop control look “malware-adjacent” and how this repo gates them (research, forensics, legitimate automation), see Dual-use tooling in that doc. For throwaway desktops (Windows Sandbox, VMs, mapped folders), use virtualization-mcp alongside this project. Fleet notes: mcp-central-docs/patterns/PYWINAUTO_MCP_SAFETY.md. Optional face features are off until you opt in (see SAFETY §5 and PYWINAUTO_MCP_ENABLE_FACE). The optional global keylogger is off until you opt in (see SAFETY §6 and PYWINAUTO_MCP_ENABLE_KEYLOGGER).
Native Windows vs websites (HTML DOM): pywinauto-mcp drives desktop UI (Win32 / UI Automation windows, dialogs, many native controls). It does not expose the HTML DOM inside a browser tab. For website automation and analysis (selectors, accessibility tree, network, console), use a browser MCP; those servers are usually built on Playwright (or Chromium-only stacks). The two are orthogonal: combine them in your IDE when a workflow needs both a real browser page and a native app on the same machine.
Discovery (GitHub, Glama, MCP catalogs)
- Safety: docs/SAFETY.md (kill switch, rate limits, HITL (human-in-the-loop), dry-run).
- Dual-use / research: Dual-use tooling (capability vs. intent, guardrails, vendor refusal context).
- Isolation: Windows Sandbox / VM via virtualization-mcp. Repo stars are not a safety guarantee.
Product & docs
- PRD / index: docs/PRD.md, docs/README.md
- Web dashboard: web_sota/start.ps1 (ports 10788 / 10789)
Prompts, skills, MCPB
- MCP prompts: desktop_automation_operator_protocol, desktop_automation_runbook (src/pywinauto_mcp/prompts.py).
- Cursor skill: skills/desktop-automation-protocol/SKILL.md.
- Foreground: docs/OPERATOR_PROTOCOL.md (keep the target app focused during automation).
- MCPB: mcpb/manifest.json packages the server; prompts come from the running process unless you extend the pack.
Planned / todo
- Optional voice (STT / keyword / speaker-adjacent): Not implemented. It would mirror the face feature: env flag plus optional extra, local-first, covered by the same HITL (human-in-the-loop) / safety docs, and not intended as authentication.
Examples
- examples/README.md: demos (mouse dance, 9-Notepad grid, typewriter Notepad), plus older samples. Run all three in order: just demo (requires just).
- just paint-demo: industrialized MS Paint drawing (justfile/PowerShell orchestrated).
- examples/notepad_basic.py — simple window flow.
- examples/calculator_advanced.py — element tree.
- examples/system_monitoring.py — processes / tray.
Latency depends on the host, target app, and backends (OCR, etc.); treat any old benchmark tables as obsolete.
Tools (portmanteau)
Seven core interfaces plus get_desktop_state, optional automation_face when enabled, and optional global_keylogger when enabled (see SAFETY):
| Tool | Operations | Description |
|---|---|---|
| automation_windows | 11 | Window management (list, find, maximize, etc.) |
| automation_elements | 14 | UI element interaction (click, hover, text, etc.) |
| automation_mouse | 9 | Mouse; HITL (human-in-the-loop) may apply |
| automation_keyboard | 4 | Keyboard; HITL (human-in-the-loop) may apply |
| global_keylogger | 5 | Session keyboard capture (opt-in: PYWINAUTO_MCP_ENABLE_KEYLOGGER; see SAFETY §6) |
| automation_visual | 4 | Screenshot, OCR, find image |
| automation_face | 5 | Face (opt-in: env + face extra) |
| automation_system | status, help, wait, info, clipboard, processes, start_app | |
| get_desktop_state | 1 | UI tree / discovery |
Usage snippets
automation_windows
automation_windows("find", title="Notepad", partial=True)
automation_windows("maximize", handle=12345)
automation_elements
automation_elements("click", window_handle=12345, control_id="btnOK")
automation_elements("set_text", window_handle=12345, control_id="Edit1", text="Hello!")
automation_visual
automation_visual("extract_text", image_path="screen.png")
automation_visual("find_image", template_path="button.png", threshold=0.8)
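On the wire, an MCP client sends each of the calls above as a JSON-RPC 2.0 `tools/call` request. The following is a minimal sketch of that payload, not code from this repo: `build_tool_call` is a hypothetical helper, and it assumes the first positional argument in the snippets maps to a parameter named `operation` (the name used elsewhere in this README).

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC ids only need to be unique per session


def build_tool_call(tool: str, operation: str, **arguments) -> dict:
    """Build an MCP `tools/call` JSON-RPC 2.0 request (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {
            "name": tool,
            # Assumption: the positional argument in the snippets is `operation`.
            "arguments": {"operation": operation, **arguments},
        },
    }


request = build_tool_call("automation_windows", "find", title="Notepad", partial=True)
print(json.dumps(request, indent=2))
```

In practice an MCP client library or the IDE builds this envelope for you; the sketch only shows which parts of a snippet end up where.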
Installation
Prerequisites: uv (recommended), Python 3.12+, Windows 10/11, and optionally Tesseract (for visual/OCR).
Run with uvx (published package):
uvx pywinauto-mcp
Claude Desktop (example):
{
  "mcpServers": {
    "pywinauto-mcp": {
      "command": "uv",
      "args": ["--directory", "D:/Dev/repos/pywinauto-mcp", "run", "pywinauto-mcp"]
    }
  }
}
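For a first run it can help to start the server in dry-run mode. The sketch below generates the same entry with an `env` map setting PYWINAUTO_MCP_DRY_RUN=1. It assumes the client accepts a per-server `env` key (Claude Desktop's config does); `server_entry` is a hypothetical helper and the path is the example path from above, so substitute your own clone location.

```python
import json


def server_entry(repo_dir: str, dry_run: bool = True) -> dict:
    """Return an mcpServers entry; repo_dir is wherever you cloned the repo."""
    entry = {
        "command": "uv",
        "args": ["--directory", repo_dir, "run", "pywinauto-mcp"],
    }
    if dry_run:
        # PYWINAUTO_MCP_DRY_RUN=1 counts actions without sending input.
        entry["env"] = {"PYWINAUTO_MCP_DRY_RUN": "1"}
    return entry


config = {"mcpServers": {"pywinauto-mcp": server_entry("D:/Dev/repos/pywinauto-mcp")}}
print(json.dumps(config, indent=2))
```

Remove the `env` block once you are satisfied with what the assistant attempts to do.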
Install from source:
uv pip install -e .
MCPB:
mcpb install pywinauto-mcp
Safety
Canonical: docs/SAFETY.md (two-server model with virtualization-mcp, HITL (human-in-the-loop), env vars, fleet docs).
Dual-use / research framing: Dual-use tooling (research, forensics, guardrails) — why full desktop automation and hooks overlap offensive tooling in capability, how intent and authorization differ, and what this repo does to limit abuse.
Desktop automation is not browser-sandboxed. Sampling and long agent loops can multiply tool calls.
Human-in-the-loop (HITL)
HITL (human-in-the-loop) means an operator must approve before mutating mouse / keyboard when the server is configured that way.
- Call approve_automation(duration_minutes=...) before mutating mouse / keyboard, or tools return clarification_needed.
- automation_mouse("position") and automation_mouse("telemetry") are read-only and skip approval.
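The approval rule can be pictured as a time-boxed gate. This is an illustrative sketch under stated assumptions, not the server's implementation: `ApprovalGate`, `READ_ONLY`, and `check` are hypothetical names, and only `approve_automation`, `duration_minutes`, and `clarification_needed` come from the text above.

```python
import time

# Read-only operations named in this README; they skip approval.
READ_ONLY = {("automation_mouse", "position"), ("automation_mouse", "telemetry")}


class ApprovalGate:
    """Toy time-boxed approval window (illustrative; not the repo's code)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._approved_until = float("-inf")  # nothing approved yet

    def approve_automation(self, duration_minutes: float) -> None:
        """Open an approval window lasting duration_minutes from now."""
        self._approved_until = self._clock() + duration_minutes * 60

    def check(self, tool: str, operation: str) -> str:
        """Return "ok" if the call may proceed, else "clarification_needed"."""
        if (tool, operation) in READ_ONLY:
            return "ok"
        if self._clock() < self._approved_until:
            return "ok"
        return "clarification_needed"


gate = ApprovalGate()
print(gate.check("automation_mouse", "click"))     # blocked until approved
gate.approve_automation(duration_minutes=5)
print(gate.check("automation_mouse", "click"))     # allowed inside the window
```

Injecting the clock keeps the expiry logic testable without sleeping.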
Operations Overview
🖱️ Mouse Control (automation_mouse)
Consolidates all mouse interactions: position tracking, clicking, dragging, and hovering.
Pointer injection uses the in-repo win32_mouse backend (SetCursorPos / mouse_event, DPI-aware) for automation_mouse and for coordinate-based clicks in automation_elements — not PyAutoGUI alone, so moves and clicks stay reliable on scaled displays. Responses may include input_backend: "win32_mouse". Failsafe matches PyAutoGUI: moving the cursor to the upper-left screen corner aborts injected pointer ops unless PYWINAUTO_MCP_BYPASS_HITL=1 (also disables that corner check for win32_mouse).
Operations: position, click, double_click, right_click, move, move_relative, scroll, drag, hover, telemetry
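The corner failsafe described above reduces to a small predicate. A minimal sketch, assuming the check is an exact match on the primary display's origin (0, 0); the real backend may use a tolerance or other corners, and `failsafe_triggered` is a hypothetical name.

```python
import os


def failsafe_triggered(x: int, y: int, env=os.environ) -> bool:
    """True when the cursor is at the upper-left corner and bypass is off."""
    if env.get("PYWINAUTO_MCP_BYPASS_HITL") == "1":
        return False  # the bypass flag also disables the corner check
    # Assumption: "upper-left corner" means exactly (0, 0) on the primary display.
    return (x, y) == (0, 0)


print(failsafe_triggered(0, 0, env={}))
print(failsafe_triggered(640, 480, env={}))
```

A pointer backend would evaluate a check like this before each injected move or click and abort the operation when it returns True.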
Key Tool: Visual Telemetry HUD
Launches a high-visibility, "Always-on-Top" overlay for real-time calibration:
- Coordinate Tracking: Live X/Y position updates.
- Scrolling Input Buffer: Displays the last 20 keyboard characters and click events (visual verification only, no persistence).
- Industrial Safety: High visibility ensures monitoring is transparent and operator-auditable.
# Example: Launch telemetry HUD for 30 seconds
mcp-invoke automation_mouse --operation "telemetry" --telemetry_duration 30
Environment variables
| Env | Purpose |
|---|---|
| PYWINAUTO_MCP_BYPASS_HITL=1 | Bypasses approval prompts and disables pointer failsafe (pyautogui.FAILSAFE and win32_mouse corner escape) for trust-controlled demos/CI. |
| PYWINAUTO_MCP_KILL_SWITCH=1 | Blocks mutating mouse/keyboard (after HITL (human-in-the-loop) path). |
| PYWINAUTO_MCP_MAX_ACTIONS_PER_MINUTE | Default 120; rolling 60s cap for mutating actions. |
| PYWINAUTO_MCP_DRY_RUN=1 | Count actions without sending input (dry_run in results). |
| PYWINAUTO_MCP_ENABLE_FACE=1 | Allows registering automation_face (needs face extra). |
| PYWINAUTO_MCP_ENABLE_KEYLOGGER=1 | Allows registering global_keylogger (session keyboard capture; see SAFETY §6). |
| PYWINAUTO_LLM_BASE_URL | Default OpenAI-compatible root for the web_sota local-LLM proxy (e.g. Ollama http://127.0.0.1:11434/v1, LM Studio http://127.0.0.1:1234/v1). The UI can override per session. |
| PYWINAUTO_MCP_CAMERA_MAX_INDEX | Max OpenCV index to probe for GET /api/v1/cameras/ (default 10, capped at 32). |
automation_safety(operation="status"|"reset_counters") reports or resets the safety counters and flags. automation_system("help") returns a structured overview (version, tools, safety keys, doc paths).
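A rolling 60-second cap such as PYWINAUTO_MCP_MAX_ACTIONS_PER_MINUTE is commonly implemented as a sliding window over recent timestamps. The sketch below shows that technique under stated assumptions; `RollingLimiter` and `limiter_from_env` are hypothetical names, and the repo's own limiter may differ.

```python
import os
import time
from collections import deque


class RollingLimiter:
    """Allow at most `limit` actions in any rolling `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self._clock = clock
        self._stamps = deque()  # timestamps of recently allowed actions

    def allow(self) -> bool:
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit:
            return False
        self._stamps.append(now)
        return True


def limiter_from_env(env=os.environ, clock=time.monotonic) -> RollingLimiter:
    # 120 is the documented default for PYWINAUTO_MCP_MAX_ACTIONS_PER_MINUTE.
    limit = int(env.get("PYWINAUTO_MCP_MAX_ACTIONS_PER_MINUTE", "120"))
    return RollingLimiter(limit, clock=clock)


limiter = limiter_from_env()
print(limiter.limit, limiter.allow())
```

Unlike a fixed per-minute counter, the sliding window cannot be bypassed by bunching actions around a minute boundary.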
Fleet: mcp-central-docs patterns/PYWINAUTO_MCP_SAFETY.md.
Testing (CI vs local)
See docs/TESTING.md for environment-aware markers (requires_hardware, destructive, …) aligned with mcp-central-docs standards/testing-environment-aware.md. In CI, hardware-marked tests are skipped; run locally on Windows to exercise OpenCV / real window flows.
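One way to get the CI behaviour described above is a collection hook in conftest.py. This is a sketch, not the repo's conftest: `skip_reason` and `SKIPPED_IN_CI` are hypothetical, and detecting CI through a `CI` environment variable is an assumption (GitHub Actions sets `CI=true`).

```python
import os

# Markers that should not run on CI runners (from the text: hardware-marked tests).
SKIPPED_IN_CI = {"requires_hardware"}


def skip_reason(marker_names, env=os.environ):
    """Return a skip reason for CI runs, or None to let the test run."""
    if env.get("CI", "").lower() not in {"1", "true"}:
        return None  # local run: exercise everything
    blocked = SKIPPED_IN_CI & set(marker_names)
    if blocked:
        return "skipped in CI: " + ", ".join(sorted(blocked))
    return None


def pytest_collection_modifyitems(config, items):
    """conftest.py hook: apply skip_reason to every collected test."""
    import pytest  # imported lazily so the helper above stays stdlib-only

    for item in items:
        reason = skip_reason(marker.name for marker in item.iter_markers())
        if reason:
            item.add_marker(pytest.mark.skip(reason=reason))
```

Keeping the decision in a plain function makes the skip policy itself unit-testable.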
Maintenance
- Documentation should match implemented tools and env vars.
- glama.json: update when releasing or when marketplace metadata changes.
License
MIT. Copyright (c) 2026 Sandra Schipal.
Web dashboard (web_sota)
Optional Vite UI + local backend. Default ports from web_sota/start.ps1: frontend 10788, backend 10789.
Set-Location web_sota
.\start.ps1
Open http://localhost:10788. The Help route documents safety, env vars, and tool overview (same themes as automation_system("help")).
Local LLM chat (/chat): proxies to Ollama or LM Studio (OpenAI-compatible /v1 on localhost only). Pick a model, optional personas, prompt refiner, and repo knowledge pre-prompt (src/pywinauto_mcp/llm_repo_context.py) for questions like "can I click and drag?". Start Ollama or LM Studio first, then click Refresh models.
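To check that a local backend is reachable before opening the chat page, you can query the OpenAI-compatible model list yourself. A minimal sketch: `models_url` and `list_models` are hypothetical helpers, the fallback URL is the Ollama example from the environment-variable table (the dashboard's real default is not stated here), and the localhost check mirrors the "localhost only" note above.

```python
import json
import os
import urllib.request
from urllib.parse import urlparse

# Assumption: fall back to the Ollama example URL when the env var is unset.
DEFAULT_BASE_URL = "http://127.0.0.1:11434/v1"


def models_url(env=os.environ) -> str:
    """Build the /models URL, refusing anything that is not on localhost."""
    base = env.get("PYWINAUTO_LLM_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    host = urlparse(base).hostname
    if host not in {"127.0.0.1", "localhost", "::1"}:
        raise ValueError(f"refusing non-local LLM endpoint: {base}")
    return base + "/models"


def list_models(env=os.environ, timeout: float = 5.0) -> list[str]:
    """Fetch model ids from an OpenAI-compatible /v1/models endpoint."""
    with urllib.request.urlopen(models_url(env), timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["id"] for model in payload.get("data", [])]


print(models_url())
```

Calling `list_models()` raises a `urllib.error.URLError` if neither Ollama nor LM Studio is running, which is the same condition under which Refresh models would come back empty.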