pywinauto-mcp

Security Audit
Pass
Health Pass
  • License — License: MIT
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Community trust — 16 GitHub stars
Code Pass
  • Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions Pass
  • Permissions — No dangerous permissions requested


SUMMARY

MCP 3.1 Server for Windows Automation with associated webapp. Wraps window/UI analysis, mouse, keyboard, screenshots, OCR, and optional face recog

README.md

PyWinAuto MCP

Let an AI assistant control real Windows apps through a single MCP server that wraps window, UI, mouse, keyboard, screenshots, OCR, and optional face checks behind a small set of portmanteau tools (many operations, few entry points so models stay focused).

Stack: v0.4.2, FastMCP 3.2+, Python 3.12+, Windows 10/11

Web dashboard (optional): This repo ships web_sota — a local browser UI (Vite; default http://localhost:10788) that talks to the same backend as the REST API (http://127.0.0.1:10789). Use it for a tools hub, safety/help pages, local LLM chat (Ollama or LM Studio), camera selection, biometrics, and overview — run web_sota/start.ps1. You do not need the webapp for normal MCP stdio use in an IDE; it is an extra operator surface.

Important: This is not a browser sandbox. It runs in your desktop session and can move the real cursor, type into real windows, and drive the same UI you see. Read docs/SAFETY.md before you wire it into an IDE. For why hooks and full desktop control look “malware-adjacent” and how this repo gates them (research, forensics, legitimate automation), see Dual-use tooling in that doc. For throwaway desktops (Windows Sandbox, VMs, mapped folders), use virtualization-mcp alongside this project. Fleet notes: mcp-central-docs/patterns/PYWINAUTO_MCP_SAFETY.md. Optional face features are off until you opt in (see SAFETY §5 and PYWINAUTO_MCP_ENABLE_FACE). The optional global keylogger is off until you opt in (see SAFETY §6 and PYWINAUTO_MCP_ENABLE_KEYLOGGER).

Native Windows vs websites (HTML DOM): pywinauto-mcp drives desktop UI (Win32 / UI Automation windows, dialogs, many native controls). It does not expose the HTML DOM inside a browser tab. For website automation and analysis (selectors, accessibility tree, network, console), use a browser MCP; those servers are usually built on Playwright (or Chromium-only stacks). The two are orthogonal; combine them in your IDE when a workflow needs both a real browser page and a native app on the same machine.

Discovery (GitHub, Glama, MCP catalogs)

  • Safety: docs/SAFETY.md covers the kill switch, rate limits, HITL (human-in-the-loop), and dry-run.
  • Dual-use / research: Dual-use tooling — capability vs. intent, guardrails, vendor refusal context.
  • Isolation: Windows Sandbox / VM via virtualization-mcp. Repo stars are not a safety guarantee.

Product & docs

Prompts, skills, MCPB

Planned / todo

  • Optional voice (STT / keyword / speaker-adjacent): Not implemented. Would mirror face: env var plus optional extra, local-first, same HITL (human-in-the-loop) / safety docs; not an authentication mechanism.

Examples

Latency depends on the host, target app, and backends (OCR, etc.); treat any old benchmark tables as obsolete.

Tools (portmanteau)

Seven core interfaces plus get_desktop_state, optional automation_face when enabled, and optional global_keylogger when enabled (see SAFETY):

| Tool | Operations | Description |
| --- | --- | --- |
| automation_windows | 11 | Window management (list, find, maximize, etc.) |
| automation_elements | 14 | UI element interaction (click, hover, text, etc.) |
| automation_mouse | 9 | Mouse. HITL (human-in-the-loop) may apply |
| automation_keyboard | 4 | Keyboard. HITL (human-in-the-loop) may apply |
| global_keylogger | 5 | Session keyboard capture (opt-in: PYWINAUTO_MCP_ENABLE_KEYLOGGER; see SAFETY §6) |
| automation_visual | 4 | Screenshot, OCR, find image |
| automation_face | 5 | Face (opt-in: env + face extra) |
| automation_system | | status, help, wait, info, clipboard, processes, start_app |
| get_desktop_state | 1 | UI tree / discovery |
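The portmanteau idea (one tool name, many operations) can be pictured as a dispatch table keyed by operation name. The sketch below is illustrative only: the handler functions, their return shapes, and the error format are assumptions made for the example, not the repository's actual implementation.

```python
from typing import Any, Callable

# Hypothetical handlers; the real server's functions and return shapes may differ.
def _list_windows(**_: Any) -> dict[str, Any]:
    return {"status": "ok", "windows": []}


def _find_window(title: str, partial: bool = False, **_: Any) -> dict[str, Any]:
    return {"status": "ok", "query": title, "partial": partial}


# One entry point maps operation names to handlers.
_WINDOW_OPS: dict[str, Callable[..., dict[str, Any]]] = {
    "list": _list_windows,
    "find": _find_window,
}


def portmanteau_windows(operation: str, **kwargs: Any) -> dict[str, Any]:
    """Route a single tool call to the handler registered for `operation`."""
    handler = _WINDOW_OPS.get(operation)
    if handler is None:
        # Unknown operations return a structured error listing the valid names.
        return {
            "status": "error",
            "error": f"unknown operation: {operation}",
            "available": sorted(_WINDOW_OPS),
        }
    return handler(**kwargs)
```

With this shape, a model sees one tool and picks an operation string, instead of choosing among many separately registered tools.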

Usage snippets

automation_windows

automation_windows("find", title="Notepad", partial=True)
automation_windows("maximize", handle=12345)

automation_elements

automation_elements("click", window_handle=12345, control_id="btnOK")
automation_elements("set_text", window_handle=12345, control_id="Edit1", text="Hello!")

automation_visual

automation_visual("extract_text", image_path="screen.png")
automation_visual("find_image", template_path="button.png", threshold=0.8)
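From a client's point of view, each snippet above becomes a tool name plus an arguments dictionary. The sketch below assumes the first positional argument is named `operation` (as in the `automation_safety(operation=...)` form) and works with any object that exposes an async `call_tool(name, arguments)` method, such as a FastMCP client. The helper names are hypothetical, and the handle is passed in explicitly because the shape of the `find` result is not documented here.

```python
from typing import Any, Protocol


class ToolClient(Protocol):
    """Anything with an async call_tool(name, arguments) method."""

    async def call_tool(self, name: str, arguments: dict[str, Any]) -> Any: ...


def tool_call(tool: str, operation: str, **params: Any) -> tuple[str, dict[str, Any]]:
    """Build the (tool name, arguments) pair for one portmanteau call."""
    return tool, {"operation": operation, **params}


async def find_then_maximize(client: ToolClient, title: str, handle: int) -> list[Any]:
    """Issue a find call and a maximize call in order, returning both results."""
    results: list[Any] = []
    for name, args in (
        tool_call("automation_windows", "find", title=title, partial=True),
        tool_call("automation_windows", "maximize", handle=handle),
    ):
        results.append(await client.call_tool(name, args))
    return results
```

In a real workflow the handle would be read from the `find` result rather than supplied by hand.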

Installation

Prerequisites: uv (recommended), Python 3.12+, Windows 10/11, Tesseract (optional, for visual/OCR).

Run with uvx (published package):

uvx pywinauto-mcp

Claude Desktop (example):

{
  "mcpServers": {
    "pywinauto-mcp": {
      "command": "uv",
      "args": ["--directory", "D:/Dev/repos/pywinauto-mcp", "run", "pywinauto-mcp"]
    }
  }
}

Install from source:

uv pip install -e .

MCPB:

mcpb install pywinauto-mcp

Safety

Canonical: docs/SAFETY.md (two-server model with virtualization-mcp, HITL (human-in-the-loop), env vars, fleet docs).

Dual-use / research framing: Dual-use tooling (research, forensics, guardrails) — why full desktop automation and hooks overlap offensive tooling in capability, how intent and authorization differ, and what this repo does to limit abuse.

Desktop automation is not browser-sandboxed. Sampling and long agent loops can multiply tool calls.

Human-in-the-loop (HITL)

HITL (human-in-the-loop) means an operator must approve before mutating mouse / keyboard when the server is configured that way.

  • Call approve_automation(duration_minutes=...) before mutating mouse / keyboard; otherwise tools return clarification_needed.
  • automation_mouse("position") and automation_mouse("telemetry") are read-only and skip approval.
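The approval behaviour described above can be pictured as a time-boxed gate. This is a minimal sketch under stated assumptions: the class name, the result dictionaries, and the injectable clock are hypothetical, and only the `clarification_needed` status and the two read-only operations come from the text.

```python
import time
from typing import Callable

# Read-only mouse operations that skip approval, per the list above.
READ_ONLY_MOUSE_OPS = {"position", "telemetry"}


class ApprovalGate:
    """Time-boxed operator approval for mutating input (illustrative only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._approved_until = 0.0  # no approval granted yet

    def approve(self, duration_minutes: float) -> None:
        """Open an approval window lasting `duration_minutes` from now."""
        self._approved_until = self._clock() + duration_minutes * 60.0

    def check_mouse(self, operation: str) -> dict[str, str]:
        """Allow read-only ops always; allow mutating ops only inside the window."""
        if operation in READ_ONLY_MOUSE_OPS:
            return {"status": "ok"}
        if self._clock() < self._approved_until:
            return {"status": "ok"}
        return {"status": "clarification_needed"}
```

Injecting the clock keeps the gate easy to test without waiting for real time to pass.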

Operations Overview

🖱️ Mouse Control (automation_mouse)

Consolidates all mouse interactions: position tracking, clicking, dragging, and hovering.

Pointer injection uses the in-repo win32_mouse backend (SetCursorPos / mouse_event, DPI-aware) for automation_mouse and for coordinate-based clicks in automation_elements — not PyAutoGUI alone, so moves and clicks stay reliable on scaled displays. Responses may include input_backend: "win32_mouse". Failsafe matches PyAutoGUI: moving the cursor to the upper-left screen corner aborts injected pointer ops unless PYWINAUTO_MCP_BYPASS_HITL=1 (also disables that corner check for win32_mouse).
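The corner failsafe described above amounts to a small predicate evaluated before each injected pointer operation. The sketch below is an assumption-laden illustration: the function name is hypothetical, and it treats the corner as exactly pixel (0, 0) on the primary display, which may not match how the in-repo backend defines it.

```python
import os
from typing import Mapping


def failsafe_triggered(x: int, y: int, env: Mapping[str, str] = os.environ) -> bool:
    """Return True when the cursor is at the upper-left corner and the bypass is off."""
    if env.get("PYWINAUTO_MCP_BYPASS_HITL") == "1":
        # The bypass flag also disables the corner check.
        return False
    return (x, y) == (0, 0)
```

A caller would abort the pending move or click whenever this returns True.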

Operations: position, click, double_click, right_click, move, move_relative, scroll, drag, hover, telemetry

Key Tool: Visual Telemetry HUD
Launches a high-visibility, "Always-on-Top" overlay for real-time calibration:

  • Coordinate Tracking: Live X/Y position updates.
  • Scrolling Input Buffer: Displays the last 20 keyboard characters and click events (visual verification only, no persistence).
  • Industrial Safety: High visibility ensures monitoring is transparent and operator-auditable.

# Example: Launch telemetry HUD for 30 seconds
mcp-invoke automation_mouse --operation "telemetry" --telemetry_duration 30

Environment variables

| Env | Purpose |
| --- | --- |
| PYWINAUTO_MCP_BYPASS_HITL=1 | Bypasses approval prompts and disables pointer failsafe (pyautogui.FAILSAFE and win32_mouse corner escape) for trust-controlled demos/CI. |
| PYWINAUTO_MCP_KILL_SWITCH=1 | Blocks mutating mouse/keyboard (after HITL (human-in-the-loop) path). |
| PYWINAUTO_MCP_MAX_ACTIONS_PER_MINUTE | Default 120; rolling 60 s cap for mutating actions. |
| PYWINAUTO_MCP_DRY_RUN=1 | Count actions without sending input (dry_run in results). |
| PYWINAUTO_MCP_ENABLE_FACE=1 | Allows registering automation_face (needs face extra). |
| PYWINAUTO_MCP_ENABLE_KEYLOGGER=1 | Allows registering global_keylogger (session keyboard capture; see SAFETY §6). |
| PYWINAUTO_LLM_BASE_URL | Default OpenAI-compatible root for the web_sota local-LLM proxy (e.g. Ollama http://127.0.0.1:11434/v1, LM Studio http://127.0.0.1:1234/v1). The UI can override per session. |
| PYWINAUTO_MCP_CAMERA_MAX_INDEX | Max OpenCV index to probe for GET /api/v1/cameras/ (default 10, capped at 32). |
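As a rough illustration of how these variables might be read at startup, the sketch below collects a few of them into one immutable config object. The dataclass, the function name, and the rule that a flag is on only when its value is exactly "1" are assumptions for the example; the defaults (120 actions per minute, camera index 10 capped at 32) come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _flag(env: Mapping[str, str], name: str) -> bool:
    # Assumption: a flag counts as enabled only when set to exactly "1".
    return env.get(name) == "1"


@dataclass(frozen=True)
class SafetyConfig:
    bypass_hitl: bool
    kill_switch: bool
    dry_run: bool
    max_actions_per_minute: int
    camera_max_index: int


def load_safety_config(env: Mapping[str, str] = os.environ) -> SafetyConfig:
    """Read safety-related settings, applying the documented defaults and cap."""
    return SafetyConfig(
        bypass_hitl=_flag(env, "PYWINAUTO_MCP_BYPASS_HITL"),
        kill_switch=_flag(env, "PYWINAUTO_MCP_KILL_SWITCH"),
        dry_run=_flag(env, "PYWINAUTO_MCP_DRY_RUN"),
        max_actions_per_minute=int(env.get("PYWINAUTO_MCP_MAX_ACTIONS_PER_MINUTE", "120")),
        # The camera probe index is capped at 32 regardless of the requested value.
        camera_max_index=min(int(env.get("PYWINAUTO_MCP_CAMERA_MAX_INDEX", "10")), 32),
    )
```

A non-numeric value for either integer setting raises ValueError in this sketch; a real server might fall back to the default instead.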

automation_safety(operation="status"|"reset_counters") reports or resets the safety counters and flags. automation_system("help") returns a structured overview (version, tools, safety keys, doc paths).
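The rolling 60 s cap set by PYWINAUTO_MCP_MAX_ACTIONS_PER_MINUTE can be pictured as a sliding-window counter over recent action timestamps. The sketch below is one common way to build such a limiter, not necessarily how this server does it; the class and method names are hypothetical.

```python
import time
from collections import deque
from typing import Callable


class RollingActionLimiter:
    """Sliding-window cap on mutating actions (illustrative only)."""

    def __init__(
        self,
        max_actions: int = 120,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_actions
        self._window = window_seconds
        self._clock = clock
        self._stamps: deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record one action if the window has room; return False when capped."""
        now = self._clock()
        # Drop timestamps that have aged out of the rolling window.
        while self._stamps and now - self._stamps[0] >= self._window:
            self._stamps.popleft()
        if len(self._stamps) >= self._max:
            return False
        self._stamps.append(now)
        return True

    def reset(self) -> None:
        """Forget all recorded actions, similar in spirit to reset_counters."""
        self._stamps.clear()
```

Unlike a fixed per-minute bucket, a rolling window prevents a burst at the end of one minute from being followed by a second burst at the start of the next.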

Fleet: mcp-central-docs patterns/PYWINAUTO_MCP_SAFETY.md.


Testing (CI vs local)

See docs/TESTING.md for environment-aware markers (such as requires_hardware and destructive), aligned with mcp-central-docs standards/testing-environment-aware.md. In CI, hardware-marked tests are skipped; run locally on Windows to exercise OpenCV / real window flows.
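The CI skip rule can be sketched as a pure decision function that a pytest `pytest_collection_modifyitems` hook could call for each collected test. This is an assumption-based illustration: the function name is hypothetical, CI is detected through the `CI` environment variable (which GitHub Actions sets), and the repository's actual conftest may work differently.

```python
import os
from typing import Iterable, Mapping, Optional

# Markers that should not run in CI, per the description above.
SKIPPED_IN_CI = {"requires_hardware"}


def ci_skip_reason(
    markers: Iterable[str], env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Return a skip reason for hardware-marked tests in CI, else None."""
    if not env.get("CI"):
        return None  # local runs exercise everything
    blocked = SKIPPED_IN_CI.intersection(markers)
    if blocked:
        return f"skipped in CI: {', '.join(sorted(blocked))}"
    return None
```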

Maintenance

  • Documentation should match implemented tools and env vars.
  • glama.json: update when releasing or when marketplace metadata changes.

License

MIT. Copyright (c) 2026 Sandra Schipal.


Web dashboard (web_sota)

Optional Vite UI + local backend. Default ports from web_sota/start.ps1: frontend 10788, backend 10789.

Set-Location web_sota
.\start.ps1

Open http://localhost:10788. The Help route documents safety, env vars, and tool overview (same themes as automation_system("help")).
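Once the backend is running, its REST API can also be queried directly. The sketch below targets the GET /api/v1/cameras/ endpoint mentioned in the environment variable table, using only the standard library. It assumes the endpoint returns JSON and makes no claim about the response shape; the function names are hypothetical.

```python
import json
import urllib.request
from typing import Any

# Default backend address from web_sota/start.ps1.
BACKEND = "http://127.0.0.1:10789"


def cameras_url(base: str = BACKEND) -> str:
    """Build the camera-listing URL, tolerating a trailing slash on the base."""
    return base.rstrip("/") + "/api/v1/cameras/"


def fetch_cameras(base: str = BACKEND, timeout: float = 3.0) -> Any:
    """GET the camera list and return the decoded JSON body as-is."""
    with urllib.request.urlopen(cameras_url(base), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

If the backend is not running, `fetch_cameras()` raises `urllib.error.URLError`, which is a quick way to confirm whether start.ps1 has finished launching it.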

Local LLM chat (/chat): proxies to Ollama or LM Studio (OpenAI-compatible /v1 on localhost only). Pick a model, optional personas, prompt refiner, and repo knowledge pre-prompt (src/pywinauto_mcp/llm_repo_context.py) for questions like "can I click and drag?". Start Ollama or LM Studio first, then click Refresh models.
