computeruseprotocol

mcp
Security Audit
Pass
Health Pass
  • License — License: MIT
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Community trust — 78 GitHub stars
Code Pass
  • Code scan — Scanned 4 files during light audit, no dangerous patterns found
Permissions Pass
  • Permissions — No dangerous permissions requested
Purpose
This tool provides a universal schema and compact text encoding for representing desktop UI accessibility trees. It is designed to help AI agents perceive and interact with graphical user interfaces across multiple operating systems.

Security Assessment
The automated code scan found no dangerous patterns, no hardcoded secrets, and no dangerous permission requests. However, by design, the protocol captures and exposes local UI states, application names, and on-screen text, which can include highly sensitive user data. While the schema itself is safe, implementing it to screen-scrape active desktops requires strict local access controls. Overall risk: Medium (elevated due to the sensitive nature of captured UI data, though the tool itself is structurally benign).

Quality Assessment
The project is MIT-licensed and has drawn moderate community interest (78 GitHub stars). The repository shows recent pushes, but the README explicitly states that the project is archived and has been superseded by a native Rust CLI alternative. Consequently, although the code is currently functional, it is no longer actively maintained.

Verdict
Not recommended (The project is officially archived and superseded; developers should evaluate the recommended Rust replacement instead).
SUMMARY

Computer Use Protocol is a universal schema for AI agents to perceive and interact with any desktop UI.

README.md

[!WARNING]
This project is archived. Computer Use Protocol has been superseded by agent-ctrl, a native Rust CLI that implements the same cross-platform UI normalization with a compact agent-ready output format. Please use agent-ctrl going forward.


Computer Use Protocol

A universal protocol for AI agents to perceive and interact with any desktop UI.



Computer Use Protocol (CUP) is a universal schema for representing UI accessibility trees: one format that works identically across Windows, macOS, Linux, Web, Android, and iOS. It includes a compact text encoding optimized for LLM context windows (~97% smaller than JSON), making it well suited to AI agents that need to perceive and act on desktop UIs. This repository holds the core specification: the JSON schema, the compact text format, the cross-platform role/state/action mappings, and documentation.

CUP also provides SDKs for capturing and interacting with native UI trees, and MCP servers for exposing those capabilities directly to AI agents like Claude and Copilot.

Schema

CUP defines a JSON envelope format built on ARIA-derived roles:

{
    "version": "0.1.0",
    "platform": "windows",
    "timestamp": 1740067200000,
    "screen": { "w": 2560, "h": 1440, "scale": 1.0 },
    "app": { "name": "Spotify", "pid": 1234 },
    "tree": [
        {
            "id": "e0",
            "role": "window",
            "name": "Spotify",
            "bounds": { "x": 120, "y": 40, "w": 1680, "h": 1020 },
            "states": ["focused"],
            "actions": ["click"],
            "children": [ ... ]
        }
    ]
}
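
As a rough illustration of how an agent might consume this envelope, the sketch below parses a copy of the example and collects the ids of nodes that advertise a given action. The helper names `iter_nodes` and `find_actionable` are hypothetical and not part of the protocol or its SDKs; the sample leaves `children` empty because the example above elides it, and treating a missing `children` key as "no children" is an assumption.

```python
import json

# Copy of the example envelope above; "children" is left empty here
# because the original example elides its contents.
ENVELOPE_JSON = """
{
    "version": "0.1.0",
    "platform": "windows",
    "timestamp": 1740067200000,
    "screen": { "w": 2560, "h": 1440, "scale": 1.0 },
    "app": { "name": "Spotify", "pid": 1234 },
    "tree": [
        {
            "id": "e0",
            "role": "window",
            "name": "Spotify",
            "bounds": { "x": 120, "y": 40, "w": 1680, "h": 1020 },
            "states": ["focused"],
            "actions": ["click"],
            "children": []
        }
    ]
}
"""


def iter_nodes(nodes, depth=0):
    """Yield (depth, node) pairs depth-first, parents before children."""
    for node in nodes:
        yield depth, node
        # Assumption: a missing "children" key means the node is a leaf.
        yield from iter_nodes(node.get("children", []), depth + 1)


def find_actionable(envelope, action):
    """Return the ids of all nodes that list the given action."""
    return [
        node["id"]
        for _, node in iter_nodes(envelope["tree"])
        if action in node.get("actions", [])
    ]


envelope = json.loads(ENVELOPE_JSON)
clickable = find_actionable(envelope, "click")
print(clickable)  # ['e0']
```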

CUP compact format (~97% token reduction, heavily optimized for CUA/LLMs):

[e0] win "Spotify" 120,40 1680x1020
  [e1] doc "Spotify" 120,40 1680x1020
    [e2] btn "Back" 132,52 32x32 [clk]
    [e3] btn "Forward" 170,52 32x32 {dis} [clk]
    [e7] nav "Main" 120,88 240x972
      [e8] lnk "Home" 132,100 216x40 {sel} [clk]

Full schema: schema/cup.schema.json
Compact format spec: schema/compact.md
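
The compact lines above appear to follow the pattern `[id] role "name" x,y WxH {states} [actions]`, with two spaces of indentation per tree level. The following is a minimal encoder sketch under that reading, not the reference implementation. The abbreviation tables contain only the tokens visible in the example (their mapping to full role, state, and action names is inferred), the fallback to the full name and the comma separator for multiple states or actions are assumptions, and name escaping is not handled; schema/compact.md is the authoritative source.

```python
# Abbreviations inferred from the example above only; the full tables
# live in the compact format spec.
ROLE_ABBR = {"window": "win", "document": "doc", "button": "btn",
             "navigation": "nav", "link": "lnk"}
STATE_ABBR = {"disabled": "dis", "selected": "sel"}
ACTION_ABBR = {"click": "clk"}


def encode_node(node, depth=0):
    """Encode one node as a single compact line (children not included)."""
    b = node["bounds"]
    name = node["name"]
    parts = [
        f"[{node['id']}]",
        # Assumption: unknown roles fall back to their full name.
        ROLE_ABBR.get(node["role"], node["role"]),
        f'"{name}"',
        f"{b['x']},{b['y']}",
        f"{b['w']}x{b['h']}",
    ]
    states = node.get("states", [])
    if states:
        # Assumption: multiple states are comma-separated.
        parts.append("{" + ",".join(STATE_ABBR.get(s, s) for s in states) + "}")
    actions = node.get("actions", [])
    if actions:
        parts.append("[" + ",".join(ACTION_ABBR.get(a, a) for a in actions) + "]")
    return "  " * depth + " ".join(parts)


def encode_tree(nodes, depth=0):
    """Encode a list of nodes and their descendants, one line per node."""
    lines = []
    for node in nodes:
        lines.append(encode_node(node, depth))
        lines.extend(encode_tree(node.get("children", []), depth + 1))
    return lines


nav = {
    "id": "e7", "role": "navigation", "name": "Main",
    "bounds": {"x": 120, "y": 88, "w": 240, "h": 972},
    "children": [{
        "id": "e8", "role": "link", "name": "Home",
        "bounds": {"x": 132, "y": 100, "w": 216, "h": 40},
        "states": ["selected"], "actions": ["click"],
    }],
}
compact_lines = encode_tree([nav])
print("\n".join(compact_lines))
```

Run on the `nav` sample, this reproduces the `[e7]` and `[e8]` lines of the example, relative to the subtree root.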

Benchmark

UI captured from a text-heavy Wikipedia page, encoded in every format.

[Figure: CUP Format Benchmark]

Why CUP?

Every platform exposes UI accessibility differently. Windows uses UIA with ~40 ControlTypes, macOS has AXUIElement with its own role system, Linux uses AT-SPI2 with 100+ roles, and the web has ~80 ARIA roles.

Today, every agent framework reinvents this translation layer independently. CUP solves it once at the representation level:

  • One format everywhere - write agent logic once, run it on any platform
  • Built for LLMs - compact encoding fits complex UIs into context windows at ~15x fewer tokens than the next closest format
  • Built for actions - 15 canonical verbs that map to native platform APIs
  • No information loss - raw native properties preserved via node.platform.*

Roles

59 ARIA-derived roles:

alert alertdialog application banner button cell checkbox columnheader combobox complementary contentinfo dialog document form generic grid group heading img link list listitem log main marquee menu menubar menuitem menuitemcheckbox menuitemradio navigation none option progressbar radio region row rowheader scrollbar search searchbox separator slider spinbutton status switch tab table tablist tabpanel text textbox timer titlebar toolbar tooltip tree treeitem window

Role mappings: schema/mappings.json

States

16 state flags (only active states are listed; an absent flag means the default):

busy checked collapsed disabled editable expanded focused hidden mixed modal multiselectable offscreen pressed readonly required selected
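
Because both vocabularies are closed lists, a consumer can check incoming nodes against them cheaply. The sketch below copies the role and state names listed above into sets and reports anything outside them; `validate_node` and `has_state` are hypothetical helper names, and reading an absent `states` key as "all defaults" follows the note above rather than any SDK behavior.

```python
# Role and state vocabularies copied from the lists above.
ROLES = frozenset("""
alert alertdialog application banner button cell checkbox columnheader
combobox complementary contentinfo dialog document form generic grid group
heading img link list listitem log main marquee menu menubar menuitem
menuitemcheckbox menuitemradio navigation none option progressbar radio
region row rowheader scrollbar search searchbox separator slider spinbutton
status switch tab table tablist tabpanel text textbox timer titlebar toolbar
tooltip tree treeitem window
""".split())

STATES = frozenset("""
busy checked collapsed disabled editable expanded focused hidden mixed modal
multiselectable offscreen pressed readonly required selected
""".split())


def validate_node(node):
    """Return a list of human-readable problems; empty means the node is OK."""
    problems = []
    if node.get("role") not in ROLES:
        problems.append(f"unknown role: {node.get('role')!r}")
    for state in node.get("states", []):
        if state not in STATES:
            problems.append(f"unknown state: {state!r}")
    return problems


def has_state(node, state):
    """True if the flag is present; absence means the default (inactive)."""
    return state in node.get("states", [])


print(len(ROLES), len(STATES))  # 59 16
print(validate_node({"role": "btn", "states": ["greyed"]}))
```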

Actions

The protocol defines 15 canonical action verbs, the vocabulary for what an agent can do with an element. The protocol specifies the names and semantics; SDKs provide the actual execution against native platform APIs.

| Action | Parameters | Description |
| --- | --- | --- |
| click | - | Click/invoke the element |
| collapse | - | Collapse an expanded element |
| decrement | - | Decrement a slider/spinbutton |
| dismiss | - | Dismiss a dialog/popup |
| doubleclick | - | Double-click |
| expand | - | Expand a collapsed element |
| focus | - | Move keyboard focus to the element |
| increment | - | Increment a slider/spinbutton |
| longpress | - | Long-press (touch/mobile interaction) |
| rightclick | - | Right-click (context menu) |
| scroll | direction: str | Scroll container (up/down/left/right) |
| select | - | Select an item in a list/tree/tab |
| setvalue | value: str | Set element value programmatically |
| toggle | - | Toggle checkbox or switch |
| type | value: str | Type text into a field |

Session-level actions (not element-scoped):

| Action | Parameters | Description |
| --- | --- | --- |
| press_keys | keys: str | Send a keyboard shortcut |
| wait | ms: int | Wait/delay between actions in a batch |
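
One way to use this vocabulary is to validate an action request before handing it to an SDK. The sketch below encodes the verbs and parameters from the two tables above and rejects unknown verbs, missing or extra parameters, and element ids on session-level actions. The `build_action` function and the shape of the returned dict (`action`, `element_id`, plus parameters) are hypothetical; the protocol text here does not define a wire format for requests.

```python
# Verb -> {parameter name: type}, taken from the two tables above.
ELEMENT_ACTIONS = {
    "click": {}, "collapse": {}, "decrement": {}, "dismiss": {},
    "doubleclick": {}, "expand": {}, "focus": {}, "increment": {},
    "longpress": {}, "rightclick": {}, "scroll": {"direction": str},
    "select": {}, "setvalue": {"value": str}, "toggle": {},
    "type": {"value": str},
}
SESSION_ACTIONS = {"press_keys": {"keys": str}, "wait": {"ms": int}}
SCROLL_DIRECTIONS = {"up", "down", "left", "right"}


def build_action(action, element_id=None, **params):
    """Return a request dict (hypothetical shape) or raise ValueError."""
    if action in ELEMENT_ACTIONS:
        if element_id is None:
            raise ValueError(f"{action} needs a target element id")
        spec = ELEMENT_ACTIONS[action]
    elif action in SESSION_ACTIONS:
        if element_id is not None:
            raise ValueError(f"{action} is session-level and takes no element id")
        spec = SESSION_ACTIONS[action]
    else:
        raise ValueError(f"unknown action: {action}")

    if set(params) != set(spec):
        raise ValueError(
            f"{action} expects parameters {sorted(spec)}, got {sorted(params)}"
        )
    for name, expected_type in spec.items():
        if not isinstance(params[name], expected_type):
            raise ValueError(f"{name} must be of type {expected_type.__name__}")
    if action == "scroll" and params["direction"] not in SCROLL_DIRECTIONS:
        raise ValueError(f"invalid scroll direction: {params['direction']}")

    request = {"action": action, **params}
    if element_id is not None:
        request["element_id"] = element_id
    return request


print(build_action("scroll", "e7", direction="down"))
print(build_action("wait", ms=250))
```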

SDKs

SDKs implement the protocol. They capture native accessibility trees, normalize them into CUP format, execute actions, and optionally expose everything through MCP servers for AI agent integration.

| Language | Repository | Package |
| --- | --- | --- |
| Python | python-sdk | pip install computeruseprotocol |
| TypeScript | typescript-sdk | npm install computeruseprotocol |

Building your own SDK? All you need is this spec. Implement tree capture for your target platform, normalize into the CUP schema, and you're compatible with every tool in the ecosystem.
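
To make the "capture, then normalize" step concrete, here is a minimal sketch of normalizing an already-captured native tree into CUP nodes. Everything about the input is hypothetical: the native node shape (`control_type`, `name`, `rect` as left/top/right/bottom, `children`), the three-entry role table (illustrative, not taken from schema/mappings.json), the fallback to `generic` for unmapped types, and the flat layout under `platform`. Only the output field names (`id`, `role`, `name`, `bounds`, `children`, `platform`) come from the schema description above.

```python
import itertools

# Illustrative mapping only; the real tables are in schema/mappings.json.
NATIVE_TO_CUP_ROLE = {"Window": "window", "Button": "button", "Hyperlink": "link"}


def normalize(native, counter=None):
    """Convert one hypothetical native node (and its subtree) to a CUP node."""
    counter = counter if counter is not None else itertools.count()
    left, top, right, bottom = native["rect"]
    node = {
        # Ids are assigned in pre-order: e0, e1, e2, ...
        "id": f"e{next(counter)}",
        # Assumption: unmapped native types fall back to "generic".
        "role": NATIVE_TO_CUP_ROLE.get(native["control_type"], "generic"),
        "name": native.get("name", ""),
        "bounds": {"x": left, "y": top, "w": right - left, "h": bottom - top},
        # Keep the raw native property so no information is lost.
        "platform": {"control_type": native["control_type"]},
    }
    children = [normalize(child, counter) for child in native.get("children", [])]
    if children:
        node["children"] = children
    return node


native_tree = {
    "control_type": "Window", "name": "Spotify", "rect": (120, 40, 1800, 1060),
    "children": [
        {"control_type": "Button", "name": "Back", "rect": (132, 52, 164, 84)},
        {"control_type": "Pane", "name": "", "rect": (120, 88, 360, 1060)},
    ],
}
cup_root = normalize(native_tree)
print(cup_root["id"], cup_root["role"], cup_root["bounds"])
```

A real SDK would additionally wrap the resulting nodes in the envelope (version, platform, timestamp, screen, app) and fill in states and actions from the native API.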

Documentation

Contributing

CUP is in early development (v0.1.0). Contributions to the specification are welcome, especially:

  • New role or action proposals with cross-platform mapping rationale
  • Platform mapping improvements in schema/mappings.json
  • Schema documentation and examples

For SDK contributions (bug fixes, new platform adapters, etc.), see the language-specific repos above.

See CONTRIBUTING.md for guidelines.

License

MIT
