computer.cpp
Health Uyari
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 5 GitHub stars
Code Basarisiz
- rm -rf — Recursive force deletion command in .github/workflows/release.yml
- rm -rf — Recursive force deletion command in scripts/build-common.sh
- rm -rf — Recursive force deletion command in scripts/create-local-codesign-identity.sh
Permissions Gecti
- Permissions — No dangerous permissions requested
Bu listing icin henuz AI raporu yok.
Turn any desktop app into an agent-ready API.

computer.cpp
Most software was not built for agents.
It was built for humans staring at windows.
A human can open an app, understand what is on screen, click the right thing,
type the right text, wait for the UI to settle, and verify that the job is
done.
Agents need tools, APIs, schemas, results, and something callable. Most desktop
apps do not provide that.
computer.cpp gives agents the missing bridge.
Write one Lua file that describes what a desktop app can do:
add-reminder
complete-reminder
summarize-list
extract-visible-rows
approve-invoice
update-customer-record
submit-form
computer.cpp turns that into:
CLI commands
local HTTP endpoints
typed input and output schemas
sync and async operations
progress updates
cancellation
trace logs
screenshots and artifacts
bounded model tool calls
agent-friendly MCP server
The desktop app does not need to expose an API. The app vendor does not need to
ship an SDK. The workflow does not need to live in a browser.
If a human can operate it on screen, computer.cpp can help make it
programmable.
Table Of Contents
- The Magic
- What Actually Happens
- Why This Exists
- Desktop Apps For Agents
- CLI, HTTP, And MCP
- Examples
- Quick Start
- Define An App API
- Operations
- Micro-Agents
- Core Desktop Control
- LLM Configuration
- Lua Scripts
- Protocol
- Tracing And Artifacts
- Security Model
- Why C++?
- Project Status
- Alternatives And Comparisons
- Philosophy
- Community And Contributions
- Stargazers
- License
The Magic
The outside world sees a clean command:
POST /commands/add-reminder
Inside, computer.cpp can run a tiny desktop micro-agent that sees the screen
and uses bounded keyboard, mouse, screenshot, and app-specific tools.
local ac = require("computer_cpp")
local app = ac.app.define({
name = "mac.reminders",
title = "macOS Reminders",
version = "1.0.0",
})
local add_reminder_agent = ac.micro_agent.define({
name = "reminders.add",
system = [[
You operate the visible macOS Reminders window.
Add the requested reminder and verify that it appears.
Use only screenshots and the provided tools.
Click, type, wait, and verify from visible screen evidence.
Call blocked if the screen is not usable.
]],
tools = {
ac.tools.screenshot({ focusApp = "Reminders", frontmostWindowOnly = true }),
ac.tools.click_box({ focusApp = "Reminders" }),
ac.tools.type_text({ focusApp = "Reminders" }),
ac.tools.press_key({ focusApp = "Reminders" }),
ac.tools.wait_stable(),
ac.tools.blocked(),
ac.tool.define("confirm_created", {
description = "Confirm the requested reminder is visible",
input = {
title = { type = "string", required = true },
evidence = { type = "string", required = true },
},
}),
},
})
app:command("add-reminder", {
description = "Add a reminder to a list",
input = {
list = { type = "string", required = true },
title = { type = "string", required = true },
notes = { type = "string", default = "" },
},
output = {
created = { type = "boolean" },
title = { type = "string" },
evidence = { type = "string" },
},
handler = function(ctx, args)
ac.app.launch("Reminders")
ac.wait_frontmost("Reminders", { timeoutMs = 5000 })
local evidence = nil
local result = add_reminder_agent:run_loop(ctx, {
goal = "Add reminder: " .. args.title,
max_steps = 12,
state = args,
on_tool_call = {
confirm_created = function(call)
if call.args.title ~= args.title then
return ac.tool_result.error({
code = "wrong_title",
message = "Visible reminder title did not match",
})
end
evidence = call.args.evidence
return ac.tool_result.done({ created = true })
end,
},
})
if not result.ok then
error(result.error and result.error.message or "add reminder failed")
end
return {
created = true,
title = args.title,
evidence = evidence,
}
end,
})
return app
From that one file:
computer.cpp app run ./reminders.lua add-reminder \
--list Today \
--title "Review release notes"
POST /commands/add-reminder
POST /mcp
The MCP server generates agent-friendly tools from the same Lua app definition,
so the desktop app can be exposed through CLI, HTTP, and MCP without writing a
custom server for each app.
Now Reminders is not just a GUI app. It is a command-line tool and a local HTTP
API, and an MCP server agents can use.
What Actually Happens
The command looks simple:
add-reminder("Today", "Review release notes")
The runtime can do the messy desktop work underneath:
focus the app
take a screenshot
ask the model what it sees
click the toolbar plus button
type the title
wait for the UI to settle
take another screenshot
verify the reminder is visible
return a typed result
save the trace
The model does not get arbitrary control. It gets bounded tools. The Lua
command owns the schema, state, validation, retries, progress, final result, and
proof.
The caller never sees:
click(...)
type(...)
screenshot(...)
scroll(...)
The caller sees:
add-reminder(...)
complete-reminder(...)
summarize-list(...)
extract-visible-rows(...)
approve-invoice(...)
computer.cpp does not merely automate desktop apps. It turns desktop apps into
programmable infrastructure.
Why This Exists
There is a huge amount of useful software that agents cannot directly use. Not
because the software is impossible to operate, but because it was built for
humans.
A person can sit in front of a screen and work through:
a native productivity app
a finance system
a medical scheduling app
a thick-client enterprise tool
a remote desktop workflow
an installer
an internal operations console
a legacy application with no API
An agent needs a callable interface. computer.cpp creates that interface. It
wraps GUI-only work behind commands, schemas, results, traces, and operations.
The implementation can be ugly. The API should not be.
Desktop Apps For Agents
AI agents are good at calling tools. Most desktop apps are not tools. They are
human interfaces.
computer.cpp turns a desktop app into something an agent can use:
desktop app
-> Lua app definition
-> CLI / HTTP / MCP
-> agent-callable API
An agent can call:
add-reminder(list="Today", title="Review release notes")
instead of trying to reason about:
take screenshot
find plus button
click coordinate
type title
press escape
take another screenshot
verify visually
The second sequence may still happen internally. The agent sees the first one.
CLI, HTTP, And MCP
computer.cpp exposes the same desktop app API in multiple ways.
CLI
Use the CLI for local automation, scripts, tests, and agents that call shell
commands:
computer.cpp app run ./reminders.lua add-reminder \
--list Today \
--title "Review release notes"
Local HTTP API
Use HTTP when you want a normal local service interface:
GET /health
GET /schema
POST /commands/add-reminder
POST /commands/add-reminder?async=true
GET /operations/op_123
GET /operations/op_123/result?wait=30
POST /operations/op_123:cancel
When binding outside localhost, app serve requires --auth-token-env so the
HTTP API is not exposed without a bearer token.
MCP Server
MCP is becoming a standard way for agents to
discover and call tools.
app serve also exposes a Streamable HTTP MCP endpoint at /mcp. The endpoint
uses JSON-RPC over HTTP POST and returns JSON responses. It does not require TLS
itself; put Caddy or another reverse proxy in front when exposing it over
HTTPS.
The MCP endpoint is stateless: it does not allocate MCP session ids and does
not open SSE streams. MCP GET requests to /mcp return 405 Method Not Allowed; clients should use the JSON response path over HTTP POST.
computer.cpp app serve ./reminders.lua --listen 127.0.0.1:8787
POST /mcp
The MCP server turns a Lua app definition into app-level tools such as:
add-reminder
complete-reminder
summarize-list
instead of raw desktop primitives like:
click
type
screenshot
scroll
Supported MCP methods include:
initialize
notifications/initialized
ping
tools/list
tools/call
The MCP tool schemas come from the command input and output schemas in the
Lua app definition. Tool calls return both structuredContent and a JSON text
content block for clients that prefer either form.
HTTP MCP requests should include:
Accept: application/json, text/event-stream
Content-Type: application/json
MCP-Protocol-Version: 2025-11-25
MCP-Protocol-Version is negotiated by initialize and should be sent on
subsequent requests. The server supports the current 2025-11-25 revision and
keeps compatibility with 2025-06-18 and 2025-03-26 clients for the tool
surface implemented here.
When exposing /mcp through a reverse proxy, set a bearer token and allow the
browser origins that should be able to reach the endpoint:
export COMPUTER_CPP_APP_TOKEN='change-me'
computer.cpp app serve ./reminders.lua \
--listen 127.0.0.1:8787 \
--auth-token-env COMPUTER_CPP_APP_TOKEN \
--allowed-origin https://mcp.example.com
The source of truth is the Lua app definition:
one Lua app definition
-> CLI
-> HTTP API
-> MCP server
-> async operations
-> schemas
-> traces
Examples
Personal productivity:
POST /commands/add-reminder
POST /commands/complete-reminder
POST /commands/summarize-list
Business operations:
POST /commands/extract-visible-invoices
POST /commands/approve-invoice
POST /commands/update-customer-record
POST /commands/export-report
Internal tools:
POST /commands/open-case
POST /commands/summarize-visible-record
POST /commands/fill-required-fields
POST /commands/submit-form
These are not generic computer-use actions. They are app APIs.
Quick Start
Install the local build and Lua runtime dependencies. On macOS with Homebrew:
brew install cmake ninja wxwidgets lua
Build and run the test suite:
./scripts/build-mac.sh --verify
For a faster tray-app development loop, build only ComputerCpp.app and
relaunch it:
./scripts/build-mac.sh --launch
The macOS build script uses Ninja and writes build outputs to:
build/debug-ninja/computer.cpp
build/debug-ninja/ComputerCpp.app
For a cross-platform CMake wrapper, use scripts/build.sh. It accepts a
generator explicitly, so Ninja users can run:
./scripts/build.sh --verify --generator Ninja --build-dir build/debug-ninja
If CMake was already configured with another generator, add --reconfigure to
recreate the build directory.
Manual CMake fallback:
cmake -S . -B build/debug-ninja -G Ninja \
-DCMAKE_BUILD_TYPE=Debug \
-DBUILD_TESTING=ON \
-DCMAKE_PREFIX_PATH="$(brew --prefix)"
cmake --build build/debug-ninja --target all --config Debug
ctest --test-dir build/debug-ninja --output-on-failure
For release-style local builds on macOS:
./scripts/build-mac.sh --release --reconfigure
The macOS build script also creates a reusable local signing identity before
the first signed build if you do not already have an Apple Development or
Developer ID Application certificate. You can create or refresh that identity
manually with:
./scripts/create-local-codesign-identity.sh
On macOS, the tray app also needs a stable code-signing identity before asking
for Accessibility and Screen Recording permissions. TCC records permissions
against the app's code identity, so rebuilding with a different ad-hoc or
regenerated certificate can leave stale privacy rows or prevent the app from
appearing in System Settings.
The script is intentionally idempotent. If ComputerCpp Local Code Signing
already exists in the login keychain, it reuses that identity instead of
generating a new one. That keeps the local TCC identity stable across rebuilds.
COMPUTER_CPP_CODE_SIGN_IDENTITY=auto is the default. On macOS it prefers an
Apple Development or Developer ID Application certificate when one is available,
then falls back to ComputerCpp Local Code Signing, and finally to ad-hoc
signing. Ad-hoc signing is not recommended for macOS permission onboarding.
Launch the tray app from the build directory:
open -n build/debug-ninja/ComputerCpp.app
Use the tray menu's Permissions item to grant and verify macOS permissions.
The panel has separate rows for Accessibility and Screen Recording:
- Click
Requestin the Accessibility row. macOS opens Privacy & Security.
EnableComputerCpp, return to the permission panel, then clickTest. - Click
Requestin the Screen Recording row. If macOS does not addComputerCppautomatically, use the+button in Screen Recording and
select the running build artifact shown below. Return to the permission panel
and clickTest. - When both rows are granted, use
Restart ComputerCppif macOS asks for a
restart. If the permissions get wedged after rebuilds, useReset Permissions && Restart.
After granting Screen Recording, macOS may ask to quit and reopen the app. If
the app does not visibly return, check the tray icon or run:
pgrep -af ComputerCpp
./build/debug-ninja/computer.cpp permissions
If Screen Recording does not add ComputerCpp to the list, use the + button
in System Settings and select the running build artifact:
build/debug-ninja/ComputerCpp.app
If permissions get stuck after changing bundle paths, bundle ids, or signing
identities, quit the tray app and do a service-wide reset for the two privacy
services before trying again:
pkill -x ComputerCpp 2>/dev/null || true
tccutil reset Accessibility
tccutil reset ScreenCapture
open -n build/debug-ninja/ComputerCpp.app
For a public downloadable macOS binary, use Developer ID signing and
notarization. The self-signed identity is for local source builds; it is not a
replacement for Developer ID distribution.
Check permissions and capabilities:
./build/debug-ninja/computer.cpp permissions
./build/debug-ninja/computer.cpp capabilities
Run the macOS Reminders example schema:
./build/debug-ninja/computer.cpp --json app run examples/mac/reminders.lua
Run a command:
./build/debug-ninja/computer.cpp --json app run examples/mac/reminders.lua \
add-reminder \
--list Today \
--title "Review release notes"
Serve it over HTTP:
./build/debug-ninja/computer.cpp app serve examples/mac/reminders.lua \
--listen 127.0.0.1:8787
Call it:
curl -X POST http://127.0.0.1:8787/commands/add-reminder \
-H 'Content-Type: application/json' \
-d '{"list":"Today","title":"Review release notes"}'
Use it as an MCP server:
curl -X POST http://127.0.0.1:8787/mcp \
-H 'Accept: application/json, text/event-stream' \
-H 'Content-Type: application/json' \
-d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-11-25","capabilities":{},"clientInfo":{"name":"curl","version":"1.0.0"}}}'
See examples/mac for a complete macOS Reminders API that lists,
adds, completes, and summarizes reminders through the real desktop app.
Define An App API
A computer.cpp app is a Lua file that returns an app definition.
local ac = require("computer_cpp")
local app = ac.app.define({
name = "demo.notes",
title = "Demo Notes",
version = "1.0.0",
description = "Desktop API for a notes app.",
})
app:command("create-note", {
description = "Create a note",
input = {
title = { type = "string", required = true },
body = { type = "string", default = "" },
},
output = {
created = { type = "boolean" },
title = { type = "string" },
},
handler = function(ctx, args)
ctx:progress({ step = "opening_app" })
-- Use snapshots, clicks, typing, screenshots, deterministic Lua,
-- or bounded model tool-call loops here.
return {
created = true,
title = args.title,
}
end,
})
return app
Run a command from the CLI:
computer.cpp app run ./notes.lua create-note \
--title "Draft release note" \
--body "..."
Or serve the app as a local HTTP API:
computer.cpp app serve ./notes.lua --listen 127.0.0.1:8787
curl http://127.0.0.1:8787/schema
curl -X POST http://127.0.0.1:8787/commands/create-note \
-H 'Content-Type: application/json' \
-d '{"title":"Draft release note","body":"..."}'
Or expose it as an MCP server through the same HTTP service:
curl -X POST http://127.0.0.1:8787/mcp \
-H 'Accept: application/json, text/event-stream' \
-H 'Content-Type: application/json' \
-H 'MCP-Protocol-Version: 2025-11-25' \
-d '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'
From this one definition, computer.cpp can generate:
CLI help
CLI argument parsing
HTTP schema
HTTP input validation
HTTP output validation
command dispatch
sync execution
async execution
operation storage
progress updates
trace logging
MCP tool schemas
MCP server dispatch
The MCP server uses the same command definitions as the CLI and HTTP surfaces.
Operations
Commands can run synchronously or asynchronously.
Sync is default:
computer.cpp app run ./app.lua summarize-visible-items --limit 20
Async is explicit:
computer.cpp app run ./app.lua summarize-visible-items --limit 20 --async
Inspect async operations from the CLI:
computer.cpp app operation get ./app.lua op_01jabc
computer.cpp app operation result ./app.lua op_01jabc --wait 30
computer.cpp app operation cancel ./app.lua op_01jabc
HTTP follows the same model:
POST /commands/summarize-visible-items
POST /commands/summarize-visible-items?async=true
GET /operations/op_01jabc
GET /operations/op_01jabc/result?wait=30
POST /operations/op_01jabc:cancel
Statuses are:
pending
running
succeeded
failed
cancelled
Long-running desktop work should be inspectable, cancellable, traceable, and
easy to call.
Micro-Agents
computer.cpp supports small, bounded model-driven loops for narrow desktop
tasks.
A micro-agent is not a general planner. It does one thing:
read visible rows
extract candidate cards
identify the active modal
verify that a record was saved
find the submit button
Micro-agents use real model tool calls with JSON schemas. They can call
standard tools like:
screenshot
click_box
scroll_down
scroll_up
press_key
type_text
wait
wait_stable
done
blocked
They can also report semantic app-specific data through tools like:
report_visible_rows
report_invoice_fields
report_visible_state
confirm_saved_record
The model does not return fake JSON in normal text. It calls real tools.computer.cpp validates the arguments, dispatches the tools, records the trace,
and returns tool results.
Core Desktop Control
computer.cpp is also a local desktop automation daemon and CLI. It exposes a
small JSON protocol for desktop control on macOS, Linux, and Windows:
accessibility snapshots, screenshots, input, window management, leases,
clipboard access, image utilities, and optional LLM calls.
Desktop-affecting commands are protected by a control-session lease. Acquire a
lease directly or run a command under session run:
computer.cpp session acquire --owner local --purpose smoke
computer.cpp session run --owner local --purpose smoke -- /bin/echo ok
Targets are resolved through one of these forms:
@ref
point:x,y
rect:left,top,right,bottom
role:button[name="Save"]
Use snapshot --with-bounds and target find role ... to discover actionable
accessibility refs.
Common commands:
computer.cpp ping
computer.cpp capabilities
computer.cpp schema
computer.cpp permissions
computer.cpp state
computer.cpp snapshot --interactive --with-bounds
computer.cpp screenshot /tmp/screen.png --max-dim 1200
computer.cpp image info /tmp/screen.png
computer.cpp image split /tmp/tall.png --chunk-height 900 --overlap 80
Input and window commands:
computer.cpp click @e1
computer.cpp click point:500,400
computer.cpp click rect:10,20,110,70
computer.cpp mouse move 500 400 --duration-ms 250
computer.cpp mouse drag 100 100 300 300 --button left
computer.cpp scroll -600 0 --at role:scrollarea
computer.cpp press "Cmd+L"
computer.cpp type "hello" --paste
computer.cpp window list Finder
computer.cpp window bounds 100 100 1200 800
Observation commands record input events and sampled screenshot frames:
computer.cpp observe events 20
computer.cpp observe frames last 10
Wait commands support app focus and screen stability:
computer.cpp wait --frontmost Finder --timeout-ms 5000
computer.cpp wait --stable-screen 750 --timeout-ms 5000
Clipboard commands:
computer.cpp clipboard read
computer.cpp clipboard write "hello"
computer.cpp clipboard paste
LLM Configuration
LLM calls use one canonical user config file. The tray settings window and thecomputer.cpp config CLI commands both edit the same config.toml.
CLI and TOML
computer.cpp config path
computer.cpp config init
computer.cpp config set-provider openrouter --type openrouter --api-key-stdin
computer.cpp config set-profile main --provider openrouter --model openai/gpt-4.1-mini \
--temperature 0.2 --max-output-tokens 1200 --default
computer.cpp config test
Use computer.cpp config open to open the editable TOML file. The config stores
providers, profiles, model ids, timeouts, sampling defaults, OpenRouter routing
preferences, and provider API keys. config show redacts keys, and the file is
created in the platform user config directory. On macOS/Linux it is written
owner-read/write only.
A minimal OpenRouter config looks like this:
version = 1
default_profile = "openrouter"
[providers.openrouter]
type = "openrouter"
base_url = "https://openrouter.ai/api/v1"
api_key = "replace-with-your-key"
[profiles.openrouter]
provider = "openrouter"
model = "openai/gpt-4.1-mini"
temperature = 0.2
max_output_tokens = 1200
timeout_ms = 180000
[profiles.openrouter.openrouter.provider]
allow_fallbacks = true
order = ["openai"]
For a local or OpenAI-compatible endpoint, use type = "openai-compatible" and
set base_url to the endpoint's /v1 URL. Omit api_key when the endpoint is
local or otherwise does not require a key.
computer.cpp config set-provider local-llm \
--type openai-compatible \
--base-url http://127.0.0.1:8000/v1 \
--no-api-key
computer.cpp config set-profile main \
--provider local-llm \
--model qwen36-27b \
--default
Legacy LLM environment variables are only a one-time import path:
computer.cpp config import-env
After import, edit the config file, tray settings, or computer.cpp config
commands instead of setting env vars at launch time.
Tray Settings
Launch the tray app and choose Settings... from the tray menu. The settings
window edits the same config.toml file:
Providersdefines endpoint names, provider type, base URL, and API key.
ChooseOpenRouterforhttps://openrouter.ai/api/v1, orOpenAI-compatiblefor local and compatible/v1endpoints. CheckNo API key requiredfor local endpoints that accept unauthenticated calls.Profilesdefines the active model settings. Pick a provider, set the model
id, optional temperature, top-p, max token, timeout, extra request params, and
optional OpenRouter routing JSON. UseSet Activeto make a profile the
default andTest Inferenceto verify it.Configshows the config file path.Open Configopens the TOML file in the
default editor,Reloaddiscards unsaved UI changes, andSave Changes
writes the TOML file.
Lua Scripts
Lua scripts can call the same daemon surface through ac:
local ac = require("computer_cpp")
ac.snapshot({ interactive = true, bounds = true })
ac.click("role:button[name=\"Save\"]")
ac.wait_frontmost("Finder", { timeoutMs = 5000 })
ac.screenshot("/tmp/screen.png", { maxDimension = 1200 })
Run scripts with:
computer.cpp run --owner local --purpose script ./script.lua
computer.cpp run --dry-run ./script.lua
Protocol
Requests are JSON objects with a method and optional params. Responses useok, data, error, and code fields.
{"method":"ping","params":{}}
Batch requests run multiple steps through the same control-session gate:
printf '[{"method":"ping","params":{}}]' | computer.cpp --json batch
Tracing And Artifacts
Every app command execution can be traceable.
A trace may include:
command input
progress updates
screenshots
model requests
model tool calls
tool results
desktop actions
final result
error or cancellation
timing
artifacts
Normal command results should stay small. Traces and artifacts are for
debugging, verification, replay, and improving app wrappers.
Use --trace to include an execution trace in JSON output, or --trace-dir to
write the trace as JSONL:
computer.cpp --json app run ./app.lua command-name --trace
computer.cpp --json app run ./app.lua command-name --trace-dir ./traces
Security Model
computer.cpp is a local automation tool, not a remote SaaS control plane.
Default posture:
local daemon
local socket
localhost HTTP by default
desktop-control leases
explicit permissions
traceable operations
auth required when binding beyond localhost
Important notes:
A process with a control-session token can perform real desktop actions.
Localhost HTTP serving is intended for local development/control.
Do not expose the HTTP server broadly without authentication and a proper network boundary.
Screenshots and traces may contain sensitive data.
Model-backed commands may send screenshots or text to a configured model provider.
The tool is powerful because it can operate the real desktop. Use it with the
same care you would use for any local automation system that can click, type,
read screenshots, and access the clipboard.
Why C++?
This project has to touch the real computer.
Screenshots, input injection, window state, accessibility snapshots, clipboard
behavior, display geometry, native app focus, and desktop permissions all live
at the operating system boundary.
A CMake-based C++ project can compile close to the metal, link directly against
OS APIs, and run as a small local binary. On macOS, desktop automation means
talking to frameworks like AppKit, CoreGraphics, Accessibility,
ScreenCaptureKit, and the system clipboard. On Linux, it can mean X11, XTest,
Wayland/KWin helpers, desktop portals, or other platform-specific adapters. On
Windows, it means Win32, UI Automation, input APIs, window handles, sessions,
and desktop permissions.
Lua sits on top because app APIs need to be easy to define. C++ sits underneath
because the computer has to actually move.
Project Status
Current implementation status:
macOS: primary and most complete backend
Linux: adapter work exists; support is partial and depends on native dependencies
Windows: adapter work exists; support is evolving
MCP: supported through Lua app definitions
The current macOS backend includes native desktop control, permissions,
screenshots, accessibility snapshots, window/app state, input actions, optional
LLM calls, Lua app definitions, local HTTP serving, async operation records,
MCP serving, tracing, and the Reminders example.
Linux and Windows support should be treated as evolving.
Alternatives And Comparisons
Cua vs computer.cpp
Cua is a broader computer-use stack for agents.
It includes components for controlling desktops, working with sandboxes,
exposing tools, running agents, and building computer-use workflows.
Cua asks:
How do we give agents access to computers?
computer.cpp asks:
How do we turn this desktop app or workflow into a callable API?
A Cua-style system gives an agent ways to inspect and operate a computer:
windows, screenshots, accessibility state, mouse, keyboard, tools, and
execution environments.
computer.cpp lets a developer wrap a specific desktop app as a semantic API:
POST /commands/add-reminder
POST /commands/approve-invoice
POST /commands/extract-visible-rows
GET /operations/op_123/result
The app may still be controlled through screenshots, clicks, typing, keyboard
shortcuts, accessibility snapshots, or model vision internally. Those details
stay behind the command boundary.
The public interface is not "click this coordinate."
The public interface is "perform this app operation."
Cua gives agents computers.
computer.cpp gives desktop apps APIs.
They can be complementary: a lower-level computer-use driver can provide
desktop control, while computer.cpp defines the semantic app contract,
schemas, operations, traces, and API surface.
PyAutoGUI vs computer.cpp
PyAutoGUI is a classic Python desktop
automation library. It can move the mouse, type text, press keys, take
screenshots, and locate images on screen.
That is low-level desktop scripting.
computer.cpp lets developers wrap a GUI app as a typed command surface.
PyAutoGUI-style code says:
click(x, y)
write("hello")
press("enter")
computer.cpp says:
POST /commands/add-reminder
The implementation may still click, type, wait, and screenshot internally. The
caller gets a semantic command and a typed result.
PyAutoGUI helps you automate a screen.
computer.cpp helps you publish an API for a desktop workflow.
SikuliX vs computer.cpp
SikuliX is a visual automation tool built around
image recognition: find this image, click that region, wait for the UI to
change.
That can be useful for visual desktop automation and GUI testing.
computer.cpp can use visual information too, but not as the public
abstraction.
A visual macro is usually a sequence of UI actions.
A computer.cpp app definition is an API contract:
app:command("summarize-visible-items", {
input = {
limit = { type = "integer", default = 20 },
},
output = {
items = { type = "array" },
summary = { type = "string" },
},
handler = function(ctx, args)
return my_app.summarize_visible_items(ctx, args)
end,
})
From that one definition, computer.cpp can provide:
computer.cpp app run ./my-app.lua summarize-visible-items --limit 20
and:
POST /commands/summarize-visible-items
Visual macros automate steps.
computer.cpp defines callable app behavior.
AutoHotkey vs computer.cpp
AutoHotkey is excellent for Windows hotkeys,
macros, and personal automation.
It is great when a human wants to customize their own machine.
computer.cpp is aimed at a different layer: turning desktop workflows into
programmatic APIs that agents, scripts, and services can call.
AutoHotkey scripts typically expose behavior through hotkeys or script entry
points.
computer.cpp exposes behavior through typed commands, schemas, CLI, HTTP, MCP,
async operations, and traceable results.
AutoHotkey is personal automation.
computer.cpp is an app API runtime.
agent-computer-use vs computer.cpp
agent-computer-use, also
known as agent-cu, focuses on accessibility-based desktop automation.
Accessibility-first tools are useful when the target app exposes a reliable
accessibility tree. They can provide deterministic element references, labels,
roles, and structured UI state.
Accessibility is powerful when it works.
But many real workflows involve apps or surfaces where accessibility is
incomplete, stale, misleading, unavailable, or simply not the right abstraction:
custom-rendered UIs
remote desktops
legacy enterprise software
canvas apps
installers
GPU-heavy views
broken Electron apps
mixed workflows across multiple apps
computer.cpp can use accessibility snapshots internally when they help.
But it does not require the public API to mirror the accessibility tree.
The goal is not to expose UI elements.
The goal is to expose useful app operations:
POST /commands/extract-visible-invoices
POST /commands/approve-invoice
POST /commands/update-customer-record
The implementation can use accessibility, screenshots, keyboard shortcuts,
model vision, or direct input.
The caller should not care.
NIB / nut.js vs computer.cpp
nut.js and
NIB provide desktop automation tools for mouse,
keyboard, screen, windows, and agent-facing CLI usage.
They are useful when you want a JavaScript or Node-oriented desktop automation
stack.
computer.cpp has a different center of gravity.
It is not a Node package and not just an agent CLI for desktop actions.
It is a C++ desktop app API runtime.
The goal is not only:
let an agent click and type
The goal is:
let a developer define a semantic API for a desktop app
That API can then be exposed through CLI, HTTP, and MCP with schemas,
operations, results, traces, and artifacts.
computer-use-mcp vs computer.cpp
computer-use-mcp is an MCP
server/client for controlling a desktop computer with AI agents. It exposes
tools like screenshots, mouse, keyboard, clipboard, app management, and window
targeting.
That is useful when your primary goal is MCP-based computer control.
computer.cpp exposes MCP too, but MCP is not the core abstraction.
The core abstraction is the app API definition.
one Lua file
-> semantic commands
-> CLI
-> HTTP
-> MCP
-> async operations
-> traces
computer-use-mcp exposes computer-control tools to agents.
computer.cpp helps developers expose app-specific commands to agents.
Agents should not always be asked to operate a desktop at the level of pixels
and clicks.
They should be able to call:
approve-invoice
extract-visible-rows
summarize-list
complete-reminder
Playwright / Puppeteer / Selenium vs computer.cpp
Browser automation tools are the right answer when the workflow lives inside a
browser and the DOM or browser protocol is available.
Examples:
Use browser automation for browser-native work.
Use computer.cpp when the workflow lives on the desktop:
native apps
desktop software
system dialogs
installers
remote desktops
legacy enterprise tools
custom internal applications
apps with no official API
mixed workflows across multiple apps
personal productivity apps
Browser automation gives you a browser API.
computer.cpp helps you build APIs for GUI workflows that do not already have
one.
OpenAI / Anthropic computer use vs computer.cpp
Model providers increasingly expose computer-use capabilities.
Examples:
Those systems help models reason about screens and produce actions.
computer.cpp is different.
It is the local runtime for making desktop workflows callable, traceable, and
reusable.
A model may be used inside a computer.cpp command, but the public contract is
still the app command:
POST /commands/summarize-visible-items
not:
here is a screenshot, decide what to click next
Model computer use is a reasoning capability.
computer.cpp is an app API layer for real desktop software.
UiPath / Power Automate / traditional RPA vs computer.cpp
Traditional RPA platforms such as UiPath,
Microsoft Power Automate,
Automation Anywhere, and
Blue Prism are full workflow systems.
They often include recorders, schedulers, credential vaults, queues, dashboards,
approvals, governance, and enterprise administration.
computer.cpp is smaller and more developer-native.
It does not try to be a full RPA platform.
It gives developers a runtime for defining semantic commands over desktop
workflows, then exposing them through CLI, HTTP, and MCP.
RPA asks:
How do we manage a fleet of business-process bots?
computer.cpp asks:
How do we turn this desktop app or workflow into a callable API?
That makes it useful as a local substrate, an agent tool, an internal
automation layer, or the foundation for more opinionated systems.
Bespoke Agents vs computer.cpp
Bespoke agents are powerful. For hard workflows, specific code often beats
generic frameworks.
computer.cpp preserves that.
The app logic is still bespoke. The Lua file can define exactly how one app or
workflow should behave.
But computer.cpp standardizes the boring parts:
CLI
HTTP server
MCP server
input validation
output validation
operation ids
async execution
result retrieval
cancellation
progress updates
trace logs
screenshots
artifacts
model tool calling
standard desktop tools
So each app can be custom where it matters, without rebuilding the runtime
every time.
Bespoke behavior.
Standard runtime.
Raw Desktop Endpoints vs computer.cpp
A desktop automation server could expose endpoints like:
POST /click
POST /type
POST /scroll
POST /screenshot
computer.cpp intentionally avoids that as the default public API.
Raw desktop primitives are internal tools.
The public API should describe what the app does:
POST /commands/add-reminder
POST /commands/complete-reminder
POST /commands/extract-visible-rows
POST /commands/approve-invoice
This makes the API stable even if the implementation changes.
Today a command might use screenshots and mouse clicks.
Tomorrow it might use keyboard shortcuts, accessibility, a model tool call, or a
better app-specific strategy.
The caller should not care.
Philosophy
The API is semantic.
The implementation can be ugly.
Desktop apps are messy. They have modals, weird focus behavior, stale screens,
broken accessibility trees, remote views, unexpected dialogs, and no official
APIs.
computer.cpp gives developers a way to wrap that mess in a clean command
surface:
define the app API once
expose it through CLI, HTTP, and MCP
trace every operation
keep low-level desktop tools internal
turn GUI-only work into API-callable work
Community And Contributions
If you are building desktop agents, app wrappers, local automation tools, or
computer-use infrastructure, you are invited to build with us.
You can absolutely build your own thing on top of computer.cpp. The reason to
contribute reusable pieces upstream is leverage.
When a fix or helper lands here, you no longer have to carry it alone. Other
people can test it on platforms, displays, apps, and edge cases you do not have.
Your work becomes part of the shared runtime, your use case influences the
direction of the project, and your contribution becomes visible proof of the
problem you solved.
That is good for you and good for the project:
less private maintenance
more review and testing
public credit for useful work
a stronger foundation for your own products
faster progress on the boring runtime layer
more time for the app-specific work that makes your project unique
The best place to compete is at the product and workflow layer. The best place
to cooperate is the substrate: screenshots, input, accessibility, leases, app
schemas, operation tracking, MCP, HTTP, traces, and cross-platform desktop
behavior.
Good contributions include:
Lua app definitions for real desktop apps
platform backend fixes
better accessibility targeting
more reliable screenshot and input behavior
MCP, HTTP, and CLI improvements
docs, examples, and tests
bug reports with clear reproduction steps
small helper APIs that remove repeated wrapper code
Private workflows can stay private. If you automate an internal app, you may not
be able to share the app logic, screenshots, or data. That is fine. The reusable
parts are still valuable: selectors, retries, validation helpers, trace patterns,
micro-agent tools, platform fixes, and lessons learned from real failure modes.
Open a pull request, start a discussion, or publish a small example. If you are
building something adjacent, the door is open. A healthier desktop-agent
ecosystem is one where independent projects can still improve the common pieces
together.
Stargazers
If this project helps, star the repository so other people can find it.
License
MIT. See LICENSE.
Yorumlar (0)
Yorum birakmak icin giris yap.
Yorum birakSonuc bulunamadi