Clawd Cursor
OS-level desktop automation server. Gives any AI model eyes, hands, and ears on a real computer.
Model-agnostic · Works with Claude, GPT, Gemini, Llama, or any tool-calling model · Free with local models
Website · Discord · Quick Start · Connect · API · Changelog
What's New in v0.8.0 — V2 Architecture
A vision-first alternative to the legacy cascade. Opt in with --v2:
clawdcursor start --v2
- Ground-truth verifier — six independent signals (pixel diff, window state, focus change, OCR delta, task-type assertions, error-pattern detection). Independent of the agent, so it can't be fooled by "done" self-reports. Caught false positives in testing where the legacy pipeline reported UNVERIFIED_SUCCESS.
- Single vision-first agent loop — screenshot → tool call → new screenshot → repeat. 6-rule system prompt (down from 36). Works with Anthropic, OpenAI, OpenRouter, or anything with vision + tool calls.
- PlatformAdapter abstraction — platform-specific code now lives in src/v2/platform/{macos,windows,linux}.ts behind one interface. Replaces 142+ scattered if (IS_MAC) branches across 34 files. Adding a new OS is a single file.
- Legacy pipeline untouched — clawdcursor start (no flag) behaves exactly as before. Zero breaking changes.
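The verifier is easier to picture in miniature. Here is a minimal sketch of combining the six signals into a verdict, assuming a simple majority rule; the signal names mirror the list above, but the combination logic and verdict strings are illustrative, not the project's actual code:

```typescript
// Illustrative only — signal names follow the README; the rule below
// (hard-fail on error patterns, majority vote otherwise) is an assumption.
type Signals = {
  pixelDiff: boolean;       // did the screen change at all?
  windowState: boolean;     // expected window opened/closed/moved?
  focusChange: boolean;     // focus moved to the expected target?
  ocrDelta: boolean;        // expected text appeared/disappeared?
  taskAssertions: boolean;  // task-type-specific checks passed?
  noErrorPatterns: boolean; // no error dialogs or error text detected?
};

function verify(s: Signals): "VERIFIED_SUCCESS" | "UNVERIFIED_SUCCESS" | "FAILED" {
  if (!s.noErrorPatterns) return "FAILED"; // hard stop on error patterns
  const positive = [s.pixelDiff, s.windowState, s.focusChange,
                    s.ocrDelta, s.taskAssertions].filter(Boolean).length;
  if (positive >= 3) return "VERIFIED_SUCCESS"; // majority of signals agree
  return "UNVERIFIED_SUCCESS"; // agent said done; the screen disagrees
}
```

The key property is that every input comes from the screen, not from the agent's own report, which is why a self-asserted "done" cannot pass on its own.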
Full history in CHANGELOG.md.
What It Does
Clawd Cursor is a tool server. It wraps your desktop as 42 callable tools: mouse, keyboard, screen, windows, browser. Any AI that can call functions can use it.
Your AI → "Click the Send button" → find_element + mouse_click
Your AI → "What's on screen?" → desktop_screenshot + read_screen
Your AI → "Open Chrome to gmail" → open_app + navigate_browser
No app-specific integrations. No per-service API keys. If it's on screen, clawdcursor can interact with it.
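To make "callable tools" concrete, here is a hypothetical shape for one tool as an OpenAI-style function definition. The field layout follows the standard function-calling convention the /tools endpoint advertises; the exact parameter schema for mouse_click is an assumption:

```typescript
// Hypothetical entry as GET /tools might return it — the outer shape is
// the OpenAI function-calling format; the parameter list is illustrative.
const mouseClickTool = {
  type: "function",
  function: {
    name: "mouse_click",
    description: "Click at absolute screen coordinates",
    parameters: {
      type: "object",
      properties: {
        x: { type: "number", description: "X in pixels" },
        y: { type: "number", description: "Y in pixels" },
      },
      required: ["x", "y"],
    },
  },
};
```

Any model that accepts definitions in this shape can drive the desktop through it, which is what makes the server model-agnostic.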
Quick Start
Windows
powershell -c "irm https://clawdcursor.com/install.ps1 | iex"
clawdcursor start
macOS
curl -fsSL https://clawdcursor.com/install.sh | bash
clawdcursor grant # grant Accessibility + Screen Recording permissions
clawdcursor start
Linux
curl -fsSL https://clawdcursor.com/install.sh | bash
clawdcursor start
First run auto-detects your AI provider from environment variables. Or be explicit:
clawdcursor start --provider anthropic --api-key sk-ant-...
clawdcursor start --provider gemini # GEMINI_API_KEY in env
clawdcursor start # free with Ollama
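The auto-detection step can be pictured as a simple environment scan. A sketch, assuming a fixed precedence order and the Ollama fallback; the real CLI's order may differ:

```typescript
// Sketch of env-based provider detection. Precedence order is an
// assumption; the free local fallback with no key is per the README.
function detectProvider(env: Record<string, string | undefined>): string {
  if (env.ANTHROPIC_API_KEY) return "anthropic";
  if (env.OPENAI_API_KEY) return "openai";
  if (env.GEMINI_API_KEY) return "gemini";
  return "ollama"; // no key in the environment: run free with local models
}
```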
See docs/MACOS-SETUP.md for macOS permission setup.
Connect
Three modes. Same 42 tools.
1. Built-in Agent (start)
Full autonomous agent. Send a task, get a result.
clawdcursor start
curl http://localhost:3847/task -d '{"task": "Open Notepad and write Hello"}'
2. Tools-Only Server (serve)
Exposes tools over REST. You bring the AI.
clawdcursor serve
curl http://localhost:3847/tools # discover tools
curl http://localhost:3847/execute/mouse_click -d '{"x":500,"y":300}'
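Bringing your own model mostly means translating its tool calls into /execute requests. A minimal sketch of that glue, assuming the default port and that the model returns arguments as a JSON string (as function-calling APIs typically do):

```typescript
// Turn a model's tool call into the POST /execute/:name request that
// serve mode expects. Port 3847 is the documented default.
function toExecuteRequest(call: { name: string; arguments: string }) {
  return {
    url: `http://localhost:3847/execute/${call.name}`,
    method: "POST" as const,
    headers: { "Content-Type": "application/json" },
    body: call.arguments, // already a JSON string in function-calling responses
  };
}
```

From there the loop is: fetch /tools once, hand the list to the model, and replay each tool call it emits through this mapping.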
3. MCP Mode (mcp)
MCP stdio server for Claude Code, Cursor, Windsurf, Zed.
// ~/.claude/settings.json
{
"mcpServers": {
"clawdcursor": {
"command": "node",
"args": ["/path/to/clawdcursor/dist/index.js", "mcp"]
}
}
}
Tools
42 tools across 6 categories:
| Category | Count | Examples |
|---|---|---|
| Perception | 9 | desktop_screenshot, read_screen, get_active_window, smart_read, ocr_read_screen |
| Mouse | 6 | mouse_click, mouse_double_click, mouse_drag, mouse_scroll |
| Keyboard | 5 | key_press, type_text, smart_type, shortcuts_list, shortcuts_execute |
| Window / App | 6 | focus_window, open_app, get_windows, invoke_element |
| Browser (CDP) | 10 | cdp_connect, cdp_click, cdp_type, cdp_read_text, cdp_evaluate |
| Orchestration | 6 | smart_click, navigate_browser, delegate_to_agent, wait |
Pipeline
Two pipelines ship side by side. Same 42 tools, same MCP interface — only the decision-maker differs.
V2 — vision-first (--v2)
Three stages, each does one thing:
┌──────────┐ ┌────────────────┐ ┌──────────────────────┐
│ Router │ → │ VisionAgent │ → │ GroundTruthVerifier │
│ │ │ │ │ │
│ regex │ │ screenshot │ │ pixel diff · window │
│ shortcut│ │ → tool call │ │ focus · OCR delta │
│ zero │ │ → screenshot │ │ task assertions │
│ LLM │ │ → repeat │ │ anti-patterns │
└──────────┘ └────────────────┘ └──────────────────────┘
Router handles trivial tasks ("open Safari") without a model. Everything else hits the VisionAgent (16 tools, 6-rule prompt, model-agnostic). The Verifier runs six independent checks against the screen after the agent claims done — so "done" has to be true, not just asserted.
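The Router stage can be approximated in a few lines. An illustrative sketch with a single "open <app>" pattern; the real router's patterns and return shape are not documented here:

```typescript
// Illustrative router: regex-match trivial tasks so they bypass the
// model entirely. Pattern and return shape are assumptions.
function route(task: string): { tool: string; args: { name: string } } | null {
  const m = task.trim().match(/^open\s+(.+)$/i);
  if (m) return { tool: "open_app", args: { name: m[1] } };
  return null; // not trivial: fall through to the VisionAgent
}
```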
Legacy — text-first cascade (default, no flag)
Cheapest-first. Kept for backwards compatibility.
L1.5 Deterministic flows → hardcoded sequences. Zero LLM.
L2 Skill Cache → learned action patterns. Zero LLM.
L2.5 OCR Reasoner → OS OCR + cheap text LLM. ~90% of tasks.
L2.5b A11y Reasoner → fallback when OCR is unavailable.
L3 Computer Use → vision model. Last resort.
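The cascade itself is just cheapest-first dispatch: try each level in order, and fall through when a level declines. A sketch, with the handler behavior invented for illustration:

```typescript
// Cheapest-first dispatch over the legacy levels. The null-means-decline
// convention and the UNHANDLED sentinel are assumptions.
type Handler = (task: string) => string | null; // null = can't handle, fall through

function cascade(task: string, levels: Handler[]): string {
  for (const level of levels) {
    const result = level(task);
    if (result !== null) return result; // first level that handles it wins
  }
  return "UNHANDLED";
}
```

Ordering the levels from zero-LLM (deterministic flows, skill cache) up to the vision model is what keeps most tasks cheap.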
API
http://localhost:3847
| Endpoint | Method | Description |
|---|---|---|
| /tools | GET | All tools in OpenAI function-calling format |
| /execute/:name | POST | Execute a tool |
| /task | POST | Submit a plain-English task |
| /status | GET | Agent state |
| /screenshot | GET | Current screen as PNG |
| /task-logs | GET | Recent task logs (JSONL) |
| /confirm | POST | Approve/reject a safety-gated action |
| /abort | POST | Stop current task |
| /health | GET | Version + health check |
Safety
| Tier | Actions | Behavior |
|---|---|---|
| Auto | Navigation, reading, opening apps | Runs immediately |
| Preview | Typing, form filling | Logged before executing |
| Confirm | Sending messages, deleting, purchases | Pauses for approval |
Server binds to localhost only. Dangerous key combos blocked. Consent required on first run.
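Tier assignment can be modeled as a lookup over tool names. A sketch where the set members are illustrative stand-ins: type_text and smart_type appear in the tools table above, but the confirm-tier names are hypothetical, and real gating likely inspects arguments, not just names:

```typescript
// Illustrative tier classifier. Set contents are stand-ins, not the
// project's actual gating rules.
type Tier = "auto" | "preview" | "confirm";

const CONFIRM = new Set(["send_message", "delete_file", "purchase"]); // hypothetical names
const PREVIEW = new Set(["type_text", "smart_type", "fill_form"]);

function tierFor(tool: string): Tier {
  if (CONFIRM.has(tool)) return "confirm"; // pause and wait for POST /confirm
  if (PREVIEW.has(tool)) return "preview"; // log intent before executing
  return "auto"; // navigation, reading, opening apps run immediately
}
```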
CLI
clawdcursor start Full agent (built-in LLM pipeline)
clawdcursor serve Tools-only REST server
clawdcursor mcp MCP stdio server
clawdcursor doctor Diagnose and configure
clawdcursor grant Grant macOS permissions (interactive)
clawdcursor task <t> Send task to running agent
clawdcursor stop Stop server
clawdcursor dashboard Open web dashboard
Options:
--port <port> Default: 3847
--provider <name> anthropic | openai | gemini | groq | ollama | deepseek | ...
--model <model> Override model
--api-key <key> Provider API key
--base-url <url> OpenAI-compatible endpoint
--accept Skip consent prompt (non-interactive)
--v2 Use v2 architecture (vision-first agent + ground truth verifier)
Platform Support
Platform-specific code lives in src/v2/platform/{macos,windows,linux}.ts behind one PlatformAdapter interface — business logic never reads process.platform.
| Platform | UI Automation | OCR | Browser |
|---|---|---|---|
| Windows x64 / ARM64 | PowerShell + UI Automation | Windows.Media.Ocr | Chrome / Edge |
| macOS Intel / Apple Silicon | JXA + System Events | Apple Vision | Chrome / Edge |
| Linux x64 / ARM64 | AT-SPI | Tesseract | Chrome / Edge |
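A hypothetical slice of what the PlatformAdapter pattern might look like, with adapter selection keyed off the platform string once at startup. The method names are assumptions; only the pattern and file layout come from this README:

```typescript
// Hypothetical adapter surface — the README names the pattern, not the API.
interface PlatformAdapter {
  screenshot(): Promise<Uint8Array>;
  readScreen(): Promise<string>;            // OCR text of the current screen
  focusWindow(title: string): Promise<void>;
  openApp(name: string): Promise<void>;
}

// Selection happens once, at startup; business logic never reads
// process.platform after this point.
function adapterFor(platform: string): string {
  switch (platform) {
    case "darwin": return "macos";
    case "win32": return "windows";
    default: return "linux"; // AT-SPI path covers the remaining targets
  }
}
```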
Prerequisites
- Node.js 20+
- macOS — Xcode CLI tools: xcode-select --install, then clawdcursor grant for Accessibility + Screen Recording
- Linux — sudo apt install tesseract-ocr
- AI key — optional; works fully offline with Ollama
Tech Stack
TypeScript · Node.js · nut-js · Playwright · sharp · Express · MCP SDK · Zod
License
MIT — see LICENSE.