OpenVoiceUI
Voice-powered AI assistant platform: connect any LLM and any TTS, with a live web canvas, music generation, and agent orchestration using OpenClaw. Install: npx openvoiceui setup
The open-source voice AI that actually does work.
Install, open localhost:5001, say "build me a dashboard", and watch it render live.
Watch the demo -- see voice-to-canvas in action
Install
Prerequisite: Docker must be installed and running for all install methods.
Pinokio (one-click)
Download Pinokio if you don't have it, then search "OpenVoiceUI" in the app store and click Install.
npm
npx openvoiceui setup # interactive wizard — walks you through API keys + builds Docker images
npx openvoiceui start # starts everything
Docker
git clone https://github.com/MCERQUA/OpenVoiceUI.git
cd OpenVoiceUI
cp .env.example .env # edit with your API keys
docker compose up
Open localhost:5001 and start talking.
What is OpenVoiceUI?
OpenVoiceUI is a hands-free, AI-controlled computer. You talk — it builds. Live web apps, dashboards, games, full websites — rendered in real time while you watch. No mouse, no keyboard, no typing prompts into a chat box.
It runs on OpenClaw and works with any LLM. The AI agent can build and display apps mid-conversation, switch between projects with a voice command, generate music on the fly, delegate work to parallel sub-agents, and remember everything across sessions. It uses any Claude Code or OpenClaw skill — and the community can build and share more through the plugin system.
Self-hosted. Your hardware, your data. MIT licensed, forever free.
Core Features
- Hands-Free AI Computer — Talk and watch it work. The AI builds apps, switches between projects, runs tasks, and displays results on a live visual canvas — all without touching a mouse or keyboard.
- Live Canvas — AI renders real HTML pages mid-conversation: dashboards, tools, galleries, reports, full web apps. Not text responses — real interactive pages you can use.
- AI Music Generation — Generate songs on the fly with your voice using Suno. Full music player with playlist management built in.
- Custom Animated Interface — Choose from animated face modes (eye-face avatar, reactive halo-smoke orb) or install community-built faces through plugins. Build your own — the face system is fully extensible.
- Sub-Agents — Delegate multiple tasks to parallel AI workers simultaneously and get results back.
- Long-Term Memory — ByteRover context engine curates knowledge every turn. Persists across sessions in human-readable markdown.
- Desktop OS Interface — Themed desktop environment with window management (Windows XP, macOS, Ubuntu, Win95, Win 3.1).
- Admin Dashboard — Mobile-responsive. Agent profiles, provider config, workspace file browser, plugin management, system health. Everything editable live.
- Self-Hosted — Your hardware, your data. No vendor lock-in, no monthly fees.
And More
- Image generation (FLUX.1, Stable Diffusion 3.5)
- Video creation (Remotion Studio)
- Voice cloning (Qwen3-TTS via fal.ai)
- Cron jobs for scheduled automation
- File explorer with drag-and-drop
- Agent profiles — switch personas, voices, and LLM providers from the admin panel
Plugins
OpenVoiceUI has a plugin system for community-built extensions. Plugins can include animated face packs, canvas pages, workflow dashboards, gateway adapters, or any combination of these.
Our first community plugin:
- BHB Animated Characters — Custom animated avatar faces by BHB
Build your own. If you can build a canvas page, an animated face, or a workflow dashboard, you can package it as a plugin. See the plugins repo for submission guidelines and the BHB plugin as a reference.
Install Details
Option 1: Pinokio (one-click)
- Install Pinokio if you don't have it
- Search "OpenVoiceUI" in the Pinokio app store
- Click Install, then Start
Pinokio handles Docker, dependencies, and configuration automatically.
Option 2: npm
Requires Node.js 20+, Python 3.10+, and Docker.
npx openvoiceui setup # interactive wizard — configures LLM, TTS, API keys, builds Docker images
npx openvoiceui start # starts OpenClaw gateway + Supertonic TTS + voice UI
The setup wizard walks you through choosing an LLM provider, TTS provider, and entering API keys. Configuration is saved to .env and openclaw-data/.
npx openvoiceui stop # stop all services
npx openvoiceui status # check what's running
npx openvoiceui logs # tail service logs
Option 3: Docker
Requires Docker and Docker Compose.
git clone https://github.com/MCERQUA/OpenVoiceUI.git
cd OpenVoiceUI
cp .env.example .env
Edit .env with your API keys. An LLM provider key is required; a TTS key is optional. Then:
docker compose up -d
This starts three containers:
| Container | Port | Purpose |
|---|---|---|
| openclaw | 18791 | LLM gateway, routes to your chosen LLM provider |
| supertonic | (internal) | Free local TTS, no API key needed |
| openvoiceui | 5001 | Voice UI + Canvas + Admin dashboard |
Open http://localhost:5001 to use the voice interface, or http://localhost:5001/admin for the admin dashboard.
To stop: docker compose down
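To confirm the containers came up, one option is to probe the ports listed above. The following is a minimal sketch and not part of OpenVoiceUI: the helper name `port_is_open` is hypothetical, and it assumes ports 5001 and 18791 are published to the host. It only checks that something accepts TCP connections on each port, not that the service behind it is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Ports taken from the container list; supertonic is internal, so it is not probed.
SERVICES = {"openvoiceui": 5001, "openclaw": 18791}

if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "up" if port_is_open("127.0.0.1", port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```

For a fuller picture, the admin dashboard's health and connector test pages report gateway status directly.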
Option 4: VPS / Production
For running on an Ubuntu server with nginx and systemd:
git clone https://github.com/MCERQUA/OpenVoiceUI.git
cd OpenVoiceUI
cp .env.example .env # edit with your API keys
sudo bash deploy/setup-sudo.sh # creates dirs, installs systemd service
bash deploy/setup-nginx.sh # generates nginx config (edit domain)
See deploy/ for the full production setup including SSL, nginx reverse proxy, and systemd service files.
Configuration
All configuration is in .env. Copy .env.example to .env and fill in your values.
Required:
- An LLM provider API key (OpenAI, Anthropic, Groq, Z.AI, or any OpenClaw-compatible provider)
- CLAWDBOT_AUTH_TOKEN: set during npx openvoiceui setup or in OpenClaw's setup wizard
Optional but recommended:
- GROQ_API_KEY: enables Groq Orpheus TTS (fast, high quality, free tier)
- SUNO_API_KEY: enables AI music generation
- CLERK_PUBLISHABLE_KEY: enables login/auth (for multi-user or public deployments)
See .env.example for all available options with descriptions.
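Before starting the services, it can help to check that .env contains the variables named above. The sketch below is an illustration, not a tool shipped with OpenVoiceUI: the function names `parse_env` and `check_env` are hypothetical, and the parser handles only plain KEY=VALUE lines (no inline comments or multi-line values). The LLM provider key is not checked because this README does not name its variable; see .env.example for that.

```python
# Variable names below come from this README; the rest of the sketch is assumed.
REQUIRED = ["CLAWDBOT_AUTH_TOKEN"]
OPTIONAL = {
    "GROQ_API_KEY": "Groq Orpheus TTS",
    "SUNO_API_KEY": "AI music generation",
    "CLERK_PUBLISHABLE_KEY": "login/auth",
}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env(values: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (missing required keys, optional features that are enabled)."""
    missing = [key for key in REQUIRED if not values.get(key)]
    enabled = [label for key, label in OPTIONAL.items() if values.get(key)]
    return missing, enabled
```

Calling `check_env(parse_env(open(".env").read()))` returns the required keys that are missing or empty, plus the optional features whose keys are set.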
Works With Any Provider
LLM
| Provider | Status |
|---|---|
| OpenClaw Gateway | Built-in — routes to OpenAI, Anthropic, Groq, Z.AI, and more |
| Z.AI (GLM-5-turbo) | Built-in |
| Groq (Llama, Qwen) | Via OpenClaw |
| Google Gemini | Via OpenClaw |
| MiniMax | Via OpenClaw |
| Ollama (local) | Via adapter |
| Any LLM | Drop-in gateway plugin |
Text-to-Speech
| Provider | Status |
|---|---|
| Supertonic (local) | Free, ships with Docker setup |
| Groq Orpheus | Fast cloud TTS, free tier |
| Resemble AI | Premium cloned voices |
| Qwen3-TTS (fal.ai) | Voice cloning |
| Hume EVI | Emotion-aware |
| ElevenLabs | High quality, many voices |
Speech-to-Text
| Provider | Status |
|---|---|
| Web Speech API | Free, browser-native (default) |
| Deepgram | Streaming, accurate |
| Groq Whisper | Fast cloud transcription |
Admin Dashboard
Access at localhost:5001/admin. Mobile-responsive.
- Profiles — View and activate agent personas
- Agent Editor — Edit name, voice, LLM provider, system prompt, features, and agent workspace files. 4 tabs: Profile, System Prompt, Features, Agent Files
- Plugins — Install and manage face packs, gateways, and extensions
- Canvas Pages — Toggle public/private, lock pages, delete with archive
- Workspace Files — Browse and edit agent workspace. Audio playback, image preview built in.
- Music (Suno) — View all generated songs, play inline, archive tracks
- Provider Config — Select LLM, TTS, STT providers. Saves to active profile.
- Health and Stats — CPU, RAM, disk, gateway status, session reset
- Connector Tests — 12 automated endpoint diagnostics
Use Cases
Small Business — AI receptionist, appointment scheduler, report builder. Talk to your AI and get a live dashboard of today's leads, reviews, and tasks.
Digital Agencies — Deploy custom AI assistants per client. Multi-tenant ready. Each client gets their own voice-powered workspace.
Developers — Fork it, extend it, deploy it anywhere. MIT licensed. Build custom plugins, gateway adapters, and canvas pages on top of a voice-first platform.
How It's Different
| | OpenVoiceUI | Typical Voice AI |
|---|---|---|
| Source | Open source (MIT) | Closed source |
| Canvas UI | Live HTML rendering | Text/audio only |
| Skills | Any Claude Code or OpenClaw skill | API endpoints |
| Music | AI music generation (Suno) | None |
| Memory | ByteRover long-term context | Session only |
| Admin | Full dashboard, mobile-ready | Config files |
| Plugins | Community face packs, pages, workflows | None |
| Hosting | Self-hosted, your data | Vendor cloud only |
| Pricing | Free forever | Per-minute billing |
Extend It
- Build a plugin: Face packs, canvas pages, workflow dashboards, or any combination. See the plugins repo for examples and submission guidelines.
- Build a gateway plugin: Connect any LLM provider. See plugins/README.md
- Build an adapter: Add new STT/TTS providers. See src/adapters/_template.js
Tech Stack
| Layer | Technology |
|---|---|
| Backend | Python / Flask |
| Frontend | Vanilla JS (ES modules, no framework) |
| Canvas | Fullscreen iframe + SSE |
| STT | Web Speech API, Deepgram, Groq Whisper |
| TTS | Supertonic, Groq Orpheus, Resemble, Qwen3-TTS |
| LLM | Any provider via OpenClaw gateway |
| Memory | ByteRover context engine (markdown knowledge base) |
| Auth | Clerk (optional) |
| Deploy | npm, Docker, Pinokio, VPS/systemd |
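The canvas row above mentions SSE (Server-Sent Events), the mechanism a backend can use to push updates to a page without polling. As a rough illustration of the wire format only, here is a minimal sketch in Python. The event name `canvas_update` and the payload shape are assumptions made for this example; OpenVoiceUI's actual event names and payloads are not documented in this README.

```python
import json
from typing import Iterable, Iterator, Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Encode one Server-Sent Events message in text/event-stream format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of the payload needs its own "data:" field.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"


def canvas_stream(pages: Iterable[dict]) -> Iterator[str]:
    """Yield one SSE message per page update (hypothetical payload shape)."""
    for page in pages:
        yield format_sse(json.dumps(page), event="canvas_update")
```

In a Flask app, a generator like `canvas_stream` could be wrapped in a `Response` with the `text/event-stream` mimetype, and the browser side would read it with `EventSource`. Whether OpenVoiceUI structures its stream this way is not stated here.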
Documentation
- Architecture Overview
- TTS Provider Guide
- Supertonic Setup
- Environment Variables
- PR Review Checklist
- Website
Contributing
We welcome contributions — especially plugins. Build a face pack, a canvas page, a workflow dashboard, or a full extension and submit it to the plugins repo. See CONTRIBUTING.md for code contribution guidelines. This project is MIT licensed — fork it, build on it, make it yours.
License
MIT licensed.