OpenVoiceUI

Voice-powered AI assistant platform: connect any LLM and any TTS, with a live web canvas, music generation, and agent orchestration using OpenClaw. Install: npx openvoiceui setup

The open-source voice AI that actually does work.

Install, open localhost:5001, say "build me a dashboard", and watch it render live.


Watch the demo -- see voice-to-canvas in action


Install

Prerequisite: Docker must be installed and running for all install methods.

Pinokio (one-click)

Download Pinokio if you don't have it, then search "OpenVoiceUI" in the app store and click Install.

npm

npx openvoiceui setup     # interactive wizard — walks you through API keys + builds Docker images
npx openvoiceui start     # starts everything

Docker

git clone https://github.com/MCERQUA/OpenVoiceUI.git
cd OpenVoiceUI
cp .env.example .env        # edit with your API keys
docker compose up

Open localhost:5001 and start talking.


What is OpenVoiceUI?

OpenVoiceUI is a hands-free, AI-controlled computer. You talk — it builds. Live web apps, dashboards, games, full websites — rendered in real time while you watch. No mouse, no keyboard, no typing prompts into a chat box.

It runs on OpenClaw and works with any LLM. The AI agent can build and display apps mid-conversation, switch between projects with a voice command, generate music on the fly, delegate work to parallel sub-agents, and remember everything across sessions. It uses any Claude Code or OpenClaw skill — and the community can build and share more through the plugin system.

Self-hosted. Your hardware, your data. MIT licensed, forever free.

Core Features

  • Hands-Free AI Computer — Talk and watch it work. The AI builds apps, switches between projects, runs tasks, and displays results on a live visual canvas — all without touching a mouse or keyboard.
  • Live Canvas — AI renders real HTML pages mid-conversation: dashboards, tools, galleries, reports, full web apps. Not text responses — real interactive pages you can use.
  • AI Music Generation — Generate songs on the fly with your voice using Suno. Full music player with playlist management built in.
  • Custom Animated Interface — Choose from animated face modes (eye-face avatar, reactive halo-smoke orb) or install community-built faces through plugins. Build your own — the face system is fully extensible.
  • Sub-Agents — Delegate multiple tasks to parallel AI workers simultaneously and get results back.
  • Long-Term Memory — ByteRover context engine curates knowledge every turn. Persists across sessions in human-readable markdown.
  • Desktop OS Interface — Themed desktop environment with window management (Windows XP, macOS, Ubuntu, Win95, Win 3.1).
  • Admin Dashboard — Mobile-responsive. Agent profiles, provider config, workspace file browser, plugin management, system health. Everything editable live.
  • Self-Hosted — Your hardware, your data. No vendor lock-in, no monthly fees.
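
The Sub-Agents item above describes fanning tasks out to parallel workers and collecting the results. The following is only a generic sketch of that fan-out/fan-in pattern using Python's standard library; it is not OpenVoiceUI's or OpenClaw's actual implementation, and `run_subagent` and `delegate` are hypothetical names invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(task: str) -> str:
    """Hypothetical stand-in for handing one task to a worker agent."""
    return f"done: {task}"


def delegate(tasks: list[str], max_workers: int = 4) -> dict[str, str]:
    """Run all tasks concurrently and collect each result by task name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs tasks with results.
        results = pool.map(run_subagent, tasks)
        return dict(zip(tasks, results))


if __name__ == "__main__":
    print(delegate(["summarize leads", "draft report"]))
```

A thread pool suits work that mostly waits on network calls, which is the likely shape of a request to an LLM gateway.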

And More

  • Image generation (FLUX.1, Stable Diffusion 3.5)
  • Video creation (Remotion Studio)
  • Voice cloning (Qwen3-TTS via fal.ai)
  • Cron jobs for scheduled automation
  • File explorer with drag-and-drop
  • Agent profiles — switch personas, voices, and LLM providers from the admin panel

Plugins

OpenVoiceUI has a plugin system for community-built extensions. Plugins can include animated face packs, canvas pages, workflow dashboards, gateway adapters, or any combination of these.

Our first community plugin:

Build your own. If you can build a canvas page, an animated face, or a workflow dashboard, you can package it as a plugin. See the plugins repo for submission guidelines and the BHB plugin as a reference.


Install Details

Option 1: Pinokio (one-click)

  1. Install Pinokio if you don't have it
  2. Search "OpenVoiceUI" in the Pinokio app store
  3. Click Install, then Start

Pinokio handles Docker, dependencies, and configuration automatically.

Option 2: npm

Requires Node.js 20+, Python 3.10+, and Docker.

npx openvoiceui setup     # interactive wizard — configures LLM, TTS, API keys, builds Docker images
npx openvoiceui start     # starts OpenClaw gateway + Supertonic TTS + voice UI

The setup wizard walks you through choosing an LLM provider, TTS provider, and entering API keys. Configuration is saved to .env and openclaw-data/.

npx openvoiceui stop      # stop all services
npx openvoiceui status    # check what's running
npx openvoiceui logs      # tail service logs

Option 3: Docker

Requires Docker and Docker Compose.

git clone https://github.com/MCERQUA/OpenVoiceUI.git
cd OpenVoiceUI
cp .env.example .env

Edit .env with your API keys. At minimum you need an LLM provider key; a TTS key is optional. Then:

docker compose up -d

This starts three containers:

| Container | Port | Purpose |
|---|---|---|
| openclaw | 18791 | LLM gateway; routes to your chosen LLM provider |
| supertonic | (internal) | Free local TTS; no API key needed |
| openvoiceui | 5001 | Voice UI + Canvas + Admin dashboard |

Open http://localhost:5001 to use the voice interface, or http://localhost:5001/admin for the admin dashboard.

To stop: docker compose down
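
After `docker compose up -d`, it can help to confirm that the published ports accept connections before opening the browser. The sketch below is one possible check, not a tool shipped with the project. It assumes the default ports listed above are published on the local host; `port_open` and `check_services` are hypothetical helper names.

```python
import socket

# Ports taken from the container list above; the host is an assumption
# (a default local Docker setup).
SERVICES = {"openvoiceui": 5001, "openclaw": 18791}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "127.0.0.1") -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'reachable' if up else 'not reachable'}")
```

An open port only shows that something is listening. It does not prove the service behind it is healthy, so the admin dashboard's health page remains the better source of truth.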

Option 4: VPS / Production

For running on an Ubuntu server with nginx and systemd:

git clone https://github.com/MCERQUA/OpenVoiceUI.git
cd OpenVoiceUI
cp .env.example .env               # edit with your API keys
sudo bash deploy/setup-sudo.sh     # creates dirs, installs systemd service
bash deploy/setup-nginx.sh         # generates nginx config (edit domain)

See deploy/ for the full production setup including SSL, nginx reverse proxy, and systemd service files.


Configuration

All configuration is in .env. Copy .env.example to .env and fill in your values.

Required:

  • An LLM provider API key (OpenAI, Anthropic, Groq, Z.AI, or any OpenClaw-compatible provider)
  • CLAWDBOT_AUTH_TOKEN — set during npx openvoiceui setup or in OpenClaw's setup wizard

Optional but recommended:

  • GROQ_API_KEY — enables Groq Orpheus TTS (fast, high quality, free tier)
  • SUNO_API_KEY — enables AI music generation
  • CLERK_PUBLISHABLE_KEY — enables login/auth (for multi-user or public deployments)

See .env.example for all available options with descriptions.
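
A quick pre-flight check of .env can catch a missing key before the containers start. The sketch below is an illustration only, not part of OpenVoiceUI. `CLAWDBOT_AUTH_TOKEN` and `GROQ_API_KEY` come from this README; `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` are assumed names, so check .env.example for the variables the project actually reads.

```python
from pathlib import Path

# CLAWDBOT_AUTH_TOKEN and GROQ_API_KEY are named in this README. The other
# key names are assumptions for illustration only.
REQUIRED = ["CLAWDBOT_AUTH_TOKEN"]
LLM_KEYS_ASSUMED = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GROQ_API_KEY"]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comment lines.

    Inline comments and multi-line values are not handled.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_settings(env: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the basics are set."""
    problems = [f"{key} is not set" for key in REQUIRED if not env.get(key)]
    if not any(env.get(key) for key in LLM_KEYS_ASSUMED):
        problems.append("no LLM provider key found")
    return problems


if __name__ == "__main__":
    env_file = Path(".env")
    env = parse_env(env_file.read_text()) if env_file.exists() else {}
    for problem in missing_settings(env):
        print("warning:", problem)
```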


Works With Any Provider

LLM

| Provider | Status |
|---|---|
| OpenClaw Gateway | Built-in; routes to OpenAI, Anthropic, Groq, Z.AI, and more |
| Z.AI (GLM-5-turbo) | Built-in |
| Groq (Llama, Qwen) | Via OpenClaw |
| Google Gemini | Via OpenClaw |
| MiniMax | Via OpenClaw |
| Ollama (local) | Via adapter |
| Any LLM | Drop-in gateway plugin |

Text-to-Speech

| Provider | Status |
|---|---|
| Supertonic (local) | Free, ships with Docker setup |
| Groq Orpheus | Fast cloud TTS, free tier |
| Resemble AI | Premium cloned voices |
| Qwen3-TTS (fal.ai) | Voice cloning |
| Hume EVI | Emotion-aware |
| ElevenLabs | High quality, many voices |

Speech-to-Text

| Provider | Status |
|---|---|
| Web Speech API | Free, browser-native (default) |
| Deepgram | Streaming, accurate |
| Groq Whisper | Fast cloud transcription |

Admin Dashboard

Access at localhost:5001/admin. Mobile-responsive.

  • Profiles — View and activate agent personas
  • Agent Editor — Edit name, voice, LLM provider, system prompt, features, and agent workspace files. 4 tabs: Profile, System Prompt, Features, Agent Files
  • Plugins — Install and manage face packs, gateways, and extensions
  • Canvas Pages — Toggle public/private, lock pages, delete with archive
  • Workspace Files — Browse and edit agent workspace. Audio playback, image preview built in.
  • Music (Suno) — View all generated songs, play inline, archive tracks
  • Provider Config — Select LLM, TTS, STT providers. Saves to active profile.
  • Health and Stats — CPU, RAM, disk, gateway status, session reset
  • Connector Tests — 12 automated endpoint diagnostics

Use Cases

Small Business — AI receptionist, appointment scheduler, report builder. Talk to your AI and get a live dashboard of today's leads, reviews, and tasks.

Digital Agencies — Deploy custom AI assistants per client. Multi-tenant ready. Each client gets their own voice-powered workspace.

Developers — Fork it, extend it, deploy it anywhere. MIT licensed. Build custom plugins, gateway adapters, and canvas pages on top of a voice-first platform.


How It's Different

| Feature | OpenVoiceUI | Typical Voice AI |
|---|---|---|
| Source | Open source (MIT) | Closed source |
| Canvas UI | Live HTML rendering | Text/audio only |
| Skills | Any Claude Code or OpenClaw skill | API endpoints |
| Music | AI music generation (Suno) | None |
| Memory | ByteRover long-term context | Session only |
| Admin | Full dashboard, mobile-ready | Config files |
| Plugins | Community face packs, pages, workflows | None |
| Hosting | Self-hosted, your data | Vendor cloud only |
| Pricing | Free forever | Per-minute billing |

Extend It

  • Build a plugin — Face packs, canvas pages, workflow dashboards, or any combination. See the plugins repo for examples and submission guidelines.
  • Build a gateway plugin — Connect any LLM provider. See plugins/README.md
  • Build an adapter — Add new STT/TTS providers. See src/adapters/_template.js

Tech Stack

| Layer | Technology |
|---|---|
| Backend | Python / Flask |
| Frontend | Vanilla JS (ES modules, no framework) |
| Canvas | Fullscreen iframe + SSE |
| STT | Web Speech API, Deepgram, Groq Whisper |
| TTS | Supertonic, Groq Orpheus, Resemble, Qwen3-TTS |
| LLM | Any provider via OpenClaw gateway |
| Memory | ByteRover context engine (markdown knowledge base) |
| Auth | Clerk (optional) |
| Deploy | npm, Docker, Pinokio, VPS/systemd |
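
The canvas layer is listed as an iframe driven by SSE (Server-Sent Events). For readers new to SSE, the sketch below shows how a single event is framed on the wire in the standard text/event-stream format. It illustrates the protocol only, not this project's code; the `canvas_update` event name and the payload keys are hypothetical.

```python
import json
from typing import Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Encode one Server-Sent Events message as text/event-stream text."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of a multi-line payload needs its own "data:" field.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    # "canvas_update" and the payload keys are hypothetical names.
    payload = json.dumps({"page": "dashboard.html"})
    print(format_sse(payload, event="canvas_update"), end="")
```

In a Flask backend, strings like these would typically be yielded from a generator in a streaming response with the `text/event-stream` content type, and the browser would read them with `EventSource`.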

Documentation

Contributing

We welcome contributions — especially plugins. Build a face pack, a canvas page, a workflow dashboard, or a full extension and submit it to the plugins repo. See CONTRIBUTING.md for code contribution guidelines. This project is MIT licensed — fork it, build on it, make it yours.

License

MIT


Website  ·  GitHub  ·  npm  ·  Plugins
