DeLive

Name: DeLive
Author: XimilalaXiang

System Audio Capture | Multi-Provider ASR | Local-First AI Review Workspace

DeLive is a desktop transcription workspace for system audio. It captures whatever your computer is playing, routes the audio through the ASR backend that fits the job, keeps everything on your machine, and turns completed transcripts into searchable history with a full AI Review Desk — rich Markdown-rendered chat, Q&A threads, structured briefings, and mind maps.

Live Transcription	Review & History	Topics
Real-time transcription with floating caption overlay	Session history with activity heatmap and search	Project-based session organization

AI Overview	AI Chat	Mind Map & Export
Summary, action items, keywords, and chapters	Multi-thread conversation with cited references	Caption style editor with live preview

Settings	Caption Overlay
Provider configuration with credential management	Draggable always-on-top floating caption window

Core Features
Download
Supported ASR Providers
Quick Start
Usage
Project Map
System Architecture
Tech Stack
Security
Open API & MCP Ecosystem
Extending Providers
Notes
License
Acknowledgments

🎯 Core Features

System-audio capture for real desktop use — browser video, live streams, meetings, courses, podcasts, or any other playback source
Six ASR backends behind one UI — Soniox, Volcengine, Groq, SiliconFlow, OpenAI-compatible local services, and local whisper.cpp
Provider-aware capture pipeline — auto-switches between MediaRecorder and AudioWorklet PCM16 capture based on provider requirements
Three execution modes — true realtime streaming, windowed batch retranscription, and Electron-managed local runtime
Session lifecycle management — draft sessions, autosave while recording, interrupted-session recovery, and completed-session history
Floating caption overlay — separate always-on-top window with source / translated / dual display modes and style customization
Soniox bilingual & speaker-aware flows — realtime translation, dual-line captions, diarization tokens, speaker-grouped preview
AI Review Desk — full-page workspace with animated tab navigation (Overview, Transcript, Chat, Mind Map)
Rich AI Chat — multi-thread conversation with GFM Markdown rendering, syntax-highlighted code blocks, hover actions, and more
Structured AI briefing — summary, action items, keywords, chapters, title/tag suggestions, and cited Q&A
Mind maps — generate Markmap-compatible Markdown, edit live, export SVG or PNG
Topics — organize sessions into project-based containers with emoji icons
Local model workflows — detect local services, discover models, pull from Ollama, import/download whisper.cpp assets
5 color themes — Cyan, Violet, Rose, Green, Amber — each with full light and dark mode
Local-first persistence — sessions, tags, topics, and settings in IndexedDB/localStorage; secrets via Electron safeStorage
Desktop integration — tray, global shortcut, auto-launch, updater, diagnostics export
Security hardening — trusted-window IPC, CSP injection, navigation guard, path allowlist, encrypted secret storage
Open API & MCP ecosystem — local REST API, real-time WebSocket, MCP server for AI agents, token-based authentication, and agent skill definition
Cross-platform — Windows, macOS, and Linux

📥 Download

Get the latest release for your platform:

Platform	Files
Windows	`.exe` installer, portable `.exe`
macOS	`.dmg` (Intel x64 and Apple Silicon arm64)
Linux	`.AppImage`, `.deb`

All downloads are available on the Releases page.

🔌 Supported ASR Providers

Provider	Type	Transport	Audio path	Highlights
Soniox V4	Cloud	Realtime streaming	`MediaRecorder` (`webm/opus`) → WebSocket	Token-level realtime transcription, realtime translation, bilingual captions, speaker diarization
Volcengine	Cloud	Realtime streaming	`AudioWorklet` PCM16 → embedded proxy → WebSocket	Chinese-oriented realtime path; proxy injects required headers from Electron
Groq	Cloud	Windowed batch retranscription	`AudioWorklet` PCM16 → WAV → REST	Whisper `large-v3-turbo` / `large-v3` style flow with quasi-realtime session updates
SiliconFlow	Cloud	Windowed batch retranscription	`AudioWorklet` PCM16 → WAV → REST	SenseVoice, TeleSpeech, and Qwen Omni-backed transcription flow
Local OpenAI-compatible	Local service	Windowed batch retranscription	`MediaRecorder` (`webm/opus`) → `/v1/audio/transcriptions`	Works with Ollama or other compatible gateways; supports service/model discovery and optional Ollama pull
Local `whisper.cpp`	Local runtime	Electron-managed local runtime	`AudioWorklet` PCM16 → local `/inference`	Starts `whisper-server`, manages binary/model assets, and stays fully local
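
The "Audio path" column above can be read as a lookup from provider to capture mode. The following is a minimal sketch of that idea, not DeLive's actual code: the provider ids, the `CaptureMode` type, and `pickCaptureMode` are hypothetical names chosen for illustration.

```typescript
// Hypothetical provider ids; DeLive's real identifiers may differ.
type ProviderId =
  | "soniox"
  | "volcengine"
  | "groq"
  | "siliconflow"
  | "local-openai"
  | "whisper-cpp";

type CaptureMode = "media-recorder" | "audioworklet-pcm16";

// Mirrors the "Audio path" column of the provider table.
const CAPTURE_MODE: Record<ProviderId, CaptureMode> = {
  soniox: "media-recorder",
  volcengine: "audioworklet-pcm16",
  groq: "audioworklet-pcm16",
  siliconflow: "audioworklet-pcm16",
  "local-openai": "media-recorder",
  "whisper-cpp": "audioworklet-pcm16",
};

// Returns the capture mode a provider needs.
function pickCaptureMode(provider: ProviderId): CaptureMode {
  return CAPTURE_MODE[provider];
}
```

Keeping the mapping in one table means a new provider only has to declare which capture mode it needs.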

🚀 Quick Start

Prerequisites

Node.js 18+ (the release workflow in .github/workflows/release.yml runs on Node 20)
Credentials or a local setup for at least one provider:
- Soniox: API key from soniox.com
- Volcengine: APP ID and Access Token
- Groq: API key from groq.com
- SiliconFlow: API key from siliconflow.cn
- Local OpenAI-compatible: local service exposing /v1/models and /v1/audio/transcriptions
- Local whisper.cpp: whisper-server plus a local .bin or .gguf model, or let DeLive import/download them

Installation

git clone https://github.com/XimilalaXiang/DeLive.git
cd DeLive
npm run install:all

Development

npm run dev

npm run dev starts Vite and Electron together. The Volcengine proxy is embedded in the Electron main process, so normal desktop development does not need a separate backend.

For standalone proxy debugging:

npm run dev:server

Quality Checks

npm run check

npm run check runs frontend lint, frontend tests, and a full app build.

To run just the frontend tests:

npm run test:frontend

Current suite status: 184 tests across 22 files, covering provider config, transcript state/stabilization, subtitle export, session lifecycle/repository, storage, and AI post-process parsing.

Build

npm run dist:win
npm run dist:mac
npm run dist:linux
npm run dist:all

Artifacts are written to release/.

Optional: Stage `whisper.cpp` Into Packaged Builds

npm run fetch:whisper-runtime -- --target win32
npm run stage:whisper-runtime -- --binary /path/to/whisper-server --target linux

If local-runtimes/whisper_cpp/whisper-server(.exe) exists at build time, electron-builder packages it as an extra resource. End users can still import or download binaries and models later from the UI.

📖 Usage

Typical Recording Flow

Open settings and choose a provider.
Fill in credentials or local runtime details, then run Test Config.
Click Start Recording.
Pick a screen or window and make sure audio sharing is enabled.
Watch partial and final text update in the main window and, optionally, the floating caption overlay.
Stop recording and open the saved session from History for review, AI actions, or export.

Caption Overlay

Toggle the floating caption window from the main UI.
Adjust font, colors, width, line count, shadow, and position.
Switch between source, translated, and dual modes when the provider supplies translation output.
Use draggable/interactive states to reposition the overlay without closing it.

Topics

Organize recordings into project-like containers:

Open the Topics tab from the navigation bar.
Create a topic with a name, emoji icon, and optional description.
Start recording into a topic in two ways:
- Click Record New on a topic card — jumps to Live with the topic pre-selected.
- In the Live view, click the Select Topic link above the recording controls and pick a topic.
The selected topic appears as a badge above the record button. Recordings are assigned automatically.
Existing sessions can be moved into (or out of) a topic from the Overview tab in Review.
Sessions inside a topic are hidden from the default Review list, but global search still finds them.

AI Review Desk

Completed sessions open in a dedicated full-page Review Desk (not a modal) with an animated sliding tab bar and keyboard arrow navigation:

Overview tab: AI briefing — summary, action items, keywords, chapters, title/tag suggestions, and one-click apply
Transcript tab: Timestamped segments in a left gutter, color-coded speaker badges, consecutive same-speaker merging, hover highlight, and TXT/Markdown/SRT/VTT export
Chat tab: Multi-thread AI conversation — GFM Markdown rendering with syntax-highlighted code blocks (one-click copy), user/AI avatars, hover Copy/Regenerate actions, animated thinking-dots indicator, auto-resizing composer (Enter to send), floating scroll-to-bottom button, and per-thread delete
Mind Map tab: Generate Markmap-compatible Markdown, edit it live, and export SVG or PNG
Metadata actions: apply suggested title/tags and rename speaker labels for diarized sessions
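
Markmap reads ordinary Markdown, using headings and nested bullet lists as the branches of the map. As a rough illustration of what "Markmap-compatible Markdown" can look like, the sketch below renders a small tree. The `OutlineNode` shape and `toMarkmapMarkdown` are hypothetical and do not come from DeLive's source.

```typescript
// Hypothetical outline node; not DeLive's actual data model.
interface OutlineNode {
  title: string;
  children?: OutlineNode[];
}

// Renders the root as a heading and its descendants as a nested bullet list.
function toMarkmapMarkdown(root: OutlineNode): string {
  const lines: string[] = [`# ${root.title}`, ""];
  const walk = (nodes: OutlineNode[], depth: number): void => {
    for (const node of nodes) {
      // Two spaces of indentation per nesting level.
      lines.push(`${"  ".repeat(depth)}- ${node.title}`);
      walk(node.children ?? [], depth + 1);
    }
  };
  walk(root.children ?? [], 0);
  return lines.join("\n");
}
```

Because the output is plain Markdown, it can be edited by hand before rendering.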

Local OpenAI-compatible Services

Select Local OpenAI-compatible.
Fill in Base URL and Model.
Use the local-model guide to probe the service and list installed models.
If the detected service is Ollama, DeLive can pull the selected model directly from the app.
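
Model discovery against an OpenAI-compatible service usually means reading the `data` array returned by `GET /v1/models`. The sketch below assumes that standard response shape. `extractModelIds` is a hypothetical helper, and how DeLive parses the response internally is not shown here.

```typescript
// Shape of an OpenAI-style GET /v1/models response (only the fields used here).
interface ModelListResponse {
  data?: Array<{ id?: unknown }>;
}

// Returns the model ids from a parsed response body, skipping malformed entries.
function extractModelIds(body: unknown): string[] {
  if (typeof body !== "object" || body === null) return [];
  const data = (body as ModelListResponse).data;
  if (!Array.isArray(data)) return [];
  return data
    .map((entry) => entry?.id)
    .filter((id): id is string => typeof id === "string" && id.length > 0);
}
```

In practice the input would be the JSON body fetched from `<Base URL>/v1/models`; an empty result suggests the service is unreachable or has no models installed.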

Local `whisper.cpp` Runtime

Select Local whisper.cpp.
Prepare the runtime binary by importing an existing whisper-server file or downloading a recommended official release asset.
Prepare the model by choosing, importing, or downloading a .bin / .gguf file.
Start the runtime or run Test Config.
Record normally; Electron manages the runtime lifecycle through IPC.

History, Backup, and Recovery

Sessions can be renamed, tagged, organized by topic, searched, and exported as TXT, Markdown, SRT, or VTT.
Recording drafts are autosaved, and incomplete sessions can be restored on the next launch after an interruption.
Full local data can be exported/imported for backup or migration.
Diagnostics export generates a redacted JSON bundle with system info and recent logs for troubleshooting.
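
For readers unfamiliar with the SRT format mentioned above, each cue is a running number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, and the text, followed by a blank line. The sketch below shows one way to produce it. The `Segment` shape is an assumption and is not DeLive's transcript model.

```typescript
// Hypothetical segment shape; times are in milliseconds.
interface Segment {
  startMs: number;
  endMs: number;
  text: string;
}

// Formats milliseconds as an SRT timestamp: HH:MM:SS,mmm.
function srtTimestamp(ms: number): string {
  const total = Math.max(0, Math.round(ms));
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  const hours = Math.floor(total / 3600000);
  const minutes = Math.floor((total % 3600000) / 60000);
  const seconds = Math.floor((total % 60000) / 1000);
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)},${pad(total % 1000, 3)}`;
}

// Emits numbered cues separated by blank lines.
function toSrt(segments: Segment[]): string {
  return segments
    .map((s, i) => `${i + 1}\n${srtTimestamp(s.startMs)} --> ${srtTimestamp(s.endMs)}\n${s.text.trim()}\n`)
    .join("\n");
}
```

WebVTT output is similar but starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.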

🧩 Project Map

Area	Key files	Responsibility
Desktop shell	`electron/main.ts`, `electron/mainWindow.ts`, `electron/captionWindow.ts`, `electron/tray.ts`, `electron/shortcuts.ts`, `electron/desktopSource.ts`, `electron/autoUpdater.ts`, `electron/ipcSecurity.ts`	Starts Electron, owns native windows, tray behavior, shortcuts, desktop source picking, updater lifecycle, IPC security, and app shutdown.
Renderer app	`frontend/src/App.tsx`, `frontend/src/components/`, `frontend/src/i18n/`	Main settings, recording, history, topics, preview, and caption-control UI. Workspace view (Live / Review Desk / Topics / Settings) is driven by Zustand.
ASR orchestration	`frontend/src/hooks/useASR.ts`, `frontend/src/services/captureManager.ts`, `frontend/src/services/providerSession.ts`, `frontend/src/services/captionBridge.ts`	Resolves provider setup, starts the right audio pipeline, forwards transcript events, and mirrors text to the caption overlay.
Provider abstraction	`frontend/src/providers/registry.ts`, `frontend/src/providers/implementations/*`	Normalizes six backends behind one contract and capability model.
State management	`frontend/src/stores/sessionStore.ts`, `frontend/src/stores/topicStore.ts`, `frontend/src/stores/uiStore.ts`, `frontend/src/stores/settingsStore.ts`, `frontend/src/stores/tagStore.ts`, `frontend/src/stores/transcriptStore.ts`	Zustand store slices for sessions, topics, UI state, settings, tags, and a unified facade for backward compatibility.
Session intelligence	`frontend/src/services/aiPostProcess.ts`, `frontend/src/components/ReviewDeskView.tsx`, `frontend/src/components/PreviewModal.tsx`	AI briefing, Q&A, mind maps, tagging, and speaker label editing.
Topics	`frontend/src/components/TopicsView.tsx`, `frontend/src/components/TopicDetailView.tsx`, `frontend/src/components/TopicDialog.tsx`, `frontend/src/components/TopicPicker.tsx`	Card-grid topic browser, per-topic session list, CRUD dialogs, and Live-view topic selection.
Review Desk UI	`frontend/src/components/review/SessionTabBar.tsx`, `frontend/src/components/review/SessionHeader.tsx`, `frontend/src/components/review/OverviewTab.tsx`, `frontend/src/components/review/TranscriptTab.tsx`, `frontend/src/components/review/ChatTab.tsx`, `frontend/src/components/review/MindMapTab.tsx`, `frontend/src/components/review/MarkdownRenderer.tsx`	Animated tab bar with keyboard navigation, session header with multi-format export (TXT/Markdown/SRT/VTT), per-tab content views, GFM Markdown rendering with syntax highlighting, and mind map editing.
Settings UI	`frontend/src/components/settings/ServiceSettingsPanel.tsx`, `frontend/src/components/settings/GeneralSettingsPanel.tsx`	Provider credential configuration and general app settings (language, theme, AI config, backup/restore).
Runtime UI	`frontend/src/components/runtime/BundledRuntimeSummaryCard.tsx`, `frontend/src/components/runtime/BundledRuntimeAdvancedPanel.tsx`	Status card and advanced panel for managing bundled `whisper.cpp` runtime assets.
Shared UI system	`frontend/src/components/ui/*`	Button, Badge, Switch, EmptyState, StatusIndicator, DialogShell primitives with semantic color tokens across five themes.
Local model/runtime tooling	`frontend/src/utils/localModelSetup.ts`, `frontend/src/utils/localRuntimeManager.ts`, `frontend/src/components/LocalModelSetupGuide.tsx`, `frontend/src/components/BundledRuntimeSetupGuide.tsx`, `electron/localRuntime.ts`, `electron/localRuntimeFiles.ts`, `electron/localRuntimeShared.ts`, `electron/localRuntimeIpc.ts`	Detects local services, checks models, supports Ollama pull, imports/downloads `whisper.cpp` assets, manages runtime files, and starts/stops the local runtime.
Electron IPC layer	`electron/appIpc.ts`, `electron/captionIpc.ts`, `electron/safeStorageIpc.ts`, `electron/updaterIpc.ts`, `electron/diagnosticsIpc.ts`, `electron/apiIpc.ts`	Modular IPC handlers for app lifecycle, caption window control, secret storage, auto-update, diagnostics, and Open API data bridge.
Open API layer	`electron/apiServer.ts`, `electron/apiBroadcast.ts`, `frontend/src/hooks/useApiIpcResponder.ts`	REST API endpoints, WebSocket live transcript broadcasting, and renderer-side IPC responder for session data queries.
MCP & agent ecosystem	`mcp/delive-mcp-server.js`, `skills/delive-transcript-analyzer/SKILL.md`	Standalone MCP server exposing DeLive as tools/resources and agent skill definition.
Shared contracts	`shared/electronApi.ts`, `electron/preload.ts`, `shared/volcProxyCore.ts`	Typed bridge between renderer and main process plus shared protocol helpers for the embedded Volcengine proxy.
Debug and release support	`server/`, `scripts/`, `.github/workflows/release.yml`, `.github/workflows/ci.yml`	Standalone Volc proxy debugging, icon/runtime staging scripts, continuous integration, and tagged multi-platform release builds.
Design references	`design-system/delive/MASTER.md`	Product and visual reference material used during UI iteration. Not part of the runtime path.

🔄 Recording Lifecycle

App.tsx initializes storage, theme, settings, tags, and saved sessions.
useASR asks ProviderSessionManager to resolve the selected provider's capabilities and connect.
CaptureManager requests system audio through getDisplayMedia and chooses either MediaRecorder or AudioWorklet PCM16 capture.
Provider events flow into sessionStore, while CaptionBridge mirrors stable and non-final text to the floating caption window.
sessionStore builds session snapshots, autosaves drafts, and restores interrupted work on next launch.
Completed sessions open in the preview workspace for transcript review, AI briefing, Q&A, mind map generation, tagging, and export.
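
On the PCM16 path, the Web Audio API delivers samples as 32-bit floats in the range [-1, 1], so they have to be converted to signed 16-bit integers before they are sent to a provider. The sketch below shows the usual conversion. `floatToPcm16` is a hypothetical name, and any resampling to the target rate is left out.

```typescript
// Converts Web Audio float samples in [-1, 1] to signed 16-bit PCM.
function floatToPcm16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot wrap around.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Scale negatives by 32768 and positives by 32767 so both ends fit in int16.
    out[i] = clamped < 0 ? Math.round(clamped * 32768) : Math.round(clamped * 32767);
  }
  return out;
}
```

Clamping matters because mixed system audio can exceed the nominal range.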

🏗 System Architecture

graph TB
    subgraph "Desktop Shell"
        EM[Electron Main Process]
        WIN[Main Window]
        CAP[Caption Overlay Window]
        DESK[Tray / Shortcut / Auto Launch / Updater]
        SEC[IPC Security / SafeStorage / Diagnostics]
    end

    subgraph "Renderer"
        UI[React App]
        STORES[Zustand Stores]
        CFG[Provider and Runtime Setup]
        PREV[History / Preview / AI Workspace]
    end

    subgraph "Orchestration"
        ASR[useASR]
        CAPMGR[CaptureManager]
        PROVSESS[ProviderSessionManager]
        CAPBR[CaptionBridge]
    end

    subgraph "Capture Pipeline"
        GDM[getDisplayMedia]
        MR[MediaRecorder<br/>WebM / Opus]
        AP[AudioWorklet<br/>PCM16 16kHz]
    end

    subgraph "Provider Layer"
        REG[Provider Registry]
        SON[Soniox]
        VOL[Volcengine]
        GRQ[Groq]
        SIL[SiliconFlow]
        LOA[Local OpenAI-compatible]
        WCP[whisper.cpp Runtime]
    end

    subgraph "Electron Services"
        PROXY[Embedded Volc Proxy]
        RTM[Local Runtime Controller]
    end

    subgraph "Persistence"
        REPO[Session Repository]
        IDB[IndexedDB]
        LS[localStorage]
        SAFE[safeStorage]
    end

    UI --> STORES
    UI --> CFG
    UI --> PREV
    UI --> ASR

    ASR --> CAPMGR
    ASR --> PROVSESS
    ASR --> CAPBR

    CAPMGR --> GDM
    GDM --> MR
    GDM --> AP

    PROVSESS --> REG
    REG --> SON
    REG --> VOL
    REG --> GRQ
    REG --> SIL
    REG --> LOA
    REG --> WCP

    MR --> SON
    MR --> LOA
    AP --> VOL
    AP --> GRQ
    AP --> SIL
    AP --> WCP

    VOL --> PROXY
    WCP --> RTM

    STORES --> REPO
    REPO --> IDB
    REPO --> LS
    CFG --> SAFE

    UI --> EM
    EM --> WIN
    EM --> CAP
    EM --> DESK
    EM --> SEC
    EM --> PROXY
    EM --> RTM
    CAPBR --> CAP

    style UI fill:#61dafb,color:#000
    style EM fill:#334155,color:#fff
    style CAP fill:#f472b6,color:#000
    style REG fill:#f59e0b,color:#000
    style PROXY fill:#10b981,color:#fff
    style RTM fill:#0f766e,color:#fff
    style SEC fill:#ef4444,color:#fff
    style SAFE fill:#a855f7,color:#fff
    style IDB fill:#3b82f6,color:#fff

Architecture Overview

Layer	Main components	Notes
Desktop shell	Electron main process, main window, caption window, tray, updater, diagnostics	Owns native lifecycle, source picking, caption overlay, and OS integration.
Renderer	React UI, Zustand stores, history/preview workspace, topics, settings panels	Handles recording flow, configuration, topic management, session review, and user actions.
Orchestration	`useASR`, `CaptureManager`, `ProviderSessionManager`, `CaptionBridge`	Keeps provider logic separate from capture and UI.
Provider layer	Registry plus 6 implementations	Unifies realtime cloud, windowed batch cloud, local service, and local runtime flows.
Electron services	Embedded Volc proxy, local runtime controller, safe-storage IPC, diagnostics IPC	Provides capabilities that the renderer's browser environment cannot offer directly.
Persistence	Session repository, IndexedDB, localStorage, `safeStorage`	Autosaves drafts, restores interrupted sessions, and stores secrets separately from general settings.
Shared contracts	Typed preload bridge and shared helper modules	Keeps renderer/main contracts explicit and safer to evolve.

📁 Project Structure

DeLive/
├── electron/                         # Electron main process, windows, tray, IPC, updater, runtime control, Open API server
├── frontend/                         # React renderer app, providers, stores, UI components, tests
├── shared/                           # Shared TypeScript contracts for preload/renderer/main and proxy helpers
├── server/                           # Standalone Volcengine proxy used mainly for debugging
├── mcp/                              # Standalone MCP server for AI agents (Claude, Cursor, etc.)
├── skills/                           # Agent skill definitions
├── local-runtimes/                   # Optional packaged runtime assets (for whisper.cpp staging)
├── scripts/                          # Icon generation, runtime fetch/stage, release notes
├── design-system/                    # Design reference material
├── assets/                           # README and branding assets
├── build/                            # Electron-builder icons and packaging resources
├── .github/workflows/ci.yml          # Push/PR continuous integration pipeline
├── .github/workflows/release.yml     # Tag-triggered quality + release pipeline
├── README.md
└── package.json

Generated outputs such as dist-electron/, release/, and dependency folders are omitted here.

🔧 Tech Stack

Layer	Technology
Desktop app	Electron 40
Frontend	React 18.3 + TypeScript 5.6 + Vite 6
Styling	Tailwind CSS 3.4
State management	Zustand 4.5
Testing	Vitest 4
Audio processing	`MediaRecorder`, `AudioWorklet`, WAV conversion utilities
Desktop services	Electron main-process IPC, Express, `ws`
Persistence	IndexedDB, localStorage, Electron `safeStorage`
AI review	OpenAI-compatible chat completions for briefing, Q&A, and mind maps
Packaging	`electron-builder`
Release automation	GitHub Actions tag workflow
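
The "WAV conversion utilities" entry refers to wrapping raw PCM16 samples in a RIFF/WAVE container before upload. A minimal sketch of such a utility follows. It assumes mono audio at 16 kHz (the rate shown in the architecture diagram), and `encodeWav` is a hypothetical name rather than DeLive's implementation.

```typescript
// Wraps mono PCM16 samples in a 44-byte RIFF/WAVE header.
function encodeWav(pcm: Int16Array, sampleRate = 16000): ArrayBuffer {
  const dataBytes = pcm.length * 2;
  const buffer = new ArrayBuffer(44 + dataBytes);
  const view = new DataView(buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = linear PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate = rate * channels * 2 bytes
  view.setUint16(32, 2, true); // block align = channels * 2 bytes
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataBytes, true);
  // Sample data follows the header, little-endian.
  for (let i = 0; i < pcm.length; i++) view.setInt16(44 + i * 2, pcm[i], true);
  return buffer;
}
```

The resulting buffer can be wrapped in a `Blob` with type `audio/wav` and attached to a multipart request.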

🔒 Security

Feature	Description
Context isolation	`contextIsolation: true`, `nodeIntegration: false`
Trusted IPC senders	Sensitive handlers verify the caller belongs to a registered trusted window
Content Security Policy	CSP is injected at the Electron layer and allows only the required connect targets
Navigation guard	Unexpected renderer navigation is blocked
Path allowlist	File-path checks are limited to safe roots such as `userData`, home, desktop, downloads, and documents
Secret storage	API keys are stored through Electron `safeStorage` when OS encryption is available
Open API gating	Local REST API and WebSocket are disabled by default; optional Bearer token authentication when enabled
Diagnostics hygiene	Exported diagnostics redact secret-looking fields before writing the JSON bundle
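
To illustrate the diagnostics-hygiene row, the sketch below redacts fields whose keys look like secrets before an object is serialized. The key pattern and the `[REDACTED]` placeholder are assumptions made for this example; DeLive's actual redaction rules may differ.

```typescript
// Assumed heuristic: keys containing these words are treated as secrets.
const SECRET_KEY_PATTERN = /(api[-_]?key|token|secret|password|authorization)/i;

// Returns a deep copy with secret-looking fields replaced by a placeholder.
function redactSecrets(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => redactSecrets(item));
  if (typeof value === "object" && value !== null) {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SECRET_KEY_PATTERN.test(key) ? "[REDACTED]" : redactSecrets(inner);
    }
    return out;
  }
  // Primitives are returned unchanged.
  return value;
}
```

Matching on key names rather than values keeps the check cheap, at the cost of missing secrets stored under unusual keys.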

⌨️ Keyboard Shortcut

Shortcut	Function
`Ctrl+Shift+D` / `Cmd+Shift+D`	Show or hide the main window

🌐 Open API & MCP Ecosystem

DeLive exposes its transcription data through a local API, enabling external tools, scripts, and AI agents to programmatically access session history, live captions, and recording status.

Enabling the API

Go to Settings > General > Open API.
Toggle Enable Open API to on.
Optionally set an Access Token for authentication (recommended).

REST API

When enabled, the following endpoints are available at http://localhost:23456/api/v1/:

Endpoint	Description
`GET /health`	Health check (always accessible, even when API is disabled)
`GET /sessions`	List sessions with search, filter, and pagination
`GET /sessions/:id`	Full session detail including transcript and AI summary
`GET /sessions/:id/transcript`	Plain text transcript only
`GET /sessions/:id/summary`	AI summary, action items, and mind map
`GET /topics`	List all topics
`GET /tags`	List all tags
`GET /status`	Current recording status

If a token is set, include it as Authorization: Bearer <token>.
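
A client can build its requests from the base URL and the optional token described above. The sketch below only assembles the URL and headers so it stays independent of any HTTP library. `buildApiRequest` and `transcriptRequest` are hypothetical helpers, not part of DeLive.

```typescript
// Base URL quoted from the REST API section.
const API_BASE = "http://localhost:23456/api/v1";

interface ApiRequest {
  url: string;
  headers: Record<string, string>;
}

// Builds the URL and headers for a GET request; the token is optional.
function buildApiRequest(path: string, token?: string): ApiRequest {
  const normalized = path.startsWith("/") ? path : `/${path}`;
  const headers: Record<string, string> = { Accept: "application/json" };
  if (token) headers.Authorization = `Bearer ${token}`;
  return { url: `${API_BASE}${normalized}`, headers };
}

// Convenience wrapper for the session-transcript endpoint.
function transcriptRequest(sessionId: string, token?: string): ApiRequest {
  return buildApiRequest(`/sessions/${encodeURIComponent(sessionId)}/transcript`, token);
}
```

With `fetch`, the result can be used as `fetch(url, { headers })`. The exact response format of each endpoint is not specified here, so inspect a response before parsing it.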

WebSocket

Real-time transcript streaming is available at ws://localhost:23456/ws/live. Authenticate with a ?token=<token> query parameter or an Authorization header.
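
Browser WebSocket clients cannot set custom headers, so the query-parameter form is the one a web page would typically use. A minimal sketch of building that URL follows; `liveSocketUrl` is a hypothetical helper.

```typescript
// Endpoint quoted from the WebSocket section.
const LIVE_WS_URL = "ws://localhost:23456/ws/live";

// Appends the access token as a query parameter when one is configured.
function liveSocketUrl(token?: string): string {
  return token ? `${LIVE_WS_URL}?token=${encodeURIComponent(token)}` : LIVE_WS_URL;
}
```

The result can be passed to `new WebSocket(liveSocketUrl(token))`. The message payload format is not documented in this section, so a first client might simply log `event.data` to see what arrives.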

MCP Server

A standalone MCP server (mcp/delive-mcp-server.js) exposes DeLive's API as tools and resources for AI agents (Claude Desktop, Cursor, etc.). See mcp/ for setup instructions.

Agent Skill

An agent skill definition is available at skills/delive-transcript-analyzer/SKILL.md, providing structured guidance for AI agents to use DeLive's capabilities.

🔧 Extending Providers

Add a provider implementation under frontend/src/providers/implementations/.
Define accurate ASRProviderInfo metadata, required fields, and capability flags.
Register the provider in frontend/src/providers/registry.ts.
Add config-test logic in frontend/src/utils/providerConfigTest.ts if the provider supports validation.
For local-service or local-runtime flows, wire model/runtime helpers in frontend/src/utils/localModelSetup.ts or frontend/src/utils/localRuntimeManager.ts.
If the provider needs custom headers or native process control, add the Electron-side support in electron/.
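
The steps above can be pictured with a small registry sketch. Everything in it is hypothetical: `SketchProviderInfo` only stands in for the real `ASRProviderInfo`, whose fields may differ, and the registry in `frontend/src/providers/registry.ts` may be organized quite differently.

```typescript
// Hypothetical metadata shape; the real ASRProviderInfo may have different fields.
interface SketchProviderInfo {
  id: string;
  name: string;
  requiredFields: string[];
  capabilities: { realtime: boolean; translation: boolean };
}

const registry = new Map<string, SketchProviderInfo>();

// Registers a provider and rejects duplicate ids.
function registerProvider(info: SketchProviderInfo): void {
  if (registry.has(info.id)) throw new Error(`Provider already registered: ${info.id}`);
  registry.set(info.id, info);
}

// Lists required fields that are missing or blank in a user-supplied config.
function missingFields(id: string, config: Record<string, string | undefined>): string[] {
  const info = registry.get(id);
  if (!info) throw new Error(`Unknown provider: ${id}`);
  return info.requiredFields.filter((field) => !config[field]?.trim());
}
```

Declaring required fields and capability flags as data lets the settings UI and config tests work from the same description of each provider.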

⚠️ Notes

System requirements: Windows 10+, macOS 13+, or Linux with PulseAudio loopback support.
Volcengine proxy: normal desktop usage does not require a separate backend process; Electron starts the proxy internally.
Local OpenAI-compatible mode: discovery expects /v1/models, while transcription expects /v1/audio/transcriptions.
whisper.cpp mode: packaged binaries are optional; users can also import or download runtime assets later.
Tray behavior: closing the main window hides to tray instead of exiting the app.
Auto-launch: currently supported on Windows and macOS.
Auto-update: supported on Windows, macOS, and Linux AppImage builds.

🛡️ Windows SmartScreen Warning

Windows may show a SmartScreen warning the first time you launch DeLive. That is expected for unsigned or newly distributed apps.

Click More info.
Click Run anyway.

You can also inspect the source code directly and verify released binaries independently.

📄 License

Apache License 2.0

🙏 Acknowledgments

Soniox for realtime speech recognition APIs
Volcengine for Chinese-focused speech recognition
Groq for high-performance Whisper inference
SiliconFlow for speech and multimodal ASR services
Ollama for local model workflows
whisper.cpp for local open-source runtime support
BiBi-Keyboard for multi-provider architecture inspiration

Made by XimilalaXiang