t2t
Health Pass
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Community trust — 12 GitHub stars
Code Fail
- network request — Outbound network request in cloud/src/agent-runtime.ts
- network request — Outbound network request in cloud/src/index.ts
- exec() — Shell command execution in cloud/src/proactive.ts
- network request — Outbound network request in cloud/src/proactive.ts
Permissions Pass
- Permissions — No dangerous permissions requested
This is a macOS desktop application that provides system-wide voice-to-text dictation and AI-powered agent capabilities. It transcribes audio locally using Whisper and can optionally connect to AI models and external tool servers for task automation.
Security Assessment
The tool inherently handles sensitive data, requiring both microphone and accessibility permissions to function. A significant concern is a failed audit check confirming that the application executes shell commands (found in the proactive runtime script). While it makes outbound network requests to external servers (likely for routing AI requests via OpenRouter), no hardcoded secrets were detected, and no dangerous OS-level permissions are requested. Because it listens to keystrokes, records audio, and executes system commands, the overall risk is rated as Medium.
Quality Assessment
The project is actively maintained, with its most recent code push occurring today. It is properly licensed under the permissive MIT license. Community trust is currently very low, as indicated by only 12 GitHub stars. Developers should be aware that the macOS builds are currently unsigned, requiring manual security bypasses to install, which makes the application more vulnerable to supply chain attacks.
Verdict
Use with caution — the app is new, unsigned, and executes shell commands alongside microphone access, so only install if you fully understand and accept these local security risks.
Voice-to-text with MCP support. System-wide dictation (hold fn) and AI agent mode (hold fn+ctrl) that connects to any MCP server. Cross-platform desktop app with local Whisper transcription.
t2t
Voice-to-text with intelligence. Hold fn to talk, hold fn+ctrl to command.
Download
Note: The app is not code-signed yet. On first launch, macOS may show a security warning. To open it:
- Right-click the app → Open, then click Open in the dialog
- Or run the following in Terminal:
xattr -cr /Applications/t2t.app
Heads up: This is an unsigned build while we polish things up. Each time you update to a new version, you'll need to remove t2t from System Settings → Privacy & Security → Accessibility (and Microphone if needed), then re-add it. We'll get it properly signed soon!
How It Works
- Hold Fn key → records microphone audio
- Release Fn key → transcribes using local Whisper model
- Typing mode (red bar): Hold Fn alone → pastes transcription into focused text field, preserves clipboard
- Agent mode (cyan bar): Hold Fn+Ctrl → speaks commands to AI agent
- MCP mode (if configured): Connects to MCP servers, uses their tools via OpenRouter AI
- AppleScript mode (fallback): Generates and executes AppleScript for macOS automation
- Visual feedback: red/cyan bar while recording (based on mode), amber while processing
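The key-to-mode mapping above can be sketched roughly as follows. This is a minimal illustration, not t2t's actual code: the names `KeyState`, `resolveMode`, and `barColor` are hypothetical, and the real hotkey handling lives in the Rust backend.

```typescript
// Illustrative sketch only: these names are hypothetical, not t2t's actual API.
type Mode = "typing" | "agent";
type Phase = "recording" | "processing";
type BarColor = "red" | "cyan" | "amber";

interface KeyState {
  fn: boolean;
  ctrl: boolean;
}

// Fn alone selects typing mode; Fn+Ctrl selects agent mode.
// Without Fn held, nothing is recorded, so there is no mode.
function resolveMode(keys: KeyState): Mode | null {
  if (!keys.fn) return null;
  return keys.ctrl ? "agent" : "typing";
}

// Red or cyan while recording (depending on mode), amber while processing.
function barColor(mode: Mode, phase: Phase): BarColor {
  if (phase === "processing") return "amber";
  return mode === "agent" ? "cyan" : "red";
}
```

Keeping the mapping as two small pure functions would make it easy to check each key combination in isolation.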
Requirements
- macOS (currently macOS only; tested on Apple Silicon)
- Accessibility permission - Required for Fn key detection and focusing the correct field before paste
- Microphone permission - Required for audio recording
- OpenRouter API key (for agent mode) - Get one at openrouter.ai
The app will prompt you if permissions are missing.
Getting Started
- Download and install the app from t2t.now
- Grant permissions when prompted (Accessibility and Microphone)
- Get an OpenRouter API key at openrouter.ai (required for agent mode)
- Open settings: Click the menu bar icon → View Settings
- Configure agent mode (optional):
- Add your OpenRouter API key in settings
- Optionally configure MCP servers for extended automation
Settings & Analytics
The settings window (Menu bar icon → View Settings) includes three tabs:
Analytics Tab
View your transcription usage statistics:
- Total Words: Lifetime count of all transcribed words
- Lifetime Average: Average words per minute across all sessions
- Session Average: Average words per minute for current session
- Sessions: Total number of transcription sessions
- Hours Active: Total time spent transcribing
- Recent Activity: 48-hour hourly activity chart
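As a rough illustration of how the words-per-minute figures could be derived, here is a small sketch. It assumes the lifetime average is total words divided by total active time (rather than a mean of per-session averages); the README does not say which, and the `Session` shape and function names are hypothetical.

```typescript
// Hypothetical session record; t2t's real storage format may differ.
interface Session {
  words: number;    // words transcribed in this session
  activeMs: number; // time spent transcribing, in milliseconds
}

// Words per minute for a single span of activity.
function wordsPerMinute(words: number, activeMs: number): number {
  if (activeMs <= 0) return 0; // avoid dividing by zero for empty sessions
  return words / (activeMs / 60_000);
}

// Assumption: lifetime average = total words / total active time.
function lifetimeAverage(sessions: Session[]): number {
  const words = sessions.reduce((sum, s) => sum + s.words, 0);
  const activeMs = sessions.reduce((sum, s) => sum + s.activeMs, 0);
  return wordsPerMinute(words, activeMs);
}
```

Weighting by active time keeps one very short session from skewing the lifetime figure.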
Settings Tab
Configure your t2t installation:
- Theme: Toggle between light and dark mode
- OpenRouter API Key: Set your API key for agent mode
- AI Model Selection: Choose which model to use for agent mode
- Supports all OpenRouter models
- Auto-refresh available to fetch latest models
- MCP Servers: Add, configure, and manage MCP servers
- Test connections and view available tools
- Enable/disable servers individually
- Supports stdio, HTTP, and SSE transports
History Tab
See History & Logging section below.
MCP (Model Context Protocol) Support
When MCP servers are configured in settings, agent mode uses MCP instead of AppleScript. This enables:
- Extensible automation: Connect to any MCP-compatible service (databases, APIs, file systems, etc.)
- Tool-based execution: AI agent uses tools provided by your MCP servers
- Multiple servers: Connect to multiple MCP servers simultaneously
- Transport options: Supports stdio, HTTP, and SSE transports
To configure: Menu bar icon → View Settings → Settings tab → MCP Servers section. Requires an OpenRouter API key.
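To make the transport options concrete, here is a hedged sketch of what a server entry and the MCP-versus-AppleScript decision might look like. The `McpServerConfig` shape and its field names are assumptions for illustration, not t2t's settings schema, and the sketch also assumes that a disabled server does not count as configured.

```typescript
// Hypothetical config shape; not taken from t2t's settings format.
type McpServerConfig =
  | { name: string; enabled: boolean; transport: "stdio"; command: string; args: string[] }
  | { name: string; enabled: boolean; transport: "http" | "sse"; url: string };

// Only servers the user has left enabled are considered.
function activeServers(servers: McpServerConfig[]): McpServerConfig[] {
  return servers.filter((s) => s.enabled);
}

// Agent mode uses MCP when at least one server is available,
// and falls back to AppleScript otherwise.
function agentBackend(servers: McpServerConfig[]): "mcp" | "applescript" {
  return activeServers(servers).length > 0 ? "mcp" : "applescript";
}
```

A discriminated union on `transport` lets the compiler require a `command` for stdio servers and a `url` for HTTP or SSE servers.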
Vision Support & Automatic Screenshots
t2t automatically captures and includes a screenshot with every agent call, enabling vision-capable models to "see" your screen context. This works seamlessly with any model - vision-capable models process the image, while text-only models simply ignore it.
How It Works
- Automatic capture: When you use agent mode (Fn+Ctrl), a screenshot is captured before sending your prompt
- Universal support: Screenshots are included with all agent calls, regardless of model selection
- Smart routing: OpenRouter automatically routes to vision-capable models when available, or ignores the image for text-only models
- Seamless integration: Screenshots are included in the API request without any additional UI or user action
- Privacy: Full screenshots are only sent to the API and are not stored locally; only small thumbnails are kept and shown in History
Privacy & Permissions
- Screen Recording permission: macOS may prompt for screen recording permission the first time you use agent mode
- No local storage: Full screenshots are not saved to disk - they're only sent to the API
- Thumbnails: Small thumbnails (150x150px) are stored locally in History for reference
- Error handling: If screenshot capture fails (e.g., permission denied), the agent falls back to text-only mode
Technical Details
- Screenshots are captured using the macOS screencapture command
- Images are encoded as base64 PNG and included in the OpenAI-compatible message format
- The screenshot is included in both initial requests and follow-up requests after tool execution
- Vision-capable models (GPT-4 Vision, Claude 3.5 Sonnet, etc.) can process the image to understand your screen context
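For reference, a user message with an attached screenshot in the OpenAI-compatible chat format typically looks like the sketch below. The function name `buildUserMessage` is hypothetical, and passing `null` for the screenshot is just one way to model the text-only fallback described above; t2t's real request builder may differ.

```typescript
// Content parts as used by OpenAI-compatible chat completion APIs.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface UserMessage {
  role: "user";
  content: ContentPart[];
}

// Hypothetical helper: attaches the screenshot as a base64 PNG data URL.
// Pass null when capture failed, which yields a text-only message.
function buildUserMessage(transcript: string, screenshotBase64Png: string | null): UserMessage {
  const content: ContentPart[] = [{ type: "text", text: transcript }];
  if (screenshotBase64Png !== null) {
    content.push({
      type: "image_url",
      image_url: { url: `data:image/png;base64,${screenshotBase64Png}` },
    });
  }
  return { role: "user", content };
}
```

Because the image is an extra content part, the same message shape serves both the initial request and follow-up requests after tool execution.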
History & Logging
t2t automatically logs all transcriptions and agent calls for review and debugging.
Features
- Transcription history: All voice transcriptions are saved with timestamps
- Agent call logging: Complete request/response logs for all OpenRouter API calls
- Screenshot thumbnails: Tiny thumbnails (150x150px) of screenshots captured with all agent calls
- Search: Fast local search across all history entries
- Expandable details: Click any entry to view full request/response JSON and tool calls
Accessing History
Menu bar icon → View Settings → History tab
Configuration
- History limit: Set the T2T_HISTORY_LIMIT environment variable (default: 1000 entries)
- Storage: History is stored locally in history.json via Tauri's store plugin
- Privacy: History data stays on your machine - the history log is not sent to external services
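A history limit like this could be applied roughly as sketched below. This is an illustration under assumptions: the README only states the variable name and the default of 1000, so the fallback for invalid values and the choice to drop the oldest entries first are guesses, and both function names are hypothetical.

```typescript
const DEFAULT_HISTORY_LIMIT = 1000; // default stated in the README

// Parse the raw T2T_HISTORY_LIMIT value.
// Assumption: missing, non-numeric, or non-positive values use the default.
function parseHistoryLimit(raw: string | undefined): number {
  if (raw === undefined) return DEFAULT_HISTORY_LIMIT;
  const n = Number.parseInt(raw, 10);
  return Number.isInteger(n) && n > 0 ? n : DEFAULT_HISTORY_LIMIT;
}

// Assumption: entries are ordered oldest to newest, and the oldest are dropped.
function trimHistory<T>(entries: T[], limit: number): T[] {
  return entries.length > limit ? entries.slice(entries.length - limit) : entries;
}
```

Taking the raw string as a parameter, rather than reading the environment inside the function, keeps the parsing logic easy to test.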
What's Logged
Transcriptions:
- Timestamp
- Transcribed text
Agent Calls:
- Timestamp
- Transcript (your voice input)
- Model used
- Full request JSON (messages, parameters)
- Full response JSON (AI output, tool calls)
- Tool calls executed (if any)
- Screenshot thumbnail (captured automatically with each agent call)
- Success/error status
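The logged fields above map naturally onto two record types. The sketch below is one possible shape with a simple local search over it; the field names and the `searchHistory` helper are hypothetical and are not read from t2t's history.json.

```typescript
// Hypothetical entry shapes; field names are illustrative, not t2t's real schema.
interface TranscriptionEntry {
  kind: "transcription";
  timestamp: string;
  text: string;
}

interface AgentCallEntry {
  kind: "agent_call";
  timestamp: string;
  transcript: string;            // the voice input
  model: string;
  request: unknown;              // full request JSON
  response: unknown;             // full response JSON
  toolCalls: unknown[];          // empty when no tools ran
  thumbnailBase64: string | null;
  ok: boolean;                   // success/error status
}

type HistoryEntry = TranscriptionEntry | AgentCallEntry;

// Case-insensitive substring search over the spoken text of each entry.
function searchHistory(entries: HistoryEntry[], query: string): HistoryEntry[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") return entries;
  return entries.filter((e) => {
    const haystack = e.kind === "transcription" ? e.text : e.transcript;
    return haystack.toLowerCase().includes(needle);
  });
}
```

A linear scan like this is usually fast enough at the default limit of 1000 entries.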
First Run
On first launch, the app automatically downloads the Whisper model (~150MB) to ~/.cache/whisper/ggml-base.en.bin. This happens in the background.
For Developers
Setup
# Install dependencies (in desktop/)
cd desktop && bun install
# Development
bun dev # From root, or:
cd desktop && bun tauri dev
# Build
bun run build # From root, or:
cd desktop && bun tauri build
Requirements
- Rust (install via rustup)
- Bun (recommended) or Node.js 18+
Tech Stack
- Frontend: Svelte 5 + SvelteKit
- Backend: Rust + Tauri
- STT: whisper-rs (local Whisper.cpp model)
- AI: OpenRouter API (direct calls, no infrastructure needed)
- MCP: Model Context Protocol client (local stdio/HTTP/SSE)
- Hotkey: macOS event monitoring (Fn key) + fallbacks
- Audio capture: native (Rust via cpal)
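Architecture: Transcription runs fully locally. The only outbound traffic is OpenRouter API calls, the one-time Whisper model download, and any HTTP/SSE MCP servers you configure. No servers, workers, or infrastructure required.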
Debugging
- Logs: ~/Library/Logs/t2t.log
- Model location: ~/.cache/whisper/ggml-base.en.bin
- History storage: history.json (via Tauri store, location depends on Tauri config)
License
MIT