# Voicetree

**Obsidian meets Claude Code.** Voicetree is a spatial IDE for recursive multi-agent orchestration: an interactive graph view, like Obsidian's, that you work directly inside of. Nodes are either markdown notes or terminal-based agents (Claude Code, Codex, OpenCode, Gemini, etc.).

Agents can spawn their own subagents onto the graph, are given nearby nodes as context, and can edit and create nodes of their own.

This project aims to build, from first principles, the most efficient human-AI interaction system possible.

## Why?

| Challenge | Voicetree Solution |
|---|---|
| Manual agent coordination | Agents can break down tasks into subgraphs and recursively spawn child terminals |
| 4-10 agent terminals is overwhelming | Spatially organise agents, tasks, and progress on the graph |
| Agents don't know what you know | You share the same memory graph with agents |
| Agents suffer context rot and lack memory | Defaults to short, focused sessions with automatic handover |
## Install

Download links: macOS (Apple Silicon) | macOS (Intel) | Windows | Linux

### macOS

```sh
brew tap voicetreelab/voicetree && brew install voicetree
```

### Linux

```sh
curl -fsSL https://raw.githubusercontent.com/voicetreelab/voicetree/main/install.sh | sh
```

### Windows

Download the installer: https://github.com/voicetreelab/voicetree/releases/latest/download/voicetree.exe
## How It Works

Your agents (Claude Code, Codex, OpenCode, Gemini, etc.) live inside the graph, next to their tasks, plans, and progress updates.

- **Context retrieval:** Agents see all nodes within a configurable radius and can run semantic search against local embeddings.
- **Spatial layout:** Location-based memory is among the brain's most efficient ways to remember things.
- **Externalized working memory:** Each node represents a concept at any level of abstraction. The graph structure mirrors your mental model: relationships between ideas are represented exactly as you think about them, offloading cognitive load to the canvas.
## In Detail

Nodes are markdown files; connections are wikilinks to the `.md` file paths. You open rich markdown editors directly within the graph by hovering over a node (or use speech-to-graph mode).
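For instance, a node could be a plain `.md` file like this (the file names and link targets here are hypothetical, just to illustrate the wikilink-as-edge idea):

```markdown
# Auth refactor plan

Split token handling out of the session layer.

Related: [[notes/data-model.md]], [[notes/architecture.md]]
```

Each wikilink such as `[[notes/architecture.md]]` becomes an edge in the graph.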
You can spawn a coding agent on a node: the contents of that node become the agent's task, and the agent is also given all context within an adjustable distance around it, plus semantic search against local embeddings. This means agents see what you see. You share the same memory, the same second brain.
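The radius rule can be sketched as a breadth-first walk over the link graph (illustrative Python only, not Voicetree's actual implementation; the graph shape and function name are assumptions):

```python
from collections import deque

def context_within_radius(graph, start, radius):
    """Collect every node within `radius` hops of `start` via BFS.

    `graph` maps a node name to the set of nodes it links to
    (wikilinks treated as undirected edges for this sketch).
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == radius:
            continue  # don't expand past the radius
        for neighbor in graph.get(node, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return set(depth) - {start}

# Example: a task node linked to a plan and some notes.
graph = {
    "task": {"plan", "notes"},
    "plan": {"task", "architecture"},
    "notes": {"task"},
    "architecture": {"plan"},
}
print(context_within_radius(graph, "task", 1))  # {'plan', 'notes'}
```

Widening the radius to 2 would pull in `architecture` as well; this is the knob that trades context breadth against prompt size.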
The graph structure allows for context retrieval to be targeted to only what is most relevant rather than dumping entire conversation history - avoiding the 30-60% performance degradation from context rot[^1].
Agents can build their own subgraphs, decomposing their tasks into small connected chunks of work. You can glance at the high-level structure and progress of these, and zoom in to the details of what matters most.
For example, ask a Voicetree agent to divide its plan into nodes for data model, architecture, pure logic, edge logic, UI components, and integration. This lets you carefully track planning through implementation for what matters most: the high-level changes and core logic.
Agents can then spawn and orchestrate their own parallel subagents to work through these dependency graphs. In Voicetree, subagents are just native terminals, so you have full transparency and control over them, unlike with other CLI agents.
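The dependency-graph idea can be sketched in a few lines (illustrative Python; the node names echo the planning example above, but nothing here is Voicetree's real API):

```python
def ready_tasks(deps, done):
    """Return tasks whose prerequisites are all complete and that are
    not themselves done: these can be handed to subagents in parallel."""
    return {task for task, prereqs in deps.items()
            if task not in done and prereqs <= done}

# A plan decomposed into nodes; each maps to its prerequisite nodes.
deps = {
    "data-model": set(),
    "pure-logic": {"data-model"},
    "ui-components": {"data-model"},
    "integration": {"pure-logic", "ui-components"},
}
print(ready_tasks(deps, done=set()))           # {'data-model'}
print(ready_tasks(deps, done={"data-model"}))  # two tasks, runnable in parallel
```

Once `pure-logic` and `ui-components` finish, `integration` unblocks; an orchestrating agent just keeps spawning terminals for whatever `ready_tasks` returns.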
As your project and context grow, the Voicetree approach scales. You use one of your brain's most efficient forms of memory: remembering where things are located.

Each node can represent any concept at any level of abstraction. You can see and reason about the structure between these concepts more easily because it is represented exactly as your brain represents them. This lets you externalise your working memory, freeing up cognitive load for the real problem-solving.
## Voice Mode

Capture ideas hands-free with speech-to-graph.

- **Why speaking works:** Speaking activates deliberate (System 2) thinking: verbalizing forces you to think about what you are doing. Japanese train conductors use pointing and calling (shisa kanko) to reduce errors by 85% for the same reason. Speech also engages different brain regions than writing, with lower cognitive load for idea generation. Raw speech is usually messy and hard to store or retrieve, so we turn it into a structured mindmap.
- **Backtracking without mental load:** Go arbitrarily deep down a problem. The graph holds the chain of "why am I doing this?" so you don't have to.
- **Tangibility:** Thought becomes visible and persistent. This isn't just documentation: making progress tangible is a prerequisite for flow states.
## Development

Prerequisites: Node.js 18+, Python 3.13, uv

```sh
cd webapp && npm install && npm run electron  # App
uv sync && uv run pytest                      # Backend
```
## License
BSL 1.1, converts to Apache 2.0 after 4 years. See LICENSE.
## Telemetry

We collect anonymous usage telemetry. You can disable this by setting `VITE_DISABLE_ANALYTICS=true` in `webapp/.env`. You can read more about this here.
## Contact

Questions? Join the Discord. Feedback is valuable: ping us with thoughts, criticisms, or feature requests.
[^1]: Chroma Research, "Context Rot: How Increasing Input Tokens Impacts LLM Performance" (July 2025). 30-60% performance gaps between focused (~300 token) and full (~113k token) prompts. https://research.trychroma.com/context-rot