Velune-CLI
Health Warn
- License — License: Apache-2.0
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 8 GitHub stars
Code Pass
- Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions Pass
- Permissions — No dangerous permissions requested
No AI report is available for this listing yet.
VELUNE CLI is an open-source AI engineering CLI that unifies local LLMs (Ollama), cloud AI providers, MCP servers, tools, memory, and project context into a single developer workflow. Build, code, automate, and orchestrate AI with one extensible, provider-agnostic command-line interface.
Velune
Local-first multi-model AI developer CLI. Council-based agents,
persistent memory, repository cognition.
No cloud required. No quota. No lock-in.
What it does
Velune is a terminal-first AI coding assistant that runs a council of
specialized agents (Planner, Coder, Reviewer, Challenger, Synthesizer)
on your local machine using Ollama, or on free cloud tiers via Groq,
OpenRouter, and others.
Unlike Copilot or Cursor, Velune:
- Runs 100% locally with Ollama — no API key needed
- Remembers your codebase across sessions (persistent 5-tier memory)
- Reviews its own code using multiple specialized agents
- Works in any terminal on any project — no IDE required
60-second quickstart
Option A — Local (Ollama, free, no key)
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# 2. Pull a model
ollama pull qwen2.5-coder:7b
# 3. Install Velune
pip install velune-cli
# 4. Initialize in your project
cd your-project
velune init
# 5. Start
velune
Option B — Cloud free tier (Groq, fastest, no GPU needed)
pip install velune-cli
velune init --provider groq
velune setup # enter your free Groq key
velune
Get a free Groq key at https://console.groq.com/keys — no credit card.
Installing the velune command
pip install velune-cli
velune --version
If your shell reports velune: command not found (or, on Windows,
"'velune' is not recognized…"), the install succeeded but your Python
scripts directory is not on PATH. Two reliable fixes:
Recommended — install with pipx (isolated env, auto-managed PATH):
pipx install velune-cliOr run it as a module (always works, no PATH changes needed):
python -m velune --version python -m velune # start the REPL
On Windows, a plain pip install puts the launcher in a per-user…\PythonXX\Scripts folder; re-running the Python installer with “Add
Python to PATH” checked (or using pipx) resolves it permanently.
Hardware requirements
| RAM | GPU | Can run local LLM? | Recommended setup |
|---|---|---|---|
| < 8 GB | any | ✗ No | Use Groq free tier |
| 8 GB | integrated | ⚠ 3B models only | Groq + phi3-mini local |
| 16 GB | integrated | ⚠ Slow (CPU only) | Groq + 3B local |
| 16 GB | 6–8 GB VRAM | ✓ 7B comfortable | qwen2.5-coder:7b |
| 32 GB | 12+ GB VRAM | ✓ 13B comfortable | Full council local |
| 36 GB | Apple Silicon | ✓ 27B comfortable | Full council, Metal accel |
| 64 GB | 24 GB VRAM | ✓ 70B capable | Max power mode |
Velune detects your hardware on startup and prints tier, GPU, and recommendations.
On underpowered machines it routes tasks to cloud providers automatically.
Startup flow
Velune starts instantly and does no work until you ask for it. Repository
cognition (indexing) is explicit and on-demand — it never runs automatically
on launch.
velune
↓
CLI opens instantly
↓
Connect a model → /model discover · /model connect <id> · /model use <id>
↓
Open a project → /project open <path> · /project status
↓
Run cognition → /cognition quick · /cognition standard · /cognition deep
Interface
Velune features a modern, clean terminal interface designed for productivity:
- Startup banner shows your hardware tier, active model, and available providers
- Responsive prompt with intelligent context indicators (only displays when relevant)
- Sophisticated color palette using restrained styling for clarity
- Intuitive command structure with tab-completion for all slash commands
- Session modes for balancing speed vs. quality (Optimus / Normal / Godly)
Providers
| Provider | Type | Cost | Models | Setup |
|---|---|---|---|---|
| Ollama | Local | Free | Any pulled model | Install Ollama, pull a model |
| LM Studio | Local | Free | Any GGUF / MLX model | Launch LM Studio server |
| Groq | Cloud | Free tier | Llama 3.3 70B, Mixtral, Gemma2 | velune setup → enter key |
| OpenRouter | Cloud | Pay-per-token | 100+ models | velune setup → enter key |
| OpenAI | Cloud | Pay-per-token | GPT-4o, GPT-4o Mini | velune setup → enter key |
| Anthropic | Cloud | Pay-per-token | Claude Opus, Sonnet, Haiku | velune setup → enter key |
| xAI (Grok) | Cloud | Pay-per-token | Grok 2, Grok 2 Mini | velune setup → enter key |
| Cloud | Free quota | Gemini 2.0 Flash, 1.5 Pro/Flash | velune setup → enter key |
|
| Together AI | Cloud | Pay-per-token | Llama 3.3 70B, Qwen 2.5 Coder, DeepSeek R1 | velune setup → enter key |
| Fireworks AI | Cloud | Pay-per-token | DeepSeek R1, Qwen 2.5 Coder, Mixtral 8x22B | velune setup → enter key |
| Mistral | Cloud | Pay-per-token | Mistral Large, Codestral, Mixtral | velune setup → enter key |
| DeepSeek | Cloud | Pay-per-token | DeepSeek R1, DeepSeek Coder | velune setup → enter key |
| Cohere | Cloud | Pay-per-token | Command R+, Command R | velune setup → enter key |
| NVIDIA NIM | Cloud | Pay-per-token | Llama, Mistral, and other NIM models | velune setup → enter key |
| HuggingFace | Cloud | Free/paid | Open models via Inference API | velune setup → enter key |
Keys are stored in your OS keyring — never in files, never in git.
Commands
CLI (terminal, before the REPL)
# Core
velune # Start the interactive REPL session
velune chat # Same as above (explicit form)
velune run "<task>" # Run a task non-interactively and exit
velune ask "<question>" # Ask a one-shot question and exit
velune init # Initialize Velune in a project directory
# Workspace & sessions
velune workspace init # Index the current workspace
velune workspace status # Show index freshness and file counts
velune workspace graph # Render the workspace dependency graph
velune workspace list # List all known workspaces
velune session list # List saved chat sessions
velune session delete <id> # Delete a saved session
# Setup & models
velune setup # Configure API keys (stored in OS keyring)
velune models scan # Discover all available local and cloud models
velune models list # List discovered models
velune provider list # Show all configured providers and their status
velune provider add # Add a new provider interactively
velune config show # Print effective velune.toml settings
velune config set <k> <v> # Write a setting to velune.toml
# Analytics & monitoring
velune usage # Token usage and estimated cost for recent sessions
velune quota # Check provider rate-limit and quota status
velune health # Check provider reachability and response time
# Diagnostics
velune doctor # Run full environment health check
velune logs # View recent execution event log
velune logs live # Follow new events as they are written
velune status # Index freshness and workspace health snapshot
velune pipeline "<query>" # Trace a retrieval query through the search pipeline
velune daemon start # Start the background Velune service
velune daemon stop # Stop the background service
velune mcp serve # Expose Velune's tool council as an MCP server
velune mcp connect <url> <name> # Connect to an external MCP server and list tools
velune memory inspect # Show memory tier sizes and record counts
velune memory clear # Clear all memory tiers for the current workspace
Inside the REPL
─── Session ───────────────────────────────────────────────────────────────────
/help Show all commands and their aliases
/exit Exit Velune
/clear Clear the terminal screen (context is preserved)
/new [title] Start a new conversation (project memory persists)
─── Workspace & Cognition ─────────────────────────────────────────────────────
/project open <path> Open a directory as the active workspace
/project close Close the current workspace
/project status Show the active workspace and its cognition state
/project list List recently-opened workspaces (reopen instantly)
/project add <path> Remember a workspace without switching to it
/cognition quick Fast scan — manifests only, no symbol index
/cognition standard Build a full symbol index for the workspace
/cognition deep Deep index (symbols + import graph + relationships)
/cognition status Show index freshness and progress
/cognition cancel Cancel a running cognition job
/cognition rebuild Discard and rebuild the index from scratch
─── Council / Execution ───────────────────────────────────────────────────────
/run <task> Execute a task through the Reasoning Council
/run --bg <task> Submit task to background — prompt returns immediately
/council <task> Force full council tier regardless of task complexity
/jobs List all background jobs (ID, status, phase, elapsed)
/jobs cancel <id> Cancel a running background job
/dashboard Live progress dashboard (jobs + alerts + provider health)
─── Models ────────────────────────────────────────────────────────────────────
/model [model-id] Switch active model (arrow-key picker if no arg)
/model discover Discover locally available models (e.g. Ollama)
/model connect <id> Register/connect a model to the active session
/model use <id> Set the active model by id
/model status Show the active model and connection state
/model remove <id> Remove a model from the registry
/models List all available models
/pull [model-id] Download an Ollama model with live progress
/delete <model-id> Delete a locally installed Ollama model
/councilmodel Assign specific models to Planner / Coder / Reviewer roles
/bench [run] View or run empirical model capability benchmarks
─── Session Modes ─────────────────────────────────────────────────────────────
/optimus Speed mode — instant tier, smallest model, 4k context
/godly Max power — full council, largest model, 128k context
/normal Return to balanced mode (auto-tier, 16k context)
/mode Show current mode settings and active council tier
─── Memory & Context ──────────────────────────────────────────────────────────
/memory [clear|stats] Inspect or clear memory tiers
/session Interactive session picker (list / resume / save / export)
/context Show context window usage for the current conversation
/graph Render a tree of knowledge graph entities
─── Diffs & Editing ───────────────────────────────────────────────────────────
/diff Show uncommitted file changes from the last council run
/undo Revert the last Velune-generated git commit (keeps changes staged)
/hunk Toggle hunk-by-hunk review mode for edits
/approve [safe|ask|block] Set tool/command approval gate
─── Git Integration ───────────────────────────────────────────────────────────
/push [--force] Push current branch to remote origin
/pr <title> Create a pull request / merge request on GitHub or GitLab
/issue <number> Fetch a GitHub/GitLab issue and inject it as context
/sandbox [docker|status] Show or switch sandbox type (subprocess or Docker)
─── Code Intelligence ─────────────────────────────────────────────────────────
/lint [file] Lint a Python file and display diagnostic output
/refactor <file> Detect code smells and suggest refactoring targets
/typify <file> Suggest type hints for unannotated functions
─── MCP & Plugins ─────────────────────────────────────────────────────────────
/mcp [servers|tools|resources|connect|disconnect] Inspect MCP servers and tools
/plugin [list|enable|disable|reload|show] Manage declarative plugins
─── Diagnostics ───────────────────────────────────────────────────────────────
/doctor Run environment health checks
/config Show current configuration settings
/stats Session statistics: tokens, cost, turns, uptime
/history Show REPL command execution history
/hooks List active lifecycle hooks and their configuration
Tab-completion is active for all / commands and for model IDs (type /model then Space to trigger).
The status bar shows ⚙ N bg for active background jobs and ⚠ N for unread proactive alerts. Alerts drain automatically after each prompt and are printed as Rich panels above the input line.
Architecture overview
velune/
├── cli/ REPL, slash commands, banner, autocomplete, session manager
│ ├── commands/ Typer subcommands (workspace, session, models, doctor, mcp, …)
│ ├── display/ Live dashboards and council pipeline view
│ └── rendering/ Rich error panels and markdown streaming
├── providers/ 15 provider adapters (Ollama, Groq, OpenAI, Anthropic, Mistral, …)
│ ├── adapters/ Per-provider inference + streaming implementations
│ └── discovery/ Model catalog discovery for each provider
├── cognition/ Council: Planner → Coder → Reviewer → Challenger → Synthesizer
│ └── council/ DebateSession, CouncilRunner, per-role agents, tier classifier
├── memory/ 5-tier: working → episodic → semantic → graph → lineage
├── proactive/ Alert store + watcher (CognitiveBus event subscriptions)
├── repository/ AST indexing, import graph, blast-radius estimator, .veluneignore
├── retrieval/ Hybrid retrieval: BM25 + vector + graph, cross-encoder reranker
├── execution/ Managed execution (allowlist + limits), diff preview, rollback
│ └── edit_formats/ Diff format parsers (unified, search-replace, XML, JSON)
├── analysis/ Code intelligence: linting, code-smell detection, type inference
├── integrations/ GitHub and GitLab REST clients (push, PR, issues)
├── hooks/ Lifecycle hook dispatcher and executor (pre/post tool events)
├── observability/ Context reports, execution trace log, workspace dependency graph
├── mcp/ MCP server + client; stdio / SSE / HTTP / WebSocket transports
├── hardware/ Hardware detection, tier classification, GPU probe
├── telemetry/ Token tracking, cost estimation, latency profiling
├── models/ Model registry, capability scoring, specializations
├── context/ Context window tracking, token counting, extractive compression
├── orchestration/ ContextOrchestrationEngine — wires intent → council → output
├── core/ Loop detector, retry policy, task/job registry, error types
├── kernel/ Bootstrap, lifecycle coordinator, service container
├── daemon/ Background Velune service (server + IPC transport)
├── tools/ File-system, git, web-fetch, and terminal tool implementations
└── plugins/ Declarative plugin loader, SKILL.md injection, hook wiring
Memory system
Velune maintains five memory tiers across sessions:
- Working — current conversation turns (in-process, TTL-evicted)
- Episodic — session history (SQLite, persisted to
~/.velune/) - Semantic — vector search over past interactions (local LanceDB and Qdrant)
- Graph — repository structure and symbol relationships
- Lineage — decision history, what was tried and why
This means "fix the auth issue from yesterday" actually works —
Velune retrieves recent sessions, git changes, and related context
to reconstruct intent without you explaining it again.
Session modes
| Mode | Command | Council tier | Model | Context cap |
|---|---|---|---|---|
| Normal | /normal |
auto | current | 16 k tokens |
| Optimus | /optimus |
instant | smallest | 4 k tokens |
| Godly | /godly |
full | largest | 128 k tokens |
Switch modes at any time mid-session. The prompt badge updates immediately.
MCP integration
Velune works as both an MCP server and an MCP client:
- Server (
velune mcp serve) — exposes Velune's local tool council over
stdio so Claude Desktop, VS Code, and other MCP-capable editors can call
Velune's models without sending your code to a third party. - Client (
velune mcp connect <url> <name>) — connects to any external MCP
server, lists its tools, and makes them available inside the REPL via/mcp. - Transports — stdio, SSE, HTTP, and WebSocket (
ws:///wss://) are all
supported. Servers can also be declared in.mcp.jsonand loaded automatically.
Outbound connections to external MCP servers are trust-gated — see
MCP trust gating in the security policy.
Windows
Velune runs natively on Windows (supporting native command execution sandboxing, local Ollama integration, and keyring credentials). It can also run via WSL2 if preferred.
Project docs
| Doc | What's inside |
|---|---|
| docs/SECURITY.md | Security posture, trust boundaries, reporting |
| docs/CONTRIBUTING.md | Dev setup and contribution workflow |
| docs/CHANGELOG.md | Full version history |
Optional extras
The default install is intentionally lean and pure-python-friendly so it
resolves fast and cleanly on every platform. Heavy or feature-specific
dependencies live in extras — every feature that needs one degrades
gracefully when it is absent (e.g. semantic search becomes a no-op, but
lexical search and chat keep working).
| Extra | Installs | Enables |
|---|---|---|
[rag] |
lancedb, pyarrow, qdrant-client |
Semantic memory + vector retrieval (large compiled wheels) |
[parsing] |
tree-sitter + grammars |
Tree-sitter source parsing for deep repository cognition |
[telemetry] |
opentelemetry-* |
Export spans/metrics to an OTLP collector |
[git] |
gitpython |
Git provider tools (push / PR / issue) and richer git context |
[gguf] |
gguf |
GGUF file metadata reading — safe, no transitive risk |
[docker] |
docker |
Docker sandbox for isolated code execution |
[all] |
everything above | Full-featured install |
[dev] |
Test/lint tools | For contributors |
pip install velune-cli # lean base (Ollama, cloud providers, chat, lexical search)
pip install 'velune-cli[rag]' # + semantic memory & vector retrieval
pip install 'velune-cli[all]' # + every optional feature
The former
[llamacpp]extra has been permanently removed:llama-cpp-pythonpulls indiskcache ≤ 5.6.3(unsafe pickle
deserialization, no patched version). Installllama-cpp-pythonmanually,
in a trusted single-user environment only, if you accept that risk.
Contributing
See CONTRIBUTING.md.
Before opening a PR:
pip install -e ".[dev]"
ruff check velune/
pytest tests/ -q
Report security issues via
GitHub Security Advisories
— not public issues.
License
Apache 2.0 — see LICENSE.
Copyright 2026 Surya HA
Reviews (0)
Sign in to leave a review.
Leave a reviewNo results found