OpenClaw Optimization Guide

Name: openclaw-optimization-guide
Author: OnlyTerp

Make Your OpenClaw AI Agent Faster, Smarter, and Actually Useful

Speed optimization, memory architecture, context management, model selection, and one-shot development for OpenClaw

Speed - Trim context files, add fallbacks, manage reasoning mode
Context Bloat - Quadratic scaling, built-in defenses
Cron Session Bloat - Session file accumulation, cleanup
Memory - 3-tier memory system, Ollama vector search
Orchestration - Sub-agent delegation, CEO/COO/Worker model
Models - Provider comparison, pricing, local models
Web Search - Tavily, Brave, Serper, Gemini grounding
One-Shotting Big Tasks - Research-first methodology
Vault Memory System - Structured knowledge graph, MOCs, cross-session continuity
State-of-the-Art Embeddings - Upgrade from nomic to Qwen3-VL, Stark Edition server, Windows gotchas
Auto-Capture Hook - Automatic knowledge extraction after every session, no manual memory writes
Self-Improving System - Micro-learning loop that compounds forever, $0/day
Memory Bridge - Give coding agents (Codex/Claude Code) access to your vault knowledge
Quick Checklist - 30-minute setup checklist
Infrastructure Hardening - Compaction crash loops, GPU contention, Gemini Flash purge, Tavily migration
The One-Shot Prompt - Copy-paste automation prompt

📊 Benchmarks — Real numbers from a production system (context savings, search latency, reindex results, SWE-bench rankings)

📁 Example Vault — Populated mini-vault showing MOCs, wiki-links, Agent Notes, and .learnings/ after 2 weeks of use

The Problem

If you're running a stock OpenClaw setup, you're probably dealing with:

Freezing and hitting context limits. Bloated workspace files exhaust the context window mid-response.
Slow responses. 15-20KB+ of context injected every message = hundreds of milliseconds of latency per reply.
Forgetting everything. New session = blank slate. No memory of yesterday's work or decisions.
Inconsistent behavior. Without clear rules, personality drifts between sessions.
Doing everything the expensive way. Main model writes code, does research, AND orchestrates - all at top-tier pricing.
Flying blind. No web search means guessing at anything after training cutoff.
Wrong model choice. Using whatever was default without considering the tradeoffs.

What This Fixes

After this setup:

Metric	Before	After
Context per msg	15-20 KB	4-5 KB
Time to respond	4-8 sec	1-2 sec
Memory recall	Forgets daily	Remembers weeks
Token cost/msg	~5,000 tokens	~1,500 tokens
Long sessions	Degrades	Stable
Concurrent tasks	One at a time	Multiple parallel

How It Works

You ask a question
    ↓
Orchestrator (main model, lean context ~5KB)
    ↓
┌─────────────────────────────────────────┐
│  memory_search() - 45ms, local, $0     │
│  ┌─────────┐  ┌──────────┐  ┌────────┐ │
│  │MEMORY.md│→ │memory/*.md│→ │vault/* │ │
│  │(index)  │  │(quick)   │  │(deep)  │ │
│  └─────────┘  └──────────┘  └────────┘ │
└─────────────────────────────────────────┘
    ↓
Only relevant context loaded (~200 tokens)
    ↓
Fast, accurate response + sub-agents for heavy work

The key insight: Workspace files become lightweight routers, not storage. All knowledge lives in a local vector database. The bot loads only what it needs - not everything it's ever learned.

What The Optimized Files Look Like

Full versions in /templates:

SOUL.md (772 bytes - injected every message):

## Who You Are
- Direct, concise, no fluff. Say the useful thing, then stop.
- Have opinions. Disagree when warranted. No sycophancy.

## Memory Rule
Before answering about past work, projects, people, or decisions:
run memory_search FIRST. It costs 45ms. Not searching = wrong answers.

## Orchestrator Rule
You coordinate; sub-agents execute. Never write 50+ lines of code yourself.

MEMORY.md (581 bytes - slim pointer index):

## Active Projects
- Project A → vault/projects/project-a.md
- Project B → vault/projects/project-b.md

## Key People
- Person A - role, relationship → vault/people/person-a.md

Details live in vault/. The bot finds them via vector search in 45ms.

This isn't a settings tweak - it's a complete architecture change: memory routing, context engineering, and orchestration working together. The one-shot prompt at the bottom does the entire setup automatically.

Note: Tested on Claude Opus 4.6. Other frontier models should work if they can follow multi-step instructions.

Templates included: Check /templates for ready-to-use versions of SOUL.md, AGENTS.md, MEMORY.md, TOOLS.md, and a sample vault/ structure.

Part 1: Speed (Stop Being Slow)

Every message you send, OpenClaw injects ALL your workspace files into the prompt. Bloated files = slower, more expensive replies. This is the #1 speed issue people don't realize they have.

Why Trimming Works

You don't need big files once you have vector search.

Old approach: Stuff everything into MEMORY.md so the bot "sees" it every message → 15KB+ context, slow responses, wasted tokens on irrelevant info.

New approach: MEMORY.md is a slim index of pointers. Full details live in vault/. memory_search() finds them instantly via local Ollama embeddings ($0). Your workspace files stay tiny without losing any knowledge.

Trim Your Context Files

File	Target Size	What Goes In It	Why This Size
SOUL.md	< 1 KB	Personality, tone, core rules	Injected EVERY message - every byte costs latency
AGENTS.md	< 2 KB	Decision tree, tool routing	Needs to fit in working memory
MEMORY.md	< 3 KB	Pointers only - NOT full docs	Vector search replaces big files
TOOLS.md	< 1 KB	Tool names + one-liner usage	Just reminders, not documentation
Total	< 8 KB	Everything injected per message	Down from 15KB+ = 50-66% faster

Rule: If it's longer than a tweet thread, it's too long for a workspace file. Move the details to vault/.

Add a Fallback Model

"fallbackModels": ["your-provider/faster-cheaper-model"]

OpenClaw automatically switches when your main model is rate-limited or slow.

Reasoning Mode - Know the Tradeoff

Run /status to see your current reasoning mode.

Off - fastest, no thinking phase
Low - slight thinking, faster responses
High - deep reasoning, adds 2-5 seconds but catches things low/off misses

I run high and keep it there. The context trimming from other steps more than compensates for the reasoning overhead.

Disable Unused Plugins

Every enabled plugin adds overhead. If you're not using memory-lancedb, memory-core, etc., set "enabled": false.

Ollama Housekeeping

ollama ps        # Check what's loaded
ollama stop modelname  # Unload idle big models

The default model for memory search is nomic-embed-text (300 MB). If you have a GPU with 16GB+ VRAM, upgrade to Qwen3-VL-Embedding-8B for dramatically better search quality — see Part 10.

Part 2: Context Bloat (The Silent Performance Killer)

The Quadratic Problem

LLM attention scales quadratically with context length:

2x the tokens = 4x the compute cost
3x the tokens = 9x the compute cost

When context goes from 50K to 100K tokens, the model does four times the work. That means slower responses and higher bills.

What Happens at 50% of Your Context Window

Just because a model advertises 1M context doesn't mean it performs well at 1M:

11 of 12 models tested dropped below 50% accuracy by 32K tokens
GPT-4.1 showed a 50x increase in response time at ~133K tokens
Models exhibit "lost-in-the-middle" bias - they track the beginning and end but lose the middle
Effective context is usually a fraction of the max

Where Bloat Comes From

Source	Typical Size	Injected When
System prompt	2-5 KB	Every message
Workspace files	5-20 KB	Every message
Conversation history	Grows per turn	Every message
Tool results	1-50 KB each	After tool calls
Skill files	1-5 KB each	When skill activates

Tool spam is the worst offender. A single exec returning a large file = 20K+ tokens permanently in your session. Five tool calls = 100K tokens of context the model re-reads every message.

The Cost Math

Lean (5K tokens/msg)   → Claude Opus: $0.025/msg
Bloated (50K tokens/msg) → Claude Opus: $0.25/msg   ← 10x more
Over 100 msgs/day: $2.25/day vs $22.50/day

Built-In Defenses

Session Pruning - Trims old tool results from context:

{
  "agents": {
    "defaults": {
      "contextPruning": { "mode": "cache-ttl", "ttl": "5m" }
    }
  }
}

Auto-Compaction - Summarizes older conversation when nearing context limits. Trigger manually with /compact.

Use both. Pruning handles tool result bloat. Compaction handles conversation history bloat.

Context Bloat Checklist

Workspace files under 8 KB total
Context pruning enabled (mode: "cache-ttl")
Use /compact proactively when sessions feel slow
Use /new when switching topics entirely
Delegate heavy tool work to sub-agents (their context is separate)
Monitor with /status - stay under 10-15% of your model's context window

Part 3: Cron Session Bloat (The Hidden Killer)

Every cron job creates a session transcript file (.jsonl). Over time:

30 cron jobs × 48 runs/day × 30 days = 43,200 session files
The sessions.json index balloons, slowing session management

How to Spot It

# Linux/Mac
ls ~/.openclaw/agents/*/sessions/*.jsonl | wc -l

# Windows (PowerShell)
(Get-ChildItem ~\.openclaw\agents\*\sessions\*.jsonl).Count

Thousands of files = cron session bloat.

The Fix

1. Configure session rotation:

{ "session": { "maintenance": { "rotateBytes": "100mb" } } }

2. Clean up old sessions:

openclaw sessions cleanup

3. Use isolated sessions for cron:

{ "sessionTarget": "isolated", "payload": { "kind": "agentTurn", "message": "Do the thing" } }

Isolated sessions don't pile up in your main agent's session history.

Prevention > Cleanup

Use delivery: { "mode": "none" } on crons where you don't need output announced
Keep cron tasks focused - 1 tool call generates 15x less session data than 15

Part 4: Memory (Stop Forgetting Everything)

Out of the box, OpenClaw forgets everything between sessions. The fix is a 3-tier memory system.

The Architecture

MEMORY.md          ← Slim index (< 3 KB), pointers only
memory/            ← Auto-searched by memory_search()
  projects.md
  people.md
  decisions.md
vault/             ← Deep storage, searched via memory
  projects/
  people/
  decisions/
  lessons/
  reference/
  research/

How It Works

MEMORY.md - table of contents with one-liner pointers. Never put full documents here.
memory/*.md - automatically searched when the bot calls memory_search("query").
vault/ - deep storage for detailed project docs, research notes, full profiles.

Setting It Up

Step 1: Install Ollama + embedding model

# Windows: winget install Ollama.Ollama
# Mac/Linux: curl -fsSL https://ollama.com/install.sh | sh
ollama pull nomic-embed-text

OpenClaw detects Ollama on localhost:11434 automatically. No config needed.

GPU users: For a major quality upgrade (768-dim → 4096-dim vectors), see Part 10: State-of-the-Art Embeddings.

Step 2: Create the directory structure

workspace/
  MEMORY.md
  memory/
  vault/
    projects/  people/  decisions/  lessons/  reference/  research/

Step 3: Slim down MEMORY.md

# MEMORY.md - Core Index
_Pointers only. Search before answering._

## Active Projects
- Project A → vault/projects/project-a.md

## Key Tools
- Tool X: `command here`

## Key Rules
- Rule 1

Step 4: Move everything else to vault/

Every detailed document → vault/. Leave a one-liner pointer in MEMORY.md or memory/.

The Golden Rule

Add this to your SOUL.md:

## Memory
Before answering about past work, projects, or decisions:
run memory_search FIRST. It costs 45ms. Not searching = wrong answers.

Part 5: Orchestration (Stop Doing Everything Yourself)

Your main model should NEVER do heavy work directly. It should plan and delegate to cheaper, faster sub-agents.

The Mental Model

You = CEO (gives direction)
Your Bot (main model) = COO (plans, coordinates, makes decisions)
Sub-agents (cheaper/faster model) = Workers (execute tasks fast and cheap)

Add This to AGENTS.md

## Core Rule
You are the ORCHESTRATOR. You coordinate; sub-agents execute.
- Code task (3+ files)? → Spawn coding agent
- Research task? → Spawn research agent
- 2+ independent tasks? → Spawn ALL in parallel

## Model Strategy
- YOU (orchestrator): Best model - planning, judgment, synthesis
- Sub-agents (workers): Cheaper/faster model - execution, code, research

Your expensive model decides WHAT to build. The cheap model builds it. Right model, right job.

Give Coding Agents Your Brain

Before spawning any coding sub-agent, run the Memory Bridge preflight to inject relevant vault knowledge into the project directory:

node scripts/memory-bridge/preflight-context.js --task "Build auth middleware" --workdir ./my-project

This writes a CONTEXT.md that the coding agent reads automatically — giving it access to your past decisions, error patterns, and architecture choices. See Part 13 for the full setup.

Part 6: Models (What to Actually Use)

The Model Strategy

Role	What It Does	Best Model(s)	Why
Orchestrator	Plans, judges, coordinates	Claude Opus 4.6	Best complex reasoning + tool use
Daily driver	General assistant	Claude Sonnet 4.6, Gemini 3.1 Pro	Great quality, lower cost
Sub-agents	Execute delegated tasks	Gemini 3 Flash, Kimi K2.5, MiMo V2 Pro	Fast, cheap, capable enough
Coding (hard)	Architecture, complex bugs	Claude Opus 4.6	#1 SWE-bench (1549) — best coding model alive
Coding (batch)	Scaffolding, CRUD, refactors	GPT-5.4 Codex	Fast, $0 on subscription, good with Memory Bridge
Research	Web search, analysis	Gemini 2.5 Flash + Tavily	Built-in grounding
Free tier	Zero-cost operations	Gemini (all variants), Groq open models	$0 with generous limits

Model Deep Dive

Claude Opus 4.6 - The Best Orchestrator

Unmatched multi-step reasoning and complex tool use
Follows long, nuanced system prompts better than any other model
1M context window with prompt caching (up to 90% savings on cached tokens)
Cost: $5/M input, $25/M output, $0.50/M cached | Max ($100/mo): included - best value for heavy use

Claude Sonnet 4.6 - The Sweet Spot

80% of Opus quality at 20% of the cost. Strong at coding
Cost: $3/M input, $15/M output | Pro ($20/mo): included

💡 Pro tip: Don't pay API rates for Claude if you have a subscription. Pro ($20/mo) covers Sonnet, Max ($100/mo) covers Opus. For power users, Max is the best value in AI right now.

Gemini 3.1 Pro / 3 Pro - Free Powerhouse

Competitive with Sonnet on most tasks - and it's free. 1M context, multimodal.
Weaker than Claude on complex agentic tool-use chains.

Gemini Flash (2.5 / 3) - Speed Demon

Fastest responses of any capable model. Perfect for sub-agents. Free.

GPT-5.3 / 5.4 Pro - OpenAI's Best

Codex models are purpose-built for code - fast and cheap.
Cost: GPT-5.3: $1.75/M input, $14/M output | GPT-5.4 Pro: $30/M input, $180/M output

Grok 4 / 4.1 Fast - The Dark Horse

Grok 4.20 has a massive 2M context window. Grok 4.1 Fast is insanely cheap.
Cost: Grok 4: $3/M in, $15/M out | Grok 4.1 Fast: $0.20/M in, $0.50/M out

Kimi K2.5 - Budget Sub-Agent King

262K context, multimodal, $0.45/M input, $2.20/M output - excellent price-to-performance.

MiMo V2 Pro (Xiaomi) - The Sleeper

1T parameter model, 1M context. Great for agentic sub-agents on a budget. $1/M in, $3/M out.

OpenRouter: The Model Marketplace

OpenRouter gives you dozens of models through one API key. Notable options:

openrouter/free - auto-routes to the best free model for your request. Perfect for $0 sub-agents.
MiMo V2 Pro - Currently free (launch promotion). Add: openrouter/xiaomi/mimo-v2-pro
Kimi K2.5 - Budget powerhouse. Add: openrouter/moonshotai/kimi-k2.5
Perplexity Sonar - Built-in web search, no separate tool needed. Add: openrouter/perplexity/sonar

Local Models: $0 Forever, No Rate Limits

If you have a GPU, local models via Ollama = unlimited inference at zero cost.

Qwopus (Qwen 3.5 27B + Claude Opus reasoning distilled) - Opus-style thinking on a single 4090. ollama pull qwopus
NVIDIA Nemotron Nano 4B - Punches above its weight, 128K context, fits on any GPU. ollama pull nemotron-nano

Using Anthropic Membership (The Best Way)

Your Claude Pro/Max subscription includes API access. OpenClaw can use it directly:

1. Run `claude` in terminal → login via browser (OAuth)
2. Run `openclaw onboard` → detects your credentials → uses membership
3. Done. No separate API key needed.

Recommended Setups

Budget ($0/month):

Main: Gemini 3.1 Pro (free) | Sub-agents: Gemini 3 Flash | Local: Nemotron Nano 4B

Balanced (~$20/month - Claude Pro):

Main: Sonnet 4.6 (membership) | Fallback: Gemini 3.1 Pro | Sub-agents: Flash / Kimi K2.5

Power (~$100/month - Claude Max):

Main: Opus 4.6 (membership) | Fallback: Gemini 3.1 Pro | Sub-agents: Kimi / MiMo / Flash
Code (hard): Opus directly | Code (batch): Codex + Memory Bridge
Self-improving: .learnings/ micro-loop ($0) | Memory: Qwen3-VL on local GPU

Pro Tips

Always set 2-3 fallbacks. Auto-switch beats breaking.
Match model to task. Don't use Opus for scripts. Don't use Flash for architecture.
Enable prompt caching on Anthropic: cacheRetention: "extended" + cache-ttl pruning.
Membership > API keys. If you're paying for Pro/Max, use it via OAuth. Don't pay twice.
Free models are real. Gemini's free tier is legitimately good for daily driving.

Part 7: Web Search (Give Your Agent Eyes on the Internet)

Without web search, your agent guesses at anything after its training cutoff.

The Players

Provider	Price per 1K queries	Free Tier	Best For	LLM-Optimized
Tavily	~$8	1,000/month	AI agents, RAG	✅ Built for it
Brave Search	$5	$5 credit/month	Privacy, scale	✅ LLM Context mode
Serper	$1-3	2,500 credits	Budget, speed	Partial
SerpAPI	$25-75/month	100/month	Multi-engine	Partial
Gemini Grounding	Free	Included	Google ecosystem	✅ Native
Perplexity Sonar	$3/M in, $15/M out	Via OpenRouter	Research synthesis	✅ Built for it

Why We Use Tavily

Built for AI agents. Returns clean, structured, pre-processed content - not a list of links. One API call → usable answer. No fetching/parsing extra steps.
Search + Extract + Crawl in one API. Fewer tools, fewer context-eating tool calls.
Depth control. Basic (1 credit, fast) vs Advanced (2 credits, comprehensive) - per query.
Usable free tier. 1,000 credits/month = enough for a personal assistant that searches a few times daily.
Built-in safety. Guards against prompt injection from search results and PII leakage.

Setting Up Tavily

Get a free API key at tavily.com (30 seconds)
Add to TOOLS.md: Tavily Search: For grounded web research. Basic for lookups, advanced for deep research.
For research sub-agents, include Tavily in task instructions

When to Use What

Need	Use
Real-time facts/news	Tavily (basic) or Gemini grounding
Deep research + full articles	Tavily (advanced + extract)
Privacy-first search	Brave Search API
Structured results, budget	Serper ($1/1K)
Search in model response	Perplexity Sonar
Free and good enough	Gemini grounding

Part 8: One-Shotting Big Tasks (Stop Iterating, Start Researching)

Most people type a vague prompt, iterate 15 times, burn context and money, end up at 60% quality. The model isn't the problem - your prompt is.

The Data

Vague prompts → 1.7x more issues, 39% more cognitive complexity, 2.74x more security vulnerabilities
Detailed specifications → 95%+ first-attempt accuracy

The quality of your output is capped by the quality of your input.

Why Iteration Fails

Burns context - each correction adds to history, pushing toward bloat
Confuses the model - contradictory instructions across rounds
Pays twice - you paid for the bad output AND the correction
Loses coherence - by iteration 8, the agent forgot iteration 1 (lost-in-the-middle)

The Method: Research → Spec → Ship

Phase 1: Research (30-60 minutes)

Before building, know what "good" looks like:

Find best examples - Search for top 3-5 implementations, study their tech stack and shared features
Analyze UI patterns - Screenshot the best UIs, note layouts, color schemes, component patterns
Study the tech stack - Pick the stack the best implementations use, not your default
Find the pitfalls - Search for common mistakes. Every pitfall in your prompt = one fewer iteration

Phase 2: Write the Spec (15-30 minutes)

Turn research into a blueprint:

# Project: [Name]

## Context
[What this is, who it's for, why it exists]

## Research Summary
[Key findings - what the best implementations do]

## Tech Stack
- Framework: [choice based on research]
- UI Library: [choice]
- Key Dependencies: [list]

## Features (Priority Order)
1. [Feature] - [acceptance criteria]
2. [Feature] - [acceptance criteria]

## File Structure
[Project organization]

## Quality Bar
- [ ] Responsive, error handling, loading states
- [ ] Clean code, no TODOs in final output

## What NOT To Do
- [Pitfall from research]

Why this works: You're not asking the AI to make 50+ decisions - you've already made them based on research. The AI executes, not strategizes. Blueprints, not vibes.

Phase 3: Delegate and Ship

Send the spec to a coding agent, not your orchestrator:

sessions_spawn({
  task: "[full spec]",
  mode: "run",
  runtime: "subagent"  // or "acp" for Codex/Claude Code
})

Run Memory Bridge preflight first. Before spawning any coding agent, inject vault context:
node scripts/memory-bridge/preflight-context.js --task "..." --workdir <project>
This writes a CONTEXT.md with relevant past decisions and patterns. See Part 13.
Send to a coding model. Your main model plans, not builds. For hard architecture work, Opus can code directly (#1 SWE-bench).
Include everything in one prompt. If you're thinking "I'll clarify later," you haven't researched enough.
Attach reference images for vision-capable models.

Let Your Agent Do the Research

You don't have to research manually - make your agent do Phase 1:

Before building anything, research first:
1. Find top 5 [things] that exist. What tech/UI patterns do they share?
2. Search "[thing] best practices 2026" - summarize key patterns.
3. Search "[thing] common mistakes" - list top pitfalls.
4. Based on research, write a detailed spec with tech stack, features,
   file structure, and quality bar.
Do NOT start building until the spec is written and I approve it.

The workflow:

You: "Research and spec out a [thing]"     → 2 min
Agent: [Tavily research → writes spec]     → 3-5 min
You: "Looks good, build it"                → 30 sec
Agent: [builds from spec]                  → one-shot quality

5 minutes of research saves 3+ hours of iteration. The math always works out.

Part 9: Vault Memory System (Stop Losing Knowledge Between Sessions)

Part 4 gave you memory. But after months of daily use, your agent gets dumber, not smarter. We hit this: 358 memory files, 100MB+ of accumulated knowledge, vector search returning irrelevant results because every query matches 15 slightly different files. Date-named files that tell you nothing. Research conclusions lost because nobody saved them.

The more you teach it, the worse it gets. That's the sign your memory architecture is broken.

Why Flat Files + Vector Search Breaks Down

Vector search finds what's similar - not what's connected. Ask "what do we know about God Mode?" and you get 8 files that all mention Cerebras. None give the full picture because it's spread across 12 files that vector search doesn't know are related.

Problem	What Happens
Date-named files	`2026-03-19.md` - what's in it? Who knows
No connections	Related files don't know about each other
Bloat pollutes results	Generic knowledge drowns specific insights
Session amnesia	Agent starts fresh, no breadcrumbs from last session
MEMORY.md overflow	Index grows past injection limit, context truncated

The fix isn't better embeddings. It's structure.

The Solution: Vault Architecture

An Obsidian-inspired linked knowledge vault with four key ideas:

Notes named as claims - the filename IS the knowledge
MOCs (Maps of Content) link related notes - one page = full picture
Wiki-links create a traversable graph - follow connections, not similarity
Agent Notes provide cross-session breadcrumbs - next session picks up where this one left off

Folder Structure

vault/
  00_inbox/      ← Raw captures. Dump here, structure later
  01_thinking/   ← MOCs + synthesized notes
  02_reference/  ← External knowledge, tool docs, API references
  03_creating/   ← Content drafts in progress
  04_published/  ← Finished work
  05_archive/    ← Inactive content. Never delete, always archive
  06_system/     ← Templates, vault philosophy, graph index

Claim-Named Notes

Stop naming files by date. Name them by what they claim:

BAD:  2026-03-19.md              GOOD: nemotron-mamba-wont-train-on-windows.md
BAD:  session-notes.md           GOOD: memory-is-the-bottleneck.md
BAD:  cerebras-research.md       GOOD: god-mode-is-cerebras-plus-orchestration.md

The agent reads filenames before content. When every filename is a claim, scanning a folder gives the agent a map of everything you know - without opening a single file.

MOCs - Maps of Content

A MOC connects related notes with [[wiki-links]]. Example:

# Memory Is The Bottleneck

## Key Facts
- 358 memory files in memory/, mostly date-named
- Vector search (Qwen3-VL or nomic-embed-text, 45ms, $0) finds similar, not connected
- MEMORY.md must stay under 5K - injected on every message

## Connected Topics
- [[vault/decisions/memory-architecture.md]]
- [[vault/research/rag-injection-research.md]]
- [[vault/projects/reasoning-traces.md]]

## Agent Notes
- [x] Vault restructure completed - 8 MOCs + philosophy doc
- [ ] Every session MUST save knowledge to memory

The ## Agent Notes section is the cross-session breadcrumb trail. Each session updates these notes; the next session reads them and picks up where the last one stopped.

Vault Philosophy Document

Save to vault/06_system/vault-philosophy.md - this teaches your agent HOW to use the vault:

The Network Is The Knowledge - No single note is the answer. The answer is the path through connected notes.
Notes Are Named As Claims - Bad: local-models.md. Good: local-models-are-the-fast-layer.md.
Links Woven Into Sentences - Not footnotes. Context-rich inline links.
Agent Orients Before Acting - Scan MOCs → read relevant MOC → follow links → respond.
Agent Leaves Breadcrumbs - Update MOC "Agent Notes" after every session.
Capture First, Structure Later - Dump in 00_inbox/ now. Organize later.

The Graph Tools

MOCs and wiki-links create a graph, but the agent needs tooling to traverse it. See scripts/vault-graph/ for the complete tools:

Script	Purpose
`graph-indexer.mjs`	Scans all `.md` files, parses `[[wiki-links]]`, builds JSON adjacency graph
`graph-search.mjs`	CLI for traversing the graph - finds files + direct/2nd-degree connections
`auto-capture.mjs`	Creates claim-named notes in `00_inbox/`, auto-links to related MOCs
`process-inbox.mjs`	Reviews inbox notes and suggests/auto-moves to appropriate vault folders
`update-mocs.mjs`	Health check - finds broken wiki-links, stale items, orphaned notes

Graph search vs vector search:

memory_search("topic") → Find files you didn't know were relevant (similarity)
node scripts/vault-graph/graph-search.mjs "topic" → Navigate files you know are connected (structure)

Use both. Vector search discovers; graph search navigates.

The Orientation Protocol

Add to your AGENTS.md:

## Vault Orientation Protocol
1. Scan `vault/01_thinking/` - read MOC filenames (claim-named = instant topic map)
2. If user message relates to an existing MOC, read it before responding
3. Follow [[wiki-links]] from the MOC for deeper context
4. After session work: update MOC "Agent Notes" with what was done/discovered
5. New knowledge → claim-named notes in `vault/00_inbox/`

This creates a cycle: orient → work → capture → update → next session orients from breadcrumbs.

Kill the Bloat

If you have a memory/knowledge-base/ full of generic reference material, move it:

mv memory/knowledge-base vault/05_archive/knowledge-base

Your primary search path (memory/ + vault/01_thinking/) should contain only YOUR knowledge - not generic docs the agent could web search.

Before: "memory architecture" returns 15 results - 3 about your system, 12 generic RAG articles.
After: Same search returns 3 results - all about your actual system.

Results

Metric	Before (Flat Files)	After (Vault System)
Files	358 flat, date-named	326 indexed, claim-named
Search method	Vector only	Graph traversal + vector
Wiki-links	0	71 bidirectional
MOC pages	0	8 in 01_thinking/
Cross-session memory	None - starts fresh	Agent Notes breadcrumbs
Knowledge capture	Manual (usually forgotten)	auto-capture creates claim-named notes
Search relevance	15 partial matches, 3 useful	3 connected results via graph

Quick Setup

Create vault structure: mkdir -p vault/{00_inbox,01_thinking,02_reference,03_creating,04_published,05_archive,06_system}
Create your first MOC in vault/01_thinking/ - name it as a claim, follow the template above
Save vault philosophy to vault/06_system/vault-philosophy.md
Set up graph tools: mkdir -p scripts/vault-graph - save the scripts from this repo
Build initial graph: node scripts/vault-graph/graph-indexer.mjs
Add orientation protocol to AGENTS.md
Move bloat to archive: mv memory/knowledge-base vault/05_archive/knowledge-base
Rebuild graph: node scripts/vault-graph/graph-indexer.mjs

Part 14: Quick Checklist

Run through this in 30 minutes:

MEMORY.md under 3 KB (pointers only)
SOUL.md under 1 KB
AGENTS.md under 2 KB
Total workspace context under 8 KB
Context pruning enabled (mode: "cache-ttl")
Cron sessions cleaned up / isolated sessions configured
Ollama installed + nomic-embed-text pulled
vault/ directory structure created
Model strategy chosen (orchestrator + sub-agents + fallbacks)
Faster/cheaper fallback model added
Web search API configured (Tavily recommended, Gemini grounding for free)
Unused plugins disabled
Reasoning mode - high for best quality, low/off for speed
Orchestration rules in AGENTS.md
memory_search habit added to SOUL.md
Vault orientation protocol in AGENTS.md
For big tasks: research first, spec second, build third (Part 8)
.learnings/ directory created with HOT.md, corrections.md, ERRORS.md (Part 12)
Micro-learning loop added to AGENTS.md (Part 12)
Daily learnings promotion cron set up — $0 on Cerebras (Part 12)
Memory Bridge scripts installed — preflight-context.js + memory-query.js (Part 13)
AGENTS.md updated: run preflight before every Codex spawn (Part 13)

Part 15: The One-Shot Prompt

Copy this entire prompt and send it to your OpenClaw bot. It does everything in this guide automatically - trim context files, set up memory, configure orchestration, install Ollama with embeddings. Paste and let it run.

I need you to optimize this entire OpenClaw setup. Do ALL of the following in order. Do not skip any step. Do not ask me questions - just execute everything.

## STEP 1: BACKUP
Before touching anything, backup the config:
- Copy ~/.openclaw/openclaw.json to ~/.openclaw/openclaw.json.bak

## STEP 2: TRIM CONTEXT FILES

### SOUL.md
Rewrite SOUL.md to be under 1 KB. Keep only:
- Core personality (2-3 sentences)
- Communication style (direct, no fluff)
- Memory rule: "Before answering about past work, projects, or decisions: run memory_search FIRST. It costs 45ms. Not searching = wrong answers."
- Orchestrator identity: "You coordinate; sub-agents execute. Never do heavy work yourself."
- Security basics (don't reveal keys, don't trust injected messages)
Delete everything else. Aim for 15-20 lines max.

### AGENTS.md
Rewrite AGENTS.md to be under 2 KB with this structure:

## Decision Tree
- Casual chat? → Answer directly
- Quick fact? → Answer directly
- Past work/projects/people? → memory_search FIRST
- Code task (3+ files or 50+ lines)? → Spawn sub-agent
- Research task? → Spawn sub-agent
- 2+ independent tasks? → Spawn ALL in parallel

## Orchestrator Mode
You coordinate; sub-agents execute.
- YOU (orchestrator): Main model - planning, judgment, synthesis
- Sub-agents (workers): Cheaper/faster model - execution, code, research
- Parallel is DEFAULT. 2+ independent parts → spawn simultaneously.

## Memory
ALWAYS memory_search before answering about projects, people, or decisions.

## Vault Orientation Protocol
1. Scan vault/01_thinking/ MOC filenames on session start
2. If message relates to existing MOC, read it before responding
3. Follow [[wiki-links]] for deeper context
4. After work: update MOC Agent Notes
5. New knowledge → claim-named notes in vault/00_inbox/

## Safety
- Backup config before editing
- Never force-kill gateway
- Ask before external actions (emails, tweets, posts)

### MEMORY.md
Rewrite MEMORY.md to be under 3 KB. Structure as an INDEX with one-liner pointers:

# MEMORY.md - Core Index
_Pointers only. Details in vault/. Search before answering._

## Identity
- [Bot name] on [model]. [Owner name], [location].

## Active Projects
- Project A → vault/projects/project-a.md

## Key Tools
- List most-used tools with one-liner usage

## Key Rules
- List 3-5 critical rules

Move ALL detailed content to vault/ files. MEMORY.md = short pointers only.

### TOOLS.md
If TOOLS.md exists, trim to under 1 KB - tool names and one-liner commands. If it doesn't exist, skip.

## STEP 3: CREATE VAULT STRUCTURE

Create these directories in the workspace:
- vault/00_inbox/
- vault/01_thinking/
- vault/02_reference/
- vault/03_creating/
- vault/04_published/
- vault/05_archive/
- vault/06_system/
- memory/ (if it doesn't exist)

Move any detailed docs from MEMORY.md into the appropriate vault/ subdirectory.

Create vault/06_system/vault-philosophy.md with these principles:
1. The Network Is The Knowledge - answers are paths through connected notes
2. Notes Named As Claims - filename IS the knowledge
3. Links Woven Into Sentences - not footnotes
4. Agent Orients Before Acting - scan MOCs → read → follow links → respond
5. Agent Leaves Breadcrumbs - update Agent Notes after every session
6. Capture First, Structure Later - dump in 00_inbox/, organize later

## STEP 4: INSTALL OLLAMA + EMBEDDING MODEL

Check if Ollama is installed:
- Try running: ollama --version
- If not installed:
  - Windows: winget install Ollama.Ollama
  - Mac: brew install ollama
  - Linux: curl -fsSL https://ollama.com/install.sh | sh

Pull the embedding model:
- ollama pull nomic-embed-text

## STEP 5: ADD FALLBACK MODEL

In openclaw.json, find your main agent config and add a fallback model. Use a faster/cheaper model from the same provider.

## STEP 6: DISABLE UNUSED PLUGINS

In openclaw.json, any plugin not actively used → set "enabled": false.

## STEP 7: SET UP SELF-IMPROVING SYSTEM (Part 12)

Create the learnings directory:
- workspace/.learnings/HOT.md (empty, header: "# HOT Learnings")
- workspace/.learnings/corrections.md (header: "# User Corrections Log")
- workspace/.learnings/ERRORS.md (header: "# Error Log")
- workspace/.learnings/LEARNINGS.md (header: "# General Learnings")
- workspace/.learnings/FEATURE_REQUESTS.md (header: "# Feature Requests")
- workspace/.learnings/projects/ (empty dir)
- workspace/.learnings/domains/ (empty dir)
- workspace/.learnings/archive/ (empty dir)

Add the micro-learning loop to AGENTS.md (insert before the decision tree):

### Micro-Learning Loop (EVERY MESSAGE — silent, <100 tokens)
After EVERY response, silently check:
  1. Did user correct me? → append 1-line to .learnings/corrections.md
  2. Did a command/tool fail? → append 1-line to .learnings/ERRORS.md
  3. Did I discover something? → append 1-line to .learnings/LEARNINGS.md
Format: "- [YYYY-MM-DD] what happened → what to do instead"

## STEP 8: INSTALL MEMORY BRIDGE (Part 13)

Clone or copy the Memory Bridge scripts:
- git clone https://github.com/OnlyTerp/memory-bridge.git scripts/memory-bridge
- Or manually create scripts/memory-bridge/memory-query.js and preflight-context.js

Add to AGENTS.md coding workflow: "Before spawning Codex, run: node scripts/memory-bridge/preflight-context.js --task '...' --workdir <dir>"

## STEP 9: VERIFY

After all changes:
1. Restart the gateway: openclaw gateway stop && openclaw gateway start
2. Run: openclaw doctor
3. Test memory_search by asking about something in your vault files
4. Test Memory Bridge: node scripts/memory-bridge/memory-query.js "test query"
5. Report what you changed with before/after file sizes

## IMPORTANT RULES
- Do NOT delete any config - only trim and reorganize
- Keep all original content - just move it to vault/
- If a file doesn't exist, skip it
- Total workspace context (all .md files in root) should be under 8 KB when done
- Restart the gateway AFTER all changes, not during

That's it. One paste, your bot does everything. If anything fails, your config backup is at openclaw.json.bak.

Troubleshooting

One-shot prompt only partially completed:
Re-paste just the steps that didn't complete. The prompt is idempotent - running a step twice won't break anything.

memory_search not working:
Make sure Ollama is running (ollama ps) and nomic-embed-text is pulled. OpenClaw auto-detects on localhost:11434.

Bot still feels slow after trimming:
Check total workspace file sizes. If over 10KB, files weren't trimmed. Also check reasoning mode - high adds 2-5 seconds per message.

Sub-agents not spawning:
Make sure your model supports sessions_spawn and you have a fallback model configured.

Gateway won't restart:
Run openclaw doctor --fix. If needed, restore backup: cp ~/.openclaw/openclaw.json.bak ~/.openclaw/openclaw.json

One-shot prompt struggles on your model:
Do these 3 things manually instead:

Copy files from /templates into your workspace root
Run ollama pull nomic-embed-text
Restart gateway: openclaw gateway stop && openclaw gateway start

FAQ

Why markdown files instead of a real database?
Zero-infrastructure entry point. No Docker, no database admin. For power users, the architecture scales into a real database backend (e.g., TiDB vector). Markdown is the starting line, not the finish line.

Doesn't the expensive model need to do the hard tasks?
No. Your expensive model PLANS and JUDGES. Execution (code, research, analysis) gets delegated to cheaper models via sub-agents. Frontier judgment + budget execution.

Does this work with models other than Claude Opus?
Architecture works with any model supporting memory_search and sessions_spawn in OpenClaw. Tested on Opus 4.6; most frontier models should handle the one-shot prompt.

How is this different from other memory solutions?
Most add external databases or cloud services. This gives you 90% of the benefit with 10% of the parts - local files + vector search. Nothing to install except Ollama. Nothing leaves your machine.

About

Built by Terp - Terp AI Labs

The definitive optimization guide for OpenClaw - covering speed, memory, context management, model selection, web search, orchestration, vault architecture, and spec-driven development. Battle-tested daily on a production setup.

Saved you tokens/time? Drop a ⭐ on this repo or ping @OnlyTerp on X with your before/after numbers.

Prefer scripts? Run bash setup.sh (Mac/Linux) or powershell setup.ps1 (Windows) from the repo root.

OpenClaw Optimization Guide

Make Your OpenClaw AI Agent Faster, Smarter, and Actually Useful

Speed optimization, memory architecture, context management, model selection, and one-shot development for OpenClaw

Table of Contents

The Problem

What This Fixes

How It Works

What The Optimized Files Look Like

Part 1: Speed (Stop Being Slow)

Why Trimming Works

Trim Your Context Files

Add a Fallback Model

Reasoning Mode - Know the Tradeoff

Disable Unused Plugins

Ollama Housekeeping

Part 2: Context Bloat (The Silent Performance Killer)

The Quadratic Problem

What Happens at 50% of Your Context Window

Where Bloat Comes From

The Cost Math

Built-In Defenses

Context Bloat Checklist

Part 3: Cron Session Bloat (The Hidden Killer)

How to Spot It

The Fix

Prevention > Cleanup

Part 4: Memory (Stop Forgetting Everything)

The Architecture

How It Works

Setting It Up

The Golden Rule

Part 5: Orchestration (Stop Doing Everything Yourself)

The Mental Model

Add This to AGENTS.md

Give Coding Agents Your Brain

Part 6: Models (What to Actually Use)

The Model Strategy

Model Deep Dive

OpenRouter: The Model Marketplace

Local Models: $0 Forever, No Rate Limits

Using Anthropic Membership (The Best Way)

Recommended Setups

Pro Tips

Part 7: Web Search (Give Your Agent Eyes on the Internet)

The Players

Why We Use Tavily

Setting Up Tavily

When to Use What

Part 8: One-Shotting Big Tasks (Stop Iterating, Start Researching)

The Data

Why Iteration Fails

The Method: Research → Spec → Ship

Phase 1: Research (30-60 minutes)

Phase 2: Write the Spec (15-30 minutes)

Phase 3: Delegate and Ship

Let Your Agent Do the Research

Part 9: Vault Memory System (Stop Losing Knowledge Between Sessions)

Why Flat Files + Vector Search Breaks Down

The Solution: Vault Architecture

Folder Structure

Claim-Named Notes

MOCs - Maps of Content

Vault Philosophy Document

The Graph Tools

The Orientation Protocol

Kill the Bloat

Results

Quick Setup

Part 14: Quick Checklist

Part 15: The One-Shot Prompt

Troubleshooting

FAQ

About

Related Resources

Reviews (0)