llmfs

mcp
Security Audit
Warn
Health Warn
  • License — License: Apache-2.0
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 6 GitHub stars
Code Pass
  • Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions Pass
  • Permissions — No dangerous permissions requested
Purpose
This tool provides filesystem-based persistent memory for LLMs and AI agents. It uses a local directory structure and a ChromaDB vector database to store and retrieve conversational context, aiming to prevent information loss when agents exceed their standard token limits.

Security Assessment
Overall risk: Low. The rule-based code scan evaluated 12 files and found no dangerous patterns, hardcoded secrets, or requests for excessive permissions. Because it acts as a memory store, the tool inherently handles whatever conversational data is passed to it, but it relies on local file storage rather than making external network requests or executing arbitrary shell commands. No major security red flags were identified.

Quality Assessment
The project is actively maintained, with its most recent push happening today. It uses a standard, permissive license (Apache-2.0), includes a comprehensive README, and has automated testing configured. The only notable drawback is its low visibility; it currently has only 6 GitHub stars, indicating it has not yet been widely adopted or heavily vetted by the broader developer community.

Verdict
Use with caution — the codebase appears safe and well-structured, but its low community adoption means you should test it internally before relying on it for critical workloads.
SUMMARY

Filesystem-based persistent memory for LLMs and AI agents -- unlimited context windows without lossy summarization

README.md

LLMFS — Filesystem Memory for LLMs and AI Agents

Python 3.10+

LLMFS gives LLMs and AI agents persistent, searchable, structured memory — organized like a filesystem. Instead of losing context when a conversation grows past the token limit, agents offload memories to LLMFS and retrieve exactly what they need, when they need it. The result is zero information loss and an effectively unlimited context window — even over thousands of turns.



Why LLMFS?

Every LLM agent eventually hits the same wall: the context window fills up.

The standard solution — lossy summarization — destroys information. When an agent summarizes 80k tokens into 5k, 94% of the detail is gone forever. Ask it about a specific line of code from 30 turns ago, and it can only apologize.

LLMFS takes a different approach, borrowed directly from operating systems:

OS Concept     →   LLM Concept
──────────────────────────────────────────────────────────
RAM            →   Context Window (e.g. 128k tokens)
Disk / Swap    →   LLMFS  (500k+ tokens, full fidelity)
Page eviction  →   Offload old turns to LLMFS
Page fault     →   LLM calls memory_search / memory_read
Virtual addr   →   Memory path  (/session/turns/42)
MMU            →   ContextManager

Memories are stored at filesystem-style paths (/projects/auth/bug, /events/2026-03-15_fix) and searched semantically via ChromaDB + all-MiniLM-L6-v2. They persist across sessions, support TTLs, carry metadata and tags, and can be linked in a knowledge graph.


Quick Start

# Install
pip install llmfs

# Initialize a store in the current directory
llmfs init

# Write your first memory
llmfs write /knowledge/hello "LLMFS stores memories at filesystem paths"

# Search it back
llmfs search "how does memory storage work"

# Check what's in the store
llmfs status

Python API in 5 lines:

from llmfs import MemoryFS

mem = MemoryFS()
mem.write("/projects/auth/bug", "JWT expiry misconfigured at auth.py:45", tags=["jwt", "bug"])
results = mem.search("authentication error", k=3)
print(results[0].path, results[0].score)

Architecture

[Figure: MemoryFS Architecture]

On-Disk Layout

~/.llmfs/            # default; or .llmfs/ in current directory
  metadata.db        # SQLite — file registry, chunks, tags, graph, cache
  chroma/            # ChromaDB persistence — embedding vectors
  config.json        # optional configuration overrides

Memory Layers

Every memory belongs to one of four layers with different lifetime semantics:

| Layer | Purpose | Default TTL | Use When |
|---|---|---|---|
| short_term | Temporary reasoning scratch space | 60 minutes | Intermediate calculations, draft thoughts |
| session | Current conversation context | Session-scoped | Turn-by-turn chat, in-progress task state |
| knowledge | Persistent facts, learnings, code | Permanent | Project knowledge, user preferences, decisions |
| events | Timestamped occurrences | Permanent | Bug reports, deployments, meetings, milestones |

mem.write("/scratch/step3", "intermediate result", layer="short_term", ttl_minutes=10)
mem.write("/session/task", "refactoring auth module", layer="session")
mem.write("/knowledge/auth/jwt-expiry", "JWT tokens expire after 1h", layer="knowledge")
mem.write("/events/2026-03-15/deploy", "v2.1 deployed to prod", layer="events")

Expired short_term memories are garbage-collected automatically on each write cycle (throttled to once per minute). Run llmfs gc to collect manually.
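
The expiry behaviour described above can be illustrated with a small stand-alone sketch. It is not LLMFS's implementation: `TTLStore`, its injectable `clock`, and the in-memory dict are hypothetical names chosen for illustration. Only the idea comes from the text (a per-memory TTL in minutes, plus a garbage-collection pass on write that is throttled to once per minute).

```python
import time


class TTLStore:
    """Toy in-memory store: entries may carry an expiry time; GC runs on write."""

    def __init__(self, gc_interval_s=60.0, clock=time.monotonic):
        self._clock = clock                  # injectable so expiry can be tested
        self._gc_interval_s = gc_interval_s  # throttle: at most one GC per interval
        self._last_gc = clock()
        self._items = {}                     # path -> (content, expires_at or None)

    def write(self, path, content, ttl_minutes=None):
        now = self._clock()
        expires_at = None if ttl_minutes is None else now + ttl_minutes * 60
        self._items[path] = (content, expires_at)
        # Opportunistic GC, skipped if one ran less than gc_interval_s ago.
        if now - self._last_gc >= self._gc_interval_s:
            self.gc()

    def gc(self):
        """Delete expired entries and return how many were removed."""
        now = self._clock()
        expired = [p for p, (_, exp) in self._items.items()
                   if exp is not None and exp <= now]
        for p in expired:
            del self._items[p]
        self._last_gc = now
        return len(expired)

    def paths(self):
        return sorted(self._items)
```

Entries written without `ttl_minutes` never expire in this sketch, which mirrors the permanent layers in the table.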


Feature Comparison

Feature mem0 Letta ChromaDB LLMFS
Filesystem metaphor
Memory layers with TTL Partial
Knowledge graph
Custom query language (MQL) SQL-like Custom MQL
Auto-compression & chunking
Infinite context (VM model)
CLI interface
Local-first, no server needed
Zero-config (llmfs init) Partial
MCP server built-in
FUSE filesystem mount ✓ (optional)
Drop-in agent middleware

Installation

Core (CLI + Python API)

pip install llmfs

Dependencies (auto-installed): ChromaDB, sentence-transformers, Click, Rich, scikit-learn, NumPy.

The first search or write call downloads all-MiniLM-L6-v2 (about 22M parameters, roughly 90 MB on disk) to your HuggingFace cache. No GPU required.

Optional Extras

# MCP server support (Claude, Cursor, Windsurf, Continue)
pip install "llmfs[mcp]"

# OpenAI function-calling integration
pip install "llmfs[openai]"

# LangChain memory adapters
pip install "llmfs[langchain]"

# FUSE filesystem mount (Linux / macOS only)
pip install "llmfs[fuse]"

# Everything
pip install "llmfs[mcp,openai,langchain,fuse]"

# Development
pip install "llmfs[dev]"

From Source

git clone https://github.com/viditraj/llmfs.git
cd llmfs
pip install -e ".[dev]"
pytest

CLI Reference

All commands accept --llmfs-path (or LLMFS_PATH env var) to point to a custom store. By default LLMFS looks for .llmfs/ in the current directory, then falls back to ~/.llmfs.
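
As a rough illustration of that lookup order, the sketch below resolves a store directory from an explicit path, an environment mapping, the working directory, and the home directory. The function name `resolve_store_path` and its parameters are hypothetical, and giving the command-line flag priority over `LLMFS_PATH` is an assumption; the README only says the two are equivalent ways to set a custom store.

```python
import os
from pathlib import Path


def resolve_store_path(cli_path=None, env=None, cwd=None, home=None):
    """Return the store directory, following the lookup order described above."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    home = Path.home() if home is None else Path(home)

    if cli_path:                      # 1. explicit --llmfs-path (assumed to win)
        return Path(cli_path).expanduser()
    if env.get("LLMFS_PATH"):         # 2. LLMFS_PATH environment variable
        return Path(env["LLMFS_PATH"]).expanduser()
    local = cwd / ".llmfs"            # 3. project-local store, if it exists
    if local.is_dir():
        return local
    return home / ".llmfs"            # 4. user-wide fallback
```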

llmfs init

Initialize a new store in the current directory.

llmfs init
# Initialised LLMFS at /your/project/.llmfs
# Next steps:
#   llmfs write /knowledge/hello 'Hello world'
#   llmfs search 'hello'
#   llmfs status

llmfs write

Store content at a memory path.

# From inline content
llmfs write /knowledge/auth/bug "JWT expiry misconfigured at auth.py line 45"

# From a file
llmfs write /knowledge/architecture --file ARCHITECTURE.md

# With layer, tags, and TTL
llmfs write /session/plan "Refactor auth module today" \
    --layer session --tags "plan,auth" --ttl 480

# From stdin
cat report.md | llmfs write /knowledge/report

Options:

| Flag | Description |
|---|---|
| --layer | short_term, session, knowledge, or events (default: knowledge) |
| --tags | Comma-separated tags, e.g. "jwt,bug,auth" |
| --ttl | Minutes until auto-expiry |
| --file | Read content from a file path |

llmfs read

Read a memory by exact path.

llmfs read /knowledge/auth/bug

# Focused read: return only chunks relevant to your query
llmfs read /knowledge/auth/bug --query "what line number"

llmfs search

Semantic search across all memories.

llmfs search "authentication error"
llmfs search "bucket creation error" --layer knowledge --tags s3 --k 10
llmfs search "auth bug" --time "last 7 days"

Options:

| Flag | Description |
|---|---|
| --layer | Restrict to a layer |
| --tags | Comma-separated required tags |
| --k | Number of results (default: 5) |
| --time | Human time string: "last 30 minutes", "today", "last 7 days" |
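
To make the `--time` strings concrete, here is a minimal sketch of how such phrases could be turned into a cut-off timestamp. The function `parse_time_window` and the set of accepted units are assumptions for illustration; LLMFS's own parser may accept more forms or behave differently.

```python
import re
from datetime import datetime, timedelta

# Maps the singular unit in the phrase to the matching timedelta keyword.
_UNITS = {"minute": "minutes", "hour": "hours", "day": "days", "week": "weeks"}


def parse_time_window(text, now=None):
    """Return the earliest timestamp matched by a phrase such as 'last 7 days'."""
    now = now or datetime.now()
    text = text.strip().lower()
    if text == "today":
        # Everything since local midnight.
        return now.replace(hour=0, minute=0, second=0, microsecond=0)
    m = re.fullmatch(r"last (\d+) (minute|hour|day|week)s?", text)
    if not m:
        raise ValueError(f"unrecognised time string: {text!r}")
    amount, unit = int(m.group(1)), m.group(2)
    return now - timedelta(**{_UNITS[unit]: amount})
```

A search would then keep only memories whose timestamp is at or after the returned value.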

llmfs update

Modify an existing memory.

# Append new information
llmfs update /knowledge/auth/bug --append "Fixed in commit abc123"

# Replace content entirely
llmfs update /knowledge/auth/bug --content "Bug resolved. Root cause: missing null check."

# Manage tags
llmfs update /knowledge/auth/bug --tags-add "resolved" --tags-remove "in-progress"

llmfs forget

Delete memories.

# Delete a specific memory
llmfs forget /knowledge/auth/bug

# Wipe an entire layer
llmfs forget --layer short_term

# Delete memories older than a duration
llmfs forget --older-than "30 days"

# Skip confirmation prompt
llmfs forget /session/old-task --yes

llmfs relate

Link two memories in the knowledge graph.

llmfs relate /events/2026-03-15/bug /knowledge/auth/jwt-expiry caused_by
llmfs relate /knowledge/auth/jwt-expiry /knowledge/auth/architecture related_to --strength 0.95

llmfs query

Run a structured MQL query.

llmfs query 'SELECT memory FROM /knowledge WHERE SIMILAR TO "auth bug" LIMIT 5'
llmfs query 'SELECT memory FROM /events WHERE TAG = "deploy" LIMIT 10' --json

llmfs ls

List memories under a path prefix.

llmfs ls /knowledge
llmfs ls /session --layer session

llmfs status

Show storage statistics.

llmfs status
# LLMFS Status  (/home/user/.llmfs)
#   Total memories : 142
#   Total chunks   : 891
#   Disk usage     : 45.2 MB

llmfs gc

Garbage-collect expired (TTL) memories and orphaned chunks.

llmfs gc
# GC complete. Deleted 7 expired memories.

llmfs serve

Start the MCP server.

llmfs serve --stdio          # stdio transport (for Claude, Cursor, etc.)
llmfs serve --port 8765      # SSE transport on port 8765

llmfs install-mcp

Auto-configure LLMFS as an MCP server in a supported client.

llmfs install-mcp --client claude     # Claude Desktop
llmfs install-mcp --client cursor     # Cursor
llmfs install-mcp --client windsurf   # Windsurf
llmfs install-mcp --client continue   # Continue
llmfs install-mcp --print             # Print config JSON to stdout

llmfs mount / llmfs unmount

Mount LLMFS as a FUSE filesystem (requires pip install "llmfs[fuse]").

llmfs mount /mnt/memory
llmfs unmount /mnt/memory

Python API

from llmfs import MemoryFS

# Initialize: uses .llmfs/ in the cwd if present, otherwise ~/.llmfs
mem = MemoryFS()

# Or with a custom path
mem = MemoryFS(path="/tmp/myproject-memory")

write

obj = mem.write(
    path="/projects/auth/debug",
    content="User reports bucket creation failure with error: AccessDenied on s3://my-bucket",
    layer="knowledge",          # short_term | session | knowledge | events
    tags=["debug", "s3", "auth"],
    ttl_minutes=None,           # None = permanent; integer = auto-expire
    source="agent",             # manual | agent | mcp | cli
)

print(obj.path)          # /projects/auth/debug
print(obj.layer)         # knowledge
print(obj.chunks)        # list of Chunk objects (auto-chunked + embedded)
print(obj.summaries)     # level_1 (per-chunk) and level_2 (document) summaries
print(obj.metadata.created_at)

If you write to the same path with identical content, LLMFS skips re-embedding and returns the cached object immediately.
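
The skip-on-identical-content behaviour can be sketched with a content hash. This is an illustration under assumptions, not the library's code: `DedupWriter`, the `embed` callable, and the choice of SHA-256 for the content hash are all hypothetical.

```python
import hashlib


class DedupWriter:
    """Skips the expensive embedding step when a path's content is unchanged."""

    def __init__(self, embed):
        self._embed = embed      # callable: str -> vector (stand-in for a real embedder)
        self._cache = {}         # path -> (content_hash, vector)
        self.embed_calls = 0     # counts how often the embedder actually ran

    def write(self, path, content):
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        cached = self._cache.get(path)
        if cached and cached[0] == digest:
            return cached[1]     # identical content: reuse the stored vector
        self.embed_calls += 1
        vector = self._embed(content)
        self._cache[path] = (digest, vector)
        return vector
```

Comparing digests rather than full strings keeps the check cheap even for large documents.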

read

# Full read
obj = mem.read("/projects/auth/debug")
print(obj.content)
print(obj.metadata.tags)
print(obj.relationships)   # linked memories

# Focused read — returns only the chunks most relevant to your query
obj = mem.read("/projects/auth/debug", query="what was the exact error")
print(obj.content)  # only the relevant chunk(s)

Raises MemoryNotFoundError if the path does not exist.

search

# Basic semantic search
results = mem.search("bucket creation error", k=5)

# With filters
results = mem.search(
    "authentication bug",
    layer="knowledge",
    tags=["jwt"],
    path_prefix="/projects",
    time_range="last 7 days",
    k=10,
)

for r in results:
    print(f"{r.score:.2f}  {r.path}")
    print(f"  {r.chunk_text[:120]}")
    print(f"  tags={r.tags}  layer={r.metadata['layer']}")

search returns list[SearchResult] ordered by descending relevance. Results are cached for 5 minutes (configurable).
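
A time-limited result cache of this kind could look like the sketch below. `SearchCache` and `make_key` are hypothetical names, and hashing a JSON-serialised query plus filters with SHA-256 is an assumption about how a cache key might be derived.

```python
import hashlib
import json
import time


class SearchCache:
    """Caches search results for ttl_seconds; a TTL of 0 disables caching."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}   # key -> (stored_at, results)

    @staticmethod
    def make_key(query, **filters):
        # sort_keys makes the key independent of keyword-argument order.
        payload = json.dumps({"q": query, "f": filters}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, results = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]   # stale: drop it and report a miss
            return None
        return results

    def put(self, key, results):
        self._entries[key] = (self._clock(), results)
```

A real cache would also need invalidation when memories are written or deleted; that is left out here.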

update

# Append new findings
mem.update("/projects/auth/debug", append="Fixed in commit abc123. Root cause: null pointer.")

# Full content replacement
mem.update("/projects/auth/debug", content="Completely new content.")

# Tag management only
mem.update("/projects/auth/debug", tags_add=["resolved"], tags_remove=["in-progress"])

forget

# Delete a specific memory
result = mem.forget("/projects/auth/debug")
print(result)  # {"deleted": 1, "status": "ok"}

# Wipe a layer
mem.forget(layer="short_term")

# Time-based cleanup
mem.forget(older_than="30 days")

relate

result = mem.relate(
    source="/events/2026-03-15/bug",
    target="/knowledge/auth/jwt-expiry",
    relationship="caused_by",   # related_to | follows | caused_by | contradicts
    strength=0.92,              # 0.0 to 1.0
)
print(result["relationship_id"])

list

memories = mem.list("/knowledge", recursive=True, layer="knowledge")
for obj in memories:
    print(obj.path, obj.metadata.modified_at)

query (MQL)

results = mem.query(
    'SELECT memory FROM /knowledge WHERE SIMILAR TO "auth bug" LIMIT 5'
)

status

info = mem.status()
# {
#   "total": 142,
#   "layers": {"knowledge": 98, "events": 31, "session": 11, "short_term": 2},
#   "chunks": 891,
#   "disk_mb": 45.2,
#   "base_path": "/home/user/.llmfs"
# }

gc

result = mem.gc()
# {"deleted": 7, "status": "ok"}

Error Handling

from llmfs import (
    MemoryNotFoundError,
    MemoryWriteError,
    LLMFSError,
)

try:
    obj = mem.read("/does/not/exist")
except MemoryNotFoundError as e:
    print(f"Not found: {e}")

try:
    mem.write("/bad", ...)
except MemoryWriteError as e:
    print(f"Write failed: {e}")

MCP Server

LLMFS ships a full Model Context Protocol server that exposes all six core tools to any MCP-compatible client. Once configured, the LLM can call memory_write, memory_search, memory_read, memory_update, memory_forget, and memory_relate natively in its tool loop.

Auto-Install (Recommended)

pip install "llmfs[mcp]"
llmfs install-mcp --client claude    # Claude Desktop
llmfs install-mcp --client cursor    # Cursor
llmfs install-mcp --client windsurf  # Windsurf
llmfs install-mcp --client continue  # Continue

This writes or merges the following into your client's config file:

{
  "mcpServers": {
    "llmfs": {
      "command": "llmfs",
      "args": ["serve", "--stdio"],
      "description": "AI memory filesystem — persistent, searchable, graph-linked memory"
    }
  }
}

Config file locations written by install-mcp:

| Client | Config Path |
|---|---|
| Claude | ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or ~/.config/Claude/claude_desktop_config.json (Linux) |
| Cursor | ~/.cursor/mcp.json |
| Windsurf | ~/.codeium/windsurf/mcp_config.json |
| Continue | ~/.continue/config.json |

Manual Config

Print the config JSON to paste it yourself:

llmfs install-mcp --print

Or with a custom store path:

llmfs install-mcp --client claude --llmfs-path /my/project/.llmfs

Programmatic Usage

from llmfs import MemoryFS
from llmfs.mcp.server import LLMFSMCPServer

mem = MemoryFS(path="~/.llmfs")
server = LLMFSMCPServer(mem=mem)
server.run_stdio()    # blocking; use as CLI entry-point
# or:
server.run_sse(host="127.0.0.1", port=8765)

The 6 MCP Tools

Once the server is running, the LLM has access to:

| Tool | Description |
|---|---|
| memory_write | Store content at a path with layer, tags, and optional TTL |
| memory_search | Semantic search with layer/tag/time filters |
| memory_read | Read a specific memory by exact path (with optional focused query) |
| memory_update | Append or replace content; add/remove tags |
| memory_forget | Delete by path, layer, or age |
| memory_relate | Create a typed, weighted graph edge between two memories |

A system prompt fragment is automatically injected that tells the LLM when and how to use each tool.


LangChain Integration

LLMFS provides two drop-in LangChain memory adapters. Install with pip install "llmfs[langchain]".

LLMFSChatMemory — Persistent Chat History

from llmfs.integrations.langchain import LLMFSChatMemory
from langchain.chains import ConversationChain
from langchain_openai import ChatOpenAI

memory = LLMFSChatMemory(memory_path="~/.llmfs")
chain = ConversationChain(llm=ChatOpenAI(model="gpt-4o"), memory=memory)

# Memory persists automatically — conversations survive process restarts
response = chain.predict(input="What was the JWT bug we discussed?")

LLMFSRetrieverMemory — Semantic Context Injection

LLMFSRetrieverMemory semantically searches past conversations on every turn and injects the most relevant passages into the LLM's context:

from llmfs.integrations.langchain import LLMFSRetrieverMemory
from langchain.chains import ConversationChain
from langchain_openai import ChatOpenAI

memory = LLMFSRetrieverMemory(
    memory_path="~/.llmfs",
    search_k=5,                   # inject top-5 relevant memories
    layer="knowledge",
)

chain = ConversationChain(llm=ChatOpenAI(model="gpt-4o"), memory=memory)

Both classes implement BaseChatMessageHistory / BaseMemory and work as drop-in replacements for LangChain's built-in memory classes.


OpenAI Function Calling

LLMFS exports OpenAI-format tool definitions and a handler. Install with pip install "llmfs[openai]".

import openai
from llmfs import MemoryFS
from llmfs.integrations.openai_tools import LLMFS_TOOLS, LLMFSToolHandler

mem = MemoryFS()
handler = LLMFSToolHandler(mem)

messages = [
    {"role": "system", "content": "You are a helpful assistant with persistent memory."},
    {"role": "user",   "content": "Remember that our database is PostgreSQL 15."},
]

# Pass LLMFS tools alongside any other tools you use
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=LLMFS_TOOLS,
    tool_choice="auto",
)

# Handle all LLMFS tool calls in the response
# (tool_calls is None when the model answers without calling a tool)
message = response.choices[0].message
tool_calls = message.tool_calls or []
tool_results = handler.handle_batch(tool_calls)

# The assistant message that requested the tools must precede the tool results
if tool_calls:
    messages.append(message)
for call, result in zip(tool_calls, tool_results):
    messages.append({
        "role": "tool",
        "tool_call_id": call.id,
        "content": result,
    })

LLMFS_TOOLS is a plain Python list of JSON Schema dicts — pass it directly to any OpenAI-compatible API.
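
For readers unfamiliar with the format, the sketch below shows what one entry in such a list might look like and how a tool call's JSON arguments can be routed to a Python function. The schema is a hypothetical illustration built from the `write` parameters documented earlier; the real entries in `LLMFS_TOOLS` may differ, and `dispatch` is not part of LLMFS.

```python
import json

# Hypothetical definition for illustration; the real LLMFS_TOOLS entries may differ.
MEMORY_WRITE_TOOL = {
    "type": "function",
    "function": {
        "name": "memory_write",
        "description": "Store content at a memory path.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "e.g. /knowledge/db"},
                "content": {"type": "string"},
                "layer": {
                    "type": "string",
                    "enum": ["short_term", "session", "knowledge", "events"],
                },
                "tags": {"type": "array", "items": {"type": "string"}},
                "ttl_minutes": {"type": "integer"},
            },
            "required": ["path", "content"],
        },
    },
}


def dispatch(tool_name, arguments_json, handlers):
    """Decode a tool call's JSON arguments and route it to a Python callable."""
    args = json.loads(arguments_json)
    # Tool results go back to the model as strings, so re-encode as JSON.
    return json.dumps(handlers[tool_name](**args))
```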


Infinite Context — ContextMiddleware

The ContextMiddleware is LLMFS's flagship feature. Wrap any agent with two lines and get effectively unlimited context with zero information loss.

The Problem

Turn 35: Context window hits 128k tokens
         ↓
  Standard approach: lossy summarization
  128k tokens → 5k tokens = 96% information LOST FOREVER
         ↓
Turn 36: "What was the exact error at auth.py line 45?"
  LLM: "I don't have that detail anymore." ← failure

The LLMFS Solution

LLMFS works like virtual memory. Old turns are evicted from the context window and stored in LLMFS at full fidelity. A compact memory index (≈2k tokens) stays in the system prompt, listing what has been stored and where. When the LLM needs something, it calls memory_read or memory_search to page it back in.
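
The paging idea can be sketched in a few lines. This is a simplified model, not the middleware itself: `Turn` and `PagedContext` are hypothetical names, eviction here is oldest-first rather than importance-based, and a plain dict stands in for the LLMFS store. The 70% and 50% thresholds are the defaults the README gives for the context manager.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    turn_id: int
    content: str
    tokens: int


@dataclass
class PagedContext:
    max_tokens: int
    evict_at: float = 0.70            # start evicting above this fraction of max_tokens
    target_after_evict: float = 0.50  # keep evicting until usage drops to this fraction
    active: list = field(default_factory=list)   # turns still in the window ("RAM")
    store: dict = field(default_factory=dict)    # evicted turns by path ("disk")

    def used(self):
        return sum(t.tokens for t in self.active)

    def add(self, turn):
        self.active.append(turn)
        if self.used() > self.evict_at * self.max_tokens:
            self._evict()

    def _evict(self):
        # Oldest-first for simplicity; the newest turn is always kept.
        while len(self.active) > 1 and self.used() > self.target_after_evict * self.max_tokens:
            old = self.active.pop(0)
            self.store[f"/session/turns/{old.turn_id}"] = old   # stored verbatim

    def page_in(self, path):
        """The 'page fault' path: fetch an evicted turn at full fidelity."""
        return self.store[path]
```

Because evicted turns are stored unmodified, paging one back in returns the original text rather than a summary.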

[Figure: MemoryFS Architecture]

Drop-In Usage

from llmfs import MemoryFS
from llmfs.context import ContextMiddleware

# Wrap your existing agent with 2 lines
agent = YourExistingAgent(model="gpt-4o")
agent = ContextMiddleware(agent, memory=MemoryFS())

# Now every call transparently manages context:
# 1. Intercepts every turn (before + after)
# 2. Scores importance of each message
# 3. Auto-evicts at 70% capacity, targets 50%
# 4. Extracts artifacts (code, errors, file refs) before eviction
# 5. Rebuilds the memory index after eviction
# 6. Injects the index into the system prompt
# 7. Provides memory_search / memory_read tools to the LLM
response = agent.chat("What was the exact error from turn 15?")

Importance Scoring

The middleware scores each turn before evicting the lowest-importance ones:

| Signal | Score Boost |
|---|---|
| Contains a code block (```) | +0.20 |
| Contains error / traceback | +0.20 |
| Contains decision keyword (decided, plan, must) | +0.15 |
| Role = user (user intent is high-value) | +0.10 |
| Very recent turn (last 3) | +0.15 |
| Very short / conversational filler | −0.20 |
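
The boosts in the table can be combined into a single score roughly as follows. The base score of 0.5, the clamp to the range 0 to 1, the four-word cut-off for "very short", and the keyword lists are all assumptions made for this sketch; only the boost values come from the table.

```python
import re

FENCE = "`" * 3                                   # marker that opens a code block
DECISION_WORDS = ("decided", "plan", "must")
ERROR_PATTERN = re.compile(r"error|exception|traceback", re.IGNORECASE)


def importance(role, content, turns_from_end, base=0.5):
    """Score one turn; higher-scoring turns are evicted later."""
    score = base                                  # assumed neutral starting point
    if FENCE in content:
        score += 0.20                             # contains a code block
    if ERROR_PATTERN.search(content):
        score += 0.20                             # contains error / traceback
    if any(w in content.lower() for w in DECISION_WORDS):
        score += 0.15                             # decision keyword (substring match)
    if role == "user":
        score += 0.10                             # user intent
    if turns_from_end < 3:
        score += 0.15                             # one of the last 3 turns
    if len(content.split()) < 4:
        score -= 0.20                             # assumed cut-off for "very short"
    return round(min(max(score, 0.0), 1.0), 2)
```

Sorting turns by this score in ascending order would give a candidate eviction order.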

Artifact Extraction

Before a turn is evicted, the middleware automatically extracts and stores structured artifacts at dedicated sub-paths:

| Artifact | Stored At | Tags |
|---|---|---|
| Code blocks | /session/{id}/code/turn_{n}_{i} | ["code", "<lang>"] |
| Stack traces / errors | /session/{id}/errors/turn_{n} | ["error"] |
| File paths mentioned | /session/{id}/files/turn_{n} | ["file_references"] |
| Decisions | /session/{id}/decisions/turn_{n} | ["decision"] |
| Full turn (always) | /session/{id}/turns/{n} | |
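
A simplified version of this extraction step, covering only code blocks, file references, and the full turn, might look like the sketch below. The regular expressions, the list of file extensions, and the function `extract_artifacts` are assumptions for illustration; the path layout and tags follow the table.

```python
import re

FENCE = "`" * 3
# A fenced block: opening fence, optional language, body, closing fence.
CODE_BLOCK = re.compile(FENCE + r"(\w*)\n(.*?)" + FENCE, re.DOTALL)
# Assumed heuristic: a token ending in one of a few common file extensions.
FILE_REF = re.compile(r"\b[\w./-]+\.(?:py|js|ts|md|json|yaml|toml)\b")


def extract_artifacts(session_id, turn_no, content):
    """Return {memory_path: (text, tags)} for one turn about to be evicted."""
    base = f"/session/{session_id}"
    out = {f"{base}/turns/{turn_no}": (content, [])}       # full turn, always
    for i, m in enumerate(CODE_BLOCK.finditer(content)):
        lang = m.group(1) or "text"
        out[f"{base}/code/turn_{turn_no}_{i}"] = (m.group(2), ["code", lang])
    prose = CODE_BLOCK.sub("", content)                    # ignore paths inside code
    files = sorted(set(FILE_REF.findall(prose)))
    if files:
        out[f"{base}/files/turn_{turn_no}"] = ("\n".join(files), ["file_references"])
    return out
```

Each entry in the returned mapping corresponds to one write into the store, so the artifacts stay individually searchable after the turn leaves the context window.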

Memory Index

The memory index is regenerated after each eviction cycle and injected into the system prompt:

## LLMFS Memory Index
You have the following memories (use memory_read / memory_search to retrieve):

- [/session/abc/turns/1]       (turn 1, 10:30) [user]      — User asked to fix auth module bug
- [/session/abc/turns/2]       (turn 2, 10:31) [assistant] — Found JWT expiry at auth.py:45
- [/session/abc/code/turn_2_0] (turn 2, 10:31) [code:py]   — Fixed auth.py token refresh logic
- [/session/abc/errors/turn_3] (turn 3, 10:32) [error]     — TypeError: NoneType at auth.py:45
- [/session/abc/turns/5]       (turn 5, 10:35) [user]      — Asked to also fix refresh endpoint
... (12 more — use memory_search "topic" to find relevant ones)

ContextManager API

For lower-level control:

from llmfs import MemoryFS
from llmfs.context.manager import ContextManager

mem = MemoryFS()
ctx = ContextManager(
    mem=mem,
    max_tokens=128000,
    evict_at=0.70,            # start evicting at 70% capacity
    target_after_evict=0.50,  # evict down to 50%
)

# Track a new turn
ctx.on_new_turn(role="user", content="Fix the JWT bug", tokens=12)
ctx.on_new_turn(role="assistant", content="Found the issue at auth.py:45", tokens=45)

# Get the current memory index for system prompt injection
index = ctx.get_system_prompt_addon()

# Get active (in-context) turns
turns = ctx.get_active_turns()

# Reset for a new session
ctx.reset_session()

MQL — Memory Query Language

LLMFS includes a custom query language that compiles to ChromaDB + SQLite queries.

Syntax

-- Semantic similarity search in a path prefix
SELECT memory FROM /knowledge WHERE SIMILAR TO "authentication bug" LIMIT 5

-- Tag filter
SELECT memory FROM /knowledge WHERE TAG = "s3" LIMIT 10

-- Combined semantic + tag filter
SELECT memory FROM /knowledge WHERE SIMILAR TO "bucket error" AND TAG = "s3" LIMIT 5

-- Time-scoped search
SELECT memory FROM /events WHERE date > 2026-01-01 AND date < 2026-04-01

-- Topic / keyword filter
SELECT memory FROM /projects WHERE topic = "authentication"

-- Order by recency
SELECT memory FROM /session ORDER BY created_at DESC LIMIT 10

-- Graph traversal (BFS, depth 2)
SELECT memory FROM /projects RELATED TO "/events/2026-03-15/bug" WITHIN 2

Python API

results = mem.query(
    'SELECT memory FROM /knowledge WHERE SIMILAR TO "JWT expiry" AND TAG = "auth" LIMIT 5'
)
for r in results:
    print(r.path, r.score)

CLI

llmfs query 'SELECT memory FROM /knowledge WHERE SIMILAR TO "auth bug"'
llmfs query 'SELECT memory FROM /events WHERE TAG = "deploy"' --json

Supported Conditions

| Condition | Syntax | Backed By |
|---|---|---|
| SIMILAR TO | SIMILAR TO "query string" | ChromaDB vector search |
| TAG | TAG = "tagname" | SQLite tag index |
| date | date > 2026-01-01 | SQLite date filter |
| topic | topic = "keyword" | SQLite metadata filter |
| RELATED TO | RELATED TO "/path" WITHIN N | Graph BFS traversal |
| AND / OR | logical combinators | Merged result sets |
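
To show how such a query breaks down into a path prefix, a semantic query, tag filters, and a limit, here is a small regex-based sketch. It handles only the SIMILAR TO, TAG, and LIMIT forms and is not LLMFS's parser; `ParsedQuery` and `parse_mql` are hypothetical names.

```python
import re
from dataclasses import dataclass, field

_QUERY = re.compile(
    r'^SELECT\s+memory\s+FROM\s+(?P<path>/\S*)'
    r'(?:\s+WHERE\s+(?P<where>.+?))?'
    r'(?:\s+LIMIT\s+(?P<limit>\d+))?\s*$',
    re.IGNORECASE,
)
_SIMILAR = re.compile(r'SIMILAR\s+TO\s+"([^"]*)"', re.IGNORECASE)
_TAG = re.compile(r'TAG\s*=\s*"([^"]*)"', re.IGNORECASE)


@dataclass
class ParsedQuery:
    path_prefix: str
    similar_to: str = None            # text for the vector search, if any
    tags: list = field(default_factory=list)
    limit: int = None


def parse_mql(text):
    """Parse the supported subset of MQL into a ParsedQuery."""
    m = _QUERY.match(text.strip())
    if not m:
        raise ValueError(f"not a supported query: {text!r}")
    where = m.group("where") or ""
    similar = _SIMILAR.search(where)
    return ParsedQuery(
        path_prefix=m.group("path"),
        similar_to=similar.group(1) if similar else None,
        tags=_TAG.findall(where),
        limit=int(m.group("limit")) if m.group("limit") else None,
    )
```

In a design like this, the `similar_to` text would go to the vector index and the tags to the metadata database, with the two result sets intersected for AND.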

Memory Graph

Link related memories to build a navigable knowledge graph.

# Create typed relationships
mem.relate("/events/2026-03-15/bug",     "/knowledge/auth/jwt-expiry",  "caused_by",  strength=0.92)
mem.relate("/knowledge/auth/jwt-expiry", "/knowledge/auth/architecture", "related_to", strength=0.85)
mem.relate("/events/2026-03-14/deploy",  "/events/2026-03-15/bug",      "follows",    strength=1.0)

Relationship types: related_to, follows, caused_by, contradicts
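
Typed edges like these form a graph that can be walked breadth-first up to a depth limit, which is what a `WITHIN N` clause expresses. The sketch below is a generic BFS over an edge list, not the library's `MemoryGraph`; `related_within` is a hypothetical helper, and treating edges as undirected is an assumption.

```python
from collections import deque


def related_within(edges, start, max_depth):
    """Return {path: depth} for every memory reachable from start in <= max_depth hops."""
    neighbours = {}
    for source, target, _relationship in edges:
        # Edges are treated as undirected here, which is an assumption.
        neighbours.setdefault(source, set()).add(target)
        neighbours.setdefault(target, set()).add(source)

    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue                      # do not expand past the depth limit
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in depths:
                depths[nxt] = depths[node] + 1
                queue.append(nxt)
    del depths[start]                     # report only the related memories
    return depths
```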

Use graph traversal via MQL:

SELECT memory FROM /knowledge RELATED TO "/events/2026-03-15/bug" WITHIN 2

Or from the Python API via the low-level MemoryGraph:

from llmfs.graph.memory_graph import MemoryGraph

graph = MemoryGraph(mem._db)
neighbors = graph.get_neighbors("/knowledge/auth/jwt-expiry")
path      = graph.traverse("/events/2026-03-15/bug", max_depth=3)

FUSE Filesystem Mount

Mount LLMFS as a real FUSE filesystem and access memories with ordinary shell tools. Requires Linux or macOS.

pip install "llmfs[fuse]"
mkdir /tmp/memory
llmfs mount /tmp/memory

# Now you can use standard tools
ls /tmp/memory/knowledge/
cat /tmp/memory/knowledge/auth/jwt-expiry
echo "New finding: also affects refresh endpoint" >> /tmp/memory/knowledge/auth/jwt-expiry

llmfs unmount /tmp/memory

Mount options:

llmfs mount /tmp/memory --layer session      # default write layer
llmfs mount /tmp/memory --background         # detach from terminal

Configuration Reference

LLMFS works with zero configuration — llmfs init is all you need. To tune behavior, create .llmfs/config.json:

{
  "embedder": "local",
  "embedder_model": "all-MiniLM-L6-v2",
  "chunk_size_tokens": 256,
  "chunk_overlap_tokens": 50,
  "search_cache_ttl_seconds": 300,
  "auto_relate_threshold": 0.85,
  "context_manager": {
    "max_tokens": 128000,
    "evict_at": 0.70,
    "target_after_evict": 0.50
  },
  "layers": {
    "short_term": { "ttl_minutes": 60 },
    "session":    { "ttl_minutes": null },
    "knowledge":  { "ttl_minutes": null },
    "events":     { "ttl_minutes": null }
  }
}

Configuration Options

| Key | Default | Description |
|---|---|---|
| embedder | "local" | "local" (sentence-transformers) or "openai" |
| embedder_model | "all-MiniLM-L6-v2" | Model name for local embedder |
| chunk_size_tokens | 256 | Target chunk size in tokens (prose); 512 for code |
| chunk_overlap_tokens | 50 | Overlap between adjacent chunks |
| search_cache_ttl_seconds | 300 | How long to cache search results (0 = disabled) |
| auto_relate_threshold | 0.85 | Auto-create related_to edge when similarity exceeds this |
| context_manager.max_tokens | 128000 | Total context window size (tokens) |
| context_manager.evict_at | 0.70 | Fraction of max_tokens at which eviction starts |
| context_manager.target_after_evict | 0.50 | Fraction of max_tokens to reach after eviction |
| layers.short_term.ttl_minutes | 60 | TTL for short_term memories |
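
To see how `chunk_size_tokens` and `chunk_overlap_tokens` interact, the sketch below splits a token list into fixed-size windows that share a fixed number of tokens. It illustrates only the sliding-window idea: `chunk_tokens` is a hypothetical helper, and LLMFS's own chunker is described as adaptive (AST-based for code, header-based for prose), so its boundaries will differ.

```python
def chunk_tokens(tokens, chunk_size=256, overlap=50):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap          # each window starts this far after the last
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break                        # this window already reaches the end
    return chunks
```

With the defaults, a 600-token document yields three chunks starting at tokens 0, 206, and 412, so each boundary region appears in two chunks.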

Using OpenAI Embeddings

{
  "embedder": "openai",
  "embedder_model": "text-embedding-3-small"
}

export OPENAI_API_KEY=sk-...

OpenAI embeddings are higher quality for some domains but add latency and cost. The local model (about 22M parameters, CPU-only) handles 1,000+ queries/second and is the default.

Environment Variables

| Variable | Description |
|---|---|
| LLMFS_PATH | Override the storage directory (same as --llmfs-path) |
| OPENAI_API_KEY | Required when using "embedder": "openai" |

Examples

Basic Usage

from llmfs import MemoryFS

mem = MemoryFS()

# Store a few memories
mem.write("/knowledge/db",     "We use PostgreSQL 15 with TimescaleDB extension")
mem.write("/knowledge/auth",   "JWT tokens use HS256, expire in 1 hour")
mem.write("/knowledge/stack",  "Backend: FastAPI + SQLAlchemy. Frontend: Next.js 14")

# Search
results = mem.search("database technology", k=3)
for r in results:
    print(f"[{r.score:.2f}] {r.path}: {r.chunk_text[:80]}")

# Read with a focused query
obj = mem.read("/knowledge/auth", query="what algorithm is used")
print(obj.content)

# Update
mem.update("/knowledge/auth", append="Refresh tokens last 30 days.")

# Link related memories
mem.relate("/knowledge/auth", "/knowledge/db", relationship="related_to")

Agent Memory Across Sessions

from llmfs import MemoryFS

mem = MemoryFS(path="~/.llmfs")

# ── Session 1 ─────────────────────────────────────────────────────────────────
print("=== Session 1 ===")
mem.write("/projects/myapp/auth",
          "Implemented OAuth2 with PKCE. Decided to use Keycloak.",
          tags=["auth", "oauth2", "decision"])

mem.write("/projects/myapp/db",
          "Migrated from MySQL to PostgreSQL 15 on 2026-03-10. "
          "All foreign keys use UUID, not integer IDs.",
          tags=["database", "migration"])

# ── Session 2 (different process, days later) ─────────────────────────────────
print("\n=== Session 2 ===")
mem2 = MemoryFS(path="~/.llmfs")  # same store

results = mem2.search("authentication decisions", layer="knowledge", k=3)
for r in results:
    print(f"  [{r.score:.2f}] {r.path}")
    print(f"  {r.chunk_text[:100]}\n")

# The agent still has the full auth decision from Session 1
obj = mem2.read("/projects/myapp/auth")
print("Auth context:", obj.content[:200])

Codebase Ingestion and Search

from pathlib import Path
from llmfs import MemoryFS

mem = MemoryFS()
src = Path("./src")

# Ingest all Python files
ingested = 0
for py_file in src.rglob("*.py"):
    content = py_file.read_text(errors="replace")
    if len(content.strip()) < 20:
        continue
    # Use the path relative to src as the memory path (POSIX separators);
    # str.replace would also strip "src" from nested names such as src/srcutils
    mem_path = "/code/" + py_file.relative_to(src).as_posix()
    mem.write(mem_path, content, layer="knowledge",
              tags=["code", "python"], content_type="python")
    ingested += 1

print(f"Ingested {ingested} files")

# Now search semantically
results = mem.search("database connection pooling", layer="knowledge", k=5)
for r in results:
    print(f"[{r.score:.2f}] {r.path}")

# Or use MQL
results = mem.query(
    'SELECT memory FROM /code WHERE SIMILAR TO "authentication middleware" LIMIT 3'
)

Infinite Context with OpenAI

import openai
from llmfs import MemoryFS
from llmfs.context import ContextMiddleware

mem = MemoryFS()
client = openai.OpenAI()

agent = ContextMiddleware(client, memory=mem, max_tokens=128000)

conversation = []
while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        break

    conversation.append({"role": "user", "content": user_input})

    # ContextMiddleware automatically:
    # - Injects the memory index into the system prompt
    # - Evicts old turns to LLMFS when context fills
    # - Makes memory_read / memory_search available as tools
    response = agent.chat(conversation)
    assistant_message = response.choices[0].message.content
    conversation.append({"role": "assistant", "content": assistant_message})
    print(f"Assistant: {assistant_message}")

# Session statistics
stats = agent.get_context_stats()
print(f"\nTurns evicted: {stats['evicted_turns']}")
print(f"Cache hits:    {stats['cache_hits']}")
print(f"Token usage:   {stats['current_tokens']} / {stats['max_tokens']}")

Multi-Agent Shared Memory

from llmfs import MemoryFS

# Both agents share the same store
shared_mem = MemoryFS(path="/tmp/shared-project")

def planner_agent(task: str) -> str:
    """Agent 1: Plans the task and stores its findings."""
    plan = f"Plan for '{task}': 1. Analyze requirements, 2. Design schema, 3. Implement"
    shared_mem.write(
        f"/session/plans/{task.replace(' ', '_')}",
        plan,
        layer="session",
        tags=["plan", "planner"],
    )
    return plan

def executor_agent(task: str) -> str:
    """Agent 2: Reads the plan and executes it."""
    # Search for relevant plans and context
    plans = shared_mem.search(f"plan for {task}", layer="session", k=3)
    knowledge = shared_mem.search(f"{task} patterns", layer="knowledge", k=5)

    context = "\n".join([r.chunk_text for r in plans + knowledge])
    return f"Executing with context:\n{context[:500]}..."

# Agents collaborate via shared memory
plan = planner_agent("build user authentication")
result = executor_agent("build user authentication")
print(result)

Performance

Targets on commodity hardware (no GPU):

| Operation | Target | Notes |
|---|---|---|
| Write (single memory, ~500 tokens) | < 200 ms | Includes chunking + embedding |
| Search (10k memories, top-5) | < 100 ms | Cached repeat queries in < 1 ms |
| Read (by path) | < 10 ms | SQLite lookup + chunk assembly |
| MQL query | < 200 ms | Parse + search |
| Context eviction (20 turns) | < 500 ms | Includes artifact extraction |

Why it's fast:

  • all-MiniLM-L6-v2 runs at 1,000+ queries/second on CPU
  • SQLite WAL mode allows concurrent reads
  • Search results are cached for 5 minutes (SHA256-keyed)
  • Content hash check skips re-embedding unchanged content
  • ChromaDB uses an HNSW index for sub-linear search
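
Two of these points, the SHA256-keyed result cache and the content hash check, can be illustrated with a short sketch. This is an assumption about how such mechanisms typically look, not LLMFS's actual code: `SearchCache`, `cache_key`, `content_hash`, and `needs_embedding` are hypothetical names, and the real cache key may include other fields.

```python
import hashlib
import json
import time
from typing import Optional

CACHE_TTL_SECONDS = 300  # 5 minutes, matching the list above


def cache_key(query: str, layer: Optional[str], k: int) -> str:
    """Derive a stable SHA-256 key from the search parameters (assumed fields)."""
    payload = json.dumps({"query": query, "layer": layer, "k": k}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class SearchCache:
    """Hypothetical in-memory TTL cache for search results."""

    def __init__(self, ttl: float = CACHE_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}

    def get(self, key: str):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, results = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]  # expired: drop the entry and report a miss
            return None
        return results

    def put(self, key: str, results: list) -> None:
        self._entries[key] = (self._clock(), results)


def content_hash(text: str) -> str:
    """Hash the content so unchanged writes can be detected cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_embedding(new_text: str, stored_hash: Optional[str]) -> bool:
    """Return True only when the content differs from what was last embedded."""
    return stored_hash != content_hash(new_text)
```

The design point is that both checks cost one hash computation, which is far cheaper than running the embedding model or the vector search they let you skip.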

Contributing

Contributions are welcome! LLMFS is an early-stage project and the best areas for contribution are listed below.

Getting Started

# Fork and clone
git clone https://github.com/viditraj/llmfs.git
cd llmfs

# Create a branch
git checkout -b feature/my-improvement

# Install in editable mode with dev dependencies
pip install -e ".[dev,mcp,openai,langchain]"

# Run the test suite
pytest --cov=llmfs --cov-report=term-missing

# Run linting
ruff check llmfs/ tests/

Running Tests

# All tests
pytest

# Specific module
pytest tests/test_filesystem.py -v

# With coverage report
pytest --cov=llmfs --cov-report=html
open htmlcov/index.html  # macOS; on Linux use xdg-open

# Fast: skip slow embedding tests
pytest -m "not slow"

Project Structure

llmfs/
├── llmfs/
│   ├── __init__.py           # Public API: MemoryFS, MemoryObject, SearchResult
│   ├── core/
│   │   ├── filesystem.py     # MemoryFS — single entry point for all operations
│   │   ├── memory_object.py  # MemoryObject, SearchResult, Chunk dataclasses
│   │   ├── memory_layers.py  # MemoryLayer enum, TTL logic
│   │   └── exceptions.py     # Typed exception hierarchy
│   ├── embeddings/
│   │   ├── base.py           # EmbedderBase abstract class
│   │   ├── local.py          # SentenceTransformer (all-MiniLM-L6-v2)
│   │   └── openai.py         # OpenAI text-embedding-3-small
│   ├── storage/
│   │   ├── vector_store.py   # ChromaDB wrapper
│   │   └── metadata_db.py    # SQLite wrapper (WAL mode)
│   ├── compression/
│   │   ├── chunker.py        # Adaptive chunker: code (AST) vs prose (headers)
│   │   └── summarizer.py     # TF-IDF extractive summarizer
│   ├── retrieval/
│   │   ├── engine.py         # Hybrid retrieval (semantic + temporal + graph)
│   │   └── ranker.py         # Score fusion, recency boost, MMR diversity
│   ├── graph/
│   │   └── memory_graph.py   # Relationship CRUD + BFS/DFS traversal
│   ├── query/
│   │   ├── parser.py         # MQL tokenizer + AST builder
│   │   └── executor.py       # AST → ChromaDB + SQLite
│   ├── context/
│   │   ├── manager.py        # ContextManager — virtual memory manager
│   │   ├── importance.py     # Importance scoring (0–1)
│   │   ├── extractor.py      # Artifact extraction before eviction
│   │   ├── index_builder.py  # Memory index (~2k tokens)
│   │   └── middleware.py     # Drop-in ContextMiddleware
│   ├── mcp/
│   │   ├── server.py         # MCP server (stdio + SSE)
│   │   ├── tools.py          # 6 tool handlers
│   │   └── prompts.py        # System prompt for LLMs
│   ├── cli/
│   │   ├── main.py           # Click entry point
│   │   └── commands.py       # All CLI command implementations
│   └── integrations/
│       ├── langchain.py      # LangChain adapters
│       ├── openai_tools.py   # OpenAI function-calling definitions
│       └── fuse_mount.py     # Optional FUSE mount
├── tests/
│   ├── test_filesystem.py
│   ├── test_embeddings.py
│   ├── test_storage.py
│   ├── test_retrieval.py
│   ├── test_compression.py
│   ├── test_graph.py
│   ├── test_query.py
│   ├── test_context.py
│   ├── test_mcp.py
│   └── test_cli.py
├── examples/
│   ├── basic_usage.py
│   ├── agent_memory.py
│   ├── code_search.py
│   ├── infinite_context.py
│   ├── multi_agent.py
│   ├── langchain_agent.py
│   ├── openai_agent.py
│   └── mcp_config.json
├── pyproject.toml
└── README.md

Areas for Contribution

  • New embedders — Add adapters for Cohere, Mistral, or local Ollama models
  • Retrieval improvements — Better score fusion, BM25 hybrid, cross-encoder reranking
  • MQL extensions — Additional condition types, subqueries, aggregations
  • Graph algorithms — PageRank-based memory importance, community detection
  • Streaming support — Streaming writes for real-time transcript ingestion
  • Async API — AsyncMemoryFS for use in async agents and servers
  • Windows FUSE — WinFsp-based FUSE mount for Windows
  • UI — A web dashboard for browsing and editing memories
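
As a rough idea of what a new embedder contribution might involve, here is a toy adapter. The real `EmbedderBase` interface lives in llmfs/embeddings/base.py and is not reproduced in this README, so the stand-in base class `SketchEmbedderBase` and its single `embed` method are assumptions; check the actual abstract class before writing an adapter.

```python
import hashlib
import math
from abc import ABC, abstractmethod
from typing import List


class SketchEmbedderBase(ABC):
    """Assumed stand-in for EmbedderBase; the real interface may differ."""

    @abstractmethod
    def embed(self, texts: List[str]) -> List[List[float]]:
        """Return one fixed-length vector per input text."""


class HashingEmbedder(SketchEmbedderBase):
    """Toy adapter: hashed bag-of-words, L2-normalized. Needs no model or network."""

    def __init__(self, dim: int = 64):
        self.dim = dim

    def embed(self, texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self.dim
            for token in text.lower().split():
                # Map each token to a bucket using a stable hash
                digest = hashlib.sha256(token.encode("utf-8")).digest()
                bucket = int.from_bytes(digest[:4], "big") % self.dim
                vec[bucket] += 1.0
            norm = math.sqrt(sum(v * v for v in vec))
            # Empty input has zero norm; leave it as the zero vector
            vectors.append([v / norm for v in vec] if norm else vec)
        return vectors


def cosine(a: List[float], b: List[float]) -> float:
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))
```

A real adapter for a hosted or local model would replace the hashing loop with a call to that provider's client, while keeping the same shape: a list of texts in, one vector per text out.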

Submitting a Pull Request

  1. Fork the repository and create a feature branch from main
  2. Write tests for any new behavior — we target 90%+ coverage
  3. Run the full suite: pytest && ruff check .
  4. Update the README if you're adding a user-facing feature
  5. Open a PR with a clear description of what the change does and why

Reporting Issues

Use the GitHub issue tracker. For bugs, please include:

  • LLMFS version (pip show llmfs)
  • Python version
  • OS
  • Minimal reproduction script

License

MIT — free for commercial and personal use.


Acknowledgements

LLMFS stands on the shoulders of:
