largefile (MCP)
Overall: Warn
Health: Warn
- No license — Repository has no license file
- Description — Repository has a description
- Active repo — Last pushed today (0 days ago)
- Low visibility — Only 9 GitHub stars
Code: Pass
- Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions: Pass
- Permissions — No dangerous permissions requested
Purpose
This tool is an MCP server designed to help AI models read, search, and edit files that are too large to fit into their standard context windows. It offers semantic code navigation and smart search capabilities to handle multi-gigabyte files efficiently.
Security Assessment
The overall risk is rated as Low. A code scan of 12 files found no dangerous patterns, no hardcoded secrets, and no dangerous permissions requested. While the tool allows AI agents to read and edit local files, that is its intended function. It relies on standard Python libraries and does not appear to execute arbitrary shell commands or make suspicious external network requests. Automatic backups for edits provide an additional safeguard against unintended file changes.
Quality Assessment
The project is actively maintained, with its most recent push occurring today. However, it currently has low community visibility with only 9 GitHub stars, meaning it hasn't been widely battle-tested by a large audience. There is a minor discrepancy in repository health: the automated scan flagged a missing license file, but the README indicates it is MIT licensed. As a developer, you should verify the license terms in the repository before adopting it for commercial projects.
Verdict
Safe to use, though you should verify the licensing status and keep in mind that the tool is still in its early stages of community adoption.
MCP server for reading, searching, and editing files too large for LLM context windows
README.md
Largefile MCP Server
Navigate, search, and edit large codebases, logs, and data files that exceed AI context limits.
Why Largefile?
- Go beyond context limits - Read, search, and edit files too large to fit in AI context windows
- Semantic code navigation - Tree-sitter extracts functions/classes for Python, JS/TS, Rust, Go
- Fewer LLM errors - Search/replace editing eliminates line number mistakes common with line-based edits
- Smart search - Fuzzy matching, regex, case-insensitive, inverted, and count-only modes
- No size limits - Handles multi-GB files via tiered memory strategy (RAM → mmap → streaming)
Quick Start
Prerequisite: Install uv so the uvx command is available, then add largefile to your MCP client configuration:
```json
{
  "mcpServers": {
    "largefile": {
      "command": "uvx",
      "args": ["--from", "largefile", "largefile-mcp"]
    }
  }
}
```
Tools
| Tool | Use For |
|---|---|
| `get_overview` | File structure and semantic outline before diving in |
| `search_content` | Finding patterns, counting occurrences, regex matching |
| `read_content` | Reading specific sections; tail/head modes for logs |
| `edit_content` | Safe search/replace with automatic backups |
| `revert_edit` | Recovering from bad edits |
| `list_directory` | Browse directory trees with recursive depth control |
| `search_directory` | Search patterns across all files in a directory |
When to Use Largefile
Use when:
- File exceeds ~1000 lines or 100KB (supports multi-GB files)
- Navigating large codebases with semantic structure
- Analyzing log files (especially recent entries with tail mode)
- Making search/replace edits across large files
- Counting occurrences without loading full content
Don't use for:
- Small files that fit in context (AI doesn't need help with those)
- Binary files (images, executables, compressed)
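The "~1000 lines or 100KB" rule of thumb above can be expressed as a small helper. This is an illustrative sketch, not part of largefile itself; the function name and thresholds are assumptions based on the guidance in this section.

```python
import os

# Hypothetical helper (not part of largefile) implementing the rule of thumb:
# prefer largefile once a file exceeds ~1000 lines or ~100 KB.
def should_use_largefile(path, max_lines=1000, max_bytes=100 * 1024):
    """Return True if the file is likely too large for an AI context window."""
    if os.path.getsize(path) > max_bytes:
        return True
    # Count lines lazily so we never hold the whole file in memory.
    with open(path, "rb") as f:
        for line_count, _ in enumerate(f, start=1):
            if line_count > max_lines:
                return True
    return False
```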
Usage Examples
Large Codebase Navigation
```python
# Get semantic structure of a large Python file
overview = get_overview("/path/to/large_module.py")
# Returns: 2,847 lines, 15 classes, function outline via Tree-sitter

# Find all class definitions
classes = search_content("/path/to/large_module.py", "class ", fuzzy=False)

# Read complete class with semantic chunking
code = read_content("/path/to/large_module.py", pattern="class UserModel", mode="semantic")
```
Batch Refactoring
```python
# Preview rename across file
preview = edit_content("/path/to/api.py", changes=[
    {"search": "process_data", "replace": "transform_data"},
    {"search": "old_endpoint", "replace": "new_endpoint"},
], preview=True)

# Apply changes (creates automatic backup)
result = edit_content("/path/to/api.py", changes=[...], preview=False)

# Undo if needed
revert_edit("/path/to/api.py")
```
Log Analysis
```python
# Get log file overview
overview = get_overview("/var/log/app.log")
# Returns: 150,000 lines, 2.1GB

# Read last 500 lines efficiently
recent = read_content("/var/log/app.log", limit=500, mode="tail")

# Count errors without loading content
error_count = search_content("/var/log/app.log", "ERROR", count_only=True, fuzzy=False)

# Find errors with regex
errors = search_content("/var/log/app.log", r"ERROR.*timeout", regex=True)
```
Supported Languages
Tree-sitter semantic analysis for: Python, JavaScript/JSX, TypeScript/TSX, Rust, Go, Java
Other file types use text-based analysis with graceful fallback.
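The dispatch described above — Tree-sitter grammars for supported languages, text-based analysis otherwise — might look like the following sketch. This is illustrative only; the mapping mirrors the language list above, but the function and table names are assumptions, not largefile's actual internals.

```python
import os

# Extensions with Tree-sitter semantic analysis, per the supported-language
# list above; everything else falls back to plain text-based analysis.
TREE_SITTER_LANGUAGES = {
    ".py": "python",
    ".js": "javascript", ".jsx": "javascript",
    ".ts": "typescript", ".tsx": "typescript",
    ".rs": "rust",
    ".go": "go",
    ".java": "java",
}

def analysis_mode(filename):
    """Return the semantic grammar for a file, or 'text' as the fallback."""
    ext = os.path.splitext(filename)[1].lower()
    return TREE_SITTER_LANGUAGES.get(ext, "text")
```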
File Size Handling
| Size | Strategy |
|---|---|
| < 50MB | Full memory loading with AST caching |
| 50-500MB | Memory-mapped access |
| > 500MB | Streaming (tail/head modes recommended) |
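The tiered RAM → mmap → streaming selection in the table can be sketched as a simple threshold check. The thresholds mirror the documented defaults (and the configuration variables below), but the function itself is a hypothetical illustration, not largefile's actual code.

```python
MB = 1024 * 1024

# Sketch of the tiered strategy from the table above (assumed defaults:
# 50 MB for full memory loading, 500 MB for memory mapping).
def choose_strategy(size_bytes, memory_threshold=50 * MB, mmap_threshold=500 * MB):
    """Pick a file-access strategy for a file of the given size."""
    if size_bytes < memory_threshold:
        return "memory"     # full load with AST caching
    if size_bytes < mmap_threshold:
        return "mmap"       # memory-mapped access
    return "streaming"      # tail/head modes recommended
```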
Configuration
Environment variables for tuning:
```
LARGEFILE_MEMORY_THRESHOLD_MB=50    # RAM loading limit
LARGEFILE_MMAP_THRESHOLD_MB=500     # Memory mapping limit
LARGEFILE_FUZZY_THRESHOLD=0.8       # Match sensitivity (0.0-1.0)
LARGEFILE_MAX_SEARCH_RESULTS=20     # Results per search
LARGEFILE_BACKUP_DIR=~/.largefile/backups
```
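For reference, here is how these variables could be read with their documented defaults. The variable names and defaults come from the list above; the parsing function is a sketch, not largefile's actual configuration loader.

```python
import os

# Illustrative only: reading the documented environment variables with
# their default values. The real loader in largefile may differ.
def load_config():
    return {
        "memory_threshold_mb": int(os.environ.get("LARGEFILE_MEMORY_THRESHOLD_MB", "50")),
        "mmap_threshold_mb": int(os.environ.get("LARGEFILE_MMAP_THRESHOLD_MB", "500")),
        "fuzzy_threshold": float(os.environ.get("LARGEFILE_FUZZY_THRESHOLD", "0.8")),
        "max_search_results": int(os.environ.get("LARGEFILE_MAX_SEARCH_RESULTS", "20")),
        "backup_dir": os.path.expanduser(
            os.environ.get("LARGEFILE_BACKUP_DIR", "~/.largefile/backups")
        ),
    }
```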
Documentation
- API Reference - Detailed tool documentation
- Configuration Guide - All environment variables
- Examples - More workflow examples
- Design Document - Architecture details
- Contributing - Development setup