codeindex
AI-readable codebase navigation indexes (README_AI.md) via a deterministic CLI — coding agents navigate faster, with fewer tokens. A/B-benchmarked.
Make AI coding agents navigate your codebase by reading, not grepping.
codeindex is an open-source CLI that turns any codebase into AI-readable navigation indexes (README_AI.md) via a two-phase pipeline — structural indexing (tree-sitter AST) + optional one-line AI module descriptions. Agents browse the README_AI.md hierarchy, see what each module does, and jump straight to the right file — across Python, PHP, Java, TypeScript, JavaScript, Swift, and Objective-C. The measured payoff is efficiency, not magic (benchmark below).
Runs fully offline. Structural indexing needs no AI at all; AI descriptions use your local agent CLI (e.g. claude -p), so no code leaves your network — fine for air-gapped intranets. MIT-licensed, free, and meant to stay that way: codeindex is the open-source navigate layer; the reason/retrieval layer lives in LoomGraph.
Does this actually help an agent? We measured it.
Most "AI code understanding" tools assert value. We A/B-tested ours — and published the unflattering parts.
Across 15 graded navigation questions on 3 heterogeneous real projects, a coding agent with README_AI.md vs without:
- −28% tokens, −19% wall-time on average — agents reach the right file faster and cheaper.
- Answer quality is a wash. It does not make answers more correct — the win is efficiency, not capability. (An undisciplined index even hurt a few precise-mechanism questions; fixed in ADR-005.)
- Smallest win on the largest codebases. On a 250-directory legacy system the token win nearly vanished — a flat index points you to files but can't synthesize cross-module semantics. codeindex is the navigate layer, not the understand-everything layer (pair it with source-reading / Serena for precise mechanism).
Full data incl. the failure cases: 2026-05 benchmark. Reproduce on your own repos: bench/ (make setup && make run && make grade).
Why publish the parts that don't flatter the tool: a navigation index that quietly degrades answer quality is worse than none. Knowing exactly where it helps — and where to drop to source — is the point.
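The headline percentages are relative changes between a run with README_AI.md and a baseline run without it. A minimal sketch of that arithmetic follows; the function name and the token counts are made up for illustration and are not benchmark data.

```python
def relative_change(with_index: float, without_index: float) -> float:
    """Return the percent change of the with-index run relative to the baseline."""
    if without_index == 0:
        raise ValueError("baseline must be non-zero")
    return (with_index - without_index) / without_index * 100.0


# Hypothetical numbers for illustration only; not taken from the benchmark.
baseline_tokens = 50_000  # agent run without README_AI.md
indexed_tokens = 36_000   # agent run with README_AI.md
print(f"{relative_change(indexed_tokens, baseline_tokens):+.0f}% tokens")
```

A negative result means the indexed run was cheaper than the baseline.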
For LoomGraph Developers:
`FOR_LOOMGRAPH.md` (quick start) | `docs/guides/loomgraph-integration.md` (full guide)
Features
Core: Code Understanding for AI Agents
- Two-phase documentation pipeline (v0.23.0) — Phase 1: structural README_AI.md via SmartWriter; Phase 2: AI generates one-line functional descriptions per module. AI agents can browse README_AI.md hierarchy and find the right module without grep.
- Smart indexing — Tiered documentation (overview → navigation → detailed) optimized for AI agents, ≤10KB per file (navigation index, not a tech doc — see ADR-005)
- Auto-AI enrichment — When `ai_command` is configured, `scan-all` automatically enables AI module descriptions. Use `--no-ai` to opt out
- Auto-update hooks — Optional post-commit hook (`codeindex hooks install`) regenerates README_AI.md for changed directories. Thin wrapper pattern: `pipx upgrade ai-codeindex` auto-updates hook logic
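To make "regenerates README_AI.md for changed directories" concrete, here is a minimal sketch of how changed file paths could be reduced to the directories that need a rescan. The function name is hypothetical and this is not the hook's actual logic, only one plausible way to do the mapping.

```python
from pathlib import PurePosixPath


def directories_to_regenerate(changed_files: list[str]) -> list[str]:
    """Map changed file paths to the unique directories that contain them."""
    dirs = {str(PurePosixPath(path).parent) for path in changed_files}
    return sorted(dirs)


# Example input, shaped like the output of `git diff --name-only HEAD~1 HEAD`.
changed = ["src/auth/login.py", "src/auth/token.py", "src/api/routes.py"]
print(directories_to_regenerate(changed))
```

Each resulting directory could then be passed to `codeindex scan <dir>`.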
Parsing & Analysis
- Multi-language AST parsing — Python, PHP, Java, TypeScript, JavaScript, Swift, Objective-C via tree-sitter; more languages plug in via the extractor API (`src/codeindex/extractors/`, community-contributed)
- Call relationship extraction — Function/method call graphs across Python, Java, PHP, TypeScript, JavaScript
- Inheritance extraction — Class hierarchy and interface relationships
- Framework route extraction — ThinkPHP and Spring Boot route tables (more planned)
- Technical debt analysis — Detect large files, god classes, symbol overload, test smells
- Single file parse — `codeindex parse <file>` with JSON output for tool integration
- Structured JSON output — `--output json` for CI/CD, knowledge graphs, and downstream tools
Developer Experience
- Adaptive symbol extraction — Dynamic 5–150 symbols per file based on size
- CLAUDE.md injection — `codeindex init` injects a codeindex section into your project's `CLAUDE.md` (never `~/.claude`)
- Claude Code plugin — `codeindex:arch` / `:index` / `:hooks` / `:update-guide` skills via dreamlx/codeindex-claude
- Template-based test generation — YAML + Jinja2 for rapid language support (88–91% time savings)
- Parallel scanning — Concurrent directory processing with configurable workers
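As an illustration of concurrent directory processing with a configurable worker count, the sketch below uses a thread pool from the standard library. The function names are hypothetical and `count_files` is only a stand-in for the real per-directory work; codeindex's own implementation may be structured differently.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def count_files(directory: Path) -> tuple[str, int]:
    """Stand-in for per-directory work: count the files directly inside."""
    return str(directory), sum(1 for entry in directory.iterdir() if entry.is_file())


def scan_parallel(directories: list[Path], workers: int = 4) -> dict[str, int]:
    """Process directories concurrently with a configurable worker count."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with directories.
        return dict(pool.map(count_files, directories))
```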
Use Cases
🏢 Enterprise Intranet (Core Scenario)
Without external tools: When Serena MCP or other cloud-based code intelligence tools are unavailable due to network isolation or security policies, codeindex becomes the primary code understanding tool.
# Enterprise developer workflow
git clone <internal-repo>
codeindex init # Configure project
codeindex scan-all # Structural + AI descriptions (auto)
# AI agent reads README_AI.md → sees module purposes → navigates directly
# No grep needed for code discovery
codeindex tech-debt src/ --output review.md # Code quality analysis
Why enterprises choose codeindex:
- ✅ Semantic navigation — AI agents understand module purposes from README_AI.md hierarchy
- ✅ Intranet compatible — no external dependencies, fully offline
- ✅ Self-contained — no upstream MCP servers required
- ✅ Version stable — enterprise-controlled release cycle
- ✅ Data sovereignty — code never leaves internal network
🕸️ Knowledge Graph Integration (LoomGraph)
For enterprise teams: codeindex serves as the core data source for LoomGraph knowledge graphs, enabling semantic code search across the organization.
# Data pipeline
codeindex scan --output json > parse_results.json
loomgraph inject parse_results.json # Build knowledge graph
# Team can now search code using natural language
Three-repo architecture:
codeindex (Parse) → LoomGraph (Orchestrate) → LightRAG (Store)
↓ ParseResult ↓ Embeddings ↓ Semantic Search
AST extraction Knowledge Graph Vector + Graph DB
Without codeindex, LoomGraph cannot function. See LoomGraph Integration Guide.
👤 Personal Developers (Complementary)
With Serena MCP: For individual developers using Claude Code + Serena MCP, codeindex provides complementary value:
- codeindex (build-time): Semantic architecture map (README_AI.md with module descriptions) + quality analysis
- Serena (real-time): Precise symbol navigation (`find_symbol`, `find_referencing_symbols`)
# Personal developer workflow
codeindex init # Setup CLAUDE.md integration
codeindex scan-all # Structural + AI descriptions (auto)
codeindex hooks install post-commit # Auto-update on commit
# Claude Code reads README_AI.md → understands module purpose → uses Serena for details
Relationship: codeindex provides the "map with labels," Serena provides the "GPS navigation."
Installation
codeindex is a CLI tool — install it with pipx (isolated, no dependency conflicts):
pipx install ai-codeindex
Claude Code users — also install the companion plugin for skills (`codeindex:arch` / `:index` / `:hooks` / `:update-guide`):
/plugin marketplace add dreamlx/codeindex-claude
/plugin install codeindex@codeindex-claude
The plugin is optional and only for Claude Code. The CLI works standalone in any editor / terminal. See dreamlx/codeindex-claude.
Language parsers
codeindex uses lazy loading — language parsers are imported only when needed. `pipx install ai-codeindex` pulls all of them by default. To inject extras into the pipx environment later, or to install a subset:
pipx inject ai-codeindex tree-sitter-python tree-sitter-typescript # add to pipx env
# or pin a subset at install time:
pipx install "ai-codeindex[python]" # python only
pipx install "ai-codeindex[ios]" # Swift + Objective-C
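Lazy loading here means a grammar package is imported the first time a file of that language is seen, not at startup. The sketch below shows the general pattern with `importlib`; the mapping table and function name are hypothetical, and codeindex's real loader may look different.

```python
import importlib
from functools import lru_cache
from types import ModuleType

# Hypothetical mapping from file extension to grammar module; the real
# table inside codeindex may differ.
PARSER_MODULES = {
    ".py": "tree_sitter_python",
    ".ts": "tree_sitter_typescript",
}


@lru_cache(maxsize=None)
def load_parser(extension: str) -> ModuleType:
    """Import the grammar module for an extension on first use, then cache it."""
    try:
        module_name = PARSER_MODULES[extension]
    except KeyError:
        raise ValueError(f"no parser registered for {extension!r}") from None
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        package = module_name.replace("_", "-")
        raise RuntimeError(
            f"{module_name} is not installed; try: pipx inject ai-codeindex {package}"
        ) from exc
```

With this pattern, a project that contains only Python files never pays the import cost of the other grammars.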
Alternatives to pipx
pip install --user ai-codeindex # if you don't have pipx
🇨🇳 China users: if your default mirror (e.g. aliyun) hasn't synced the latest release yet, install straight from upstream PyPI:
pipx install --index-url https://pypi.org/simple/ ai-codeindex
From Source
git clone https://github.com/dreamlx/codeindex.git
cd codeindex
pip install -e ".[all]"
Quick Start
1. Initialize Your Project
cd /your/project
codeindex init
This creates:
- `.codeindex.yaml` — scan configuration (languages, include/exclude patterns)
- `CLAUDE.md` — injects codeindex instructions so Claude Code uses README_AI.md automatically
- `CODEINDEX.md` — project-level documentation reference
2. Scan Your Codebase
# Scan all directories
# When ai_command is configured → auto Phase 1 (structural) + Phase 2 (AI descriptions)
# Without ai_command → Phase 1 only (structural)
codeindex scan-all
# Structural only (skip AI enrichment)
codeindex scan-all --no-ai
# Scan a single directory
codeindex scan ./src/auth
# Full AI-generated README for a single directory
codeindex scan ./src/auth --ai
# Preview AI prompt without executing
codeindex scan ./src/auth --ai --dry-run
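The phase selection described in the comments above boils down to a small decision rule. A minimal sketch, with a hypothetical function name and phase labels chosen only for illustration:

```python
from typing import Optional


def phases_to_run(ai_command: Optional[str], no_ai: bool = False) -> list[str]:
    """Return which pipeline phases a scan-all run would execute."""
    phases = ["structural"]       # Phase 1 always runs
    if ai_command and not no_ai:  # Phase 2 needs a configured command and no opt-out
        phases.append("ai_descriptions")
    return phases


print(phases_to_run("claude -p"))              # ai_command configured
print(phases_to_run("claude -p", no_ai=True))  # --no-ai passed
print(phases_to_run(None))                     # no ai_command
```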
3. Check Status
codeindex status
Indexing Status
───────────────────────────────
✅ src/auth/
✅ src/utils/
⚠️ src/api/ (no README_AI.md)
Indexed: 2/3 (67%)
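The coverage figure in the status output is the share of directories that contain a README_AI.md. A rough sketch of that check, assuming a hypothetical helper name and that coverage is rounded to the nearest whole percent:

```python
from pathlib import Path


def index_coverage(directories: list[Path]) -> tuple[int, int, int]:
    """Return (indexed, total, percent) for directories containing README_AI.md."""
    total = len(directories)
    indexed = sum(1 for d in directories if (d / "README_AI.md").is_file())
    percent = round(indexed / total * 100) if total else 0
    return indexed, total, percent
```

For the three directories shown above, two of which are indexed, this yields `(2, 3, 67)`.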
4. Generate Indexes
# Global symbol index (PROJECT_SYMBOLS.md)
codeindex symbols
# Module overview (PROJECT_INDEX.md)
codeindex index
# Git change impact analysis
codeindex affected --since HEAD~5
More Commands
| Command | Description | Guide |
|---|---|---|
| `codeindex scan --output json` | JSON output for tools | JSON Output Guide |
| `codeindex parse <file>` | Parse single file to JSON | LoomGraph Integration |
| `codeindex tech-debt ./src` | Code quality analysis (debt + test smells) | Enhanced in v0.22.0 |
| `codeindex debt-scan ./src` | Alias for tech-debt | Backward compatibility |
| `codeindex hooks install` | Git hooks for auto-update | Git Hooks Guide |
| `codeindex doctor` | Health/sync check (CLI, parsers, CLAUDE.md, plugin) | Read-only diagnostic |
| `codeindex config explain <param>` | Parameter help | Configuration Guide |
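For tool integration, the single-file parse command can be driven from a script. The wrapper below is a sketch: it assumes `codeindex parse <file>` writes its JSON to stdout without extra flags, and the function name is hypothetical. The shape of the returned data is not specified here, so the function simply returns whatever JSON the CLI emits.

```python
import json
import subprocess
from typing import Any


def parse_file(path: str) -> Any:
    """Run `codeindex parse <file>` and decode its JSON output."""
    completed = subprocess.run(
        ["codeindex", "parse", path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)
```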
Claude Code Integration
The codeindex plugin gives Claude Code four skills backed by the CLI:
/plugin marketplace add dreamlx/codeindex-claude
/plugin install codeindex@codeindex-claude
| Skill | What it does |
|---|---|
| `codeindex:arch` | Answer architecture / "where is X" questions from README_AI.md |
| `codeindex:index` | Walk you through `codeindex init` → `scan-all` |
| `codeindex:hooks` | Set up the auto-update post-commit hook |
| `codeindex:update-guide` | Refresh the codeindex section in your project's CLAUDE.md |
codeindex init also injects a codeindex section into your project's CLAUDE.md
so Claude Code reads README_AI.md files first. (As of v0.25.0, init only
touches project-scoped files — see ADR-006.)
For enterprise users without Serena: README_AI.md and PROJECT_SYMBOLS.md become your primary code navigation tools.
The plugin skills don't replace the `codeindex claude-md` / `codeindex hooks` CLI commands — they orchestrate them. The commands stay first-class for CLI-only users (Cursor, scripts); the skills add a guided Claude Code UX on top.
Language Support
| Language | Status | Since | Key Features |
|---|---|---|---|
| Python | ✅ Supported | v0.1.0 | Classes, functions, methods, imports, docstrings, inheritance, calls |
| PHP | ✅ Supported | v0.5.0 | Classes (extends/implements), methods, properties, PHPDoc, inheritance, calls |
| Java | ✅ Supported | v0.7.0 | Classes, interfaces, enums, records, annotations, Spring routes, Lombok, calls |
| TypeScript/JS | ✅ Supported | v0.19.0 | Classes, interfaces, enums, type aliases, arrow functions, JSX/TSX, imports/exports, calls |
| Swift | ✅ Supported | v0.21.0 | Classes, structs, enums, protocols, extensions, methods, properties |
| Objective-C | ✅ Supported | v0.21.0 | Classes, protocols, categories, properties, methods (instance/class) |
| Go | 📋 Planned | — | Packages, interfaces, struct methods |
| Rust | 📋 Planned | — | Structs, traits, modules |
| C# | 📋 Planned | — | Classes, interfaces, .NET projects |
Want to add a language? The template-based test system lets you contribute by writing YAML specs — no Python knowledge required. See CONTRIBUTING.md for details.
Framework Route Extraction
| Framework | Language | Status |
|---|---|---|
| ThinkPHP | PHP | ✅ Stable (v0.5.0) |
| Spring Boot | Java | ✅ Stable (v0.8.0) |
| Laravel | PHP | 📋 Planned |
| FastAPI | Python | 📋 Planned |
| Django | Python | 📋 Planned |
| Express.js | JS/TS | 📋 Planned |
Code Quality Analysis
tech-debt: Comprehensive Quality Analysis (Enhanced in v0.22.0)
The tech-debt command is the single entry point for code quality analysis and, as of v0.22.0, also detects test smells:
# JSON output (for LoomGraph integration)
codeindex tech-debt ./src --format json > debt-data.json
# Markdown report (for documentation)
codeindex tech-debt ./src --format markdown > report.md
# Console output (for quick checks)
codeindex tech-debt ./src --format console
# Alias: debt-scan also works (backward compatibility)
codeindex debt-scan ./src --format json
What it detects:
- 🔴 Super large files (>5000 lines), Large files (>2000 lines)
- 🔴 God Classes (>50 methods)
- 🔴 Long methods (>80/150 lines)
- 🟡 High coupling (>8 internal imports)
- 🟡 Symbol overload (>100 symbols, high noise ratio)
- 🧪 Test smells (skipped tests, giant test files) — New in v0.22.0
- 📊 Quality scoring (0-100 scale per file)
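The file-size and God Class thresholds listed above translate into simple comparisons. The sketch below is illustrative only: the function names and return labels are hypothetical, and the real detector may apply additional rules.

```python
def classify_file_size(line_count: int) -> str:
    """Classify a file by the line-count thresholds listed above."""
    if line_count > 5000:
        return "super_large"
    if line_count > 2000:
        return "large"
    return "ok"


def is_god_class(method_count: int) -> bool:
    """A class with more than 50 methods is flagged as a God Class."""
    return method_count > 50
```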
Enhanced JSON output (v0.22.0):
{
"timestamp": "2026-03-06T13:45:39Z",
"summary": {
"total_files": 97,
"giant_files": 0,
"giant_functions": 3,
"test_smells": 64,
"avg_maintainability": 9.9
},
"total_files": 97,
"average_quality_score": 99.4,
"giant_files": [],
"giant_functions": [...],
"test_smells": [
{
"path": "tests/test_example.py",
"type": "skipped_test",
"details": "Skipped test detected: @pytest.mark.skip at line 42",
"line_number": 42
}
],
"file_reports": [...]
}
Key features:
- ✅ Unified command: Single entry point for all quality checks
- ✅ Backward compatible: All existing JSON fields preserved
- ✅ LoomGraph ready: Enhanced summary for knowledge graph integration
- ✅ Framework-agnostic: Detects test smells across Jest, pytest, JUnit, etc.
- ✅ KISS design: 90% code reuse, simple regex patterns for test detection
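A downstream tool can read the JSON report with a few lines of Python. The sketch below relies only on fields that appear in the sample output above (`summary.total_files`, `summary.giant_functions`, and `test_smells[].type`); the function name is hypothetical.

```python
import json
from collections import Counter


def summarize_debt(report_json: str) -> dict:
    """Pull headline numbers and a per-type test-smell count from a report."""
    report = json.loads(report_json)
    smells = Counter(item["type"] for item in report.get("test_smells", []))
    return {
        "total_files": report["summary"]["total_files"],
        "giant_functions": report["summary"]["giant_functions"],
        "smells_by_type": dict(smells),
    }
```

Typical use would be to pass in the contents of the `debt-data.json` file produced by `codeindex tech-debt ./src --format json`.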
How It Works
Two-Phase Pipeline (v0.23.0)
Phase 1 (Structural):
Directory → Scanner → Parser (tree-sitter) → SmartWriter → README_AI.md
Phase 2 (AI Enrichment, automatic when ai_command configured):
README_AI.md → symbol names + file names → AI → one-line description → blockquote injection
Phase 1: Structural generation (always runs)
- Scanner — walks directories, filters by config patterns
- Parser — extracts symbols (classes, functions, imports, calls, inheritance) via tree-sitter
- SmartWriter — generates tiered documentation with size limits (≤50KB)
- Output — `README_AI.md` optimized for AI consumption, or JSON for tool integration
Phase 2: AI enrichment (auto-enabled when ai_command configured)
- Generates a one-line functional description for each non-leaf module
- Writes as blockquote: `> Membership tier management, points redemption, benefit coupons`
- ~200-400 tokens per directory, 10-20x cheaper than full AI generation
- Parent directories read child descriptions for hierarchical navigation
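One way a parent directory could pick up its children's descriptions is to read the blockquote line from each child's README_AI.md. The sketch below assumes the description is the first line starting with `>` in that file; both function names are hypothetical and this is not necessarily how codeindex does it.

```python
from pathlib import Path
from typing import Optional


def read_description(readme: Path) -> Optional[str]:
    """Return the text of the first blockquote line in a README_AI.md, if any."""
    if not readme.is_file():
        return None
    for line in readme.read_text(encoding="utf-8").splitlines():
        stripped = line.strip()
        if stripped.startswith(">"):
            return stripped.lstrip(">").strip()
    return None


def child_descriptions(parent: Path) -> dict[str, Optional[str]]:
    """Collect each child directory's description for the parent's index."""
    return {
        child.name: read_description(child / "README_AI.md")
        for child in sorted(parent.iterdir())
        if child.is_dir()
    }
```

A child without a README_AI.md, or without a blockquote line, maps to `None`, which corresponds to the structural-only listing.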
Before vs After: Code Navigation
Before (structural only):
└── Application/
├── Vip/ — 48 files | 386 symbols ← AI agent cannot determine purpose
├── Pay/ — 23 files | 178 symbols
└── SmallProgramApi/ — 31 files | 245 symbols
After (structural + AI enrichment):
└── Application/
├── Vip/ — Membership tiers, points redemption, benefit coupons | 48 files
├── Pay/ — Payment gateway (Alipay / WeChat Pay / refunds) | 23 files
└── SmallProgramApi/ — Mini-program API (login, avatar, products) | 31 files
↑ AI agent can navigate directly
Three-Repo Architecture (Enterprise Knowledge Graph)
┌────────────────────────────────────────────────────┐
│ Enterprise Intranet Environment │
├────────────────────────────────────────────────────┤
│ │
│ 📦 Code Repository (Git) │
│ ↓ │
│ 🔍 codeindex (Parse Layer) │
│ ├── scan --output json → ParseResult │
│ ├── README_AI.md → architecture docs │
│ └── tech-debt → comprehensive quality scan │
│ ↓ │
│ 🕸️ LoomGraph (Orchestration Layer) │
│ ├── inject ParseResult │
│ ├── generate embeddings │
│ └── build knowledge graph │
│ ↓ │
│ 💾 LightRAG (Storage Layer) │
│ ├── PostgreSQL (graph data) │
│ ├── Vector DB (embeddings) │
│ └── Query API (semantic search) │
│ ↓ │
│ 💬 AI Agents (Claude Code, Internal Chat) │
│ └── Natural language code search │
│ │
└────────────────────────────────────────────────────┘
codeindex role: Bottom layer (data collection & parsing) — the entire system depends on codeindex providing structured ParseResult data.
Documentation
User Guides
| Guide | Description |
|---|---|
| Getting Started | Installation and first scan |
| Configuration Guide | All config options explained |
| Advanced Usage | Parallel scanning, custom prompts |
| Git Hooks Integration | Automated quality checks and doc updates |
| Claude Code Integration | AI agent setup and MCP skills |
| JSON Output Integration | Machine-readable output for tools |
| LoomGraph Integration | Knowledge graph data pipeline |
Developer Guides
| Guide | Description |
|---|---|
| CONTRIBUTING.md | Development setup, TDD workflow, code style |
| CLAUDE.md | Quick reference for Claude Code and contributors |
| Design Philosophy | Core design principles and architecture |
| ADR-005 | 2026-05: navigation-contract disclaimer + size cap, backed by benchmark |
| Release Automation | 5-minute automated release workflow |
| Multi-Language Support | Adding new language parsers |
| Language Support Contribution | Template-based test generation for new languages |
Evidence & benchmarks
| Doc | What it shows |
|---|---|
| 2026-05 README impact benchmark | Measured agent comprehension delta WITH vs WITHOUT README_AI.md across 3 heterogeneous projects (15 graded questions). Headline: 19% faster / 28% fewer tokens on average, but speed gains masked quality drops on some detail questions — fix is shipped (see ADR-005). |
| `bench/` | Reproducible harness (Makefile + Python) used to produce the benchmark above; run your own with `cd bench && make setup && make run && make grade && make report`. |
Planning
- Strategic Roadmap — long-term vision and priorities
- Changelog — version history and breaking changes
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
git clone https://github.com/dreamlx/codeindex.git
cd codeindex
pip install -e ".[dev,all]"
make install-hooks
make test
Release Process (Maintainers)
make release VERSION=0.17.0
# GitHub Actions: tests → PyPI publish → GitHub Release
See Release Automation Guide for details.
Roadmap
Current version: v0.25.0
Recent milestones:
- v0.23.0 — AI-Enhanced Module Descriptions: two-phase pipeline, auto-AI enrichment, post-commit thin wrapper
- v0.22.2 — Auto-update CLAUDE.md on `pip upgrade`, `/codeindex-update-guide` skill
- v0.22.0 — Unified tech-debt + test smells analysis
- v0.21.0 — Swift & Objective-C language support
- v0.19.0 — TypeScript/JavaScript support with call extraction
Next:
- Framework routes expansion: Express, Laravel, FastAPI, Django (Epic 17)
- Go, Rust, C# language support
Moved to LoomGraph:
- Code similarity search, refactoring suggestions, team collaboration, IDE integration
See Strategic Roadmap for detailed plans.
License
MIT License — see LICENSE file for details.
Acknowledgments
- tree-sitter — fast, incremental parsing
- Claude CLI — AI integration inspiration
- All contributors and users
Support
- Questions: GitHub Discussions
- Bugs: GitHub Issues
- Feature Requests: GitHub Issues
Made with ❤️ by the codeindex team