codeindex

Make AI coding agents navigate your codebase by reading, not grepping.

codeindex is an open-source CLI that turns any codebase into AI-readable navigation indexes (README_AI.md) via a two-phase pipeline — structural indexing (tree-sitter AST) + optional one-line AI module descriptions. Agents browse the README_AI.md hierarchy, see what each module does, and jump straight to the right file — across Python, PHP, Java, TypeScript, JavaScript, Swift, and Objective-C. The measured payoff is efficiency, not magic (benchmark below).

Runs fully offline. Structural indexing needs no AI at all; AI descriptions use your local agent CLI (e.g. claude -p), so no code leaves your network — fine for air-gapped intranets. MIT-licensed, free, and meant to stay that way: codeindex is the open-source navigate layer; the reason/retrieval layer lives in LoomGraph.

Does this actually help an agent? We measured it.

Most "AI code understanding" tools assert value. We A/B-tested ours — and published the unflattering parts.

Across 15 graded navigation questions on 3 heterogeneous real projects, a coding agent with README_AI.md vs without:

−28% tokens, −19% wall-time on average — agents reach the right file faster and cheaper.
Answer quality is a wash. It does not make answers more correct; the win is efficiency, not capability. (An earlier, undisciplined index even lowered answer quality on a few precise-mechanism questions; ADR-005 fixes this with a navigation-contract disclaimer and a size cap.)
Smallest win on the largest codebases. On a 250-directory legacy system the token win nearly vanished: a flat index points you to files but can't synthesize cross-module semantics. codeindex is the navigate layer, not the understand-everything layer (pair it with source reading or Serena for precise-mechanism questions).

Full data incl. the failure cases: 2026-05 benchmark. Reproduce on your own repos: bench/ (make setup && make run && make grade).

Why publish the parts that don't flatter the tool: a navigation index that quietly degrades answer quality is worse than none. Knowing exactly where it helps — and where to drop to source — is the point.

For LoomGraph Developers: FOR_LOOMGRAPH.md (quick start) | docs/guides/loomgraph-integration.md (full guide)

Features

Core: Code Understanding for AI Agents

Two-phase documentation pipeline (v0.23.0) — Phase 1: structural README_AI.md via SmartWriter; Phase 2: AI generates one-line functional descriptions per module. AI agents can browse the README_AI.md hierarchy and find the right module without grep.
Smart indexing — Tiered documentation (overview → navigation → detailed) optimized for AI agents, ≤10KB per file (navigation index, not a tech doc — see ADR-005)
Auto-AI enrichment — When ai_command is configured, scan-all automatically enables AI module descriptions. Use --no-ai to opt out
Auto-update hooks — Optional post-commit hook (codeindex hooks install) regenerates README_AI.md for changed directories. Thin wrapper pattern: pipx upgrade ai-codeindex auto-updates hook logic

Parsing & Analysis

Multi-language AST parsing — Python, PHP, Java, TypeScript, JavaScript, Swift, Objective-C via tree-sitter; more languages plug in via the extractor API (src/codeindex/extractors/, community-contributed)
Call relationship extraction — Function/method call graphs across Python, Java, PHP, TypeScript, JavaScript
Inheritance extraction — Class hierarchy and interface relationships
Framework route extraction — ThinkPHP and Spring Boot route tables (more planned)
Technical debt analysis — Detect large files, god classes, symbol overload, test smells
Single file parse — codeindex parse <file> with JSON output for tool integration
Structured JSON output — --output json for CI/CD, knowledge graphs, and downstream tools

Developer Experience

Adaptive symbol extraction — Dynamic 5–150 symbols per file based on size
CLAUDE.md injection — codeindex init injects a codeindex section into your project's CLAUDE.md (never ~/.claude)
Claude Code plugin — codeindex:arch / :index / :hooks / :update-guide skills via dreamlx/codeindex-claude
Template-based test generation — YAML + Jinja2 for rapid language support (88–91% time savings)
Parallel scanning — Concurrent directory processing with configurable workers
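
To picture what "concurrent directory processing with configurable workers" means in practice, here is a minimal Python sketch. It is not codeindex's implementation: scan_in_parallel and count_source_files are hypothetical names, and the per-directory task is a stand-in for real parsing.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def count_source_files(directory: Path) -> int:
    """Stand-in for per-directory work: count the .py files directly inside."""
    return sum(1 for p in directory.iterdir() if p.is_file() and p.suffix == ".py")


def scan_in_parallel(directories: list[Path], workers: int = 4) -> dict[str, int]:
    """Run the per-directory task concurrently with a bounded worker pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with directories.
        counts = list(pool.map(count_source_files, directories))
    return {str(d): n for d, n in zip(directories, counts)}
```

The worker count is the only tuning knob in this sketch. A thread pool is a reasonable choice when the per-directory work is mostly file I/O.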

Use Cases

🏢 Enterprise Intranet (Core Scenario)

Without external tools: When Serena MCP or other cloud-based code intelligence tools are unavailable due to network isolation or security policies, codeindex becomes the primary code understanding tool.

# Enterprise developer workflow
git clone <internal-repo>
codeindex init                       # Configure project
codeindex scan-all                   # Structural + AI descriptions (auto)
# AI agent reads README_AI.md → sees module purposes → navigates directly
# No grep needed for code discovery
codeindex tech-debt src/ --output review.md  # Code quality analysis

Why enterprises choose codeindex:

✅ Semantic navigation — AI agents understand module purposes from the README_AI.md hierarchy
✅ Intranet compatible — no external dependencies, fully offline
✅ Self-contained — no upstream MCP servers required
✅ Version stable — enterprise-controlled release cycle
✅ Data sovereignty — code never leaves internal network

🕸️ Knowledge Graph Integration (LoomGraph)

For enterprise teams: codeindex serves as the core data source for LoomGraph knowledge graphs, enabling semantic code search across the organization.

# Data pipeline
codeindex scan --output json > parse_results.json
loomgraph inject parse_results.json  # Build knowledge graph
# Team can now search code using natural language

Three-repo architecture:

codeindex (Parse)  →  LoomGraph (Orchestrate)  →  LightRAG (Store)
   ↓ ParseResult         ↓ Embeddings              ↓ Semantic Search
   AST extraction        Knowledge Graph           Vector + Graph DB

Without codeindex, LoomGraph cannot function. See LoomGraph Integration Guide.

👤 Personal Developers (Complementary)

With Serena MCP: For individual developers using Claude Code + Serena MCP, codeindex provides complementary value:

codeindex (build-time): Semantic architecture map (README_AI.md with module descriptions) + quality analysis
Serena (real-time): Precise symbol navigation (find_symbol, find_referencing_symbols)

# Personal developer workflow
codeindex init                    # Setup CLAUDE.md integration
codeindex scan-all                # Structural + AI descriptions (auto)
codeindex hooks install post-commit  # Auto-update on commit
# Claude Code reads README_AI.md → understands module purpose → uses Serena for details

Relationship: codeindex provides the "map with labels," Serena provides the "GPS navigation."

Installation

codeindex is a CLI tool — install it with pipx (isolated, no dependency conflicts):

pipx install ai-codeindex

Claude Code users — also install the companion plugin for skills
(codeindex:arch / :index / :hooks / :update-guide):
/plugin marketplace add dreamlx/codeindex-claude
/plugin install codeindex@codeindex-claude
The plugin is optional and only for Claude Code. The CLI works standalone
in any editor / terminal. See dreamlx/codeindex-claude.

Language parsers

codeindex uses lazy loading — language parsers are imported only when needed.
pipx install ai-codeindex pulls all of them by default. To inject extras into the
pipx environment later, or to install a subset:

pipx inject ai-codeindex tree-sitter-python tree-sitter-typescript   # add to pipx env
# or pin a subset at install time:
pipx install "ai-codeindex[python]"      # python only
pipx install "ai-codeindex[ios]"         # Swift + Objective-C

Alternatives to pipx

pip install --user ai-codeindex          # if you don't have pipx

🇨🇳 China users: if your default mirror (e.g. aliyun) hasn't synced the
latest release yet, install straight from upstream PyPI:
pipx install --index-url https://pypi.org/simple/ ai-codeindex

From Source

git clone https://github.com/dreamlx/codeindex.git
cd codeindex
pip install -e ".[all]"

Quick Start

1. Initialize Your Project

cd /your/project
codeindex init

This creates:

.codeindex.yaml — scan configuration (languages, include/exclude patterns)
CLAUDE.md — injects codeindex instructions so Claude Code uses README_AI.md automatically
CODEINDEX.md — project-level documentation reference

2. Scan Your Codebase

# Scan all directories
# When ai_command is configured → auto Phase 1 (structural) + Phase 2 (AI descriptions)
# Without ai_command → Phase 1 only (structural)
codeindex scan-all

# Structural only (skip AI enrichment)
codeindex scan-all --no-ai

# Scan a single directory
codeindex scan ./src/auth

# Full AI-generated README for a single directory
codeindex scan ./src/auth --ai

# Preview AI prompt without executing
codeindex scan ./src/auth --ai --dry-run

3. Check Status

codeindex status

Indexing Status
───────────────────────────────
✅ src/auth/
✅ src/utils/
⚠️  src/api/ (no README_AI.md)
Indexed: 2/3 (67%)
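
The coverage figure in the sample output can be reproduced with a few lines of Python. This is a rough sketch, assuming a directory counts as indexed when it contains a README_AI.md file. index_coverage is a hypothetical helper, not part of the codeindex API.

```python
from pathlib import Path


def index_coverage(directories: list[Path], index_name: str = "README_AI.md") -> tuple[int, int, int]:
    """Return (indexed, total, percent) for the given directories."""
    indexed = sum(1 for d in directories if (d / index_name).is_file())
    total = len(directories)
    # Guard against an empty list so the percentage never divides by zero.
    percent = round(100 * indexed / total) if total else 0
    return indexed, total, percent
```

With two of three directories indexed, this returns (2, 3, 67), matching the "Indexed: 2/3 (67%)" line above.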

4. Generate Indexes

# Global symbol index (PROJECT_SYMBOLS.md)
codeindex symbols

# Module overview (PROJECT_INDEX.md)
codeindex index

# Git change impact analysis
codeindex affected --since HEAD~5
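
For intuition about change impact analysis, the sketch below maps the files changed since a given ref to the directories whose README_AI.md would likely need regenerating. It uses plain git diff --name-only and is only an approximation under that assumption. How codeindex affected actually computes impact is not shown here, and both function names are hypothetical.

```python
import subprocess
from pathlib import PurePosixPath


def changed_files(since: str = "HEAD~5") -> list[str]:
    """List files changed between `since` and HEAD using plain git."""
    out = subprocess.run(
        ["git", "diff", "--name-only", since, "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def affected_directories(files: list[str]) -> list[str]:
    """Map changed file paths to the sorted set of directories containing them."""
    # git prints POSIX-style paths; a file at the repo root maps to ".".
    return sorted({str(PurePosixPath(f).parent) for f in files})
```

Calling affected_directories(changed_files("HEAD~5")) inside a repository yields the directory list. Keeping the git call separate makes the mapping easy to check on its own.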

More Commands

Command	Description	Guide
`codeindex scan --output json`	JSON output for tools	JSON Output Guide
`codeindex parse <file>`	Parse single file to JSON	LoomGraph Integration
`codeindex tech-debt ./src`	Code quality analysis (debt + test smells)	Enhanced in v0.22.0
`codeindex debt-scan ./src`	Alias for tech-debt	Backward compatibility
`codeindex hooks install`	Git hooks for auto-update	Git Hooks Guide
`codeindex doctor`	Health/sync check (CLI, parsers, CLAUDE.md, plugin)	Read-only diagnostic
`codeindex config explain <param>`	Parameter help	Configuration Guide

Claude Code Integration

The codeindex plugin gives Claude Code four skills backed by the CLI:

/plugin marketplace add dreamlx/codeindex-claude
/plugin install codeindex@codeindex-claude

Skill	What it does
`codeindex:arch`	Answer architecture / "where is X" questions from `README_AI.md`
`codeindex:index`	Walk you through `codeindex init` → `scan-all`
`codeindex:hooks`	Set up the auto-update post-commit hook
`codeindex:update-guide`	Refresh the codeindex section in your project's `CLAUDE.md`

codeindex init also injects a codeindex section into your project's CLAUDE.md
so Claude Code reads README_AI.md files first. (As of v0.25.0, init only
touches project-scoped files — see ADR-006.)

For enterprise users without Serena: README_AI.md and PROJECT_SYMBOLS.md become your primary code navigation tools.

The plugin skills don't replace the codeindex claude-md / codeindex hooks
CLI commands — they orchestrate them. The commands stay first-class for
CLI-only users (Cursor, scripts); the skills add a guided Claude Code UX on top.

Language Support

Language	Status	Since	Key Features
Python	✅ Supported	v0.1.0	Classes, functions, methods, imports, docstrings, inheritance, calls
PHP	✅ Supported	v0.5.0	Classes (extends/implements), methods, properties, PHPDoc, inheritance, calls
Java	✅ Supported	v0.7.0	Classes, interfaces, enums, records, annotations, Spring routes, Lombok, calls
TypeScript/JS	✅ Supported	v0.19.0	Classes, interfaces, enums, type aliases, arrow functions, JSX/TSX, imports/exports, calls
Swift	✅ Supported	v0.21.0	Classes, structs, enums, protocols, extensions, methods, properties
Objective-C	✅ Supported	v0.21.0	Classes, protocols, categories, properties, methods (instance/class)
Go	📋 Planned	—	Packages, interfaces, struct methods
Rust	📋 Planned	—	Structs, traits, modules
C#	📋 Planned	—	Classes, interfaces, .NET projects

Want to add a language? The template-based test system lets you contribute by writing YAML specs — no Python knowledge required. See CONTRIBUTING.md for details.

Framework Route Extraction

Framework	Language	Status
ThinkPHP	PHP	✅ Stable (v0.5.0)
Spring Boot	Java	✅ Stable (v0.8.0)
Laravel	PHP	📋 Planned
FastAPI	Python	📋 Planned
Django	Python	📋 Planned
Express.js	JS/TS	📋 Planned

Code Quality Analysis

tech-debt: Comprehensive Quality Analysis (Enhanced in v0.22.0)

The tech-debt command runs all code quality checks from a single entry point and, since v0.22.0, also detects test smells:

# JSON output (for LoomGraph integration)
codeindex tech-debt ./src --format json > debt-data.json

# Markdown report (for documentation)
codeindex tech-debt ./src --format markdown > report.md

# Console output (for quick checks)
codeindex tech-debt ./src --format console

# Alias: debt-scan also works (backward compatibility)
codeindex debt-scan ./src --format json

What it detects:

🔴 Super large files (>5000 lines), Large files (>2000 lines)
🔴 God Classes (>50 methods)
🔴 Long methods (>80/150 lines)
🟡 High coupling (>8 internal imports)
🟡 Symbol overload (>100 symbols, high noise ratio)
🧪 Test smells (skipped tests, giant test files) — New in v0.22.0
📊 Quality scoring (0-100 scale per file)
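
The size thresholds in this list translate directly into simple checks. The sketch below covers only the file-size and God Class thresholds stated above. The level names ("super_large", "large", "ok") and both function names are illustrative assumptions, not values codeindex emits.

```python
def file_size_level(line_count: int) -> str:
    """Classify a file by line count using the thresholds listed above."""
    if line_count > 5000:
        return "super_large"
    if line_count > 2000:
        return "large"
    return "ok"


def is_god_class(method_count: int) -> bool:
    """A class counts as a God Class once it has more than 50 methods."""
    return method_count > 50
```

Both comparisons are strict, following the ">" in the list: a file with exactly 2000 lines is still "ok" in this sketch.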

Enhanced JSON output (v0.22.0):

{
  "timestamp": "2026-03-06T13:45:39Z",
  "summary": {
    "total_files": 97,
    "giant_files": 0,
    "giant_functions": 3,
    "test_smells": 64,
    "avg_maintainability": 9.9
  },
  "total_files": 97,
  "average_quality_score": 99.4,
  "giant_files": [],
  "giant_functions": [...],
  "test_smells": [
    {
      "path": "tests/test_example.py",
      "type": "skipped_test",
      "details": "Skipped test detected: @pytest.mark.skip at line 42",
      "line_number": 42
    }
  ],
  "file_reports": [...]
}
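
One way to consume this JSON in CI is a small gate script. The sketch below reads only fields that appear in the sample above (summary.giant_files, summary.giant_functions, average_quality_score). The function name debt_gate and the default thresholds are assumptions chosen for illustration, not codeindex defaults.

```python
def debt_gate(report: dict, max_giant_functions: int = 5, min_quality: float = 90.0) -> list[str]:
    """Return human-readable violations; an empty list means the gate passes."""
    summary = report.get("summary", {})
    problems = []
    giant_files = summary.get("giant_files", 0)
    giant_functions = summary.get("giant_functions", 0)
    quality = report.get("average_quality_score", 100.0)
    if giant_files > 0:
        problems.append(f"{giant_files} giant file(s) found")
    if giant_functions > max_giant_functions:
        problems.append(f"{giant_functions} giant functions (limit {max_giant_functions})")
    if quality < min_quality:
        problems.append(f"average quality {quality} is below {min_quality}")
    return problems
```

To use it, load the file written by codeindex tech-debt ./src --format json > debt-data.json with json.load, pass the result to debt_gate, and exit non-zero when the returned list is not empty.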

Key features:

✅ Unified command: Single entry point for all quality checks
✅ Backward compatible: All existing JSON fields preserved
✅ LoomGraph ready: Enhanced summary for knowledge graph integration
✅ Framework-agnostic: Detects test smells across Jest, pytest, JUnit, etc.
✅ KISS design: 90% code reuse, simple regex patterns for test detection

How It Works

Two-Phase Pipeline (v0.23.0)

Phase 1 (Structural):
  Directory → Scanner → Parser (tree-sitter) → SmartWriter → README_AI.md

Phase 2 (AI Enrichment, automatic when ai_command configured):
  README_AI.md → symbol names + file names → AI → one-line description → blockquote injection

Phase 1: Structural generation (always runs)

Scanner — walks directories, filters by config patterns
Parser — extracts symbols (classes, functions, imports, calls, inheritance) via tree-sitter
SmartWriter — generates tiered documentation under the per-file size cap (≤10KB, see ADR-005)
Output — README_AI.md optimized for AI consumption, or JSON for tool integration
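
The Scanner step above (walk directories, filter by config patterns) can be sketched as follows. This is a simplified illustration, assuming glob-style include and exclude patterns matched with Python's fnmatch. The real pattern semantics in .codeindex.yaml may differ, and select_directories is a hypothetical name.

```python
from fnmatch import fnmatch
from pathlib import Path


def select_directories(root: Path, include: list[str], exclude: list[str]) -> list[str]:
    """Return directories under root (relative, POSIX-style) that pass the filters."""
    selected = []
    for path in sorted(root.rglob("*")):
        if not path.is_dir():
            continue
        rel = path.relative_to(root).as_posix()
        # Exclusions win over inclusions in this sketch.
        if any(fnmatch(rel, pattern) for pattern in exclude):
            continue
        if any(fnmatch(rel, pattern) for pattern in include):
            selected.append(rel)
    return selected
```

Note that fnmatch's "*" also matches path separators, so a pattern such as "src/*" selects nested directories as well.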

Phase 2: AI enrichment (auto-enabled when ai_command configured)

Generates a one-line functional description for each non-leaf module
Writes as blockquote: > Membership tier management, points redemption, benefit cards and vouchers
~200-400 tokens per directory, 10-20x cheaper than full AI generation
Parent directories read child descriptions for hierarchical navigation
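
The blockquote injection described above might look roughly like this. The sketch assumes the description belongs directly under the first Markdown heading of README_AI.md and that re-running should replace an existing description rather than stack a second one. Both are assumptions about behavior the text does not spell out, and inject_description is a hypothetical name.

```python
def inject_description(readme_text: str, description: str) -> str:
    """Insert or replace a one-line blockquote right after the first heading."""
    lines = readme_text.splitlines()
    quote = f"> {description.strip()}"
    # Locate the first Markdown heading; -1 means the file has none.
    heading = next((i for i, line in enumerate(lines) if line.startswith("#")), -1)
    insert_at = heading + 1
    # Skip blank lines to see whether a description is already present.
    j = insert_at
    while j < len(lines) and not lines[j].strip():
        j += 1
    if j < len(lines) and lines[j].startswith(">"):
        lines[j] = quote  # replace the old description so reruns stay idempotent
    elif heading == -1:
        lines[0:0] = [quote, ""]
    else:
        lines[insert_at:insert_at] = ["", quote]
    return "\n".join(lines) + "\n"
```

Replacing in place keeps the file stable across repeated scans, which matters when a post-commit hook regenerates indexes on every commit.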

Before vs After: Code Navigation

Before (structural only):
  └── Application/
      ├── Vip/           — 48 files | 386 symbols     ← AI agent cannot determine purpose
      ├── Pay/           — 23 files | 178 symbols
      └── SmallProgramApi/ — 31 files | 245 symbols

After (structural + AI enrichment):
  └── Application/
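      ├── Vip/           — Membership tier management, points redemption, benefit cards and vouchers | 48 files
      ├── Pay/           — Payment gateway (Alipay / WeChat Pay / refunds) | 23 files
      └── SmallProgramApi/ — Mini-program API (login, avatar, products) | 31 files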
                             ↑ AI agent can navigate directly

Three-Repo Architecture (Enterprise Knowledge Graph)

┌────────────────────────────────────────────────────┐
│            Enterprise Intranet Environment          │
├────────────────────────────────────────────────────┤
│                                                    │
│  📦 Code Repository (Git)                          │
│       ↓                                            │
│  🔍 codeindex (Parse Layer)                        │
│       ├── scan --output json → ParseResult         │
│       ├── README_AI.md → architecture docs         │
│       └── tech-debt → comprehensive quality scan   │
│       ↓                                            │
│  🕸️ LoomGraph (Orchestration Layer)                │
│       ├── inject ParseResult                       │
│       ├── generate embeddings                      │
│       └── build knowledge graph                    │
│       ↓                                            │
│  💾 LightRAG (Storage Layer)                       │
│       ├── PostgreSQL (graph data)                  │
│       ├── Vector DB (embeddings)                   │
│       └── Query API (semantic search)              │
│       ↓                                            │
│  💬 AI Agents (Claude Code, Internal Chat)         │
│       └── Natural language code search             │
│                                                    │
└────────────────────────────────────────────────────┘

codeindex role: foundation layer (data collection and parsing). Every downstream layer depends on the structured ParseResult data that codeindex provides.

Documentation

User Guides

Guide	Description
Getting Started	Installation and first scan
Configuration Guide	All config options explained
Advanced Usage	Parallel scanning, custom prompts
Git Hooks Integration	Automated quality checks and doc updates
Claude Code Integration	AI agent setup and MCP skills
JSON Output Integration	Machine-readable output for tools
LoomGraph Integration	Knowledge graph data pipeline

Developer Guides

Guide	Description
CONTRIBUTING.md	Development setup, TDD workflow, code style
CLAUDE.md	Quick reference for Claude Code and contributors
Design Philosophy	Core design principles and architecture
ADR-005	2026-05: navigation-contract disclaimer + size cap, backed by benchmark
Release Automation	5-minute automated release workflow
Multi-Language Support	Adding new language parsers
Language Support Contribution	Template-based test generation for new languages

Evidence & benchmarks

Doc	What it shows
2026-05 README impact benchmark	Measured agent comprehension delta WITH vs WITHOUT `README_AI.md` across 3 heterogeneous projects (15 graded questions). Headline: 19% faster / 28% fewer tokens on average, but speed gains masked quality drops on some detail questions — fix is shipped (see ADR-005).
`bench/`	Reproducible harness (Makefile + python) used to produce the benchmark above; run your own with `cd bench && make setup && make run && make grade && make report`.

Planning

Strategic Roadmap — long-term vision and priorities
Changelog — version history and breaking changes

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

git clone https://github.com/dreamlx/codeindex.git
cd codeindex
pip install -e ".[dev,all]"
make install-hooks
make test

Release Process (Maintainers)

make release VERSION=X.Y.Z
# GitHub Actions: tests → PyPI publish → GitHub Release

See Release Automation Guide for details.

Roadmap

Current version: v0.25.0

Recent milestones:

v0.23.0 — AI-Enhanced Module Descriptions: two-phase pipeline, auto-AI enrichment, post-commit thin wrapper
v0.22.2 — Auto-update CLAUDE.md on pip upgrade, /codeindex-update-guide skill
v0.22.0 — Unified tech-debt + test smells analysis
v0.21.0 — Swift & Objective-C language support
v0.19.0 — TypeScript/JavaScript support with call extraction

Next:

Framework routes expansion: Express, Laravel, FastAPI, Django (Epic 17)
Go, Rust, C# language support

Moved to LoomGraph:

Code similarity search, refactoring suggestions, team collaboration, IDE integration

See Strategic Roadmap for detailed plans.

License

MIT License — see LICENSE file for details.

Acknowledgments

tree-sitter — fast, incremental parsing
Claude CLI — AI integration inspiration
All contributors and users

Support

Questions: GitHub Discussions
Bugs: GitHub Issues
Feature Requests: GitHub Issues

Made with ❤️ by the codeindex team