codeindex

agent
Security Audit
Warning
Health: Warning
  • License — License: MIT
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 5 GitHub stars
Code: Passed
  • Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions: Passed
  • Permissions — No dangerous permissions requested


SUMMARY

AI-readable codebase navigation indexes (README_AI.md) via a deterministic CLI — coding agents navigate faster, with fewer tokens. A/B-benchmarked.

README.md

codeindex


PyPI version
Python 3.10+
License: MIT
Tests

Make AI coding agents navigate your codebase by reading, not grepping.

codeindex is an open-source CLI that turns any codebase into AI-readable navigation indexes (README_AI.md) via a two-phase pipeline — structural indexing (tree-sitter AST) + optional one-line AI module descriptions. Agents browse the README_AI.md hierarchy, see what each module does, and jump straight to the right file — across Python, PHP, Java, TypeScript, JavaScript, Swift, and Objective-C. The measured payoff is efficiency, not magic (benchmark below).

Runs fully offline. Structural indexing needs no AI at all; AI descriptions use your local agent CLI (e.g. claude -p), so no code leaves your network — fine for air-gapped intranets. MIT-licensed, free, and meant to stay that way: codeindex is the open-source navigate layer; the reason/retrieval layer lives in LoomGraph.


Does this actually help an agent? We measured it.

Most "AI code understanding" tools assert value. We A/B-tested ours — and published the unflattering parts.

Across 15 graded navigation questions on 3 heterogeneous real projects, a coding agent with README_AI.md vs without:

  • −28% tokens, −19% wall-time on average — agents reach the right file faster and cheaper.
  • Answer quality is a wash. It does not make answers more correct — the win is efficiency, not capability. (An undisciplined index even hurt a few precise-mechanism questions; fixed in ADR-005.)
  • Smallest win on the largest codebases. On a 250-directory legacy system the token win nearly vanished — a flat index points you to files but can't synthesize cross-module semantics. codeindex is the navigate layer, not the understand-everything layer (pair it with source-reading / Serena for precise mechanism).
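
The percentages above are relative changes against the run without README_AI.md. A minimal sketch of that arithmetic follows; the token counts are made up for illustration and are not the benchmark's data, and the benchmark may aggregate differently (for example over totals rather than per question).

```python
from statistics import mean


def relative_change(with_index: float, without_index: float) -> float:
    """Percent change of the with-index run relative to the baseline run."""
    if without_index == 0:
        raise ValueError("baseline must be non-zero")
    return (with_index - without_index) / without_index * 100.0


# Hypothetical token counts per question: (with README_AI.md, without).
runs = [(7200, 10000), (3600, 5000), (14400, 20000)]
per_question = [relative_change(w, wo) for w, wo in runs]

# A negative value means the indexed run used fewer tokens than the baseline.
print(f"mean token change: {mean(per_question):+.0f}%")
```

The same function applies to wall-time: pass seconds instead of token counts.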

Full data, including the failure cases: 2026-05 benchmark. To reproduce on your own repos, use bench/ (make setup && make run && make grade).

Why publish the parts that don't flatter the tool: a navigation index that quietly degrades answer quality is worse than none. Knowing exactly where it helps — and where to drop to source — is the point.


For LoomGraph Developers: FOR_LOOMGRAPH.md (quick start) | docs/guides/loomgraph-integration.md (full guide)


Features

Core: Code Understanding for AI Agents

  • Two-phase documentation pipeline (v0.23.0) — Phase 1: structural README_AI.md via SmartWriter; Phase 2: AI generates one-line functional descriptions per module. AI agents can browse README_AI.md hierarchy and find the right module without grep.
  • Smart indexing — Tiered documentation (overview → navigation → detailed) optimized for AI agents, ≤10KB per file (navigation index, not a tech doc — see ADR-005)
  • Auto-AI enrichment — When ai_command is configured, scan-all automatically enables AI module descriptions. Use --no-ai to opt out
  • Auto-update hooks — Optional post-commit hook (codeindex hooks install) regenerates README_AI.md for changed directories. Thin wrapper pattern: pipx upgrade ai-codeindex auto-updates hook logic

Parsing & Analysis

  • Multi-language AST parsing — Python, PHP, Java, TypeScript, JavaScript, Swift, Objective-C via tree-sitter; more languages plug in via the extractor API (src/codeindex/extractors/, community-contributed)
  • Call relationship extraction — Function/method call graphs across Python, Java, PHP, TypeScript, JavaScript
  • Inheritance extraction — Class hierarchy and interface relationships
  • Framework route extraction — ThinkPHP and Spring Boot route tables (more planned)
  • Technical debt analysis — Detect large files, god classes, symbol overload, test smells
  • Single file parse: codeindex parse <file> with JSON output for tool integration
  • Structured JSON output: --output json for CI/CD, knowledge graphs, and downstream tools
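
To make "call relationship" and "inheritance" extraction concrete, here is a small sketch for Python source only. It uses the standard-library ast module as a stand-in for tree-sitter, so it is not codeindex's extractor; the function name extract_relations and the returned dictionary shape are hypothetical.

```python
import ast

SOURCE = '''
class Base:
    def save(self):
        validate(self)

class User(Base):
    def rename(self, name):
        self.save()
        log(name)
'''


def extract_relations(source: str) -> dict:
    """Collect class -> base names and function -> called names."""
    tree = ast.parse(source)
    inheritance: dict[str, list[str]] = {}
    calls: dict[str, list[str]] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            # Base classes as written in the source, e.g. "Base" or "pkg.Base".
            inheritance[node.name] = [ast.unparse(base) for base in node.bases]
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Every call expression found anywhere inside this function body.
            calls[node.name] = [
                ast.unparse(inner.func)
                for inner in ast.walk(node)
                if isinstance(inner, ast.Call)
            ]
    return {"inheritance": inheritance, "calls": calls}


relations = extract_relations(SOURCE)
print(relations["inheritance"])
print(relations["calls"])
```

The sketch keys functions by bare name, so two methods with the same name in different classes would collide; a real extractor would need qualified names.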

Developer Experience

  • Adaptive symbol extraction — Dynamic 5–150 symbols per file based on size
  • CLAUDE.md injection: codeindex init injects a codeindex section into your project's CLAUDE.md (never ~/.claude)
  • Claude Code plugin: codeindex:arch / :index / :hooks / :update-guide skills via dreamlx/codeindex-claude
  • Template-based test generation — YAML + Jinja2 for rapid language support (88–91% time savings)
  • Parallel scanning — Concurrent directory processing with configurable workers
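
As an illustration of parallel scanning with a configurable worker count, the sketch below fans per-directory work out over a thread pool. It is a generic pattern, not codeindex's implementation; scan_parallel and its process callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def scan_parallel(
    directories: Iterable[str],
    process: Callable[[str], T],
    workers: int = 4,
) -> dict[str, T]:
    """Run `process` on each directory concurrently, at most `workers` at a time."""
    dirs = list(directories)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(dirs, pool.map(process, dirs)))


# Stand-in workload: the length of the path string instead of a real scan.
print(scan_parallel(["src/auth", "src/api"], len, workers=2))
```

Threads suit I/O-bound work such as reading files; CPU-bound parsing in pure Python would more likely call for processes.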

Use Cases

🏢 Enterprise Intranet (Core Scenario)

Without external tools: When Serena MCP or other cloud-based code intelligence tools are unavailable due to network isolation or security policies, codeindex becomes the primary code understanding tool.

# Enterprise developer workflow
git clone <internal-repo>
codeindex init                       # Configure project
codeindex scan-all                   # Structural + AI descriptions (auto)
# AI agent reads README_AI.md → sees module purposes → navigates directly
# No grep needed for code discovery
codeindex tech-debt src/ --output review.md  # Code quality analysis

Why enterprises choose codeindex:

  • Semantic navigation — AI agents understand module purposes from README_AI.md hierarchy
  • Intranet compatible — no external dependencies, fully offline
  • Self-contained — no upstream MCP servers required
  • Version stable — enterprise-controlled release cycle
  • Data sovereignty — code never leaves internal network

🕸️ Knowledge Graph Integration (LoomGraph)

For enterprise teams: codeindex serves as the core data source for LoomGraph knowledge graphs, enabling semantic code search across the organization.

# Data pipeline
codeindex scan --output json > parse_results.json
loomgraph inject parse_results.json  # Build knowledge graph
# Team can now search code using natural language

Three-repo architecture:

codeindex (Parse)  →  LoomGraph (Orchestrate)  →  LightRAG (Store)
   ↓ ParseResult         ↓ Embeddings              ↓ Semantic Search
   AST extraction        Knowledge Graph           Vector + Graph DB

Without codeindex, LoomGraph cannot function. See LoomGraph Integration Guide.


👤 Personal Developers (Complementary)

With Serena MCP: For individual developers using Claude Code + Serena MCP, codeindex provides complementary value:

  • codeindex (build-time): Semantic architecture map (README_AI.md with module descriptions) + quality analysis
  • Serena (real-time): Precise symbol navigation (find_symbol, find_referencing_symbols)

# Personal developer workflow
codeindex init                    # Setup CLAUDE.md integration
codeindex scan-all                # Structural + AI descriptions (auto)
codeindex hooks install post-commit  # Auto-update on commit
# Claude Code reads README_AI.md → understands module purpose → uses Serena for details

Relationship: codeindex provides the "map with labels," Serena provides the "GPS navigation."


Installation

codeindex is a CLI tool — install it with pipx (isolated, no dependency conflicts):

pipx install ai-codeindex

Claude Code users — also install the companion plugin for skills
(codeindex:arch / :index / :hooks / :update-guide):

/plugin marketplace add dreamlx/codeindex-claude
/plugin install codeindex@codeindex-claude

The plugin is optional and only for Claude Code. The CLI works standalone
in any editor / terminal. See dreamlx/codeindex-claude.

Language parsers

codeindex uses lazy loading — language parsers are imported only when needed.
pipx install ai-codeindex pulls all of them by default. To inject extras into the
pipx environment later, or to install a subset:

pipx inject ai-codeindex tree-sitter-python tree-sitter-typescript   # add to pipx env
# or pin a subset at install time:
pipx install "ai-codeindex[python]"      # python only
pipx install "ai-codeindex[ios]"         # Swift + Objective-C

Alternatives to pipx

pip install --user ai-codeindex          # if you don't have pipx

🇨🇳 China users: if your default mirror (e.g. aliyun) hasn't synced the
latest release yet, install straight from upstream PyPI:

pipx install --index-url https://pypi.org/simple/ ai-codeindex

From Source

git clone https://github.com/dreamlx/codeindex.git
cd codeindex
pip install -e ".[all]"

Quick Start

1. Initialize Your Project

cd /your/project
codeindex init

This creates:

  • .codeindex.yaml — scan configuration (languages, include/exclude patterns)
  • CLAUDE.md — injects codeindex instructions so Claude Code uses README_AI.md automatically
  • CODEINDEX.md — project-level documentation reference

2. Scan Your Codebase

# Scan all directories
# When ai_command is configured → auto Phase 1 (structural) + Phase 2 (AI descriptions)
# Without ai_command → Phase 1 only (structural)
codeindex scan-all

# Structural only (skip AI enrichment)
codeindex scan-all --no-ai

# Scan a single directory
codeindex scan ./src/auth

# Full AI-generated README for a single directory
codeindex scan ./src/auth --ai

# Preview AI prompt without executing
codeindex scan ./src/auth --ai --dry-run

3. Check Status

codeindex status
Indexing Status
───────────────────────────────
✅ src/auth/
✅ src/utils/
⚠️  src/api/ (no README_AI.md)
Indexed: 2/3 (67%)
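
The coverage figure in that output is the share of directories that contain a README_AI.md. A rough sketch of the check, assuming the caller already knows which directories count as scannable (how codeindex selects them is not shown here); index_coverage is a hypothetical helper.

```python
from pathlib import Path


def index_coverage(root: Path, directories: list[str]) -> tuple[int, int, int]:
    """Return (indexed, total, percent) for directories holding a README_AI.md."""
    indexed = sum(1 for d in directories if (root / d / "README_AI.md").is_file())
    total = len(directories)
    percent = round(100 * indexed / total) if total else 0
    return indexed, total, percent


if __name__ == "__main__":
    indexed, total, percent = index_coverage(
        Path("."), ["src/auth", "src/utils", "src/api"]
    )
    print(f"Indexed: {indexed}/{total} ({percent}%)")
```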

4. Generate Indexes

# Global symbol index (PROJECT_SYMBOLS.md)
codeindex symbols

# Module overview (PROJECT_INDEX.md)
codeindex index

# Git change impact analysis
codeindex affected --since HEAD~5
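
One way to think about change impact: take the files changed since a revision (for example from git diff --name-only HEAD~5) and collect the directories whose indexes may be stale. The sketch below also includes ancestor directories, on the assumption that parent indexes summarize their children; that is an assumption, not a description of what codeindex affected computes, and affected_directories is a hypothetical name.

```python
from pathlib import PurePosixPath


def affected_directories(changed_files: list[str]) -> list[str]:
    """Return each changed file's directory and its ancestors, deepest first."""
    dirs: set[str] = set()
    for name in changed_files:
        # .parents walks from the containing directory up to "." (repo root).
        for parent in PurePosixPath(name).parents:
            dirs.add(str(parent))
    # Deepest directories first, so children are refreshed before parents.
    return sorted(dirs, key=lambda d: (-len(PurePosixPath(d).parts), d))


changed = ["src/auth/login.py", "src/api/routes.py"]
print(affected_directories(changed))
```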

More Commands

Command | Description | Guide
codeindex scan --output json | JSON output for tools | JSON Output Guide
codeindex parse <file> | Parse single file to JSON | LoomGraph Integration
codeindex tech-debt ./src | Code quality analysis (debt + test smells) | Enhanced in v0.22.0
codeindex debt-scan ./src | Alias for tech-debt | Backward compatibility
codeindex hooks install | Git hooks for auto-update | Git Hooks Guide
codeindex doctor | Health/sync check (CLI, parsers, CLAUDE.md, plugin) | Read-only diagnostic
codeindex config explain <param> | Parameter help | Configuration Guide

Claude Code Integration

The codeindex plugin gives Claude Code four skills backed by the CLI:

/plugin marketplace add dreamlx/codeindex-claude
/plugin install codeindex@codeindex-claude

Skill | What it does
codeindex:arch | Answer architecture / "where is X" questions from README_AI.md
codeindex:index | Walk you through codeindex init and scan-all
codeindex:hooks | Set up the auto-update post-commit hook
codeindex:update-guide | Refresh the codeindex section in your project's CLAUDE.md

codeindex init also injects a codeindex section into your project's CLAUDE.md
so Claude Code reads README_AI.md files first. (As of v0.25.0, init only
touches project-scoped files — see ADR-006.)

For enterprise users without Serena: README_AI.md and PROJECT_SYMBOLS.md become your primary code navigation tools.

The plugin skills don't replace the codeindex claude-md / codeindex hooks
CLI commands — they orchestrate them. The commands stay first-class for
CLI-only users (Cursor, scripts); the skills add a guided Claude Code UX on top.


Language Support

Language | Status | Since | Key Features
Python | ✅ Supported | v0.1.0 | Classes, functions, methods, imports, docstrings, inheritance, calls
PHP | ✅ Supported | v0.5.0 | Classes (extends/implements), methods, properties, PHPDoc, inheritance, calls
Java | ✅ Supported | v0.7.0 | Classes, interfaces, enums, records, annotations, Spring routes, Lombok, calls
TypeScript/JS | ✅ Supported | v0.19.0 | Classes, interfaces, enums, type aliases, arrow functions, JSX/TSX, imports/exports, calls
Swift | ✅ Supported | v0.21.0 | Classes, structs, enums, protocols, extensions, methods, properties
Objective-C | ✅ Supported | v0.21.0 | Classes, protocols, categories, properties, methods (instance/class)
Go | 📋 Planned | - | Packages, interfaces, struct methods
Rust | 📋 Planned | - | Structs, traits, modules
C# | 📋 Planned | - | Classes, interfaces, .NET projects

Want to add a language? The template-based test system lets you contribute by writing YAML specs — no Python knowledge required. See CONTRIBUTING.md for details.

Framework Route Extraction

Framework | Language | Status
ThinkPHP | PHP | ✅ Stable (v0.5.0)
Spring Boot | Java | ✅ Stable (v0.8.0)
Laravel | PHP | 📋 Planned
FastAPI | Python | 📋 Planned
Django | Python | 📋 Planned
Express.js | JS/TS | 📋 Planned

Code Quality Analysis

tech-debt: Comprehensive Quality Analysis (Enhanced in v0.22.0)

The tech-debt command runs a unified code quality analysis and, as of v0.22.0, also detects test smells:

# JSON output (for LoomGraph integration)
codeindex tech-debt ./src --format json > debt-data.json

# Markdown report (for documentation)
codeindex tech-debt ./src --format markdown > report.md

# Console output (for quick checks)
codeindex tech-debt ./src --format console

# Alias: debt-scan also works (backward compatibility)
codeindex debt-scan ./src --format json

What it detects:

  • 🔴 Super large files (>5000 lines), Large files (>2000 lines)
  • 🔴 God Classes (>50 methods)
  • 🔴 Long methods (>80/150 lines)
  • 🟡 High coupling (>8 internal imports)
  • 🟡 Symbol overload (>100 symbols, high noise ratio)
  • 🧪 Test smells (skipped tests, giant test files) — New in v0.22.0
  • 📊 Quality scoring (0-100 scale per file)
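
The thresholds in this list translate directly into simple comparisons. The sketch below applies only the numbers stated above; the FileMetrics fields and issue labels are hypothetical, and the long-method, noise-ratio, test-smell and scoring rules are left out because their exact definitions are not given here.

```python
from dataclasses import dataclass


@dataclass
class FileMetrics:
    lines: int
    max_class_methods: int  # method count of the largest class in the file
    internal_imports: int
    symbols: int


def find_issues(m: FileMetrics) -> list[str]:
    """Apply the documented thresholds to one file's metrics."""
    issues = []
    if m.lines > 5000:
        issues.append("super_large_file")
    elif m.lines > 2000:
        issues.append("large_file")
    if m.max_class_methods > 50:
        issues.append("god_class")
    if m.internal_imports > 8:
        issues.append("high_coupling")
    if m.symbols > 100:
        issues.append("symbol_overload")
    return issues


print(find_issues(FileMetrics(lines=2500, max_class_methods=60,
                              internal_imports=3, symbols=40)))
```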

Enhanced JSON output (v0.22.0):

{
  "timestamp": "2026-03-06T13:45:39Z",
  "summary": {
    "total_files": 97,
    "giant_files": 0,
    "giant_functions": 3,
    "test_smells": 64,
    "avg_maintainability": 9.9
  },
  "total_files": 97,
  "average_quality_score": 99.4,
  "giant_files": [],
  "giant_functions": [...],
  "test_smells": [
    {
      "path": "tests/test_example.py",
      "type": "skipped_test",
      "details": "Skipped test detected: @pytest.mark.skip at line 42",
      "line_number": 42
    }
  ],
  "file_reports": [...]
}
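
A downstream tool can consume this report with a few lines of Python. The sketch below relies only on the test_smells fields shown above (path, type, line_number); the sample is trimmed to those fields, and the helper names are hypothetical.

```python
import json
from collections import Counter

# Trimmed sample using only fields that appear in the report above.
SAMPLE = """
{
  "summary": {"total_files": 97, "test_smells": 1},
  "test_smells": [
    {
      "path": "tests/test_example.py",
      "type": "skipped_test",
      "details": "Skipped test detected: @pytest.mark.skip at line 42",
      "line_number": 42
    }
  ]
}
"""


def smells_by_type(report: dict) -> Counter:
    """Count test smells per smell type."""
    return Counter(smell["type"] for smell in report.get("test_smells", []))


def smell_locations(report: dict, smell_type: str) -> list[str]:
    """List path:line locations for one smell type."""
    return [
        f'{smell["path"]}:{smell["line_number"]}'
        for smell in report.get("test_smells", [])
        if smell["type"] == smell_type
    ]


report = json.loads(SAMPLE)
print(smells_by_type(report))
print(smell_locations(report, "skipped_test"))
```

For a real run, replace json.loads(SAMPLE) with json.load over the debt-data.json file produced by the --format json command.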

Key features:

  • Unified command: Single entry point for all quality checks
  • Backward compatible: All existing JSON fields preserved
  • LoomGraph ready: Enhanced summary for knowledge graph integration
  • Framework-agnostic: Detects test smells across Jest, pytest, JUnit, etc.
  • KISS design: 90% code reuse, simple regex patterns for test detection

How It Works

Two-Phase Pipeline (v0.23.0)

Phase 1 (Structural):
  Directory → Scanner → Parser (tree-sitter) → SmartWriter → README_AI.md

Phase 2 (AI Enrichment, automatic when ai_command configured):
  README_AI.md → symbol names + file names → AI → one-line description → blockquote injection

Phase 1: Structural generation (always runs)

  1. Scanner — walks directories, filters by config patterns
  2. Parser — extracts symbols (classes, functions, imports, calls, inheritance) via tree-sitter
  3. SmartWriter — generates tiered documentation with size limits (≤10KB per file; see ADR-005)
  4. Output: README_AI.md optimized for AI consumption, or JSON for tool integration
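
The parse-then-write steps above can be sketched in miniature for Python files. This uses the standard-library ast module in place of tree-sitter, and the rendered layout is invented for illustration; it is not the real README_AI.md format or the SmartWriter logic.

```python
import ast


def extract_symbols(source: str) -> list[tuple[str, str]]:
    """Return (kind, name) pairs for top-level classes and functions."""
    symbols = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            symbols.append(("class", node.name))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols.append(("function", node.name))
    return symbols


def render_index(directory: str, files: dict[str, str]) -> str:
    """Render a small navigation index from {filename: source} pairs."""
    lines = [f"# {directory}", ""]
    for filename in sorted(files):
        lines.append(f"## {filename}")
        for kind, name in extract_symbols(files[filename]):
            lines.append(f"- {kind} {name}")
        lines.append("")
    return "\n".join(lines)


example = {"login.py": "class Session:\n    pass\n\ndef login(user):\n    return user\n"}
print(render_index("src/auth", example))
```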

Phase 2: AI enrichment (auto-enabled when ai_command configured)

  • Generates a one-line functional description for each non-leaf module
  • Writes it as a blockquote, e.g. > Membership tier management, points redemption, benefit cards and coupons
  • ~200-400 tokens per directory, 10-20x cheaper than full AI generation
  • Parent directories read child descriptions for hierarchical navigation
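
Blockquote injection can be pictured as a small text edit on an existing README_AI.md. The sketch below places the description directly under the first heading and replaces an earlier description if one is already there; that placement rule is an assumption, and inject_description is a hypothetical helper rather than codeindex's code.

```python
def inject_description(readme: str, description: str) -> str:
    """Insert or replace a one-line blockquote right after the first heading."""
    lines = readme.splitlines()
    quote = f"> {description.strip()}"
    for i, line in enumerate(lines):
        if line.startswith("#"):
            if i + 1 < len(lines) and lines[i + 1].startswith(">"):
                lines[i + 1] = quote  # refresh an existing description
            else:
                lines.insert(i + 1, quote)
            break
    else:
        # No heading found: put the description at the top of the file.
        lines.insert(0, quote)
    return "\n".join(lines) + "\n"


readme = "# Vip\n\n48 files | 386 symbols\n"
print(inject_description(readme, "Membership tier management"))
```

Replacing rather than appending keeps repeated runs from stacking several descriptions under the same heading.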

Before vs After: Code Navigation

Before (structural only):
  └── Application/
      ├── Vip/           — 48 files | 386 symbols     ← AI agent cannot determine purpose
      ├── Pay/           — 23 files | 178 symbols
      └── SmallProgramApi/ — 31 files | 245 symbols

After (structural + AI enrichment):
  └── Application/
      ├── Vip/           — Membership tiers, points redemption, benefit cards and coupons | 48 files
      ├── Pay/           — Payment gateway (Alipay/WeChat/refunds) | 23 files
      └── SmallProgramApi/ — Mini-program API (login, avatar, products) | 31 files
                             ↑ AI agent can navigate directly

Three-Repo Architecture (Enterprise Knowledge Graph)

┌────────────────────────────────────────────────────┐
│            Enterprise Intranet Environment          │
├────────────────────────────────────────────────────┤
│                                                    │
│  📦 Code Repository (Git)                          │
│       ↓                                            │
│  🔍 codeindex (Parse Layer)                        │
│       ├── scan --output json → ParseResult         │
│       ├── README_AI.md → architecture docs         │
│       └── tech-debt → comprehensive quality scan   │
│       ↓                                            │
│  🕸️ LoomGraph (Orchestration Layer)                │
│       ├── inject ParseResult                       │
│       ├── generate embeddings                      │
│       └── build knowledge graph                    │
│       ↓                                            │
│  💾 LightRAG (Storage Layer)                       │
│       ├── PostgreSQL (graph data)                  │
│       ├── Vector DB (embeddings)                   │
│       └── Query API (semantic search)              │
│       ↓                                            │
│  💬 AI Agents (Claude Code, Internal Chat)         │
│       └── Natural language code search             │
│                                                    │
└────────────────────────────────────────────────────┘

codeindex role: Bottom layer (data collection & parsing) — the entire system depends on codeindex providing structured ParseResult data.


Documentation

User Guides

Guide | Description
Getting Started | Installation and first scan
Configuration Guide | All config options explained
Advanced Usage | Parallel scanning, custom prompts
Git Hooks Integration | Automated quality checks and doc updates
Claude Code Integration | AI agent setup and MCP skills
JSON Output Integration | Machine-readable output for tools
LoomGraph Integration | Knowledge graph data pipeline

Developer Guides

Guide | Description
CONTRIBUTING.md | Development setup, TDD workflow, code style
CLAUDE.md | Quick reference for Claude Code and contributors
Design Philosophy | Core design principles and architecture
ADR-005 | 2026-05: navigation-contract disclaimer + size cap, backed by benchmark
Release Automation | 5-minute automated release workflow
Multi-Language Support | Adding new language parsers
Language Support Contribution | Template-based test generation for new languages

Evidence & benchmarks

Doc | What it shows
2026-05 README impact benchmark | Measured agent comprehension delta WITH vs WITHOUT README_AI.md across 3 heterogeneous projects (15 graded questions). Headline: 19% faster / 28% fewer tokens on average, but speed gains masked quality drops on some detail questions; the fix has shipped (see ADR-005).
bench/ | Reproducible harness (Makefile + Python) used to produce the benchmark above; run your own with cd bench && make setup && make run && make grade && make report.

Planning


Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

git clone https://github.com/dreamlx/codeindex.git
cd codeindex
pip install -e ".[dev,all]"
make install-hooks
make test

Release Process (Maintainers)

make release VERSION=0.17.0
# GitHub Actions: tests → PyPI publish → GitHub Release

See Release Automation Guide for details.


Roadmap

Current version: v0.25.0

Recent milestones:

  • v0.23.0 — AI-Enhanced Module Descriptions: two-phase pipeline, auto-AI enrichment, post-commit thin wrapper
  • v0.22.2 — Auto-update CLAUDE.md on pip upgrade, /codeindex-update-guide skill
  • v0.22.0 — Unified tech-debt + test smells analysis
  • v0.21.0 — Swift & Objective-C language support
  • v0.19.0 — TypeScript/JavaScript support with call extraction

Next:

  • Framework routes expansion: Express, Laravel, FastAPI, Django (Epic 17)
  • Go, Rust, C# language support

Moved to LoomGraph:

  • Code similarity search, refactoring suggestions, team collaboration, IDE integration

See Strategic Roadmap for detailed plans.


License

MIT License — see LICENSE file for details.

Acknowledgments

  • tree-sitter — fast, incremental parsing
  • Claude CLI — AI integration inspiration
  • All contributors and users

Support


Made with ❤️ by the codeindex team
