# codebase-map

A lightweight TypeScript/JavaScript code indexer that generates comprehensive project maps optimized for LLMs like Claude.

## Features

- **AST-based analysis** - Accurate extraction of functions, classes, and constants
- **Dependency resolution** - Tracks imports/exports and builds a complete dependency graph
- **Multiple output formats** - Optimized for different project sizes and use cases
- **LLM-optimized** - Formats designed to minimize token usage while preserving structure
- **Fast incremental updates** - Update individual files without full re-scan
- **Works from any directory** - Automatically finds project root
- **Flexible file filtering** - Include/exclude patterns using glob syntax for precise control
- **Performance optimized** - Built-in pattern caching and analysis tools

## Installation

```bash
npm install -g codebase-map
```

Or add to your project:

```bash
npm install --save-dev codebase-map
```

## Quick Start

```bash
# Generate index for your project
codebase-map scan

# Output formatted structure to stdout
codebase-map format

# Copy to clipboard (macOS)
codebase-map format | pbcopy
```

## Commands

### `scan`

Analyzes your codebase and generates a `.codebasemap` file.

```bash
codebase-map scan [options]

Options:
  -r, --root <path>       Root directory to scan (default: auto-detect)
  -o, --output <path>     Output file path (default: .codebasemap)
  -v, --verbose           Show detailed progress
  --include <patterns>    Include file patterns (glob syntax)
  --exclude <patterns>    Exclude file patterns (glob syntax)
```

### `format`

Formats the index for LLM consumption (outputs to stdout).

```bash
codebase-map format [options]

Options:
  -f, --format <type>        Output format: auto|json|dsl|graph|markdown|tree
  -s, --stats                Show statistics to stderr
  --include <patterns...>    Include file patterns (glob syntax)
  --exclude <patterns...>    Exclude file patterns (glob syntax)
```

### `update`

Updates the index for a specific file.
```bash
codebase-map update <file> [options]

Options:
  -r, --root <path>    Root directory
```

### `list`

Lists files in the index with various filters.

```bash
codebase-map list [options]

Options:
  -d, --deps       Show files with most dependencies
  -e, --entries    Show entry point files
  -l, --leaves     Show leaf files (no dependencies)
```

## Pattern Support

Control which files are analyzed using powerful glob patterns:

### Basic Usage

```bash
# Include only specific directories
codebase-map scan --include "src/**" --include "lib/**"

# Exclude test files and documentation
codebase-map scan --exclude "**/*.test.ts" --exclude "**/*.spec.ts" --exclude "docs/**"

# Combine include and exclude patterns
codebase-map scan --include "src/**" --exclude "**/*.test.ts"
```

### Advanced Examples

```bash
# Focus on specific packages in a monorepo
codebase-map scan --include "packages/*/src/**" --exclude "**/*.test.ts"

# Analyze only TypeScript files, exclude build outputs
codebase-map scan --include "**/*.ts" --include "**/*.tsx" --exclude "dist/**" --exclude "build/**"

# Complex filtering with multiple criteria
codebase-map scan \
  --include "src/**" \
  --include "lib/**" \
  --exclude "**/*.test.ts" \
  --exclude "**/*.spec.ts" \
  --exclude "**/fixtures/**" \
  --exclude "**/mocks/**"
```

### Pattern Syntax

| Pattern | Description | Example |
|---------|-------------|---------|
| `**` | Match any number of directories | `src/**` matches all files in src and subdirectories |
| `*` | Match any characters except `/` | `*.ts` matches all TypeScript files |
| `?` | Match single character | `test?.ts` matches `test1.ts`, `testa.ts` |
| `[abc]` | Match any character in brackets | `test[123].ts` matches `test1.ts`, `test2.ts`, `test3.ts` |
| `{a,b}` | Match any of the alternatives | `**/*.{ts,js}` matches all TypeScript and JavaScript files |

### Common Use Cases

#### Monorepo Analysis

```bash
# Analyze specific packages
codebase-map scan --include "packages/core/**" --include "packages/utils/**"

# Exclude all test files across packages
codebase-map scan --include "packages/*/src/**" --exclude "**/*.{test,spec}.{ts,js}"
```

#### Test File Filtering

```bash
# Exclude all test-related files
codebase-map scan --exclude "**/*.{test,spec}.{ts,tsx,js,jsx}" --exclude "**/tests/**" --exclude "**/__tests__/**"

# Include only test files for test coverage analysis
codebase-map scan --include "**/*.{test,spec}.{ts,tsx,js,jsx}"
```

#### Library Development

```bash
# Focus on source code, exclude examples and documentation
codebase-map scan --include "src/**" --exclude "examples/**" --exclude "docs/**"

# Include only public API files
codebase-map scan --include "src/public/**" --include "src/index.ts"
```

### Pattern Performance Tips

- **Use specific patterns**: `src/**/*.ts` is faster than `**/*.ts` for large codebases
- **Order matters**: Place more restrictive include patterns first
- **Avoid overly broad excludes**: Specific exclusions perform better than `**/*`
- **Cache benefits**: Repeated pattern usage is automatically optimized

### Pattern Analysis

Use `--verbose` mode to see pattern effectiveness:

```bash
codebase-map scan --include "src/**" --exclude "**/*.test.ts" --verbose
```

Output includes:

- Pattern match statistics
- Performance metrics
- Warnings for ineffective patterns
- Suggestions for optimization

## Output Formats

The tool automatically selects the best format based on project size, or you can specify one:

| Format | Description | Best For | Token Reduction |
|--------|-------------|----------|-----------------|
| `tree` | ASCII art directory structure | Structure visualization (any size) | ~97% |
| `dsl` | Domain-specific language | Most projects (≤5000 files) | ~90% |
| `graph` | Dependency graph with signatures | Very large projects (>5000 files) | ~92% |
| `markdown` | Human-readable markdown | Documentation | ~93% |
| `json` | Compact JSON | Baseline | 0% |

### Format Selection Guide

Choose the right format for your use case:

#### Tree Format (`tree`)
**Best for:** Visualizing project structure, understanding file organization

- **Token reduction:** 97% (most efficient for structure)
- **Readability:** Excellent visual hierarchy
- **Use cases:** Understanding project layout, presenting structure to stakeholders, exploring unfamiliar codebases
- **Selection:** Manual only - complements other formats by showing structure rather than code content

```bash
# Ideal for understanding project structure, exploring new codebases
codebase-map format --format tree
```

#### DSL Format (`dsl`)

**Best for:** Most development workflows, AI context

- **Token reduction:** 90% (excellent balance)
- **Readability:** High - shows functions, classes, dependencies clearly
- **Use cases:** Code analysis, AI assistance, dependency tracking
- **Automatic selection:** Projects with ≤5000 files (default choice)

```bash
# Perfect for typical applications and libraries
codebase-map format --format dsl
```

#### Graph Format (`graph`)

**Best for:** Large codebases, dependency analysis

- **Token reduction:** 92% (maximum compression for content)
- **Readability:** Good - focuses on relationships
- **Use cases:** Large projects, dependency debugging, architecture analysis
- **Automatic selection:** Projects with >5000 files

```bash
# Essential for large monorepos and enterprise applications
codebase-map format --format graph
```

#### Markdown Format (`markdown`)

**Best for:** Documentation, reports, human reading

- **Token reduction:** 93% (great for docs)
- **Readability:** Excellent - formatted for humans
- **Use cases:** Project documentation, README generation, reports
- **Manual selection only**

```bash
# Great for generating project documentation
codebase-map format --format markdown > PROJECT_STRUCTURE.md
```

### Project Size Examples

```bash
# Understanding any project's structure
codebase-map format --format tree
# └── Shows clear visual hierarchy

# Typical web application (100-500 files)
codebase-map format --format dsl
# └── Shows functions, dependencies, classes compactly

# Large enterprise application (1000-5000 files)
codebase-map format --format dsl --stats
# └── Monitor token usage with --stats

# Massive monorepo (>5000 files)
codebase-map format --format graph
# └── Maximum compression for context limits
```

### Token Budget Planning

**Recommended token allocations for AI context:**

- **Small projects (<100 files):** DSL format, ~2,100 tokens
- **Medium projects (100-1000 files):** DSL format, ~2,100-21,000 tokens
- **Large projects (1000-2000 files):** DSL format, ~21,000-42,000 tokens ⚠️
- **Very large projects (2000-5000 files):** DSL format, approaching context limits
- **Enterprise projects (>5000 files):** Graph format, auto-switches for maximum efficiency
- **Structure visualization (any size):** Tree format, manual choice for exploring layout

**Context window usage guidelines:**

- Keep project structure under 25% of available context
- Use `--stats` flag to monitor token usage
- Consider using pattern filtering for large projects

## Filtering on Format

The `format` command supports filtering the already-generated index without needing to re-scan.
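To make the idea of filtering an existing index concrete, here is a minimal TypeScript sketch that applies include/exclude globs to a list of already-indexed paths. It is not codebase-map's implementation: `globToRegExp`, `filterPaths`, and the sample paths are hypothetical, only the `**`, `*`, `?`, and `{a,b}` forms are handled, and the rule that an exclude match overrides an include match is an assumption.

```typescript
// Hypothetical sketch: converts a small glob subset (**, *, ?, {a,b}) to a RegExp.
function globToRegExp(glob: string): RegExp {
  let out = "";
  let braceDepth = 0;
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*" && glob.charAt(i + 1) === "*") {
      if (glob.charAt(i + 2) === "/") {
        out += "(?:.*/)?"; // "**/" spans zero or more directories
        i += 2;
      } else {
        out += ".*"; // trailing "**" matches everything below
        i += 1;
      }
    } else if (ch === "*") {
      out += "[^/]*"; // "*" stays within one path segment
    } else if (ch === "?") {
      out += "[^/]"; // "?" matches a single non-separator character
    } else if (ch === "{") {
      braceDepth++;
      out += "(?:";
    } else if (ch === "}" && braceDepth > 0) {
      braceDepth--;
      out += ")";
    } else if (ch === "," && braceDepth > 0) {
      out += "|"; // commas separate alternatives only inside braces
    } else {
      out += ch.replace(/[.+^$()|[\]\\{}]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${out}$`);
}

// Assumed semantics: keep a path if it matches any include pattern
// (or no include patterns were given) and matches no exclude pattern.
function filterPaths(paths: string[], include: string[], exclude: string[]): string[] {
  const inc = include.map(globToRegExp);
  const exc = exclude.map(globToRegExp);
  return paths.filter(
    (p) => (inc.length === 0 || inc.some((r) => r.test(p))) && !exc.some((r) => r.test(p)),
  );
}

// Stand-in for the file list stored in an index produced by one scan.
const indexedPaths = [
  "src/index.ts",
  "src/utils/helpers.ts",
  "src/utils/helpers.test.ts",
  "docs/guide.md",
];

// Two different views derived from the same list, with no re-scan.
console.log(filterPaths(indexedPaths, ["src/**"], ["**/*.test.ts"]));
console.log(filterPaths(indexedPaths, ["docs/**"], []));
```

The file list is read once and each view is a cheap in-memory filter over it, which is why the filesystem does not need to be touched again.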
Filtering at format time enables powerful workflows:

### Basic Filtering

```bash
# Generate index once
codebase-map scan

# Format different views without re-scanning
codebase-map format --include "src/**" --exclude "**/*.test.ts"
codebase-map format --include "docs/**" --format markdown
codebase-map format --include "packages/core/**" --format dsl
```

### Workflow Benefits

**Scan once, format many times:**

- Generate comprehensive index: `codebase-map scan`
- Create focused views: `codebase-map format --include "src/components/**"`
- Exclude test files: `codebase-map format --exclude "**/*.{test,spec}.ts"`
- Focus on specific packages: `codebase-map format --include "packages/utils/**"`

**Performance advantages:**

- No file system scanning on format (instant filtering)
- Apply different filters to same index data
- Combine with any output format (`--format dsl`, `--format tree`, etc.)

### Filtering Examples

#### Focus on Source Code

```bash
# Show only source files, exclude tests and build outputs
codebase-map format --include "src/**" --exclude "**/*.test.ts" --exclude "dist/**"
```

#### Monorepo Package Analysis

```bash
# Focus on specific packages
codebase-map format --include "packages/core/**" --include "packages/utils/**"

# Analyze one package in detail
codebase-map format --include "packages/ui/src/**" --format dsl
```

#### Documentation Focus

```bash
# Extract documentation structure
codebase-map format --include "docs/**" --include "*.md" --format tree

# Get markdown files in readable format
codebase-map format --include "**/*.md" --format markdown
```

#### Component Analysis

```bash
# Focus on React components
codebase-map format --include "**/*.{tsx,jsx}" --exclude "**/*.test.*"

# Analyze utility functions
codebase-map format --include "**/utils/**" --include "**/helpers/**"
```

### Filtering Statistics

With `--stats`, the format command reports filtering impact on stderr, so stdout is unaffected:

```bash
codebase-map format --include "src/**" --exclude "**/*.test.ts" --stats

# Output to stderr:
# --- Filtering Applied ---
# Files: 456 of 1,234
# Dependencies: 980 of 2,100
# Include patterns: src/**
# Exclude patterns: **/*.test.ts
#
# --- Statistics (dsl format) ---
# Size: 45.2 KB (89% reduction)
# Tokens: ~12,450 (27 per file)
# Files: 456
```

### Advanced Filtering Patterns

```bash
# Complex monorepo filtering
codebase-map format \
  --include "packages/*/src/**" \
  --include "shared/**" \
  --exclude "**/*.{test,spec,mock}.{ts,tsx,js,jsx}" \
  --exclude "**/fixtures/**" \
  --exclude "**/__tests__/**"

# TypeScript-only analysis
codebase-map format \
  --include "**/*.{ts,tsx}" \
  --exclude "**/*.d.ts" \
  --exclude "node_modules/**"

# API-focused view
codebase-map format \
  --include "src/api/**" \
  --include "src/routes/**" \
  --include "src/middleware/**" \
  --format graph
```

## Integration with Claude

### Using with Claude Code Hooks

Configure hooks in `.claude/settings.json` to automatically include project structure when starting a session:

```json
{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          {"type": "command", "command": "codebase-map format"}
        ]
      }
    ]
  }
}
```

### Direct Usage

```bash
# Generate and copy structure to clipboard
codebase-map format | pbcopy

# Then paste into Claude conversation
```

## Example Output

### Tree Format (Structure Visualization)

```
project/
├── src/
│   ├── components/
│   │   ├── Button.tsx
│   │   └── Input.tsx
│   ├── utils/
│   │   └── helpers.ts
│   ├── index.ts
│   └── types.ts
├── tests/
│   └── setup.ts
└── package.json
```

### DSL Format (Most Projects)

```
src/core/dependency-resolver.ts > types/index.ts
  cl DependencyResolver(9m,2p)

src/core/index-formatter.ts > types/index.ts
  fn toMinifiedJSON(index:ProjectIndex):string
  fn toDSL(index:ProjectIndex):string
  fn toGraph(index:ProjectIndex):string
  fn formatAuto(index:ProjectIndex):{ format: FormatType; content: string }

src/utils/find-project-root.ts >
  fn findProjectRoot(startDir:string):string | null
  fn findIndexFile(startDir:string):string | null
```

## Performance

- Processes ~400 files/second
- Generates ~3 tokens/file in tree format (97% reduction)
- Generates ~29 tokens/file in DSL format (90% reduction)
- Generates ~24 tokens/file in graph format (92% reduction)
- Generates ~23 tokens/file in markdown format (93% reduction)
- 90-97% token reduction vs compact JSON

## Requirements

- Node.js ≥ 18.0.0
- TypeScript/JavaScript project

## License

MIT

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Links

- [GitHub Repository](https://github.com/carlrannaberg/codebase-map)
- [Issue Tracker](https://github.com/carlrannaberg/codebase-map/issues)