# AI Agent Guide - Smart Code Index
## Overview
This is a code indexing and search system designed to help AI agents navigate and understand codebases. Unlike plain text search, it uses AST (Abstract Syntax Tree) analysis, symbol tracking, and neural embeddings to provide context-aware code discovery.
## Key Capabilities
### 1. **Deep Code Understanding**
- **AST-Based Analysis**: Parses JavaScript/TypeScript code into structured representations
- **Symbol Extraction**: Identifies functions, classes, variables with full metadata (parameters, complexity, async/sync, etc.)
- **Relationship Tracking**: Maps imports, exports, inheritance, and function calls
- **Semantic Chunking**: Splits code by logical boundaries (functions, classes) rather than arbitrary line counts
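The idea behind semantic chunking can be illustrated with a small sketch. This is an analogy only: it uses Python's standard `ast` module on Python source, whereas the tool itself parses JavaScript/TypeScript and its actual chunker is not shown in this guide. The function name `semantic_chunks` and the chunk fields are hypothetical, loosely modeled on the query output described later.

```python
import ast

def semantic_chunks(source: str) -> list[dict]:
    """Split source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            chunks.append({
                "symbol": f"{kind}:{node.name}",
                # 1-based, inclusive line range of the definition
                "lines": [node.lineno, node.end_lineno],
                "async": isinstance(node, ast.AsyncFunctionDef),
                "text": "\n".join(lines[node.lineno - 1:node.end_lineno]),
            })
    return chunks

SAMPLE = '''import os

async def fetch(url):
    return url

class User:
    def name(self):
        return "n"
'''

for chunk in semantic_chunks(SAMPLE):
    print(chunk["symbol"], chunk["lines"])
```

Each chunk follows a definition boundary, so a function is never cut in half the way a fixed line-count window might cut it.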
### 2. **Intelligent Search**
- **Hybrid Search**: Combines vector similarity with lexical matching for optimal results
- **Context-Aware Embeddings**: Embeddings include file context, symbol relationships, and usage patterns
- **Intent Detection**: Automatically identifies if you're looking for definitions, implementations, usage examples, or relationships
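As a rough illustration of hybrid search, the sketch below blends a cosine similarity over embedding vectors with a simple token-overlap score. The blending formula, the `alpha` weight, and all function names are assumptions made for this example; the guide does not document how the tool actually ranks results.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def lexical_overlap(query: str, text: str) -> float:
    """Fraction of query tokens that also appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def hybrid_score(query_vec, doc_vec, query, text, alpha=0.7) -> float:
    """Weighted blend: alpha for vector similarity, the rest for lexical match."""
    return alpha * cosine(query_vec, doc_vec) + (1 - alpha) * lexical_overlap(query, text)

score = hybrid_score([1.0, 0.0], [1.0, 0.0], "auth middleware", "the auth layer", alpha=0.5)
print(round(score, 2))
```

A blend like this lets an exact identifier match lift a result that the embedding alone would rank lower, and the reverse.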
### 3. **Continuous Monitoring**
- **File Watching**: Automatically detects changes and reindexes modified files
- **Debounced Processing**: Waits for file changes to stabilize before reindexing (default: 2-3 seconds)
- **Incremental Updates**: Only reprocesses changed files, maintaining index performance
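Debounced processing can be sketched as follows. This is a minimal model, not the tool's watcher: the `Debouncer` class is hypothetical, and timestamps are passed in explicitly so the logic can be followed without a real file system or timer.

```python
class Debouncer:
    """Holds changed paths until no new change has arrived for `delay` seconds."""

    def __init__(self, delay: float = 2.0):
        self.delay = delay
        self.pending: dict[str, float] = {}  # path -> time of most recent change

    def file_changed(self, path: str, now: float) -> None:
        # A repeated change to the same file restarts its waiting period.
        self.pending[path] = now

    def ready(self, now: float) -> list[str]:
        """Return and clear the paths that have been quiet for at least `delay`."""
        done = sorted(p for p, t in self.pending.items() if now - t >= self.delay)
        for p in done:
            del self.pending[p]
        return done

d = Debouncer(delay=2.0)
d.file_changed("a.js", now=0.0)
d.file_changed("b.js", now=0.5)
d.file_changed("a.js", now=1.5)   # a.js is saved again, so its timer restarts
print(d.ready(now=2.6))           # only b.js has been stable long enough
print(d.ready(now=3.5))           # now a.js is stable too
```

Only the paths returned by `ready` would be reindexed, which matches the incremental behavior described above.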
## Installation
```bash
npm install ai-index
```
A local install does not put `ai-index` on your `PATH`. Either run the commands below through `npx` (for example `npx ai-index smart-index`) or install globally with `npm install -g ai-index`.
## Usage Patterns
### Basic Indexing
```bash
# Index current directory
ai-index smart-index
# Index with continuous monitoring
ai-index smart-index --watch
# Index from specific entry points (follows imports)
ai-index smart-index -e src/index.js -e src/api.js
```
### Searching Code
```bash
# Natural language search
ai-index smart-query --q "authentication middleware"
# Symbol-specific search
ai-index smart-query --q "User" --symbol --exact
# Get more results
ai-index smart-query --q "database connection" --k 20
```
## Output Format for AI Agents
The query system returns structured JSON optimized for AI consumption:
```json
{
"query": "authentication middleware",
"total_results": 15,
"intent": {
"looking_for_definition": false,
"looking_for_implementation": true,
"looking_for_usage": false,
"looking_for_relationship": false,
"looking_for_flow": false
},
"results": [
{
"path": "src/middleware/auth.js",
"relevance": 0.92,
"matches": [
{
"lines": [15, 45],
"type": "symbol",
"symbol": "function:authenticateUser",
"params": ["req", "res", "next"],
"async": true,
"used_by": ["src/routes/api.js", "src/routes/admin.js"]
},
{
"lines": [50, 75],
"type": "symbol",
"symbol": "function:validateToken",
"params": ["token"],
"async": false
}
]
}
],
"navigation": {
"entry_files": ["src/index.js", "src/app.js"],
"key_symbols": [
{
"name": "authenticateUser",
"type": "function",
"primary_file": "src/middleware/auth.js"
}
],
"suggested_order": [
"src/types/auth.d.ts",
"src/middleware/auth.js",
"src/routes/api.js"
]
}
}
```
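A minimal sketch of consuming this response from Python follows. Two things are assumptions: that the CLI writes the JSON document to stdout and nothing else, and that `lines` is a `[start, end]` pair. The helper names `run_query` and `flatten_matches` are hypothetical.

```python
import json
import subprocess

def run_query(query: str) -> dict:
    """Run a query and parse the response.

    Assumption: the CLI prints only the JSON document to stdout.
    """
    proc = subprocess.run(
        ["ai-index", "smart-query", "--q", query],
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)

def flatten_matches(response: dict) -> list[tuple[str, str, int, int]]:
    """Return one (path, symbol, start_line, end_line) row per match."""
    rows = []
    for result in response.get("results", []):
        for match in result.get("matches", []):
            start, end = match["lines"]  # assumed to be [start, end]
            rows.append((result["path"], match.get("symbol", ""), start, end))
    return rows

# Trimmed copy of the example response above.
SAMPLE_RESPONSE = json.loads("""
{
  "results": [
    {
      "path": "src/middleware/auth.js",
      "relevance": 0.92,
      "matches": [
        {"lines": [15, 45], "type": "symbol", "symbol": "function:authenticateUser"},
        {"lines": [50, 75], "type": "symbol", "symbol": "function:validateToken"}
      ]
    }
  ]
}
""")

for row in flatten_matches(SAMPLE_RESPONSE):
    print(row)
```

Flattening to rows makes it easy to open each file at the reported line range rather than reading whole files.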
## Understanding the Output
### Result Structure
Each result contains:
- **path**: File location
- **relevance**: Score indicating match quality (higher is better)
- **matches**: Array of specific code locations within the file
### Match Types
- **symbol**: A function, class, or variable definition
- **imports**: Import statements block
- **exports**: Export statements block
### Symbol Information
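For symbol matches, you get:
- **name**: Symbol identifier (in the example above, the part of `symbol` after the colon, such as `authenticateUser`)
- **type**: One of function, class, variable, or arrow_function (in the example above, the part of `symbol` before the colon)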
- **params**: Function parameters (if applicable)
- **async**: Whether the function is async
- **methods**: Class methods (for classes)
- **used_by**: Files that import/use this symbol
- **complexity**: Cyclomatic complexity score
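One way to work with these fields is to load each match into a typed record. The sketch below assumes the `symbol` string has the form `type:name`, as in the example response, and treats every field other than `symbol` and `lines` as optional. `SymbolMatch` and `parse_symbol_match` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SymbolMatch:
    kind: str                      # function, class, variable, arrow_function
    name: str
    lines: tuple[int, int]
    params: list[str] = field(default_factory=list)
    is_async: bool = False
    used_by: list[str] = field(default_factory=list)
    complexity: Optional[int] = None

def parse_symbol_match(match: dict) -> SymbolMatch:
    """Build a SymbolMatch from one entry of a result's `matches` array."""
    # Assumption: `symbol` looks like "function:authenticateUser".
    kind, _, name = match["symbol"].partition(":")
    start, end = match["lines"]
    return SymbolMatch(
        kind=kind,
        name=name,
        lines=(start, end),
        params=list(match.get("params", [])),
        is_async=bool(match.get("async", False)),
        used_by=list(match.get("used_by", [])),
        complexity=match.get("complexity"),
    )

example = parse_symbol_match({
    "lines": [15, 45],
    "type": "symbol",
    "symbol": "function:authenticateUser",
    "params": ["req", "res", "next"],
    "async": True,
    "used_by": ["src/routes/api.js", "src/routes/admin.js"],
})
print(example.kind, example.name, example.params)
```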
### Navigation Hints
The system provides navigation suggestions:
- **entry_files**: Main entry points to start exploring
- **key_symbols**: Most relevant symbols for your query
- **suggested_order**: Recommended file exploration sequence
## Advanced Features
### 1. Symbol Relationships
The index tracks:
- **Import chains**: What each file imports and from where
- **Export mappings**: What each file exports and who consumes it
- **Call graphs**: Function call relationships
- **Inheritance**: Class extension relationships
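Relationship data from a query response can be inverted to answer "which symbols does this file depend on?". The sketch below relies only on the `used_by` field shown in the example response; the function name `symbols_used_by_file` is hypothetical, and the fuller import, call, and inheritance graphs are not modeled here.

```python
from collections import defaultdict

def symbols_used_by_file(response: dict) -> dict[str, list[str]]:
    """Map each consuming file to the symbols it uses, based on `used_by`."""
    usage = defaultdict(list)
    for result in response.get("results", []):
        for match in result.get("matches", []):
            for consumer in match.get("used_by", []):
                usage[consumer].append(match["symbol"])
    return dict(usage)

SAMPLE_RESPONSE = {
    "results": [
        {
            "path": "src/middleware/auth.js",
            "matches": [
                {
                    "symbol": "function:authenticateUser",
                    "used_by": ["src/routes/api.js", "src/routes/admin.js"],
                },
                {"symbol": "function:validateToken"},
            ],
        }
    ]
}

for consumer, symbols in symbols_used_by_file(SAMPLE_RESPONSE).items():
    print(consumer, "->", symbols)
```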
### 2. Code Complexity Analysis
Each function/method is analyzed for:
- Cyclomatic complexity
- Parameter count
- Async/sync nature
- Whether it is a generator function
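For readers unfamiliar with cyclomatic complexity, here is a common approximation: start at 1 and add one for every decision point. The sketch applies it to Python source with the standard `ast` module purely as an illustration; the tool analyzes JavaScript/TypeScript, and its exact counting rules are not documented in this guide.

```python
import ast

# Node types that each add one independent path through the code.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)

def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity: 1 plus the number of decision points."""
    score = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra short-circuit paths.
            score += len(node.values) - 1
    return score

SAMPLE = '''
def check(token, roles):
    if not token:
        return False
    for role in roles:
        if role == "admin" and token:
            return True
    return False
'''

print(cyclomatic_complexity(SAMPLE))
```

The sample contains two `if` statements, one `for` loop, and one `and`, so the approximation yields 5.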
### 3. Semantic Understanding
The system understands:
- **File areas**: backend, frontend, utils, types, tests
- **Code patterns**: Middleware, components, API routes, utilities
- **Architecture**: Entry points, module boundaries, service layers
## Best Practices for AI Agents
### 1. Start with Entry Points
When exploring a new codebase:
```bash
ai-index smart-query --q "index main app entry point"
```
### 2. Find Type Definitions First
Understanding types helps comprehend the codebase:
```bash
ai-index smart-query --q "User Task type interface" --symbol
```
### 3. Follow Import Chains
To understand dependencies:
```bash
ai-index smart-query --q "imports from auth middleware"
```
### 4. Use Intent-Based Queries
Be specific about what you're looking for:
- "How is X implemented?" - for implementation details
- "Where is X used?" - for usage examples
- "What is the X type?" - for type definitions
- "How does X relate to Y?" - for relationships
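To show how phrasing could map to the `intent` flags in the response, here is a simple keyword heuristic. It is an assumption for illustration only: the tool's real intent detector is not described in this guide, and the keyword lists below are invented.

```python
# Hypothetical keyword lists, keyed by the intent flags from the example response.
INTENT_KEYWORDS = {
    "looking_for_definition": ("definition", "type", "interface", "what is"),
    "looking_for_implementation": ("implement", "how is"),
    "looking_for_usage": ("used", "usage", "using", "calling"),
    "looking_for_relationship": ("imports", "exports", "relate", "inheritance"),
    "looking_for_flow": ("flow",),
}

def detect_intent(query: str) -> dict[str, bool]:
    """Flag each intent whose keywords appear in the lowercased query."""
    q = query.lower()
    return {
        intent: any(keyword in q for keyword in keywords)
        for intent, keywords in INTENT_KEYWORDS.items()
    }

for example in ("How is login implemented?", "where is validateToken used"):
    flags = detect_intent(example)
    print(example, "->", [name for name, on in flags.items() if on])
```

Whatever the real detector does, the practical advice is the same: include a word that signals the kind of answer you want.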
### 5. Leverage Navigation Hints
The `suggested_order` field lists files in a recommended reading order:
1. Type definitions first (understand the data)
2. Entry points second (understand the flow)
3. Implementation details last
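The three-step ordering above can be reproduced client-side when you have a list of paths but no `suggested_order`. This is a sketch under stated assumptions: type definitions are recognized by a `.d.ts` suffix or a `/types/` path segment, which is a guess based on the example paths, and the function names are hypothetical.

```python
def exploration_rank(path: str, entry_files: set[str]) -> int:
    """0 for type definitions, 1 for entry points, 2 for everything else."""
    if path.endswith(".d.ts") or "/types/" in path:
        return 0
    if path in entry_files:
        return 1
    return 2

def order_for_reading(paths: list[str], entry_files: list[str]) -> list[str]:
    """Sort paths by rank; the sort is stable, so ties keep their input order."""
    entries = set(entry_files)
    return sorted(paths, key=lambda p: exploration_rank(p, entries))

paths = [
    "src/routes/api.js",
    "src/index.js",
    "src/types/auth.d.ts",
    "src/middleware/auth.js",
]
for path in order_for_reading(paths, entry_files=["src/index.js"]):
    print(path)
```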
## Query Examples
### Finding Features
```
"user authentication flow"
"payment processing implementation"
"real-time sync with WebRTC"
```
### Finding Definitions
```
"User class definition"
"AuthContext type"
"database schema models"
```
### Finding Usage
```
"where is validateToken used"
"components using UserContext"
"API endpoints calling database"
```
### Finding Relationships
```
"what imports the auth module"
"exports from utils folder"
"inheritance hierarchy for BaseController"
```
## Performance Considerations
1. **Initial Indexing**: First run processes all files (30-60 seconds for medium codebases)
2. **Incremental Updates**: Subsequent runs only process changes (2-5 seconds)
3. **Query Speed**: Vector search returns results in <100ms
4. **Memory Usage**: ~100MB for embedding model + vector store
## Troubleshooting
### Index Not Found
```bash
# Rebuild index from scratch
ai-index smart-index --force
```
### Incomplete Results
```bash
# Increase result count
ai-index smart-query --q "your query" --k 30
```
### Monitoring Not Working
```bash
# Check file watcher status
ai-index smart-index --watch --verbose
```
## Integration Tips
### For AI Assistants
1. **Parse JSON output** directly - it's designed for programmatic consumption
2. **Use intent detection** to understand what the user is looking for
3. **Follow suggested navigation** for efficient code exploration
4. **Combine multiple queries** to build complete understanding
5. **Track symbols** across files using the relationship data
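Combining multiple queries usually means merging their result lists. A minimal sketch, assuming each response has the `results` shape shown earlier: keep one entry per file and prefer the highest `relevance`. The `merge_results` helper is hypothetical, and comparing relevance scores across different queries is itself an assumption.

```python
def merge_results(*responses: dict) -> list[dict]:
    """Merge results from several responses, keeping the best-scoring entry per path."""
    best: dict[str, dict] = {}
    for response in responses:
        for result in response.get("results", []):
            current = best.get(result["path"])
            if current is None or result["relevance"] > current["relevance"]:
                best[result["path"]] = result
    return sorted(best.values(), key=lambda r: r["relevance"], reverse=True)

first = {"results": [{"path": "x.js", "relevance": 0.5}, {"path": "y.js", "relevance": 0.9}]}
second = {"results": [{"path": "x.js", "relevance": 0.8}]}

for result in merge_results(first, second):
    print(result["path"], result["relevance"])
```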
### For Development Tools
1. **Monitor changes** with `--watch` flag for real-time updates
2. **Use entry points** for targeted indexing of specific modules
3. **Export code graphs** for visualization tools
4. **Leverage complexity scores** for code quality metrics
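This guide does not show a graph export command, so the sketch below builds a graph on the client side instead. It converts the `used_by` data from a query response into Graphviz DOT text, with one edge from each consuming file to the file that defines the symbol. The `to_dot` function is hypothetical, and paths containing double quotes are not escaped.

```python
def to_dot(response: dict) -> str:
    """Render file-level 'uses' edges from a query response as Graphviz DOT."""
    edges = set()
    for result in response.get("results", []):
        for match in result.get("matches", []):
            for consumer in match.get("used_by", []):
                edges.add((consumer, result["path"]))
    lines = ["digraph code {"]
    lines += [f'  "{src}" -> "{dst}";' for src, dst in sorted(edges)]
    lines.append("}")
    return "\n".join(lines)

SAMPLE_RESPONSE = {
    "results": [
        {
            "path": "src/middleware/auth.js",
            "matches": [
                {
                    "symbol": "function:authenticateUser",
                    "used_by": ["src/routes/api.js", "src/routes/admin.js"],
                }
            ],
        }
    ]
}

print(to_dot(SAMPLE_RESPONSE))
```

The resulting text can be saved to a `.dot` file and rendered with any Graphviz-compatible viewer.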
## Architecture Overview
```
┌─────────────────┐
│  File Monitor   │ ← Watches for changes
└────────┬────────┘
         │
┌────────▼────────┐
│  Code Analyzer  │ ← AST parsing & symbol extraction
└────────┬────────┘
         │
┌────────▼────────┐
│ Smart Chunking  │ ← Semantic code splitting
└────────┬────────┘
         │
┌────────▼────────┐
│    Embedder     │ ← Neural embeddings with context
└────────┬────────┘
         │
┌────────▼────────┐
│  Vector Store   │ ← Indexed for fast retrieval
└────────┬────────┘
         │
┌────────▼────────┐
│   Smart Query   │ ← Hybrid search with AI optimization
└─────────────────┘
```
## Conclusion
This indexing system moves code search from keyword matching toward semantic understanding. It gives AI agents the context, relationships, and navigation hints they need to work with codebases effectively.
The key advantages:
- **Understands code structure**, not just text
- **Tracks relationships** between files and symbols
- **Provides context** for better comprehension
- **Suggests navigation** paths through code
- **Updates automatically** as code changes
Use this tool to make your AI agent's interactions with codebases more efficient and context-aware.