# tree-ast-grep MCP Server - Comprehensive Development Plan
## Project Overview
This document provides comprehensive technical details for developers working on or extending the tree-ast-grep MCP server. The project transforms ast-grep's powerful AST manipulation capabilities into a standardized Model Context Protocol server for AI coding agents.
## Implementation Progress
### Phase 1: Architecture & Foundation ✅
#### 1.1 Project Structure Design
- **TypeScript Setup**: ES modules with strict type checking
- **Directory Structure**: Modular organization with clear separation of concerns
- **Build System**: Simple TypeScript compilation with npm scripts
- **Dependencies**: Minimal, focused on MCP SDK and validation
```
tree-ast-grep-mcp/
├── src/
│   ├── index.ts                  # Main MCP server entry point
│   ├── tools/                    # MCP tool implementations
│   │   ├── search.ts             # ast_search tool
│   │   ├── replace.ts            # ast_replace tool
│   │   └── scan.ts               # ast_scan tool
│   ├── core/                     # Core infrastructure
│   │   ├── binary-manager.ts     # ast-grep binary management
│   │   ├── validator.ts          # Parameter validation
│   │   └── workspace-manager.ts  # Security & workspace handling
│   └── types/                    # Type definitions
│       ├── errors.ts             # Custom error types
│       └── schemas.ts            # Zod validation schemas
├── build/                        # Compiled JavaScript
├── package.json                  # NPM configuration
├── tsconfig.json                 # TypeScript configuration
├── README.md                     # User documentation
└── PLAN.md                       # This technical documentation
```
#### 1.2 Error Handling System
Custom error hierarchy with specific error types:
```typescript
abstract class AstGrepMCPError extends Error {
  abstract readonly code: string;
  abstract readonly recoverable: boolean;
}

// Specific error types (each extends AstGrepMCPError):
//   ValidationError  - invalid parameters, recoverable
//   BinaryError      - binary not found/invalid, not recoverable
//   SecurityError    - path traversal attempts, not recoverable
//   TimeoutError     - operation timeout, recoverable
//   FileSystemError  - file access issues, recoverable
//   ExecutionError   - ast-grep execution failure, recoverable
```
#### 1.3 Validation Framework
Comprehensive parameter validation using Zod schemas:
- **Type Safety**: Runtime validation with TypeScript types
- **Security**: Path traversal prevention, system directory blocking
- **Resource Limits**: File size and count constraints
- **Sanitization**: Input cleaning and normalization
### Phase 2: ast-grep Integration ✅
#### 2.1 Understanding ast-grep CLI Structure
ast-grep uses a command-based interface:
```bash
ast-grep <COMMAND> [OPTIONS] [PATHS]

# Commands:
#   run   Search/replace operations (primary use)
#   scan  Rule-based analysis
#   test  Test rules
#   new   Create projects/rules
#   lsp   Language server
```
**Critical Discovery**: ast-grep requires the pattern to be passed with the `--pattern` flag, not as a positional argument:
```bash
# CORRECT
ast-grep run --pattern 'console.log' --lang javascript file.js
# INCORRECT
ast-grep run 'console.log' --lang javascript file.js
```
#### 2.2 Binary Management Strategy
**Multi-tier Binary Resolution**:
1. **Custom Path**: `AST_GREP_BINARY_PATH` environment variable
2. **Auto-Install**: Download platform-specific binary to cache
3. **System Binary**: Find ast-grep in PATH
4. **Graceful Failure**: Clear error messages with installation options
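The resolution order above can be sketched as a list of resolvers tried in sequence. This is a minimal illustration under stated assumptions: `BinaryResolver`, `resolveBinary`, the tier names, and the example path are hypothetical and are not taken from the project's source.

```typescript
// Hypothetical sketch: each tier either returns a binary path or null.
type BinaryResolver = { name: string; resolve: () => string | null };

function resolveBinary(resolvers: BinaryResolver[]): { path: string; source: string } {
  const tried: string[] = [];
  for (const resolver of resolvers) {
    const found = resolver.resolve();
    if (found) return { path: found, source: resolver.name };
    tried.push(resolver.name);
  }
  // Final tier: fail with a message that names the options the user has.
  throw new Error(
    `ast-grep binary not found (tried: ${tried.join(", ")}). ` +
      "Set AST_GREP_BINARY_PATH or install ast-grep manually."
  );
}

// Example tiers with stubbed lookups; a real server would check the filesystem.
const exampleEnv: Record<string, string | undefined> = {};
const tiers: BinaryResolver[] = [
  { name: "custom-path", resolve: () => exampleEnv.AST_GREP_BINARY_PATH ?? null },
  { name: "auto-install", resolve: () => null }, // pretend the download is disabled
  { name: "system-path", resolve: () => "/usr/local/bin/ast-grep" }, // assumed location
];
```

Keeping each tier behind the same small interface makes the order easy to change and lets the error message report which tiers were attempted.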
**Platform Support Matrix**:
| Platform | Architecture | Binary Size | Status |
|----------|-------------|-------------|---------|
| Windows | x64 | 6.44 MB | ✅ Tested |
| Windows | ARM64 | 6.15 MB | ✅ Implemented |
| macOS | x64 | 6.95 MB | ✅ Implemented |
| macOS | ARM64 | 6.93 MB | ✅ Implemented |
| Linux | x64 | 7.1 MB | ✅ Implemented |
| Linux | ARM64 | 6.82 MB | ✅ Implemented |
#### 2.3 Download & Caching Implementation
**Robust Download Process**:
- **Retry Logic**: 3 attempts with exponential backoff
- **Stream Processing**: Handles large files without memory issues
- **Cache Validation**: Tests binaries before use, removes corrupted files
- **Progress Reporting**: Shows download progress for user feedback
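The retry behavior listed above could look roughly like the following. This is a sketch, not the project's downloader: `withRetry`, `backoffDelay`, and the 500 ms base delay are assumptions chosen for illustration.

```typescript
// Delay before the retry that follows attempt number `attempt` (1-based):
// baseMs, 2 * baseMs, 4 * baseMs, ...
function backoffDelay(attempt: number, baseMs: number): number {
  return baseMs * Math.pow(2, attempt - 1);
}

// Hypothetical helper: run `task` up to `maxAttempts` times, sleeping between failures.
// `sleep` is injectable so the logic can be exercised without real waiting.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts: number = 3,
  baseMs: number = 500,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // No sleep after the final attempt; fall through and rethrow.
      if (attempt < maxAttempts) await sleep(backoffDelay(attempt, baseMs));
    }
  }
  throw lastError;
}
```

With the defaults, a download that fails twice waits 500 ms and then 1000 ms before the third and final attempt.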
**Cache Management**:
- **Location**: `~/.ast-grep-mcp/binaries/`
- **Naming**: `ast-grep-{platform}-{arch}[.exe]`
- **Validation**: Version check with `--version` command
- **Cleanup**: Automatic removal of invalid binaries
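As a small illustration of the naming scheme above, the cache file name can be derived from the platform and architecture. The helper names are hypothetical, and the sketch assumes `platform` and `arch` follow Node's `process.platform` and `process.arch` values (for example `win32` and `x64`).

```typescript
// Hypothetical helper: builds `ast-grep-{platform}-{arch}[.exe]`.
function cachedBinaryName(platform: string, arch: string): string {
  const ext = platform === "win32" ? ".exe" : "";
  return `ast-grep-${platform}-${arch}${ext}`;
}

// Joins the file name onto a cache directory such as ~/.ast-grep-mcp/binaries,
// tolerating a trailing slash or backslash on the directory.
function cachedBinaryPath(cacheDir: string, platform: string, arch: string): string {
  const dir = cacheDir.replace(/[\\\/]+$/, "");
  return `${dir}/${cachedBinaryName(platform, arch)}`;
}
```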
### Phase 3: MCP Tool Implementation ✅
#### 3.1 ast_search Tool
**Purpose**: AST-aware code pattern search
**CLI Mapping**:
```bash
ast-grep run --pattern <PATTERN> --lang <LANG> --json=stream [PATHS]
```
**Parameter Translation**:
```typescript
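// MCP input (the pattern must itself be parseable code, so the function is written out in full)
{
  pattern: "function logMessage($$$ARGS) { $$$ }",
  language: "javascript",
  context: 3,
  paths: ["src/"]
}
// Resulting ast-grep command (single quotes keep the shell from expanding `$`)
ast-grep run --pattern 'function logMessage($$$ARGS) { $$$ }' --lang javascript --context 3 --json=stream src/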
```
**JSON Output Parsing**:
ast-grep outputs JSON objects per match:
```json
{
  "text": "function logMessage(message) {\n console.log(\"Debug: \" + message);\n}",
  "range": {
    "start": {"line": 1, "column": 0},
    "end": {"line": 3, "column": 1}
  },
  "file": "test-example.js",
  "language": "JavaScript"
}
```
**Output Normalization**:
- Convert 0-based line numbers to 1-based
- Extract context from range information
- Standardize field names across different ast-grep versions
#### 3.2 ast_replace Tool
**Purpose**: Structural find-and-replace operations
**CLI Mapping**:
```bash
ast-grep run --pattern <PATTERN> --rewrite <REPLACEMENT> --lang <LANG> [OPTIONS] [PATHS]
```
**Safety Features**:
- **Dry-run Default**: All operations default to preview mode
- **Backup Creation**: Automatic backups before modifications
- **Interactive Mode**: Built-in ast-grep interactive confirmation
- **Update-all Mode**: Batch operations for non-interactive use
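The dry-run default can be expressed directly in the argument builder: `--update-all` is added only when the caller explicitly opts out of preview mode. The sketch below is illustrative; `ReplaceParams` and `buildReplaceArgs` are hypothetical names, and defaulting `paths` to the current directory is an assumption.

```typescript
// Hypothetical parameter shape for the ast_replace tool.
interface ReplaceParams {
  pattern: string;
  replacement: string;
  language?: string;
  paths?: string[];
  dryRun?: boolean; // treated as true when omitted
}

function buildReplaceArgs(params: ReplaceParams): string[] {
  const args = ["run", "--pattern", params.pattern, "--rewrite", params.replacement];
  if (params.language) args.push("--lang", params.language);

  // Preview is the default: files are only rewritten when dryRun is explicitly false.
  const dryRun = params.dryRun ?? true;
  if (!dryRun) args.push("--update-all");

  args.push("--json=stream");
  args.push(...(params.paths ?? ["."]));
  return args;
}
```

Because the arguments are passed as an array to the spawned process rather than through a shell, metavariables such as `$MSG` need no quoting here.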
**Parameter Translation**:
```typescript
// MCP input (a named metavariable is used because `$_` is non-capturing
// and cannot be carried over into the replacement)
{
  pattern: "console.log($MSG)",
  replacement: "logger.info($MSG)",
  dryRun: true,
  interactive: false
}
// ast-grep command (dry-run)
ast-grep run --pattern 'console.log($MSG)' --rewrite 'logger.info($MSG)' --json=stream
// ast-grep command (apply)
ast-grep run --pattern 'console.log($MSG)' --rewrite 'logger.info($MSG)' --update-all --json=stream
```
#### 3.3 ast_scan Tool
**Purpose**: Rule-based code analysis and linting
**CLI Mapping**:
```bash
# --rule takes a single rule file; --config is reserved for the project's sgconfig.yml
ast-grep scan --rule <RULES_FILE> --json=stream [PATHS]
```
**Rule File Handling**:
- **Inline Rules**: Create temporary YAML file from string input
- **File Rules**: Validate and use existing rule files
- **Built-in Rules**: Future extension point for common patterns
**Configuration Example**:
```yaml
# One rule per YAML document; separate multiple rules with ---
id: no-console-log
message: Avoid console.log in production
severity: warning
language: javascript
rule:
  pattern: console.log($_)
```
### Phase 4: Security & Safety Implementation ✅
#### 4.1 Workspace Security Model
**Boundary Enforcement**:
- **Root Detection**: Automatic detection using project indicators
- **Path Validation**: All paths resolved relative to workspace root
- **Traversal Prevention**: Block `../` and absolute paths outside workspace
- **System Protection**: Block access to system directories
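A minimal sketch of the path validation and traversal check described above, using only Node's `path` module. `resolveInsideWorkspace` is a hypothetical name; the sketch also does not resolve symlinks, which a production check would likely need to handle.

```typescript
import * as path from "node:path";

// Hypothetical helper: resolve a user-supplied path against the workspace root
// and reject anything that ends up outside it.
function resolveInsideWorkspace(workspaceRoot: string, userPath: string): string {
  const root = path.resolve(workspaceRoot);
  const resolved = path.resolve(root, userPath);
  const relative = path.relative(root, resolved);

  // A relative path starting with ".." climbs out of the root; an absolute
  // result means the target is on a different drive (Windows).
  const escapes =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative);
  if (escapes) {
    throw new Error(`Path escapes workspace root: ${userPath}`);
  }
  return resolved;
}
```

Comparing the relative path rather than testing string prefixes avoids accepting sibling directories such as `/work/app-secrets` when the root is `/work/app`.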
**Project Root Detection Priority**:
1. Explicit `WORKSPACE_ROOT` environment variable
2. Auto-detection using indicators: `.git`, `package.json`, etc.
3. Fallback to current working directory
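The priority order above might be implemented along these lines. This is a sketch under assumptions: `detectWorkspaceRoot` is a hypothetical name, the indicator list is limited to the two named above, and walking up parent directories is one plausible reading of "auto-detection". The `exists` callback is injected so the logic does not depend on the real filesystem.

```typescript
import * as path from "node:path";

// Indicators named in the list above; the real list may be longer.
const ROOT_INDICATORS = [".git", "package.json"];

function detectWorkspaceRoot(
  startDir: string,
  env: Record<string, string | undefined>,
  exists: (candidate: string) => boolean
): string {
  // 1. Explicit override
  if (env.WORKSPACE_ROOT) return path.resolve(env.WORKSPACE_ROOT);

  // 2. Walk upward until a directory contains one of the indicators
  let dir = path.resolve(startDir);
  while (true) {
    if (ROOT_INDICATORS.some((name) => exists(path.join(dir, name)))) return dir;
    const parent = path.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }

  // 3. Fallback to the starting directory
  return path.resolve(startDir);
}
```

In a server, the call would typically pass `process.cwd()`, `process.env`, and `fs.existsSync`.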
**Blocked Paths**:
```typescript
const SYSTEM_PATHS = [
  '/etc', '/bin', '/usr', '/sys', '/proc', // Unix system
  'C:\\Windows', 'C:\\Program Files',      // Windows system
  '~/.ssh', '~/.aws',                      // Credentials
  'node_modules/.bin',                     // Executables
  '.git'                                   // Version control
];
```
#### 4.2 Resource Management
**File Limits**:
- **Maximum File Size**: 10MB per file (configurable)
- **Maximum File Count**: 10,000 files (configurable)
- **Timeout Limits**: 30-60 seconds per operation
- **Memory Buffer**: 10MB stdout/stderr buffer
**Environment Variables**:
```bash
MAX_FILE_SIZE=10485760 # 10MB file size limit
MAX_FILES=10000 # File count limit
EXECUTION_TIMEOUT=30000 # Operation timeout (ms)
```
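Reading these variables safely means falling back to the documented defaults when a value is missing or malformed. The following is a sketch; `ResourceLimits`, `readPositiveInt`, and `loadResourceLimits` are hypothetical names, and the defaults mirror the values listed above.

```typescript
interface ResourceLimits {
  maxFileSize: number;        // bytes
  maxFiles: number;
  executionTimeoutMs: number; // milliseconds
}

// Accept only positive integers; anything else falls back to the default.
function readPositiveInt(raw: string | undefined, fallback: number): number {
  if (raw === undefined) return fallback;
  const value = Number(raw);
  return Number.isInteger(value) && value > 0 ? value : fallback;
}

function loadResourceLimits(env: Record<string, string | undefined>): ResourceLimits {
  return {
    maxFileSize: readPositiveInt(env.MAX_FILE_SIZE, 10 * 1024 * 1024),
    maxFiles: readPositiveInt(env.MAX_FILES, 10000),
    executionTimeoutMs: readPositiveInt(env.EXECUTION_TIMEOUT, 30000),
  };
}
```

A server would call `loadResourceLimits(process.env)` once at startup.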
### Phase 5: AST Pattern Alignment ✅
#### 5.1 Understanding ast-grep Patterns
**Pattern Types**:
1. **Literal Patterns**: Exact text matching
```javascript
// Pattern: "console.log"
// Matches: console.log("hello")
// Matches: console.log(variable)
```
2. **Metavariable Patterns**: Flexible matching
```javascript
// Pattern: "console.log($_)"
// $_ matches any single expression
// Matches: console.log("hello"), console.log(x + y)
```
3. **Multi-variable Patterns**: Multiple captures
```javascript
// Pattern: "function $NAME($$$ARGS) { $$$ }"
// $NAME matches the function name
// $$$ARGS matches zero or more parameters (a plain $ARGS matches exactly one)
// $$$ matches the statements in the function body
```
4. **Complex Patterns**: Nested structures
```javascript
// Pattern: "if ($COND) { console.log($MSG) }"
// Matches conditional console.log statements
```
#### 5.2 Pattern Translation Guidelines
**From User Intent to ast-grep Pattern**:
1. **Search for console.log calls**:
```
User: "Find all console.log statements"
Pattern: "console.log"
Refined: "console.log($_)" (to capture arguments)
```
2. **Search for function declarations**:
```
User: "Find function declarations"
Pattern: "function" (not valid on its own; a pattern must be parseable code)
Refined: "function $NAME($$$ARGS) { $$$ }"
```
3. **Search for class methods**:
```
User: "Find class methods"
Pattern: "class $CLASS { $METHOD() { $$$ } }"
```
#### 5.3 Tool Adjustment Recommendations
**For ast_search Tool**:
1. **Pattern Enhancement**: Add pattern suggestion logic
```typescript
function enhancePattern(userPattern: string, language: string): string {
  // If user provides simple identifier, suggest metavariable pattern
  if (/^[a-zA-Z_][a-zA-Z0-9_]*$/.test(userPattern)) {
    if (language === 'javascript' || language === 'typescript') {
      return `${userPattern}($_)`; // Enhance to capture calls
    }
  }
  return userPattern;
}
```
2. **Language-Specific Defaults**:
```typescript
// $$$ARGS is used so that any number of parameters matches
const LANGUAGE_PATTERNS = {
  javascript: {
    'functions': 'function $NAME($$$ARGS) { $$$ }',
    'classes': 'class $NAME { $$$ }',
    'imports': 'import $WHAT from $WHERE',
    'exports': 'export $WHAT'
  },
  java: {
    'methods': 'public $TYPE $NAME($$$ARGS) { $$$ }',
    'classes': 'public class $NAME { $$$ }',
    'interfaces': 'interface $NAME { $$$ }'
  }
};
```
**For ast_replace Tool**:
1. **Replacement Templates**: Support metavariable substitution
```typescript
// Pattern: "console.log($MSG)"
// Replacement: "logger.info($MSG)"
// Result: console.log("hello") → logger.info("hello")
// ($_ is non-capturing, so a named metavariable is needed in the replacement)
```
2. **Context-Aware Replacements**: Consider surrounding code
```typescript
// Pattern: "if ($COND) { console.log($MSG); }"
// Replacement: "if ($COND) { logger.debug($MSG); }"
```
**For ast_scan Tool**:
1. **Rule Templates**: Provide common rule patterns
```yaml
# Security rules
id: no-eval
message: "eval() is dangerous"
severity: error
language: javascript
rule:
  pattern: eval($_)
---
# Style rules
id: prefer-const
message: "Use const for non-reassigned variables"
severity: warning
language: javascript
rule:
  pattern: let $VAR = $VAL
```
### Phase 6: Advanced Features & Optimizations
#### 6.1 Pattern Intelligence
**Smart Pattern Detection**:
```typescript
class PatternIntelligence {
  // Suggest better patterns based on user input
  suggestPattern(input: string, language: string): string[] {
    const suggestions: string[] = [];

    // Function search enhancements
    if (input.includes('function')) {
      suggestions.push('function $NAME($$$ARGS) { $$$ }');
      suggestions.push('const $NAME = ($$$ARGS) => { $$$ }');
    }

    // Class search enhancements
    if (input.includes('class')) {
      suggestions.push('class $NAME extends $PARENT { $$$ }');
      suggestions.push('class $NAME { $$$ }');
    }

    return suggestions;
  }
}
```
**Language-Specific Pattern Libraries**:
```typescript
interface LanguagePatterns {
  [key: string]: {
    common: Record<string, string>;
    security: Record<string, string>;
    performance: Record<string, string>;
  };
}

const PATTERN_LIBRARY: LanguagePatterns = {
  javascript: {
    common: {
      'async-function': 'async function $NAME($$$ARGS) { $$$ }',
      'arrow-function': '($$$ARGS) => { $$$ }',
      'method-call': '$OBJ.$METHOD($$$ARGS)',
      'property-access': '$OBJ.$PROP'
    },
    security: {
      'eval-usage': 'eval($_)',
      'innerHTML': '$_.innerHTML = $_',
      'document-write': 'document.write($_)'
    },
    performance: {
      'dom-query': 'document.querySelector($_)',
      'loop-dom': 'for ($_) { document.querySelector($_) }'
    }
  }
};
```
#### 6.2 Advanced CLI Integration
**Command Argument Builder**:
```typescript
class AstGrepCommandBuilder {
  static buildSearchCommand(params: SearchParams): string[] {
    const args = ['run'];

    // Required pattern
    args.push('--pattern', params.pattern);

    // Language detection/specification
    if (params.language) {
      args.push('--lang', params.language);
    } else {
      // Auto-detect language from file extensions
      // (detectLanguage is assumed to be a static helper defined elsewhere in this class)
      const detectedLang = this.detectLanguage(params.paths);
      if (detectedLang) {
        args.push('--lang', detectedLang);
      }
    }

    // Context options (context may be omitted)
    if (params.context !== undefined && params.context > 0) {
      args.push('--context', params.context.toString());
    }

    // File filtering: include globs as-is, exclude globs prefixed with "!"
    for (const glob of params.include ?? []) {
      args.push('--globs', glob);
    }
    for (const glob of params.exclude ?? []) {
      args.push('--globs', `!${glob}`);
    }

    // Output formatting
    args.push('--json=stream');   // Stream JSON for parsing
    args.push('--heading=never'); // Consistent output format

    // Paths (must be last)
    args.push(...params.paths);
    return args;
  }
}
```
#### 6.3 Result Processing & Normalization
**JSON Output Parsing**:
```typescript
interface AstGrepOutput {
  text: string; // Matched code text
  range: {
    start: { line: number; column: number };
    end: { line: number; column: number };
  };
  file: string; // File path
  language: string; // Detected language
  lines: string; // Alternative text field
}

class ResultProcessor {
  parseSearchResults(stdout: string): SearchMatch[] {
    const matches: SearchMatch[] = [];
    // ast-grep outputs one JSON object per line in stream mode
    const lines = stdout.trim().split('\n');
    for (const line of lines) {
      if (!line.trim()) continue;
      try {
        const astGrepMatch: AstGrepOutput = JSON.parse(line);
        // Normalize to MCP format
        const normalizedMatch: SearchMatch = {
          file: astGrepMatch.file,
          line: astGrepMatch.range.start.line + 1, // Convert to 1-based
          column: astGrepMatch.range.start.column,
          text: astGrepMatch.text || astGrepMatch.lines,
          context: this.extractContext(astGrepMatch),
          matchedNode: astGrepMatch.text || astGrepMatch.lines
        };
        matches.push(normalizedMatch);
      } catch (error) {
        console.error('Failed to parse ast-grep JSON:', error);
      }
    }
    return matches;
  }
}
```
### Phase 7: Production Considerations
#### 7.1 Performance Optimizations
**Binary Execution Optimization**:
- **Process Reuse**: Consider long-running ast-grep process for multiple operations
- **Batch Operations**: Group multiple patterns into single command
- **Memory Management**: 10MB buffer limits with streaming output
- **Timeout Handling**: Graceful cancellation with partial results
**Caching Strategy**:
```typescript
class OperationCache {
  // Cache frequently used patterns and results, with the time each entry was stored
  private patternCache = new Map<string, { result: SearchResult; storedAt: number }>();

  async executeWithCache(
    operation: () => Promise<SearchResult>,
    cacheKey: string,
    ttl: number = 300000 // 5 minutes
  ): Promise<SearchResult> {
    const cached = this.patternCache.get(cacheKey);
    // An entry is valid only while it is younger than the TTL
    if (cached && Date.now() - cached.storedAt < ttl) {
      return cached.result;
    }
    const result = await operation();
    this.patternCache.set(cacheKey, { result, storedAt: Date.now() });
    return result;
  }
}
```
#### 7.2 Error Recovery & Diagnostics
**Diagnostic Information Collection**:
```typescript
interface DiagnosticInfo {
  binaryPath: string;
  binaryVersion: string;
  workspaceRoot: string;
  platformInfo: {
    platform: string;
    arch: string;
    nodeVersion: string;
  };
  operationMetrics: {
    executionTime: number;
    filesProcessed: number;
    cacheHitRate: number;
  };
}
```
**Progressive Fallback Strategy**:
1. **Primary**: Platform-specific binary execution
2. **Fallback 1**: System binary in PATH
3. **Fallback 2**: Alternative pattern matching (regex-based)
4. **Final**: Clear error with manual installation instructions
#### 7.3 Monitoring & Telemetry
**Operation Metrics**:
```typescript
interface OperationMetrics {
  toolName: string;
  executionTime: number;
  filesProcessed: number;
  matchesFound: number;
  errorCount: number;
  binaryPath: string;
  cacheHit: boolean;
}
```
## ast-grep Usage Pattern Deep Dive
### Pattern Syntax Reference
#### Basic Patterns
```bash
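# Single quotes keep the shell from expanding $NAME-style metavariables
# Literal matching
--pattern 'console.log'
# Single anonymous metavariable (matches any one node, not captured)
--pattern 'console.log($_)'
# Named metavariable (can be referenced in replacement)
--pattern 'console.log($MSG)'
# Multiple metavariables ($$$ARGS matches zero or more parameters)
--pattern 'function $NAME($$$ARGS) { $$$ }'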
```
#### Advanced Patterns
```bash
# Conditional patterns
--pattern 'if ($COND) { $$$ }'
# Class patterns
--pattern 'class $NAME extends $PARENT { $$$ }'
# Import/Export patterns
--pattern 'import $WHAT from $WHERE'
--pattern 'export { $EXPORTS } from $MODULE'
# Async patterns
--pattern 'await $PROMISE'
--pattern 'async function $NAME($$$ARGS) { $$$ }'
```
#### Language-Specific Patterns
**JavaScript/TypeScript**:
```bash
# React components
--pattern 'function $COMPONENT(props) { return $$$ }'
--pattern 'const $COMPONENT = () => { return $$$ }'
# Promise patterns
--pattern '$PROMISE.then($HANDLER)'
--pattern 'new Promise(($RESOLVE, $REJECT) => { $$$ })'
# DOM manipulation
--pattern 'document.getElementById($_)'
--pattern '$ELEMENT.addEventListener($EVENT, $HANDLER)'
```
**Java**:
```bash
# Method patterns
--pattern 'public $TYPE $METHOD($$$ARGS) { $$$ }'
--pattern '@$ANNOTATION public $TYPE $METHOD($$$ARGS) { $$$ }'
# Class patterns
--pattern 'public class $NAME extends $PARENT { $$$ }'
--pattern '@Entity public class $NAME { $$$ }'
# Spring patterns
--pattern '@Autowired private $TYPE $FIELD'
--pattern '@RequestMapping($PATH) public $TYPE $METHOD($$$ARGS) { $$$ }'
```
**Python**:
```bash
# Function patterns
--pattern 'def $NAME($$$ARGS): $$$'
--pattern 'async def $NAME($$$ARGS): $$$'
# Class patterns
--pattern 'class $NAME($PARENT): $$$'
--pattern 'class $NAME: $$$'
# Decorator patterns
--pattern '@$DECORATOR def $NAME($$$ARGS): $$$'
```
### Tool Adjustment Guidelines
#### Parameter Enhancement Strategies
**Auto-Pattern Enhancement**:
```typescript
function enhanceSearchPattern(pattern: string, language: string): string {
  // If pattern is just an identifier, make it more specific
  if (isSimpleIdentifier(pattern)) {
    switch (language) {
      case 'javascript':
      case 'typescript':
        // Enhance function names to match calls with any number of arguments
        return `${pattern}($$$ARGS)`;
      case 'java':
        // Enhance method names to capture definitions
        return `$TYPE ${pattern}($$$ARGS) { $$$ }`;
      case 'python':
        // Enhance function names to capture definitions
        return `def ${pattern}($$$ARGS): $$$`;
    }
  }
  return pattern;
}
```
**Context-Aware Suggestions**:
```typescript
function suggestPatterns(intent: string, language: string): string[] {
  const suggestions: string[] = [];
  if (intent.includes('function')) {
    switch (language) {
      case 'javascript':
        suggestions.push('function $NAME($$$ARGS) { $$$ }');
        suggestions.push('const $NAME = ($$$ARGS) => { $$$ }');
        suggestions.push('async function $NAME($$$ARGS) { $$$ }');
        break;
      case 'java':
        suggestions.push('public $TYPE $NAME($$$ARGS) { $$$ }');
        suggestions.push('private $TYPE $NAME($$$ARGS) { $$$ }');
        break;
    }
  }
  return suggestions;
}
```
#### Replacement Template Guidelines
**Template Validation**:
```typescript
function validateReplacement(pattern: string, replacement: string): ValidationResult {
  const patternVars = extractMetavariables(pattern);
  const replaceVars = extractMetavariables(replacement);
  // Ensure all replacement variables exist in pattern
  const unmatchedVars = replaceVars.filter(v => !patternVars.includes(v));
  if (unmatchedVars.length > 0) {
    return {
      valid: false,
      errors: [`Replacement variables not found in pattern: ${unmatchedVars.join(', ')}`]
    };
  }
  return { valid: true, errors: [] };
}
```
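The validation above relies on an `extractMetavariables` helper that is not shown. One possible sketch follows; it is an assumption about how such a helper could work, not the project's implementation. It treats `$NAME` and `$$$NAME` as the same captured variable and skips underscore-prefixed names such as `$_`, which ast-grep does not capture.

```typescript
// Hypothetical sketch: collect the distinct captured metavariable names in a pattern.
function extractMetavariables(source: string): string[] {
  // Matches $NAME or $$$NAME; names use uppercase letters, digits, and underscores.
  const metavariable = /\$(?:\$\$)?([A-Z_][A-Z0-9_]*)/g;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = metavariable.exec(source)) !== null) {
    const name = match[1];
    if (name.charAt(0) === "_") continue;        // non-capturing, e.g. $_
    if (names.indexOf(name) === -1) names.push(name); // keep first occurrence only
  }
  return names;
}
```

For example, `extractMetavariables("function $NAME($$$ARGS) { $$$ }")` yields `NAME` and `ARGS`; the anonymous `$$$` is ignored because it has no name to reference in a replacement.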
**Smart Replacement Suggestions**:
```typescript
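// Named metavariables are used throughout: anonymous ones ($_ and a bare $$$)
// are not captured and cannot be reused in a replacement.
const COMMON_REPLACEMENTS = {
  'console.log($MSG)': [
    'logger.info($MSG)',
    'logger.debug($MSG)',
    'console.warn($MSG)',
    '// TODO: Remove debug log'
  ],
  'var $NAME = $VALUE': [
    'const $NAME = $VALUE',
    'let $NAME = $VALUE'
  ],
  'function $NAME($$$ARGS) { $$$BODY }': [
    'const $NAME = ($$$ARGS) => { $$$BODY }',
    'async function $NAME($$$ARGS) { $$$BODY }'
  ]
};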
```
#### Rule Development Guidelines
**Rule File Structure**:
```yaml
# One rule per YAML document; separate multiple rules with ---
id: rule-identifier
message: Human-readable description
severity: error|warning|info
language: javascript|java|python|etc
rule:
  pattern: AST pattern to match
  inside: Optional container pattern
  not: Optional exclusion pattern
fix: Optional replacement pattern
note: Additional documentation
```
**Example Security Rules**:
```yaml
id: no-eval
message: "Avoid eval() - security risk"
severity: error
language: javascript
rule:
  pattern: eval($_)
note: "eval() can execute arbitrary code"
---
id: sql-injection-risk
message: "Potential SQL injection"
severity: warning
language: javascript
rule:
  pattern: $DB.query($SQL)
constraints:
  SQL:
    not: { kind: identifier } # Not a simple variable
note: "Use parameterized queries"
```
**Example Style Rules**:
```yaml
id: prefer-const
message: "Use const for non-reassigned variables"
severity: warning
language: javascript
rule:
  pattern: let $VAR = $VAL
  not:
    inside:
      any:
        - pattern: $VAR = $_
        - pattern: $VAR++
        - pattern: ++$VAR
fix: const $VAR = $VAL
---
id: no-var
message: "Use let or const instead of var"
severity: warning
language: javascript
rule:
  pattern: var $VAR = $VAL
fix: let $VAR = $VAL
```
### Future Enhancement Opportunities
#### Interactive Mode Integration
- **Step-by-step Confirmation**: Integrate ast-grep's `--interactive` mode
- **Preview Mode**: Enhanced diff display for replacements
- **Batch Operations**: Queue multiple operations for user review
#### Language Server Integration
- **Real-time Analysis**: Use ast-grep LSP for live feedback
- **IDE Integration**: Provide diagnostics and quick fixes
- **Workspace Analysis**: Background analysis with caching
#### Rule Management
- **Rule Repository**: Curated collection of common rules
- **Custom Rules**: User-defined rule management
- **Rule Validation**: Test rules against sample code
#### Performance Enhancements
- **Parallel Processing**: Multiple ast-grep processes for large codebases
- **Incremental Analysis**: Only analyze changed files
- **Smart Caching**: Cache results based on file fingerprints
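Fingerprint-based caching could key each result on a hash of everything that affects it, so an entry is reused only while the file is unchanged. The sketch below is one way to do that with Node's built-in `crypto` module; `fingerprintCacheKey` is a hypothetical name and SHA-256 is an arbitrary choice of hash.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a cache key from the pattern, language, and file contents.
// A NUL separator keeps adjacent fields from running together.
function fingerprintCacheKey(pattern: string, language: string, fileContents: string): string {
  return createHash("sha256")
    .update(pattern)
    .update("\0")
    .update(language)
    .update("\0")
    .update(fileContents)
    .digest("hex");
}
```

Any edit to the file, or any change to the pattern or language, produces a different key, so stale results are never returned and no explicit invalidation step is needed.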
## Installation & Deployment
### Package Distribution Strategy
**Multiple Package Approach**:
1. **Core Package** (`@cabbages/tree-ast-grep-mcp`): 200KB
   - MCP server implementation
   - Binary management logic
   - Auto-download capabilities
2. **Platform Packages**: 7MB each
   - Contains pre-compiled binaries
   - Faster installation for specific platforms
3. **Full Package**: 40MB
   - All platform binaries included
   - Offline deployment support
### MCP Configuration Examples
**Standard Configuration**:
```json
{
  "mcpServers": {
    "tree-ast-grep": {
      "command": "npx",
      "args": ["-y", "@cabbages/tree-ast-grep-mcp", "--auto-install"]
    }
  }
}
```
**Advanced Configuration**:
```json
{
  "mcpServers": {
    "tree-ast-grep": {
      "command": "npx",
      "args": ["-y", "@cabbages/tree-ast-grep-mcp", "--platform=win32"],
      "env": {
        "WORKSPACE_ROOT": "C:/Projects/MyApp",
        "AST_GREP_CACHE_DIR": "C:/Tools/ast-grep-cache",
        "MAX_FILE_SIZE": "20971520",
        "MAX_FILES": "20000"
      }
    }
  }
}
```
This comprehensive plan provides all the technical details needed for developers to understand, maintain, and extend the tree-ast-grep MCP server implementation.