# CodeSummary – Detailed Features and Functional Specification
## 1. Overview
**CodeSummary** is a **Node.js-based, cross-platform CLI tool** (distributed via **npm**) that automatically scans a project's source code and generates a **clean, professional PDF document** containing:
- **Complete project file structure** with hierarchical organization
- **Full source code content** for all selected files (no truncation)
- **Intelligent file type detection** and user-selectable filtering
- **Clean, readable formatting** optimized for code documentation

Its primary goal is to **simplify code reviews, audits, and archival snapshots**, enabling teams and individuals to produce **self-contained, complete documentation** of their codebases with minimal setup.

> **Repository**: [https://github.com/skamoll/CodeSummary](https://github.com/skamoll/CodeSummary)
>
> **npm Package Name**: `codesummary`
---
### 1.1 Target Audience
- **Developers** needing quick overviews of large projects with complete content
- **Auditors/Consultants** requiring traceable documentation snapshots without size limits
- **Educators/Students** preparing comprehensive code handovers or learning materials
- **Teams** performing thorough code reviews or compliance checks
- **Project Managers** creating complete project documentation for stakeholders
---
### 1.2 Core Objectives
1. **Complete automated documentation** — includes ALL file content without truncation
2. **Cross-platform reliability** — identical behavior on Windows, macOS, and Linux
3. **Advanced configurability** — user-defined filters, styles, and output preferences
4. **Unlimited scalability** — handles projects of any size with efficient streaming
5. **Intelligent safe defaults** — avoids binaries and unwanted files with smart filtering
6. **Professional output** — clean, readable PDFs suitable for all professional contexts
7. **Smart conflict handling** — automatic timestamped filenames when files are in use
---
### 1.3 Key Differentiators
- **No content limits** — processes files of any size completely
- **Smart file conflict resolution** — automatic timestamped fallbacks
- **Terminal compatibility** — works with all terminal types across platforms
- **Whitelist-driven filtering** with extensive language support
- **Interactive first-run setup** with persistent global configuration
- **Memory-efficient streaming** for optimal performance on large projects
- **Non-destructive scanning** with comprehensive error handling
- **Fully offline operation** with no network access or external services required
---
### 1.4 Technology Stack
- **Node.js** ≥ 18 (for native ES modules and modern APIs)
- **PDFKit** for professional PDF generation with streaming support
- **Inquirer.js** for interactive command-line prompts
- **Chalk** for cross-platform terminal styling
- **Ora** for progress indicators and status updates
- **fs-extra** for enhanced file system operations
---
## 2. Functional Requirements
### 2.1 Command-Line Interface
#### 2.1.1 Primary Commands
| Command | Description | Example |
|---------|-------------|---------|
| `codesummary` | Scan current directory and generate PDF | `codesummary` |
| `codesummary config` | Launch interactive configuration editor | `codesummary config` |
| `codesummary --show-config` | Display current configuration settings | `codesummary --show-config` |
| `codesummary --reset-config` | Reset to defaults and run setup wizard | `codesummary --reset-config` |
| `codesummary --help` | Show comprehensive help information | `codesummary --help` |
#### 2.1.2 Command-Line Options
| Option | Short | Description | Example |
|--------|-------|-------------|---------|
| `--output` | `-o` | Override output directory | `codesummary -o ./docs` |
| `--show-config` | - | Display current configuration | `codesummary --show-config` |
| `--reset-config` | - | Reset configuration to defaults | `codesummary --reset-config` |
| `--help` | `-h` | Show help message | `codesummary -h` |
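The commands and options in the tables above can be parsed with Node's built-in `util.parseArgs` (Node.js 18.3+). The following is a minimal sketch that assumes this approach. CodeSummary's real CLI may use a different parser, and `parseCli` and its return shape are hypothetical. CommonJS `require` is used for brevity, even though the tool targets ES modules.

```javascript
// Hypothetical sketch: parsing the documented commands and flags with Node's
// built-in util.parseArgs. Not CodeSummary's actual implementation.
const { parseArgs } = require("node:util");

function parseCli(argv) {
  const { values, positionals } = parseArgs({
    args: argv,
    allowPositionals: true, // permits the `config` subcommand
    options: {
      output: { type: "string", short: "o" },
      "show-config": { type: "boolean" },
      "reset-config": { type: "boolean" },
      help: { type: "boolean", short: "h" },
    },
  });

  // Pick one action; with no flags or subcommand, the default is a scan.
  let action = "scan";
  if (values.help) action = "help";
  else if (values["show-config"]) action = "show-config";
  else if (values["reset-config"]) action = "reset-config";
  else if (positionals[0] === "config") action = "config";

  return { action, outputDir: values.output ?? null };
}
```

Usage: `parseCli(process.argv.slice(2))`. Because `parseArgs` is strict by default, an unknown flag throws. That exception is a natural place to print a friendly error message.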
#### 2.1.3 Interactive Workflow
1. **First-Run Setup**
   - Detects missing configuration automatically
   - Launches interactive setup wizard
   - Configures output mode (relative/fixed path)
   - Creates output directory if needed
   - Saves persistent global configuration
2. **Directory Scanning**
   - Recursively scans current working directory
   - Applies whitelist filtering for file extensions
   - Excludes common build/dependency directories
   - Shows comprehensive scan summary with statistics
3. **Extension Selection**
   - Presents detected file types in checkbox format
   - Shows file counts for each extension
   - Allows selective inclusion/exclusion
   - Pre-selects all detected extensions by default
4. **PDF Generation**
   - Processes all selected files completely (no truncation)
   - Shows progress indicators for large files
   - Handles file conflicts with timestamped names
   - Generates clean, professional PDF output
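Step 3 can be sketched as follows: scan results are grouped by extension, turned into pre-checked choices labeled with file counts, and then filtered by the user's selection. The helper names are hypothetical. The `{ name, value, checked }` fields follow the choice shape that Inquirer's checkbox prompt accepts. The prompt call itself is omitted.

```javascript
// Hypothetical sketch of the extension-selection step (helper names are ours).
// filesByExtension: { ".js": ["src/a.js", ...], ".md": [...] }
function buildExtensionChoices(filesByExtension) {
  return Object.entries(filesByExtension)
    .map(([ext, files]) => ({
      name: `${ext} (${files.length} file${files.length === 1 ? "" : "s"})`,
      value: ext,
      checked: true, // every detected extension is pre-selected
    }))
    .sort((a, b) => a.value.localeCompare(b.value));
}

// Keeps only the files whose extension the user left checked.
function filterSelected(filesByExtension, selectedExtensions) {
  const selected = new Set(selectedExtensions);
  return Object.entries(filesByExtension)
    .filter(([ext]) => selected.has(ext))
    .flatMap(([, files]) => files);
}
```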
---
### 2.2 Configuration Management
#### 2.2.1 Global Configuration Storage
**Storage Locations:**
- **Linux/macOS**: `~/.codesummary/config.json`
- **Windows**: `%APPDATA%\CodeSummary\config.json`
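The storage locations above can be resolved with Node's `path` and `os` modules. In this minimal sketch, the platform, environment, and home directory are parameters, so the logic can run on any OS. The fallback used when `%APPDATA%` is unset is an assumption, and `getConfigPath` is a hypothetical name.

```javascript
// Hypothetical sketch: resolving the global configuration file location.
const path = require("node:path");
const os = require("node:os");

function getConfigPath(platform = process.platform, env = process.env, home = os.homedir()) {
  if (platform === "win32") {
    // %APPDATA% normally points at C:\Users\<name>\AppData\Roaming (fallback is an assumption).
    const appData = env.APPDATA || path.win32.join(home, "AppData", "Roaming");
    return path.win32.join(appData, "CodeSummary", "config.json");
  }
  // Linux and macOS share the same dot-directory layout.
  return path.posix.join(home, ".codesummary", "config.json");
}
```

First-run detection then reduces to checking whether `getConfigPath()` exists, for example with `fs.existsSync`.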
#### 2.2.2 Configuration Structure
`output.mode` is either `"fixed"` or `"relative"`, and `output.fixedPath` holds the absolute output path used in fixed mode. `allowedExtensions` and `excludeDirs` are arrays of strings and are shortened in this example.
```json
{
  "output": {
    "mode": "relative",
    "fixedPath": "/absolute/path/to/output"
  },
  "allowedExtensions": [".js", ".ts", ".py", ".md"],
  "excludeDirs": ["node_modules", ".git", "dist", "build"],
  "styles": {
    "colors": {
      "title": "#333353",
      "section": "#00FFB9",
      "text": "#333333",
      "error": "#FF4D4D",
      "footer": "#666666"
    },
    "layout": {
      "marginLeft": 40,
      "marginTop": 40,
      "marginRight": 40,
      "footerHeight": 20
    }
  },
  "settings": {
    "documentTitle": "Project Code Summary",
    "maxFilesBeforePrompt": 500
  }
}
```
#### 2.2.3 Configuration Features
- **Cross-platform path handling** with automatic normalization
- **Validation system** prevents invalid configurations
- **Interactive editor** for all configuration sections
- **Automatic backup and recovery** for corrupted configurations
- **Reset functionality** to restore defaults
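The validation system could be a plain function that checks each field of the structure in 2.2.2 and collects readable messages. This is a sketch only. The specific rules (six-digit hex colors, a positive-integer prompt threshold) are assumptions, and `validateConfig` is a hypothetical name.

```javascript
// Hypothetical sketch: validating a loaded configuration object.
// Returns an array of problems; an empty array means the config is usable.
const path = require("node:path");

const HEX_COLOR = /^#[0-9A-Fa-f]{6}$/; // assumed color format

function validateConfig(config) {
  const errors = [];
  const output = config.output ?? {};
  if (!["fixed", "relative"].includes(output.mode)) {
    errors.push('output.mode must be "fixed" or "relative"');
  }
  const isAbs = (p) => typeof p === "string" && (path.posix.isAbsolute(p) || path.win32.isAbsolute(p));
  if (output.mode === "fixed" && !isAbs(output.fixedPath)) {
    errors.push("output.fixedPath must be an absolute path in fixed mode");
  }
  for (const key of ["allowedExtensions", "excludeDirs"]) {
    const list = config[key];
    if (!Array.isArray(list) || !list.every((item) => typeof item === "string")) {
      errors.push(`${key} must be an array of strings`);
    }
  }
  for (const [name, value] of Object.entries(config.styles?.colors ?? {})) {
    if (!HEX_COLOR.test(value)) errors.push(`styles.colors.${name} must be a #RRGGBB color`);
  }
  const max = config.settings?.maxFilesBeforePrompt;
  if (max !== undefined && !(Number.isInteger(max) && max > 0)) {
    errors.push("settings.maxFilesBeforePrompt must be a positive integer");
  }
  return errors;
}
```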
---
### 2.3 File System Scanning
#### 2.3.1 Scanning Algorithm
1. **Recursive Directory Traversal**
   - Starts from current working directory
   - Follows symbolic links safely
   - Respects file system permissions
   - Handles large directory structures efficiently
2. **Filtering Logic**
   - **Whitelist approach**: Only processes explicitly allowed extensions
   - **Directory exclusions**: Skips common build/dependency directories
   - **Hidden file handling**: Includes important dot files (.gitignore, .env.example)
   - **Binary detection**: Automatically skips binary files
3. **Error Handling**
   - Graceful handling of permission denied errors
   - Continues scanning despite individual file failures
   - Logs warnings for inaccessible files
   - Provides detailed error context
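The traversal steps above can be sketched as a recursive walk. It skips excluded directory names, keeps whitelisted extensions, follows symbolic links while tracking real paths to avoid cycles, and records warnings instead of aborting. This is an illustrative sketch, not CodeSummary's `scanner.js`. It is synchronous for brevity, omits binary detection, and still follows links that point outside the project.

```javascript
// Hypothetical sketch of a whitelist-driven, error-tolerant directory scan.
const fs = require("node:fs");
const path = require("node:path");

function scanProject(root, { allowedExtensions, excludeDirs }) {
  const allowed = new Set(allowedExtensions.map((ext) => ext.toLowerCase()));
  const excluded = new Set(excludeDirs);
  const visited = new Set(); // real paths of directories already walked
  const files = [];
  const warnings = [];

  function walk(dir) {
    let names;
    try {
      const real = fs.realpathSync(dir);
      if (visited.has(real)) return; // symlink loop or second link to the same dir
      visited.add(real);
      names = fs.readdirSync(dir);
    } catch (err) {
      warnings.push(`${dir}: ${err.code}`); // e.g. EACCES; keep scanning elsewhere
      return;
    }
    for (const name of names) {
      const full = path.join(dir, name);
      let stat;
      try {
        stat = fs.statSync(full); // stat (not lstat) follows symbolic links
      } catch (err) {
        warnings.push(`${full}: ${err.code}`); // e.g. a dangling symlink
        continue;
      }
      if (stat.isDirectory()) {
        if (!excluded.has(name)) walk(full);
      } else if (stat.isFile() && allowed.has(path.extname(name).toLowerCase())) {
        files.push(path.relative(root, full));
      }
    }
  }

  walk(root);
  files.sort();
  return { files, warnings };
}
```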
#### 2.3.2 Supported File Extensions
**Programming Languages:**
- JavaScript: `.js`, `.jsx`, `.mjs`
- TypeScript: `.ts`, `.tsx`, `.d.ts`
- Python: `.py`, `.pyw`, `.pyx`
- Java: `.java`
- C/C++: `.c`, `.cpp`, `.cc`, `.cxx`, `.h`, `.hpp`
- C#: `.cs`
- Go: `.go`
- Rust: `.rs`
- Swift: `.swift`
- Kotlin: `.kt`, `.kts`
- Scala: `.scala`
- PHP: `.php`, `.phtml`
- Ruby: `.rb`, `.rbw`

**Web Technologies:**
- HTML: `.html`, `.htm`
- CSS: `.css`, `.scss`, `.sass`, `.less`
- Vue.js: `.vue`
- Svelte: `.svelte`

**Data & Configuration:**
- JSON: `.json`, `.jsonc`
- XML: `.xml`, `.xsd`, `.xsl`
- YAML: `.yaml`, `.yml`
- TOML: `.toml`
- SQL: `.sql`
- GraphQL: `.graphql`, `.gql`

**Scripts & Shell:**
- Shell: `.sh`, `.bash`, `.zsh`
- Batch: `.bat`, `.cmd`
- PowerShell: `.ps1`, `.psm1`

**Documentation:**
- Markdown: `.md`, `.markdown`
- Text: `.txt`
- Dockerfile: `.dockerfile`
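Compound entries such as `.d.ts` affect how matching should work. Node's `path.extname("index.d.ts")` returns `".ts"`, so a suffix check that prefers the longest matching whitelist entry is one way to classify these files. The sketch below assumes case-insensitive matching, and `matchExtension` is a hypothetical helper. Under this rule, a file literally named `Dockerfile` has no extension and would not match `.dockerfile`.

```javascript
// Hypothetical sketch: match a file name against the whitelist, preferring the
// longest entry so "types.d.ts" is classified as ".d.ts" rather than ".ts".
function matchExtension(fileName, allowedExtensions) {
  const lower = fileName.toLowerCase();
  let best = null;
  for (const ext of allowedExtensions) {
    const e = ext.toLowerCase();
    // Require at least one character before the extension itself.
    if (lower.length > e.length && lower.endsWith(e) && (best === null || e.length > best.length)) {
      best = e;
    }
  }
  return best; // null when the file is not whitelisted
}
```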
#### 2.3.3 Directory Exclusions
**Default Excluded Directories:**
- `node_modules` (Node.js dependencies)
- `.git` (Git version control)
- `.vscode` (VS Code settings)
- `dist`, `build` (Build outputs)
- `coverage` (Test coverage reports)
- `out` (Output directories)
- `__pycache__` (Python cache)
- `.next` (Next.js build)
- `.nuxt` (Nuxt.js build)
- `vendor` (Dependency directories)
- `.cache` (Cache directories)
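Exclusions are most useful when they apply at any depth, so that a nested `packages/web/node_modules` is skipped as well. In this minimal sketch, each directory segment of a relative file path is compared against the list above by exact name. Both `/` and `\` are accepted as separators. The helper name is hypothetical.

```javascript
// Hypothetical sketch: is a file inside any excluded directory, at any depth?
const DEFAULT_EXCLUDES = ["node_modules", ".git", ".vscode", "dist", "build", "coverage",
  "out", "__pycache__", ".next", ".nuxt", "vendor", ".cache"];

function isInExcludedDir(relativeFilePath, excludeDirs = DEFAULT_EXCLUDES) {
  const excluded = new Set(excludeDirs);
  const segments = relativeFilePath.split(/[\\/]+/);
  segments.pop(); // the last segment is the file name, not a directory
  return segments.some((segment) => excluded.has(segment)); // exact names only
}
```

Exact-name matching means a file called `build.js` or a folder called `distribution` is still included.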
---
### 2.4 PDF Generation
#### 2.4.1 Document Structure
**1. Project Overview Section**
- Document title (configurable)
- Project name (derived from directory)
- Generation timestamp
- List of included file types with descriptions
- Clean, professional formatting

**2. File Structure Section**
- Complete hierarchical file listing
- Organized by relative paths from project root
- Sorted alphabetically for easy navigation
- Monospace font for proper alignment

**3. File Content Section**
- **Complete source code** for each selected file
- **No truncation or size limits**
- Proper monospace formatting for code readability
- File headers with clear identification
- Natural page breaks when needed
- Error handling for unreadable files
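The File Structure section can be produced from the scanner's relative paths. Separators are normalized to `/`, the paths are sorted, and each directory is emitted once with indentation by depth, which lines up in a monospace font. The two-space indentation and trailing `/` on directories are assumptions about the layout, and `buildStructureLines` is a hypothetical name.

```javascript
// Hypothetical sketch: turn relative paths into indented tree lines.
function buildStructureLines(relativePaths) {
  const sorted = relativePaths
    .map((p) => p.split(/[\\/]+/).join("/"))
    .sort((a, b) => (a < b ? -1 : a > b ? 1 : 0)); // plain code-unit order
  const lines = [];
  const printedDirs = new Set();
  for (const filePath of sorted) {
    const parts = filePath.split("/");
    // Emit each ancestor directory the first time it appears.
    for (let depth = 0; depth < parts.length - 1; depth++) {
      const dir = parts.slice(0, depth + 1).join("/");
      if (!printedDirs.has(dir)) {
        printedDirs.add(dir);
        lines.push(`${"  ".repeat(depth)}${parts[depth]}/`);
      }
    }
    lines.push(`${"  ".repeat(parts.length - 1)}${parts[parts.length - 1]}`);
  }
  return lines;
}
```

Paths that share a directory prefix are contiguous after sorting, so each directory's children appear together under a single heading line.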
#### 2.4.2 PDF Specifications
**Format & Layout:**
- **Paper size**: A4 (595 × 842 points)
- **Margins**: 40pt on all sides for optimal content area
- **Fonts**:
  - Headers: Helvetica Bold
  - Body text: Helvetica
  - Code content: Courier (monospace)
- **Colors**: Professional color scheme with high contrast

**Advanced Features:**
- **Streaming generation** for memory efficiency
- **Automatic page breaks** handled by PDFKit
- **Smart file conflict handling** with timestamped names
- **Progress indicators** for large file processing
- **Error recovery** with graceful failure handling
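The specifications above translate fairly directly into PDFKit calls: A4 page size, 40pt margins, the built-in `Helvetica-Bold`, `Helvetica`, and `Courier` fonts, and the configured colors. This is an illustrative sketch, not CodeSummary's `pdfGenerator.js`. `renderPdf`, its parameters, and the font sizes are hypothetical, and PDFKit is loaded lazily so the layout helper works without it installed. Each file's content is passed in as a string here. Truly streaming input for very large files would need more work.

```javascript
// Hypothetical sketch of PDF rendering with PDFKit (npm install pdfkit).
const fs = require("node:fs");

const A4 = { width: 595, height: 842 }; // points, as in the specification

// Usable text width between the left and right margins.
function contentWidth(layout, page = A4) {
  return page.width - layout.marginLeft - layout.marginRight;
}

// files: [{ path: "src/index.js", content: "..." }] (shape is illustrative)
function renderPdf(outPath, { title, files, styles }) {
  const PDFDocument = require("pdfkit"); // loaded only when rendering
  const { layout, colors } = styles;
  const doc = new PDFDocument({
    size: "A4",
    // The config has no marginBottom; reusing marginTop gives 40pt on all sides.
    margins: { top: layout.marginTop, bottom: layout.marginTop, left: layout.marginLeft, right: layout.marginRight },
  });
  const stream = fs.createWriteStream(outPath);
  const done = new Promise((resolve, reject) => {
    stream.on("finish", () => resolve(outPath));
    stream.on("error", reject);
  });
  doc.pipe(stream); // PDFKit is a readable stream, so output flows to disk incrementally

  doc.font("Helvetica-Bold").fontSize(20).fillColor(colors.title).text(title);
  for (const file of files) {
    doc.addPage();
    doc.font("Helvetica-Bold").fontSize(12).fillColor(colors.section).text(file.path);
    doc.moveDown(0.5);
    // PDFKit wraps long lines and inserts page breaks automatically.
    doc.font("Courier").fontSize(8).fillColor(colors.text)
      .text(file.content, { width: contentWidth(layout) });
  }
  doc.end();
  return done;
}
```

With the default layout, the code column is 595 − 40 − 40 = 515 points wide.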
#### 2.4.3 File Naming Convention
**Standard naming:**
```
PROJECTNAME_code.pdf
```
**Conflict resolution (when file is in use):**
```
PROJECTNAME_code_YYYYMMDD_HHMMSS.pdf
```
**Example:**
```
MYPROJECT_code.pdf # Standard
MYPROJECT_code_20250729_141602.pdf # Timestamped fallback
```
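The naming rule and its fallback can be sketched as follows. The standard name is used unless that file exists and cannot be opened for writing, in which case a local-time `YYYYMMDD_HHMMSS` suffix is appended. Upper-casing the directory name is inferred from the example above. The lock check and all helper names are assumptions. On Windows, a file held open by another program may fail to open with a code such as `EBUSY`.

```javascript
// Hypothetical sketch of output naming with a timestamped fallback.
const fs = require("node:fs");
const path = require("node:path");

const pad = (n) => String(n).padStart(2, "0");

// Local-time stamp in the YYYYMMDD_HHMMSS form used above.
function timestamp(date) {
  return `${date.getFullYear()}${pad(date.getMonth() + 1)}${pad(date.getDate())}_` +
    `${pad(date.getHours())}${pad(date.getMinutes())}${pad(date.getSeconds())}`;
}

// "In use" here means: the file exists but cannot be opened for writing.
function isLocked(filePath) {
  try {
    fs.closeSync(fs.openSync(filePath, "r+"));
    return false;
  } catch (err) {
    return err.code !== "ENOENT"; // a missing file is free to create
  }
}

function chooseOutputPath(outputDir, projectDir, now = new Date(), locked = isLocked) {
  const base = `${path.basename(path.resolve(projectDir)).toUpperCase()}_code`;
  const standard = path.join(outputDir, `${base}.pdf`);
  if (!locked(standard)) return standard;
  return path.join(outputDir, `${base}_${timestamp(now)}.pdf`);
}
```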
---
### 2.5 Cross-Platform Compatibility
#### 2.5.1 Operating System Support
- **Windows** (10, 11, Server 2019+)
- **macOS** (10.15+, including Apple Silicon)
- **Linux** (Ubuntu 18.04+, CentOS 7+, other major distributions)
#### 2.5.2 Terminal Compatibility
- **Universal ASCII output** - no special Unicode characters
- **Color support detection** with graceful fallbacks
- **All terminal types supported** (cmd, PowerShell, bash, zsh, fish)
- **Screen reader compatible** output format
#### 2.5.3 Path Handling
- **Automatic path normalization** across platforms
- **Unicode filename support** for international characters
- **Long path support** on Windows (>260 characters)
- **Case sensitivity handling** appropriate to each platform
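Several of these path rules are small string operations. In this sketch, paths are displayed with `/` on every platform, names are normalized to Unicode NFC (macOS can return decomposed NFD names), and comparisons ignore case on platforms whose default file systems are case-insensitive. Treating `win32` and `darwin` as case-insensitive is an assumption, since macOS volumes can be formatted case-sensitive. For long Windows paths, Node's `path.win32.toNamespacedPath` produces the `\\?\`-prefixed form.

```javascript
// Hypothetical sketch of display-path normalization and platform-aware comparison.
function toDisplayPath(relativePath) {
  return relativePath.split(/[\\/]+/).join("/").normalize("NFC");
}

const CASE_INSENSITIVE = new Set(["win32", "darwin"]); // assumed defaults

function samePath(a, b, platform = process.platform) {
  const x = toDisplayPath(a);
  const y = toDisplayPath(b);
  return CASE_INSENSITIVE.has(platform) ? x.toLowerCase() === y.toLowerCase() : x === y;
}
```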
---
### 2.6 Performance & Scalability
#### 2.6.1 Memory Management
- **Streaming file processing** to minimize memory usage
- **Efficient PDF generation** with incremental building
- **Garbage collection optimization** for large projects
- **Memory usage monitoring** with warnings for extreme cases
#### 2.6.2 Large Project Handling
- **No file size limits** - processes files of any size completely
- **Progress indicators** for files with >1000 lines
- **Configurable warning thresholds** (default: 500 files)
- **User confirmation** for very large projects
- **Streaming architecture** prevents memory overflow
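The thresholds above need a line count that does not load whole files into memory. One approach is to stream the file in chunks and count newline bytes. The comparisons below use "greater than" for both thresholds, which is an assumption. The 1000-line and 500-file defaults come from this section. Helper names are hypothetical.

```javascript
// Hypothetical sketch: streaming line count plus the two large-project thresholds.
const fs = require("node:fs");

function countLines(filePath) {
  return new Promise((resolve, reject) => {
    let newlines = 0;
    let bytes = 0;
    let lastByte = 0x0a;
    fs.createReadStream(filePath)
      .on("data", (chunk) => {
        for (const byte of chunk) if (byte === 0x0a) newlines++; // count "\n"
        bytes += chunk.length;
        lastByte = chunk[chunk.length - 1];
      })
      .on("error", reject)
      .on("end", () => {
        // A final line without a trailing newline still counts as a line.
        resolve(bytes === 0 ? 0 : newlines + (lastByte === 0x0a ? 0 : 1));
      });
  });
}

const needsProgress = (lineCount) => lineCount > 1000;
const needsConfirmation = (fileCount, maxFilesBeforePrompt = 500) => fileCount > maxFilesBeforePrompt;
```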
#### 2.6.3 Performance Optimizations
- **Parallel file scanning** where safe
- **Efficient binary detection** to skip non-text files quickly
- **Smart caching** of file metadata
- **Optimized PDF rendering** with minimal memory footprint
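A cheap binary check only reads the first few kilobytes of a file and treats any NUL byte as a sign of binary content. Git uses a similar NUL check over the first 8000 bytes. The window size and `looksBinary` name are assumptions. Note that UTF-16 encoded text contains NUL bytes and would be classified as binary by this heuristic.

```javascript
// Hypothetical sketch: NUL-byte heuristic over a bounded prefix of the file.
const fs = require("node:fs");

function looksBinary(filePath, sampleSize = 8000) {
  const fd = fs.openSync(filePath, "r");
  try {
    const buffer = Buffer.alloc(sampleSize);
    const bytesRead = fs.readSync(fd, buffer, 0, sampleSize, 0); // read only the head
    return buffer.subarray(0, bytesRead).includes(0);
  } finally {
    fs.closeSync(fd);
  }
}
```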
---
### 2.7 Error Handling & Validation
#### 2.7.1 Input Validation
- **Path validation** with security checks
- **Configuration validation** with schema enforcement
- **File extension validation** with normalization
- **Permission checking** before operations
#### 2.7.2 Error Recovery
- **Graceful degradation** when files are inaccessible
- **Automatic retry** for transient failures
- **Detailed error logging** with context information
- **User-friendly error messages** with suggested solutions
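"Automatic retry for transient failures" can be sketched as a wrapper that re-runs an async operation with exponential backoff. It retries only when the error code is one usually considered temporary, and surfaces other errors immediately. The code list, attempt count, and delays are assumptions, and `withRetry` is a hypothetical name.

```javascript
// Hypothetical sketch: retry transient file-system errors with exponential backoff.
const TRANSIENT_CODES = new Set(["EBUSY", "EAGAIN", "EMFILE", "ENFILE"]);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(operation, { attempts = 3, baseDelayMs = 100 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Permanent errors (e.g. EACCES) and the final attempt are rethrown as-is.
      if (!TRANSIENT_CODES.has(err.code) || attempt >= attempts) throw err;
      await sleep(baseDelayMs * 2 ** (attempt - 1)); // 100ms, 200ms, ...
    }
  }
}
```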
#### 2.7.3 File Conflict Handling
- **Automatic detection** of files in use
- **Timestamped filename generation** for conflicts
- **User notification** of filename changes
- **Fallback mechanisms** for write failures
---
## 3. Technical Architecture
### 3.1 Module Structure
```
src/
├── cli.js # Command-line interface and user interaction
├── configManager.js # Global configuration management
├── scanner.js # File system scanning and filtering
├── pdfGenerator.js # PDF creation and formatting
└── errorHandler.js # Comprehensive error handling
```
### 3.2 Key Design Patterns
- **Modular architecture** with clear separation of concerns
- **Event-driven processing** for scalable file handling
- **Stream-based operations** for memory efficiency
- **Functional programming principles** where appropriate
- **Comprehensive error boundaries** with graceful recovery
### 3.3 Dependencies
**Core Dependencies:**
- `pdfkit` - Professional PDF generation
- `inquirer` - Interactive command-line prompts
- `chalk` - Cross-platform terminal styling
- `ora` - Progress indicators and spinners
- `fs-extra` - Enhanced file system operations

**Built-in Platform Features (no extra packages):**
- Modern ES modules (Node.js 18+)
- Native Promise-based APIs
- Cross-platform path handling
- Unicode and internationalization support
---
## 4. Quality Assurance
### 4.1 Testing Strategy
- **Cross-platform testing** on Windows, macOS, and Linux
- **Large project stress testing** with thousands of files
- **Memory usage profiling** for optimization
- **Terminal compatibility verification** across different environments
- **File conflict scenario testing** with various edge cases
### 4.2 Security Considerations
- **Path traversal prevention** with input validation
- **Permission-based access control** respecting system security
- **No external network dependencies** for complete offline operation
- **Safe file handling** with proper error boundaries
- **Configuration validation** to prevent malicious settings
### 4.3 Documentation Standards
- **Comprehensive README** with usage examples
- **Detailed feature specification** (this document)
- **Inline code documentation** with JSDoc standards
- **Error message clarity** with actionable guidance
- **Contributing guidelines** for open-source collaboration
---
## 5. Future Enhancements
### 5.1 Planned Features
- **Syntax highlighting** in PDF output for better code readability
- **Clickable table of contents** with bookmarks for navigation
- **Multiple output formats** (HTML, JSON, Markdown)
- **Project metrics and statistics** (line counts, complexity analysis)
- **CI/CD integration mode** for automated documentation pipelines
- **Custom PDF themes** and styling options
- **Plugin system** for custom file processors
### 5.2 Advanced Capabilities
- **Incremental updates** for changed files only
- **Git integration** for commit-specific documentation
- **Code annotation** system for additional context
- **Multi-language support** for international users
- **Web-based configuration** interface for easier setup
- **Integration APIs** for third-party tools
---
## 6. Success Metrics
### 6.1 Performance Targets
- **Scan speed**: >1000 files per second on modern hardware
- **Memory usage**: <200MB for projects with 10,000+ files
- **PDF generation**: <30 seconds for typical projects (100 files)
- **Cross-platform consistency**: 100% feature parity across platforms
### 6.2 Quality Targets
- **Zero data loss**: All file content included without truncation
- **Error rate**: <0.1% failure rate on valid projects
- **User satisfaction**: Clear, actionable error messages for all failure cases
- **Compatibility**: Works on 99%+ of supported platform/terminal combinations
---
## 7. Conclusion
CodeSummary represents a comprehensive solution for automated code documentation, combining professional-grade PDF output with intelligent file processing and cross-platform compatibility. Its focus on complete content inclusion, smart conflict handling, and terminal compatibility makes it suitable for both individual developers and enterprise environments.

The tool's architecture supports unlimited scalability while maintaining efficient resource usage, ensuring it can handle projects of any size. With its extensive language support and intelligent filtering, CodeSummary serves as a valuable tool for code reviews, audits, documentation, and archival purposes.

---
**Document Version**: 2.0

**Last Updated**: January 2025

**Status**: Implementation Complete - Ready for Release