codesummary
Version:
Cross-platform CLI tool that generates professional PDF documentation and RAG-optimized JSON outputs from project source code. Perfect for code reviews, audits, documentation, and AI/ML applications with semantic chunking and precision offsets.
191 lines (154 loc) • 7.63 kB
Markdown
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [1.1.1] - 2025-07-31
### 🔧 **Fixes & Improvements**
#### **CLI Enhancements**
- **Added Version Flag**: New `--version` and `-v` flags to display current version
- **Cross-Platform Compatibility**: Fixed Windows path resolution for version detection
- **Help Documentation**: Updated help text to include version option
#### **Dependency Cleanup**
- **Removed Deprecated Crypto**: Eliminated `crypto@1.0.1` dependency (now uses built-in Node.js crypto)
- **Security Improvement**: No more npm warnings about deprecated packages
- **Cleaner Dependencies**: Reduced package footprint
#### **Bug Fixes**
- **Merge Conflicts**: Resolved conflicts between main and develop branches
- **CLI Argument Parsing**: Fixed unknown option error for `--version` flag
### 📋 **Migration Notes**
- No breaking changes
- Existing installations will benefit from cleaner dependencies
- New `--version` flag available immediately after update
---
## [1.1.0] - 2025-07-31
### 🎉 Major Features Added
#### 🔧 **Complete RAG System Refactoring**
- **Atomic JSON Generation**: Eliminated streaming-based approach that caused JSON corruption
- **100% Thread-Safe Processing**: All files processed in memory before writing
- **Robust Error Handling**: No more duplicate keys or malformed JSON output
- **Performance Boost**: ~107 more chunks generated with improved stability
#### 📊 **Precision Offset Index System**
- **Complete fileOffsets**: Format `fileId -> [start, end]` for rapid file seeking
- **Detailed chunkOffsets**: Individual chunk positions with `jsonStart`, `jsonEnd`, `contentStart`, `contentEnd`
- **99.8% Precision**: 509/510 chunks with valid byte-accurate offsets
- **RAG-Optimized**: Enables high-performance vector database operations
#### 🧠 **Enhanced Token Estimation Engine**
- **Multi-Heuristic Algorithm**: Replaces simple `ceil(length/4)` with sophisticated analysis
- **Language-Aware Processing**: Specialized calculations for JavaScript, Python, Java, C++, etc.
- **Syntax Analysis**: Accounts for brackets, operators, and language-specific tokens
- **20% More Accurate**: Example: 100 chars JavaScript goes from 25 → 30 tokens
#### 📈 **Complete Processing Statistics**
- **Real-Time Metrics**: Processing time, throughput, bytes written
- **Quality Assurance**: Empty files count, chunks with valid offsets
- **Performance Tracking**: `bytesPerSecond`, `avgFileSize`, `avgChunksPerFile`
- **Error Collection**: Detailed error tracking and reporting
#### 🔄 **Future-Proof Schema System**
- **Schema Versioning**: `schemaVersion: "1.0"` for migration management
- **Method Tracking**: `tokenEstimationMethod: "enhanced_heuristic_v1.0"`
- **Schema URL**: Links to official schema definition for validation
- **Backward Compatibility**: Maintains compatibility with existing consumers
### 🛠️ **Technical Improvements**
#### **Code Quality & Architecture**
- Eliminated 5+ problematic streaming methods (`streamingGeneration`, `writeMainBody`, etc.)
- Consolidated to single `generate()` method for clarity
- Removed global state variables that caused race conditions
- Enhanced function detection regex for better semantic chunking
#### **Performance Optimizations**
- **Processing Speed**: 510 chunks generated in 56ms (vs previous inconsistent timing)
- **Memory Efficiency**: 18.4 MB/s throughput with atomic processing
- **Output Size**: Optimized JSON structure - 1.03 MB for comprehensive indexing
- **Validation**: Built-in JSON structure validation with detailed reporting
#### **Enhanced ScriptHandler**
- Improved regex patterns for TypeScript interfaces, enums, class methods
- Better support for `const enum`, `implements`, access modifiers
- Enhanced arrow function detection with `let`, `var` support
- More precise function boundary detection with brace matching
### 🐛 **Bugs Fixed**
#### **Critical JSON Corruption Issues**
- ❌ **Fixed**: Duplicate `index` sections in output JSON
- ❌ **Fixed**: Negative `processingTimeMs` values
- ❌ **Fixed**: Inconsistent chunk counts between sections
- ❌ **Fixed**: Missing or incorrect byte offsets
- ❌ **Fixed**: Malformed JSON due to concurrent writes
- ❌ **Fixed**: Stream truncation issues with large files
#### **Data Integrity Issues**
- ❌ **Fixed**: Inconsistent statistics across different JSON sections
- ❌ **Fixed**: Incorrect `totalBytes` calculations
- ❌ **Fixed**: Missing `chunkOffsets` for seek operations
- ❌ **Fixed**: Race conditions in multi-file processing
### 📊 **Performance Metrics (Before vs After)**
| Metric | v1.0.2 | v1.1.0 | Improvement |
|--------|--------|--------|-------------|
| JSON Validity | ❌ Corrupted | ✅ 100% Valid | +100% |
| Chunk Generation | ~400 chunks | 510 chunks | +27% |
| Processing Time | Inconsistent | 56ms stable | Consistent |
| Offset Precision | ~60% valid | 99.8% valid | +66% |
| Memory Safety | Race conditions | Thread-safe | Stable |
| Output Size | Bloated/corrupt | 1.03 MB optimized | Efficient |
### 🔍 **API Changes**
#### **New JSON Structure Fields**
```json
{
"metadata": {
"schemaVersion": "1.0",
"schemaUrl": "https://github.com/skamoll/CodeSummary/schemas/rag-output.json",
"config": {
"tokenEstimationMethod": "enhanced_heuristic_v1.0"
}
},
"index": {
"chunkOffsets": {
"chunk_id": {
"jsonStart": 1234,
"jsonEnd": 5678,
"contentStart": 2000,
"contentEnd": 4000,
"filePath": "src/file.js"
}
},
"fileOffsets": {
"file_id": [startByte, endByte]
},
"statistics": {
"processingTimeMs": 56,
"bytesPerSecond": 18404786,
"chunksWithValidOffsets": 509,
"emptyFiles": 0
}
}
}
```
### 🎯 **Use Cases Enabled**
#### **RAG/Vector Database Applications**
- **Rapid Content Retrieval**: Use `chunkOffsets` for instant chunk access
- **Efficient File Processing**: `fileOffsets` enable selective file loading
- **Quality Metrics**: Statistics help optimize chunk size and processing
#### **Code Analysis Tools**
- **Semantic Navigation**: Enhanced function detection for better code understanding
- **Token Budget Planning**: Accurate token estimation for LLM interactions
- **Processing Monitoring**: Detailed metrics for pipeline optimization
### 🔗 **Migration Guide**
#### **From v1.0.x to v1.1.0**
1. **JSON Structure**: New `index` section with detailed offsets - update parsers
2. **Token Estimates**: Values may be ~20% higher due to improved accuracy
3. **Statistics**: New fields available in `index.statistics`
4. **Schema**: Check `metadata.schemaVersion` for compatibility
#### **Backward Compatibility**
- ✅ All existing `metadata` and `files` sections unchanged
- ✅ Chunk structure remains the same
- ✅ CLI interface identical
- ⚠️ New `index` section - consumers should handle gracefully
---
## [1.0.2] - 2025-07-29
### Fixed
- Bug fixes and stability improvements
- Enhanced cross-platform compatibility
## [1.0.1] - 2025-07-28
### Added
- Initial RAG functionality
- Basic PDF generation
## [1.0.0] - 2025-07-27
### Added
- Initial release
- Core PDF generation functionality
- Multi-language support