codesummary

Cross-platform CLI tool that generates professional PDF documentation and RAG-optimized JSON outputs from project source code. Perfect for code reviews, audits, documentation, and AI/ML applications with semantic chunking and precision offsets.

github.com/skamoll/CodeSummary

# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.1.1] - 2025-07-31

### 🔧 **Fixes & Improvements**

#### **CLI Enhancements**
- **Added Version Flag**: New `--version` and `-v` flags to display current version
- **Cross-Platform Compatibility**: Fixed Windows path resolution for version detection
- **Help Documentation**: Updated help text to include version option

#### **Dependency Cleanup**
- **Removed Deprecated Crypto**: Eliminated `crypto@1.0.1` dependency (now uses built-in Node.js crypto)
- **Security Improvement**: No more npm warnings about deprecated packages
- **Cleaner Dependencies**: Reduced package footprint

#### **Bug Fixes**
- **Merge Conflicts**: Resolved conflicts between main and develop branches
- **CLI Argument Parsing**: Fixed unknown option error for `--version` flag

### 📋 **Migration Notes**
- No breaking changes
- Existing installations will benefit from cleaner dependencies
- New `--version` flag available immediately after update

---

## [1.1.0] - 2025-07-31

### 🎉 Major Features Added

#### 🔧 **Complete RAG System Refactoring**
- **Atomic JSON Generation**: Eliminated streaming-based approach that caused JSON corruption
- **100% Thread-Safe Processing**: All files processed in memory before writing
- **Robust Error Handling**: No more duplicate keys or malformed JSON output
- **Performance Boost**: ~107 more chunks generated with improved stability

#### 📊 **Precision Offset Index System**
- **Complete fileOffsets**: Format `fileId -> [start, end]` for rapid file seeking
- **Detailed chunkOffsets**: Individual chunk positions with `jsonStart`, `jsonEnd`, `contentStart`, `contentEnd`
- **99.8% Precision**: 509/510 chunks with valid byte-accurate offsets
- **RAG-Optimized**: Enables high-performance vector database operations

#### 🧠 **Enhanced Token Estimation Engine**
- **Multi-Heuristic Algorithm**: Replaces simple `ceil(length/4)` with sophisticated analysis
- **Language-Aware Processing**: Specialized calculations for JavaScript, Python, Java, C++, etc.
- **Syntax Analysis**: Accounts for brackets, operators, and language-specific tokens
- **20% More Accurate**: Example: 100 chars JavaScript goes from 25 → 30 tokens

#### 📈 **Complete Processing Statistics**
- **Real-Time Metrics**: Processing time, throughput, bytes written
- **Quality Assurance**: Empty files count, chunks with valid offsets
- **Performance Tracking**: `bytesPerSecond`, `avgFileSize`, `avgChunksPerFile`
- **Error Collection**: Detailed error tracking and reporting

#### 🔄 **Future-Proof Schema System**
- **Schema Versioning**: `schemaVersion: "1.0"` for migration management
- **Method Tracking**: `tokenEstimationMethod: "enhanced_heuristic_v1.0"`
- **Schema URL**: Links to official schema definition for validation
- **Backward Compatibility**: Maintains compatibility with existing consumers

### 🛠️ **Technical Improvements**

#### **Code Quality & Architecture**
- Eliminated 5+ problematic streaming methods (`streamingGeneration`, `writeMainBody`, etc.)
- Consolidated to single `generate()` method for clarity
- Removed global state variables that caused race conditions
- Enhanced function detection regex for better semantic chunking

#### **Performance Optimizations**
- **Processing Speed**: 510 chunks generated in 56ms (vs previous inconsistent timing)
- **Memory Efficiency**: 18.4 MB/s throughput with atomic processing
- **Output Size**: Optimized JSON structure - 1.03 MB for comprehensive indexing
- **Validation**: Built-in JSON structure validation with detailed reporting

#### **Enhanced ScriptHandler**
- Improved regex patterns for TypeScript interfaces, enums, class methods
- Better support for `const enum`, `implements`, access modifiers
- Enhanced arrow function detection with `let`, `var` support
- More precise function boundary detection with brace matching

### 🐛 **Bugs Fixed**

#### **Critical JSON Corruption Issues**
- ❌ **Fixed**: Duplicate `index` sections in output JSON
- ❌ **Fixed**: Negative `processingTimeMs` values
- ❌ **Fixed**: Inconsistent chunk counts between sections
- ❌ **Fixed**: Missing or incorrect byte offsets
- ❌ **Fixed**: Malformed JSON due to concurrent writes
- ❌ **Fixed**: Stream truncation issues with large files

#### **Data Integrity Issues**
- ❌ **Fixed**: Inconsistent statistics across different JSON sections
- ❌ **Fixed**: Incorrect `totalBytes` calculations
- ❌ **Fixed**: Missing `chunkOffsets` for seek operations
- ❌ **Fixed**: Race conditions in multi-file processing

### 📊 **Performance Metrics (Before vs After)**

| Metric | v1.0.2 | v1.1.0 | Improvement |
|--------|--------|--------|-------------|
| JSON Validity | ❌ Corrupted | ✅ 100% Valid | +100% |
| Chunk Generation | ~400 chunks | 510 chunks | +27% |
| Processing Time | Inconsistent | 56ms stable | Consistent |
| Offset Precision | ~60% valid | 99.8% valid | +66% |
| Memory Safety | Race conditions | Thread-safe | Stable |
| Output Size | Bloated/corrupt | 1.03 MB optimized | Efficient |

### 🔍 **API Changes**

#### **New JSON Structure Fields**
```json
{
  "metadata": {
    "schemaVersion": "1.0",
    "schemaUrl": "https://github.com/skamoll/CodeSummary/schemas/rag-output.json",
    "config": {
      "tokenEstimationMethod": "enhanced_heuristic_v1.0"
    }
  },
  "index": {
    "chunkOffsets": {
      "chunk_id": {
        "jsonStart": 1234,
        "jsonEnd": 5678,
        "contentStart": 2000,
        "contentEnd": 4000,
        "filePath": "src/file.js"
      }
    },
    "fileOffsets": {
      "file_id": [startByte, endByte]
    },
    "statistics": {
      "processingTimeMs": 56,
      "bytesPerSecond": 18404786,
      "chunksWithValidOffsets": 509,
      "emptyFiles": 0
    }
  }
}
```

### 🎯 **Use Cases Enabled**

#### **RAG/Vector Database Applications**
- **Rapid Content Retrieval**: Use `chunkOffsets` for instant chunk access
- **Efficient File Processing**: `fileOffsets` enable selective file loading
- **Quality Metrics**: Statistics help optimize chunk size and processing

#### **Code Analysis Tools**
- **Semantic Navigation**: Enhanced function detection for better code understanding
- **Token Budget Planning**: Accurate token estimation for LLM interactions
- **Processing Monitoring**: Detailed metrics for pipeline optimization

### 🔗 **Migration Guide**

#### **From v1.0.x to v1.1.0**
1. **JSON Structure**: New `index` section with detailed offsets - update parsers
2. **Token Estimates**: Values may be ~20% higher due to improved accuracy
3. **Statistics**: New fields available in `index.statistics`
4. **Schema**: Check `metadata.schemaVersion` for compatibility

#### **Backward Compatibility**
- ✅ All existing `metadata` and `files` sections unchanged
- ✅ Chunk structure remains the same
- ✅ CLI interface identical
- ⚠️ New `index` section - consumers should handle gracefully

---

## [1.0.2] - 2025-07-29

### Fixed
- Bug fixes and stability improvements
- Enhanced cross-platform compatibility

## [1.0.1] - 2025-07-28

### Added
- Initial RAG functionality
- Basic PDF generation

## [1.0.0] - 2025-07-27

### Added
- Initial release
- Core PDF generation functionality
- Multi-language support
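
---

The `chunkOffsets` index introduced in 1.1.0 is described as enabling direct chunk access. The Python sketch below shows one way a consumer might use it to parse a single chunk without loading the whole output file. It rests on assumptions this changelog does not confirm: that `jsonStart` and `jsonEnd` are byte offsets into the output file, that they delimit the chunk's JSON object, and that `jsonEnd` is exclusive. The helper names (`read_span`, `load_chunk`, `build_demo_file`) and the demo file layout are hypothetical.

```python
import json
import os
import tempfile


def read_span(path, start, end):
    """Return the raw bytes in [start, end) of the file at `path`."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)


def load_chunk(path, chunk_offsets, chunk_id):
    """Parse one chunk object by seeking to its recorded offsets.

    Assumption: jsonStart/jsonEnd are byte offsets that delimit the chunk's
    JSON object, with jsonEnd exclusive.
    """
    entry = chunk_offsets[chunk_id]
    raw = read_span(path, entry["jsonStart"], entry["jsonEnd"])
    return json.loads(raw.decode("utf-8"))


def build_demo_file():
    """Write a tiny stand-in output file and compute its offsets by hand."""
    chunk = {"id": "chunk_1", "content": "function add(a, b) { return a + b; }"}
    chunk_bytes = json.dumps(chunk).encode("utf-8")
    prefix = b'{"files": [{"chunks": ['
    suffix = b"]}]}"
    offsets = {
        "chunk_1": {
            "jsonStart": len(prefix),
            "jsonEnd": len(prefix) + len(chunk_bytes),
            "filePath": "src/file.js",
        }
    }
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "wb") as f:
        f.write(prefix + chunk_bytes + suffix)
    return path, offsets


path, offsets = build_demo_file()
demo_chunk = load_chunk(path, offsets, "chunk_1")
os.remove(path)
print(demo_chunk["id"], len(demo_chunk["content"]))
```

Opening the file in binary mode matters here: byte offsets only line up with `seek()` positions when no text decoding happens before the slice is taken.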
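
---

The 1.1.0 migration guide asks consumers to check `metadata.schemaVersion` and to handle the `index` section gracefully. A minimal Python sketch of such a consumer follows. The function name `summarize_output`, the rule of accepting only major version `1`, and the treatment of a missing `schemaVersion` as older output are all assumptions made for illustration, not behaviour documented by the project.

```python
SUPPORTED_SCHEMA_MAJOR = "1"  # assumption: this consumer targets schema 1.x


def summarize_output(doc):
    """Summarize a parsed RAG output document, tolerating missing sections."""
    metadata = doc.get("metadata", {})
    # Assumption: output without schemaVersion is treated as pre-1.1.0 output.
    schema_version = metadata.get("schemaVersion")
    if schema_version is not None:
        major = schema_version.split(".")[0]
        if major != SUPPORTED_SCHEMA_MAJOR:
            raise ValueError(f"unsupported schemaVersion: {schema_version}")

    # The index section may be absent; fall back to empty structures.
    index = doc.get("index") or {}
    stats = index.get("statistics") or {}
    return {
        "schemaVersion": schema_version,
        "hasChunkOffsets": bool(index.get("chunkOffsets")),
        "chunksWithValidOffsets": stats.get("chunksWithValidOffsets"),
    }


legacy_doc = {"metadata": {}, "files": []}
current_doc = {
    "metadata": {"schemaVersion": "1.0"},
    "files": [],
    "index": {
        "chunkOffsets": {"chunk_id": {"jsonStart": 1234, "jsonEnd": 5678}},
        "statistics": {"chunksWithValidOffsets": 509},
    },
}

legacy_summary = summarize_output(legacy_doc)
current_summary = summarize_output(current_doc)
print(legacy_summary)
print(current_summary)
```

Reading every new field through `.get()` keeps one code path for both old and new output, so a parser written this way does not need a separate branch per release.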
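
---

The figures quoted in the 1.1.0 entry can be cross-checked against each other. The short Python calculation below uses only numbers from that entry. It assumes that "MB" means 10^6 bytes and that the "Improvement" column reports relative change against the approximate v1.0.2 values (`~400 chunks`, `~60% valid`).

```python
# Figures quoted in the 1.1.0 entry
total_chunks = 510
chunks_with_valid_offsets = 509
processing_time_ms = 56
bytes_per_second = 18_404_786

# Approximate v1.0.2 baselines from the comparison table
previous_chunks = 400
previous_valid_pct = 60.0

# Share of chunks with valid offsets
valid_pct = 100 * chunks_with_valid_offsets / total_chunks

# Output size implied by throughput and processing time
output_bytes = bytes_per_second * processing_time_ms / 1000

# Relative gains against the v1.0.2 baselines
chunk_gain_pct = 100 * (total_chunks - previous_chunks) / previous_chunks
precision_gain_pct = 100 * (valid_pct - previous_valid_pct) / previous_valid_pct

print(f"valid offsets: {valid_pct:.1f}%")
print(f"implied output size: {output_bytes / 1e6:.2f} MB")
print(f"chunk gain: {chunk_gain_pct:.1f}%")
print(f"offset precision gain: {precision_gain_pct:.0f}%")
```

Under those assumptions the numbers are mutually consistent: 509 of 510 chunks is about 99.8%, and 18,404,786 bytes per second over 56 ms implies an output of about 1.03 MB.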