UNPKG

@mertdeveci55/univer-import-export

Version:

Excel/CSV import and export library for Univer spreadsheets with full format preservation

348 lines (268 loc) 11.8 kB
# Univer Import/Export Library - Project Documentation ## Project Overview This library provides Excel/CSV import and export functionality for Univer spreadsheets, preserving formulas, formatting, charts, conditional formatting, and other Excel features. The library acts as a bridge between Excel files and Univer's spreadsheet format. ## Architecture & Core Principles ### 1. Import Pipeline Architecture The import process follows a multi-stage pipeline: ``` Excel File (.xlsx) JSZip Extraction XML Parsing LuckySheet Format Univer Format ``` #### Key Components: - **JSZip**: Handles unzipping of Excel files (XLSX is a ZIP archive) - **XML Parser**: Custom XML parsing with special character handling - **LuckySheet**: Intermediate format that bridges Excel and Univer - **Univer Converter**: Final transformation to Univer's data structure ### 2. Critical Design Decisions #### XML Parsing with Special Character Handling - **Problem**: Sheet names containing special characters (like ">>>") break standard XML regex patterns - **Solution**: Escape/unescape mechanism that temporarily replaces problematic characters during parsing - **Implementation**: `ReadXml.ts` contains `escapeXmlAttributes()` and `unescapeXmlAttributes()` methods #### Empty Sheet Preservation - **Principle**: ALL sheets must be preserved, even if completely empty - **Implementation**: Sheets without corresponding XML files are still created with default structure - **Never filter out sheets based on content ### 3. File Structure ``` src/  ToLuckySheet/ # Excel to LuckySheet conversion   LuckyFile.ts # Main file handler, orchestrates sheet processing   LuckySheet.ts # Individual sheet processor   ReadXml.ts # XML parsing with special character handling   LuckyCell.ts # Cell data processing  LuckyToUniver/ # LuckySheet to Univer conversion   UniverWorkBook.ts # Workbook structure conversion   UniverSheet.ts # Sheet data conversion  HandleZip.ts # ZIP file handling using JSZip  main.ts # Entry point with public API methods ``` ## Import Process Flow ### Stage 1: File Extraction ```javascript // HandleZip.ts - Unzips Excel file // Extracts: workbook.xml, worksheets/*.xml, sharedStrings.xml, styles.xml, etc. ``` ### Stage 2: XML Parsing ```javascript // ReadXml.ts - Parses XML with special character handling // Key: Escapes ">" in attribute values to "__GT__" before regex parsing // Then unescapes back to original after parsing ``` ### Stage 3: Sheet Discovery ```javascript // LuckyFile.ts - getSheetsFull() // 1. Reads all sheets from workbook.xml // 2. Maps sheet names to worksheet files // 3. Preserves ALL sheets including empty ones // 4. No hardcoded sheet additions ``` ### Stage 4: Data Processing ```javascript // LuckySheet.ts - Processes each sheet // Handles: cells, formulas, styles, merges, conditional formatting // Special handling for TRANSPOSE and array formulas ``` ### Stage 5: Univer Conversion ```javascript // UniverWorkBook.ts & UniverSheet.ts // Converts LuckySheet format to Univer's IWorkbookData structure ``` ## Publishing Workflow ### publish.sh Script The automated publishing script ensures consistency: ```bash #!/bin/bash # 1. Builds the project (gulp build) # 2. Increments version (npm version patch) # 3. Commits changes with descriptive message # 4. Pushes to GitHub # 5. Publishes to npm registry ``` **Usage**: Always use `./publish.sh` for releases. Never manually publish. ### Version History Management - Each version must fix specific issues - Never introduce hardcoded solutions for specific files - All fixes must be generic and work for any Excel file ## Key Methods & APIs ### Main Entry Points ```typescript // Transform Excel to Univer format LuckyExcel.transformExcelToUniver( file: File, callback: (data: IWorkbookData) => void, errorHandler: (error: Error) => void ): Promise<void> // Transform CSV to Univer format LuckyExcel.transformCsvToUniver( file: File, callback: (data: IWorkbookData) => void, errorHandler: (error: Error) => void ): void // Transform Univer to Excel LuckyExcel.transformUniverToExcel(params: { snapshot: any, fileName?: string, success?: (buffer?: Buffer) => void, error?: (err: Error) => void }): Promise<void> ``` ### Critical Internal Methods #### ReadXml.ts - `escapeXmlAttributes(xml: string)`: Escapes special characters in XML attributes - `unescapeXmlAttributes(xml: string)`: Restores original characters - `getElementsByOneTag(tag: string, file: string)`: Parses XML elements with escaping #### LuckyFile.ts - `getSheetsFull()`: Discovers and processes all sheets from workbook.xml - `getSheetFileBysheetId(rid: string)`: Maps sheet references to worksheet files - **NO hardcoded sheet additions** - removed in v0.1.24 #### LuckySheet.ts - `generateConfigRowLenAndHiddenAddCell()`: Processes rows and cells - `generateCellData()`: Converts cell data to LuckySheet format - Handles formulas, including TRANSPOSE array formulas ## Error Handling Principles 1. **Never silently fail**: All errors must be logged with context 2. **Preserve data integrity**: If parsing fails for one element, don't corrupt others 3. **Defensive coding**: Always check for undefined/null before accessing properties 4. **Detailed logging**: Use console.log liberally during development (terser config: drop_console: false) ## Testing & Debugging ### Debug Logging Extensive logging throughout the codebase: - `= [LuckyFile]` - File processing logs - `= [ReadXml]` - XML parsing and escaping logs - `=� [PACKAGE]` - Main process flow logs - `` Success indicators - `L` Error indicators ### Common Issues & Solutions 1. **Missing sheets with special characters** - Solution: Escape/unescape mechanism in ReadXml.ts (v0.1.23+) 2. **AttributeList undefined errors** - Solution: Defensive checks and proper initialization (v0.1.21+) 3. **Duplicate sheets** - Solution: Removed hardcoded sheet additions (v0.1.24) 4. **TRANSPOSE formulas not working** - Solution: Array formula handling in cell processing ## Development Guidelines ### Do's -  Use generic solutions that work for all Excel files -  Preserve all Excel features during import -  Log extensively during development -  Use publish.sh for all releases -  Test with various Excel files including edge cases -  Handle special characters properly ### Don'ts - L Never hardcode solutions for specific files - L Never filter out "empty" sheets - L Never manually publish to npm - L Never assume sheet names are simple strings - L Never skip error handling ## Dependencies ### Core Dependencies - `@progress/jszip-esm`: ZIP file handling - `@zwight/exceljs`: Excel file structure (used for export) - `@univerjs/core`: Univer core types and interfaces - `dayjs`: Date manipulation for Excel date formats - `papaparse`: CSV parsing - `xlsx`: Additional Excel format handling ### Build Tools - `gulp`: Build orchestration - `rollup`: Module bundling - `typescript`: Type safety - `terser`: Minification (configured to keep console.logs) ## Backend Post-Processing Integration ### Architecture Overview The system now supports **surgical post-processing** to fix ExcelJS limitations without rebuilding existing functionality: ``` Univer Data ExcelJS Export (working features) openpyxl Post-Processing (fixes) Perfect Excel ``` #### Key Components 1. **Frontend Export (ExcelJS)**: Handles 95% of features perfectly 2. **Backend Post-Processing (openpyxl)**: Fixes the remaining 5% that ExcelJS can't handle 3. **Dual-Mode System**: Automatic fallback ensures exports never fail ### Implementation Details #### Backend Integration (`spreadsheets/import_export/`) ```python # excel_post_processor.py - Surgical fixes for ExcelJS limitations class ExcelPostProcessor: def process_excel_buffer(self, excel_buffer: bytes, univer_metadata: Dict) -> bytes: # Load existing Excel file (from working ExcelJS export) workbook = load_workbook(BytesIO(excel_buffer)) # Fix 1: Add missing defined names self._fix_defined_names(workbook, univer_metadata) # Fix 2: Fix array formula XML attributes (future) self._fix_array_formulas(workbook, univer_metadata) # Return enhanced file return self._to_buffer(workbook) ``` #### Service Layer Integration ```python # services.py - Enhanced export with post-processing @staticmethod def export_spreadsheet_to_excel(workbook_data, enable_post_processing=True): # Step 1: Use existing working export excel_bytes = UniverToExcelConverter().convert(workbook_data).getvalue() # Step 2: Apply surgical fixes if enabled if enable_post_processing: post_processor = ExcelPostProcessor() excel_bytes = post_processor.process_excel_buffer(excel_bytes, workbook_data) return excel_bytes, export_stats ``` #### Frontend Integration ```typescript // Dual-mode export system const result = await exportToExcel({ workbookData, useBackendExport: true, // Prefer backend for compatibility enablePostProcessing: true, // Fix defined names exportSpreadsheetToExcel: api.exportSpreadsheetToExcel }); // Automatic fallback to frontend if backend fails if (!result.success && useBackendExport) { // Retry with frontend-only export return exportToExcel({ ...options, useBackendExport: false }); } ``` ### Fixed Issues #### Defined Names (v0.1.39+) - **Problem**: ExcelJS `definedNames.add()` API is broken - doesn't persist names - **Solution**: openpyxl `workbook.defined_names.add(DefinedName(...))` works perfectly - **Result**: All named ranges now work in Excel (capexswitch, circ, etc.) #### Array Formulas (Planned) - **Problem**: Missing `t="array"` and `ref="range"` XML attributes - **Solution**: openpyxl has native array formula support with proper XML generation - **Status**: Architecture ready, implementation pending ### Performance & Safety #### Performance Metrics - **Base export**: ~2-3ms (ExcelJS - unchanged) - **Post-processing**: ~5-7ms (openpyxl fixes) - **Total overhead**: ~7ms for complete Excel compatibility - **File size**: Identical to original export #### Safety Guarantees - **Zero regressions**: Post-processing only adds missing features - **Fallback protection**: Frontend export always available - **Selective processing**: Only applies fixes when needed - **Error isolation**: Post-processing failures don't break base export ### Testing & Validation #### Comprehensive Test Coverage ```python def test_post_processing_safety(): # 1. Export with current system (baseline) original = export_current_system(test_data) # 2. Apply post-processing enhanced = post_processor.fix(original, metadata) # 3. Validate no regressions assert_styles_identical(original, enhanced) assert_formulas_identical(original, enhanced) assert_structure_identical(original, enhanced) # 4. Validate fixes applied assert_defined_names_work(enhanced) # NEW functionality ``` #### Real-World Validation - **test.xlsx**: 13 sheets, 6 defined names, special characters (`>>>`) - **Before**: 0 defined names exported - **After**: 6/6 defined names working perfectly in Excel - **Compatibility**: Opens correctly in Excel 365, Excel 2021, Excel 2019 ## Future Improvements 1. **Array Formula XML Attributes**: Complete the openpyxl-based array formula fixes 2. **Performance**: Stream processing for huge datasets 3. **Features**: Support for more Excel features (macros, pivot tables) via backend 4. **Caching**: Cache post-processing results for identical exports 5. **Testing**: Automated integration tests with real Excel files