UNPKG

@mertdeveci55/univer-import-export

Version:

Excel/CSV import and export library for Univer spreadsheets with full format preservation

205 lines (158 loc) 7.23 kB
# Import/Export Issues - Learnings and Plan ## Executive Summary Version 0.1.38 fixed critical style/border import issues, but fundamental limitations in the ExcelJS library prevent complete Excel compatibility. Two major issues remain that require post-processing to resolve. ## What's Fixed in v0.1.38 ✅ ### 1. Border Styles Import (FIXED) - **Problem**: 0 border styles were being imported (should be ~37) - **Root Cause**: `UniverWorkBook` never populated `this.styles` registry - **Solution**: Added `collectStyles()` method to collect all unique styles from cells - **Result**: Now correctly imports and exports 37 border styles ## Remaining Issues ❌ ### IMPORT (Working Fine) - ✅ Defined names are imported correctly (stored in `resources` array) - ✅ Array formulas are imported correctly - ✅ Styles and borders now collected properly ### EXPORT (Has Issues) #### 1. Defined Names Not Exported - **Symptom**: Excel files lose all defined names (e.g., "capexswitch", "circ", etc.) - **Root Cause**: ExcelJS's `definedNames.add()` API is completely broken ```javascript workbook.definedNames.add('TestName', 'Sheet1!$A$1'); console.log(workbook.definedNames.model); // Returns: [] (empty!) ``` - **Impact**: Named ranges don't work in exported Excel files #### 2. Array Formula Attributes Missing - **Symptom**: TRANSPOSE formulas missing `t="array"` and `ref="range"` XML attributes - **Root Cause**: ExcelJS doesn't support modern Excel dynamic array formulas - **Dilemma**: - Using `fillFormula()` → adds @ symbols (breaks formulas) - Not using it → missing required XML attributes - **Impact**: Array formulas may not spill correctly in Excel ## Root Cause Analysis ### The Core Problem: ExcelJS Library Limitations ExcelJS (@zwight/exceljs@4.4.2) has fundamental limitations: 1. **Broken APIs**: `definedNames.add()` doesn't actually add anything 2. **Outdated Excel Support**: No proper support for Excel 365 dynamic arrays 3. **No Post-Processing Hooks**: Can't modify XML before final output ### Why This Matters Modern Excel files (Excel 365/2021) use features that ExcelJS was never designed to handle: - Dynamic array formulas (TRANSPOSE, FILTER, SORT, etc.) - Spill ranges - Complex defined names with workbook references ## Potential Solutions ### Option 1: Frontend Post-Processing (Current Workaround) ```javascript // After ExcelJS export const zip = new JSZip(); await zip.loadAsync(exportBuffer); // Modify XML files directly // Re-zip and download ``` **Pros**: Works now, proven in test files **Cons**: Performance overhead, happens client-side ### Option 2: Backend Post-Processing with Python/openpyxl 🐍 **Concept**: Use Python's openpyxl library to fix the Excel file after ExcelJS export ```python # Backend endpoint (Python/Flask/FastAPI) from openpyxl import load_workbook import io def fix_excel_export(excel_buffer, univer_data): # Load the ExcelJS-generated file wb = load_workbook(io.BytesIO(excel_buffer)) # Fix 1: Add defined names for name, ref in univer_data['defined_names'].items(): wb.defined_names.append(DefinedName(name, attr_text=ref)) # Fix 2: Fix array formulas for ws in wb.worksheets: for row in ws.iter_rows(): for cell in row: if cell.formula and 'TRANSPOSE' in cell.formula: # openpyxl supports array formulas properly cell.array_formula = cell.formula # Return fixed buffer output = io.BytesIO() wb.save(output) return output.getvalue() ``` **Pros**: - openpyxl has excellent Excel compatibility - Supports all modern Excel features - Server-side processing (better performance) - Python ecosystem for Excel is mature **Cons**: - Requires Python backend service - Additional infrastructure complexity ### Option 3: Replace ExcelJS Entirely **Alternative Libraries**: 1. **SheetJS (xlsx)**: More comprehensive but different API 2. **Direct XML manipulation**: Most control but more work 3. **Server-side only**: Use openpyxl for both import and export ## Recommended Approach ### Short Term (Immediate) 1. Keep v0.1.38 with border fixes 2. Document the limitations clearly 3. Provide post-processing examples for critical users ### Medium Term (Next Sprint) **Implement Backend Post-Processing with openpyxl:** 1. **Architecture**: ``` Frontend → Univer → ExcelJS Export → Send to Backend → openpyxl Fix → Return Fixed File ``` 2. **Backend Service** (Python/FastAPI): - Endpoint: `POST /api/excel/post-process` - Input: Excel buffer + Univer metadata (defined names, array formulas) - Output: Fixed Excel file - Processing: ~100ms for typical files 3. **Implementation Steps**: - Set up Python microservice with openpyxl - Create post-processing functions for each issue - Add API endpoint in main backend - Update frontend to use backend post-processing when available ### Long Term (Future) Consider full migration away from ExcelJS to a more capable library or custom implementation. ## Technical Details for openpyxl Solution ### What openpyxl Can Fix: 1.**Defined Names**: Full support via `workbook.defined_names` 2.**Array Formulas**: Native support with proper XML generation 3.**Dynamic Arrays**: Understands Excel 365 spill ranges 4.**No @ Symbol Issues**: Properly handles modern formulas 5.**Preserves All Styles**: Better style preservation ### Sample Backend Implementation: ```python # Fix defined names for name_obj in univer_metadata.get('definedNames', []): wb.defined_names.append( DefinedName( name=name_obj['name'], attr_text=name_obj['formulaOrRefString'] ) ) # Fix array formulas for sheet_id, sheet_data in univer_metadata.get('sheets', {}).items(): ws = wb[sheet_data['name']] for array_formula in sheet_data.get('arrayFormulas', []): cell_ref = array_formula['range'] formula = array_formula['formula'] ws[cell_ref].value = formula ws[cell_ref].data_type = 'f' ws[cell_ref].array_formula = cell_ref # Makes it a proper array formula ``` ## Decision Point **Question for Discussion**: Should we implement the Python/openpyxl backend post-processor? **Considerations**: 1. Do we already have Python backend infrastructure? 2. Is adding a Python service acceptable for the architecture? 3. What's the performance requirement for Excel exports? 4. How critical are these Excel features for users? ## Next Steps 1. **If YES to backend post-processing**: - Design API contract - Set up Python service - Implement fixes - Test with various Excel files 2. **If NO to backend post-processing**: - Document limitations clearly - Provide frontend post-processing utility - Consider replacing ExcelJS in future versions ## Conclusion The core import/export functionality works well after v0.1.38 fixes. The remaining issues are due to ExcelJS limitations that can't be fixed within the library itself. Backend post-processing with openpyxl offers a robust solution that would provide 100% Excel compatibility without replacing the entire export infrastructure.