@mertdeveci55/univer-import-export
Version:
Excel/CSV import and export library for Univer spreadsheets with full format preservation
205 lines (158 loc) • 7.23 kB
Markdown
# Import/Export Issues - Learnings and Plan
## Executive Summary
Version 0.1.38 fixed critical style/border import issues, but fundamental limitations in the ExcelJS library prevent complete Excel compatibility. Two major issues remain that require post-processing to resolve.
## What's Fixed in v0.1.38 ✅
### 1. Border Styles Import (FIXED)
- **Problem**: 0 border styles were being imported (should be ~37)
- **Root Cause**: `UniverWorkBook` never populated `this.styles` registry
- **Solution**: Added `collectStyles()` method to collect all unique styles from cells
- **Result**: Now correctly imports and exports 37 border styles
## Remaining Issues ❌
### IMPORT (Working Fine)
- ✅ Defined names are imported correctly (stored in `resources` array)
- ✅ Array formulas are imported correctly
- ✅ Styles and borders now collected properly
### EXPORT (Has Issues)
#### 1. Defined Names Not Exported
- **Symptom**: Excel files lose all defined names (e.g., "capexswitch", "circ", etc.)
- **Root Cause**: ExcelJS's `definedNames.add()` API is completely broken
```javascript
workbook.definedNames.add('TestName', 'Sheet1!$A$1');
console.log(workbook.definedNames.model); // Returns: [] (empty!)
```
- **Impact**: Named ranges don't work in exported Excel files
#### 2. Array Formula Attributes Missing
- **Symptom**: TRANSPOSE formulas missing `t="array"` and `ref="range"` XML attributes
- **Root Cause**: ExcelJS doesn't support modern Excel dynamic array formulas
- **Dilemma**:
- Using `fillFormula()` → adds @ symbols (breaks formulas)
- Not using it → missing required XML attributes
- **Impact**: Array formulas may not spill correctly in Excel
## Root Cause Analysis
### The Core Problem: ExcelJS Library Limitations
ExcelJS (@zwight/exceljs@4.4.2) has fundamental limitations:
1. **Broken APIs**: `definedNames.add()` doesn't actually add anything
2. **Outdated Excel Support**: No proper support for Excel 365 dynamic arrays
3. **No Post-Processing Hooks**: Can't modify XML before final output
### Why This Matters
Modern Excel files (Excel 365/2021) use features that ExcelJS was never designed to handle:
- Dynamic array formulas (TRANSPOSE, FILTER, SORT, etc.)
- Spill ranges
- Complex defined names with workbook references
## Potential Solutions
### Option 1: Frontend Post-Processing (Current Workaround)
```javascript
// After ExcelJS export
const zip = new JSZip();
await zip.loadAsync(exportBuffer);
// Modify XML files directly
// Re-zip and download
```
**Pros**: Works now, proven in test files
**Cons**: Performance overhead, happens client-side
### Option 2: Backend Post-Processing with Python/openpyxl 🐍
**Concept**: Use Python's openpyxl library to fix the Excel file after ExcelJS export
```python
# Backend endpoint (Python/Flask/FastAPI)
from openpyxl import load_workbook
import io
def fix_excel_export(excel_buffer, univer_data):
# Load the ExcelJS-generated file
wb = load_workbook(io.BytesIO(excel_buffer))
# Fix 1: Add defined names
for name, ref in univer_data['defined_names'].items():
wb.defined_names.append(DefinedName(name, attr_text=ref))
# Fix 2: Fix array formulas
for ws in wb.worksheets:
for row in ws.iter_rows():
for cell in row:
if cell.formula and 'TRANSPOSE' in cell.formula:
# openpyxl supports array formulas properly
cell.array_formula = cell.formula
# Return fixed buffer
output = io.BytesIO()
wb.save(output)
return output.getvalue()
```
**Pros**:
- openpyxl has excellent Excel compatibility
- Supports all modern Excel features
- Server-side processing (better performance)
- Python ecosystem for Excel is mature
**Cons**:
- Requires Python backend service
- Additional infrastructure complexity
### Option 3: Replace ExcelJS Entirely
**Alternative Libraries**:
1. **SheetJS (xlsx)**: More comprehensive but different API
2. **Direct XML manipulation**: Most control but more work
3. **Server-side only**: Use openpyxl for both import and export
## Recommended Approach
### Short Term (Immediate)
1. Keep v0.1.38 with border fixes
2. Document the limitations clearly
3. Provide post-processing examples for critical users
### Medium Term (Next Sprint)
**Implement Backend Post-Processing with openpyxl:**
1. **Architecture**:
```
Frontend → Univer → ExcelJS Export → Send to Backend → openpyxl Fix → Return Fixed File
```
2. **Backend Service** (Python/FastAPI):
- Endpoint: `POST /api/excel/post-process`
- Input: Excel buffer + Univer metadata (defined names, array formulas)
- Output: Fixed Excel file
- Processing: ~100ms for typical files
3. **Implementation Steps**:
- Set up Python microservice with openpyxl
- Create post-processing functions for each issue
- Add API endpoint in main backend
- Update frontend to use backend post-processing when available
### Long Term (Future)
Consider full migration away from ExcelJS to a more capable library or custom implementation.
## Technical Details for openpyxl Solution
### What openpyxl Can Fix:
1. ✅ **Defined Names**: Full support via `workbook.defined_names`
2. ✅ **Array Formulas**: Native support with proper XML generation
3. ✅ **Dynamic Arrays**: Understands Excel 365 spill ranges
4. ✅ **No @ Symbol Issues**: Properly handles modern formulas
5. ✅ **Preserves All Styles**: Better style preservation
### Sample Backend Implementation:
```python
# Fix defined names
for name_obj in univer_metadata.get('definedNames', []):
wb.defined_names.append(
DefinedName(
name=name_obj['name'],
attr_text=name_obj['formulaOrRefString']
)
)
# Fix array formulas
for sheet_id, sheet_data in univer_metadata.get('sheets', {}).items():
ws = wb[sheet_data['name']]
for array_formula in sheet_data.get('arrayFormulas', []):
cell_ref = array_formula['range']
formula = array_formula['formula']
ws[cell_ref].value = formula
ws[cell_ref].data_type = 'f'
ws[cell_ref].array_formula = cell_ref # Makes it a proper array formula
```
## Decision Point
**Question for Discussion**: Should we implement the Python/openpyxl backend post-processor?
**Considerations**:
1. Do we already have Python backend infrastructure?
2. Is adding a Python service acceptable for the architecture?
3. What's the performance requirement for Excel exports?
4. How critical are these Excel features for users?
## Next Steps
1. **If YES to backend post-processing**:
- Design API contract
- Set up Python service
- Implement fixes
- Test with various Excel files
2. **If NO to backend post-processing**:
- Document limitations clearly
- Provide frontend post-processing utility
- Consider replacing ExcelJS in future versions
## Conclusion
The core import/export functionality works well after v0.1.38 fixes. The remaining issues are due to ExcelJS limitations that can't be fixed within the library itself. Backend post-processing with openpyxl offers a robust solution that would provide 100% Excel compatibility without replacing the entire export infrastructure.