Mathpix-markdown-it is an open-source implementation of the mathpix-markdown spec written in TypeScript. It relies on the following open-source libraries: MathJax v3 (to render math with SVGs) and markdown-it (for standard Markdown parsing).

# PR: Add itemize/enumerate support inside tabular cells

Status: Implemented
Owner: @OlgaRedozubova
Issue: #17328

---

## Context

LaTeX list environments (`\begin{itemize}` / `\begin{enumerate}`) are block-level constructs. The tabular cell parser in mathpix-markdown-it treated cell content as inline-only, causing list environments inside table cells to be ignored or rendered as plain text.

**Impact:**

- Lists in table cells did not render in HTML
- Export formats (TSV, CSV, Markdown) lost list structure entirely
- Nested tables were sometimes double-processed during export

## Goal

- Render `itemize` and `enumerate` lists correctly inside tabular cells
- Support nested lists with proper marker styles per level
- Preserve custom markers (`\item[X]`) and empty markers (`\item[]`)
- Export list structure to all formats (HTML, Markdown, TSV, CSV)
- Prevent nested tables from being exported as separate top-level elements

## Non-Goals

- Changing list rendering outside tabular context
- Supporting other LaTeX block environments inside cells (e.g., `figure`, `verbatim`)
- API or interface changes

## Example

### Input LaTeX

```latex
\begin{tabular}{l}
  \begin{itemize}
    \item First item
    \item Second item
  \end{itemize}
\end{tabular}
```

### Before (broken)

- **HTML**: List content rendered as plain text or dropped
- **Markdown export**: Empty or malformed cell
- **TSV/CSV export**: Missing list content

### After (fixed)

**HTML output** (conceptual):

```html
<table>
  <tr>
    <td>
      <ul class="itemize">
        <li><span class="li_level">•</span>First item</li>
        <li><span class="li_level">•</span>Second item</li>
      </ul>
    </td>
  </tr>
</table>
```

**Markdown export**:

```
| • First item<br>• Second item |
```

**TSV/CSV export**:

```
"• First item
• Second item"
```

### Nested list markers

| Level | Itemize | Enumerate   |
|-------|---------|-------------|
| 1     | •       | 1. 2. 3.    |
| 2     | –       | a. b. c.    |
| 3     | *       | i. ii. iii. |
| 4     | ·       | A. B. C.    |

## Why This Approach

### Why conditional block parsing?

LaTeX lists are block-level constructs that require the block parser to run. Tabular cells normally use inline-only parsing for performance and simplicity. When a list environment is detected in a cell, we switch to block parsing for that cell only.

### Why placeholder + newline injection?

Before parsing a cell, we replace complex nested content (math, sub-tables, code blocks) with UUID placeholders to prevent them from breaking row/column splitting. When re-injecting this content, if it contains block-level LaTeX (like a list), we must ensure it's surrounded by newlines so the block parser recognizes it correctly.

### Why nested table filtering?

When exporting via `parseMarkdownByElement()`, nested `.table_tabular` elements were being collected alongside their parents, causing duplicate processing. We now filter out any table that has a `.table_tabular` ancestor.

## Approach

1. **Detect list environments in cells**: scan cell content for `\begin{itemize}` or `\begin{enumerate}`
2. **Switch to block parsing**: if detected, parse the cell with block rules enabled
3. **Placeholder safety**: inject newlines around block content placeholders before parsing
4. **Render lists**: produce proper `<ul>`/`<ol>` HTML with level-appropriate markers
5. **Export handling**: format list items with `<br>` (Markdown) or `\n` (TSV/CSV) separators
6. **Filter nested tables**: exclude nested `.table_tabular` from top-level export collection

## Bug Fixes Included

- `<br>` escaping when preceding text ends with backslash
- Underline/strikeout formatting inside tabular cells
- Nested tabular appearing on same line as list content
- Custom enumerate markers (e.g., `\textbf{1.}`) now preserved
- Removed problematic `text-indent` CSS for empty markers

## Constraints

- List rendering outside tabular must remain unchanged
- Existing tabular tests must pass
- Placeholder mechanism must not break table structure
- Custom markers must be preserved exactly

## Testing

**New test coverage:**

- Lists inside tabular cells (itemize, enumerate, nested)
- Custom and empty markers
- Export format validation (TSV, CSV, Markdown)
- Nested tabular with lists

**Commands:**

```bash
npm test
npm run build
```

## Risk / Rollback

**Risk**: Medium

- Modifies cell parsing and placeholder handling
- May affect edge cases in complex nested tables

**Rollback**: Revert PR or pin to version 2.0.31

## Related

- Issue: #17328, [mmd][mmd-converter] Add itemize support inside tabular
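
## Illustrative Sketch

To make the detection step, the per-level marker table, and the export separators described in this PR concrete, here is a minimal TypeScript sketch. It is not the code from the PR: every name in it (`cellNeedsBlockParsing`, `defaultMarker`, `formatItems`, `toMarkdownCell`, `toDelimitedCell`, `ListItem`) is hypothetical, and the CSV quote-doubling and the fallback for enumerate indices above 26 are assumptions of this sketch rather than documented behavior of mathpix-markdown-it.

```typescript
// Hypothetical helpers for illustration only; not the mathpix-markdown-it API.
type ListKind = "itemize" | "enumerate";

// A custom marker comes from \item[X]; an empty string models \item[].
interface ListItem {
  marker?: string;
  text: string;
}

// Detection step: does the raw cell source contain a list environment?
const LIST_ENV_RE = /\\begin\{(?:itemize|enumerate)\}/;

function cellNeedsBlockParsing(cellSource: string): boolean {
  return LIST_ENV_RE.test(cellSource);
}

// Itemize markers for levels 1..4, as in the "Nested list markers" table.
const ITEMIZE_MARKERS: string[] = ["•", "–", "*", "·"];

// 1 -> "a", 26 -> "z"; larger indices fall back to digits (assumption).
function alpha(n: number): string {
  return n >= 1 && n <= 26 ? String.fromCharCode(96 + n) : String(n);
}

// Lowercase Roman numerals for positive integers.
function roman(n: number): string {
  const table: Array<[number, string]> = [
    [1000, "m"], [900, "cm"], [500, "d"], [400, "cd"], [100, "c"],
    [90, "xc"], [50, "l"], [40, "xl"], [10, "x"], [9, "ix"],
    [5, "v"], [4, "iv"], [1, "i"],
  ];
  let rest = n;
  let out = "";
  for (const [value, glyph] of table) {
    while (rest >= value) {
      out += glyph;
      rest -= value;
    }
  }
  return out;
}

// Default marker for a 1-based nesting level and 1-based item index.
function defaultMarker(kind: ListKind, level: number, index: number): string {
  const depth = Math.min(Math.max(level, 1), 4); // clamp to the four table rows
  if (kind === "itemize") return ITEMIZE_MARKERS[depth - 1];
  switch (depth) {
    case 1: return `${index}.`;
    case 2: return `${alpha(index)}.`;
    case 3: return `${roman(index)}.`;
    default: return `${alpha(index).toUpperCase()}.`;
  }
}

// Join items with a format-specific separator, keeping custom markers as given.
function formatItems(kind: ListKind, level: number, items: ListItem[], separator: string): string {
  return items
    .map((item, i) => {
      const marker = item.marker !== undefined ? item.marker : defaultMarker(kind, level, i + 1);
      return marker === "" ? item.text : `${marker} ${item.text}`;
    })
    .join(separator);
}

// Markdown cells separate items with <br>.
function toMarkdownCell(kind: ListKind, items: ListItem[]): string {
  return formatItems(kind, 1, items, "<br>");
}

// TSV/CSV cells separate items with "\n" and are wrapped in quotes;
// doubling embedded quotes is an assumption of this sketch.
function toDelimitedCell(kind: ListKind, items: ListItem[]): string {
  return `"${formatItems(kind, 1, items, "\n").replace(/"/g, '""')}"`;
}

const demoItems: ListItem[] = [{ text: "First item" }, { text: "Second item" }];
console.log(toMarkdownCell("itemize", demoItems)); // • First item<br>• Second item
console.log(defaultMarker("enumerate", 3, 4));     // iv.
```

The sketch keeps marker selection separate from joining so that the same item list can feed both the Markdown and the TSV/CSV paths, with only the separator and quoting differing between them.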