# Confluence to Markdown converter that actually works
Convert [Confluence HTML export](#conflhowto) to Markdown
## Requirements
You must have the [pandoc] command-line tool installed. Check by running:
```
pandoc --version
```
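
If you want to run this check from a script, a minimal sketch could look like the following. It assumes a POSIX shell; the `have_cmd` helper is a hypothetical name used for illustration and is not part of this project.

```shell
# Hypothetical helper: succeeds when the named command is found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd pandoc; then
  # Print only the first line of the version banner.
  pandoc --version | head -n 1
else
  echo "pandoc not found; install it before running the converter" >&2
fi
```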
Install all project dependencies:
```
npm install
```
## Usage
In the converter's directory:
```
npm run start <pathResource> <pathResult>
```
### Parameters
parameter | description
--- | ---
`<pathResource>` | File or directory containing the extracted Confluence export to convert
`<pathResult>` | Directory where the output will be generated. Defaults to the current working directory
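
As an illustration, the invocation above can be wrapped in a small shell function that refuses to start when the input path is missing. This is only a sketch under stated assumptions: `convert_export` is a hypothetical helper that is not part of this project, and it has to be called from the converter's directory so that `npm run start` finds the project's `package.json`.

```shell
# convert_export <pathResource> [<pathResult>]
# Hypothetical wrapper: checks that the input exists, then runs the converter.
convert_export() {
  src=${1:-}
  if [ ! -e "$src" ]; then
    echo "convert_export: no such file or directory: $src" >&2
    return 1
  fi
  shift
  # "$@" holds the optional <pathResult>; when it is omitted,
  # the converter falls back to its own default.
  npm run start "$src" "$@"
}
```

For example, `convert_export ./confluence-export ./markdown-output` would convert an export extracted to `./confluence-export`; both paths are placeholders.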
## Process description<a name="process-description"></a>
- Confluence page IDs in HTML file names and links are replaced with that page's heading
- an overall `index.md` is created, linking to the index of each Confluence space
- images and other inserted attachments are linked to the generated markdown
  - the whole `images` and `attachments` directories are copied to the resulting directory
  - no check is done as to whether a particular file/image is actually used
- markdown links to internal pages are generated without the trailing **.md** extension to comply with [gitit] expectations
  - this can be changed by finding all occurrences of `gitit requires link to pages without .md extension` in the `.coffee` files and adding the extension there
  - or you can send a PR ;)
- the pandoc utility accepts quite a few options that alter its default behavior
  - those can be passed to it by adding them to the `@outputTypesAdd`, `@outputTypesRemove`, and `@extraOptions` properties in the [`App.coffee`](src/App.coffee) file
  - or you can send a PR ;)
  - here is the [list of options][pandoc-options] pandoc can accept
- a single console logger is used throughout the application; its default verbosity is INFO
  - you can change the verbosity to one of the DEBUG, INFO, WARNING, or ERROR levels in the [`Logger.coffee`](src/Logger.coffee) file
  - or you can send a PR ;)
- a series of formatter rules is applied to the HTML text of each Confluence page so that it converts properly
  - you can view and/or change them in the [`Page.coffee`](src/Page.coffee) file
  - the rules themselves are located in the [`Formatter.coffee`](src/Formatter.coffee) file
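
To locate the places mentioned above where the **.md** extension would have to be added, you could search the sources for the marker comment. The following is a minimal sketch, assuming a `grep` that supports `-r` and `--include` (GNU and BSD grep do) and that the `.coffee` files live under `src/`, as the links above suggest. `find_gitit_markers` is a hypothetical helper name, not part of this project.

```shell
# find_gitit_markers [DIR]
# Hypothetical helper: lists file:line for every marker comment in the
# .coffee files under DIR (assumed default: src).
find_gitit_markers() {
  grep -rn -F --include='*.coffee' \
    'gitit requires link to pages without .md extension' "${1:-src}"
}
```

Running `find_gitit_markers` from the converter's directory prints each matching file name and line number, which are the spots to edit.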
### Room for improvement
If you happen to find something not to your liking, you are welcome to send a PR. Some good starting points are mentioned in the [Process description](#process-description) section above.
### Export to HTML
Note that when the converter does not know how to handle a style, the HTML-to-Markdown conversion typically leaves the HTML untouched (Markdown allows HTML tags).
## Step by step guide for Confluence data export<a name="conflhowto"></a>
1. Go to the space and choose `Space tools > Content Tools` on the sidebar.
2. Choose Export. This option is only visible if you have the **Export Space** permission.
3. Select HTML, then choose Next.
4. Decide whether you need to customize the export:
   - Select Normal Export to produce an HTML file containing all the pages that you have permission to view.
   - Select Custom Export if you want to export a subset of pages, or to exclude comments from the export.
5. Extract the downloaded zip archive.
**WARNING**
Please note that blog posts will **NOT** be exported to HTML. You have to copy them manually or export them to XML or PDF, but those formats cannot be processed by this utility.
# Attribution
Thanks to Eric White for a starting point.
[pandoc]: http://pandoc.org/installing.html
[pandoc-options]: http://hackage.haskell.org/package/pandoc
[gitit]: https://github.com/jgm/gitit/