Parses emails to remove replies and other potentially unwanted data
# node-email-reply-parser
[node-email-reply-parser on npm](https://www.npmjs.com/package/node-email-reply-parser)
A Node.js port of willdurand/EmailReplyParser, which is itself a port of github/email_reply_parser.
willdurand's port of the GitHub library supports multiple spoken languages and has some enhanced parsing, so it was used as the template for this library.
# Installing
This package can be found on [npm](https://www.npmjs.com/):
```
npm install node-email-reply-parser
```
**Note**: This package is written in ES6 and is *not* transpiled, so it needs a runtime that supports ES6.
# Usage
```
var replyParser = require("node-email-reply-parser");
var email = replyParser(emailContent);
```
`email` has two primary methods:
* `getFragments()`: returns all the fragments of the email
* `getVisibleText()`: returns the text that is considered 'visible'
`getVisibleText()` accepts an optional options object:
```
email.getVisibleText({aggressive: true});
```
Setting `aggressive` to true will make the parser assume that any fragment which is not hidden, but which is both preceded and followed by a hidden fragment, should not be visible. This works around the issue of clients breaking quoted text into multiple lines (e.g. Gmail).
Using aggressive mode runs the risk of losing visible lines which are interspersed with quoted lines, but is useful when parsing e.g. emails from a 'reply by email' feature which contain a large block of quoted text.
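The aggressive rule described above can be illustrated with a minimal sketch. This is not the library's implementation: the `visibleText` function and the `{ content, hidden }` fragment shape are hypothetical, and the sketch assumes that "preceded and followed" refers to the immediately adjacent fragments.

```javascript
// Hypothetical fragment shape for illustration: { content: string, hidden: boolean }.
// This is a sketch of the rule, not the code used by node-email-reply-parser.
function visibleText(fragments, options) {
  const aggressive = Boolean(options && options.aggressive);
  return fragments
    .filter((fragment, i) => {
      if (fragment.hidden) return false;
      if (!aggressive) return true;
      const before = fragments[i - 1];
      const after = fragments[i + 1];
      // Assumption: only the immediate neighbours are inspected.
      const sandwiched = Boolean(before && before.hidden && after && after.hidden);
      return !sandwiched;
    })
    .map((fragment) => fragment.content)
    .join("\n");
}

// A quoted block that the mail client wrapped, leaving one line without ">".
const fragments = [
  { content: "Thanks, that works.", hidden: false },
  { content: "On Monday, Bob wrote:\n> first quoted line that the client", hidden: true },
  { content: "wrapped onto a new line", hidden: false },
  { content: "> second quoted line", hidden: true },
];

console.log(visibleText(fragments)); // prints both non-hidden fragments
console.log(visibleText(fragments, { aggressive: true })); // Thanks, that works.
```

In the default mode the stray wrapped line leaks into the visible text. In aggressive mode it is dropped, which is also how a genuine inline reply between two quoted blocks would be lost.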
A fragment has the following functions:
* `getContent()`: returns the content of the fragment
* `isSignature()`: returns whether or not the fragment is likely a signature
* `isQuoted()`: returns whether or not the fragment is likely a quote
* `isHidden()`: returns whether or not the text is considered 'hidden'
* `isEmpty()`: returns whether or not the fragment has any content
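As a sketch of how the fragment methods listed above might be combined, the helper below labels each non-empty fragment. `summarizeFragments` is a hypothetical name, not part of the package. It relies only on the methods documented here and accepts any object that exposes `getFragments()`.

```javascript
// Hypothetical helper: uses only the documented fragment methods.
function summarizeFragments(email) {
  return email
    .getFragments()
    .filter((fragment) => !fragment.isEmpty())
    .map((fragment) => {
      let kind = "text";
      if (fragment.isSignature()) kind = "signature";
      else if (fragment.isQuoted()) kind = "quoted";
      return {
        kind: kind,
        hidden: fragment.isHidden(),
        content: fragment.getContent(),
      };
    });
}

// Intended use with the parser (requires the package to be installed):
// var replyParser = require("node-email-reply-parser");
// console.log(summarizeFragments(replyParser(emailContent)));
```

Checking `isSignature()` before `isQuoted()` is an arbitrary choice for this sketch. Swap the order if quoted text should take priority in your application.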
Passing `true` as the second argument to `replyParser` makes it return only the visible text:
```
var visibleText = replyParser(emailContent, true);
```
# Known Issues
The parser does not handle some edge cases, which are outlined below.
## Quoted Headers
Quoted headers aren't picked up if there's an extra line break:
```
On <date>, <author> wrote:

> blah
```
They also aren't picked up if the email client breaks the header across multiple lines, as Gmail does with its automatic 80-column limit:
```
On <date>, <author>
wrote:
> blah
```
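One possible workaround is to normalize the text before handing it to the parser. The sketch below is an assumption-laden illustration, not a feature of this library. `normalizeQuoteHeaders` is a hypothetical helper that only handles the English `On ... wrote:` form and only the two layouts shown above.

```javascript
// Hypothetical pre-processing step, not part of node-email-reply-parser.
function normalizeQuoteHeaders(text) {
  return text
    // Join a header that was wrapped so that "wrote:" sits alone on the next line.
    .replace(/^(On\b[^\n]*[^\s:])[ \t]*\r?\n[ \t]*(wrote:)[ \t]*$/gm, "$1 $2")
    // Drop a single blank line between the header and the first ">" line.
    .replace(/^(On\b[^\n]*wrote:)[ \t]*\r?\n[ \t]*\r?\n(?=>)/gm, "$1\n");
}

const wrapped = "On Monday, Bob <bob@example.com>\nwrote:\n> blah";
console.log(normalizeQuoteHeaders(wrapped));
```

Whether the normalized text is then recognized depends on the parser's own header patterns, so treat this as a starting point and test it against real messages.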
## Weird Signatures
Lines starting with `-` or `_` sometimes mark the beginning of signatures:
```
Hello
--
Rick
```
Not everyone follows this convention:
```
Hello
Mr Rick Olson
Galactic President Superstar Mc Awesomeville
GitHub
**********************DISCLAIMER***********************************
* Note: blah blah blah *
**********************DISCLAIMER***********************************
```
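To make the limitation concrete, here is a minimal sketch of a delimiter-based check. It is a simplification for illustration and not the pattern the library uses. `findSignatureStart` is a hypothetical function that looks only for a line starting with `-` or `_`.

```javascript
// Returns the index of the first line that starts with "-" or "_",
// or -1 when no such line exists. A simplification, not the library's rule.
function findSignatureStart(text) {
  return text.split(/\r?\n/).findIndex((line) => /^[-_]/.test(line));
}

const conventional = "Hello\n--\nRick";
const unconventional = [
  "Hello",
  "Mr Rick Olson",
  "Galactic President Superstar Mc Awesomeville",
  "GitHub",
].join("\n");

console.log(findSignatureStart(conventional)); // 1
console.log(findSignatureStart(unconventional)); // -1
```

The second message has no delimiter line, so a check of this kind has nothing to anchor on and the signature stays in the visible text.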
## Strange Quoting
Apparently, prefixing lines with `>` isn't universal either:
```
Hello
--
Rick
________________________________________
From: Bob [reply@reply.github.com]
Sent: Monday, March 14, 2011 6:16 PM
To: Rick
```
# Unit Tests
Install the required dependencies:
```
npm install
```
Run the tests:
```
npm test
```
# Contributing
Please see the CONTRIBUTING.md file.
# Credits
* GitHub - Amazing Ruby gem for parsing emails
* William Durand [william.durand1@gmail.com](mailto:william.durand1@gmail.com) - Extensive PHP version of GitHub's library
# License
node-email-reply-parser is released under the MIT License. Please see the bundled LICENSE file for details.