sentence-parse
Version:
A simple utility to parse text into sentences
106 lines (80 loc) • 3.13 kB
Markdown
# 📄 Sentence Parse
A simple utility to parse text into sentences.

## Installation
```bash
npm install sentence-parse
```
## Usage
The parser can be used to split text into sentences with various options. Here's a basic example:
```javascript
import { parseSentences } from 'sentence-parse';
// Parse from string
const text = "Hello world! This is a test.";
const sentences = await parseSentences(text);
console.log(sentences);
// Output: ["Hello world!", "This is a test."]
// Parse from file
import { readFile } from 'fs/promises';
import { join } from 'path';
const fileText = await readFile(join(process.cwd(), 'text-file.txt'), 'utf8');
const fileSentences = await parseSentences(fileText);
console.log(fileSentences);
```
### Options
- **observeMultipleLineBreaks**: Treats two or more consecutive line breaks as separate sentences. Default is `false`.
- **removeStartLineSequences**: Removes specified sequences at the start of each line. Default is an empty array `[]`.
- **preserveHTMLBreaks**: Preserves HTML `<br>` and `<p>` tags as line breaks in the text. Default is `true`.
- **preserveListItems**: Preserves list items by adding a prefix to each `<li>` element. Default is `true`.
- **listItemPrefix**: Specifies the prefix to use for list items when `preserveListItems` is `true`. Default is `'- '`.
- **excludeNonLetterSentences**: Excludes segments that contain no letters (only numbers, symbols, etc). Default is `false`.
## Examples
### Using observeMultipleLineBreaks
```javascript
import { parseSentences } from 'sentence-parse';
const text = "Hello world!\n\nThis is a test.";
const sentences = await parseSentences(text, { observeMultipleLineBreaks: true });
console.log(sentences);
// Output: ["Hello world!", "This is a test."]
```
### Using removeStartLineSequences
```javascript
import { parseSentences } from 'sentence-parse';
const text = "> Hello world!\n> This is a test.";
const sentences = await parseSentences(text, { removeStartLineSequences: ['>'] });
console.log(sentences);
// Output: ["Hello world!", "This is a test."]
```
### Using HTML Options
```javascript
import { parseSentences } from 'sentence-parse';
const htmlText = `
<p>Hello world!<br>This is a test.</p>
<ul>
<li>First item</li>
<li>Second item</li>
</ul>
`;
const sentences = await parseSentences(htmlText, {
preserveHTMLBreaks: true,
preserveListItems: true,
listItemPrefix: '* '
});
console.log(sentences);
// Output: ["Hello world!", "This is a test.", "* First item", "* Second item"]
```
### Using excludeNonLetterSentences
```javascript
import { parseSentences } from 'sentence-parse';
const text = "Hello world! $4,000,000. This is a test.";
const sentences = await parseSentences(text, { excludeNonLetterSentences: true });
console.log(sentences);
// Output: ["Hello world!", "This is a test."]
```
## Example
Check out `example/example.js` for a working example that parses sentences from a text file.
Run the example:
```bash
cd example
node example
```