# free-text-json-parser

[![npm version](https://badge.fury.io/js/free-text-json-parser.svg)](https://badge.fury.io/js/free-text-json-parser) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A robust, high-performance parser for extracting JSON objects and arrays from free text. Built with Nearley.js and the Moo lexer, this parser can find and extract valid JSON structures embedded anywhere in text documents.

## Features

- **🔍 Smart JSON extraction** - Finds all valid JSON objects and arrays in any text
- **⚡ High performance** - Handles deeply nested structures efficiently (tested up to 5,000 levels)
- **🛡️ Battle-tested** - Comprehensive test suite with 100+ tests covering edge cases
- **📦 Zero runtime dependencies** - Lightweight, standalone package
- **🎯 Flexible API** - Multiple methods for different extraction needs
- **💻 CLI tool included** - Use from the command line or as a module
- **🔧 Production ready** - Used in production environments processing millions of documents

## Installation

```bash
npm install free-text-json-parser
```

Or using yarn/pnpm:

```bash
yarn add free-text-json-parser
pnpm add free-text-json-parser
```

## Quick Start

```javascript
import FreeTextJsonParser from 'free-text-json-parser';

const parser = new FreeTextJsonParser();
const text = 'User {"name": "Alice", "age": 30} logged in at ["10:30", "10:45"]';
const jsonData = parser.extractJson(text);

console.log(jsonData);
// Output: [{"name": "Alice", "age": 30}, ["10:30", "10:45"]]
```

## Usage

### As a Module

```javascript
import FreeTextJsonParser from 'free-text-json-parser';

const parser = new FreeTextJsonParser();

// Example text with embedded JSON
const input = 'Hello world {"name": "Alice", "age": 30} and [1, 2, 3] more text';

// Get only JSON values
const jsonOnly = parser.extractJson(input);
console.log(jsonOnly);
// [{"name": "Alice", "age": 30}, [1, 2, 3]]

// Get structured output
const structured = parser.parseStructured(input);
console.log(structured);
// {
//   elements: [...],  // All parsed elements
//   json: [...],      // JSON values only
//   text: [...],      // Text segments only
//   summary: {textElements: 4, jsonElements: 2}
// }
```

### As a CLI Tool

```bash
# Install globally
npm install -g free-text-json-parser

# Parse from arguments
free-text-json 'Text with {"json": true} data'

# Parse from stdin
echo 'Some {"json": true} content' | free-text-json

# Parse a file
cat document.txt | free-text-json
```

## API Reference

### `parse(input: string): Array<Element>`

Returns the raw parsed array of elements with type information.

```javascript
const result = parser.parse('Text {"json": true} more');
// [{type: 'text', value: 'Text'}, {type: 'json', value: {json: true}}, ...]
```

### `extractJson(input: string): Array<any>`

Returns only the JSON values found in the text.

```javascript
const json = parser.extractJson('Found: {"id": 1} and [1,2,3]');
// [{id: 1}, [1, 2, 3]]
```

### `extractText(input: string): Array<string>`

Returns only the text segments, excluding JSON.

```javascript
const text = parser.extractText('Hello {"hidden": true} world');
// ['Hello', 'world']
```

### `parseStructured(input: string): StructuredResult`

Returns comprehensive structured output: all elements, the text segments, the JSON values, and a summary counting each kind.
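The documented fields line up with the `parse()` element shape: `text` and `json` look like the element values split by `type`, and `summary` counts each group. The sketch below illustrates that relationship under that assumption only. `groupElements` is a hypothetical helper, not part of this package's API, and the real `parseStructured()` may be implemented differently.

```javascript
// Hypothetical helper (NOT part of free-text-json-parser): groups elements
// shaped like the documented parse() output ({type, value}) into the fields
// that parseStructured() is documented to return.
function groupElements(elements) {
  const text = [];
  const json = [];
  for (const el of elements) {
    if (el.type === 'text') text.push(el.value);
    else if (el.type === 'json') json.push(el.value);
  }
  return {
    elements,
    text,
    json,
    summary: { textElements: text.length, jsonElements: json.length },
  };
}

// Elements written by hand to mirror 'Text {"data": 123} more [4,5]'.
const elements = [
  { type: 'text', value: 'Text' },
  { type: 'json', value: { data: 123 } },
  { type: 'text', value: 'more' },
  { type: 'json', value: [4, 5] },
];

const grouped = groupElements(elements);
console.log(grouped.text);    // [ 'Text', 'more' ]
console.log(grouped.summary); // { textElements: 2, jsonElements: 2 }
```

With the package itself, the equivalent call is: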
```javascript
const result = parser.parseStructured('Text {"data": 123} more [4,5]');
// {
//   elements: [...],              // All elements with types
//   text: ['Text', 'more'],       // Text only
//   json: [{data: 123}, [4, 5]],  // JSON only
//   summary: {textElements: 2, jsonElements: 2}
// }
```

## Advanced Examples

### Complex Nested Structures

```javascript
const complexText = `
API Response: {
  "user": {
    "profile": {
      "name": "Jane",
      "settings": {
        "theme": "dark",
        "notifications": true
      }
    }
  },
  "timestamp": "2024-01-01T00:00:00Z"
}
Status: Success
`;

const data = parser.extractJson(complexText);
// Extracts the complete nested structure
```

### Handling Mixed Content

```javascript
const htmlWithJson = `
<div data-config='{"enabled": true, "level": 5}'>
  Script data: {"userId": 123, "permissions": ["read", "write"]}
</div>
`;

const extracted = parser.extractJson(htmlWithJson);
// Finds and extracts embedded JSON from HTML
```

### Processing Logs

```javascript
const logText = `
2024-01-01 10:00:00 INFO Starting process
2024-01-01 10:00:01 DATA {"event": "user_login", "userId": 42}
2024-01-01 10:00:02 ERROR {"error": "Connection timeout", "code": 500}
`;

const events = parser.extractJson(logText);
// Extracts all JSON event data from logs
```

## Performance

The parser is highly optimized and battle-tested:

| Scenario | Approx. time |
|----------|--------------|
| 1,000 simple JSON objects | ~1-2 ms |
| 100-level deep nesting | <1 ms |
| 5,000-level deep nesting | ~3 ms |
| 50 complex objects with HTML | ~2-3 ms |
| 10,000-character strings | <1 ms |

### Capabilities

- ✅ Handles deeply nested objects (tested up to 5,000 levels)
- ✅ Processes large documents with 50+ JSON objects
- ✅ Manages objects with 1,000+ keys
- ✅ Handles strings with special characters, HTML, and escaped JSON
- ✅ Thread-safe for concurrent parsing

## Development

### Setup

```bash
# Clone repository (git names the directory after the repo)
git clone https://github.com/artpar/text-free-json.git
cd text-free-json

# Install dependencies
pnpm install

# Build parser
pnpm run build
```

### Testing

```bash
# Run all tests
pnpm test

# Run a specific test file
pnpm test:run tests/parser.test.js

# Run with coverage
pnpm test:coverage

# Watch mode for development
pnpm test:watch
```

### Building

```bash
# Compile grammar
pnpm run build

# Create production bundle
pnpm run build:bundle

# Development build with examples
pnpm run dev
```

## Use Cases

- **Log Analysis** - Extract structured data from application logs
- **Data Migration** - Parse mixed-format documents
- **API Response Processing** - Extract JSON from HTML/text responses
- **Chat/LLM Processing** - Extract structured data from conversational text
- **Configuration Parsing** - Find JSON configs in documentation
- **Web Scraping** - Extract JSON-LD and embedded data from HTML

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

MIT © Artpar

## Acknowledgments

Built with:

- [Nearley.js](https://nearley.js.org/) - Parser toolkit for JavaScript
- [Moo](https://github.com/no-context/moo) - Friendly lexer generator

## Support

For issues, questions, or suggestions, please [open an issue](https://github.com/artpar/text-free-json/issues).