# smart-csv-delimiter

🎯 **Intelligent CSV delimiter auto-detection for Node.js** - lightweight, fast, and reliable.

[![npm version](https://img.shields.io/npm/v/smart-csv-delimiter.svg)](https://www.npmjs.com/package/smart-csv-delimiter)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![TypeScript](https://img.shields.io/badge/TypeScript-Ready-blue.svg)](https://www.typescriptlang.org/)

## ✨ Features

- 🚀 **Fast & Lightweight** - Zero dependencies, pure Node.js
- 🎯 **Smart Detection** - Analyzes consistency across multiple lines
- 📊 **Confidence Scores** - Get detailed detection results with confidence metrics
- 🔧 **Highly Configurable** - Custom delimiters, sample size, and more
- 💪 **TypeScript First** - Full type definitions included
- ✅ **Well Tested** - Comprehensive test suite with 80%+ coverage

## 📦 Installation

```bash
npm install smart-csv-delimiter
```

## 🚀 Quick Start

```typescript
import { detectDelimiter } from 'smart-csv-delimiter';

const delimiter = await detectDelimiter('./data.csv');
console.log(`Detected delimiter: ${delimiter}`); // ',' or ';' or '|' or '\t'
```

## 📖 Usage

### Basic Detection

```typescript
import { detectDelimiter } from 'smart-csv-delimiter';

// Simple detection
const delimiter = await detectDelimiter('./file.csv');
// Returns: ',' | ';' | '|' | '\t' | null
```

### Detailed Detection with Confidence

```typescript
import { detectDelimiterWithDetails } from 'smart-csv-delimiter';

const result = await detectDelimiterWithDetails('./file.csv');
console.log(result);
// {
//   delimiter: ',',
//   confidence: 0.95,
//   occurrences: 150,
//   allScores: {
//     ',': 15.2,
//     ';': 0,
//     '|': 0,
//     '\t': 0
//   }
// }
```

### Advanced Configuration

```typescript
import { CsvDelimiterDetector } from 'smart-csv-delimiter';

const detector = new CsvDelimiterDetector({
  sampleSize: 20,               // Analyze first 20 lines (default: 10)
  encoding: 'latin1',           // File encoding (default: 'utf-8')
  customDelimiters: ['#', '@'], // Additional delimiters to test
  customOnly: false,            // If true, only test custom delimiters
});

const delimiter = await detector.detect('./file.csv');
```

## 🎓 API Reference

### `detectDelimiter(filePath, options?)`

Detects the CSV delimiter and returns the most likely one.

**Parameters:**

- `filePath` (string): Path to the CSV file
- `options` (DetectionOptions, optional): Configuration options

**Returns:** `Promise<string | null>` - The detected delimiter or null

### `detectDelimiterWithDetails(filePath, options?)`

Detects the delimiter and returns detailed analysis.

**Parameters:**

- `filePath` (string): Path to the CSV file
- `options` (DetectionOptions, optional): Configuration options

**Returns:** `Promise<DetectionResult>`

```typescript
interface DetectionResult {
  delimiter: string | null;          // The detected delimiter
  confidence: number;                // Confidence score (0-1)
  occurrences: number;               // Total occurrences found
  allScores: Record<string, number>; // Scores for all tested delimiters
}
```

### `CsvDelimiterDetector`

Class-based API for reusable detection with custom configuration.

**Constructor Options:**

```typescript
interface DetectionOptions {
  sampleSize?: number;         // Lines to analyze (default: 10)
  customDelimiters?: string[]; // Additional delimiters to test
  customOnly?: boolean;        // Only test custom delimiters (default: false)
  encoding?: BufferEncoding;   // File encoding (default: 'utf-8')
}
```

**Methods:**

- `detect(filePath: string): Promise<string | null>`
- `detectWithDetails(filePath: string): Promise<DetectionResult>`

## 🔍 Supported Delimiters

By default, `smart-csv-delimiter` detects these common delimiters:

| Delimiter | Name      | Typical use  |
|-----------|-----------|--------------|
| `,`       | Comma     | Standard CSV |
| `;`       | Semicolon | European CSV |
| `\|`      | Pipe      | Log files    |
| `\t`      | Tab       | TSV files    |

You can add custom delimiters using the `customDelimiters` option.
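
As a rough illustration of how the default set and the `customDelimiters` / `customOnly` options might combine into one list of candidates, here is a minimal sketch. It is not taken from the package's source: the names `DEFAULT_DELIMITERS`, `CandidateOptions`, and `buildCandidates` are hypothetical, and returning an empty list when `customOnly` is set without any custom delimiters is an assumption.

```typescript
// Hypothetical sketch, not the package's actual implementation.
// The four defaults mirror the "Supported Delimiters" table.
const DEFAULT_DELIMITERS: readonly string[] = [',', ';', '|', '\t'];

// Assumed subset of the documented DetectionOptions.
interface CandidateOptions {
  customDelimiters?: string[];
  customOnly?: boolean;
}

// Builds the list of delimiters that would be tested.
function buildCandidates(options: CandidateOptions = {}): string[] {
  const custom = options.customDelimiters ?? [];
  // With customOnly, the defaults are skipped entirely (assumption).
  const base = options.customOnly ? [] : DEFAULT_DELIMITERS;
  // A Set drops duplicates while keeping insertion order.
  return Array.from(new Set([...base, ...custom]));
}

console.log(buildCandidates({ customDelimiters: ['#', '@'] }));
// [ ',', ';', '|', '\t', '#', '@' ]
console.log(buildCandidates({ customDelimiters: ['#'], customOnly: true }));
// [ '#' ]
```

Keeping the defaults first means that, in a scheme where ties go to the earliest candidate, a custom delimiter would only win when it scores strictly higher than the standard ones.
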
## 💡 How It Works

The detector uses a smart algorithm that:

1. **Samples Multiple Lines** - Reads the first N lines (configurable)
2. **Counts Occurrences** - Counts each delimiter in every line
3. **Checks Consistency** - Ensures the delimiter count is consistent across lines
4. **Calculates Score** - Combines frequency and consistency for a confidence score
5. **Returns Best Match** - Selects the delimiter with the highest score

This approach is more reliable than single-line detection and handles edge cases like:

- Delimiters appearing in data values
- Inconsistent formatting
- Mixed content

## 🎯 Use Cases

- **Data Import Tools** - Automatically handle various CSV formats
- **ETL Pipelines** - Process CSV files without manual configuration
- **Data Analysis** - Prepare datasets from unknown sources
- **File Validation** - Verify CSV structure before parsing
- **Multi-Format Support** - Build tools that work with any CSV variant

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## 📄 License

MIT © [ReDodge](https://github.com/ReDodge)

## 🔗 Links

- [GitHub Repository](https://github.com/ReDodge/smart-csv-delimiter)
- [NPM Package](https://www.npmjs.com/package/smart-csv-delimiter)
- [Report Issues](https://github.com/ReDodge/smart-csv-delimiter/issues)

## ⭐ Show Your Support

If this package helped you, please give it a ⭐ on [GitHub](https://github.com/ReDodge/smart-csv-delimiter)!
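
## 🧪 Illustrative Sketch of the Detection Steps

The five steps listed under How It Works can be sketched roughly as follows. This is not the package's source code: the names `sketchDetect` and `countOccurrences` are hypothetical, the scoring formula (mean count per line multiplied by the share of lines that agree on the most common count) is an assumption, and the sketch reads an in-memory string rather than a file path so it stays self-contained.

```typescript
// Hypothetical sketch of the documented steps; the real scoring may differ.
interface SketchResult {
  delimiter: string | null;
  scores: Record<string, number>;
}

// Number of times `delimiter` appears in `line`.
function countOccurrences(line: string, delimiter: string): number {
  return line.split(delimiter).length - 1;
}

function sketchDetect(
  text: string,
  candidates: string[] = [',', ';', '|', '\t'],
  sampleSize = 10,
): SketchResult {
  // Step 1: sample the first N non-empty lines.
  const lines = text
    .split(/\r?\n/)
    .filter((line) => line.trim().length > 0)
    .slice(0, sampleSize);

  const scores: Record<string, number> = {};
  let best: string | null = null;
  let bestScore = 0;

  for (const delimiter of candidates) {
    // Step 2: count the delimiter in every sampled line.
    const counts = lines.map((line) => countOccurrences(line, delimiter));
    const total = counts.reduce((sum, c) => sum + c, 0);
    if (lines.length === 0 || total === 0) {
      scores[delimiter] = 0;
      continue;
    }

    // Step 3: consistency = share of lines agreeing on the most common count.
    const frequency = new Map<number, number>();
    let modeLines = 0;
    for (const c of counts) {
      const seen = (frequency.get(c) ?? 0) + 1;
      frequency.set(c, seen);
      if (seen > modeLines) modeLines = seen;
    }
    const consistency = modeLines / lines.length;

    // Step 4: combine frequency and consistency (assumed formula).
    const score = (total / lines.length) * consistency;
    scores[delimiter] = score;

    // Step 5: keep the highest score; on a tie the earlier candidate wins.
    if (score > bestScore) {
      bestScore = score;
      best = delimiter;
    }
  }

  return { delimiter: best, scores };
}

const sample = 'a;b;c\n1;2;3\n4,5;5;6';
console.log(sketchDetect(sample).delimiter); // ';'
```

In the sample, the semicolon appears exactly twice on every line, while the comma appears once on a single line, so the consistency term keeps the stray comma inside a data value from winning.
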