smart-csv-delimiter
Version:
Intelligent CSV delimiter auto-detection for Node.js - lightweight, fast, and reliable
182 lines (129 loc) ⢠5.58 kB
Markdown
# smart-csv-delimiter
šÆ **Intelligent CSV delimiter auto-detection for Node.js** - lightweight, fast, and reliable.
[](https://www.npmjs.com/package/smart-csv-delimiter)
[](https://opensource.org/licenses/MIT)
[](https://www.typescriptlang.org/)
## ⨠Features
- š **Fast & Lightweight** - Zero dependencies, pure Node.js
- šÆ **Smart Detection** - Analyzes consistency across multiple lines
- š **Confidence Scores** - Get detailed detection results with confidence metrics
- š§ **Highly Configurable** - Custom delimiters, sample size, and more
- šŖ **TypeScript First** - Full type definitions included
- ā
**Well Tested** - Comprehensive test suite with 80%+ coverage
## š¦ Installation
```bash
npm install smart-csv-delimiter
```
## š Quick Start
```typescript
import { detectDelimiter } from 'smart-csv-delimiter';
const delimiter = await detectDelimiter('./data.csv');
console.log(`Detected delimiter: ${delimiter}`); // ',' or ';' or '|' or '\t'
```
## š Usage
### Basic Detection
```typescript
import { detectDelimiter } from 'smart-csv-delimiter';
// Simple detection
const delimiter = await detectDelimiter('./file.csv');
// Returns: ',' | ';' | '|' | '\t' | null
```
### Detailed Detection with Confidence
```typescript
import { detectDelimiterWithDetails } from 'smart-csv-delimiter';
const result = await detectDelimiterWithDetails('./file.csv');
console.log(result);
// {
// delimiter: ',',
// confidence: 0.95,
// occurrences: 150,
// allScores: {
// ',': 15.2,
// ';': 0,
// '|': 0,
// '\t': 0
// }
// }
```
### Advanced Configuration
```typescript
import { CsvDelimiterDetector } from 'smart-csv-delimiter';
const detector = new CsvDelimiterDetector({
sampleSize: 20, // Analyze first 20 lines (default: 10)
encoding: 'latin1', // File encoding (default: 'utf-8')
customDelimiters: ['#', '@'], // Additional delimiters to test
customOnly: false, // If true, only test custom delimiters
});
const delimiter = await detector.detect('./file.csv');
```
## š API Reference
### `detectDelimiter(filePath, options?)`
Detects the CSV delimiter and returns the most likely one.
**Parameters:**
- `filePath` (string): Path to the CSV file
- `options` (DetectionOptions, optional): Configuration options
**Returns:** `Promise<string | null>` - The detected delimiter or null
### `detectDelimiterWithDetails(filePath, options?)`
Detects the delimiter and returns detailed analysis.
**Parameters:**
- `filePath` (string): Path to the CSV file
- `options` (DetectionOptions, optional): Configuration options
**Returns:** `Promise<DetectionResult>`
```typescript
interface DetectionResult {
delimiter: string | null; // The detected delimiter
confidence: number; // Confidence score (0-1)
occurrences: number; // Total occurrences found
allScores: Record<string, number>; // Scores for all tested delimiters
}
```
### `CsvDelimiterDetector`
Class-based API for reusable detection with custom configuration.
**Constructor Options:**
```typescript
interface DetectionOptions {
sampleSize?: number; // Lines to analyze (default: 10)
customDelimiters?: string[]; // Additional delimiters to test
customOnly?: boolean; // Only test custom delimiters (default: false)
encoding?: BufferEncoding; // File encoding (default: 'utf-8')
}
```
**Methods:**
- `detect(filePath: string): Promise<string | null>`
- `detectWithDetails(filePath: string): Promise<DetectionResult>`
## š Supported Delimiters
By default, `smart-csv-delimiter` detects these common delimiters:
| Delimiter | Character | Name |
|-----------|-----------|------|
| `,` | Comma | Standard CSV |
| `;` | Semicolon | European CSV |
| `\|` | Pipe | Log files |
| `\t` | Tab | TSV files |
You can add custom delimiters using the `customDelimiters` option.
## š” How It Works
The detector uses a smart algorithm that:
1. **Samples Multiple Lines** - Reads the first N lines (configurable)
2. **Counts Occurrences** - Counts each delimiter in every line
3. **Checks Consistency** - Ensures the delimiter count is consistent across lines
4. **Calculates Score** - Combines frequency and consistency for a confidence score
5. **Returns Best Match** - Selects the delimiter with the highest score
This approach is more reliable than single-line detection and handles edge cases like:
- Delimiters appearing in data values
- Inconsistent formatting
- Mixed content
## šÆ Use Cases
- **Data Import Tools** - Automatically handle various CSV formats
- **ETL Pipelines** - Process CSV files without manual configuration
- **Data Analysis** - Prepare datasets from unknown sources
- **File Validation** - Verify CSV structure before parsing
- **Multi-Format Support** - Build tools that work with any CSV variant
## š¤ Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## š License
MIT Ā© [ReDodge](https://github.com/ReDodge)
## š Links
- [GitHub Repository](https://github.com/ReDodge/smart-csv-delimiter)
- [NPM Package](https://www.npmjs.com/package/smart-csv-delimiter)
- [Report Issues](https://github.com/ReDodge/smart-csv-delimiter/issues)
## ā Show Your Support
If this package helped you, please give it a ā on [GitHub](https://github.com/ReDodge/smart-csv-delimiter)!