# sensitive-data-masker
A high-performance TypeScript library for detecting and masking sensitive data in strings. Protect PII, API keys, tokens, credentials, and other confidential information with intelligent masking algorithms and configurable accuracy levels.
[npm](https://www.npmjs.com/package/sensitive-data-masker) | [License: MIT](https://opensource.org/licenses/MIT) | [TypeScript](https://www.typescriptlang.org/) | [Node.js](https://nodejs.org/)
## Features
- 🛡️ **200+ Detection Patterns**: Comprehensive coverage for modern security needs
- ⚡ **High Performance**: Optimized regex engine with pattern caching
- 🎯 **Accuracy Control**: Configure detection sensitivity (high/medium/low)
- 🔧 **Flexible Masking**: Smart partial masking that preserves readability
- 📦 **Zero Dependencies**: Lightweight and secure
- 🌍 **International Support**: Handles US, UK, Canadian, and international formats
- 🔍 **Pattern Filtering**: Include or exclude specific pattern types
- 📊 **Detailed Results**: Get match counts, positions, and masked values
## Installation
```bash
npm install sensitive-data-masker
```
```bash
yarn add sensitive-data-masker
```
## Quick Start
```typescript
import { mask, hasSensitiveContent, getPatternMatches } from 'sensitive-data-masker';
// Basic usage - intelligent partial masking
const text = 'My email is john@example.com and my SSN is 123-45-6789';
const result = mask(text);
console.log(result.output);
// "My email is **hn@example.c** and my SSN is **3-45-67**"
console.log(result.found);
// { email: 1, ssn: 1 }
// Check if content contains sensitive data
const isSensitive = hasSensitiveContent(text);
console.log(isSensitive); // true
// Get detailed pattern matches with positions
const matches = getPatternMatches(text);
console.log(matches);
// [
//   {
//     pattern: 'email',
//     matches: [{ match: 'john@example.com', startIndex: 12, endIndex: 27 }]
//   },
//   {
//     pattern: 'ssn',
//     matches: [{ match: '123-45-6789', startIndex: 43, endIndex: 53 }]
//   }
// ]
```
## API Reference
### `mask(input: string, options?: MaskingOptions): MaskResult`
Masks sensitive content in a string using intelligent partial masking.
#### Options
```typescript
interface MaskingOptions {
  maskChar?: string;                          // Character used for masking (default: '*')
  preserveLength?: boolean;                   // Preserve original length (default: false)
  excludePatterns?: string[];                 // Patterns to exclude from masking
  onlyPatterns?: string[];                    // Only mask these patterns
  matchAccuracy?: 'high' | 'medium' | 'low';  // Detection sensitivity
}
```
#### Returns
```typescript
interface MaskResult {
  output: string;                     // Masked string
  found: { [name: string]: number };  // Count of each pattern found
  matches: string[];                  // Original matched values
  masked: string[];                   // Masked versions of matches
}
```
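The email and SSN outputs in the Quick Start suggest that partial masking replaces the first two and last two characters of a match and leaves the middle readable. The following is a minimal sketch of that rule, not the library's actual implementation: `partialMask`, its `edge` parameter, and the fallback for short values are all hypothetical.

```typescript
// Hypothetical helper: masks the first and last `edge` characters of a value.
// It mirrors the Quick Start output; it is not the library's own code.
function partialMask(value: string, maskChar: string = '*', edge: number = 2): string {
  // Assumption: values too short to keep a readable middle are fully masked.
  if (value.length <= edge * 2) {
    return maskChar.repeat(value.length);
  }
  const middle = value.slice(edge, value.length - edge);
  return maskChar.repeat(edge) + middle + maskChar.repeat(edge);
}

console.log(partialMask('john@example.com')); // "**hn@example.c**"
console.log(partialMask('123-45-6789', '#')); // "##3-45-67##"
```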
### `hasSensitiveContent(input: string, options?): boolean`
Quickly check if a string contains sensitive data without performing masking.
```typescript
import { hasSensitiveContent } from 'sensitive-data-masker';
hasSensitiveContent('user@example.com'); // true
hasSensitiveContent('hello world'); // false
// With options
hasSensitiveContent('sk-1234567890abcdef', {
  matchAccuracy: 'high',
  excludePatterns: ['genericId']
}); // true
```
### `getPatternMatches(input: string, options?): PatternMatch[]`
Get detailed information about all pattern matches including their positions.
```typescript
import { getPatternMatches } from 'sensitive-data-masker';
const matches = getPatternMatches('Contact: admin@test.com and key: sk-123abc');
console.log(matches);
// [
//   {
//     pattern: 'email',
//     matches: [{ match: 'admin@test.com', startIndex: 9, endIndex: 22 }]
//   },
//   {
//     pattern: 'openaiApiKey',
//     matches: [{ match: 'sk-123abc', startIndex: 33, endIndex: 41 }]
//   }
// ]
```
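In the output above, `endIndex` is the index of the last matched character (inclusive), which differs from the exclusive end that `String.prototype.slice` expects. As a rough sketch of how such positions can be derived from a plain regex, assuming inclusive ends: `findPositions` and the simplified email regex below are illustrative and not part of the library.

```typescript
interface Position {
  match: string;
  startIndex: number;
  endIndex: number; // inclusive, as in the output above
}

// Hypothetical helper: collects every match of `pattern` with its position.
function findPositions(input: string, pattern: RegExp): Position[] {
  // Copy the regex with the global flag so exec() advances through the input.
  const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
  const regex = new RegExp(pattern.source, flags);
  const positions: Position[] = [];
  let m: RegExpExecArray | null;
  while ((m = regex.exec(input)) !== null) {
    if (m[0].length === 0) {
      regex.lastIndex++; // avoid an infinite loop on empty matches
      continue;
    }
    positions.push({
      match: m[0],
      startIndex: m.index,
      endIndex: m.index + m[0].length - 1,
    });
  }
  return positions;
}

// Deliberately simplified email regex, for illustration only.
const simpleEmail = /[\w.+-]+@[\w-]+\.[\w.-]+/;
const found = findPositions('Contact: admin@test.com and key: sk-123abc', simpleEmail);
console.log(found);
// [{ match: 'admin@test.com', startIndex: 9, endIndex: 22 }]
```

To cut the original value back out of the input, use `input.slice(startIndex, endIndex + 1)`.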
## Advanced Usage
### Custom Masking Options
```typescript
import { mask } from 'sensitive-data-masker';
// Custom masking character
const result = mask('API key: sk-1234567890abcdef', { maskChar: '#' });
console.log(result.output);
// "API key: ##-1234567890ab##"
// Preserve original length
const result2 = mask('secret123', { preserveLength: true });
console.log(result2.output);
// "*********" (full length masked)
// Use high accuracy mode (fewer false positives)
const result3 = mask('sk-1234567890abcdef', { matchAccuracy: 'high' });
console.log(result3.output);
// "**-1234567890ab**"
```
### Pattern Filtering
```typescript
// Only mask specific patterns
const result = mask('Email: user@test.com, API: sk-123', {
  onlyPatterns: ['email', 'openaiApiKey']
});
// Exclude certain patterns
const result2 = mask('Email: user@test.com, UUID: 123e4567-e89b-12d3-a456-426614174000', {
  excludePatterns: ['uuid', 'genericId']
});
// Combine with accuracy control
const result3 = mask(sensitiveText, {
  matchAccuracy: 'high',
  excludePatterns: ['uuid']
});
```
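One way to picture how `onlyPatterns` and `excludePatterns` narrow the active pattern set is a simple filter over pattern names. This is a sketch under assumptions: `selectPatternNames` is a hypothetical helper, and the precedence shown (allow-list first, then exclusions) is a guess rather than documented behavior.

```typescript
interface FilterOptions {
  onlyPatterns?: string[];
  excludePatterns?: string[];
}

// Hypothetical helper: returns the pattern names that remain active.
function selectPatternNames(all: string[], options: FilterOptions = {}): string[] {
  const { onlyPatterns, excludePatterns = [] } = options;
  return all
    // Assumption: a missing or empty allow-list means "all patterns".
    .filter(name => !onlyPatterns || onlyPatterns.length === 0 || onlyPatterns.includes(name))
    .filter(name => !excludePatterns.includes(name));
}

const names = ['email', 'openaiApiKey', 'uuid', 'genericId'];
console.log(selectPatternNames(names, { onlyPatterns: ['email', 'openaiApiKey'] }));
// ['email', 'openaiApiKey']
console.log(selectPatternNames(names, { excludePatterns: ['uuid', 'genericId'] }));
// ['email', 'openaiApiKey']
```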
## Supported Pattern Categories
The library detects sensitive data across **25 categories** with **200+ patterns**:
### 🆔 Personally Identifiable Information (PII)
- Email addresses (multiple formats)
- Phone numbers (US, International, E.164)
- Social Security Numbers (US with various formats)
- Driver's license numbers, Medical record numbers
- Tax IDs (TIN/EIN), Canadian SIN, UK National Insurance Numbers
### ☁️ Cloud Provider Credentials
- **AWS**: Access keys, secret keys, session tokens, account IDs
- **AWS Resources**: EC2, S3, RDS, Lambda ARNs, VPC IDs
- **Azure**: Subscription IDs, client secrets, resource IDs
- **Google Cloud**: API keys, service account keys, project IDs
### 💳 Financial & Payment Services
- Credit card numbers (Visa, MasterCard, Amex, Discover)
- **Stripe**: Secret keys, publishable keys, webhook secrets
- **PayPal**: Access tokens, client IDs
- **Square**: Access tokens, application IDs
- Bank account numbers (US routing numbers, IBAN)
### 🤖 AI Provider Credentials
- **OpenAI**: API keys, organization IDs
- **Anthropic/Claude**: API keys
- **Google AI**: Gemini API keys, Vertex AI tokens
- **Hugging Face**: Access tokens, API keys
- **Other AI**: Groq, Perplexity, Replicate, Together AI
### 🔐 Authentication & Security
- JWT tokens, Bearer tokens
- OAuth access tokens, refresh tokens
- API keys in headers (`X-API-Key`, `Authorization`)
- Session IDs, CSRF tokens
- Generic secret patterns in environment variables
### 🔧 Developer Tools & Services
- **GitHub**: Personal access tokens, app tokens
- **Slack**: Bot tokens, webhook URLs, app secrets
- **Discord**: Bot tokens, webhook URLs
- **Analytics**: Google Analytics, Mixpanel, Amplitude
- **Monitoring**: Datadog, New Relic, Sentry keys
### 🗄️ Database & Storage
- Database connection strings (PostgreSQL, MySQL, MongoDB)
- **File Storage**: S3 bucket URLs, Azure Blob Storage
- **CDN**: CloudFront URLs, Azure CDN
- Redis connection strings, Elasticsearch URLs
### 🔑 Cryptographic Materials
- RSA private keys, SSH private keys
- EC private keys, DSA private keys
- X.509 certificates, PGP private key blocks
- JSON Web Keys (JWK), PKCS#8 keys
### 🌐 Network & Location
- IPv4/IPv6 addresses, MAC addresses
- Geographic coordinates (latitude/longitude)
- Private network ranges, subnet masks
- URL patterns with embedded secrets
### 📱 Communication Services
- **Messaging**: Twilio, SendGrid, Mailgun keys
- **Social Media**: Twitter, Facebook, Instagram tokens
- **Email Services**: Mailchimp, Postmark, SparkPost
- **SMS/Voice**: Nexmo, Plivo, MessageBird
### 🛠️ Infrastructure & DevOps
- **Container Registries**: Docker Hub, ECR, GCR tokens
- **CI/CD**: Jenkins, GitLab CI, CircleCI tokens
- **Deployment**: Vercel, Netlify, Heroku tokens
- **Monitoring**: PagerDuty, Datadog, New Relic
### 🏢 Enterprise & Business
- **CRM**: Salesforce, HubSpot tokens
- **E-commerce**: Shopify, WooCommerce keys
- **Business Tools**: Slack, Microsoft Teams tokens
- **Analytics**: Google Analytics, Adobe Analytics
### 🎯 Generic Patterns
- UUID v4, Generic IDs
- Base64 encoded secrets
- Hex-encoded keys (32, 64, 128 bit)
- Custom secret patterns in configuration files
### 🔍 URL & Reference Patterns
- URLs with embedded tokens
- Database connection URIs
- API endpoints with keys
- Webhook URLs with secrets
### 💾 Version Control & Code
- Git repository URLs with tokens
- Package manager tokens (npm, PyPI)
- Container registry credentials
- Code hosting platform tokens
## Pattern Accuracy Levels
Control detection sensitivity to trade detection coverage against false positives:
### High Accuracy
- Most specific patterns with minimal false positives
- Examples: AWS access keys with `AKIA` prefix, specific API key formats
- Best for production environments
### Medium Accuracy (Default)
- Balanced detection with reasonable false positive rates
- Examples: Generic API keys, common secret patterns
- Good for most use cases
### Low Accuracy
- Broadest detection, may have higher false positive rates
- Examples: Generic IDs, loose pattern matching
- Useful for comprehensive scanning
```typescript
// Use high accuracy for production
const prodResult = mask(text, { matchAccuracy: 'high' });
// Use medium accuracy for development
const devResult = mask(text, { matchAccuracy: 'medium' });
// Use low accuracy for comprehensive scanning
const scanResult = mask(text, { matchAccuracy: 'low' });
```
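The level descriptions above suggest that each pattern carries its own accuracy tag and that the `matchAccuracy` option acts as a threshold. The sketch below assumes `'high'` keeps only high-accuracy patterns, `'medium'` adds medium ones, and `'low'` keeps everything. The README does not spell this out, and `isPatternActive` is a hypothetical name.

```typescript
type Accuracy = 'high' | 'medium' | 'low';

// Higher rank means a more specific pattern.
const rank: Record<Accuracy, number> = { low: 0, medium: 1, high: 2 };

// Hypothetical helper: should a pattern run at the requested level?
function isPatternActive(patternAccuracy: Accuracy, requested: Accuracy = 'medium'): boolean {
  return rank[patternAccuracy] >= rank[requested];
}

console.log(isPatternActive('high', 'high'));   // true
console.log(isPatternActive('medium', 'high')); // false
console.log(isPatternActive('low', 'low'));     // true
```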
## TypeScript Support
Full TypeScript support with complete type definitions:
```typescript
import { mask, hasSensitiveContent, getPatternMatches } from 'sensitive-data-masker';
import type { MaskResult, MaskingOptions } from 'sensitive-data-masker';
// Type-safe masking options
const options: MaskingOptions = {
  maskChar: '#',
  matchAccuracy: 'high',
  excludePatterns: ['uuid']
};
const result: MaskResult = mask(text, options);
```
## Real-World Examples
### Log File Sanitization
```typescript
import { mask } from 'sensitive-data-masker';
const logEntry = `
[2024-01-15 10:30:45] INFO User john@company.com logged in
[2024-01-15 10:31:12] DEBUG API call with key sk-1234567890abcdef
[2024-01-15 10:31:15] ERROR Payment failed for card 4111-1111-1111-1111
[2024-01-15 10:31:20] WARN SSN in request: 123-45-6789
`;
const sanitized = mask(logEntry);
console.log(sanitized.output);
// [2024-01-15 10:30:45] INFO User **hn@company.c** logged in
// [2024-01-15 10:31:12] DEBUG API call with key **-1234567890ab**
// [2024-01-15 10:31:15] ERROR Payment failed for card **11-1111-1111-11**
// [2024-01-15 10:31:20] WARN SSN in request: **3-45-67**
console.log(sanitized.found);
// { email: 1, openaiApiKey: 1, creditCard: 1, ssn: 1 }
```
### Configuration File Security
```typescript
const config = `
DATABASE_URL=postgresql://user:password123@localhost:5432/db
OPENAI_API_KEY=sk-1234567890abcdef1234567890abcdef
STRIPE_SECRET_KEY=sk_live_abcdef123456
ADMIN_EMAIL=admin@company.com
JWT_SECRET=super-secret-key-123
`;
const result = mask(config);
console.log(result.output);
// DATABASE_URL=postgresql://user:**ssword1**@localhost:5432/db
// OPENAI_API_KEY=**-1234567890abcdef1234567890ab**
// STRIPE_SECRET_KEY=**_live_abcdef12**
// ADMIN_EMAIL=**min@company.c**
// JWT_SECRET=**per-secret-key-1**
```
### Multi-Environment Setup
```typescript
import { mask } from 'sensitive-data-masker';
// Production: Mask everything with high accuracy
const prodResult = mask(sensitiveData, { matchAccuracy: 'high' });
// Development: Allow test emails but mask real API keys
const devResult = mask(sensitiveData, {
  matchAccuracy: 'medium',
  excludePatterns: ['email']
});
// Testing: Only mask financial data
const testResult = mask(sensitiveData, {
  onlyPatterns: ['creditCard', 'bankAccount', 'ssn'],
  matchAccuracy: 'high'
});
```
### Data Pipeline Processing
```typescript
import { hasSensitiveContent, mask } from 'sensitive-data-masker';
// Check if data needs processing
function processBatch(records: string[]) {
  const results = records.map(record => {
    if (hasSensitiveContent(record)) {
      const masked = mask(record, { matchAccuracy: 'high' });
      return {
        data: masked.output,
        hadSensitiveData: true,
        patternsFound: Object.keys(masked.found)
      };
    }
    return { data: record, hadSensitiveData: false };
  });
  return results;
}
```
## Performance Considerations
- **Optimized Regex Engine**: Patterns are compiled and cached on first use
- **Single-Pass Processing**: Efficient string traversal with minimal overhead
- **Memory Efficient**: No unnecessary string copies or allocations
- **Pattern Filtering**: Use `onlyPatterns` when you know which types to look for
- **Accuracy Optimization**: Higher accuracy modes are faster due to more specific patterns
```typescript
// Optimize for specific use cases
const emailsOnly = mask(text, { onlyPatterns: ['email'] }); // Faster
const highAccuracy = mask(text, { matchAccuracy: 'high' }); // Faster, fewer false positives
const comprehensive = mask(text, { matchAccuracy: 'low' }); // Slower, more thorough
```
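The first bullet above describes compile-once caching. A minimal sketch of that idea with a `Map`, assuming patterns are stored as source strings: `getCompiled` is a hypothetical name, and the library's internal cache may look quite different.

```typescript
// Cache of compiled regexes, keyed by flags and source.
const compiled = new Map<string, RegExp>();

// Hypothetical helper: compiles a pattern on first use and reuses it afterwards.
function getCompiled(source: string, flags: string = 'g'): RegExp {
  const key = `${flags}:${source}`;
  let regex = compiled.get(key);
  if (regex === undefined) {
    regex = new RegExp(source, flags);
    compiled.set(key, regex);
  }
  // Global regexes keep state between calls, so reset before handing one out.
  regex.lastIndex = 0;
  return regex;
}

const first = getCompiled('\\d{3}-\\d{2}-\\d{4}');
const second = getCompiled('\\d{3}-\\d{2}-\\d{4}');
console.log(first === second); // true, the second call hits the cache
```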
## Security Best Practices
1. **Always mask before logging**: Ensure sensitive data is masked before writing to logs
2. **Use appropriate accuracy**: Higher accuracy for production, lower for development/testing
3. **Store results securely**: The `matches` array contains original sensitive values
4. **Regular updates**: Keep the library updated for new pattern definitions
5. **Test your patterns**: Verify masking works correctly with your specific data formats
6. **Environment-specific config**: Use different settings for dev/staging/production
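Points 1 and 3 can be combined in a small logging wrapper that masks every message and never forwards the `matches` array. The sketch below takes the masking function as a parameter so it stays independent of any particular import. `createSafeLogger`, the `sink` callback, and the `fakeMask` stand-in are hypothetical names.

```typescript
// Minimal shape of what the wrapper needs from a mask() result.
interface MaskedLike {
  output: string;
  found: { [name: string]: number };
}

// Hypothetical wrapper: masks each message before it reaches the sink.
function createSafeLogger(
  maskFn: (input: string) => MaskedLike,
  sink: (line: string) => void = console.log
): (message: string) => void {
  return (message: string) => {
    const { output, found } = maskFn(message);
    const kinds = Object.keys(found);
    // Log only the masked text and the pattern names, never the raw values.
    sink(kinds.length > 0 ? `${output} [masked: ${kinds.join(', ')}]` : output);
  };
}

// Stand-in for the library's mask(), used only to keep this sketch runnable.
const fakeMask = (input: string): MaskedLike => {
  const output = input.replace(/\d{3}-\d{2}-\d{4}/g, '***-**-****');
  return { output, found: output === input ? {} : { ssn: 1 } };
};

const log = createSafeLogger(fakeMask);
log('SSN in request: 123-45-6789');
// SSN in request: ***-**-**** [masked: ssn]
```

In real code, the library's `mask` would be passed in place of `fakeMask`, since its result also exposes `output` and `found`.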
## Development
### Prerequisites
- Node.js >= 18.12.0
- Yarn or npm
### Setup
```bash
git clone https://github.com/bgauryy/sensitive-data-mask.git
cd sensitive-data-mask
yarn install
```
### Commands
```bash
yarn build # Build the library
yarn dev # Build in watch mode
yarn lint # Run ESLint
yarn test # Run tests
yarn typecheck # Run TypeScript compiler checks
```
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
### Adding New Patterns
1. Choose the appropriate category file in `src/regexes/`
2. Add your pattern following the existing structure:
```typescript
{
  name: 'myPattern',
  regex: /your-regex-here/gi,
  description: 'Description of what this detects',
  matchAccuracy: 'medium' // optional: 'high', 'medium', or 'low'
}
```
3. Run tests to ensure no regressions
4. Submit a PR with a clear description
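Before opening a PR, it can help to sanity-check a new definition in isolation. The sketch below uses a made-up `exampleToken` pattern in the structure shown in step 2 and a hypothetical `matchesPattern` helper. It also guards against a common pitfall: a regex with the `g` flag keeps `lastIndex` between `test()` calls, so repeated checks on the same regex object can give alternating results unless the index is reset.

```typescript
interface PatternDefinition {
  name: string;
  regex: RegExp;
  description: string;
  matchAccuracy?: 'high' | 'medium' | 'low';
}

// Made-up pattern for illustration only: "extok_" followed by 16 hex characters.
const exampleToken: PatternDefinition = {
  name: 'exampleToken',
  regex: /\bextok_[0-9a-f]{16}\b/gi,
  description: 'Fictional token format used to demonstrate pattern checks',
  matchAccuracy: 'high',
};

// Hypothetical helper: true if the input contains a match.
function matchesPattern(pattern: PatternDefinition, input: string): boolean {
  // A global regex remembers lastIndex across test() calls, so reset it first.
  pattern.regex.lastIndex = 0;
  return pattern.regex.test(input);
}

console.log(matchesPattern(exampleToken, 'token=extok_0123456789abcdef')); // true
console.log(matchesPattern(exampleToken, 'token=extok_0123456789abcdef')); // true (no stale state)
console.log(matchesPattern(exampleToken, 'token=extok_short'));            // false
```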
## License
MIT © [guybary](https://github.com/bgauryy)
## Security
If you discover a security vulnerability, please email guybary@wix.com instead of using the issue tracker.
---
**Made with ❤️ for developers who care about data security**