# Webpage Quality Analyzer
> High-performance webpage quality analysis with 124 comprehensive metrics - powered by Rust + WebAssembly
## ✨ Features
- **124 Comprehensive Metrics**: Content quality, SEO, accessibility, performance, and more
- **92 HTML-Only Metrics**: Fast analysis without network requests (WASM optimized)
- **⚡ Blazing Fast**: Powered by Rust + WebAssembly for near-native performance
- **🌐 Universal**: Works in Node.js and all modern browsers
- **📊 8 Built-in Profiles**: Optimized scoring for news, blogs, products, portfolios, etc.
- **🎯 TypeScript Support**: Full type definitions included
- **🔧 Zero Dependencies**: Self-contained WASM bundle (see Performance for size)
- **🎨 Flexible Output**: Customizable field selectors for minimal payloads (98.8% reduction)
- **⚙️ Advanced Customization**: Metric weights, thresholds, penalties, bonuses
**WASM Build Notes:**
- ✅ 92 HTML-only metrics (74% of the 124 total)
- ❌ Readability-based content extraction is not available (it relies on blocking I/O)
- ❌ NLP features are not available in the browser
- ✅ HeuristicExtractor (pure Rust) handles content extraction instead
## 📦 Installation
```bash
npm install @webpage-quality-analyzer/core
```
or with yarn:
```bash
yarn add @webpage-quality-analyzer/core
```
or with pnpm:
```bash
pnpm add @webpage-quality-analyzer/core
```
## 🚀 Quick Start
### Basic Usage (Node.js)
```javascript
import init, { WasmAnalyzer } from '@webpage-quality-analyzer/core';
// Top-level await requires an ES module (an .mjs file or "type": "module" in package.json)
// Initialize WASM module (required once)
await init();
// Create analyzer instance
const analyzer = new WasmAnalyzer();
// Analyze HTML content
const html = `
<!DOCTYPE html>
<html>
<head><title>Example Page</title></head>
<body><h1>Hello World</h1><p>Content here...</p></body>
</html>
`;
const report = await analyzer.analyze(html);
console.log(`Score: ${report.score}`);
console.log(`Quality: ${report.verdict}`); // Excellent, Good, Fair, Poor, VeryPoor
console.log(`Metrics: ${JSON.stringify(report.metrics, null, 2)}`);
```
### With Profile Selection
```javascript
const analyzer = new WasmAnalyzer();
// Use 'news' profile optimized for news articles
const report = await analyzer.analyze_with_profile(html, 'news');
// Available profiles (8 built-in):
// - content_article: Long-form articles (80% content weight)
// - blog: Personal and professional blogs (75% content)
// - news: News articles (40% content, 30% SEO)
// - general: Default/versatile (35% content)
// - homepage: Landing pages (25% balanced)
// - product: Product pages (35% media, 25% SEO)
// - portfolio: Creative showcases (50% media)
// - login_page: Authentication pages (50% technical)
```
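The README doesn't spell out how a profile's percentages are combined. One plausible reading is a weighted average of per-category scores, and the sketch below assumes that model. The category keys, the `PROFILE_WEIGHTS` table and every weight share not quoted above (anything beyond news's 40% content / 30% SEO and product's 35% media / 25% SEO) are hypothetical, so treat this as an illustration rather than the analyzer's actual scoring code.

```javascript
// Hypothetical weight table. Only the percentages quoted in this README are
// taken from it (news: 40% content, 30% SEO; product: 35% media, 25% SEO);
// the rest of each row is made up so the weights sum to 1.
const PROFILE_WEIGHTS = {
  news: { content: 0.4, seo: 0.3, technical: 0.15, accessibility: 0.15 },
  product: { media: 0.35, seo: 0.25, content: 0.2, technical: 0.2 },
};

// Weighted average of 0-100 category scores. Categories with no score are
// skipped and the remaining weights renormalized, keeping the result on 0-100.
function weightedProfileScore(categoryScores, weights) {
  let total = 0;
  let weightSum = 0;
  for (const [category, weight] of Object.entries(weights)) {
    const score = categoryScores[category];
    if (typeof score !== 'number') continue;
    total += score * weight;
    weightSum += weight;
  }
  return weightSum > 0 ? total / weightSum : 0;
}

console.log(
  weightedProfileScore({ content: 80, seo: 60, technical: 90, accessibility: 70 }, PROFILE_WEIGHTS.news).toFixed(1)
);
```

Renormalizing over the categories that are actually present keeps the result on a 0-100 scale when a category has no score for a given page.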
### Browser Usage
```html
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Webpage Quality Analyzer Demo</title>
</head>
<body>
<h1>Analyze Webpage Quality</h1>
<textarea id="html-input" rows="10" cols="80"></textarea>
<button onclick="analyzeHTML()">Analyze</button>
<div id="results"></div>
<script type="module">
import init, { WasmAnalyzer } from './node_modules/@webpage-quality-analyzer/core/webpage_quality_analyzer.js';
let analyzer;
// Initialize on page load
init().then(() => {
analyzer = new WasmAnalyzer();
console.log('✅ Analyzer ready');
});
window.analyzeHTML = async function() {
// Ignore clicks that arrive before init() has finished
if (!analyzer) return;
const html = document.getElementById('html-input').value;
const report = await analyzer.analyze(html);
document.getElementById('results').innerHTML = `
<h2>Results</h2>
<p><strong>Score:</strong> ${report.score.toFixed(2)}</p>
<p><strong>Quality:</strong> ${report.verdict}</p>
<pre>${JSON.stringify(report.metrics, null, 2)}</pre>
`;
};
</script>
</body>
</html>
```
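The demo writes report values into `innerHTML`. Metric values can include text taken from the analyzed page (a title, for instance), so escaping them before interpolation keeps that page's markup from being injected into the demo. A minimal sketch, assuming the report exposes `score`, `verdict` and `metrics` as in the demo above; `escapeHtml` and `renderResults` are hypothetical helpers, not part of the package.

```javascript
// Escape the five HTML-significant characters so report text is displayed
// literally instead of being parsed as markup. '&' must be replaced first so
// the entities produced by later replacements are not escaped twice.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Build the results markup from a report-like object, escaping every value.
function renderResults(report) {
  return `
    <h2>Results</h2>
    <p><strong>Score:</strong> ${escapeHtml(report.score.toFixed(2))}</p>
    <p><strong>Quality:</strong> ${escapeHtml(report.verdict)}</p>
    <pre>${escapeHtml(JSON.stringify(report.metrics, null, 2))}</pre>
  `;
}
```

In the demo, `document.getElementById('results').innerHTML = renderResults(report);` would replace the inline template. Setting `textContent` on individual elements is a simpler alternative when no markup is needed.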
## 📊 Metrics Categories
The analyzer provides **124 metrics** (92 of which run in WASM without network access) across **20 categories**:
### Core Categories (WASM Compatible)
- **Content (14 metrics)** - Word count, readability, text quality, content density
- **SEO (11 metrics)** - Title, meta description, Open Graph, structured data
- **Technical (7 metrics)** - HTML size, scripts, styles, validation
- **Accessibility (7 metrics)** - WCAG compliance, ARIA labels, contrast
- **Media (8 metrics)** - Images, videos, audio with optimization analysis
- **Links (8 metrics)** - Internal/external links, anchor text analysis
- **Structure (5 metrics)** - Headings, paragraphs, document hierarchy
- **Forms (6 metrics)** - Form elements, validation, labels
- **User Experience (5 metrics)** - Interactive elements, CTAs
### Additional Categories
- **Performance (11)** - LCP, FCP, CLS metrics (when available)
- **Security (6)** - HTTPS, CSP, HSTS detection
- **Structured Data (4)** - JSON-LD, Microdata, RDFa, Schema.org
- **Mobile (4)** - Viewport, touch targets, mobile optimization
- **Branding (4)** - Logo, colors, fonts, brand elements
- **Error Handling (3)** - Redirects, error detection
- **Business (3)** - Contact info, business hours
- **Authority (3)** - Author info, publication dates
- **Analytics (3)** - Google Analytics, Tag Manager detection
- **Internationalization (2)** - hreflang, language tags
- **Language (1)** - Language detection (when NLP enabled)
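To see which metric names a given build actually returns (for example, to choose paths for `run_with_fields`), you can walk the nested `metrics` object and list every leaf as a dotted path. This sketch assumes `report.metrics` is a plain object grouped by category, as the `metrics.content.word_count` path used in this README suggests; `listLeafPaths` is a hypothetical helper.

```javascript
// Recursively collect dotted paths to every leaf value in a nested object,
// e.g. { content: { word_count: 1 } } -> ['content.word_count'].
// Arrays are treated as leaves so the path list stays stable across pages.
function listLeafPaths(obj, prefix = '') {
  const paths = [];
  for (const [key, value] of Object.entries(obj ?? {})) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      paths.push(...listLeafPaths(value, path));
    } else {
      paths.push(path);
    }
  }
  return paths;
}
```

Calling `listLeafPaths(report.metrics, 'metrics')` would yield entries shaped like `metrics.content.word_count`.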
## 🎯 Advanced Usage
### Custom Field Selection
```javascript
const analyzer = new WasmAnalyzer();
// Analyze with minimal output (only scores)
const compactReport = await analyzer.run_compact(html);
// Custom field selection
const customReport = await analyzer.run_with_fields(html, [
'score',
'verdict',
'metrics.content.word_count',
'metrics.seo.title_len'
]);
```
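Conceptually, field selection is a projection: keep only the requested dotted paths and drop everything else. The sketch below does the same thing client-side on an already-computed report, which can be handy for trimming stored results. It illustrates the idea under the assumption that paths are dot-separated object keys; it is not the package's implementation, and `getPath` and `pickFields` are hypothetical names.

```javascript
// Read a dotted path such as 'metrics.seo.title_len'; returns undefined
// as soon as any segment is missing.
function getPath(obj, path) {
  return path.split('.').reduce((node, key) => (node == null ? undefined : node[key]), obj);
}

// Copy only the requested dotted paths into a new, minimal nested object.
// Paths that do not exist in the source are silently skipped.
function pickFields(obj, paths) {
  const out = {};
  for (const path of paths) {
    const value = getPath(obj, path);
    if (value === undefined) continue;
    const keys = path.split('.');
    let node = out;
    for (const key of keys.slice(0, -1)) {
      node[key] ??= {};
      node = node[key];
    }
    node[keys[keys.length - 1]] = value;
  }
  return out;
}
```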
### Batch Analysis
```javascript
const analyzer = new WasmAnalyzer();
const htmlPages = [
'<html>Page 1...</html>',
'<html>Page 2...</html>',
'<html>Page 3...</html>'
];
const reports = await Promise.all(
htmlPages.map(html => analyzer.analyze(html))
);
const averageScore = reports.reduce((sum, r) => sum + r.score, 0) / reports.length;
console.log(`Average score: ${averageScore.toFixed(2)}`);
```
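Beyond the average, a batch run is often easier to read as a small summary. A minimal sketch, assuming each report has a numeric `score` and a string `verdict` as shown in Quick Start; `summarizeReports` is a hypothetical helper. Unless the package uses workers internally, WASM code runs on the calling thread, so `Promise.all` interleaves the calls rather than running them in parallel; real parallelism would need Web Workers or Node's `worker_threads`.

```javascript
// Summarize report-like objects ({ score, verdict }): count, mean, min, max
// and how many pages landed in each verdict band.
function summarizeReports(reports) {
  if (reports.length === 0) {
    return { count: 0, mean: NaN, min: NaN, max: NaN, byVerdict: {} };
  }
  const scores = reports.map((r) => r.score);
  const byVerdict = {};
  for (const { verdict } of reports) {
    byVerdict[verdict] = (byVerdict[verdict] ?? 0) + 1;
  }
  return {
    count: reports.length,
    mean: scores.reduce((sum, s) => sum + s, 0) / scores.length,
    // Spreading into Math.min/max is fine for typical batch sizes.
    min: Math.min(...scores),
    max: Math.max(...scores),
    byVerdict,
  };
}
```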
### Builder Pattern
```javascript
const analyzer = new WasmAnalyzer();
// Configure analyzer with custom settings
await analyzer.with_profile('news');
await analyzer.set_metric_weight('word_count', 1.5);
await analyzer.disable_metric('grammar_score');
const report = await analyzer.run(html);
```
### Penalties & Bonuses
```javascript
const analyzer = new WasmAnalyzer();
// Add custom penalty for low word count
await analyzer.add_penalty_below('word_count', 300, 10, 'Content too short');
// Add bonus for excellent readability
await analyzer.add_bonus_above('readability_fk', 80, 5, 'Highly readable');
const report = await analyzer.run(html);
```
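The penalty and bonus calls read as simple threshold rules. The sketch below models them that way, assuming a penalty subtracts its points when the metric is strictly below the threshold, a bonus adds its points when strictly above, and the final score is clamped to 0-100. The rule objects and `applyAdjustments` are hypothetical; the real analyzer may order, cap or report adjustments differently.

```javascript
// Hypothetical rule model for add_penalty_below / add_bonus_above:
// - penalty: subtract `points` when the metric is strictly below `threshold`
// - bonus:   add `points` when the metric is strictly above `threshold`
// Rules whose metric is missing are skipped; the result is clamped to 0-100.
function applyAdjustments(baseScore, metrics, rules) {
  let score = baseScore;
  const notes = [];
  for (const rule of rules) {
    const value = metrics[rule.metric];
    if (typeof value !== 'number') continue;
    const triggered =
      rule.kind === 'penalty' ? value < rule.threshold : value > rule.threshold;
    if (!triggered) continue;
    score += rule.kind === 'penalty' ? -rule.points : rule.points;
    notes.push(rule.reason);
  }
  return { score: Math.min(100, Math.max(0, score)), notes };
}

// The two rules from the example above, expressed as data.
const exampleRules = [
  { kind: 'penalty', metric: 'word_count', threshold: 300, points: 10, reason: 'Content too short' },
  { kind: 'bonus', metric: 'readability_fk', threshold: 80, points: 5, reason: 'Highly readable' },
];
```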
### Custom Thresholds
```javascript
const analyzer = new WasmAnalyzer();
// Set custom scoring curve for word count
analyzer.set_metric_threshold(
'word_count',
100, // min
800, // optimal_min
2000, // optimal_max
5000 // max
);
const report = await analyzer.run(html);
```
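The four numbers describe a scoring curve, but the README doesn't give its exact shape. A common choice for min / optimal_min / optimal_max / max parameters is a trapezoid: zero outside the outer bounds, full marks inside the optimal band, and linear ramps in between. The sketch assumes that shape and a 0-100 output; `thresholdScore` is a hypothetical helper.

```javascript
// Hypothetical trapezoid curve for (min, optimalMin, optimalMax, max) thresholds.
// Returns 0 at or beyond the outer bounds, 100 inside the optimal band and
// interpolates linearly on the two ramps. Assumes min <= optimalMin <= optimalMax <= max.
function thresholdScore(value, min, optimalMin, optimalMax, max) {
  if (value <= min || value >= max) return 0;
  if (value < optimalMin) return (100 * (value - min)) / (optimalMin - min);
  if (value <= optimalMax) return 100;
  return (100 * (max - value)) / (max - optimalMax);
}
```

Under this assumption, with the word-count settings above, 450 words sits halfway up the rising ramp (50), anything from 800 to 2000 words scores 100, and 3500 words sits halfway down the falling ramp (50).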
## 📖 API Reference
### `WasmAnalyzer` Class
#### Methods
- `analyze(html: string): Promise<PageQualityReport>`
  - Analyze HTML with default settings
- `analyze_with_profile(html: string, profile: string): Promise<PageQualityReport>`
  - Analyze with specific profile
- `run(html: string): Promise<PageQualityReport>`
  - Run analysis with current configuration
- `run_compact(html: string): Promise<PageQualityReport>`
  - Run with minimal output fields
- `run_with_fields(html: string, fields: string[]): Promise<PageQualityReport>`
  - Run with custom field selection
- `with_profile(profile: string): Promise<void>`
  - Set analysis profile
- `set_metric_weight(metric: string, weight: number): Promise<void>`
  - Customize metric weight
- `disable_metric(metric: string): Promise<void>`
  - Disable specific metric
- `enable_metric(metric: string): Promise<void>`
  - Enable specific metric
- `add_penalty_below(metric: string, threshold: number, deduction: number, reason: string): Promise<void>`
  - Add custom penalty condition
- `add_bonus_above(metric: string, threshold: number, addition: number, reason: string): Promise<void>`
  - Add custom bonus condition
### `PageQualityReport` Interface
```typescript
interface PageQualityReport {
url: string;
fetched_at: string;
score: number; // 0-100
verdict: QualityBand; // Excellent | Good | Fair | Poor | VeryPoor
metrics: PageMetrics; // 92 WASM-compatible metrics
metadata: PageMetadata;
processed_document: ProcessedDocument;
notes: string[];
version: string;
phase3_scoring?: Phase3ScoringResult; // Category-based scores
}
interface Phase3ScoringResult {
category_scores: Record<string, number>; // Content, SEO, Technical, etc.
profile_used: string;
total_weighted_score: number;
}
```
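Because `phase3_scoring` is optional, code that reads category scores should guard against its absence. A minimal sketch that lists the lowest-scoring categories, assuming `category_scores` maps category names to numbers as the interface comment indicates; `weakestCategories` is a hypothetical helper.

```javascript
// Return the `n` lowest-scoring categories from the optional phase3_scoring
// block, or an empty array when the report has no category breakdown.
function weakestCategories(report, n = 3) {
  const scores = report?.phase3_scoring?.category_scores;
  if (!scores) return [];
  return Object.entries(scores)
    .sort(([, a], [, b]) => a - b)
    .slice(0, n)
    .map(([category, score]) => ({ category, score }));
}
```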
**Available Profiles:**
- `content_article` - 80% content weight (long-form articles)
- `blog` - 75% content weight (blog posts)
- `news` - 40% content, 30% SEO (news articles)
- `general` - 35% content (default/versatile)
- `homepage` - 25% balanced (landing pages)
- `product` - 35% media, 25% SEO (product pages)
- `portfolio` - 50% media (creative showcases)
- `login_page` - 50% technical, 20% accessibility
## 🎨 React Integration Example
```typescript
import { useState, useEffect } from 'react';
import init, { WasmAnalyzer } from '@webpage-quality-analyzer/core';
function WebpageAnalyzer() {
const [analyzer, setAnalyzer] = useState<WasmAnalyzer | null>(null);
const [html, setHtml] = useState('');
const [report, setReport] = useState<any>(null);
const [loading, setLoading] = useState(false);
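useEffect(() => {
let cancelled = false;
// Initialize WASM once; skip the state update if the component has already unmounted
init()
.then(() => {
if (!cancelled) setAnalyzer(new WasmAnalyzer());
})
.catch((error) => console.error('WASM initialization failed:', error));
return () => {
cancelled = true;
};
}, []);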
const handleAnalyze = async () => {
if (!analyzer || !html) return;
setLoading(true);
try {
const result = await analyzer.analyze(html);
setReport(result);
} catch (error) {
console.error('Analysis failed:', error);
} finally {
setLoading(false);
}
};
return (
<div>
<h1>Webpage Quality Analyzer</h1>
<textarea
value={html}
onChange={(e) => setHtml(e.target.value)}
placeholder="Paste HTML here..."
rows={10}
cols={80}
/>
<button onClick={handleAnalyze} disabled={loading || !analyzer}>
{loading ? 'Analyzing...' : 'Analyze'}
</button>
{report && (
<div>
<h2>Results</h2>
<p><strong>Score:</strong> {report.score.toFixed(2)}</p>
<p><strong>Quality:</strong> {report.verdict}</p>
<p><strong>Word Count:</strong> {report.metrics.content.word_count}</p>
</div>
)}
</div>
);
}
```
## 📈 Performance
- **WASM Size**: 3.1 MB (gzipped: ~800 KB)
- **Init Time**: < 100ms (first load), instant (cached)
- **Analysis Time**: 50-200ms for typical webpage
- **Memory**: < 10 MB RAM usage
- **Throughput**: 100+ analyses/second (batch mode)
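These figures depend on hardware and page size, so it's worth measuring on your own pages. For sequential calls, throughput is the inverse of latency: a mean of 100 ms per analysis allows at most 10 analyses per second on one thread. A minimal timing sketch using the standard `performance.now()` clock; `benchmark` is a hypothetical helper, and the commented usage assumes an initialized `analyzer`.

```javascript
// Hypothetical helper: time `fn` (sync or async) over several sequential runs.
// One warm-up call is excluded so compilation cost does not skew the mean.
async function benchmark(fn, iterations = 20) {
  await fn();
  const start = performance.now();
  for (let i = 0; i < iterations; i += 1) {
    await fn();
  }
  const meanMs = (performance.now() - start) / iterations;
  // Sequential throughput is simply the inverse of mean latency.
  return { meanMs, perSecond: meanMs > 0 ? 1000 / meanMs : Infinity };
}

// Usage with the package (not executed here; assumes an initialized `analyzer`):
// const stats = await benchmark(() => analyzer.analyze(html));
// console.log(`${stats.meanMs.toFixed(1)} ms/page, ~${stats.perSecond.toFixed(0)} pages/s`);
```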
## 🛠️ TypeScript Support
Full TypeScript definitions are included:
```typescript
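import init, { WasmAnalyzer } from '@webpage-quality-analyzer/core';
import type { PageQualityReport, QualityBand } from '@webpage-quality-analyzer/core';
// The WASM module must be initialized before any analyzer is created
await init();
const html = '<!DOCTYPE html><html><head><title>Example</title></head><body><p>Hello</p></body></html>';
const analyzer: WasmAnalyzer = new WasmAnalyzer();
const report: PageQualityReport = await analyzer.analyze(html);
const verdict: QualityBand = report.verdict;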
```
## 🔧 Troubleshooting
### Issue: "Module not found"
Make sure to await the `init()` call before creating the analyzer:
```javascript
await init(); // ← Required
const analyzer = new WasmAnalyzer();
```
### Issue: "Cannot read property 'analyze' of undefined"
Ensure WASM module is initialized before use:
```javascript
let analyzer;
init().then(() => {
analyzer = new WasmAnalyzer();
// Now safe to use
});
```
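Both issues above come from calling the analyzer before `init()` has resolved. One way to avoid that is to memoize a single initialization promise and have every caller await it. A minimal sketch; `once` and `getAnalyzer` are hypothetical names, and the package calls appear only in the commented usage.

```javascript
// Hypothetical helper: wrap an async factory so it runs at most once and every
// caller shares the same promise. If the factory rejects, the cache is cleared
// so a later call can retry instead of reusing the failure forever.
function once(factory) {
  let pending = null;
  return function get() {
    if (!pending) {
      pending = Promise.resolve()
        .then(() => factory())
        .catch((error) => {
          pending = null;
          throw error;
        });
    }
    return pending;
  };
}

// Usage with the package (not executed here):
// const getAnalyzer = once(async () => {
//   await init();
//   return new WasmAnalyzer();
// });
// const analyzer = await getAnalyzer(); // safe to call from many places
```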
### Issue: Large bundle size
The WASM file is ~3.1 MB uncompressed. Enable gzip or Brotli compression on your server to reduce the transfer size to ~800 KB.
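To check what compression buys for your build, you can compress the `.wasm` file locally. This CommonJS sketch uses Node's built-in `zlib` module; the `_bg.wasm` filename in the commented usage follows the usual wasm-bindgen naming and is an assumption, so adjust it to the file in your `node_modules`. Serving the file with the `application/wasm` MIME type also lets browsers compile it while it streams in.

```javascript
// CommonJS Node script: measure how much gzip and Brotli shrink a buffer.
const zlib = require('node:zlib');

function compressedSizes(buf) {
  return {
    raw: buf.length,
    // Level 9 = maximum gzip compression
    gzip: zlib.gzipSync(buf, { level: 9 }).length,
    // Quality 11 = maximum Brotli compression
    brotli: zlib.brotliCompressSync(buf, {
      params: { [zlib.constants.BROTLI_PARAM_QUALITY]: 11 },
    }).length,
  };
}

// Usage (the path and `_bg.wasm` filename are assumptions; check your node_modules):
// const fs = require('node:fs');
// const wasm = fs.readFileSync('node_modules/@webpage-quality-analyzer/core/webpage_quality_analyzer_bg.wasm');
// console.log(compressedSizes(wasm));
```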
## 📚 Documentation
- [Full Documentation](https://github.com/NotGyashu/webpage-quality-analyzer/tree/main/docs)
- [Metrics Reference](https://github.com/NotGyashu/webpage-quality-analyzer/blob/main/docs/metrics_reference.md)
- [CLI Tool](https://github.com/NotGyashu/webpage-quality-analyzer#cli-tool)
- [Examples](https://github.com/NotGyashu/webpage-quality-analyzer/tree/main/examples)
## 🤝 Contributing
Contributions are welcome! Please see [CONTRIBUTING.md](https://github.com/NotGyashu/webpage-quality-analyzer/blob/main/CONTRIBUTING.md).
## 📄 License
Dual-licensed under MIT OR Apache-2.0.
## 🔗 Links
- [GitHub Repository](https://github.com/NotGyashu/webpage-quality-analyzer)
- [NPM Package](https://www.npmjs.com/package/@webpage-quality-analyzer/core)
- [Issue Tracker](https://github.com/NotGyashu/webpage-quality-analyzer/issues)
- [Changelog](https://github.com/NotGyashu/webpage-quality-analyzer/blob/main/CHANGELOG.md)
---
**Made with ❤️ and Rust 🦀**