# Webpage Quality Analyzer

> High-performance webpage quality analysis with 124 comprehensive metrics - powered by Rust + WebAssembly

[![npm version](https://img.shields.io/npm/v/@webpage-quality-analyzer/core)](https://www.npmjs.com/package/@webpage-quality-analyzer/core)
[![License](https://img.shields.io/badge/license-MIT%20OR%20Apache--2.0-blue)](https://github.com/NotGyashu/webpage-quality-analyzer)
[![WebAssembly](https://img.shields.io/badge/WebAssembly-654FF0?logo=webassembly&logoColor=white)](https://webassembly.org/)
[![Rust](https://img.shields.io/badge/Rust-000000?logo=rust&logoColor=white)](https://www.rust-lang.org/)

## ✨ Features

- **124 Comprehensive Metrics**: Content quality, SEO, accessibility, performance, and more
- **92 HTML-Only Metrics**: Fast analysis without network requests (WASM optimized)
- **⚡ Blazing Fast**: Powered by Rust + WebAssembly for near-native performance
- **🌐 Universal**: Works in Node.js and all modern browsers
- **📊 8 Built-in Profiles**: Optimized scoring for news, blogs, products, portfolios, etc.
- **🎯 TypeScript Support**: Full type definitions included
- **🔧 Zero Dependencies**: Self-contained WASM bundle (~3.1 MB uncompressed)
- **🎨 Flexible Output**: Customizable field selectors for minimal payloads (98.8% reduction)
- **⚙️ Advanced Customization**: Metric weights, thresholds, penalties, bonuses

**WASM Limitations:**

- ✅ 92 HTML-only metrics (74% of total)
- ❌ Readability feature not available (uses blocking I/O)
- ❌ NLP features not available in browser
- ✅ HeuristicExtractor for content extraction (pure Rust)

## 📦 Installation

```bash
npm install @webpage-quality-analyzer/core
```

or with yarn:

```bash
yarn add @webpage-quality-analyzer/core
```

or with pnpm:

```bash
pnpm add @webpage-quality-analyzer/core
```

## 🚀 Quick Start

### Basic Usage (Node.js)

```javascript
import init, { WasmAnalyzer } from '@webpage-quality-analyzer/core';

// Initialize WASM module (required once)
await init();

// Create analyzer instance
const analyzer = new WasmAnalyzer();

// Analyze HTML content
const html = `
<!DOCTYPE html>
<html>
  <head><title>Example Page</title></head>
  <body><h1>Hello World</h1><p>Content here...</p></body>
</html>
`;

const report = await analyzer.analyze(html);

console.log(`Score: ${report.score}`);
console.log(`Quality: ${report.verdict}`); // Excellent, Good, Fair, Poor, VeryPoor
console.log(`Metrics: ${JSON.stringify(report.metrics, null, 2)}`);
```

### With Profile Selection

```javascript
const analyzer = new WasmAnalyzer();

// Use 'news' profile optimized for news articles
const report = await analyzer.analyze_with_profile(html, 'news');

// Available profiles (8 built-in):
// - content_article: Long-form articles (80% content weight)
// - blog: Personal and professional blogs (75% content)
// - news: News articles (40% content, 30% SEO)
// - general: Default/versatile (35% content)
// - homepage: Landing pages (25% balanced)
// - product: Product pages (35% media, 25% SEO)
// - portfolio: Creative showcases (50% media)
// - login_page: Authentication pages (50% technical)
```

### Browser Usage

```html
<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>Webpage Quality Analyzer Demo</title>
</head>
<body>
  <h1>Analyze Webpage Quality</h1>
  <textarea id="html-input" rows="10" cols="80"></textarea>
  <button onclick="analyzeHTML()">Analyze</button>
  <div id="results"></div>

  <script type="module">
    import init, { WasmAnalyzer } from './node_modules/@webpage-quality-analyzer/core/webpage_quality_analyzer.js';

    let analyzer;

    // Initialize on page load
    init().then(() => {
      analyzer = new WasmAnalyzer();
      console.log('✅ Analyzer ready');
    });

    window.analyzeHTML = async function() {
      // Ignore clicks that arrive before the WASM module has finished loading
      if (!analyzer) return;

      const html = document.getElementById('html-input').value;
      const report = await analyzer.analyze(html);

      document.getElementById('results').innerHTML = `
        <h2>Results</h2>
        <p><strong>Score:</strong> ${report.score.toFixed(2)}</p>
        <p><strong>Quality:</strong> ${report.verdict}</p>
        <pre>${JSON.stringify(report.metrics, null, 2)}</pre>
      `;
    };
  </script>
</body>
</html>
```

## 📊 Metrics Categories

The analyzer provides **124 metrics** (92 available in WASM without network) across **20 categories**:

### Core Categories (WASM Compatible)

- **Content (14 metrics)** - Word count, readability, text quality, content density
- **SEO (11 metrics)** - Title, meta description, Open Graph, structured data
- **Technical (7 metrics)** - HTML size, scripts, styles, validation
- **Accessibility (7 metrics)** - WCAG compliance, ARIA labels, contrast
- **Media (8 metrics)** - Images, videos, audio with optimization analysis
- **Links (8 metrics)** - Internal/external links, anchor text analysis
- **Structure (5 metrics)** - Headings, paragraphs, document hierarchy
- **Forms (6 metrics)** - Form elements, validation, labels
- **User Experience (5 metrics)** - Interactive elements, CTAs

### Additional Categories

- **Performance (11)** - LCP, FCP, CLS metrics (when available)
- **Security (6)** - HTTPS, CSP, HSTS detection
- **Structured Data (4)** - JSON-LD, Microdata, RDFa, Schema.org
- **Mobile (4)** - Viewport, touch targets, mobile optimization
- **Branding (4)** - Logo, colors, fonts, brand elements
- **Error Handling (3)** - Redirects, error detection
- **Business (3)** - Contact info, business hours
- **Authority (3)** - Author info, publication dates
- **Analytics (3)** - Google Analytics, Tag Manager detection
- **Internationalization (2)** - hreflang, language tags
- **Language (1)** - Language detection (when NLP enabled)

## 🎯 Advanced Usage

### Custom Field Selection

```javascript
const analyzer = new WasmAnalyzer();

// Analyze with minimal output (only scores)
const compactReport = await analyzer.run_compact(html);

// Custom field selection
const customReport = await analyzer.run_with_fields(html, [
  'score',
  'verdict',
  'metrics.content.word_count',
  'metrics.seo.title_len'
]);
```

### Batch Analysis

```javascript
const analyzer = new WasmAnalyzer();

const htmlPages = [
  '<html>Page 1...</html>',
  '<html>Page 2...</html>',
  '<html>Page 3...</html>'
];

const reports = await Promise.all(
  htmlPages.map(html => analyzer.analyze(html))
);

const averageScore = reports.reduce((sum, r) => sum + r.score, 0) / reports.length;
console.log(`Average score: ${averageScore.toFixed(2)}`);
```

### Builder Pattern

```javascript
const analyzer = new WasmAnalyzer();

// Configure analyzer with custom settings
// (configuration methods return promises, see the API Reference)
await analyzer.with_profile('news');
await analyzer.set_metric_weight('word_count', 1.5);
await analyzer.disable_metric('grammar_score');

const report = await analyzer.run(html);
```

### Penalties & Bonuses

```javascript
const analyzer = new WasmAnalyzer();

// Add custom penalty for low word count
await analyzer.add_penalty_below('word_count', 300, 10, 'Content too short');

// Add bonus for excellent readability
await analyzer.add_bonus_above('readability_fk', 80, 5, 'Highly readable');

const report = await analyzer.run(html);
```

### Custom Thresholds

```javascript
const analyzer = new WasmAnalyzer();

// Set custom scoring curve for word count
analyzer.set_metric_threshold(
  'word_count',
  100,   // min
  800,   // optimal_min
  2000,  // optimal_max
  5000   // max
);

const report = await analyzer.run(html);
```

## 📖 API Reference

### `WasmAnalyzer` Class

#### Methods

- `analyze(html: string): Promise<PageQualityReport>` - Analyze HTML with default settings
- `analyze_with_profile(html: string, profile: string): Promise<PageQualityReport>` - Analyze with specific profile
- `run(html: string): Promise<PageQualityReport>` - Run analysis with current configuration
- `run_compact(html: string): Promise<PageQualityReport>` - Run with minimal output fields
- `run_with_fields(html: string, fields: string[]): Promise<PageQualityReport>` - Run with custom field selection
- `with_profile(profile: string): Promise<void>` - Set analysis profile
- `set_metric_weight(metric: string, weight: number): Promise<void>` - Customize metric weight
- `disable_metric(metric: string): Promise<void>` - Disable specific metric
- `enable_metric(metric: string): Promise<void>` - Enable specific metric
- `add_penalty_below(metric: string, threshold: number, deduction: number, reason: string): Promise<void>` - Add custom penalty condition
- `add_bonus_above(metric: string, threshold: number, addition: number, reason: string): Promise<void>` - Add custom bonus condition

### `PageQualityReport` Interface

```typescript
interface PageQualityReport {
  url: string;
  fetched_at: string;
  score: number;                          // 0-100
  verdict: QualityBand;                   // Excellent | Good | Fair | Poor | VeryPoor
  metrics: PageMetrics;                   // 92 WASM-compatible metrics
  metadata: PageMetadata;
  processed_document: ProcessedDocument;
  notes: string[];
  version: string;
  phase3_scoring?: Phase3ScoringResult;   // Category-based scores
}

interface Phase3ScoringResult {
  category_scores: Record<string, number>; // Content, SEO, Technical, etc.
  profile_used: string;
  total_weighted_score: number;
}
```

**Available Profiles:**

- `content_article` - 80% content weight (long-form articles)
- `blog` - 75% content weight (blog posts)
- `news` - 40% content, 30% SEO (news articles)
- `general` - 35% content (default/versatile)
- `homepage` - 25% balanced (landing pages)
- `product` - 35% media, 25% SEO (product pages)
- `portfolio` - 50% media (creative showcases)
- `login_page` - 50% technical, 20% accessibility

## 🎨 React Integration Example

```tsx
import { useState, useEffect } from 'react';
import init, { WasmAnalyzer } from '@webpage-quality-analyzer/core';

function WebpageAnalyzer() {
  const [analyzer, setAnalyzer] = useState<WasmAnalyzer | null>(null);
  const [html, setHtml] = useState('');
  const [report, setReport] = useState<any>(null);
  const [loading, setLoading] = useState(false);

  useEffect(() => {
    // Initialize WASM
    init().then(() => {
      setAnalyzer(new WasmAnalyzer());
    });
  }, []);

  const handleAnalyze = async () => {
    if (!analyzer || !html) return;

    setLoading(true);
    try {
      const result = await analyzer.analyze(html);
      setReport(result);
    } catch (error) {
      console.error('Analysis failed:', error);
    } finally {
      setLoading(false);
    }
  };

  return (
    <div>
      <h1>Webpage Quality Analyzer</h1>
      <textarea
        value={html}
        onChange={(e) => setHtml(e.target.value)}
        placeholder="Paste HTML here..."
        rows={10}
        cols={80}
      />
      <button onClick={handleAnalyze} disabled={loading || !analyzer}>
        {loading ? 'Analyzing...' : 'Analyze'}
      </button>

      {report && (
        <div>
          <h2>Results</h2>
          <p><strong>Score:</strong> {report.score.toFixed(2)}</p>
          <p><strong>Quality:</strong> {report.verdict}</p>
          <p><strong>Word Count:</strong> {report.metrics.content.word_count}</p>
        </div>
      )}
    </div>
  );
}
```

## 📈 Performance

- **WASM Size**: 3.1 MB (gzipped: ~800 KB)
- **Init Time**: < 100ms (first load), instant (cached)
- **Analysis Time**: 50-200ms for typical webpage
- **Memory**: < 10 MB RAM usage
- **Throughput**: 100+ analyses/second (batch mode)

## 🛠️ TypeScript Support

Full TypeScript definitions are included:

```typescript
import init, { WasmAnalyzer, PageQualityReport, QualityBand } from '@webpage-quality-analyzer/core';

// Initialize the WASM module before constructing the analyzer
await init();

const html = '<html>...</html>'; // HTML string to analyze

const analyzer: WasmAnalyzer = new WasmAnalyzer();
const report: PageQualityReport = await analyzer.analyze(html);
const verdict: QualityBand = report.verdict;
```

## 🔧 Troubleshooting

### Issue: "Module not found"

Make sure to await the `init()` call before creating the analyzer:

```javascript
await init(); // ← Required
const analyzer = new WasmAnalyzer();
```

### Issue: "Cannot read property 'analyze' of undefined"

Ensure the WASM module is initialized before the analyzer is used:

```javascript
let analyzer;

init().then(() => {
  analyzer = new WasmAnalyzer();
  // Now safe to use
});
```

### Issue: Large bundle size

The WASM file is ~3.1 MB uncompressed. Enable gzip or Brotli compression on your server to reduce the transfer size (about 800 KB gzipped).

## 📚 Documentation

- [Full Documentation](https://github.com/NotGyashu/webpage-quality-analyzer/tree/main/docs)
- [Metrics Reference](https://github.com/NotGyashu/webpage-quality-analyzer/blob/main/docs/metrics_reference.md)
- [CLI Tool](https://github.com/NotGyashu/webpage-quality-analyzer#cli-tool)
- [Examples](https://github.com/NotGyashu/webpage-quality-analyzer/tree/main/examples)

## 🤝 Contributing

Contributions are welcome! Please see [CONTRIBUTING.md](https://github.com/NotGyashu/webpage-quality-analyzer/blob/main/CONTRIBUTING.md).

## 📄 License

Dual-licensed under MIT OR Apache-2.0

## 🔗 Links

- [GitHub Repository](https://github.com/NotGyashu/webpage-quality-analyzer)
- [NPM Package](https://www.npmjs.com/package/@webpage-quality-analyzer/core)
- [Issue Tracker](https://github.com/NotGyashu/webpage-quality-analyzer/issues)
- [Changelog](https://github.com/NotGyashu/webpage-quality-analyzer/blob/main/CHANGELOG.md)

---

**Made with ❤️ and Rust 🦀**
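
## 🧮 Supplement: Summarizing Batch Results

The Batch Analysis example averages `report.score` across pages, which produces `NaN` when the list of reports is empty. The sketch below is one possible way to summarize a batch more defensively. `summarizeReports` is a hypothetical helper, not part of `@webpage-quality-analyzer/core`; it only assumes that each report has a numeric `score` and a string `verdict`, as described in the `PageQualityReport` interface.

```javascript
// Hypothetical helper (not part of the package API).
// Assumes each report has a numeric `score` and a string `verdict`.
function summarizeReports(reports) {
  const verdictCounts = {};
  let total = 0;

  for (const report of reports) {
    total += report.score;
    // Count how many pages fall into each quality band
    verdictCounts[report.verdict] = (verdictCounts[report.verdict] || 0) + 1;
  }

  return {
    count: reports.length,
    // Return null instead of NaN when there is nothing to average
    averageScore: reports.length > 0 ? total / reports.length : null,
    verdictCounts,
  };
}

// Usage with stand-in objects shaped like analyzer reports
const summary = summarizeReports([
  { score: 90, verdict: 'Excellent' },
  { score: 70, verdict: 'Good' },
  { score: 50, verdict: 'Fair' },
  { score: 70, verdict: 'Good' },
]);

console.log(summary.averageScore);  // 70
console.log(summary.verdictCounts); // { Excellent: 1, Good: 2, Fair: 1 }
```

In real use, the array passed in would be the `reports` value resolved by `Promise.all(...)` in the Batch Analysis example.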