@trap_stevo/linkscope

# 🔍 LinkScope

**Reveal the Web Behind the Link.**

Unleash legendary link intelligence: instantly extract Open Graph, Twitter Card, and metadata from any URL using adaptive scraping logic and real browser simulation. Feed raw text, batch URLs, or scoped domains, then parse, adapt, and reveal the hidden web behind any link. Designed for creators, analysts, bots, and platforms seeking total control over link understanding at scale.

---

## ✨ Features

- ⚡ **Instant Metadata Extraction** – Pull Open Graph, Twitter Card, favicon, charset, and raw meta tags
- 🌐 **Multi-Link Support** – Scope many URLs or raw text blocks with concurrency
- 🕵️‍♂️ **Stealth Mode** – Built-in browser fingerprint obfuscation
- 🌍 **Proxy Support** – Route requests through your own proxy server
- 🔧 **Protocol Filtering** – Support for custom protocol allowlists
- 🧠 **Raw Text Extraction** – Automatically detect links in unstructured messages
- 🔄 **Fallback Precision** – Automatically fall back to full-page render scraping when blocked
- 🧩 **Modular Logic** – Integrate with any pipeline or platform needing metadata insights

---

## ⚙️ System Requirements

| Requirement | Version               |
|-------------|-----------------------|
| **Node.js** | ≥ 19.x                |
| **npm**     | ≥ 9.x (recommended)   |
| **OS**      | Windows, macOS, Linux |

---

## ⚙️ Configurations

| Option             | Type       | Default                 | Description                                                  |
|--------------------|------------|-------------------------|--------------------------------------------------------------|
| `allowedProtocols` | `string[]` | `[ "http:", "https:" ]` | Protocol allowlist (e.g., to add support for `ipfs:`)        |
| `proxy`            | `string`   | `undefined`             | Proxy server URL                                             |
| `concurrency`      | `number`   | `5`                     | Max parallel requests (used in `scopeMany` and `scopeText`)  |

---

## 📘 API Specifications

| Method                         | Description                                                                | Async |
|--------------------------------|----------------------------------------------------------------------------|-------|
| `scopeLink(url, options?)`     | Extracts metadata from a single URL, with fallback browser scraping logic  | ✅    |
| `scopeMany(urls, options?)`    | Scopes multiple URLs concurrently with retry and normalization support     | ✅    |
| `scopeText(rawText, options?)` | Extracts URLs from raw text and runs them through `scopeMany`              | ✅    |

### Options

All methods accept the following optional configuration:

| Option             | Type       | Description                                                      |
|--------------------|------------|------------------------------------------------------------------|
| `proxy`            | `string`   | Route all scraping and browser requests through a proxy          |
| `allowedProtocols` | `string[]` | Whitelist specific protocols (defaults to `http:` and `https:`)  |
| `concurrency`      | `number`   | Max number of concurrent scrapes in `scopeMany` or `scopeText`   |

### Return Types

All three methods are asynchronous, so the values below are what the returned promise resolves to.

#### `scopeLink(url, options?) → MetadataResultObject`

Returns a rich object with Open Graph, Twitter Card, favicon, charset, raw meta tags, and request metadata.

#### `scopeMany(urls, options?) → Array<{ url, status, data | error }>`

Returns an array of results, preserving the input order:

- `status`: `"fulfilled"` or `"rejected"`
- `data`: `MetadataResultObject` (on success)
- `error`: Error string (on failure)

#### `scopeText(rawText, options?) → Same as scopeMany`

Extracts all valid URLs from a block of text and delegates to `scopeMany`.
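The result shape documented for `scopeMany` and `scopeText` (`url`, `status`, and either `data` or `error`) can be split into successes and failures with a small helper. The sketch below is illustrative only: `partitionResults` is a hypothetical helper, not part of LinkScope, and `sampleResults` is hand-written data that mimics the documented shape rather than real output.

```js
// Hypothetical helper (not part of LinkScope): split scopeMany/scopeText
// results into successes and failures, based on the documented result shape.
function partitionResults(results) {
  const succeeded = [];
  const failed = [];

  for (const result of results) {
    if (result.status === "fulfilled") {
      // On success, `data` holds the MetadataResultObject
      succeeded.push({ url: result.url, data: result.data });
    } else {
      // On failure, `error` holds an error string
      failed.push({ url: result.url, error: result.error });
    }
  }

  return { succeeded, failed };
}

// Hand-written sample mimicking the documented shape; real output will differ.
const sampleResults = [
  { url: "https://example.com", status: "fulfilled", data: { ogTitle: "Example Domain" } },
  { url: "https://another.com", status: "rejected", error: "Request timed out" }
];

const { succeeded, failed } = partitionResults(sampleResults);
console.log(succeeded.length, failed.length); // 1 1
```

With the library installed, the same helper could presumably be applied to the array that `await LinkScope.scopeMany(urls)` resolves to, assuming the results match the shape described above.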
---

## 🧪 MetadataResultObject

Each fulfilled result includes:

| Field                | Description                             |
|----------------------|-----------------------------------------|
| `scopeTarget`        | Final resolved URL after redirects      |
| `ogTitle`            | Open Graph title                        |
| `ogDescription`      | Open Graph description                  |
| `ogType`             | Open Graph type                         |
| `ogUrl`              | Open Graph canonical URL                |
| `ogImage`            | Array of Open Graph image URLs          |
| `ogVideo`            | Open Graph video URL                    |
| `ogAudio`            | Open Graph audio URL                    |
| `twitterCard`        | Twitter Card type                       |
| `twitterTitle`       | Twitter title                           |
| `twitterDescription` | Twitter description                     |
| `twitterImage`       | Twitter image URL                       |
| `twitterPlayer`      | Twitter video player URL                |
| `favicon`            | Page favicon URL                        |
| `charset`            | Page character set                      |
| `requestHeaders`     | Headers used in successful fetch        |
| `allRawMetaTags`     | Key-value map of all detected meta tags |
| `success`            | Whether the scrape succeeded            |
| `error`              | Error message if failed                 |

---

## 🚀 Getting Started

### 📦 Installation

```bash
npm install @trap_stevo/linkscope
```

### 🔗 Single URL

```js
const LinkScope = require("@trap_stevo/linkscope");

// CommonJS has no top-level await, so wrap the call in an async function
(async () => {
  const data = await LinkScope.scopeLink("https://example.com");
  console.log(data);
})();
```

### 📄 Raw Text with Multiple URLs

```js
// Run inside an async function, with LinkScope required as in the single-URL example
const rawText = `
Check these: https://openai.com and https://instagram.com
`;

const results = await LinkScope.scopeText(rawText);
```

### 🌐 Multiple URLs with Concurrency + Proxy

```js
// Run inside an async function, with LinkScope required as in the single-URL example
const urls = ["https://example.com", "https://another.com"];

const results = await LinkScope.scopeMany(urls, {
  concurrency: 5,
  proxy: "http://your-proxy:8080"
});
```

---

## 📜 License

See License in [LICENSE.md](./LICENSE.md)