# @knowcode/screenshotfetch
A comprehensive web application spider with screenshot capture and customer journey documentation, built on Puppeteer. Automatically crawl web applications, handle authentication, and generate step-by-step user flow documentation with screenshots.
## Features
- **High-quality screenshot capture** - Crystal clear screenshots with full customization
- **Web application spidering** - Intelligent crawling with flow documentation
- **Automated authentication** - Username/password login with smart form detection
- **Customer journey mapping** - Step-by-step flows with visual documentation
- **URL tracking & cross-referencing** - Complete traceability for all screenshots
- **Advanced cookie consent handling** - Multiple strategies for banner removal
- **Fast and reliable** - Built on Puppeteer with robust error handling
- **CLI and programmatic API** - Use via command line or integrate into your code
- **Batch processing** - Process multiple URLs efficiently
- **Smart filtering** - Avoids destructive actions like logout/delete
## Installation
### Global Installation (CLI Usage)
```bash
npm install -g @knowcode/screenshotfetch
```
### Local Installation (Programmatic Usage)
```bash
npm install @knowcode/screenshotfetch
```
## Quick Start
### Spider a Web Application
```bash
# Spider with authentication
screenshotfetch spider https://app.example.com/login -u username -p password
# Generate documentation in custom directory
screenshotfetch spider https://app.example.com -u user@example.com -p pass123 -o ./my-docs
```
### Single Screenshot
```bash
# Basic screenshot
screenshotfetch capture https://example.com -o screenshot.png
# Full page screenshot
screenshotfetch capture https://example.com -o fullpage.png --fullpage
```
## CLI Usage
### Web Application Spider (NEW)
```bash
# Spider a web application with authentication
screenshotfetch spider https://app.example.com/login -u username -p password
# Custom output directory and flow limits
screenshotfetch spider https://app.example.com -u user@example.com -p pass123 -o ./my-docs --max-flows 3
# Debug mode (visible browser)
screenshotfetch spider https://app.example.com -u username -p password --headless false
# Custom viewport and timing
screenshotfetch spider https://app.example.com -u user -p pass -w 1280 -h 720 --wait 3000
```
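The commands above put the password directly on the command line, where it can end up in shell history or in checked-in scripts. The following is a minimal sketch of one way to avoid that: build the argument list in Node and read the credentials from environment variables. The helper names `buildSpiderArgs` and `runSpider` and the variable names `SPIDER_USER` and `SPIDER_PASS` are assumptions made for this illustration and are not part of the tool; only the flags shown in the examples above are used.

```javascript
const { spawnSync } = require('node:child_process');

// Hypothetical helper: turns an options object into the argument list for
// `screenshotfetch spider`, using only the flags shown in the examples above.
function buildSpiderArgs(url, opts = {}) {
  const args = ['spider', url];
  if (opts.username) args.push('-u', opts.username);
  if (opts.password) args.push('-p', opts.password);
  if (opts.outputDir) args.push('-o', opts.outputDir);
  if (opts.maxFlows !== undefined) args.push('--max-flows', String(opts.maxFlows));
  if (opts.headless === false) args.push('--headless', 'false');
  if (opts.width !== undefined) args.push('-w', String(opts.width));
  if (opts.height !== undefined) args.push('-h', String(opts.height));
  if (opts.wait !== undefined) args.push('--wait', String(opts.wait));
  return args;
}

// Runs the CLI with credentials taken from the environment.
// SPIDER_USER and SPIDER_PASS are assumed names, not something the tool reads itself.
function runSpider(url, opts = {}) {
  const { SPIDER_USER: username, SPIDER_PASS: password } = process.env;
  if (!username || !password) {
    throw new Error('Set SPIDER_USER and SPIDER_PASS before running');
  }
  const args = buildSpiderArgs(url, { ...opts, username, password });
  // Passing an argument array (no shell) avoids quoting problems in passwords.
  return spawnSync('screenshotfetch', args, { stdio: 'inherit' });
}
```

Calling `runSpider('https://app.example.com/login', { outputDir: './my-docs', maxFlows: 3 })` would then run the equivalent of the second command above. The password is still visible in the process argument list while the command runs, so this only keeps it out of history and source files. On Windows, the globally installed npm shim may need `shell: true` in the `spawnSync` options.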
### Capture Single Screenshot
```bash
# Basic usage
screenshotfetch capture https://example.com -o ./screenshot.png
# Full page capture
screenshotfetch capture https://example.com -o ./fullpage.png --fullpage
# Custom viewport
screenshotfetch capture https://example.com -w 1280 -h 720
# Different cookie strategies
screenshotfetch capture https://example.com -s none # No cookie handling
screenshotfetch capture https://example.com -s click # Only try clicking
screenshotfetch capture https://example.com -s remove # Only remove banners
screenshotfetch capture https://example.com -s block # Only block services
screenshotfetch capture https://example.com -s all # Try all strategies (default)
```
### Batch Capture
Create a JSON file with your screenshots:
```json
[
  {
    "url": "https://example.com",
    "output": "./screenshots/home.png",
    "fullPage": false
  },
  {
    "url": "https://example.com/about",
    "output": "./screenshots/about.png",
    "fullPage": true
  }
]
```
Then run:
```bash
screenshotfetch batch screenshots.json -d 3000
```
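Since the batch file is a plain JSON array, it can be generated rather than written by hand. The sketch below builds entries from a list of URLs. Only the `url`, `output`, and `fullPage` fields come from the example above; the helper names (`slugFor`, `buildBatch`, `writeBatchFile`) and the file naming scheme are assumptions made for this illustration.

```javascript
const fs = require('node:fs');

// Hypothetical helper: derives a file name from the URL path,
// e.g. "/" becomes "home" and "/about/team" becomes "about-team".
function slugFor(url) {
  const segments = new URL(url).pathname.split('/').filter(Boolean);
  return segments.length ? segments.join('-') : 'home';
}

// Builds batch entries using the same fields as the JSON example above.
function buildBatch(urls, outputDir = './screenshots', fullPage = false) {
  return urls.map((url) => ({
    url,
    output: `${outputDir}/${slugFor(url)}.png`,
    fullPage,
  }));
}

// Writes the entries to disk so the file can be passed to `screenshotfetch batch`.
function writeBatchFile(file, urls, outputDir, fullPage) {
  const entries = buildBatch(urls, outputDir, fullPage);
  fs.writeFileSync(file, JSON.stringify(entries, null, 2) + '\n');
  return entries;
}
```

For example, `writeBatchFile('screenshots.json', ['https://example.com', 'https://example.com/about'])` produces a file that can be passed to `screenshotfetch batch screenshots.json -d 3000`.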
### View Example Format
```bash
screenshotfetch example
```
## Programmatic Usage
### Web Application Spider API
```javascript
const { ApplicationSpider } = require('@knowcode/screenshotfetch');

async function spiderApplication() {
  const spider = new ApplicationSpider({
    viewport: { width: 1920, height: 1080 },
    cookieStrategy: 'all',
    maxFlows: 5,
    maxDepth: 10,
    outputDir: './docs'
  });

  // Spider with authentication
  const result = await spider.spiderApplication(
    'https://app.example.com/login',
    'username',
    'password'
  );

  console.log(`Discovered ${result.summary.completedFlows} customer journeys`);
  console.log(`Generated ${result.summary.screenshotCount} screenshots`);
  console.log(`Documentation in: ${result.summary.outputDirectory}`);
}

// Run the example and report any failure
spiderApplication().catch(console.error);
```
### Screenshot Capture API
```javascript
const { ScreenshotCapture } = require('@knowcode/screenshotfetch');

async function captureScreenshots() {
  const capture = new ScreenshotCapture({
    viewport: { width: 1920, height: 1080 },
    cookieStrategy: 'all',
    waitTime: 3000
  });
  await capture.init();

  // Single screenshot
  const result = await capture.captureScreenshot(
    'https://example.com',
    './screenshot.png',
    { fullPage: false }
  );

  // Batch screenshots
  const screenshots = [
    { url: 'https://example.com', output: './home.png' },
    { url: 'https://example.com/about', output: './about.png' }
  ];
  const results = await capture.captureMultiple(screenshots);

  await capture.close();
}

// Run the example and report any failure
captureScreenshots().catch(console.error);
```
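In the example above, `close()` is never reached if a capture throws, which may leave a browser process running. A minimal sketch of a wrapper that always closes the instance is shown below. `withCapture` is a hypothetical helper, not part of the package; it relies only on the `init()` and `close()` methods used above.

```javascript
// Hypothetical helper: runs `work` against an initialized capture instance
// and always calls close(), even when `work` throws.
async function withCapture(capture, work) {
  await capture.init();
  try {
    return await work(capture);
  } finally {
    await capture.close();
  }
}

// Usage with the class from the example above:
// const { ScreenshotCapture } = require('@knowcode/screenshotfetch');
// withCapture(new ScreenshotCapture({ cookieStrategy: 'all' }), (c) =>
//   c.captureScreenshot('https://example.com', './screenshot.png')
// ).catch(console.error);
```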
## Generated Documentation Structure
When using the spider functionality, the tool generates comprehensive documentation:
```
docs/                           # Output directory
├── index.md                    # Overview with flow summary
├── flows/                      # Customer journey documentation
│   ├── flow-1/
│   │   ├── flow-1.md           # Step-by-step journey with screenshots
│   │   ├── _images/            # Screenshots for this flow
│   │   │   ├── 01-login.png
│   │   │   ├── 02-dashboard.png
│   │   │   └── 03-settings.png
│   │   └── flow-1.json         # Metadata and URL mappings
│   └── flow-2/
│       ├── flow-2.md           # Another customer journey
│       ├── _images/            # Flow-specific screenshots
│       └── flow-2.json         # Flow metadata
└── metadata/
    ├── url-index.json          # Complete URL-to-screenshot mapping
    └── flow-summary.json       # Summary of all discovered flows
```
Each flow markdown file contains:
- Step-by-step customer journey with screenshots
- URL tracking for each step
- Action descriptions and navigation flow
- Metadata for programmatic access
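
Because the layout is predictable, other scripts can pick up the generated output. The sketch below lists each flow directory with its markdown file and screenshots. It relies only on the directory layout shown above (`flows/<name>/_images/`) and does not parse the JSON files, since their schema is not described here. `listFlows` is a hypothetical helper name.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: returns one entry per flow directory with the path of
// its markdown file and the sorted names of its screenshots.
function listFlows(outputDir) {
  const flowsDir = path.join(outputDir, 'flows');
  if (!fs.existsSync(flowsDir)) return [];
  return fs
    .readdirSync(flowsDir, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => {
      const imagesDir = path.join(flowsDir, entry.name, '_images');
      const images = fs.existsSync(imagesDir)
        ? fs.readdirSync(imagesDir).filter((f) => f.endsWith('.png')).sort()
        : [];
      return {
        name: entry.name,
        markdown: path.join(flowsDir, entry.name, `${entry.name}.md`),
        images,
      };
    })
    .sort((a, b) => a.name.localeCompare(b.name));
}
```

Calling `listFlows('./docs')` after a spider run returns an array that can feed a report or a check that every flow produced at least one screenshot.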
## Cookie Handling Strategies
The tool provides multiple strategies for handling cookie consent banners:
1. **`all`** (default) - Tries all strategies in sequence
2. **`click`** - Attempts to click accept/agree buttons
3. **`remove`** - Removes banner elements from DOM
4. **`block`** - Blocks cookie consent service requests
5. **`none`** - No cookie handling
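
Strategy names are plain strings, so a typo in a config file or script is easy to miss. Whether the tool itself rejects unknown names is not stated here, so the sketch below validates the value up front, before any browser is launched. `resolveCookieStrategy` is a hypothetical helper; the accepted names and the `all` default are taken from the list above.

```javascript
// The five strategy names listed above.
const COOKIE_STRATEGIES = ['all', 'click', 'remove', 'block', 'none'];

// Hypothetical helper: returns a valid strategy name or throws early.
// An empty or missing value falls back to 'all', the documented default.
function resolveCookieStrategy(value) {
  if (value === undefined || value === null || value === '') return 'all';
  const normalized = String(value).trim().toLowerCase();
  if (!COOKIE_STRATEGIES.includes(normalized)) {
    throw new Error(
      `Unknown cookie strategy "${value}". Expected one of: ${COOKIE_STRATEGIES.join(', ')}`
    );
  }
  return normalized;
}
```

The result can be passed as `cookieStrategy` in the constructor options or as the value of the `-s` flag.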
## Advanced Options
### Constructor Options
```javascript
{
  viewport: { width: 1920, height: 1080, deviceScaleFactor: 1 },
  timeout: 30000,          // Navigation timeout (ms)
  headless: 'new',         // 'new' or false
  cookieStrategy: 'all',   // Cookie handling strategy
  waitTime: 3000           // Wait after page load (ms)
}
```
### Screenshot Options
```javascript
{
  fullPage: false,         // Capture full scrollable page
  clip: {                  // Capture specific region
    x: 0,
    y: 0,
    width: 800,
    height: 600
  },
  omitBackground: false,   // Transparent background
  encoding: 'binary',      // 'base64' or 'binary'
  type: 'png',             // 'png' or 'jpeg'
  quality: 90              // JPEG quality (0-100)
}
```
## Examples
### Capture Competitor Screenshots
```javascript
const { ScreenshotCapture } = require('@knowcode/screenshotfetch');

async function captureCompetitors() {
  const capture = new ScreenshotCapture({
    cookieStrategy: 'all'
  });
  await capture.init();

  const competitors = [
    'https://mailchimp.com',
    'https://activecampaign.com',
    'https://convertkit.com'
  ];

  for (const url of competitors) {
    // Strip only a leading "www." so the file name is the bare hostname
    const name = new URL(url).hostname.replace(/^www\./, '');
    await capture.captureScreenshot(
      url,
      `./competitors/${name}.png`
    );
  }

  await capture.close();
}

// Run the example and report any failure
captureCompetitors().catch(console.error);
```
### Custom Cookie Handler
```javascript
const { ScreenshotCapture, CookieHandler } = require('@knowcode/screenshotfetch');

const cookieHandler = new CookieHandler();

// Add custom selectors for specific sites
cookieHandler.addCustomSelectors([
  'button[id="my-custom-accept"]',
  '.my-site-cookie-accept'
]);

// Add domains to block
cookieHandler.addBlockedDomains([
  'mycookieservice.com'
]);
```
## Troubleshooting
### Navigation Timeout
If you're getting timeout errors, try:
- Using `domcontentloaded` instead of `networkidle2` as the Puppeteer wait condition
- Increasing the `timeout` value (30000 ms in the constructor options above)
- Setting `headless: false` to watch what the browser is doing
### Cookie Banners Still Visible
Try different strategies:
- Use `-s all` to try all methods
- Add custom selectors for specific sites
- Use `headless: false` to debug
### Memory Issues
For large batches:
- Increase delay between captures
- Process in smaller batches
- Monitor system resources
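
The first two suggestions can be combined in a small driver that splits a long list into chunks and pauses between them. This is a sketch under stated assumptions: `chunk`, `sleep`, and `processInBatches` are hypothetical helpers, and `processBatch` stands for whatever handles one chunk, for example a call to `capture.captureMultiple(batch)` from the API section.

```javascript
// Splits a list into consecutive chunks of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical driver: processes one chunk at a time and pauses between
// chunks. Returns an array with one result per chunk, in order.
async function processInBatches(items, size, pauseMs, processBatch) {
  const batches = chunk(items, size);
  const results = [];
  for (let i = 0; i < batches.length; i++) {
    results.push(await processBatch(batches[i]));
    if (i < batches.length - 1) await sleep(pauseMs);
  }
  return results;
}
```

Creating a fresh capture instance per chunk inside `processBatch`, and closing it afterwards, is one option for releasing browser memory between chunks.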
## Use Cases
### UX Research & Documentation
- Document user flows for analysis and improvement
- Create visual user journey maps
- Generate training materials automatically
### Quality Assurance & Testing
- Automate UI regression testing with screenshots
- Document application state changes
- Validate user experience flows
### Competitive Analysis
- Document competitor application flows
- Compare user experience patterns
- Generate competitive intelligence reports
### Process Documentation
- Create step-by-step user guides
- Document internal workflows
- Generate training materials
## Requirements
- **Node.js**: 16.0.0 or higher
- **Operating System**: Windows, macOS, or Linux
- **Memory**: 512MB+ available RAM
- **Disk Space**: Varies based on screenshot quantity
## Advanced Configuration
### Spider Options
```javascript
{
  maxFlows: 5,               // Maximum customer journeys to discover
  maxDepth: 10,              // Maximum steps per journey
  maxPages: 100,             // Maximum total pages to visit
  waitTime: 2000,            // Wait between actions (ms)
  includeQueryParams: true,  // Include URL query parameters
  cookieStrategy: 'all'      // Cookie consent handling strategy
}
```
### Screenshot Options
```javascript
{
  fullPage: false,         // Capture full scrollable page
  viewport: {              // Browser viewport size
    width: 1920,
    height: 1080,
    deviceScaleFactor: 1
  },
  type: 'png',             // Image format ('png' or 'jpeg')
  quality: 90              // JPEG quality (0-100)
}
```
## Contributing
We welcome contributions! Please see our contributing guidelines for details on how to:
- Report bugs and request features
- Submit pull requests
- Improve documentation
## License
MIT © [KnowCode](https://github.com/knowcode)
## Links
- [NPM Package](https://www.npmjs.com/package/@knowcode/screenshotfetch)
- [GitHub Repository](https://github.com/knowcode/screenshotfetch)
- [Issue Tracker](https://github.com/knowcode/screenshotfetch/issues)
- [Documentation](https://github.com/knowcode/screenshotfetch#readme)