puppeteer-extends
Version:
Modern, factory-based management for Puppeteer with multiple browser instances and enhanced navigation
527 lines (410 loc) • 16.8 kB
Markdown
# puppeteer-extends
[](https://opensource.org/licenses/Apache-2.0)
[](https://github.com/devalexanderdaza/puppeteer-extends/actions/workflows/ci.yml)
[](https://npmjs.org/package/puppeteer-extends)
> Modern, factory-based management for Puppeteer with advanced navigation features. Ideal for web scraping, testing, and automation.
## Features
- 🏭 **Browser Factory** - Multiple browser instances with intelligent reuse
- 🧭 **Enhanced Navigation** - CloudFlare bypass, retry mechanisms, and more
- 📝 **TypeScript Support** - Full type definitions for better DX
- 🔄 **Singleton Compatibility** - Maintains backward compatibility
- 🛡️ **Stealth Mode** - Integrated puppeteer-extra-plugin-stealth
## Installation
```bash
pnpm install puppeteer-extends
```
## Basic Usage
```typescript
import { PuppeteerExtends, Logger } from 'puppeteer-extends';
import path from 'path';
const main = async () => {
// Get browser instance with options
const browser = await PuppeteerExtends.getBrowser({
isHeadless: true,
isDebug: true
});
// Create new page
const page = await browser.newPage();
// Navigate with enhanced features (CloudFlare bypass, retries)
await PuppeteerExtends.goto(page, 'https://example.com', {
maxRetries: 2,
timeout: 30000
});
// Take screenshot
await page.screenshot({
path: path.join(__dirname, 'screenshot.png')
});
// Clean up
await PuppeteerExtends.closePage(page);
await PuppeteerExtends.closeBrowser();
};
main().catch(console.error);
```
## Multiple Browser Instances
```typescript
import { PuppeteerExtends } from 'puppeteer-extends';
const main = async () => {
// Create two browser instances with different configurations
const mainBrowser = await PuppeteerExtends.getBrowser({
instanceId: 'main',
isHeadless: true
});
const proxyBrowser = await PuppeteerExtends.getBrowser({
instanceId: 'proxy',
isHeadless: true,
customArguments: [
'--proxy-server=http://my-proxy.com:8080',
'--no-sandbox'
]
});
// Work with both browsers
const mainPage = await mainBrowser.newPage();
const proxyPage = await proxyBrowser.newPage();
// Clean up specific browser
await PuppeteerExtends.closeBrowser('proxy');
// Or close all browsers at once
await PuppeteerExtends.closeAllBrowsers();
};
```
## Advanced Navigation Features
```typescript
import { PuppeteerExtends } from 'puppeteer-extends';
const scrape = async () => {
const browser = await PuppeteerExtends.getBrowser();
const page = await browser.newPage();
// Navigation with retries and custom options
const success = await PuppeteerExtends.goto(page, 'https://example.com', {
waitUntil: ['networkidle0'],
timeout: 60000,
maxRetries: 3,
retryDelay: 5000,
isDebug: true
});
if (success) {
// Wait for specific element with timeout
const elementVisible = await PuppeteerExtends.waitForSelector(
page,
'.content-loaded',
10000
);
if (elementVisible) {
// Extract data
const data = await page.evaluate(() => {
return {
title: document.title,
content: document.querySelector('.content')?.textContent
};
});
console.log('Scraped data:', data);
}
}
await PuppeteerExtends.closePage(page);
await PuppeteerExtends.closeBrowser();
};
```
## Customizing Logger
```typescript
import { Logger } from 'puppeteer-extends';
import path from 'path';
// Set custom log directory
Logger.configure(path.join(__dirname, 'custom-logs'));
// Use logger
Logger.info('Starting browser');
Logger.debug('Debug information');
Logger.warn('Warning message');
Logger.error('Error occurred', new Error('Something went wrong'));
```
## Plugin System
puppeteer-extends v1.7.0 introduces a powerful plugin system that allows extending functionality through lifecycle hooks:
```typescript
import { PuppeteerExtends, StealthPlugin, ProxyPlugin } from 'puppeteer-extends';
// Register stealth plugin for improved anti-bot protection
await PuppeteerExtends.registerPlugin(new StealthPlugin({
hideWebDriver: true,
hideChromeRuntime: true
}));
// Register proxy rotation plugin
await PuppeteerExtends.registerPlugin(new ProxyPlugin({
proxies: [
{ host: '123.45.67.89', port: 8080, username: 'user', password: 'pass' },
{ host: '98.76.54.32', port: 8080 },
],
rotationStrategy: 'sequential',
rotateOnNavigation: true,
rotateOnError: true
}));
// Continue with normal usage - plugins will be active
const browser = await PuppeteerExtends.getBrowser();
const page = await browser.newPage();
// ...
```
### Built-in Plugins
| Plugin | Description |
|--------|-------------|
| `StealthPlugin` | Enhances anti-bot detection protection |
| `ProxyPlugin` | Provides proxy rotation with multiple strategies |
### Creating Custom Plugins
You can create custom plugins by implementing the `PuppeteerPlugin` interface:
```typescript
import { PuppeteerPlugin, PluginContext } from 'puppeteer-extends';
class MyCustomPlugin implements PuppeteerPlugin {
name = 'my-custom-plugin';
// Called when plugin is registered
async initialize(options?: any): Promise<void> {
console.log('Plugin initialized with options:', options);
}
// Called before browser launch
async onBeforeBrowserLaunch(options: any, context: PluginContext): Promise<any> {
// Modify launch options if needed
return {
...options,
args: [...options.args, '--disable-features=site-per-process']
};
}
// Called when a new page is created
async onPageCreated(page: Page, context: PluginContext): Promise<void> {
await page.evaluateOnNewDocument(() => {
// Execute code in the page context
});
}
// Called before navigation
async onBeforeNavigation(page: Page, url: string, options: any, context: PluginContext): Promise<void> {
console.log(`Navigating to: ${url}`);
}
// Handle errors
async onError(error: Error, context: PluginContext): Promise<boolean> {
console.error('Plugin caught error:', error);
return false; // Return true if error was handled
}
// Called when plugin is unregistered
async cleanup(): Promise<void> {
console.log('Plugin cleanup');
}
}
// Register the custom plugin
await PuppeteerExtends.registerPlugin(new MyCustomPlugin(), {
customOption: 'value'
});
```
### Available Plugin Hooks
| Hook | Description |
|------|-------------|
| `initialize` | Called when plugin is registered |
| `cleanup` | Called when plugin is unregistered |
| `onBeforeBrowserLaunch` | Before browser instance is created |
| `onAfterBrowserLaunch` | After browser instance is created |
| `onBeforeBrowserClose` | Before browser instance is closed |
| `onPageCreated` | When a new page is created |
| `onBeforePageClose` | Before a page is closed |
| `onBeforeNavigation` | Before navigation to a URL |
| `onAfterNavigation` | After navigation completes (success or failure) |
| `onError` | When an error occurs in any operation |
## Session Management
puppeteer-extends v1.7.0 introduces powerful session management capabilities, allowing you to persist cookies, localStorage, and other browser state between runs:
```typescript
import { PuppeteerExtends, SessionManager } from 'puppeteer-extends';
// Create a session manager
const sessionManager = new SessionManager({
name: 'my-session',
sessionDir: './sessions',
persistCookies: true,
persistLocalStorage: true,
domains: ['example.com'] // Only store cookies for these domains
});
// Get browser and create page
const browser = await PuppeteerExtends.getBrowser();
const page = await browser.newPage();
// Apply session data to page (cookies, localStorage, etc.)
await sessionManager.applySession(page);
// Navigate somewhere
await PuppeteerExtends.goto(page, 'https://example.com/login');
// Perform login or other actions that modify session
await page.type('#username', 'myuser');
await page.type('#password', 'mypassword');
await page.click('#login-button');
// Wait for login to complete
await PuppeteerExtends.waitForNavigation(page);
// Extract and save session data (cookies, localStorage, etc.)
await sessionManager.extractSession(page);
// Next time you run your script, the session will be loaded automatically
```
### Using the SessionPlugin
For more integrated session management, use the built-in SessionPlugin:
```typescript
import { PuppeteerExtends } from 'puppeteer-extends';
import { SessionPlugin } from 'puppeteer-extends/plugins';
// Register the session plugin
await PuppeteerExtends.registerPlugin(new SessionPlugin({
name: 'my-session',
persistCookies: true,
persistLocalStorage: true,
extractAfterNavigation: true, // Auto-save after each navigation
applyBeforeNavigation: true // Auto-apply before each navigation
}));
// Session management is now automatic
const browser = await PuppeteerExtends.getBrowser();
const page = await browser.newPage();
// Navigate anywhere - session will be applied/extracted automatically
await PuppeteerExtends.goto(page, 'https://example.com');
```
### Session Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `sessionDir` | string | `"./tmp/puppeteer-extends/sessions"` | Directory to store session files |
| `name` | string | `"default"` | Session name (used for file name) |
| `persistCookies` | boolean | `true` | Whether to save/load cookies |
| `persistLocalStorage` | boolean | `true` | Whether to save/load localStorage |
| `persistSessionStorage` | boolean | `false` | Whether to save/load sessionStorage |
| `persistUserAgent` | boolean | `true` | Whether to save/load userAgent |
| `domains` | string[] | `undefined` | Domains to include when saving cookies |
| `extractAfterNavigation`* | boolean | `true` | Save session after each navigation |
| `applyBeforeNavigation`* | boolean | `true` | Apply session before each navigation |
*Only available in SessionPlugin
## Event System
puppeteer-extends v1.7.0 includes a powerful event system that allows you to react to various lifecycle events:
```typescript
import { PuppeteerExtends, Events, PuppeteerEvents } from 'puppeteer-extends';
// Listen for browser creation
Events.on(PuppeteerEvents.BROWSER_CREATED, (params) => {
console.log(`Browser created with ID: ${params.instanceId}`);
});
// Listen for navigation events
Events.on(PuppeteerEvents.NAVIGATION_STARTED, (params) => {
console.log(`Navigation started to: ${params.url}`);
});
Events.on(PuppeteerEvents.NAVIGATION_SUCCEEDED, (params) => {
console.log(`Successfully navigated to: ${params.url}`);
});
// Listen for errors
Events.on(PuppeteerEvents.ERROR, (params) => {
console.error(`Error in ${params.source}: ${params.error.message}`);
});
// Use one-time listeners
Events.once(PuppeteerEvents.PAGE_CREATED, (params) => {
console.log('First page created!');
});
// Normal usage of puppeteer-extends will trigger these events
const browser = await PuppeteerExtends.getBrowser();
const page = await browser.newPage();
await PuppeteerExtends.goto(page, 'https://example.com');
```
### Supported Events
| Event Type | Description |
|------------|-------------|
| `BROWSER_CREATED` | Browser instance created |
| `BROWSER_CLOSED` | Browser instance closed |
| `PAGE_CREATED` | New page created |
| `PAGE_CLOSED` | Page closed |
| `NAVIGATION_STARTED` | Navigation to URL started |
| `NAVIGATION_SUCCEEDED` | Navigation completed successfully |
| `NAVIGATION_FAILED` | Navigation failed after retries |
| `ERROR` | Generic error occurred |
| `NAVIGATION_ERROR` | Error during navigation |
| `BROWSER_ERROR` | Error in browser operations |
| `PAGE_ERROR` | Error in page operations |
| `SESSION_APPLIED` | Session data applied to page |
| `SESSION_EXTRACTED` | Session data extracted from page |
| `SESSION_CLEARED` | Session data cleared |
| `PLUGIN_REGISTERED` | Plugin registered |
| `PLUGIN_UNREGISTERED` | Plugin unregistered |
### Asynchronous Event Handling
You can use asynchronous event handlers and wait for them:
```typescript
// Register async event handler
Events.on(PuppeteerEvents.NAVIGATION_SUCCEEDED, async (params) => {
// Do something asynchronous
await saveToDatabase(params.url);
});
// Emit event and wait for all handlers to complete
await Events.emitAsync(PuppeteerEvents.CUSTOM_EVENT, { data: 'value' });
```
## Captcha Handling
puppeteer-extends v1.7.0 introduces sophisticated captcha detection and solving capabilities:
```typescript
import { PuppeteerExtends, CaptchaService } from 'puppeteer-extends';
import { CaptchaPlugin } from 'puppeteer-extends/plugins';
// Register the captcha plugin with automatic handling
await PuppeteerExtends.registerPlugin(new CaptchaPlugin({
service: CaptchaService.TWOCAPTCHA, // or CaptchaService.ANTICAPTCHA
apiKey: 'your-api-key',
autoDetect: true, // Auto-detect captchas on page load
autoSolve: true // Auto-solve detected captchas
}));
// Get browser and create page
const browser = await PuppeteerExtends.getBrowser();
const page = await browser.newPage();
// Navigate to a page - captchas will be automatically detected and solved
await PuppeteerExtends.goto(page, 'https://example.com/with-captcha');
// You can also handle captchas manually
const plugin = PuppeteerExtends.getPlugin('captcha-plugin');
const captchaHelper = plugin.getCaptchaHelper();
// Detect captchas
const captchas = await captchaHelper.detectCaptchas(page);
// Solve a specific captcha
if (captchas.recaptchaV2.length > 0) {
const solution = await captchaHelper.solveRecaptchaV2(page, captchas.recaptchaV2[0]);
console.log(`Solved captcha: ${solution.token}`);
}
```
### Supported Captcha Types
| Type | Description |
|------|-------------|
| `RECAPTCHA_V2` | Standard and invisible reCAPTCHA v2 |
| `RECAPTCHA_V3` | Score-based reCAPTCHA v3 |
| `HCAPTCHA` | hCaptcha challenges |
| `IMAGE_CAPTCHA` | Traditional text/image captchas |
| `FUNCAPTCHA` | Arkose Labs FunCaptcha |
| `TURNSTILE` | Cloudflare Turnstile |
### Supported Services
| Service | Description |
|---------|-------------|
| `TWOCAPTCHA` | [2Captcha](https://2captcha.com/) service |
| `ANTICAPTCHA` | [Anti-Captcha](https://anti-captcha.com/) service |
### CaptchaPlugin Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `service` | CaptchaService | - | Captcha solving service to use |
| `apiKey` | string | - | API key for the service |
| `autoDetect` | boolean | `true` | Auto-detect captchas on page load |
| `autoSolve` | boolean | `true` | Auto-solve detected captchas |
| `timeout` | number | `120` | Timeout for solving (seconds) |
| `ignoreUrls` | string[] | `[]` | URLs to ignore when detecting captchas |
## API Reference
### PuppeteerExtends
| Method | Description |
|--------|-------------|
| `getBrowser(options?)` | Get or create browser instance |
| `closeBrowser(instanceId?)` | Close specific browser |
| `closeAllBrowsers()` | Close all browser instances |
| `goto(page, url, options?)` | Navigate to URL with enhanced features |
| `waitForNavigation(page, options?)` | Wait for navigation to complete |
| `waitForSelector(page, selector, timeout?)` | Wait for element to be visible |
| `closePage(page)` | Close page safely |
### BrowserOptions
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `isHeadless` | boolean | `true` | Launch browser in headless mode |
| `isDebug` | boolean | `false` | Enable debug logging |
| `customArguments` | string[] | `DEFAULT_BROWSER_ARGS` | Custom browser launch arguments |
| `userDataDir` | string | `./tmp/puppeteer-extends` | User data directory path |
| `instanceId` | string | `"default"` | Browser instance identifier |
### NavigationOptions
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `waitUntil` | string[] | `["load", "networkidle0"]` | Navigation completion events |
| `timeout` | number | `30000` | Navigation timeout (ms) |
| `maxRetries` | number | `1` | Maximum retry attempts |
| `retryDelay` | number | `5000` | Delay between retries (ms) |
| `isDebug` | boolean | `false` | Enable debug logging |
| `headers` | object | `{}` | Custom request headers |
## Testing
```bash
# Run all tests
pnpm run test
# Run with coverage
pnpm run test:coverage
# Run in watch mode
pnpm run test:watch
```
## License
[Apache-2.0](https://opensource.org/licenses/Apache-2.0)