scrapegraph-js
Scrape and extract structured data from a webpage using ScrapeGraphAI's APIs. Supports cookies for authentication, infinite scrolling, and pagination.
Pagination lets a single `smartScraper` request cover multiple pages of content. This is particularly useful for e-commerce sites, search results, news feeds, and other paginated content.
## Usage
### Basic Pagination
```javascript
import { smartScraper } from 'scrapegraph-js';
const apiKey = process.env.SGAI_APIKEY;
const url = 'https://example.com/products';
const prompt = 'Extract all product information';
const totalPages = 5; // Scrape 5 pages
const result = await smartScraper(apiKey, url, prompt, null, null, totalPages);
```
### Pagination with Schema
```javascript
import { smartScraper } from 'scrapegraph-js';
import { z } from 'zod';
const ProductSchema = z.object({
  products: z.array(z.object({
    name: z.string(),
    price: z.string(),
    rating: z.string().optional(),
  })),
});
const result = await smartScraper(
  apiKey,
  url,
  prompt,
  ProductSchema,
  null, // numberOfScrolls (not used)
  3     // 3 pages
);
```
### Pagination with Scrolling
```javascript
const result = await smartScraper(
  apiKey,
  url,
  prompt,
  null, // no schema
  10,   // 10 scrolls per page
  2     // 2 pages
);
```
### Pagination with Schema and Scrolling
```javascript
const result = await smartScraper(
  apiKey,
  url,
  prompt,
  ProductSchema,
  5, // numberOfScrolls
  3  // totalPages
);
```
## API Reference
### Function Signature
```javascript
smartScraper(apiKey, url, prompt, schema, numberOfScrolls, totalPages)
```
### Parameters
- `apiKey` (string): Your ScrapeGraph AI API key
- `url` (string): The URL of the webpage to scrape
- `prompt` (string): Natural language prompt describing what data to extract
- `schema` (Object, optional): Zod schema object defining the output structure
- `numberOfScrolls` (number, optional): Number of times to scroll the page (0-100)
- `totalPages` (number, optional): Number of pages to scrape (1-10)
### Parameter Validation
- `totalPages` must be an integer between 1 and 10
- `numberOfScrolls` must be an integer between 0 and 100
- Both parameters are optional and default to `null`
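
As a rough sketch, the documented ranges can also be checked on the client before a request is sent. The helper name `validatePaginationArgs` and its error messages are assumptions made for illustration; they are not part of the SDK, which performs its own validation.

```javascript
// Hypothetical helper (not part of scrapegraph-js): mirrors the documented ranges.
function validatePaginationArgs(numberOfScrolls = null, totalPages = null) {
  // null (or undefined) means "parameter not supplied", which the docs allow.
  if (numberOfScrolls != null) {
    if (!Number.isInteger(numberOfScrolls) || numberOfScrolls < 0 || numberOfScrolls > 100) {
      throw new RangeError('numberOfScrolls must be an integer between 0 and 100');
    }
  }
  if (totalPages != null) {
    if (!Number.isInteger(totalPages) || totalPages < 1 || totalPages > 10) {
      throw new RangeError('totalPages must be an integer between 1 and 10');
    }
  }
  return { numberOfScrolls, totalPages };
}
```

Calling `validatePaginationArgs(5, 3)` returns the values unchanged, while `validatePaginationArgs(null, 11)` throws a `RangeError`.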
## Examples
### E-commerce Product Scraping
```javascript
import { smartScraper } from 'scrapegraph-js';
import { z } from 'zod';
const ProductSchema = z.object({
  products: z.array(z.object({
    name: z.string(),
    price: z.string(),
    rating: z.string().optional(),
    image_url: z.string().optional(),
  })),
});
const result = await smartScraper(
  process.env.SGAI_APIKEY,
  'https://www.amazon.com/s?k=laptops',
  'Extract all laptop products with name, price, rating, and image',
  ProductSchema,
  null, // numberOfScrolls (not used)
  5     // Scrape 5 pages of results
);
```
### News Articles with Scrolling
```javascript
import { smartScraper } from 'scrapegraph-js';
import { z } from 'zod';
const NewsSchema = z.object({
  articles: z.array(z.object({
    title: z.string(),
    summary: z.string(),
    author: z.string().optional(),
    date: z.string().optional(),
  })),
});
const result = await smartScraper(
  process.env.SGAI_APIKEY,
  'https://news.example.com',
  'Extract all news articles with title, summary, author, and date',
  NewsSchema,
  3, // Scroll 3 times per page
  4  // Scrape 4 pages
);
```
## Error Handling
The function throws an error if:
- `totalPages` is not an integer between 1 and 10
- `numberOfScrolls` is not an integer between 0 and 100
- The API key is invalid
- The network request fails
```javascript
try {
  const result = await smartScraper(apiKey, url, prompt, null, null, totalPages);
  console.log('Success:', result);
} catch (error) {
  if (error.message.includes('totalPages')) {
    console.error('Pagination error:', error.message);
  } else {
    console.error('Other error:', error.message);
  }
}
```
## Backward Compatibility
The pagination feature is fully backward compatible. All existing function calls continue to work:
```javascript
// These all work as before
await smartScraper(apiKey, url, prompt);
await smartScraper(apiKey, url, prompt, schema);
await smartScraper(apiKey, url, prompt, schema, numberOfScrolls);
```
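
Because the optional arguments are positional, a call that sets only `totalPages` has to pass `null` for `schema` and `numberOfScrolls`. One way to avoid counting `null`s is a thin wrapper that takes named options. This is only a sketch: `toSmartScraperArgs` is a hypothetical helper, not an SDK function, and it assumes the six-argument order `apiKey, url, prompt, schema, numberOfScrolls, totalPages` documented for `smartScraper`.

```javascript
// Hypothetical helper (not part of scrapegraph-js): builds the positional
// argument list from named options, filling omitted optional values with null.
function toSmartScraperArgs({
  apiKey,
  url,
  prompt,
  schema = null,
  numberOfScrolls = null,
  totalPages = null,
}) {
  return [apiKey, url, prompt, schema, numberOfScrolls, totalPages];
}

// Usage (assumes smartScraper has been imported from 'scrapegraph-js'):
// const result = await smartScraper(
//   ...toSmartScraperArgs({ apiKey, url, prompt, totalPages: 3 })
// );
```

The spread keeps the call compatible with the positional signature, so the wrapper can be dropped without changing the request.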
## Performance Considerations
- Pagination requests may take significantly longer than single-page requests
- Consider using smaller `totalPages` values for testing
- Some websites may not support pagination
- Rate limiting may apply for large pagination requests
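
One simple way to lower request frequency when several paginated scrapes are queued is to run them one at a time with a pause in between. The sketch below is illustrative only: `runSequentially` and `sleep` are hypothetical helpers, and the 1000 ms default is a placeholder rather than a documented rate limit.

```javascript
// Hypothetical helpers (not part of scrapegraph-js).
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `jobs` is an array of zero-argument async functions, for example
// () => smartScraper(apiKey, url, prompt, schema, numberOfScrolls, totalPages)
async function runSequentially(jobs, delayMs = 1000) {
  const results = [];
  for (let i = 0; i < jobs.length; i++) {
    if (i > 0) await sleep(delayMs); // no pause before the first job
    results.push(await jobs[i]());   // wait for each job before starting the next
  }
  return results;
}
```

Results come back in the same order as `jobs`, and an error in any job stops the run and propagates to the caller.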
## Testing
Run the pagination tests:
```bash
npm test
```
Or run specific examples:
```bash
node examples/smartScraper_pagination_example.js
node examples/smartScraper_pagination_enhanced_example.js
node examples/smartScraper_pagination_with_scroll_example.js
```
## Best Practices
1. **Start Small**: Begin with 1-2 pages for testing
2. **Use Schemas**: Define clear schemas for structured data extraction
3. **Error Handling**: Always wrap calls in try-catch blocks
4. **Rate Limiting**: Be mindful of API rate limits with large pagination requests
5. **Website Compatibility**: Not all websites support pagination; test thoroughly
6. **Performance**: Monitor request times and adjust parameters accordingly
## Troubleshooting
1. **Validation Error**: Ensure `totalPages` is an integer between 1 and 10 and `numberOfScrolls` is an integer between 0 and 100
2. **Timeout**: Try reducing `totalPages` or `numberOfScrolls`
3. **No Results**: Some websites may not support pagination
4. **Rate Limiting**: Reduce request frequency or pagination size
### Debug Information
```javascript
console.log('Starting pagination request...');
console.log('URL:', url);
console.log('Total Pages:', totalPages);
console.log('Number of Scrolls:', numberOfScrolls);
const startTime = Date.now();
const result = await smartScraper(apiKey, url, prompt, schema, numberOfScrolls, totalPages);
const duration = Date.now() - startTime;
console.log('Request completed in:', duration, 'ms');
console.log('Result type:', typeof result);
```
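
The timeout advice (reduce `totalPages` or `numberOfScrolls`) can be automated. The sketch below retries with half as many pages each time a request fails. `scrapeWithShrinkingPages` is a hypothetical helper, and it treats every thrown error as retryable, which is an assumption; in practice you would likely retry only on timeouts. The scraper is passed in as `scrape` so the helper does not depend on the SDK directly.

```javascript
// Hypothetical helper (not part of scrapegraph-js).
// `scrape` is any async function that takes a page count, for example
// (pages) => smartScraper(apiKey, url, prompt, schema, numberOfScrolls, pages)
async function scrapeWithShrinkingPages(scrape, totalPages) {
  if (!Number.isInteger(totalPages) || totalPages < 1) {
    throw new RangeError('totalPages must be a positive integer');
  }
  let pages = totalPages;
  for (;;) {
    try {
      const result = await scrape(pages);
      return { result, pagesUsed: pages };
    } catch (error) {
      if (pages === 1) throw error;  // nothing smaller left to try
      pages = Math.floor(pages / 2); // halve the page count and retry
    }
  }
}
```

The returned `pagesUsed` value shows how many pages the successful request actually covered, which is useful when tuning `totalPages` for a given site.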
## Support
For issues or questions about pagination functionality:
1. Check the examples in the `examples/` directory
2. Run the test suite with `npm test`
3. Review the error messages for specific guidance
4. Check the main SDK documentation
---
*This pagination feature is designed to work with the existing ScrapeGraph AI API and maintains full backward compatibility with existing code.*
This document describes the pagination functionality added to the ScrapeGraph JavaScript SDK.
The `smartScraper` function now supports pagination, allowing you to scrape multiple pages of content in a single request.