puppeteer-infinite-scroller
Version:
Provides a simple and efficient solution for scraping data loaded through infinite scrolling on web pages using Puppeteer.
106 lines (80 loc) • 3.35 kB
Markdown
# Puppeteer Infinite Scroller
[](https://www.npmjs.com/package/puppeteer-infinite-scroller)
[](https://github.com/DulanHewage/puppeteer-infinite-scroller/issues)
Puppeteer-Infinite-Scroller provides a simple and efficient solution for scraping data loaded through infinite scrolling on web pages using Puppeteer.
## Installation
You can install the package using npm:
```bash
npm install puppeteer-infinite-scroller
```
## Usage
Import the puppeteerInfiniteScroller function from the package and use it to scrape data from infinite scrolling web pages.
```javascript
const puppeteer = require("puppeteer");
const puppeteerInfiniteScroller = require('puppeteer-infinite-scroller');
(async () => {
const pageUrl = "https://infiniteajaxscroll.com/examples/blocks/";
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setViewport({
width: 1200,
height: 800,
});
await page.goto(pageUrl);
await page.waitForSelector(".blocks .blocks__block");
const options = {
scrollDelay: 1000, // Milliseconds between scrolls
itemCount: 50, // Number of items to scrape
selector: '.blocks .blocks__block', // CSS selector for items
// OR
// pageFunction: () => { /* Custom page function for scraping */ }
};
const scrapedData = await puppeteerInfiniteScroller(page, options);
console.log(scrapedData);
await browser.close();
})();
```
## Options
The following options can be configured when using the `puppeteerInfiniteScroller` function:
- `scrollDelay` (optional): Milliseconds between scrolls. Default is 1000ms.
- `itemCount` (optional): Number of items to scrape. Default is 10.
- `selector` (optional): CSS selector for the items. Either this or `pageFunction` must be provided.
- `pageFunction` (optional): Custom function for scraping data from the page. Either this or `selector` must be provided.
## Example with pageFunction option
```javascript
const puppeteer = require("puppeteer");
const puppeteerInfiniteScroller = require('puppeteer-infinite-scroller');
(async () => {
const pageUrl = "https://infiniteajaxscroll.com/examples/blocks/";
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setViewport({
width: 1200,
height: 800,
});
await page.goto(pageUrl);
await page.waitForSelector(".blocks .blocks__block");
function extractElements() {
const items = [];
const extractedElements = document.querySelectorAll(".blocks .blocks__block");
for (let element of extractedElements) {
items.push({
class: element.getAttribute("class"),
id: element.getAttribute("id"),
tagName: element.tagName,
});
}
return items;
}
const options = {
scrollDelay: 1000, // Milliseconds between scrolls
itemCount: 50, // Number of items to scrape
pageFunction: extractElements
};
const scrapedData = await puppeteerInfiniteScroller(page, options);
console.log(scrapedData);
await browser.close();
})();
```
License
This project is licensed under the MIT License - see the LICENSE file for details.