easy_web_crawler

Version:

Web crawler wrapper around puppeteer module to simply the crawling on ajax/java script enabled pages.

github.com/vivekg13186/simple_web_crawler

54 lines (36 loc) • 1.18 kB

Markdown

# easy_web_crawler [![Gitter chat](https://badges.gitter.im/gitterHQ/gitter.png)](https://gitter.im/easy_web_crawler/Lobby) Web crawler around puppeteer to crawler ajax/java script enabled pages.Check out example folder for how to use # Features! - Support crawling of javascript/ajax pages - url filter - avoid duplicate urls - delay before page load - custom data extraction - build in spider - stop and resume the crawling - fast image download # Documentation [Read full documentation here](https://vivekg13186.github.io/easy_web_crawler/1.0.5/) ### USAGE ``` var Scraper = require("easy_web_crawler") async function main() { var scraper = new Scraper(); scraper.startWithURLs("start_url") scraper.allowIfMatches(function (url) { <<some true false logic here>> }) scraper.enableAutoCrawler(true) scraper.saveProgressInFile("hello.db") scraper.waitBetweenPageLoad(0) scraper.callbackOnPageLoad(async function (page) { <<logic here>> }); scraper.callbackOnFinish(function (result) { console.log(JSON.stringify(result,null,4)) }) await scraper.start() } main() ``` License ---- MIT