bedetheque-scraper
[![NPM Version][npm-image]][npm-url]
[![NPM Downloads][downloads-image]][downloads-url]
[![Dependency Status][david-image]][david-url]
[![devDependency Status][david-dev-image]][david-dev-url]
Node.js script that scrapes the entire database of [bdgest.com](https://www.bdgest.com/) / [bedetheque.com](https://www.bedetheque.com/) (approx. 40,000+ series and 260,000+ albums).
<img src="https://www.bdgest.com/skin/logo_bdgest_250.png">
It fetches a list of free proxies with a low timeout, then proceeds to scrape all comic series from bedetheque.com, letter by letter.
The database is written to the file database/series.json.
Each series is retried up to 5 times until it is scraped successfully. After a run completes, you can rerun the script to pick up any series that were not scraped.
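The retry behaviour described above can be pictured with a small sketch. This is not the package's actual implementation: `withRetry` and its parameters are hypothetical names chosen for illustration, and only the limit of 5 attempts comes from this README.

```typescript
// Hypothetical helper (not part of bedetheque-scraper's documented API):
// run an async task up to `maxAttempts` times and return its first
// successful result, or null if every attempt fails.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts: number = 5,
): Promise<T | null> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      // Ignore the failure and try again (for example through another proxy).
    }
  }
  // All attempts failed: the caller can leave this series for a later rerun.
  return null;
}
```

A series for which this returns `null` would simply be left out of the database, which matches the advice to rerun the script afterwards.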
```bash
npm install bedetheque-scraper --save
```
```typescript
// using CommonJS
const { Scraper } = require('bedetheque-scraper');
// or using ES modules / TypeScript (use one style or the other, not both)
// import { Scraper } from 'bedetheque-scraper';
const scraper = new Scraper();
```
```json
{
  "10739": {
    "serieId": 10739,
    "serieTitle": "Le roi des mouches",
    "albums": {
      "42297": {
        "serieId": 10739,
        "albumId": 42297,
        "albumTitle": "Hallorave",
        "imageCover": "Couv_42297.jpg",
        "imageExtract": "roidesmouches01p.jpg",
        "imageReverse": "Verso_42297.jpg",
        "voteAverage": 4.4,
        "voteCount": 65,
        "scenario": "Pirus, Michel",
        "drawing": "Mezzo",
        "colors": "Ruby",
        "date": "01/2005",
        "editor": "Albin Michel",
        "nbrOfPages": 62
      },
      ...
    }
  },
  "3": {
    "serieId": 3,
    "serieTitle": "De Cape et de Crocs",
    "albums": { ... }
  },
  ...
}
```
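To work with the generated file from TypeScript, the structure above can be described with a few types. The sketch below is an assumption based only on the sample shown here: the names `Album`, `Serie`, `Database` and `topRatedAlbums` are illustrative, and fields other than the three image fields may also be `null` in real data.

```typescript
// Types inferred from the sample JSON above (assumed, not exported by the package).
interface Album {
  serieId: number;
  albumId: number;
  albumTitle: string;
  imageCover: string | null;
  imageExtract: string | null;
  imageReverse: string | null;
  voteAverage: number;
  voteCount: number;
  scenario: string;
  drawing: string;
  colors: string;
  date: string; // "MM/YYYY" in the sample
  editor: string;
  nbrOfPages: number;
}

interface Serie {
  serieId: number;
  serieTitle: string;
  albums: Record<string, Album>; // keyed by albumId
}

type Database = Record<string, Serie>; // keyed by serieId

// Hypothetical helper: collect every album with at least `minVotes` votes,
// sorted from best to worst average rating.
function topRatedAlbums(db: Database, minVotes: number = 10): Album[] {
  const albums: Album[] = [];
  for (const serieId of Object.keys(db)) {
    const serie = db[serieId];
    for (const albumId of Object.keys(serie.albums)) {
      const album = serie.albums[albumId];
      if (album.voteCount >= minVotes) albums.push(album);
    }
  }
  return albums.sort((a, b) => b.voteAverage - a.voteAverage);
}
```

In a Node.js script, the `db` argument could be obtained with `JSON.parse(fs.readFileSync('database/series.json', 'utf8'))`.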
```typescript
// imageCoverLarge: https://www.bedetheque.com/media/Couvertures/${imageCover}
// imageCoverSmall: https://www.bedetheque.com/cache/thb_couv/${imageCover}
public imageCover: string | null;
// imageExtractLarge: https://www.bedetheque.com/media/Planches/${imageExtract}
// imageExtractSmall: https://www.bedetheque.com/cache/thb_planches/${imageExtract}
public imageExtract: string | null;
// imageReverseLarge: https://www.bedetheque.com/media/Versos/${imageReverse}
// imageReverseSmall: https://www.bedetheque.com/cache/thb_versos/${imageReverse}
public imageReverse: string | null;
```
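The URL patterns in the comments above can be wrapped in a small helper. This is a minimal sketch: `imageUrl`, `ImageKind` and `IMAGE_PATHS` are hypothetical names, not part of the package, and only the URL paths themselves are taken from this README.

```typescript
const BASE_URL = "https://www.bedetheque.com";

type ImageKind = "cover" | "extract" | "reverse";
type ImageSize = "large" | "small";

// Path segments copied from the comments above.
const IMAGE_PATHS: Record<ImageKind, Record<ImageSize, string>> = {
  cover: { large: "media/Couvertures", small: "cache/thb_couv" },
  extract: { large: "media/Planches", small: "cache/thb_planches" },
  reverse: { large: "media/Versos", small: "cache/thb_versos" },
};

// Build the full URL for a stored file name, or return null when the
// album has no image of that kind.
function imageUrl(
  kind: ImageKind,
  fileName: string | null,
  size: ImageSize = "large",
): string | null {
  if (fileName === null) return null;
  return `${BASE_URL}/${IMAGE_PATHS[kind][size]}/${fileName}`;
}
```

For example, `imageUrl("cover", album.imageCover, "small")` would give the thumbnail URL for an album's cover.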
- [ ] scrape album number
- [ ] scrape series description
- [ ] scrape series recommendations
- [ ] scrape series popularity
- [ ] use async fs read / write with a lock to avoid conflicts
[](LICENSE)
[npm-image]: https://img.shields.io/npm/v/bedetheque-scraper.svg
[npm-url]: https://npmjs.com/package/bedetheque-scraper
[david-dev-image]: https://david-dm.org/givka/bedetheque-scraper/dev-status.svg
[david-dev-url]: https://david-dm.org/givka/bedetheque-scraper?type=dev
[david-image]: https://david-dm.org/givka/bedetheque-scraper.svg
[david-url]: https://david-dm.org/givka/bedetheque-scraper
[downloads-image]: https://img.shields.io/npm/dm/bedetheque-scraper.svg
[downloads-url]: https://npmjs.org/package/bedetheque-scraper