# postal-code-scraper
A tool for scraping country data, including regions and their postal codes
**Postal Code Scraper** is an automated web scraper designed to extract postal code data from countries worldwide. It efficiently fetches postal codes and organizes them into structured JSON files for easy use in applications.
This library uses **Puppeteer** for web scraping, **Cheerio** for HTML parsing, and **p-limit** for controlling concurrency.

## Features
- Scrape **postal codes** from any country
- Scrape **all countries** in one go
- Save results as **JSON** files for easy integration
- Configurable settings (concurrency, retries, headless mode, etc.); see the configuration options below
- Structured **postal code lookup** generation
- **Fully asynchronous** for optimized performance
## Installation
Install via npm:
```sh
npm install postal-code-scraper
```
Or with Yarn:
```sh
yarn add postal-code-scraper
```
## Usage Guide
### 1️⃣ **Import the Library**
#### ES Module (Recommended):
```javascript
import { PostalCodeScraper } from "postal-code-scraper";
```
#### CommonJS:
```javascript
const { PostalCodeScraper } = require("postal-code-scraper");
```
### 2️⃣ **Scrape a Single Country**
```javascript
async function scrapeSingleCountry() {
  await PostalCodeScraper.scrapeCountry("Canada");
}

scrapeSingleCountry();
```
**Output files (saved in the configured `directory`, `src/data` by default):**
- `Canada-postal-codes.json`
- `Canada-lookup.json`
### 3️⃣ **Scrape All Countries**
```javascript
async function scrapeAllCountries() {
  await PostalCodeScraper.scrapeCountries();
}

scrapeAllCountries();
```
This will fetch postal codes for **every available country**.
### 4️⃣ **Custom Configuration**
```javascript
const customScraper = new PostalCodeScraper({
  concurrency: 10, // Limit concurrent requests
  maxRetries: 3, // Max retries per failed request, so data is not lost
  headless: false, // Run Puppeteer in visible (non-headless) mode
  usePrettyName: true, // Store data using country pretty names
  logger: console, // Log to the console (the default is a built-in logger)
  directory: "src/data", // Folder where the data is saved
});

async function run() {
  await customScraper.scrapeCountry("Germany");
}

run();
```

## Output Format

### Postal Codes File (`*-postal-codes.json`)
```json
{
  "cluj": {
    "agarbiciu": [
      "407146"
    ],
    "aghiresu": [
      "407005"
    ],
    "cluj-napoca": [
      "400001",
      "400002",
      "400003",
      "..."
    ]
  }
}
```
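
The nested structure above (region, then city, then a list of postal codes) is easy to walk with plain object iteration. The following is a minimal sketch, not part of the package: `flattenPostalCodes` is a hypothetical helper name, and the sample object is a shortened copy of the example above. It turns the nested object into one row per postal code, which can be handy for database imports or searches.

```javascript
// Sample shaped like the postal codes file above (region -> city -> codes).
const postalCodes = {
  cluj: {
    agarbiciu: ["407146"],
    aghiresu: ["407005"],
    "cluj-napoca": ["400001", "400002", "400003"],
  },
};

// Hypothetical helper (not provided by the package):
// flatten the nested object into one { region, city, code } row per postal code.
function flattenPostalCodes(data) {
  const rows = [];
  for (const [region, cities] of Object.entries(data)) {
    for (const [city, codes] of Object.entries(cities)) {
      for (const code of codes) {
        rows.push({ region, city, code });
      }
    }
  }
  return rows;
}

const rows = flattenPostalCodes(postalCodes);
console.log(rows.length); // 5
console.log(rows[0]); // { region: 'cluj', city: 'agarbiciu', code: '407146' }
```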

### Lookup File (`*-lookup.json`)
```json
{
  "postalCodeMap": {
    "337563": "tamasesti_2",
    "337564": "valea_4",
    "400001": "cluj-napoca_1",
    "400002": "cluj-napoca_1",
    "400003": "cluj-napoca_1"
  },
  "regions": {
    "cluj-napoca_1": [
      "cluj",
      "cluj-napoca"
    ],
    "tamasesti_2": [
      "hunedoara",
      "tamasesti"
    ],
    "valea_4": [
      "hunedoara",
      "valea"
    ]
  }
}
```
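
Resolving a postal code with this structure appears to take two steps: `postalCodeMap` gives a region key, and `regions` maps that key to a path of names. The sketch below assumes that reading of the example, where each path looks like `[region, city]`. `findRegionPath` is a hypothetical helper, not an API of the package, and the sample data is copied from the example above.

```javascript
// Sample shaped like the lookup file above.
const lookup = {
  postalCodeMap: {
    "337563": "tamasesti_2",
    "400001": "cluj-napoca_1",
    "400002": "cluj-napoca_1",
  },
  regions: {
    "cluj-napoca_1": ["cluj", "cluj-napoca"],
    "tamasesti_2": ["hunedoara", "tamasesti"],
  },
};

// Own-property check, so inherited keys such as "constructor" never match.
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Hypothetical helper (not provided by the package):
// resolve a postal code to its region path, or null if the code is unknown.
function findRegionPath(lookupData, postalCode) {
  if (!hasOwn(lookupData.postalCodeMap, postalCode)) return null;
  const regionKey = lookupData.postalCodeMap[postalCode];
  return hasOwn(lookupData.regions, regionKey) ? lookupData.regions[regionKey] : null;
}

console.log(findRegionPath(lookup, "400001")); // [ 'cluj', 'cluj-napoca' ]
console.log(findRegionPath(lookup, "000000")); // null
```

Postal codes are kept as strings here, matching the JSON keys, so codes with leading zeros would not be altered.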

## Configuration Options
| Option | Type | Default | Description |
| --------------- | ----------------------------- | -------------------------------- | ---------------------------------------------------------------------------------------------- |
| `directory` | `string` | `src/data` | The directory to save data |
| `concurrency` | `number` | `15` | Maximum concurrent requests to process |
| `maxRetries` | `number` | `5` | Number of retries for failed requests |
| `headless` | `boolean` | `true` | Run Puppeteer in headless mode |
| `usePrettyName` | `boolean` | `false` | Use country pretty names instead of default names |
| `logger` | `object` or `null` | `Logger` (custom implementation) | Handles event logging; set to `null` to disable logging |
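
To make the `concurrency` and `maxRetries` options more concrete, here is a minimal sketch of the general technique in plain JavaScript. It is not the package's implementation (which, per the description above, relies on p-limit). `withRetries` and `runLimited` are hypothetical helpers, and the sketch assumes `maxRetries` counts the extra attempts made after the first failure.

```javascript
// Hypothetical helper: run an async task, retrying up to maxRetries more times on failure.
async function withRetries(task, maxRetries) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
  }
  throw lastError; // every attempt failed
}

// Hypothetical helper: run task functions with at most `concurrency` in flight at once.
async function runLimited(tasks, concurrency) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker repeatedly claims the next unclaimed task until none are left.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.min(concurrency, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // same order as the input tasks
}

// Example: three fake "requests", at most two at a time, each retried up to 3 times.
const tasks = [1, 2, 3].map((id) => () => withRetries(async () => `page ${id}`, 3));
runLimited(tasks, 2).then((pages) => console.log(pages));
```

A higher `concurrency` finishes sooner but puts more load on the scraped site and on the local browser instance; a higher `maxRetries` trades run time for fewer missing entries.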

## FAQ

### Where are the scraped files saved?
By default, the JSON files are saved in:
```
src/data/
```
Each country has two JSON files: one with raw postal codes and another with a structured lookup.

### Can I scrape multiple countries at once?
Yes, using `scrapeCountries()`, which scrapes **all countries** automatically.

### Can I change the output directory?
Yes, by changing the `directory` option in the configuration.

### Does it support TypeScript?
Yes! The package includes TypeScript types for a better development experience.

### Can I disable logging?
Yes, by setting the `logger` option in the configuration to `null`.

## Roadmap
- Support for exporting data as CSV

## Contributing
Contributions are welcome! Feel free to submit a pull request or open an issue.

## License
MIT License © 2024