# Tricoteuses-Assemblee
## _Retrieve, clean up & handle French Assemblée nationale's open data_
## Requirements
- Node >= 18
## Installation
```bash
git clone https://git.en-root.org/tricoteuses/tricoteuses-assemblee
cd tricoteuses-assemblee/
```
```bash
npm install
```
## Download and clean data
### Basic usage
Create a folder where the data will be downloaded and run the following command to download, reorganize and clean the data.
```bash
mkdir ../assemblee-data/
# Download and clean open data
npm run data:download ../assemblee-data
```
Data from other sources is also available:
```bash
# Retrieval of députés' pictures from Assemblée nationale's website
npm run data:retrieve_deputes_photos ../assemblee-data
# Retrieval of sénateurs' pictures from the Sénat's website
npm run data:retrieve_senateurs_photos ../assemblee-data
# Retrieval of pending amendments from Assemblée nationale's website (waiting to be processed by Assemblée services)
npm run data:retrieve_pending_amendements ../assemblee-data
```
_Notes_:
- Reorganized files (generated by the *data:reorganize_data* command) are also available in [Tricoteuses / Data / Données brutes de l'Assemblée](https://git.en-root.org/tricoteuses/data/assemblee-brut). They are updated on a regular basis.
- Split & cleaned files (generated by the *data:clean_data* command) are also available in [Tricoteuses / Data / Données nettoyées de l'Assemblée](https://git.en-root.org/tricoteuses/data/assemblee-nettoye) with the `_nettoye` suffix. They are updated on a regular basis.
### Filtering options
Downloading and cleaning all the data takes a long time and a lot of disk space. To reduce the load, you can choose which types of data to retrieve.
To download only one type of dataset, use the *--categories* option (shortcut *-k*):
```bash
# Available options: ActeursEtOrganes, Agendas, Amendements, DossiersLegislatifs, Photos, Scrutins, Questions, ComptesRendusSeances
npm run data:download ../assemblee-data -- --categories Amendements
```
To download only a specific legislature, use the *--legislature* option (shortcut *-l*):
```bash
# Available options: 14, 15, 16, 17
npm run data:download ../assemblee-data -- --legislature 17
```
If you use these options, pass the same ones to all subsequent commands too (*data:reorganize_data* and *data:clean_data*).
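One way to keep the filters consistent across commands is to build every command line from a single filter object. The sketch below is only an illustration: `buildNpmArgs` is a hypothetical helper, not part of the package, and it assumes that both options can be combined in one call and that *data:reorganize_data* and *data:clean_data* take the data directory in the same position as *data:download*.

```javascript
// Hypothetical helper: builds the argument list for one `npm run` data script.
// Assumption: every data script accepts the data directory followed by the
// same `--categories` / `--legislature` options after the `--` separator.
function buildNpmArgs(script, dataDir, filters = {}) {
  const args = ["run", script, dataDir];
  const options = [];
  if (filters.categories) {
    options.push("--categories", filters.categories);
  }
  if (filters.legislature !== undefined) {
    options.push("--legislature", String(filters.legislature));
  }
  // npm only forwards options to the script when they follow `--`.
  return options.length > 0 ? [...args, "--", ...options] : args;
}

// One filter object reused for every step keeps the commands consistent.
const filters = { categories: "Amendements", legislature: 17 };
const scripts = ["data:download", "data:reorganize_data", "data:clean_data"];

const commands = scripts.map((script) =>
  ["npm", ...buildNpmArgs(script, "../assemblee-data", filters)].join(" ")
);
for (const command of commands) {
  console.log(command);
}
```

The script only prints the command lines; run whichever of them you need from the repository root.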
## Download using Docker
A Docker image that downloads and cleans the data all at once is available. Build it locally or pull it from the container registry:
```bash
docker pull registry.en-root.org/tricoteuses/tricoteuses-assemblee:latest
```
Create a volume to hold the downloaded data and set the environment variables `LEGISLATURE` and `CATEGORIES` if needed:
```bash
docker volume create assemblee-data
docker run --name tricoteuses-assemblee -v assemblee-data:/app/assemblee -e LEGISLATURE=17 -d registry.en-root.org/tricoteuses/tricoteuses-assemblee:latest
```
## Using the data
Once the data is downloaded and cleaned, you can use loaders to retrieve it.
To use loaders in your project, install the *@tricoteuses/assemblee* package and import the iterator functions that you need.
```bash
npm install @tricoteuses/assemblee
```
```js
import {
  iterLoadAssembleeActeurs,
  iterLoadAssembleeOrganes,
  iterLoadAssembleeReunions,
  iterLoadAssembleeScrutins,
  iterLoadAssembleeDocuments,
  iterLoadAssembleeDossiersParlementaires,
  iterLoadAssembleeAmendements,
  iterLoadAssembleeQuestions,
  iterLoadAssembleeComptesRendus,
} from "@tricoteuses/assemblee/lib/loaders";

// Pass data directory and legislature as arguments
for (const { acteur } of iterLoadAssembleeActeurs("../assemblee-data", 17)) {
  console.log(acteur.uid)
}
```
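Because the loaders are iterators, their results can be fed into ordinary helper functions. The following sketch indexes acteurs by UID; `indexActeursByUid` and `fakeActeurs` are hypothetical names introduced for illustration, and the only assumption about the loader is the `{ acteur }` shape with a `uid` field used in the loop above.

```javascript
// Hypothetical helper: indexes loader results by acteur UID.
// Works with any iterable that yields `{ acteur }` objects, which is the
// shape destructured in the loop above.
function indexActeursByUid(entries) {
  const byUid = new Map();
  for (const { acteur } of entries) {
    byUid.set(acteur.uid, acteur);
  }
  return byUid;
}

// Stand-in generator so the sketch runs without downloaded data.
// The UIDs below are placeholders, not real identifiers.
function* fakeActeurs() {
  yield { acteur: { uid: "PA0001" } };
  yield { acteur: { uid: "PA0002" } };
}

const acteursByUid = indexActeursByUid(fakeActeurs());
console.log(acteursByUid.size);
console.log(acteursByUid.has("PA0002"));

// With real data, pass the loader's iterator instead:
// indexActeursByUid(iterLoadAssembleeActeurs("../assemblee-data", 17))
```

Consuming the iterator in a single pass avoids building an intermediate array before the index is filled.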
## Generating schemas and documentation (for contributors only)
See the [instructions in `src/types`](https://git.en-root.org/tricoteuses/tricoteuses-assemblee/-/blob/master/src/types).