# Tricoteuses-Assemblee

## _Retrieve, clean up & handle French Assemblée nationale's open data_

## Requirements

- Node >= 18

## Installation

```bash
git clone https://git.en-root.org/tricoteuses/tricoteuses-assemblee
cd tricoteuses-assemblee/
```

```bash
npm install
```

## Download and clean data

### Basic usage

Create a folder where the data will be downloaded and run the following command to download, reorganize and clean the data.

```bash
mkdir ../assemblee-data/

# Download and clean open data
npm run data:download ../assemblee-data
```

Data from other sources is also available:

```bash
# Retrieval of députés' pictures from Assemblée nationale's website
npm run data:retrieve_deputes_photos ../assemblee-data

# Retrieval of sénateurs' pictures from Assemblée nationale's website
npm run data:retrieve_senateurs_photos ../assemblee-data

# Retrieval of pending amendments from Assemblée nationale's website (waiting to be processed by Assemblée services)
npm run data:retrieve_pending_amendements ../assemblee-data
```

_Notes_:

- Reorganized files (generated by the *data:reorganize_data* command) are also available in [Tricoteuses / Data / Données brutes de l'Assemblée](https://git.en-root.org/tricoteuses/data/assemblee-brut). They are updated on a regular basis.
- Split & cleaned files (generated by the *data:clean_data* command) are also available in [Tricoteuses / Data / Données nettoyées de l'Assemblée](https://git.en-root.org/tricoteuses/data/assemblee-nettoye) with the `_nettoye` suffix. They are updated on a regular basis.

### Filtering options

Downloading and cleaning all the data takes a long time and a lot of disk space. To reduce the load, you can choose which types of data to retrieve.
To download only one type of dataset, use the *--categories* option (shortcut *-k*):

```bash
# Available options: ActeursEtOrganes, Agendas, Amendements, DossiersLegislatifs, Photos, Scrutins, Questions, ComptesRendusSeances
npm run data:download ../assemblee-data -- --categories Amendements
```

To download only a specific legislature, use the *--legislature* option (shortcut *-l*):

```bash
# Available options: 14, 15, 16, 17
npm run data:download ../assemblee-data -- --legislature 17
```

If you use these options, pass the same ones to all subsequent commands too (*data:reorganize_data* and *data:clean_data*).

## Download using Docker

A Docker image that downloads and cleans the data all at once is available. Build it locally or pull it from the container registry:

```bash
docker pull registry.en-root.org/tricoteuses/tricoteuses-assemblee:latest
```

Create a volume to hold the downloaded data and set the environment variables `LEGISLATURE` and `CATEGORIES` if needed:

```bash
docker volume create assemblee-data
docker run --name tricoteuses-assemblee -v assemblee-data:/app/assemblee -e LEGISLATURE=17 -d registry.en-root.org/tricoteuses/tricoteuses-assemblee:latest
```

## Using the data

Once the data is downloaded and cleaned, you can use loaders to retrieve it. To use loaders in your project, install the *@tricoteuses/assemblee* package and import the iterator functions that you need.
```bash
npm install @tricoteuses/assemblee
```

```js
import {
  iterLoadAssembleeActeurs,
  iterLoadAssembleeOrganes,
  iterLoadAssembleeReunions,
  iterLoadAssembleeScrutins,
  iterLoadAssembleeDocuments,
  iterLoadAssembleeDossiersParlementaires,
  iterLoadAssembleeAmendements,
  iterLoadAssembleeQuestions,
  iterLoadAssembleeComptesRendus,
} from "@tricoteuses/assemblee/lib/loaders";

// Pass data directory and legislature as arguments
for (const { acteur } of iterLoadAssembleeActeurs("../assemblee-data", 17)) {
  console.log(acteur.uid)
}
```

## Generating schemas and documentation (for contributors only)

View instructions [here](https://git.en-root.org/tricoteuses/tricoteuses-assemblee/-/blob/master/src/types)
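## Sketch: reading only a few items from a loader

The loader example in "Using the data" walks through every acteur. As a minimal sketch of consuming such an iterator, the hypothetical helper `takeUids` below (not part of the package) stops after a given number of items. It assumes only what that example shows: each yielded item has an `acteur` object with a `uid`. The `fakeActeurs` generator and its uids are made-up stand-ins so the snippet runs without any downloaded data.

```js
// Hypothetical helper (not part of @tricoteuses/assemblee): collect at most
// `limit` uids from any iterable that yields `{ acteur }` items.
function takeUids(items, limit) {
  const uids = []
  for (const { acteur } of items) {
    if (uids.length >= limit) break // stop early instead of reading everything
    uids.push(acteur.uid)
  }
  return uids
}

// Stand-in generator with made-up uids so the sketch runs without any data.
function* fakeActeurs() {
  yield { acteur: { uid: "PA1" } }
  yield { acteur: { uid: "PA2" } }
  yield { acteur: { uid: "PA3" } }
}

// Logs the first two uids of the stand-in data.
console.log(takeUids(fakeActeurs(), 2))
```

With real data, the stand-in could be swapped for the loader call from the example, for instance `takeUids(iterLoadAssembleeActeurs("../assemblee-data", 17), 10)`.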