# Tricoteuses-Assemblee

## _Retrieve, clean up & handle French Assemblée nationale's open data_

_Tricoteuses-Assemblee_ is free and open source software.

- [software repository](https://git.tricoteuses.fr/logiciels/tricoteuses-assemblee)
- [GNU Affero General Public License version 3 or greater](https://git.tricoteuses.fr/logiciels/tricoteuses-assemblee/-/tree/master/LICENSE.md)

## Documentation

- [Architecture](doc/architecture.md)
- [Browser Usage](doc/BROWSER_USAGE.md) - Using this package in browser/Vite projects

## Installation

```bash
git clone https://git.tricoteuses.fr/logiciels/tricoteuses-assemblee
cd tricoteuses-assemblee/
```

```bash
npm install
```

## Download and clean data

### Basic usage

Create a directory to store the data, then run the following command to download, reorganize and clean the data.

```bash
mkdir ../assemblee-data/
npm run data:download ../assemblee-data
```

### Available Commands

- `npm run data:download <dir>`: Download, reorganize, and clean data.
- `npm run data:retrieve_open_data <dir>`: Download raw data files.
- `npm run data:reorganize_data <dir>`: Reorganize raw files by entity.
- `npm run data:clean_data <dir>`: Clean and validate reorganized files.
- `npm run data:retrieve_deputes_photos <dir>`: Retrieve députés' pictures from Assemblée nationale's website.
- `npm run data:retrieve_senateurs_photos <dir>`: Retrieve sénateurs' pictures from Assemblée nationale's website.
- `npm run data:retrieve_documents <dir>`: Retrieve legislative documents from Assemblée nationale's website.
- `npm run data:retrieve_pending_amendements <dir>`: Retrieve pending amendments from Assemblée nationale's website (amendments still waiting to be processed by Assemblée services).

_Notes_:

- Reorganized files (generated by the _data:reorganize_data_ command) are also available in [Tricoteuses / Data / Données brutes de l'Assemblée](https://git.en-root.org/tricoteuses/data/assemblee-brut). They are updated on a regular basis.
- Split & cleaned files (generated by the _data:clean_data_ command) are also available in [Tricoteuses / Data / Données nettoyées de l'Assemblée](https://git.en-root.org/tricoteuses/data/assemblee-nettoye) with the `_nettoye` suffix. They are updated on a regular basis.

### Filtering Options

Downloading and cleaning all the data takes a long time and a lot of disk space. To reduce the load, you can choose which types of data to retrieve.

Examples:

```bash
# Only download amendments
npm run data:download ../assemblee-data -- -k Amendements

# Only process 16th and 17th legislatures
npm run data:download ../assemblee-data -- -l 16 -l 17
```

### Common Options

- `--categories` or `-k <name>`: Filter by dataset categories (available options: `ActeursEtOrganes`, `Agendas`, `Amendements`, `DossiersLegislatifs`, `Photos`, `Scrutins`, `Questions`, `ComptesRendus`)
- `--legislature` or `-l <number>`: Specify one or more legislatures to process (e.g., `-l 15 -l 16`)
- `--dataDir <path>` (mandatory): Path to the working directory where all data is stored
- `--silent` or `-s`: Disable logging
- `--verbose` or `-v`: Enable verbose logging
- `--fetch` or `-f`: Force re-download of data even if already present
- `--commit` or `-c`: Automatically commit cleaned data
- `--pull` or `-p`: Pull repositories before starting
- `--clone` or `-C <url>`: Clone Git repositories from a remote group or organization
- `--remote` or `-r <name>`: Push commits to specified Git remote(s)
- `--keepDir`: Keep Dir (Implement before cleaning data)
- `--only-recent <days>`: If files are already present, skip files older than the specified number of days and skip old legislatures (e.g. `--only-recent 30`)

If you use such options, use them in all subsequent commands too (_data:reorganize_data_ and _data:clean_data_).
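Because the same filters have to be repeated for every step, it can help to build the command lines in one place. The sketch below is only an illustration: `buildPipelineCommands` is a hypothetical helper, not part of this package, and it only assembles `npm` argument lists from the commands and the `-k` / `-l` options listed above, without running anything.

```js
// Hypothetical helper (not part of @tricoteuses/assemblee): build the npm
// argument lists for the three pipeline steps so that every step receives
// exactly the same filter options.
function buildPipelineCommands(dataDir, { category, legislatures = [] } = {}) {
  const filters = []
  if (category) filters.push("-k", category)
  for (const legislature of legislatures) {
    filters.push("-l", String(legislature))
  }

  const steps = [
    "data:retrieve_open_data",
    "data:reorganize_data",
    "data:clean_data",
  ]
  return steps.map((step) => {
    const args = ["run", step, dataDir]
    // npm forwards everything after `--` to the underlying script.
    if (filters.length > 0) args.push("--", ...filters)
    return args
  })
}

const commands = buildPipelineCommands("../assemblee-data", {
  category: "Amendements",
  legislatures: [16, 17],
})
for (const args of commands) {
  console.log(["npm", ...args].join(" "))
}
```

Each printed line can be pasted into a shell, or the argument arrays can be passed to a process runner of your choice.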
### Options for Cleaning Data

- `--dataset` or `-d <name>`: Clean a specific dataset only
- `--no-reset-after-commit`: Skip Git reset after committing (useful to preserve local changes)
- `--no-validate` or `-V`: Skip schema validation during cleaning
- `--fetchDocuments`: Retrieve documents
- `--parseDocuments`: Parse documents into cleaned JSON
- `--fetchVideos`: Retrieve videos
- `--fetchCrCommissions`: Retrieve and parse CR commissions

### Options for Retrieving Documents

- `--full` or `-f`: Retrieve all documents, even those already downloaded
- `--document-type` or `-T <type>`: Restrict to specific document types (e.g., `PION`)

## Download using Docker

A Docker image that downloads and cleans the data all at once is available. Build it locally or run it from the container registry.

Use the environment variables `LEGISLATURE` and `CATEGORIES` if needed.

```bash
docker run --pull always --name tricoteuses-assemblee \
  -v ../assemblee-data:/app/assemblee-data \
  -e LEGISLATURE=17 \
  -d git.tricoteuses.fr/logiciels/tricoteuses-assemblee:latest
```

## Using the data

Once the data is downloaded and cleaned, you can use loaders to retrieve it.

To use loaders in your project, install the _@tricoteuses/assemblee_ package and import the iterator functions that you need.

```bash
npm install @tricoteuses/assemblee
```

```js
import {
  iterLoadAssembleeActeurs,
  iterLoadAssembleeOrganes,
  iterLoadAssembleeReunions,
  iterLoadAssembleeScrutins,
  iterLoadAssembleeDocuments,
  iterLoadAssembleeDossiersParlementaires,
  iterLoadAssembleeAmendements,
  iterLoadAssembleeQuestions,
  iterLoadAssembleeComptesRendus,
} from "@tricoteuses/assemblee/loaders"

// Pass data directory and legislature as arguments
for (const { acteur } of iterLoadAssembleeActeurs("../assemblee-data", 17)) {
  console.log(acteur.uid)
}
```
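Since the loaders are consumed with `for...of`, their results can be handed to ordinary helper functions. The following is a minimal sketch under assumptions: `collectActeurUids` and the `fakeActeurs` generator are hypothetical names introduced here, and the generator only imitates the `{ acteur }` shape used in the loop above so that the snippet runs without any downloaded data. The `uid` values are made up.

```js
// Hypothetical helper: gather the `uid` of every acteur yielded by a
// loader-style iterable (anything usable in a `for...of` loop).
function collectActeurUids(iterable) {
  const uids = []
  for (const { acteur } of iterable) {
    uids.push(acteur.uid)
  }
  return uids
}

// Stand-in for a real loader, yielding made-up records of the same shape.
function* fakeActeurs() {
  yield { acteur: { uid: "PA0001" } }
  yield { acteur: { uid: "PA0002" } }
}

const uids = collectActeurUids(fakeActeurs())
console.log(uids.length, "acteurs:", uids.join(", "))
```

With real data, the same helper would be called as `collectActeurUids(iterLoadAssembleeActeurs("../assemblee-data", 17))`.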