@tricoteuses/senat

Handle French Sénat's open data

tricoteuses.fr

# Tricoteuses-Senat

## _Retrieve, clean up & handle French Sénat's open data_

## Requirements

- Node >= 22

## Installation

```bash
git clone https://git.tricoteuses.fr/logiciels/tricoteuses-senat
cd tricoteuses-senat/
```

Create a `.env` file to set the PostgreSQL database information and other configuration variables (you can use `example.env` as a template). Then install the dependencies:

```bash
npm install
```

### Database creation (not needed if downloading with Docker image)

#### Using Docker

```bash
docker run --name local-postgres -d -p 5432:5432 -e POSTGRES_PASSWORD=$YOUR_CUSTOM_DB_PASSWORD postgres

# Default Postgres user is postgres
# But scripts require an "opendata" role
docker exec -it local-postgres psql -U postgres -c "CREATE ROLE opendata;"
```

## Download data

Create a folder where the data will be downloaded, then run the following command to download the data and convert it into JSON files.

```bash
mkdir ../senat-data/

# Available options for optional `categories` parameter: All, Ameli, Debats, DosLeg, Questions, Sens
npm run data:download ../senat-data -- [--categories All]
```

Data from other sources is also available:

```bash
# Retrieval of textes and rapports from Sénat's website
# Available options for optional `formats` parameter: xml, html, pdf
# Available options for optional `types` parameter: textes, rapports
npm run data:retrieve_documents ../senat-data -- --fromSession 2022 [--formats xml pdf] [--types textes]

# Retrieval & parsing (textes in xml format only for now)
npm run data:retrieve_documents ../senat-data -- --fromSession 2022 --parseDocuments

# Parsing only
npm run data:parse_textes_lois ../senat-data

# Retrieval (& parsing) of agenda from Sénat's website
npm run data:retrieve_agenda ../senat-data -- --fromSession 2022 [--parseAgenda]

# Retrieval (& parsing) of comptes-rendus des débats from Sénat's website
npm run data:retrieve_comptes_rendus ../senat-data -- [--parseDebats]

# Retrieval of sénateurs' pictures from Sénat's website
npm run data:retrieve_senateurs_photos ../senat-data
```

## Data download using Docker

A Docker image that downloads and converts the data all at once is available. Build it locally or run it from the container registry. Use the environment variables `FROM_SESSION` and `CATEGORIES` if needed.

```bash
docker run --pull always --name tricoteuses-senat -v ../senat-data:/app/senat-data -d git.tricoteuses.fr/logiciels/tricoteuses-senat:latest
```

## Using the data

Once the data is downloaded, you can use loaders to retrieve it.

To use loaders in your project, install the _@tricoteuses/senat_ package and import the iterator functions that you need.

```bash
npm install @tricoteuses/senat
```

```js
import { iterLoadSenatQuestions } from "@tricoteuses/senat/loaders"

// Pass data directory and legislature as arguments
for (const { item: question } of iterLoadSenatQuestions("../senat-data", 17)) {
  console.log(question.id)
}
```

## Generation of raw types from SQL schema (for contributors only)

```bash
npm run data:generate_schemas ../senat-data
```

## Publishing

To publish a new version of this package onto npm, bump the package version and publish.

```bash
npm version x.y.z # Bumps version in package.json and creates a new tag x.y.z
npm publish
```

The Docker image will be built automatically by a CI workflow when you push the tag to the remote repository.

```bash
git push --tags
```
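
Returning to the loaders described under "Using the data": since they are consumed with `for...of` and yield objects carrying an `item` property, it may be handy to read only the first few entries while exploring a dataset. The sketch below is one possible way to do that. `takeItems` and `fakeLoader` are hypothetical names that are not part of _@tricoteuses/senat_; `fakeLoader` merely stands in for a real loader such as `iterLoadSenatQuestions`, and the sketch assumes the loader is a synchronous iterable, as the `for...of` usage suggests.

```js
// Hypothetical stand-in for a loader such as iterLoadSenatQuestions:
// it yields objects with an `item` property, like the loop shown in "Using the data".
function* fakeLoader(ids) {
  for (const id of ids) {
    yield { item: { id } }
  }
}

// Hypothetical helper: collect at most `limit` items from a loader-style iterable.
// Breaking out of the loop stops the underlying iterator early.
function takeItems(iterable, limit) {
  const items = []
  if (limit <= 0) return items
  for (const { item } of iterable) {
    items.push(item)
    if (items.length >= limit) break
  }
  return items
}

const firstTwo = takeItems(fakeLoader(["q1", "q2", "q3"]), 2)
console.log(firstTwo.map((question) => question.id)) // [ 'q1', 'q2' ]
```

With the real package installed, the same helper could presumably be called as `takeItems(iterLoadSenatQuestions("../senat-data", 17), 10)`, provided the loader behaves as assumed above.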