# Tricoteuses-Senat
## _Retrieve, clean up & handle the French Sénat's open data_
## Requirements
- Node >= 22
## Installation
```bash
git clone https://git.tricoteuses.fr/logiciels/tricoteuses-senat
cd tricoteuses-senat/
```
Create a `.env` file to set the PostgreSQL database information and other configuration variables (you can use `example.env` as a template). Then install the dependencies:
```bash
npm install
```
### Database creation (not needed if downloading with the Docker image)
#### Using Docker
```bash
docker run --name local-postgres -d -p 5432:5432 -e POSTGRES_PASSWORD=$YOUR_CUSTOM_DB_PASSWORD postgres
# The default Postgres user is postgres,
# but the scripts require an "opendata" role
docker exec -it local-postgres psql -U postgres -c "CREATE ROLE opendata;"
```
## Download data
Create a folder where the data will be downloaded and run the following command to download the data and convert it into JSON files.
```bash
mkdir ../senat-data/
# Available options for the optional `categories` parameter: All, Ameli, Debats, DosLeg, Questions, Sens
npm run data:download ../senat-data -- [--categories All]
```
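If you script the download step (for example from a Node.js maintenance job), a small helper can validate the categories before calling npm. The sketch below is hypothetical: `buildDownloadArgs` is not part of this package, and passing several categories as space-separated values after `--categories` is an assumption based on how `--formats` is written in the next section. The returned array can be handed to Node's `child_process.spawnSync("npm", args, { stdio: "inherit" })` (on Windows, also set `shell: true`).

```javascript
// Categories accepted by the `--categories` option, as listed in this README.
const CATEGORIES = ["All", "Ameli", "Debats", "DosLeg", "Questions", "Sens"]

// Hypothetical helper: build the argument list for `npm run data:download`.
// Everything after "--" is forwarded by npm to the download script.
function buildDownloadArgs(dataDir, categories = []) {
  const unknown = categories.filter((category) => !CATEGORIES.includes(category))
  if (unknown.length > 0) {
    throw new Error(`Unknown categories: ${unknown.join(", ")}`)
  }
  const args = ["run", "data:download", dataDir]
  if (categories.length > 0) {
    // Assumption: several categories are given as separate values.
    args.push("--", "--categories", ...categories)
  }
  return args
}

console.log(["npm", ...buildDownloadArgs("../senat-data", ["Questions", "Sens"])].join(" "))
// npm run data:download ../senat-data -- --categories Questions Sens
```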
Additional data can also be retrieved from the Sénat's website:
```bash
# Retrieval of textes and rapports from the Sénat's website
# Available options for the optional `formats` parameter: xml, html, pdf
# Available options for the optional `types` parameter: textes, rapports
npm run data:retrieve_documents ../senat-data -- --fromSession 2022 [--formats xml pdf] [--types textes]
# Retrieval & parsing (only textes in xml format can be parsed for now)
npm run data:retrieve_documents ../senat-data -- --fromSession 2022 --parseDocuments
# Parsing only
npm run data:parse_textes_lois ../senat-data
# Retrieval (& parsing) of the agenda from the Sénat's website
npm run data:retrieve_agenda ../senat-data -- --fromSession 2022 [--parseAgenda]
# Retrieval (& parsing) of comptes rendus des débats (debate transcripts) from the Sénat's website
npm run data:retrieve_comptes_rendus ../senat-data -- [--parseDebats]
# Retrieval of sénateurs' photos from the Sénat's website
npm run data:retrieve_senateurs_photos ../senat-data
```
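The retrieval commands above can be chained for a given starting session. The hypothetical `buildRetrievalPlan` helper below only assembles the argument lists documented in this README; it does not run anything by itself. Note that `data:retrieve_comptes_rendus` is documented without a `--fromSession` option, so its `--` separator is added only when parsing is requested.

```javascript
// Hypothetical helper: list the `npm run` invocations documented above for one
// data directory and starting session, optionally with parsing enabled.
function buildRetrievalPlan(dataDir, fromSession, { parse = false } = {}) {
  const session = String(fromSession)
  const plan = [
    ["run", "data:retrieve_documents", dataDir, "--", "--fromSession", session],
    ["run", "data:retrieve_agenda", dataDir, "--", "--fromSession", session],
    ["run", "data:retrieve_comptes_rendus", dataDir],
    ["run", "data:retrieve_senateurs_photos", dataDir],
  ]
  if (parse) {
    plan[0].push("--parseDocuments")
    plan[1].push("--parseAgenda")
    // No "--" yet for comptes rendus, so add it before the script option.
    plan[2].push("--", "--parseDebats")
  }
  return plan
}

for (const args of buildRetrievalPlan("../senat-data", 2022, { parse: true })) {
  console.log(["npm", ...args].join(" "))
}
```

Each entry can then be run in order with `spawnSync("npm", args, { stdio: "inherit" })`, stopping if a step returns a non-zero `status`.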
## Data download using Docker
A Docker image that downloads and converts the data all at once is available. Build it locally or run it from the container registry.
Use the environment variables `FROM_SESSION` and `CATEGORIES` if needed.
```bash
docker run --pull always --name tricoteuses-senat -v ../senat-data:/app/senat-data -d git.tricoteuses.fr/logiciels/tricoteuses-senat:latest
```
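The command above does not show how `FROM_SESSION` and `CATEGORIES` reach the container; with Docker, environment variables are passed with `-e NAME=value`. The hypothetical helper below builds such a command line. The value format expected by `CATEGORIES` (one category or several, and how they are separated) is an assumption to check against the image itself.

```javascript
// Hypothetical helper (not part of this package): build `docker run` arguments
// for the published image, forwarding FROM_SESSION and CATEGORIES only when set.
// Passing an absolute host path avoids problems with Docker versions that do
// not accept relative paths in -v bind mounts.
function buildDockerRunArgs(hostDataDir, { fromSession, categories } = {}) {
  const args = [
    "run", "--pull", "always", "--name", "tricoteuses-senat",
    "-v", `${hostDataDir}:/app/senat-data`,
  ]
  if (fromSession !== undefined) args.push("-e", `FROM_SESSION=${fromSession}`)
  // Assumption: the exact value format expected by CATEGORIES is not documented here.
  if (categories !== undefined) args.push("-e", `CATEGORIES=${categories}`)
  // Options must come before the image name, which is the last argument.
  args.push("-d", "git.tricoteuses.fr/logiciels/tricoteuses-senat:latest")
  return args
}

console.log(["docker", ...buildDockerRunArgs("/srv/senat-data", { fromSession: 2022 })].join(" "))
// docker run --pull always --name tricoteuses-senat -v /srv/senat-data:/app/senat-data -e FROM_SESSION=2022 -d git.tricoteuses.fr/logiciels/tricoteuses-senat:latest
```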
## Using the data
Once the data is downloaded, you can read it with the loaders.
To use them in your project, install the _@tricoteuses/senat_ package and import the iterator functions you need:
```bash
npm install @tricoteuses/senat
```
```js
import { iterLoadSenatQuestions } from "@tricoteuses/senat/loaders"
// Pass the data directory and the legislature as arguments
for (const { item: question } of iterLoadSenatQuestions("../senat-data", 17)) {
  console.log(question.id)
}
```
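Because the loaders are iterators, you can stop reading early, which is handy for a quick look at a large dataset. The sketch below uses a generic `take` generator; `fakeLoader` is a hypothetical stand-in that mimics the `{ item }` entries yielded by `iterLoadSenatQuestions`, so the snippet runs without any downloaded data. With the real package you would pass `iterLoadSenatQuestions("../senat-data", 17)` instead; whether this also skips reading the remaining files depends on the loader being lazy, which is assumed here.

```javascript
// Generic helper: yield at most `count` entries from any iterable, then stop
// without pulling a further entry from the underlying iterator.
function* take(iterable, count) {
  if (count <= 0) return
  let taken = 0
  for (const entry of iterable) {
    yield entry
    taken += 1
    if (taken >= count) return
  }
}

// Hypothetical stand-in for a loader such as iterLoadSenatQuestions, which
// yields objects of the form { item } (real items carry many more fields).
function* fakeLoader() {
  for (let i = 1; i <= 1000; i++) {
    yield { item: { id: `question-${i}` } }
  }
}

const ids = []
for (const { item: question } of take(fakeLoader(), 3)) {
  ids.push(question.id)
}
console.log(ids.join(", ")) // question-1, question-2, question-3
```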
## Generation of raw types from SQL schema (for contributors only)
```bash
npm run data:generate_schemas ../senat-data
```
## Publishing
To publish a new version of this package to npm, bump the package version and publish it:
```bash
npm version x.y.z # Bumps the version in package.json, commits the change and creates a matching git tag
npm publish
```
Pushing the tag to the remote repository triggers a CI workflow that builds the Docker image automatically:
```bash
git push --tags
```