UNPKG

gtfs-via-postgres

Version:
36 lines (28 loc) 1.8 kB
# importing multiple datasets into one DB Using `gtfs-via-postgres`, you can import more than one dataset into a single PostgreSQL database by importing them into separate [schemas](https://www.postgresql.org/docs/14/ddl-schemas.html). You can then run queries combine or compare data from them. As an example, let's import two datasets ([Paris](https://en.wikipedia.org/wiki/Île-de-France_Mobilités)' and [Berlin](https://en.wikipedia.org/wiki/Verkehrsverbund_Berlin-Brandenburg)'s) into separate schemas: ```shell wget -U 'gtfs-via-postgres demo' -O paris.gtfs.zip 'https://eu.ftp.opendatasoft.com/stif/GTFS/IDFM-gtfs.zip' unzip -d paris.gtfs paris.gtfs.zip gtfs-to-sql --require-dependencies \ --schema paris -- paris.gtfs/*.txt \ | sponge | psql -b wget -U 'gtfs-via-postgres demo' -O berlin.gtfs.zip 'https://www.vbb.de/vbbgtfs' unzip -d berlin.gtfs berlin.gtfs.zip gtfs-to-sql --require-dependencies \ --schema berlin -- berlin.gtfs/*.txt \ | sponge | psql -b ``` We can now do queries across both datasets, for example finding the geographically furthest 2 stops: ```sql -- warning: takes a long time to compute! SELECT paris.stop_id AS paris_stop_id, berlin.stop_id AS berlin_stop_id FROM paris.stops paris, berlin.stops berlin ORDER BY paris.stop_loc <-> berlin.stop_loc DESC LIMIT 100 ``` *Note:* During an import, a function `public.gtfs_via_postgres_import_version()` gets created that returns `gtfs-via-postgres`'s version. If that function already exists (because it has been created by a previous import), its return value is compared to `gtfs-via-postgres`'s version, and if these two versions are not equal, the second import will fail. This ensures that multiple imports into the same database can only be made using the exact same `gtfs-via-postgres` version.