# Target Clickhouse

A [Singer](https://singer.io/) target for ClickHouse, for use with Singer streams generated by Singer taps, written in Node.js using [singer-node](https://www.npmjs.com/package/singer-node).

## Usage

### Install

#### As npm package on host

`npm install -g target-clickhouse`

#### Docker image

`docker pull ghcr.io/biron-bi/target-clickhouse`

[Registry page](https://github.com/Biron-BI/singer-target-clickhouse/pkgs/container/target-clickhouse)

### Run

1. Create a [config file](#configjson) `config.json` with connection information and ingestion parameters.

   ```json
   {
     "host": "localhost",
     "port": 8123,
     "database": "destination_database",
     "username": "user",
     "password": "averysecurepassword"
   }
   ```

2. Run `target-clickhouse` against a [Singer](https://singer.io) tap.

   *In the following examples:*

   * *We append the emitted state to the end of a `state.jsonl` file*
   * *The file `current_state.json` contains the last line of `state.jsonl`*
   * *The file `config.json` contains the ClickHouse connection information*

   **Npm package:**

   ```bash
   <tap-anything> --state current_state.json | target-clickhouse --config config.json >> state.jsonl
   ```

   **Docker:**

   *In this example, the container reads the config file from a `/config` directory*

   ```bash
   <tap-anything> --state current_state.json | docker run --rm -i -a STDIN -a STDOUT -a STDERR -v "$(pwd):/config:ro" ghcr.io/biron-bi/target-clickhouse --config /config/config.json >> state.jsonl
   ```

### Config.json

The fields that can be specified in the config file.

#### Mandatory fields

* `host`
* `port`
* `username`
* `password`
* `database`

#### Optional fields

* `logging_level`: Defaults to `"INFO"`
* `subtable_separator`: Defaults to `"__"`
* `translate_values`: Whether fields should be parsed again to allow conversion of specific values, e.g. `True` accepted as `true`. Default `false`
* `batch_size`: Number of records to read before sending them to ClickHouse. Default `100`
* `finalize_concurrency`: Number of stream ingestions finalised concurrently. Default `3`
* `extra_active_tables`: List of tables that are considered active even if not present in the ACTIVE_STREAMS message. Default `[]`

## Singer specification extension

Several features are supported that are not part of the standard Singer spec:

* **Update schemas**: Pass the repeatable CLI option `--update-streams <stream>` to specify streams for which you want to recreate tables (root and children).
* **Clean first**: Specify `clean_first: true` in SCHEMA messages to wipe table content before each ingestion.
* **Cleaning column**: Specify `cleaning_column: "<column_name>"` in SCHEMA messages to wipe table content that matches the column value during ingestion. For instance, if the column `date` is specified as the cleaning column and the value `2022-01-01` is encountered in a record, all existing rows whose `date` is `2022-01-01` are replaced with the rows contained in the stream.
* **All key properties**: Specify `all_key_properties: {props: [], children: {}}` in SCHEMA messages to specify primary keys for all children of a root table. This allows children to create a foreign key to their parent (with the format `_parent_<column>`).

## Sponsorship

Target Clickhouse is written and maintained by **Biron** https://birondata.com/

## Acknowledgements

Special thanks to the people who built

* [singer](https://github.com/singer-io/getting-started)
* [target-postgres](https://github.com/datamill-co/target-postgres)
* [immutable-js](https://immutable-js.com/)

## License

Distributed under the AGPLv3
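
## Appendix: refreshing the state file

The Run examples assume that `current_state.json` holds the last line of `state.jsonl`. One way to keep the two in sync is sketched below. This is a minimal sketch and not part of target-clickhouse: the function name `refresh_current_state` is hypothetical, and skipping blank lines is an assumption about what the log may contain.

```bash
# Hypothetical helper (not shipped with target-clickhouse): copy the most
# recent state line from the append-only log into the file passed to the tap.
refresh_current_state() {
  local state_log="$1" current_state="$2" last

  # First run: no state has been emitted yet, so leave the target untouched.
  [ -s "$state_log" ] || return 0

  # Remember the last non-blank line of the log.
  last="$(awk 'NF { last = $0 } END { print last }' "$state_log")"

  # Only overwrite the current state when a state line was actually found.
  if [ -n "$last" ]; then
    printf '%s\n' "$last" > "$current_state"
  fi
}

# Intended use, before each run of the pipeline shown in the Run section:
#   refresh_current_state state.jsonl current_state.json
```

Writing the state only when a non-blank line was found avoids replacing a valid `current_state.json` with an empty file after a run that emitted no state.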