# Target Clickhouse
A [Singer](https://singer.io/) target for ClickHouse, written in Node.js using [singer-node](https://www.npmjs.com/package/singer-node).
It consumes Singer streams generated by Singer taps.
## Usage
### Install
#### As npm package on host
`npm install -g target-clickhouse`
#### Docker image
`docker pull ghcr.io/biron-bi/target-clickhouse`
[Registry page](https://github.com/Biron-BI/singer-target-clickhouse/pkgs/container/target-clickhouse)
### Run
1. Create a [config file](#configjson) `config.json` with connection information and ingestion parameters.
```json
{
  "host": "localhost",
  "port": 8123,
  "database": "destination_database",
  "username": "user",
  "password": "averysecurepassword"
}
```
2. Run `target-clickhouse` against a [Singer](https://singer.io) tap.
*In the following examples:*
* *The state emitted by the target is appended to a `state.jsonl` file*
* *The file `current_state.json` contains the last line of `state.jsonl`*
* *The file `config.json` contains the ClickHouse connection information*
**Npm package:**
```bash
<tap-anything> --state current_state.json | target-clickhouse --config config.json >> state.jsonl
```
**Docker:**
*In this example, the container reads the config file from a `/config` directory*
```bash
<tap-anything> --state current_state.json | docker run --rm -i -a STDIN -a STDOUT -a STDERR -v "$(pwd):/config:ro" ghcr.io/biron-bi/target-clickhouse --config /config/config.json >> state.jsonl
```
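
The examples above assume that `current_state.json` holds the last line of `state.jsonl`. A minimal Node.js sketch of one way to refresh it between runs follows. The helper names `lastStateLine` and `refreshCurrentState` are hypothetical and not part of target-clickhouse.

```javascript
const fs = require("fs");

// Hypothetical helper (not part of target-clickhouse): returns the last
// non-empty line of a JSON Lines document, or null if there is none.
function lastStateLine(jsonl) {
  const lines = jsonl
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "");
  return lines.length > 0 ? lines[lines.length - 1] : null;
}

// Copies the most recent state from state.jsonl into current_state.json.
// Returns false when there is no state to copy yet (e.g. before the first run).
function refreshCurrentState(statePath = "state.jsonl", currentPath = "current_state.json") {
  if (!fs.existsSync(statePath)) return false;
  const last = lastStateLine(fs.readFileSync(statePath, "utf8"));
  if (last === null) return false;
  JSON.parse(last); // throws if the last line is not valid JSON
  fs.writeFileSync(currentPath, last + "\n");
  return true;
}
```

Calling `refreshCurrentState()` after each run would keep the next `--state current_state.json` invocation pointed at the latest state.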
### Config.json
The following fields can be set in the config file.
#### Mandatory fields
* `host`
* `port`
* `username`
* `password`
* `database`
#### Optional fields
* `logging_level`: Defaults to `"INFO"`
* `subtable_separator`: Defaults to `"__"`
* `translate_values`: Whether fields should be parsed again to allow conversion of specific values, e.g. `True` accepted as `true`. Defaults to `false`
* `batch_size`: Number of records to read before sending them to ClickHouse. Defaults to `100`
* `finalize_concurrency`: Number of stream ingestion finalisations that may run concurrently. Defaults to `3`
* `extra_active_tables`: List of tables that are considered active even if not present in the ACTIVE_STREAMS message. Defaults to `[]`
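
As an illustration of how the mandatory and optional fields combine, the sketch below checks a config object and fills in the defaults listed above. `resolveConfig` is a hypothetical helper written for this README; it is not the target's own loading code, which may behave differently.

```javascript
// Defaults as listed in the "Optional fields" section.
const OPTIONAL_DEFAULTS = {
  logging_level: "INFO",
  subtable_separator: "__",
  translate_values: false,
  batch_size: 100,
  finalize_concurrency: 3,
  extra_active_tables: [],
};

const MANDATORY_FIELDS = ["host", "port", "username", "password", "database"];

// Hypothetical helper: rejects a config that lacks a mandatory field,
// then lets user-supplied values override the documented defaults.
function resolveConfig(userConfig) {
  const missing = MANDATORY_FIELDS.filter((field) => userConfig[field] === undefined);
  if (missing.length > 0) {
    throw new Error(`Missing mandatory config fields: ${missing.join(", ")}`);
  }
  return { ...OPTIONAL_DEFAULTS, ...userConfig };
}

const resolved = resolveConfig({
  host: "localhost",
  port: 8123,
  database: "destination_database",
  username: "user",
  password: "averysecurepassword",
  batch_size: 500, // overrides the default of 100
});
console.log(resolved.batch_size, resolved.subtable_separator); // prints: 500 __
```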
## Singer specification extension
Several features are supported that are not part of the standard Singer specification:
* **Update schemas** : Pass the repeatable CLI option `--update-streams <stream>` to specify the streams whose tables (root and
  children) should be recreated.
* **Clean first** : Specify `clean_first: true` in SCHEMA messages to wipe table content before each ingestion.
* **Cleaning column** : Specify `cleaning_column: "<column_name>"` in SCHEMA messages to wipe, during ingestion, the table rows whose value
  in that column also appears in the stream. For instance, if column "date" is specified as the cleaning column and the value "2022-01-01"
  is encountered in a record, all rows with the value "2022-01-01" are replaced with those contained in the stream.
* **All key properties** : Specify `all_key_properties: {props: [], children: {}}` in SCHEMA messages to declare primary keys for all
  children of a root table. This allows children to create a foreign key to their parent (with the format `_parent_<column>`).
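
To make the cleaning column behaviour more concrete, here is a small in-memory sketch. It builds a SCHEMA message carrying `cleaning_column` and then mimics the replacement described above on plain arrays. The stream name, column names and the `applyCleaningColumn` helper are made up for illustration, and placing the extra key at the top level of the message is an assumption. The target itself operates on ClickHouse tables, not on arrays.

```javascript
// A SCHEMA message using one of the extensions. The top-level position of
// `cleaning_column` is an assumption; "orders" and its columns are invented.
const schemaMessage = {
  type: "SCHEMA",
  stream: "orders",
  key_properties: ["id"],
  schema: {
    type: "object",
    properties: {
      id: { type: "integer" },
      date: { type: "string" },
      amount: { type: "number" },
    },
  },
  cleaning_column: "date",
};

// Illustration only: existing rows whose cleaning-column value appears in the
// incoming records are dropped, then the incoming records are added.
function applyCleaningColumn(existingRows, incomingRecords, column) {
  const incomingValues = new Set(incomingRecords.map((record) => record[column]));
  const kept = existingRows.filter((row) => !incomingValues.has(row[column]));
  return [...kept, ...incomingRecords];
}

const table = [
  { id: 1, date: "2021-12-31", amount: 10 },
  { id: 2, date: "2022-01-01", amount: 20 },
  { id: 3, date: "2022-01-01", amount: 30 },
];
const incoming = [{ id: 4, date: "2022-01-01", amount: 25 }];

const result = applyCleaningColumn(table, incoming, schemaMessage.cleaning_column);
console.log(JSON.stringify(schemaMessage)); // Singer messages are one JSON object per line
console.log(result.map((row) => row.id)); // prints: [ 1, 4 ]
```

Rows 2 and 3 share the date "2022-01-01" with the incoming record, so both are replaced, while row 1 is left untouched.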
## Sponsorship
Target Clickhouse is written and maintained by **Biron** https://birondata.com/
## Acknowledgements
Special thanks to the people who built:
* [singer](https://github.com/singer-io/getting-started)
* [target-postgres](https://github.com/datamill-co/target-postgres)
* [immutable-js](https://immutable-js.com/)
## License
Distributed under the AGPLv3