# Contiamo Local Dev Environment

Get the dev environment fast!

## Quick overview

Get started:

- `make docker-auth`
- `make pull`

Get the latest versions:

- `git pull`
- `make pull`

Start everything in normal mode:

- `make start`

Stop everything:

- `make stop`

Stop everything and clean up:

- `make clean`

Prepare for Pantheon-external mode (only do this once):

- `make build`
- `sudo bash -c 'echo "127.0.0.1 metadb" >> /etc/hosts'`

Start everything in Pantheon-external mode:

- `make pantheon-start`
- (In the Pantheon directory) `env METADB_URL="jdbc:postgresql://localhost:5433/pantheon?user=pantheon&password=test" DATASTORE_TYPE=external sbt run`

Enable TLS `verify-full` mode on port 5435:

- Download the private key for `*.dev.contiamo.io`: `make get-pg-key`
- `echo "127.0.0.1 pg-localhost.dev.contiamo.io" | sudo tee -a /etc/hosts`
- `make build`
- `make pantheon-start`
- You may need to tell your local `psql` about the IdenTrust root we happen to be using: `curl https://letsencrypt.org/certs/trustid-x3-root.pem.txt > ~/.postgresql/root.crt`
- `psql "user=lemon@example.com password=<token> dbname=<project UUID> sslmode=verify-full" -h pg-localhost.dev.contiamo.io -p 5435`

## Getting started

### Prerequisites

Local development is supported via Docker Compose. Before you start, you must install [Docker](https://docs.docker.com/install/) and [Docker Compose](https://docs.docker.com/compose/install/).

Additionally, development requires access to our private Docker registry; to get access, ask the Ops team for permissions. Once permissions have been granted, [install the `gcloud` CLI](https://cloud.google.com/sdk/docs/). Then run

```sh
make docker-auth pull
```

This will attempt to

1. authenticate with Google,
2. configure your Docker installation to use the new Google credentials, and
3. pull the required Docker images.

### Starting a fresh environment

Finally, to start the development environment, run

```sh
make start
```

Once the environment has started, you should see a message with a URL and credentials, like this:

```
Dev ui: http://localhost:9898/contiamo/profile
Email: lemon@example.com
Password: localdev
```

### Starting with the latest localdev snapshot

The above section starts with a completely empty environment. A standard development environment with preconfigured data sources (the internal metadbs) is provided in the project and can be started with

```sh
make load-snapshot
```

The existing environment (if any) will be stopped and _destroyed_, so be careful. The command then starts the db, loads the data, and starts the rest of the environment.

The environment:

* contains two users, `lemon@example.com` and `lemonjr@example.com`, both with password `localdev`
* has all of the datahub metadbs installed, as well as `foodmart`, `alaska`, and `liftdata`
* has two virtualdbs with two views each: one shows the maintenance tasks inside Hub, the other shows the use of PostGIS queries
* makes Mr. Lemon an admin for everything
* gives Lemon Jr various permission levels without admin rights; `liftdata` is private and not available to Lemon Jr
* assigns a basic amount of metadata to the data sources and tables, including custom fields, descriptions, a mix of names, and even one with documentation

This should allow for basic development and testing of most use cases.
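If you want to confirm that the snapshot actually loaded, a minimal check is to query one of the metadbs directly from your host. This assumes you have a local `psql` client; the connection details are the ones listed under "Adding the metadbs as external data sources" below, with the metadb exposed on `localhost:5433`:

```sh
# List the tables in the hub metadb; after load-snapshot this should show
# hub's schema rather than an empty database.
psql "host=localhost port=5433 dbname=hub user=user password=localdev" -c '\dt'
```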
### Overriding the service images

The image for each service can be overridden using env variables:

| variable                   | value                                               |
|----------------------------|-----------------------------------------------------|
| `AUTH_IMAGE`               | `eu.gcr.io/dev-and-test-env/idp:dev`                |
| `GRAPHQL_IMAGE`            | `eu.gcr.io/dev-and-test-env/pgql-server:dev`        |
| `UI_IMAGE`                 | `eu.gcr.io/dev-and-test-env/contiamo-ui:dev`        |
| `HUB_IMAGE`                | `eu.gcr.io/dev-and-test-env/hub:dev`                |
| `DATASTORE_IMAGE`          | `eu.gcr.io/dev-and-test-env/datastore:dev`          |
| `PANTHEON_IMAGE`           | `eu.gcr.io/dev-and-test-env/pantheon:dev`           |
| `PROFILER_IMAGE`           | `eu.gcr.io/dev-and-test-env/profiler:dev`           |
| `SYNC_INGESTER_IMAGE`      | `eu.gcr.io/dev-and-test-env/sync-ingester:latest`   |
| `SYNC_AGENT_TABLEAU_IMAGE` | `eu.gcr.io/dev-and-test-env/sync-agent-tableau:dev` |

You can manually override the image used by setting the relevant variable and then restarting the services:

```sh
export HUB_IMAGE=eu.gcr.io/dev-and-test-env/hub:v1.2.3
make stop start
```

### Integrations

The default environment runs only the core services required to support the Data Source integrations. To enable the `demo` sign-up service or integration sync-agents for other resource types (like Tableau), you need to enable the optional integration services. To do this, simply export this env variable:

```sh
export COMPOSE_FILES="-f docker-compose.yml -f docker-compose-extra.yml"
```

This will modify the `start` and `stop` commands to include the integration services.

### Testing PR images

A helper make target is provided that will automatically pull and restart the local environment with the PR preview image for the specified services. For example, to test PR 501 for `hub` together with PR 489 for `contiamo-ui`, use

```sh
make pr-preview services=hub:501,contiamo-ui:489
```

All other services will use the default images. To reset to the original state, use

```sh
make stop start
```

### End-To-End API testing

The project comes with a suite of end-to-end tests that use the API to verify that the backend services are working as expected. You can run this in any environment by using

```sh
make test
```

This assumes that you have already started the localdev environment using `make start` or `make pr-preview`.

### Passing S3 credentials for the Federated mode / Datasets

By default, the Datasets feature does not work with external DWH systems (e.g. Redshift, Snowflake): data transfer to these systems needs to go through mutually accessible object storage. There is a `pantheon-datasource-test` bucket on S3, but this repo doesn't include credentials for it.

If your scenario requires working with an external DWH, you can pass S3 credentials by setting the `DATASETS_AWS_ACCESS_KEY_ID` and `DATASETS_AWS_SECRET_ACCESS_KEY` environment variables. Additionally, the bucket name should be set via the `DATASETS_S3_BUCKET` variable. An easy way to set these variables is the `.env` file (see the sketch below).

### Datasets for testing the Profiler

Two pre-created datasets provide more interesting stats and entity detection profiles. These should be used to test the Profiler and the related UI components. The datasets are available in `./datasets`:

1. [`pii.csv`](./datasets/pii.csv) contains PII columns that should be detected during the entity detection profile.
2. [`sales.csv`](./datasets/sales.csv) also contains PII data, but is a good sample for the stats report.
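For reference, here is a minimal `.env` sketch covering the dataset variables described above. All values are placeholders, and `docker-compose` picks the file up from the project root:

```sh
# .env -- read automatically by docker-compose from the project directory.
# All values below are placeholders, not real credentials.
DATASETS_AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXXXXXX
DATASETS_AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
DATASETS_S3_BUCKET=pantheon-datasource-test
```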
### Start and add an external data source

We have a couple of data sets available on GCR for internal testing:

* a Postgres database that contains a single table, `liftdata`.
* a PostGIS (Postgres) database that contains the geometry of Alaska regions. Its purpose is to test geometry-related operations for Pantheon and the PGQL server.

#### Lift data

After starting the local dev environment, run:

```sh
docker run --name liftdata --rm --network dev_default eu.gcr.io/dev-and-test-env/deutschebahn-liftdata-postgres:v1.0.0
```

In the Data Hub, you can now add an external data source using:

| field      | value               |
|------------|---------------------|
| `HOST`     | `liftdata`          |
| `PORT`     | `5432`              |
| `DATABASE` | `liftdata`          |
| `USER`     | `pantheon`          |
| `PASS`     | `contiamodatahub19` |

When you are done, run

```sh
docker kill liftdata
```

to stop and clean up the database container.

#### PostGIS Alaska regions

After starting the local dev environment, run:

```sh
docker run --name alaska --rm --network dev_default eu.gcr.io/dev-and-test-env/alaska-postgis:1.0.0
```

In the Data Hub, you can now add the data source using:

| field      | value               |
|------------|---------------------|
| `HOST`     | `alaska`            |
| `PORT`     | `5432`              |
| `DATABASE` | `alaska`            |
| `USER`     | `pantheon`          |
| `PASS`     | `contiamodatahub19` |

When you are done, run

```sh
docker kill alaska
```

to stop and clean up the database container.

## Stopping

You can always cleanly stop the environment using

```sh
make stop
```

Any data in the databases will be preserved between `stop` and `start`.

## Adding the metadbs as external data sources

You can add the Data Hub's own metadbs to the Data Hub, meaning you can inspect the internals of the Data Hub from the Data Hub :) . Each of the following databases can be added as a PostgreSQL data source.

| service     | db name     | host     | port   | username   | password   |
|-------------|-------------|----------|--------|------------|------------|
| `datastore` | `datastore` | `metadb` | `5433` | `user`     | `localdev` |
| `hub`       | `hub`       | `metadb` | `5433` | `user`     | `localdev` |
| `idp`       | `simpleidp` | `metadb` | `5433` | `user`     | `localdev` |
| `pantheon`  | `pantheon`  | `metadb` | `5433` | `pantheon` | `test`     |

## Accessing the metadbs with pgadmin

Go to http://localhost:5050 (the link is on the http://localhost:9898/lemonade-shop/configuration page).

Log in with the following credentials:

- Email: `pgadmin4@pgadmin.org`
- Password: `admin`

Add the `metadb` server with the following connection info:

- Host name/address: `metadb`
- Port: `5433`
- Username: `user`
- Password: `localdev`
- Save password?: ✅

## Cleaning up

If you need to reclaim space or want to restart your environment from scratch, use

```sh
make clean
```

This will stop your current environment and remove any Docker volumes related to it. This includes any data and metadata in the databases.

As time goes on, Docker will download new images, but it does not automatically garbage collect old images. To do so, run `docker system prune`.

On Mac, all Docker file system data is stored in a single file of a fixed size, 16GB or 32GB by default. You can configure the size of this file by clicking on the Docker Desktop tray icon -> Preferences -> Disk -> move the slider.
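To see what is actually taking up the space before you prune, the standard Docker CLI (nothing specific to this repo) can break usage down for you:

```sh
# Disk usage broken down by images, containers, and local volumes.
docker system df

# Remove stopped containers, dangling images, and unused networks
# (asks for confirmation first).
docker system prune
```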
## Exporting and restoring the database state

You can find the `snapshot.sh` and `restore.sh` scripts in the `./scripts` folder. Both scripts take a single parameter: a filename.

### Snapshot

To make an encrypted snapshot of your local dev environment, use:

```sh
./snapshot.sh localdev.snapshot
```

This will ask you to set the encryption key, then export the database of each service, applying compression. The snapshot is encrypted with a symmetric key (AES-128 cipher).

### Restore

To **erase your local database for each service** and restore it to the state of an earlier exported snapshot, use:

```sh
./restore.sh localdev.snapshot
```

**This will delete all the data you have locally** and perform the reverse of `snapshot.sh`.

**IMPORTANT**: do not move the scripts out of their `./scripts` folder; they use relative paths.

The `make load-snapshot` target uses the committed `localdev.snapshot`. You can use the script, as described above, to load any other snapshots.
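As a concrete example, here is a sketch of a snapshot-and-restore round trip. The dated filename is just an illustration, and the scripts are run from inside `./scripts` because they use relative paths:

```sh
cd scripts

# Take a snapshot of the current state; you will be asked to set an
# encryption key.
./snapshot.sh "localdev-$(date +%Y-%m-%d).snapshot"

# ...experiment freely, then wipe the local databases and roll back:
./restore.sh "localdev-$(date +%Y-%m-%d).snapshot"
```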
## Tips

* Run `make` or `make help` to see all available commands.
* You can also run these commands from a different directory, e.g. with `make -C /path/to/dev start`.
* The commands in the Makefile are very useful, but there's some extra functionality available if you use `docker-compose` directly. For instance, get all logs with `docker-compose logs --follow`, or only the datastore worker logs with `docker-compose logs --follow ds-worker`. Refer to `docker-compose.yml` for the definitions of the services.
* To use `docker-compose` without `cd`'ing to this directory, use e.g. `docker-compose -f /path/to/dev/docker-compose.yml logs --follow`.

## Custom Images

The Compose file supports overriding the Docker tag used for a service by setting several environment variables:

| Service     | Environment Variable | Default  |
|-------------|----------------------|----------|
| datastore   | `DATASTORE_TAG`      | `dev`    |
| idp         | `IDP_TAG`            | `dev`    |
| pantheon    | `PANTHEON_TAG`       | `latest` |
| contiamo-ui | `CONTIAMOUI_TAG`     | `latest` |

## Options to Postgres

Via the environment variable `POSTGRES_ARGS`, you can pass extra arguments to the PostgreSQL daemon. By default, this is set to `-c log_connections=on`. To log modification statements in addition to connections, start the dev environment with

```sh
env POSTGRES_ARGS="-c log_connections=on -c log_statement=mod" make start
```

You can inspect these logs with `docker-compose logs --follow metadb`. The four acceptable values for `log_statement` are `none`, `ddl`, `mod`, and `all`. Further Postgres options can be found here: https://www.postgresql.org/docs/11/runtime-config.html .

## Setting up Pantheon Local Development

Local Pantheon debug development is supported by port redirection. To set this up, you first need to run two extra steps.

1. Run

   ```sh
   make build
   ```

   This builds the `eu.gcr.io/dev-and-test-env/pantheon:redir` Docker image, a "pseudo-Pantheon" that forwards everything to your local Pantheon on `127.0.0.1` port `4300`. _Do not push this image!_

2. Modify your `/etc/hosts` file to add

   ```
   127.0.0.1 metadb
   ```

   You can easily do this with `sudo bash -c 'echo "127.0.0.1 metadb" >> /etc/hosts'`. This ensures that Pantheon can correctly resolve the storage database service.

## Running the Pantheon Local Development

Make sure you have first set up the prerequisites, as well as Pantheon local development as described above. To start the Pantheon dev environment, use

```sh
make pantheon-start
```

This will replace the Pantheon image with a simple port redirection image that enables a transparent redirect of

- http://localhost:9898/pantheon/api/v1/* to http://localhost:4300/api/v1/* ,
- http://localhost:9898/pantheon/jdbc/* to http://localhost:8765/* .

You can then start your local Pantheon debug build, e.g. from your IDE, and have it bind to those ports on localhost. To configure the meta-DB and enable the external data store from Pantheon, run SBT with

```sh
env METADB_URL="jdbc:postgresql://localhost:5433/pantheon?user=pantheon&password=test" DATASTORE_TYPE=external sbt
```

or set the same environment variables in IntelliJ. You can also use `export METADB_URL="jdbc:postgresql://localhost:5433/pantheon?user=pantheon&password=test" DATASTORE_TYPE=external` to set the environment variables in the current terminal.

----

The docker-compose configuration exposes the following ports for use from local Pantheon:

- Nginx web server at `127.0.0.1` port `9898` <-- use this to access the Data Hub, including the UI, IDP, Pantheon, and Datastore.
- PostgreSQL meta-DB at `127.0.0.1` port `5433`, username `pantheon`, password `test`.
- Datastore manager at `127.0.0.1` port `9191`.
- Minio (for ingested files) at `127.0.0.1` port `9000`.

When accessing Pantheon via Nginx on port 9898, you need to prepend `/pantheon` to Pantheon URLs, for instance: http://localhost:9898/pantheon/api/v1/status . Nginx will strip off the `/pantheon`, authenticate the request with IDP, and forward the request to Pantheon as `/api/v1/status`.

Using the `pantheon`/`test` credentials for Postgres, you also have access to

- the `metadb` database, for datastore,
- collection databases corresponding to a managed DB,
- collection databases corresponding to materializations for a project,
- the `simpleidp` database.

## Running a custom Pantheon in prod mode

You can also run Pantheon in prod mode locally, as follows.

1. In the `sbt` shell, run `dist`.
2. From a console, run `docker build -t eu.gcr.io/dev-and-test-env/pantheon:local .`. This will download dependencies if they are not cached yet, build a Docker image for Pantheon, and tag it `local`.
3. Run `env PANTHEON_TAG=local make start`.

Now datastore and metadb will still be available on the usual ports, but Nginx will proxy to a prod-mode Pantheon running inside Docker. Pantheon will automatically be run with the appropriate environment variables (https://github.com/contiamo/dev/blob/master/docker-compose.yml#L81).

**Warning!** Do not push this image to GCR. It may accidentally end up being deployed on dev.contiamo.io .

## Profiler Server

The Profiler currently lives at http://localhost:8383.

## Enjoy!
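And if you want one final sanity check that the stack is serving requests, Pantheon's status endpoint should answer through Nginx (see "Running the Pantheon Local Development" above); note that an unauthenticated request may be redirected to the IDP login rather than returning a 200:

```sh
# Hit Pantheon's status endpoint through the Nginx proxy on port 9898.
curl -i http://localhost:9898/pantheon/api/v1/status
```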