Tool Setup
For tool development there are some optional steps that may make using the tool easier:
- Postgres client: to install `psql` on Ubuntu, run `sudo apt install postgresql-client`, or install the GUI client of your choice.
- Redis client: on Ubuntu, run `sudo apt install redis-tools` for `redis-cli`.
- Python venv: run `sudo apt install python3-venv` to install the virtual environment manager. This will allow you to run DBT code outside of Airflow to test in isolation, as explained below.
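A quick way to confirm which of the optional clients are already on your machine:

```shell
# Report whether each optional client is installed, without failing if one is missing.
for tool in psql redis-cli; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: $("$tool" --version | head -n1)"
  else
    echo "$tool: not installed"
  fi
done
```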
Installation
- Set the following environment variables and run `./setup.sh`:
  - `POSTGRES_DB_VOLUME_PATH`: dbdata directory from prerequisites. Defaults to `$HOME/dbdata` if not set.
  - `IMPORT_DATA_VOLUME_PATH`: importdata directory from prerequisites. Defaults to `$HOME/importdata` if not set.
  - `EXPORT_DATA_VOLUME_PATH`: exportdata directory from prerequisites. Defaults to `$HOME/exportdata` if not set.

  If you get a permissions error, try adding yourself to the docker group with `sudo usermod -aG docker $USER`, then restart WSL by opening Windows PowerShell and running `wsl --shutdown`. Reopen your terminal and try again.

  This creates the necessary docker volumes and the UID/GID `.env` file Airflow needs to access the volumes.
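The defaulting behavior described above can be sketched in shell. The variable names are the ones listed for `./setup.sh`; the `${VAR:-default}` fallback shown here is an assumption about how the script behaves, not a transcript of it:

```shell
# Export the volume paths, falling back to the documented defaults when unset.
export POSTGRES_DB_VOLUME_PATH="${POSTGRES_DB_VOLUME_PATH:-$HOME/dbdata}"
export IMPORT_DATA_VOLUME_PATH="${IMPORT_DATA_VOLUME_PATH:-$HOME/importdata}"
export EXPORT_DATA_VOLUME_PATH="${EXPORT_DATA_VOLUME_PATH:-$HOME/exportdata}"
echo "$POSTGRES_DB_VOLUME_PATH"
```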
- In the repo directory, copy the file `src/custom/dbt/profiles-sample.yml` to `src/custom/dbt/profiles.yml` and edit all the `host` lines, replacing `postgres` with the local IP address for your computer, e.g. `192.168.0.2` (you can run `hostname -I` to see your IP). This will allow you to connect to the postgres db and run DBT commands outside of an Airflow DAG.
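The host substitution can also be done with `sed`. This is a demo on a temporary copy so it runs anywhere; the minimal YAML written here is a stand-in, not the real profile, and the real files live under `src/custom/dbt/`:

```shell
# Demo the host substitution on a temporary stand-in profile.
mkdir -p /tmp/dbt-demo
printf 'dev:\n  host: postgres\n  port: 5432\n' > /tmp/dbt-demo/profiles-sample.yml
cp /tmp/dbt-demo/profiles-sample.yml /tmp/dbt-demo/profiles.yml
HOST_IP=$(hostname -I 2>/dev/null | awk '{print $1}')
HOST_IP="${HOST_IP:-192.168.0.2}"   # fall back to the doc's example IP if hostname -I is unavailable
sed -i "s/host: postgres/host: ${HOST_IP}/" /tmp/dbt-demo/profiles.yml
grep 'host:' /tmp/dbt-demo/profiles.yml
```

On the real files, point the `cp` and `sed` at `src/custom/dbt/` instead of `/tmp/dbt-demo`.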
- In the repo root, run `docker compose up` and wait until Airflow is up and running. It will take a while the first time around, since it has to build the docker image and run initial migrations on the database. Some tips:
  - You can run docker compose with the `-d` option to detach and run everything in the background.
  - Use `docker ps` to check on container status. You may also want to `watch docker ps` to get an auto-updating view. You'll need to wait until all containers are marked as "healthy".
  - The docker compose is set to auto-restart the containers, which means they will start automatically when you reboot your computer. You can stop them by running `docker compose stop` from the project root.
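The wait for healthy containers can also be scripted instead of watching `docker ps` by hand. This is a sketch: the health check is passed in as a command string so the polling pattern itself runs without Docker, and the commented `docker ps` check assumes the STATUS column contains "healthy" once a container's healthcheck passes:

```shell
# Poll a health-check command until it succeeds (sketch; interval in seconds).
wait_until() {
  check="$1"; interval="${2:-5}"
  until sh -c "$check"; do sleep "$interval"; done
  echo "check passed: $check"
}

# In practice, against docker compose:
#   wait_until '! docker ps --format "{{.Status}}" | grep -qv healthy' 5
wait_until "true" 0   # demo with a no-op check
```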
- Once all the containers are up and running you can access the admin interface at http://localhost:8080. This is the landing page that contains links to Airflow and other reports. Default credentials for Airflow are `airflow`/`airflow`.
  - The first time you log in to Airflow through the admin interface you will need to set up the 'pg_db' connection. Do this by going to Admin > Connections > +:
    - Connection Id: pg_db
    - Connection Type: Postgres
    - Description: <OPTIONAL>
    - Host: <IP_ADDRESS_FROM_PROFILES_YAML>
    - Database: airflow
    - Login: airflow
    - Password: airflow
    - Port: 5432
  - NOTE: If you make changes to your connection at any point, make sure that the "Extra" field is completely cleared out. Sometimes Airflow will add an empty set of brackets in there, but that can cause errors when running setup_schemas later.
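For reference, the same settings expressed as a libpq connection URI; the host placeholder is yours to fill in, just as in `profiles.yml`:

```shell
# Build the equivalent connection URI from the pg_db settings above.
HOST="<IP_ADDRESS_FROM_PROFILES_YAML>"   # replace with your machine's IP
echo "postgresql://airflow:airflow@${HOST}:5432/airflow"
# Once the containers are up you can test it from the host (assumes psql is installed):
#   psql "postgresql://airflow:airflow@${HOST}:5432/airflow" -c 'select 1;'
```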
- In order to run anything outside of the Airflow DAG context, you'll need to set up a virtual environment. In the project root, run `python3 -m venv env`. This will install a virtual environment in the `env` directory.
- Activate your virtualenv: `source env/bin/activate`
- Install dependencies in your virtualenv: `pip install -e .`
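The venv steps can be exercised end to end. This sketch uses a throwaway path under `/tmp` so it doesn't touch the project's `env` directory:

```shell
# Create and activate a throwaway virtualenv, then confirm python runs inside it.
python3 -m venv /tmp/demo-env
. /tmp/demo-env/bin/activate
python -c 'import sys; print(sys.prefix)'   # prints /tmp/demo-env
# In the real project, with env/ activated, you would now run: pip install -e .
deactivate
```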
Troubleshooting
Database connects with the psql CLI but not in Airflow or a DB tool (DBeaver etc.):
- Check whether you have a local postgres service running, and make sure it is not running while trying to run this application.
- Set the service to start manually, and stop it when connecting to the docker version.
- The host in the Airflow connection may need to be set to host.docker.internal.
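On Ubuntu the conflicting local service can be checked and stopped like this. It assumes the service is managed by systemd under the name `postgresql`, which is the usual name for the apt package but is an assumption here:

```shell
# See whether a local postgresql service is active (it would also hold port 5432).
systemctl is-active postgresql 2>/dev/null || echo "no local postgresql service running"
# If it is active, stop it and switch it to manual start while using the docker version:
#   sudo systemctl stop postgresql
#   sudo systemctl disable postgresql
```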
Pipelines are broken in Airflow:
- Sometimes it's a docker issue, so you can run `docker compose down` and then `docker compose up -d --no-deps --build` to try to rebuild.
VM Python errors:
- Check that the correct version of python is installed with `python3 --version`. If it isn't, install the correct version and rerun any python-related installations (such as venv and pip).
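A quick way to check the interpreter version from a script. The `3.8` minimum here is a placeholder, not the project's documented requirement; substitute whatever version the project actually needs:

```shell
# Compare the installed python3 against a required minimum using version sort.
REQUIRED="3.8"   # placeholder minimum, not the project's stated requirement
CURRENT=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
if [ "$(printf '%s\n%s\n' "$REQUIRED" "$CURRENT" | sort -V | head -n1)" = "$REQUIRED" ]; then
  echo "python $CURRENT is >= $REQUIRED"
else
  echo "python $CURRENT is too old; need >= $REQUIRED"
fi
```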