
Tool Setup

For tool development, there are some optional steps that can make working with the tool easier:

  1. Database clients:
    • PostgreSQL client: to install psql on Ubuntu, run sudo apt install postgresql-client, or install the GUI client of your choice.
    • Redis client: sudo apt install redis-tools on Ubuntu provides redis-cli.
  2. Python venv: run sudo apt install python3-venv to install the virtual environment manager. This will allow you to run dbt code outside of Airflow to test in isolation, as explained below.
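As a quick sanity check (not part of the official setup), a short shell loop can report which of the optional clients above are already on your PATH:

```shell
# Hedged sanity check: report which optional client tools are available.
found=0
for tool in psql redis-cli python3; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
    found=$((found + 1))
  else
    echo "$tool: missing (see the install commands above)"
  fi
done
echo "available clients: $found/3"
```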

Installation

  1. Set the following environment variables and run ./setup.sh

    • POSTGRES_DB_VOLUME_PATH : dbdata directory from prerequisites. Defaults to $HOME/dbdata if not set
    • IMPORT_DATA_VOLUME_PATH : importdata directory from prerequisites. Defaults to $HOME/importdata if not set
    • EXPORT_DATA_VOLUME_PATH : exportdata directory from prerequisites. Defaults to $HOME/exportdata if not set
    • If you get a permissions error, try adding yourself to the docker group (sudo usermod -aG docker $USER) and restart WSL by opening Windows PowerShell and running wsl --shutdown. Reopen your terminal and try again.

    This creates the necessary docker volumes and a UID/GID .env file so that Airflow has access to the volumes.
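The defaulting behavior described above can be sketched in shell as follows. This is illustrative only — the variable names come from this document, but the logic is an assumption about what setup.sh does:

```shell
# Illustrative sketch of the documented defaults (assumed to mirror setup.sh).
POSTGRES_DB_VOLUME_PATH="${POSTGRES_DB_VOLUME_PATH:-$HOME/dbdata}"
IMPORT_DATA_VOLUME_PATH="${IMPORT_DATA_VOLUME_PATH:-$HOME/importdata}"
EXPORT_DATA_VOLUME_PATH="${EXPORT_DATA_VOLUME_PATH:-$HOME/exportdata}"
echo "db volume:     $POSTGRES_DB_VOLUME_PATH"
echo "import volume: $IMPORT_DATA_VOLUME_PATH"
echo "export volume: $EXPORT_DATA_VOLUME_PATH"
```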

  2. In the repo directory, copy the file src/custom/dbt/profiles-sample.yml to src/custom/dbt/profiles.yml and edit all the host lines, replacing postgres with your computer's local IP address, e.g. 192.168.0.2 (run hostname -I to see your IP). This will allow you to connect to the Postgres DB and run dbt commands outside of an Airflow DAG.
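The copy-and-edit step can be scripted. The sketch below runs against a throwaway file in /tmp so it is safe to try anywhere; in the real repo the files live under src/custom/dbt/, and 192.168.0.2 stands in for the IP reported by hostname -I:

```shell
# Create a stand-in sample file (the real one is src/custom/dbt/profiles-sample.yml).
printf 'outputs:\n  dev:\n    host: postgres\n    port: 5432\n' > /tmp/profiles-sample.yml
HOST_IP="192.168.0.2"   # in practice: first address from `hostname -I`
cp /tmp/profiles-sample.yml /tmp/profiles.yml
# Replace the docker-internal hostname with the machine's local IP.
sed -i "s/host: postgres/host: $HOST_IP/" /tmp/profiles.yml
grep 'host:' /tmp/profiles.yml
```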

  3. In the repo root, run docker compose up and wait until Airflow is up and running. It will take a while the first time around, since it has to build the docker image and run initial migrations on the database. Some tips:

    • you can run docker compose with the -d option. This will detach and run everything in the background.
    • Use docker ps to check on container status; run watch docker ps for an auto-updating view. You'll need to wait until all containers are marked as "healthy".
    • Docker Compose is configured to auto-restart the containers, which means they will start automatically when you reboot your computer. You can stop them by running docker compose stop from the project root.
  4. Once all the containers are up and running, you can access the admin interface at http://localhost:8080. This is the landing page that contains links to Airflow and other reports. The default credentials for Airflow are airflow/airflow.

    • The first time you log in to Airflow through the admin interface, you will need to set up the 'pg_db' connection. Do this by going to Admin > Connections > +
      Connection Id: pg_db
      Connection Type: Postgres
      Description: <OPTIONAL>
      Host: <IP_ADDRESS_FROM_PROFILES_YAML>
      Database: airflow
      Login: airflow
      Password: airflow
      Port: 5432
    • NOTE: If you make changes to your connection at any point, make sure that the "Extra" field is completely cleared out. Sometimes Airflow will add an empty set of brackets in there, which can cause errors when running setup_schemas later.
  5. In order to run anything outside of the Airflow DAG context, you'll need to set up a virtual environment. In the project root, run python3 -m venv env. This creates a virtual environment in the env directory.

  6. Activate your virtualenv: source env/bin/activate

  7. Install dependencies in your virtualenv: pip install -e .

Troubleshooting

Database connects with the psql CLI but not from Airflow or a DB tool (DBeaver, etc.):

  • Check whether you have a local Postgres service running, and make sure it is stopped while running this application.
  • Set the service to start manually, and stop it when connecting to the Docker version.
  • The host in Airflow may need to be set to host.docker.internal.
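A quick way to tell which Postgres you are reaching is to probe the port from a shell. This sketch uses bash's /dev/tcp feature; localhost and 5432 are the host and port from the connection settings above:

```shell
# Reachability probe for the Postgres port (bash-only /dev/tcp trick).
HOST="localhost"
PORT=5432
STATUS="closed"
# The subshell exits non-zero if the TCP connection cannot be opened.
if (exec 3<>"/dev/tcp/$HOST/$PORT") 2>/dev/null; then
  STATUS="open"
fi
echo "port $PORT on $HOST: $STATUS"
```

If the port is open while the containers are stopped, a local (non-Docker) Postgres is likely listening there.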

Pipelines are broken in Airflow

  • Sometimes it's a Docker issue, so you can run docker compose down and then docker compose up -d --no-deps --build to try and rebuild it.

VM Python errors

  • Check that the correct version of Python is installed with python3 --version. If it isn't, install the correct version and rerun any Python-related installations (such as venv and pip).
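The version check above can be made scriptable by asking Python for its major version directly. This is a sketch; 3 is the assumed required major version for this project:

```shell
# Extract the major version number rather than parsing `python3 --version` text.
PY_MAJOR="$(python3 -c 'import sys; print(sys.version_info[0])')"
if [ "$PY_MAJOR" = "3" ]; then
  echo "python3 looks OK"
else
  echo "unexpected Python major version: $PY_MAJOR"
fi
```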