Adding Custom Imports

The overall process of transformation with the tool framework is as follows:

  1. Import your data dump into the tool
  2. Write your own transformation logic (e.g. using DBT)
  3. Update scripts to hook up data importing/exporting to the tool/pipeline
  4. Run the transform data DAG

This document provides a high-level overview of the parts of the starter code you need to update in order to write your own transformation.

Custom Schema

The schema of your object tables is built using a tool called Alembic. To generate it, first run the pipeline 01a_setup_sf_config, which pulls the Salesforce v12 schema from your target Salesforce instance. After that has run and you have configured your custom object model, run 01_setup_provider_schemas; it uses the metadata object from Salesforce to set up your schema. You should not need to repeat these steps unless you change the relevant objects and fields in your Salesforce target.
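To make the result of these two pipelines concrete, here is a minimal sketch of what the generated schema amounts to. The object and field names below are hypothetical stand-ins for the metadata pulled from your Salesforce target, and an in-memory SQLite database stands in for your warehouse; the real pipeline uses Alembic migrations and proper field types.

```python
import sqlite3

# Hypothetical subset of the object metadata that 01a_setup_sf_config
# pulls; the real field lists come from your Salesforce target.
SF_OBJECTS = {
    "account": ["id", "name", "billing_city"],
    "contact": ["id", "account_id", "email"],
}

def build_schema(conn, objects):
    """Create one table per Salesforce object. Everything is TEXT here
    for brevity; the real migration maps Salesforce field types."""
    for obj, fields in objects.items():
        cols = ", ".join(f'"{f}" TEXT' for f in fields)
        conn.execute(f'CREATE TABLE "{obj}" ({cols})')

conn = sqlite3.connect(":memory:")  # stand-in for your warehouse
build_schema(conn, SF_OBJECTS)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['account', 'contact']
```

Once the real schema is in place, it only needs to be regenerated when the underlying Salesforce objects or fields change.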

Loading your data into landing

You will need to build a custom pipeline that loads your CSVs into a raw schema. A simple version of this pipeline follows these steps:

  1. Move your raw CSV files into the import_data folder, a mounted volume that makes the data available in the runtime environment
  2. Move the data from your CSVs into a database schema that can serve as the raw source for your transformations
  3. Create a custom dbt project in the custom volume that takes the data from raw and transforms it into the v12 data model
  4. Move the data from the dbt_transform schema into landing
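Step 2 above can be sketched as follows. This is a minimal illustration, not the framework's loader: the table name, CSV contents, and the raw_ prefix (standing in for a raw schema) are assumptions, and an in-memory SQLite database stands in for your warehouse.

```python
import csv
import io
import sqlite3

def load_csv_to_raw(conn, table, csv_text):
    """Load one CSV (as it would sit in import_data/) into a raw_* table.
    All columns land as TEXT; typing happens later in the dbt project."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f'"{c}" TEXT' for c in header)
    conn.execute(f'CREATE TABLE "raw_{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "raw_{table}" VALUES ({placeholders})', data)

# Example file contents; the real pipeline would iterate over
# the CSVs in the mounted import_data folder.
accounts_csv = "id,name\n1,Acme\n2,Globex\n"

conn = sqlite3.connect(":memory:")
load_csv_to_raw(conn, "accounts", accounts_csv)
count = conn.execute("SELECT COUNT(*) FROM raw_accounts").fetchone()[0]
print(count)  # 2
```

From here, the dbt project in step 3 selects from the raw tables and materializes the v12 data model, which step 4 then moves into landing.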

Adding a New DAG for Import

If you're migrating from multiple sources, you will need additional DAGs and DBT projects.

  1. If you're using DBT, add another DBT project to src/custom/dbt to handle the transformation of the additional data source
  2. Add another Python file to src/custom/dags that loads the additional data source and runs it through that DBT project
  3. Note that you'll need to include a function with the DAG decorator in that file. See the Airflow Docs for more details
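A skeleton of such a file might look like the following. The file name, DAG name, and task bodies are hypothetical; this is a sketch of the decorator pattern the Airflow docs describe, assuming a recent Airflow version with the TaskFlow API, not the framework's actual DAG.

```python
# src/custom/dags/import_second_source.py -- hypothetical names;
# adapt to your own data source. Requires Apache Airflow.
import pendulum
from airflow.decorators import dag, task

@dag(schedule=None, start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def import_second_source():
    @task
    def load_to_raw():
        # copy this source's CSVs from import_data into its raw schema
        ...

    @task
    def run_dbt():
        # invoke the additional DBT project added under src/custom/dbt
        ...

    load_to_raw() >> run_dbt()

# calling the decorated function registers the DAG with Airflow
import_second_source()
```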