Adding Custom Imports
The overall process of transformation with the tool framework is as follows:
- Import your data dump into the tool
- Write your own transformation logic (e.g. using DBT)
- Update scripts to hook up data importing/exporting to the tool/pipeline
- Run the transform data DAG
This document provides a high-level overview of which parts of the starter code need to be updated for you to write your own transformation.
Custom Schema
The schema of your object tables is built using a tool called Alembic. To generate the table schemas, run the pipeline 01a_setup_sf_config; this pulls the Salesforce v12 schema from your target Salesforce org. After that has run and you have configured your custom object model, run 01_setup_provider_schemas, which uses the metadata objects from Salesforce to set up your schema. You should not need to repeat these steps unless you change the relevant objects and fields in your Salesforce target.
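Assuming the two setup pipelines are exposed as Airflow DAGs whose IDs match the names above (an assumption; check the DAG list in your deployment), they can be triggered from the Airflow CLI:

```shell
# Pull the Salesforce v12 schema from your target org
airflow dags trigger 01a_setup_sf_config

# ...configure your custom object model, then build the provider schemas
airflow dags trigger 01_setup_provider_schemas
```

The same DAGs can of course be triggered from the Airflow UI instead.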
Loading your data into landing
You will need to build a custom pipeline that loads your CSVs into a raw schema. At a high level, the steps for building this pipeline are:
- Move your raw CSV files into the import_data folder; this is a mounted volume that makes the data available in the runtime environment
- Move the data from your CSVs into a schema in the database that can be used as a raw source for your transformations
- Create a custom dbt project in the custom volume that takes the data from raw and transforms it into the v12 data model
- Move the data from the dbt_transform schema into landing
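The first two steps above can be sketched in plain Python. This is a hypothetical helper, not part of the starter code: it loads every CSV in the import folder into its own table, all columns as text, leaving typing and cleanup to the DBT layer. SQLite is used here so the sketch is self-contained; in the real pipeline you would point at your warehouse's raw schema instead of using a table-name prefix.

```python
import csv
import sqlite3
from pathlib import Path

def load_csvs_to_raw(import_dir: str, conn: sqlite3.Connection,
                     prefix: str = "raw_") -> list[str]:
    """Load every CSV in import_dir into its own all-text table."""
    loaded = []
    for csv_path in sorted(Path(import_dir).glob("*.csv")):
        table = prefix + csv_path.stem
        with open(csv_path, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            # One TEXT column per CSV header field; dbt handles casting later
            cols = ", ".join(f'"{c}" TEXT' for c in header)
            conn.execute(f'DROP TABLE IF EXISTS "{table}"')
            conn.execute(f'CREATE TABLE "{table}" ({cols})')
            placeholders = ", ".join("?" for _ in header)
            conn.executemany(
                f'INSERT INTO "{table}" VALUES ({placeholders})', reader
            )
        loaded.append(table)
    conn.commit()
    return loaded
```

Loading everything as text first keeps the import step dumb and repeatable; the DBT project is then the single place where types, renames, and the mapping to the v12 model live.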
Adding a New DAG for Import
If you're migrating from multiple sources, you will need additional DAGs and DBT files.
- If you're using DBT, add another DBT project to src/custom/dbt to handle the transformation of the additional data source
- Add another Python file to src/custom/dags to handle loading the additional data source and running it through the DBT project from the previous step
- Note that you'll need to include a function with the DAG decorator in that file. See the Airflow Docs for more details