V12 Monarch Overview

This project uses Apache Airflow as the orchestrator and a combination of SQL, Python, and DBT Core to perform the data manipulation. Before working with the DAGs, it helps to be familiar with the Airflow architecture and with how DBT works at a high level.

DAG List

01a_setup_sf_config

This DAG downloads existing V12 data from Salesforce and generates migrations for the schemas.

01_setup_provider_schema

This DAG sets up the schemas by running the Alembic-generated migrations. It also sets up the Elementary schemas and loads internal CSVs into the utilities tables.

02_import_data_from_salesforce

This DAG creates the needed DB schemas and runs migrations. You can rerun it if you need to recreate your database or if you add new tables or columns.

04_data_quality_and_mastering

This is the main DAG for executing the data flow. It reads from the landing schema to merge records and run checks.

provider_merge

This is the main DAG for creating merge instructions and for tracking and marking non-master hierarchy records for soft delete. It reads from the top_level_matches and dupe_keys tables in the merge_utilities schema. Source Explorer must be configured and running for this DAG to work.

05_generate_and_export_reports

This DAG generates reports for data stewardship (Monarch Reports), the UI, and other sources.

smart_delete

This DAG builds a soft delete list from account trees, identified by Salesforce Account IDs. The staging copy uses this list to ignore those records when reading from the source system. The DAG then deletes these trees in Salesforce, taking care not to violate any foreign key constraints.
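One common way to avoid foreign key violations when deleting a tree is to delete child records before their parents. The sketch below illustrates that idea only; it is not the DAG's actual implementation. The `parent_of` mapping, the `deletion_order` function, and the sample account IDs are all hypothetical.

```python
from collections import defaultdict


def deletion_order(parent_of: dict, roots: list) -> list:
    """Return account IDs under `roots`, ordered so children come before parents.

    `parent_of` maps each account ID to its parent ID (None for a top-level
    account). This structure is an assumption made for illustration.
    """
    # Invert the child -> parent mapping into parent -> [children].
    children = defaultdict(list)
    for child, parent in parent_of.items():
        if parent is not None:
            children[parent].append(child)

    order = []
    seen = set()

    def visit(node):
        # Post-order traversal: emit a node only after all of its descendants.
        if node in seen:
            return  # guards against cycles and duplicate roots
        seen.add(node)
        for child in sorted(children[node]):
            visit(child)
        order.append(node)

    for root in roots:
        visit(root)
    return order


# Hypothetical account hierarchy: A1 is the root of one tree, B1 of another.
parent_of = {"A1": None, "A2": "A1", "A3": "A1", "A4": "A2", "B1": None}
order = deletion_order(parent_of, ["A1"])
print(order)  # ['A4', 'A2', 'A3', 'A1']
```

Only the tree rooted at A1 is returned; B1 is untouched because it was not listed as a root. The same list could double as the soft delete list, since it contains every account in the selected trees.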

06_upload_data_to_salesforce

This DAG takes the models from the staging schema and uploads them to Salesforce. It requires two Airflow variables for the copy query:

  • The vdmu/base_config.json file holds the standard base configuration for the Salesforce mapping, covering the v12 core objects and fields.
  • Clients can add custom SF mappings in the custom/config.json file to override the base configuration.
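The override relationship between the two files can be pictured as a recursive merge in which custom values win. This is a minimal sketch under that assumption; the project may resolve overrides differently, and the `merge_config` function and every key in the sample dictionaries are hypothetical, since the real file contents are not shown here.

```python
import json
from copy import deepcopy


def merge_config(base: dict, custom: dict) -> dict:
    """Return a new dict in which values from `custom` override `base`.

    Nested dicts are merged key by key; any other value is replaced outright.
    Neither input is modified.
    """
    merged = deepcopy(base)
    for key, value in custom.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical stand-ins for vdmu/base_config.json and custom/config.json.
base = {"Account": {"fields": {"Name": "name", "Phone": "phone"}}}
custom = {"Account": {"fields": {"Phone": "main_phone", "Custom__c": "custom_value"}}}

merged = merge_config(base, custom)
print(json.dumps(merged, indent=2, sort_keys=True))
```

In this sketch the custom file only needs to list the mappings that differ from the base; everything else is inherited unchanged.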

07_export_reports_for_presentation

This DAG exports reports to an external DB for review and presentation.