
Config Schema

Overview

All Monarch configurations are stored in the monarch_config schema.

Setup

The Monarch config schema is set up as part of DAG 01_setup_provider_schema, which runs DBT seed to create the base config. Once that DAG has run, the config schema is ready for use.

Updating the Configs

Updates can be made in Monarch's config interface.

Summary of Each Table's Purpose

This section describes the purpose of each table at a high-level.

⭐ Core Tables

⭐ source_ingestion

Used in: core

Defines the sources that Monarch will look at. Sources can be CSV files, JSON files, or Salesforce instances.

  • source_name: the name of the source
  • type: either salesforce, csv, or json
  • import_dag_id: The name of the DAG that will import the raw data and transform it into the common data model
  • sf_url: The URL for the Salesforce instance, in the form MyDomainName.my.salesforce.com. Only applies to sources of type "salesforce".
  • sf_token: The token for Salesforce access. Generate one by following the steps here. If pushing to Salesforce, the token must grant read and write access. Only applies to sources of type "salesforce".
  • protected: If this flag is set, Monarch will not write (data load or delete) to this Salesforce instance. We recommend setting it for production sources. Only applies to sources of type "salesforce".
  • transformation_project: The name of the DBT project that will transform the data to the common data model
  • transform_schema: The name of the schema where the DBT transformation models will live
  • copy_script: The path to the script used by DAG 03_import_data_into_landing to copy data from the transform_schema to landing
  • bulk_query_max_records: The maximum number of records a salesforce query will fetch in a batch. Only applies to sources of type "salesforce".
  • target: The environment that data loads and data seeds will push to. Exactly one source must have target set, and it must be a Salesforce source.
  • target_type: Determines the type of target system for data operations: SF (Salesforce), FHIR (FHIR endpoints and resources), or NULL (not a target). This determines whether data operations are directed to FHIR endpoints and resources or to Salesforce uploads and deletes.
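As a sketch of how these constraints fit together, a minimal validation of a source_ingestion row might look like the following. The field names come from the list above; the validation logic and the sample row are illustrative, not Monarch's actual checks.

```python
# Sketch of a sanity check for source_ingestion rows, based on the field
# descriptions above. Field names match the config; the logic is illustrative.
SF_ONLY_FIELDS = {"sf_url", "sf_token", "protected", "bulk_query_max_records"}

def validate_source(row: dict) -> list[str]:
    """Return a list of problems found in a source_ingestion row."""
    problems = []
    if row.get("type") not in {"salesforce", "csv", "json"}:
        problems.append(f"unknown type: {row.get('type')}")
    if row.get("type") != "salesforce":
        # These fields only apply to Salesforce sources.
        for field in SF_ONLY_FIELDS & row.keys():
            if row[field] is not None:
                problems.append(f"{field} only applies to salesforce sources")
    if row.get("target") and row.get("type") != "salesforce":
        problems.append("target must be a salesforce source")
    return problems

# A hypothetical CSV source that mistakenly sets a Salesforce-only field:
csv_row = {"source_name": "legacy_csv", "type": "csv", "sf_url": "x.my.salesforce.com"}
print(validate_source(csv_row))  # flags sf_url on a non-salesforce source
```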

The following columns are used to track a source's status as data moves through the pipeline:

Raw refers to raw source data

  • raw_schema: The schema name for the raw source data
  • raw_tin_list_table: The table name containing raw TIN (Tax Identification Number) list data
  • raw_tin_column: The column name containing TIN values in the raw data

Transformed refers to the raw source data after it's transformed into the V12 data model

  • transformed_tin_list_table: The name of the table in the transform_schema containing the TIN list
  • transformed_tin_column: The name of the TIN column in transformed_tin_list_table

Landed means data that's been imported to the landing schema

  • landed_tin_list_table: The name of the table in the landing schema containing the TIN list
  • landed_tin_column: The name of the TIN column in landed_tin_list_table

Merged means records that have been merged and loaded into the staging schema

  • merged_tin_list_table: The name of the table in the staging schema containing the TIN list
  • merged_tin_column: The name of the TIN column in merged_tin_list_table

Reviewing means the data is in the staging_to_upload schema and is under final review before it gets loaded into Salesforce

  • reviewing_tin_list_table: The name of the table in the staging_to_upload schema containing the TIN list
  • reviewing_tin_column_name: The name of the TIN column in reviewing_tin_list_table

Monarch can handle incremental uploads so it's possible a given migration only focuses on a subset of TINs. In Scope refers to the subset of TINs relevant to a given migration round. This TIN list generally consists of a hand-picked set of TINs and is manually imported into the database.

  • in_scope_tin_list_schema: The schema name for the list of in-scope TINs
  • in_scope_tin_list_table: The name of the table in in_scope_tin_list_schema containing the TIN list
  • in_scope_tin_column_name: The name of the TIN column in in_scope_tin_list_table

⭐ object

Used in: core, data load, salesforce

Defines the objects in each source that Monarch will look at. If an object's enabled flag is set to false, it will not be pulled from Salesforce. This is useful if your source data doesn't include one of the Monarch base objects (e.g. fee schedule).

Similarly, if the data_load_enabled flag or smart_delete_enabled flag is set to false, this object will not be pushed to or deleted from the target Salesforce environment during data loading (DAG 06_upload_data_to_salesforce) and smart deleting (DAG smart_delete) respectively. This also applies to data seeding (DAG full_data_seed).

When sf_use_source_column is enabled, this object's ID and foreign keys will be suffixed with the name of the source. For example, if the Salesforce ID for a record in the source sandbox2 is 001E100001LX1gZIAT, it becomes 001E100001LX1gZIAT_sandbox2. Generally you want to enable this for each object, but there are a few exceptions. For example, you might not enable it for RecordType and User because those IDs are shared across environments.
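The suffixing rule above can be sketched as follows (an illustrative helper, not Monarch's actual implementation):

```python
# Sketch of the ID-suffixing behavior described above.
def suffix_id(salesforce_id: str, source_name: str, use_source_column: bool) -> str:
    """Append the source name when sf_use_source_column is enabled."""
    if use_source_column:
        return f"{salesforce_id}_{source_name}"
    # e.g. RecordType / User IDs are shared across environments, so left as-is
    return salesforce_id

print(suffix_id("001E100001LX1gZIAT", "sandbox2", True))   # 001E100001LX1gZIAT_sandbox2
print(suffix_id("001E100001LX1gZIAT", "sandbox2", False))  # 001E100001LX1gZIAT
```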

⭐ object_field

Used in: core, data load, merge, salesforce

Defines the fields in each object that Monarch will look at. If a field's enabled flag is set to false, it will still appear in the pipeline but will be omitted during data loading (DAG 06_upload_data_to_salesforce) and data seeding (DAG full_data_seed).

The sort_order column defines the order in which the fields will be listed on the merge report.

The group_id column is used to define a set of fields that should be treated as one group. It's used during merge to ensure if one field in a given group gets overwritten, then the other fields must also be overwritten. For example, ShippingStreet, ShippingState, ShippingPostalCode, and ShippingCity belong to one group because it doesn't make sense to only pull in the ShippingCity field when merging two records. All four fields make up an address so they should be handled together as a group.
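The group behavior can be sketched as follows (a minimal illustration of the all-or-nothing rule, not Monarch's merge implementation):

```python
# Sketch of how group_id keeps fields together during merge: if any field in a
# group is overwritten, every field in that group is overwritten. The
# field-to-group mapping below mirrors the shipping address example above.
def expand_to_groups(overwritten: set[str], field_groups: dict[str, str]) -> set[str]:
    """Expand a set of overwritten fields to include all their group members."""
    hit_groups = {field_groups[f] for f in overwritten if f in field_groups}
    grouped = {f for f, g in field_groups.items() if g in hit_groups}
    return overwritten | grouped

groups = {
    "ShippingStreet": "shipping_address",
    "ShippingCity": "shipping_address",
    "ShippingState": "shipping_address",
    "ShippingPostalCode": "shipping_address",
}
# Overwriting ShippingCity pulls in the rest of the address group:
print(sorted(expand_to_groups({"ShippingCity", "Phone"}, groups)))
```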

⭐ lookup

Used in: core, data load, salesforce

Similar to object_field but defines which fields in each object are foreign keys and which tables they point to. Used to build tree structures (e.g. the DAG graph for data loading, the object hierarchy for smart copying).

A few key definitions:

  1. relation - the options are parent, child, and self. They refer to the order in which the objects are reached when traversing the hierarchy tree. parent means the other table is traversed before this one (higher node in the tree), while child means the other table is traversed after this one (lower node in the tree). self is a way to defer loading a specific field to a later point (see section below).

    • Note that this is different from table parentage relationships in a database. For example, when looking at Account, Contact, and AccountContactRelation (ACR for short), one might think that ACR has accountId and contactId set as parent relations because both the account and contact records must exist before the ACR record can be created. When building the hierarchy tree, however, the parentage order is Account -> ACR -> Contact, so ACR has accountId set as parent and contactId set as child.
  2. defer - This flag is used when the target_object for a given lookup is the same object type as object_name. When this flag is set, the field will be left blank during the initial data load step for the object, then updated at a later step. For example, the account object has a parentId field which references another account. If a location account is uploaded before its parent account, it won't be able to populate the parentId field with the parent's Salesforce ID because that record won't exist yet. Deferring the field ensures all objects are uploaded and given Salesforce IDs before the lookup IDs are populated.

    • When defer is set and the relation is set to self, it means the field is deferred on the object itself. The main example is the status field of the contract object. When a contract is created, its status can't immediately be set to Executed, even if that's the value in the data, due to a Salesforce limitation. To get around this, we set the status field to defer and give it a relation of self, which allows Monarch to upload the object without the status field, then update it in a later step.
  3. enabled - If this flag is set to false, this lookup will be ignored when building the DAG graphs for data loading (DAG 06_upload_data_to_salesforce) and smart deleting (DAG smart_delete).
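The parent/child relations above imply a load order, which can be sketched with a topological sort. This is an illustration of the idea, not Monarch's actual DAG builder; the lookup tuples mirror the Account / AccountContactRelation / Contact example.

```python
# Sketch: derive a load order from parent/child lookup relations
# using Kahn's algorithm.
from collections import defaultdict, deque

def load_order(objects, lookups):
    """lookups: (object_name, field, target_object, relation) tuples."""
    edges = defaultdict(set)
    for obj, _field, target, relation in lookups:
        if relation == "parent":    # target is loaded before obj
            edges[target].add(obj)
        elif relation == "child":   # target is loaded after obj
            edges[obj].add(target)
        # relation == "self": deferred field, contributes no ordering edge
    indegree = {o: 0 for o in objects}
    for dsts in edges.values():
        for d in dsts:
            indegree[d] += 1
    queue = deque(o for o in objects if indegree[o] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in sorted(edges[node]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return order

lookups = [
    ("AccountContactRelation", "accountId", "Account", "parent"),
    ("AccountContactRelation", "contactId", "Contact", "child"),
]
print(load_order(["Account", "AccountContactRelation", "Contact"], lookups))
```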

Supplementary Tables

confidence_functions

Used in: merge

Defines which confidence function a given source's object will use and which schema that function lives in. DAG provider_merge uses this to generate confidence scores.

data_quality_exclusion

Used in: reports

Defines which tests are excluded from DAG 04_data_quality_and_mastering. This is useful if there are built-in Monarch tests that don't apply to your data.

default_airflow_vars

Utility

A previous version of Monarch used Airflow variables, but we've since moved away from those to this config setup. This table houses the variables previously stored there. It contains key-value pairs, so it can also serve as a general-purpose table for any other config variables you need to add.

export_db_config

Used in: db exports

Used by DAG export_project_database to export specific tables and schemas to a client's on-premise database so they have a record of data that was moved. See exporting project database for more details.

fuzzy_match

Used in: core, reports

Defines which methods and fields we want to fuzzy match on for the specified object. The table_name column must include both the schema and table name, formatted as <schema>.<tablename>. The config column is a JSON string where the keys are methods and the values are field configurations for that method. The available methods are name, phone, npi_column, and string_match. Fuzzy matching happens during DAG 04_data_quality_and_mastering.
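A hypothetical config value might look like the following. The method names come from the description above, but the per-method field configurations (field, threshold) are assumptions for illustration only.

```python
import json

# Hypothetical fuzzy_match config JSON: keys are methods, values are
# field configurations. The field/threshold shapes are illustrative guesses.
config_json = """
{
  "name": {"field": "name"},
  "phone": {"field": "phone"},
  "string_match": {"field": "billingstreet", "threshold": 0.9}
}
"""
config = json.loads(config_json)

# Every key must be one of the supported methods.
assert set(config) <= {"name", "phone", "npi_column", "string_match"}
print(sorted(config))
```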

demographic_report

Used in: reports

Defines the configuration for the demographic report feature, which allows counting distinct records for a specified aggregate column while filtering by state and category. The report can point to any table in any schema and provides interactive filtering capabilities.

Key Configuration Fields:

  • schema_name: The database schema containing the target table
  • object_name: The table name to analyze
  • aggregate_col: The column whose distinct values are counted (e.g., practice tax ID numbers)
  • state_col: The column containing state information for filtering
  • category_col: The column containing category information for filtering
  • name_col: The column containing names for navigation links to source explorer
  • city_col: The column containing city information
  • street_col: The column containing street address information
  • zip_col: The column containing zip code information
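The report's core query can be sketched as a distinct count over aggregate_col filtered by state and category. The following toy example runs against SQLite with a hypothetical config row and made-up data; it illustrates the query shape, not the report's actual implementation (note SQLite ignores the schema_name concept used in the real database).

```python
import sqlite3

# Hypothetical demographic_report config row (column names from the list above).
cfg = {
    "object_name": "account",
    "aggregate_col": "tax_id",
    "state_col": "state",
    "category_col": "category",
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (tax_id TEXT, state TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [("111", "CA", "clinic"), ("111", "CA", "clinic"), ("222", "NY", "hospital")],
)

# Count distinct aggregate values for one state/category combination.
query = (
    f"SELECT COUNT(DISTINCT {cfg['aggregate_col']}) FROM {cfg['object_name']} "
    f"WHERE {cfg['state_col']} = ? AND {cfg['category_col']} = ?"
)
print(conn.execute(query, ("CA", "clinic")).fetchone()[0])  # 1 (duplicates collapse)
```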

Advanced Usage with DBT Views: Migration developers can leverage their transformation DBT project to create custom views that join multiple staging object tables. This enables the use of categories that are stored on other object tables. For example, if you want to categorize accounts by their associated contact types or contract statuses, you can create a DBT view that joins the account table with contact and contract tables, then reference this view in the demographic report configuration.

Source Explorer Integration: The demographic report provides navigation links from the name column to source explorer for detailed record inspection. Important: To enable this navigation functionality, users must ensure the staging environment is properly indexed in source explorer. This requires configuring the source_explorer table to include the staging schema with the source name 'staging' and running the source_explorer_reload DAG to build the search index.

remap_table

Used in: merge

Defines the objects and fields that will need to be remapped during merge (DAG provider_merge). If a row's enabled flag is set to false, remap instructions will not be generated for that object and field.

report_master

Used in: reports

See custom reporting docs

se_object

Used in: source explorer

Similar to the object config table but used by source explorer. This table defines the objects that will show up for each indexed source. Indexing occurs in DAG source_explorer_reload. facet_fields are fields source explorer uses to cluster objects together, like a SQL GROUP BY; they serve as an index to optimize searching. If a row's enabled flag is set to false, the object will not be indexed. Note that this can cause issues depending on the object.

se_object_field

Used in: source explorer

Similar to the object_field config table but used by source explorer. This table defines the fields per object that will show up for each indexed source. Just like se_object, if a row's enabled flag is set to false, the field will not be indexed, which can cause issues depending on the field.

se_object_lookup

Used in: source explorer

Similar to the lookup config table but used by source explorer. This table defines the lookups per object that will show up for each indexed source. They're called "refs" in Solr. Note that if use_default_lookups is set to true in the source_explorer config table, this table will be ignored.

sf_fields_to_update

Used in: core, data load, salesforce

Defines how an object's field maps to a Salesforce field. The most common use is record types: in some data sources, the value of the recordtypeid field is the developername rather than the id, so Monarch uses this table to update that column in DAG 04_data_quality_and_mastering. Also used in record_merge to auto-generate the ui_unmap table (see ui_unmap for more details).

source_explorer

Used in: source explorer

Defines each source to index and display in source explorer. Indexing occurs in DAG source_explorer_reload. If use_default_lookups is enabled, source explorer will default to using the lookup config instead of se_object_lookup. See source-explorer-setup for more details.

table_options

Used in: db exports

Used in conjunction with export_db_config for db exporting. Defines which columns in a given table should be included or excluded. See exporting project database for more details.

test_to_report_mapping

Used in: reports

See custom reporting docs

ui_unmap

Used in: merge

This table determines how to "unmap" fields that were remapped during the record merge phase. These "unmaps" are applied in the merge screen to make fields more human readable. For example, if a field contains a Salesforce ID, the user may wish to "unmap" it in the UI so that it shows the name of the object the ID represents instead. The table contains the following fields:

  • object - the object containing the field to unmap (e.g. account)
  • field_to_unmap - the field to unmap (e.g. recordtypeid)
  • reference_table - the table the new, "unmapped" data is pulled from. Should be in the format schema.tablename (e.g. staging.recordtype)
  • unmap_value - the value to be used in the UI. Can be either a column from reference_table or a more complex SQL statement. Any references to reference_table should be in the format <object>_<field_to_unmap>. A simple unmap example is name; a more complex example is concat(contact_ownerid.firstname, ' ', contact_ownerid.lastname)
  • match_value - the column in reference_table that matches object.field_to_unmap. This column will be used to join the two tables.
  • custom_unmap_value - false when unmap_value is a simple column name, true when it is a more complex SQL statement
  • enabled - when false, the row will be ignored and not used to update the merge UI

When record_merge is run, ui_unmap rows will be auto-generated for every row in sf_fields_to_update. The mappings are as follows:

  • model_col_name as field_to_unmap
  • concat('staging.', sf_table) as reference_table
  • replace(regexp_substr(where_clause, 'sf.[\w]*[ =]'), 'sf.','') as unmap_value
  • 'id' as match_value
  • false as custom_unmap_value
  • true as enabled

Any existing rows will not be updated by this auto-generation. If you wish to delete an auto-generated row, simply mark the enabled column as false, and it will not be re-generated in the future.
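The auto-generated unmap_value above extracts the sf.-prefixed column from the where_clause. A Python rendering of that regexp shows the behavior; the where_clause value here is a hypothetical example, not real config data.

```python
import re

# Python rendering of the unmap_value expression above:
#   replace(regexp_substr(where_clause, 'sf.[\w]*[ =]'), 'sf.', '')
where_clause = "model.recordtypeid = sf.developername and sf.sobjecttype = 'Account'"

# regexp_substr returns the first match; the pattern grabs "sf.<column>" plus
# the trailing space or "=" delimiter.
match = re.search(r"sf\.\w*[ =]", where_clause)
unmap_value = match.group().replace("sf.", "")
print(repr(unmap_value))  # 'developername ' (trailing delimiter survives, as in the SQL)
```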